
Per-Client AI Agents for Agencies: LoRA + Tool Calling Playbook
Every agency client gets the same GPT wrapper. That's the problem. With per-client LoRA adapters and custom tool schemas, you can deliver AI agents that know each client's CRM, workflows, and terminology — at 50-200MB per client. Here's the full playbook.
The AI agency market in 2026 has a differentiation problem. Nine out of ten agencies sell the same thing: a GPT-4 wrapper connected to the client's tools via Zapier or Make.com. The client gets a chatbot that sort of works, breaks on edge cases, and costs $0.03 per query in API fees that somebody has to absorb.
The clients know this. They have talked to three agencies and gotten three identical pitches. Price becomes the only differentiator, and price competition kills margins.
Here is the alternative: per-client AI agents built on a shared base model with individual LoRA adapters. Each client's agent knows THEIR tools, THEIR workflows, THEIR terminology. Not generic. Not a wrapper. A model that was trained on their data and their tool schemas.
This is how you charge $3K-8K for setup instead of $500. And how you keep clients on $500-2K/month retainers instead of watching them churn after 3 months.
The Architecture: Shared Base + Per-Client LoRA
The core idea is simple:
Base Model (Qwen 2.5 7B or Llama 3.1 8B)
├── Client A LoRA adapter (HubSpot tools + e-commerce workflow)
├── Client B LoRA adapter (Salesforce tools + SaaS onboarding workflow)
├── Client C LoRA adapter (Pipedrive tools + consulting intake workflow)
├── Client D LoRA adapter (custom CRM API + logistics workflow)
└── Client E LoRA adapter (HubSpot tools + real estate workflow)
One base model. Five adapters. Each adapter is 50-200MB depending on rank and quantization. The base model is ~4GB (Q4 quantized). Total storage for 5 clients: 4GB + 0.25-1GB = roughly 4.25-5GB.
At inference time, you load the base model once and hot-swap the LoRA adapter per request. Adapter swap takes 50-200ms — invisible to the end user.
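The per-request swap can be sketched as a small adapter cache keyed by client ID. This is a minimal stdlib sketch, assuming an injectable loader function stands in for the real weight-loading step (in practice a library such as PEFT, via `load_adapter`/`set_adapter`, plays that role):

```python
from collections import OrderedDict

class AdapterCache:
    """Keeps the N most recently used client adapters in memory.

    The base model loads once; each request looks up its client's
    adapter and loads from disk only on a cache miss.
    """

    def __init__(self, load_fn, max_adapters=20):
        self._load = load_fn          # placeholder for the real adapter loader
        self._cache = OrderedDict()   # client_id -> adapter object (LRU order)
        self._max = max_adapters
        self.misses = 0

    def get(self, client_id):
        if client_id in self._cache:
            self._cache.move_to_end(client_id)   # mark as recently used
        else:
            self.misses += 1
            if len(self._cache) >= self._max:
                self._cache.popitem(last=False)  # evict least recently used
            self._cache[client_id] = self._load(client_id)
        return self._cache[client_id]

# Usage: route every inference request through the cache first.
cache = AdapterCache(load_fn=lambda cid: f"adapter-weights-for-{cid}", max_adapters=2)
cache.get("acme")        # miss: loads from disk
cache.get("acme")        # hit: no load
cache.get("cloudstack")  # miss
cache.get("summit")      # miss: evicts "acme"
```

The 50-200ms swap figure is the cost of a miss; hits are effectively free, which is why keeping all adapters resident (as in the 5-client setup) makes the swap invisible.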
What Makes Each Client's Agent Different
Tool Schemas
Client A uses HubSpot. Client B uses Salesforce. Client C uses Pipedrive. The function signatures are completely different:
Client A (HubSpot):
{
"name": "create_deal",
"params": {"dealname": "string", "pipeline": "string", "dealstage": "string", "amount": "number"}
}
Client B (Salesforce):
{
"name": "create_opportunity",
"params": {"Name": "string", "StageName": "string", "CloseDate": "date", "Amount": "number"}
}
Same business intent (create a sales deal), completely different schemas. A generic model guesses at parameter names and gets them wrong 20-30% of the time. A fine-tuned adapter gets them right 95%+ because it has seen hundreds of examples of YOUR client's exact schema.
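To make the mismatch concrete, here is the same business intent rendered into both clients' tool calls as explicit code. The mapping is what the adapter learns implicitly from examples; the `CloseDate` value is illustrative, not part of the schemas above:

```python
# One canonical intent, two client-specific tool calls.
CANONICAL = {"deal_name": "Johnson Corp", "stage": "proposal", "amount": 45000}

def to_hubspot(intent):
    return {"name": "create_deal",
            "params": {"dealname": intent["deal_name"],
                       "pipeline": "default",
                       "dealstage": intent["stage"],
                       "amount": intent["amount"]}}

def to_salesforce(intent):
    return {"name": "create_opportunity",
            "params": {"Name": intent["deal_name"],
                       "StageName": intent["stage"],
                       "CloseDate": "2026-03-31",   # illustrative date
                       "Amount": intent["amount"]}}
```

A generic model has to guess which of these mappings applies; the fine-tuned adapter has seen the correct one hundreds of times.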
Workflow Patterns
Client A's sales process: Lead → Qualification Call → Proposal → Negotiation → Closed. Client C's consulting intake: Inquiry → Needs Assessment → SOW Draft → Contract → Kickoff. The agent needs to know which step comes next, what data to collect at each stage, and when to escalate to a human.
Generic models have no idea about these flows. Fine-tuned adapters do, because you trained them on the client's actual workflow data.
Terminology and Tone
Client A calls their customers "accounts." Client D calls them "shippers." Client A wants formal communication. Client C wants casual. The adapter absorbs these nuances from the training data without explicit rules.
Example: 5 Agency Clients
Here is what a real 5-client deployment looks like:
| Client | CRM | Key Tools | Adapter Size | Training Examples | Monthly Queries |
|---|---|---|---|---|---|
| Acme E-commerce | HubSpot | create_deal, update_contact, log_activity, check_inventory | 85MB | 450 | 3,200 |
| CloudStack SaaS | Salesforce | create_opportunity, update_case, assign_task, check_usage | 120MB | 620 | 5,100 |
| Summit Consulting | Pipedrive | create_deal, add_note, schedule_meeting, generate_sow | 75MB | 380 | 1,800 |
| FastFreight Logistics | Custom API | create_shipment, track_package, update_route, notify_customer | 140MB | 550 | 8,400 |
| Metro Realty | HubSpot | create_deal, schedule_showing, update_listing, send_followup | 90MB | 410 | 2,600 |
Total adapter storage: 510MB. Total base model: 4.2GB. Everything runs on a single server with 16GB RAM.
Building the Training Data
For each client, you need 300-700 training examples that cover:
1. Tool Selection (40% of examples)
User message paired with the correct tool call:
{
"messages": [
{"role": "system", "content": "You are Acme's sales assistant. Available tools: [create_deal, update_contact, log_activity, check_inventory]"},
{"role": "user", "content": "New deal from Johnson Corp, $45K, they're in the proposal stage"},
{"role": "assistant", "content": null, "tool_calls": [{"function": {"name": "create_deal", "arguments": "{\"dealname\": \"Johnson Corp\", \"pipeline\": \"default\", \"dealstage\": \"proposalmaker\", \"amount\": 45000}"}}]}
]
}
2. Multi-Tool Sequences (25% of examples)
Workflows that require 2-3 tool calls in sequence:
"Log a call with Johnson Corp — we discussed the proposal, they want a revised quote by Friday, and move the deal to negotiation stage."
→ log_activity (call notes) → update_contact (next follow-up: Friday) → create_deal (update stage to negotiation)
3. Clarification and Refusal (20% of examples)
When the user's request is ambiguous or outside scope:
"Delete all the old leads" → "I can help clean up leads, but I need to confirm: should I archive leads older than 90 days with no activity, or do you have different criteria?"
4. Error Handling (15% of examples)
When a tool call fails and the model needs to recover:
Tool result: {"error": "deal_stage 'proposal' not found. Valid stages: proposalmaker, decisionmaker, closedwon, closedlost"}
→ Retry with corrected stage name
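The four categories above translate directly into per-category example counts once you pick a dataset size for the client. A minimal sketch:

```python
TRAINING_MIX = {                 # fractions from the playbook above
    "tool_selection": 0.40,
    "multi_tool": 0.25,
    "clarification_refusal": 0.20,
    "error_handling": 0.15,
}

def target_counts(total_examples):
    """Split a client's dataset budget across the four example types."""
    counts = {k: round(total_examples * frac) for k, frac in TRAINING_MIX.items()}
    # Absorb rounding drift into the largest bucket so counts sum exactly.
    counts["tool_selection"] += total_examples - sum(counts.values())
    return counts

# For a 500-example client: 200 / 125 / 100 / 75.
```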
Where the Data Comes From
The best source: the client's existing chat logs, support tickets, and CRM activity history. Export 6 months of data, filter for the workflows you are automating, and format into training pairs. For new clients without history, build synthetic examples based on their tool schemas and workflow documentation — synthetic data generation covers this in detail.
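Formatting an exported activity row into a chat-format training pair might look like the sketch below. The input field names (`user_text`, `tool`, `args`) are hypothetical export columns, not a real CRM schema:

```python
import json

def to_training_pair(row, system_prompt):
    """Turn one logged interaction into an OpenAI-style chat example."""
    return {"messages": [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": row["user_text"]},
        {"role": "assistant", "content": None,
         "tool_calls": [{"function": {"name": row["tool"],
                                      "arguments": json.dumps(row["args"])}}]},
    ]}
```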
The Fine-Tuning Process
Per client, fine-tuning takes:
- Data prep: 2-4 hours (mostly formatting and deduplication)
- Fine-tuning: 20-40 minutes on a single GPU (LoRA rank 16, 3 epochs)
- Evaluation: 1-2 hours (run test suite, check accuracy by tool and workflow)
- Total: Half a day per client
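The deduplication half of data prep can be a one-pass hash filter over the formatted examples — a minimal sketch, useful because chat-log exports are full of exact repeats:

```python
import hashlib
import json

def dedupe(examples):
    """Drop exact-duplicate training examples, preserving first-seen order."""
    seen, unique = set(), []
    for ex in examples:
        key = hashlib.sha256(json.dumps(ex, sort_keys=True).encode()).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(ex)
    return unique
```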
With Ertas, the workflow is: upload formatted dataset, select base model, configure LoRA parameters, click train. No ML infrastructure to manage. No CUDA debugging.
Pricing Model
This architecture supports premium pricing because the deliverable is genuinely custom:
Setup Fee: $3,000 - $8,000
Covers:
- Discovery (map client's tools, workflows, terminology) — 4-8 hours
- Data collection and formatting — 4-8 hours
- Fine-tuning and evaluation — 4-6 hours
- Integration and testing — 4-8 hours
- Total agency labor: 16-30 hours at $150-250/hr
The client gets an agent that demonstrably knows their tools and workflows. You show them side-by-side: generic GPT vs their fine-tuned agent on 10 real requests. The difference sells itself.
Monthly Retainer: $500 - $2,000
Covers:
- Hosting and inference ($50-150 actual cost for shared infrastructure)
- Monitoring and maintenance (2-4 hours/month)
- Monthly retraining on new data (1-2 hours/month)
- Performance reporting
Margin Math
| Item | Revenue | Cost | Margin |
|---|---|---|---|
| Setup (per client) | $5,000 | $2,000 (labor) | $3,000 |
| Monthly retainer (per client) | $1,000 | $300 (infra + labor) | $700 |
| Year 1 per client | $17,000 | $5,600 | $11,400 (67%) |
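The year-1 row is just setup plus twelve retainers, and it is worth recomputing per client when the numbers differ from the table:

```python
def year1(setup_rev, setup_cost, monthly_rev, monthly_cost, months=12):
    """Year-1 revenue, cost, margin, and margin percent for one client."""
    revenue = setup_rev + monthly_rev * months
    cost = setup_cost + monthly_cost * months
    margin = revenue - cost
    return revenue, cost, margin, round(100 * margin / revenue)

# Numbers from the table above.
rev, cost, margin, pct = year1(5000, 2000, 1000, 300)
```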
Compare this to reselling GPT-4 API access where your margin is the markup on API costs — which clients eventually discover and cut you out.
Differentiation: Why This Beats GPT Wrappers
When you pitch "we build custom AI agents," every agency says the same thing. Here is how per-client LoRA changes the conversation:
Demo 1: Tool accuracy. Show the client 10 tool calls. Your agent gets 9-10 right. The GPT wrapper gets 7-8 right (and 2-3 of those need parameter corrections).
Demo 2: Workflow knowledge. Ask both agents "what's the next step for this deal?" Your agent knows the client's specific pipeline stages. The GPT wrapper gives a generic answer.
Demo 3: Terminology. Use the client's jargon in a request. Your agent responds naturally. The GPT wrapper asks for clarification or misinterprets.
Demo 4: Cost projection. Show the client: "At your query volume, GPT-4 API costs $X/month and that goes up as you scale. Our agent runs on fixed infrastructure — $Y/month whether you send 1,000 or 10,000 queries."
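Demo 1 is easy to make rigorous with a tiny scoring harness: run the same requests through both agents and compare their tool calls against a labeled answer key. A minimal sketch, assuming each call is a dict with a `name` and a `params` map:

```python
def score(predictions, gold):
    """Exact-match accuracy on tool name, plus parameter-level accuracy."""
    tool_hits = param_hits = param_total = 0
    for pred, ref in zip(predictions, gold):
        if pred["name"] == ref["name"]:
            tool_hits += 1
        for key, value in ref["params"].items():
            param_total += 1
            if pred.get("params", {}).get(key) == value:
                param_hits += 1
    return {"tool_acc": tool_hits / len(gold),
            "param_acc": param_hits / param_total}
```

Run it once for the fine-tuned agent and once for the generic wrapper on the same 10 requests; the two score dicts are the side-by-side demo.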
Storage and Infrastructure
Per-Client Storage
- LoRA adapter (rank 16, Q4): 50-100MB
- LoRA adapter (rank 32, Q4): 100-200MB
- Training data archive: 10-50MB
- Evaluation results and logs: 5-10MB
Total per client: 65-260MB. Call it 200MB average.
Scaling the Infrastructure
- 5 clients: Single server, 16GB RAM, 1 GPU. All adapters in memory. ~$150/month cloud or $3K one-time hardware.
- 20 clients: Single server, 32GB RAM, 1 GPU. Hot-swap adapters. 20 x 200MB = 4GB adapter storage. ~$300/month cloud.
- 50+ clients: Two servers for redundancy. Load balancer routes by client. ~$600/month cloud.
The base model loads once. Adapter swap is near-instant. You do not need 50 separate model instances — you need one model and 50 small adapter files.
Scaling Playbook: From First Client to Productized
Phase 1: First 3 Clients (Manual)
Everything is bespoke. You sit with each client, map their workflows by hand, build training data manually, and fine-tune individually. This is where you learn what works and build your templates.
Revenue target: $15K-24K setup + $1.5K-6K/month recurring.
Phase 2: Clients 4-10 (Templated)
You have seen enough patterns to create templates. "CRM agent" template covers HubSpot, Salesforce, and Pipedrive with pre-built tool schemas. Client onboarding drops from 30 hours to 12 hours. You create an intake questionnaire that captures 80% of what you need.
Revenue target: $30K-60K setup + $4K-16K/month recurring.
Phase 3: Clients 10+ (Productized)
Build a self-serve portal. Client connects their CRM, uploads sample interactions, selects their workflow type. The system generates training data from templates, fine-tunes automatically, and deploys the adapter. You review quality before going live.
Setup fee drops to $1K-3K (mostly automated). Monthly retainer stays at $500-1K. Volume makes up for lower per-client revenue.
Revenue target: $20K-60K setup + $10K-30K/month recurring.
The Moat
By Phase 3, you have something no GPT-wrapper agency has: a library of domain-specific training templates, a deployment pipeline that takes days instead of weeks, and per-client adapters that your competitors cannot replicate by signing up for an API key.
Ship AI that runs on your users' devices.
Ertas early bird pricing starts at $14.50/mo — locked in for life. Plans for builders and agencies.
Further Reading
- LoRA Adapters Per Law Firm: One Model, Many Clients — a vertical-specific example of the per-client adapter architecture
- White-Label AI Platform for Agencies — how to package per-client agents under your agency's brand
- AI Agency Differentiation: Beyond the GPT Wrapper — strategic positioning for agencies that build real AI products