
Per-Client AI Agents for Agencies: LoRA + Tool Calling Playbook
Every agency client gets the same GPT wrapper. That's the problem. With per-client LoRA adapters and custom tool schemas, you can deliver AI agents that know each client's CRM, workflows, and terminology — at 50-200MB per client. Here's the full playbook.
The AI agency market in 2026 has a differentiation problem. Nine out of ten agencies sell the same thing: a GPT-4 wrapper connected to the client's tools via Zapier or Make.com. The client gets a chatbot that sort of works, breaks on edge cases, and costs $0.03 per query in API fees that somebody has to absorb.
The clients know this. They have talked to three agencies and gotten three identical pitches. Price becomes the only differentiator, and price competition kills margins.
Here is the alternative: per-client AI agents built on a shared base model with individual LoRA adapters. Each client's agent knows THEIR tools, THEIR workflows, THEIR terminology. Not generic. Not a wrapper. A model that was trained on their data and their tool schemas.
This is how you charge $3K-8K for setup instead of $500. And how you keep clients on $500-2K/month retainers instead of watching them churn after 3 months.
The Architecture: Shared Base + Per-Client LoRA
The core idea is simple:
Base Model (Qwen 2.5 7B or Llama 3.1 8B)
├── Client A LoRA adapter (HubSpot tools + e-commerce workflow)
├── Client B LoRA adapter (Salesforce tools + SaaS onboarding workflow)
├── Client C LoRA adapter (Pipedrive tools + consulting intake workflow)
├── Client D LoRA adapter (custom CRM API + logistics workflow)
└── Client E LoRA adapter (HubSpot tools + real estate workflow)
One base model. Five adapters. Each adapter is 50-200MB depending on rank and quantization. The base model is ~4GB (Q4 quantized). Total storage for 5 clients: 4GB + 0.25-1GB = roughly 4.25-5GB.
At inference time, you load the base model once and hot-swap the LoRA adapter per request. Adapter swap takes 50-200ms — invisible to the end user.
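The per-request swap can be sketched as a small adapter cache keyed by client ID. This is a minimal stdlib sketch, assuming an injectable loader function stands in for the real weight-loading step (in practice a library such as PEFT, via `load_adapter`/`set_adapter`, plays that role):

```python
from collections import OrderedDict

class AdapterCache:
    """Keeps the N most recently used client adapters in memory.

    The base model loads once; each request looks up its client's
    adapter and loads from disk only on a cache miss.
    """

    def __init__(self, load_fn, max_adapters=20):
        self._load = load_fn          # placeholder for the real adapter loader
        self._cache = OrderedDict()   # client_id -> adapter object (LRU order)
        self._max = max_adapters
        self.misses = 0

    def get(self, client_id):
        if client_id in self._cache:
            self._cache.move_to_end(client_id)   # mark as recently used
        else:
            self.misses += 1
            if len(self._cache) >= self._max:
                self._cache.popitem(last=False)  # evict least recently used
            self._cache[client_id] = self._load(client_id)
        return self._cache[client_id]

# Usage: route every inference request through the cache first.
cache = AdapterCache(load_fn=lambda cid: f"adapter-weights-for-{cid}", max_adapters=2)
cache.get("acme")        # miss: loads from disk
cache.get("acme")        # hit: no load
cache.get("cloudstack")  # miss
cache.get("summit")      # miss: evicts "acme"
```

The 50-200ms swap figure is the cost of a miss; hits are effectively free, which is why keeping all adapters resident (as in the 5-client setup) makes the swap invisible.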
What Makes Each Client's Agent Different
Tool Schemas
Client A uses HubSpot. Client B uses Salesforce. Client C uses Pipedrive. The function signatures are completely different:
Client A (HubSpot):
{
"name": "create_deal",
"params": {"dealname": "string", "pipeline": "string", "dealstage": "string", "amount": "number"}
}
Client B (Salesforce):
{
"name": "create_opportunity",
"params": {"Name": "string", "StageName": "string", "CloseDate": "date", "Amount": "number"}
}
Same business intent (create a sales deal), completely different schemas. A generic model guesses at parameter names and gets them wrong 20-30% of the time. A fine-tuned adapter gets them right 95%+ because it has seen hundreds of examples of YOUR client's exact schema.
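To make the mismatch concrete, here is the same business intent rendered into both clients' tool calls as explicit code. The mapping is what the adapter learns implicitly from examples; the `CloseDate` value is illustrative, not part of the schemas above:

```python
# One canonical intent, two client-specific tool calls.
CANONICAL = {"deal_name": "Johnson Corp", "stage": "proposal", "amount": 45000}

def to_hubspot(intent):
    return {"name": "create_deal",
            "params": {"dealname": intent["deal_name"],
                       "pipeline": "default",
                       "dealstage": intent["stage"],
                       "amount": intent["amount"]}}

def to_salesforce(intent):
    return {"name": "create_opportunity",
            "params": {"Name": intent["deal_name"],
                       "StageName": intent["stage"],
                       "CloseDate": "2026-03-31",   # illustrative date
                       "Amount": intent["amount"]}}
```

A generic model has to guess which of these mappings applies; the fine-tuned adapter has seen the correct one hundreds of times.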
Workflow Patterns
Client A's sales process: Lead → Qualification Call → Proposal → Negotiation → Closed. Client C's consulting intake: Inquiry → Needs Assessment → SOW Draft → Contract → Kickoff. The agent needs to know which step comes next, what data to collect at each stage, and when to escalate to a human.
Generic models have no idea about these flows. Fine-tuned adapters do, because you trained them on the client's actual workflow data.
Terminology and Tone
Client A calls their customers "accounts." Client D calls them "shippers." Client A wants formal communication. Client C wants casual. The adapter absorbs these nuances from the training data without explicit rules.
Example: 5 Agency Clients
Here is what a real 5-client deployment looks like:
| Client | CRM | Key Tools | Adapter Size | Training Examples | Monthly Queries |
|---|---|---|---|---|---|
| Acme E-commerce | HubSpot | create_deal, update_contact, log_activity, check_inventory | 85MB | 450 | 3,200 |
| CloudStack SaaS | Salesforce | create_opportunity, update_case, assign_task, check_usage | 120MB | 620 | 5,100 |
| Summit Consulting | Pipedrive | create_deal, add_note, schedule_meeting, generate_sow | 75MB | 380 | 1,800 |
| FastFreight Logistics | Custom API | create_shipment, track_package, update_route, notify_customer | 140MB | 550 | 8,400 |
| Metro Realty | HubSpot | create_deal, schedule_showing, update_listing, send_followup | 90MB | 410 | 2,600 |
Total adapter storage: 510MB. Total base model: 4.2GB. Everything runs on a single server with 16GB RAM.
Building the Training Data
For each client, you need 300-700 training examples that cover:
1. Tool Selection (40% of examples)
User message paired with the correct tool call:
{
"messages": [
{"role": "system", "content": "You are Acme's sales assistant. Available tools: [create_deal, update_contact, log_activity, check_inventory]"},
{"role": "user", "content": "New deal from Johnson Corp, $45K, they're in the proposal stage"},
{"role": "assistant", "content": null, "tool_calls": [{"function": {"name": "create_deal", "arguments": "{\"dealname\": \"Johnson Corp\", \"pipeline\": \"default\", \"dealstage\": \"proposalmaker\", \"amount\": 45000}"}}]}
]
}
2. Multi-Tool Sequences (25% of examples)
Workflows that require 2-3 tool calls in sequence:
"Log a call with Johnson Corp — we discussed the proposal, they want a revised quote by Friday, and move the deal to negotiation stage."
→ log_activity (call notes) → update_contact (next follow-up: Friday) → create_deal (update stage to negotiation)
3. Clarification and Refusal (20% of examples)
When the user's request is ambiguous or outside scope:
"Delete all the old leads" → "I can help clean up leads, but I need to confirm: should I archive leads older than 90 days with no activity, or do you have different criteria?"
4. Error Handling (15% of examples)
When a tool call fails and the model needs to recover:
Tool result: {"error": "deal_stage 'proposal' not found. Valid stages: proposalmaker, decisionmaker, closedwon, closedlost"}
→ Retry with corrected stage name
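The four categories above translate directly into per-category example counts once you pick a dataset size for the client. A minimal sketch:

```python
TRAINING_MIX = {                 # fractions from the playbook above
    "tool_selection": 0.40,
    "multi_tool": 0.25,
    "clarification_refusal": 0.20,
    "error_handling": 0.15,
}

def target_counts(total_examples):
    """Split a client's dataset budget across the four example types."""
    counts = {k: round(total_examples * frac) for k, frac in TRAINING_MIX.items()}
    # Absorb rounding drift into the largest bucket so counts sum exactly.
    counts["tool_selection"] += total_examples - sum(counts.values())
    return counts

# For a 500-example client: 200 / 125 / 100 / 75.
```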
Where the Data Comes From
The best source: the client's existing chat logs, support tickets, and CRM activity history. Export 6 months of data, filter for the workflows you are automating, and format into training pairs. For new clients without history, build synthetic examples based on their tool schemas and workflow documentation — synthetic data generation covers this in detail.
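Formatting an exported activity row into a chat-format training pair might look like the sketch below. The input field names (`user_text`, `tool`, `args`) are hypothetical export columns, not a real CRM schema:

```python
import json

def to_training_pair(row, system_prompt):
    """Turn one logged interaction into an OpenAI-style chat example."""
    return {"messages": [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": row["user_text"]},
        {"role": "assistant", "content": None,
         "tool_calls": [{"function": {"name": row["tool"],
                                      "arguments": json.dumps(row["args"])}}]},
    ]}
```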
The Fine-Tuning Process
Per client, fine-tuning takes:
- Data prep: 2-4 hours (mostly formatting and deduplication)
- Fine-tuning: 20-40 minutes on a single GPU (LoRA rank 16, 3 epochs)
- Evaluation: 1-2 hours (run test suite, check accuracy by tool and workflow)
- Total: Half a day per client
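The deduplication half of data prep can be a one-pass hash filter over the formatted examples — a minimal sketch, useful because chat-log exports are full of exact repeats:

```python
import hashlib
import json

def dedupe(examples):
    """Drop exact-duplicate training examples, preserving first-seen order."""
    seen, unique = set(), []
    for ex in examples:
        key = hashlib.sha256(json.dumps(ex, sort_keys=True).encode()).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(ex)
    return unique
```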
With Ertas, the workflow is: upload formatted dataset, select base model, configure LoRA parameters, click train. No ML infrastructure to manage. No CUDA debugging.
Pricing Model
This architecture supports premium pricing because the deliverable is genuinely custom:
Setup Fee: $3,000 - $8,000
Covers:
- Discovery (map client's tools, workflows, terminology) — 4-8 hours
- Data collection and formatting — 4-8 hours
- Fine-tuning and evaluation — 4-6 hours
- Integration and testing — 4-8 hours
- Total agency labor: 16-30 hours at $150-250/hr
The client gets an agent that demonstrably knows their tools and workflows. You show them side-by-side: generic GPT vs their fine-tuned agent on 10 real requests. The difference sells itself.
Monthly Retainer: $500 - $2,000
Covers:
- Hosting and inference ($50-150 actual cost for shared infrastructure)
- Monitoring and maintenance (2-4 hours/month)
- Monthly retraining on new data (1-2 hours/month)
- Performance reporting
Margin Math
| Item | Revenue | Cost | Margin |
|---|---|---|---|
| Setup (per client) | $5,000 | $2,000 (labor) | $3,000 |
| Monthly retainer (per client) | $1,000 | $300 (infra + labor) | $700 |
| Year 1 per client | $17,000 | $5,600 | $11,400 (67%) |
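The year-1 row is just setup plus twelve retainers, and it is worth recomputing per client when the numbers differ from the table:

```python
def year1(setup_rev, setup_cost, monthly_rev, monthly_cost, months=12):
    """Year-1 revenue, cost, margin, and margin percent for one client."""
    revenue = setup_rev + monthly_rev * months
    cost = setup_cost + monthly_cost * months
    margin = revenue - cost
    return revenue, cost, margin, round(100 * margin / revenue)

# Numbers from the table above.
rev, cost, margin, pct = year1(5000, 2000, 1000, 300)
```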
Compare this to reselling GPT-4 API access where your margin is the markup on API costs — which clients eventually discover and cut you out.
Differentiation: Why This Beats GPT Wrappers
When you pitch "we build custom AI agents," every agency says the same thing. Here is how per-client LoRA changes the conversation:
Demo 1: Tool accuracy. Show the client 10 tool calls. Your agent gets 9-10 right. The GPT wrapper gets 7-8 right (and 2-3 of those need parameter corrections).
Demo 2: Workflow knowledge. Ask both agents "what's the next step for this deal?" Your agent knows the client's specific pipeline stages. The GPT wrapper gives a generic answer.
Demo 3: Terminology. Use the client's jargon in a request. Your agent responds naturally. The GPT wrapper asks for clarification or misinterprets.
Demo 4: Cost projection. Show the client: "At your query volume, GPT-4 API costs $X/month and that goes up as you scale. Our agent runs on fixed infrastructure — $Y/month whether you send 1,000 or 10,000 queries."
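Demo 1 is easy to make rigorous with a tiny scoring harness: run the same requests through both agents and compare their tool calls against a labeled answer key. A minimal sketch, assuming each call is a dict with a `name` and a `params` map:

```python
def score(predictions, gold):
    """Exact-match accuracy on tool name, plus parameter-level accuracy."""
    tool_hits = param_hits = param_total = 0
    for pred, ref in zip(predictions, gold):
        if pred["name"] == ref["name"]:
            tool_hits += 1
        for key, value in ref["params"].items():
            param_total += 1
            if pred.get("params", {}).get(key) == value:
                param_hits += 1
    return {"tool_acc": tool_hits / len(gold),
            "param_acc": param_hits / param_total}
```

Run it once for the fine-tuned agent and once for the generic wrapper on the same 10 requests; the two score dicts are the side-by-side demo.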
Storage and Infrastructure
Per-Client Storage
- LoRA adapter (rank 16, Q4): 50-100MB
- LoRA adapter (rank 32, Q4): 100-200MB
- Training data archive: 10-50MB
- Evaluation results and logs: 5-10MB
Total per client: 65-260MB. Call it 200MB average.
Scaling the Infrastructure
- 5 clients: Single server, 16GB RAM, 1 GPU. All adapters in memory. ~$150/month cloud or $3K one-time hardware.
- 20 clients: Single server, 32GB RAM, 1 GPU. Hot-swap adapters. 20 x 200MB = 4GB adapter storage. ~$300/month cloud.
- 50+ clients: Two servers for redundancy. Load balancer routes by client. ~$600/month cloud.
The base model loads once. Adapter swap is near-instant. You do not need 50 separate model instances — you need one model and 50 small adapter files.
Scaling Playbook: From First Client to Productized
Phase 1: First 3 Clients (Manual)
Everything is bespoke. You sit with each client, map their workflows by hand, build training data manually, and fine-tune individually. This is where you learn what works and build your templates.
Revenue target: $15K-24K setup + $1.5K-6K/month recurring.
Phase 2: Clients 4-10 (Templated)
You have seen enough patterns to create templates. "CRM agent" template covers HubSpot, Salesforce, and Pipedrive with pre-built tool schemas. Client onboarding drops from 30 hours to 12 hours. You create an intake questionnaire that captures 80% of what you need.
Revenue target: $30K-60K setup + $4K-16K/month recurring.
Phase 3: Clients 10+ (Productized)
Build a self-serve portal. Client connects their CRM, uploads sample interactions, selects their workflow type. The system generates training data from templates, fine-tunes automatically, and deploys the adapter. You review quality before going live.
Setup fee drops to $1K-3K (mostly automated). Monthly retainer stays at $500-1K. Volume makes up for lower per-client revenue.
Revenue target: $20K-60K setup + $10K-30K/month recurring.
The Moat
By Phase 3, you have something no GPT-wrapper agency has: a library of domain-specific training templates, a deployment pipeline that takes days instead of weeks, and per-client adapters that your competitors cannot replicate by signing up for an API key.
Ship AI that runs on your users' devices.
Ertas early bird pricing starts at $14.50/mo — locked in for life. Plans for builders and agencies.
Further Reading
- LoRA Adapters Per Law Firm: One Model, Many Clients — a vertical-specific example of the per-client adapter architecture
- White-Label AI Platform for Agencies — how to package per-client agents under your agency's brand
- AI Agency Differentiation: Beyond the GPT Wrapper — strategic positioning for agencies that build real AI products