
    OpenClaw for Agencies: Per-Client AI Agents Without the API Bill

    AI agencies are adopting OpenClaw for client work, but cloud API costs scale per client. Here's how to deploy per-client agents using fine-tuned local models with LoRA adapters.

Ertas Team

    OpenClaw is the most exciting tool to hit AI agencies in months. A fully autonomous agent that connects through WhatsApp, Telegram, Slack, and Discord — the channels your clients already use. It can monitor inboxes, generate reports, manage files, control a browser, and respond to natural language instructions. For agencies building chatbots, voice agents, and automation workflows, OpenClaw is a natural fit.

    But there is a familiar problem lurking underneath.

    OpenClaw routes inference through cloud APIs by default. That means every client interaction, every email it triages, every report it generates, every message it sends — all of it generates per-token API charges. And for agencies managing multiple clients, those charges add up fast.

    You already know this story. It is the same margin problem you have with every API-dependent tool. OpenClaw just makes it worse because it is so capable — which means it processes more tokens per interaction than a simple chatbot.

    The Agency Cost Problem with OpenClaw

    Let's look at real numbers for a typical agency deployment:

    Per-Client OpenClaw Cost on Cloud APIs

| Client Type | Monthly Interactions | Avg Tokens/Interaction | Monthly Tokens | Cost (GPT-4o) |
|---|---|---|---|---|
| E-commerce support | 3,000 conversations | 2,500 | 7.5M | AU$225 |
| Real estate agent | 1,500 conversations | 3,000 | 4.5M | AU$135 |
| Marketing reporting | 500 reports | 8,000 | 4M | AU$120 |
| Email triage | 2,000 emails | 1,500 | 3M | AU$90 |

    A single active client costs AU$90-225/month in API pass-through. Across 10-15 clients, you are looking at AU$1,500-3,000/month — and that assumes moderate usage. A viral moment for one client's chatbot can spike costs unpredictably.
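The per-client figures above reduce to simple arithmetic. A minimal sketch, using the blended AU$30-per-million-token rate implied by the table's own rows (real GPT-4o pricing varies with the input/output token split):

```python
# Cost model behind the table above. The AU$30/M-token blended rate
# is derived from the article's rows (e.g. 7.5M tokens -> AU$225);
# real GPT-4o pricing varies with the input/output split.
RATE_AUD_PER_M_TOKENS = 30.0

def monthly_cost_aud(interactions: int, avg_tokens_per_interaction: int) -> float:
    """Monthly API pass-through cost for one client, in AU$."""
    monthly_tokens = interactions * avg_tokens_per_interaction
    return monthly_tokens / 1_000_000 * RATE_AUD_PER_M_TOKENS

clients = {
    "E-commerce support": (3_000, 2_500),
    "Real estate agent": (1_500, 3_000),
    "Marketing reporting": (500, 8_000),
    "Email triage": (2_000, 1_500),
}

for name, (interactions, tokens) in clients.items():
    print(f"{name}: AU${monthly_cost_aud(interactions, tokens):.0f}/mo")
```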

    Meanwhile, your retainer is fixed. Your margin shrinks with every token.

    The Per-Client LoRA Adapter Model

    Here is the approach that eliminates the API cost entirely while delivering better results per client:

    One base model. Per-client LoRA adapters. Local inference.

    How It Works

1. Choose a single base model that handles agent tasks well — Llama 3.1 8B or Qwen 2.5 7B for most workloads. Download it once.

    2. Fine-tune a LoRA adapter for each client. Each adapter is trained on that client's specific data: their conversation history, product catalogue, FAQ corpus, brand voice, and domain terminology. An adapter is 50-200MB — lightweight enough to store dozens on a single machine.

3. Deploy via Ollama. Run the base model on a Mac Studio, Mac Mini M4 Pro, or GPU server. Register each client adapter as its own Ollama model (base model plus an ADAPTER line in a Modelfile), and Ollama loads the right one at inference time.

    4. Point each client's OpenClaw instance at the local Ollama endpoint with their specific model.
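The last step amounts to a thin routing layer: each client ID maps to their fine-tuned model name, and that client's OpenClaw instance sends requests to the local Ollama endpoint. A sketch — the client IDs and model names below are hypothetical, and the URL is Ollama's default chat API:

```python
# Hypothetical routing layer: client ID -> fine-tuned Ollama model.
# Model names are illustrative; the endpoint is Ollama's default
# chat API on localhost.
OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"

CLIENT_MODELS = {
    "dental-practice": "agency-base:dental-practice",
    "real-estate": "agency-base:real-estate",
}

def build_chat_request(client_id: str, user_message: str) -> dict:
    """Payload an OpenClaw instance would POST to the local endpoint."""
    return {
        "model": CLIENT_MODELS[client_id],
        "messages": [{"role": "user", "content": user_message}],
        "stream": False,
    }
```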

    The Cost Comparison

| | Cloud API (15 clients) | Local Fine-Tuned (15 clients) |
|---|---|---|
| Monthly API cost | AU$2,250-3,375 | AU$0 |
| Hardware | None | AU$2,500-4,000 (one-time) |
| Per-client marginal cost | AU$150-225/mo | ~AU$0 |
| 12-month total | AU$27,000-40,500 | AU$3,500 |
| Break-even | n/a | ~1-2 months |

    After the hardware pays for itself (usually within 4-6 weeks), every client's OpenClaw agent runs for free. Your margin on each client is the full retainer minus negligible electricity costs.
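The break-even claim is easy to sanity-check: divide the one-time hardware cost by the monthly API spend it replaces. A quick sketch using the ranges from the comparison table:

```python
# Sanity check on break-even: one-time hardware cost divided by the
# monthly API spend it replaces (ranges from the comparison table).
def break_even_months(hardware_aud: float, monthly_api_aud: float) -> float:
    return hardware_aud / monthly_api_aud

best = break_even_months(2_500, 3_375)   # cheap hardware, high API spend
worst = break_even_months(4_000, 2_250)  # dear hardware, low API spend
print(f"break-even: {best:.1f} to {worst:.1f} months")
```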

    Why Fine-Tuned Models Outperform Generic APIs for Agency Work

    The performance argument is just as strong as the cost argument. Here is why:

    1. Client-Specific Accuracy

    A generic GPT-4o processes your client's support query using its general training data. A fine-tuned model processes it using knowledge of that client's products, policies, and communication style.

    Example: A dental practice client receives an inquiry about "composite bonding pricing." GPT-4o gives a generic dental answer. A model fine-tuned on that practice's actual pricing, service descriptions, and patient communication style gives the correct, specific answer — because it has seen hundreds of similar interactions during training.

    2. Tone and Brand Consistency

    Every client has a different voice. A real estate agency uses different language than a SaaS startup. Fine-tuning captures these nuances automatically — the model absorbs the client's writing style from the training data. No more lengthy system prompts trying to coerce a generic model into matching a brand voice.

    3. Reduced Hallucination

    Fine-tuned models hallucinate less on domain-specific questions because the answers are in their weights, not approximated from a general prompt. When a fine-tuned model does not know something, it tends to say so rather than fabricating plausible-sounding but incorrect answers.

    4. Consistent Output Format

    If your client's OpenClaw agent needs to generate reports in a specific format, classify tickets into specific categories, or extract data into specific schemas — fine-tuning enforces this consistency far more reliably than prompt engineering.

    Building the Per-Client Pipeline

    Here is the workflow for onboarding a new agency client onto OpenClaw with a fine-tuned model:

    Week 1: Data Collection

    Export the client's existing interaction data:

    • Chatbot conversation logs (if migrating from an existing bot)
    • Email threads (for email triage use cases)
    • Report templates and examples (for reporting use cases)
    • FAQ documents and knowledge base articles

    Format as JSONL: instruction/context/response triples. Aim for 500-2,000 high-quality examples.
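A minimal converter for that JSONL step might look like the following. The field names mirror the instruction/context/response triples described above, and the sample row is invented for illustration; match whatever schema your fine-tuning tool expects:

```python
import json

# Convert raw (instruction, context, response) triples to JSONL.
# The sample row is invented for illustration.
examples = [
    (
        "Answer the customer's question in the practice's voice.",
        "Q: Do you offer composite bonding?",
        "Yes, we offer composite bonding; book a consult for a tailored quote.",
    ),
]

def to_jsonl(rows) -> str:
    """One JSON object per line: the format most fine-tuning tools ingest."""
    return "\n".join(
        json.dumps({"instruction": i, "context": c, "response": r})
        for i, c, r in rows
    )

print(to_jsonl(examples))
```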

    Week 2: Fine-Tuning

    Upload the dataset to Ertas Studio. Select your agency's standard base model. Configure a LoRA fine-tuning run — rank 16, 3 epochs is a solid starting point. Training typically takes 30-90 minutes depending on dataset size.

    Evaluate the trained model against a held-out test set. If accuracy is below your threshold, iterate — add more examples, clean up noisy data, adjust hyperparameters.
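For classification-style outputs, the held-out evaluation can start as simple exact-match accuracy. A sketch — free-form responses need fuzzier scoring such as semantic similarity or rubric grading:

```python
# Exact-match accuracy against a held-out test set. Suits
# classification-style outputs (ticket categories, schema fields);
# free-form text needs fuzzier scoring.
def accuracy(predictions: list[str], references: list[str]) -> float:
    assert len(predictions) == len(references), "mismatched test set"
    hits = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return hits / len(references)

preds = ["billing", "support", "sales", "support"]
refs = ["billing", "support", "billing", "support"]
print(accuracy(preds, refs))
```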

    Export as GGUF.

    Week 3: Deployment and Testing

    Deploy the GGUF model to Ollama on your infrastructure. Configure the client's OpenClaw instance to point to the local endpoint. Run parallel testing — route real interactions to both the new fine-tuned model and the existing cloud API, compare quality.
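The parallel test can be a small shadow harness: send each real interaction to both backends and log the responses side by side for review. In this sketch, `call_cloud` and `call_local` are placeholders for your actual API clients:

```python
import csv

# Shadow-testing harness: log cloud vs local responses side by side
# for manual quality review. call_cloud/call_local are placeholders
# for your real API clients.
def shadow_test(prompts, call_cloud, call_local, out_path="comparison.csv"):
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["prompt", "cloud_response", "local_response"])
        for prompt in prompts:
            writer.writerow([prompt, call_cloud(prompt), call_local(prompt)])
```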

    Week 4: Cutover

    Switch the client to the local model. Monitor for quality regressions. Collect interactions that the model handles poorly for the next fine-tuning iteration.

    Scaling the Model

    The per-client LoRA architecture scales linearly with minimal overhead:

    • 5 clients: One Mac Mini M4 Pro handles all inference comfortably
    • 15 clients: Mac Studio or a single RTX 4090 server with adapter hot-swapping
    • 50+ clients: Two servers with load balancing, or Ertas Cloud for managed multi-tenant deployment

    Each new client is an incremental LoRA adapter — 50-200MB of storage and a fine-tuning run. Not another API subscription, not another line item on the P&L, not another variable cost that erodes margin.
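The storage claim is easy to verify: dozens of adapters plus one shared base model fit comfortably on a single machine. A rough check, assuming the article's worst-case 200MB adapter and a ~5GB quantised 7-8B GGUF base (the base-model size is an assumption):

```python
# Rough storage footprint: one shared base model plus one adapter
# per client. Base-model size is an assumption (~5GB quantised GGUF).
ADAPTER_MB = 200        # worst case per the article
BASE_MODEL_GB = 5.0     # assumed quantised 7-8B GGUF

def total_storage_gb(n_clients: int) -> float:
    return BASE_MODEL_GB + n_clients * ADAPTER_MB / 1024

print(f"50 clients: {total_storage_gb(50):.1f} GB")
```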

    Data Isolation and Privacy

    Running per-client models locally solves the data privacy problem that enterprise clients increasingly raise:

    • Client data never leaves your infrastructure. No third-party API sees the client's emails, customer data, or business information.
    • Per-client adapter isolation. Each client's fine-tuned knowledge is stored in a separate adapter file. No cross-contamination between clients.
    • Audit trail. You control the logs. You can tell clients exactly where their data is processed and stored.
    • Compliance-ready. Meets GDPR, Australian Privacy Act, and most enterprise data sovereignty requirements without additional configuration.

    When an enterprise client asks "where does our data go?" — you can answer "nowhere. It stays on our infrastructure" and mean it.

    Ship AI that runs on your users' devices.

    Ertas early bird pricing starts at $14.50/mo — locked in for life. Plans for builders and agencies.

    The Competitive Moat

    Here is the strategic advantage no one talks about: agencies running per-client fine-tuned models on OpenClaw have a moat that API resellers do not.

    When your competitor deploys OpenClaw for a client using GPT-4o, the client can eventually realise they could run OpenClaw themselves with the same API. There is no switching cost, no proprietary value.

    When you deploy a fine-tuned model for a client, the model is the moat. It contains months of domain knowledge, tone calibration, and performance optimisation. The client cannot replicate it by signing up for an API key. Your expertise in fine-tuning, evaluating, and iterating on the model is the value — not the API pass-through.

    That is a business worth building.
