    White-Label AI: Build Custom Models for Every Client

    How AI agencies can use fine-tuned LoRA adapters to deliver white-label AI solutions — one base model, dozens of client-specific adapters, premium pricing.

    Ertas Team

    If your agency is reselling GPT access with a wrapper UI, you already know the problem. Your clients are one Google search away from doing exactly what you do — signing up for ChatGPT, pasting the same prompts, and cutting you out entirely. There is no moat in prompt engineering alone. The pricing is a race to the bottom, and the margins shrink every time OpenAI drops their per-token cost.

    Reselling commodity AI is not a business. It is arbitrage with an expiry date.

    White-Label AI Is the Alternative

    The agencies that will thrive are the ones delivering something clients genuinely cannot replicate on their own: custom models trained on each client's domain data, deployed under their brand, running on infrastructure they control.

    A white-label AI model does not just answer generic questions well. It speaks the client's language. It knows their product catalogue, their internal terminology, their compliance constraints. It produces outputs that feel native to their business — because it was literally trained on their business.

    This is not science fiction. With modern fine-tuning techniques, building client-specific models is now a repeatable, scalable agency workflow.

    How LoRA Adapters Make This Practical

    The key technology enabling white-label AI at agency scale is LoRA (Low-Rank Adaptation). Instead of training a full model for every client — which would be prohibitively expensive in both compute and storage — you train a small adapter that modifies a shared base model's behavior.

    Think of it this way: you maintain one base model (say, Qwen 2.5 7B or Llama 3.1 8B). For each client, you train a LoRA adapter that is typically just 50–200MB in size. That adapter encodes everything specific to that client — their tone, their domain knowledge, their output formatting preferences.

    At inference time, you load the base model once and swap adapters per request. Twenty clients do not mean twenty models. They mean one model and twenty tiny adapter files.
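    A quick numpy sketch of why those adapters stay tiny: LoRA replaces a full d×d weight update with two low-rank factors, so per-matrix storage drops from d² to 2rd parameters. The hidden size and rank below are illustrative choices, not values prescribed by any particular base model.

```python
import numpy as np

d, r = 4096, 16          # hidden size typical of a 7B-class model; LoRA rank (illustrative)
alpha = 32               # LoRA scaling factor (illustrative)
rng = np.random.default_rng(0)

W = rng.standard_normal((d, d)) * 0.01   # one frozen base weight matrix
A = rng.standard_normal((r, d)) * 0.01   # trainable low-rank down-projection
B = np.zeros((d, r))                     # trainable up-projection, zero-initialized

# Effective weight at inference: base plus the scaled low-rank update
W_eff = W + (alpha / r) * (B @ A)

full = W.size                 # parameters a full fine-tune of W would store
adapter = A.size + B.size     # parameters the LoRA adapter actually stores
print(adapter / full)         # prints 0.0078125: under 1% of the full matrix
```

    Multiply that ratio across every adapted weight matrix in the model and the 50–200MB adapter sizes above follow: you ship the low-rank factors per client, never a second copy of the base weights.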

    The Workflow

    Here is how a white-label engagement typically looks:

    1. Collect client data. This might be support transcripts, product documentation, internal knowledge bases, example inputs and desired outputs. The client provides it; you curate it into training-ready format.

    2. Fine-tune a LoRA adapter. Using the curated dataset, you train an adapter on top of your chosen base model. Training a 7B model adapter on 5,000 examples takes roughly 30–60 minutes on a single GPU.

    3. Export to GGUF. Once training is complete, you merge the adapter with the base model and export it in GGUF format — the de facto format for local and edge deployment via llama.cpp and Ollama.

    4. Deploy. The model can run on the client's own infrastructure via Ollama, on a VPS you manage, or on any platform that supports GGUF. The client gets an API endpoint that is fully compatible with the OpenAI SDK — their existing code just works.

    5. Iterate. As the client provides feedback and new data, you retrain the adapter. The base model stays the same. Turnaround for an updated adapter can be hours, not weeks.
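    Step 1 above can be sketched as a small script. The client name, filenames, and example pairs here are hypothetical; the output is the chat-style JSONL format that most fine-tuning stacks accept:

```python
import json

# Hypothetical curated pairs (user message, desired reply) pulled from
# a client's support transcripts — illustrative data, not a real client.
examples = [
    ("Where is my order #1042?",
     "You can track order #1042 under Account > Orders. It shipped on Monday."),
    ("Do you ship to Canada?",
     "Yes — standard shipping to Canada takes 5-7 business days."),
]

# System prompt that fixes the client's brand voice during training
system = "You are AcmeCo's support assistant. Answer in AcmeCo's house style."

# Write one JSON object per line, each a full chat conversation
with open("client_acme.jsonl", "w") as f:
    for user, assistant in examples:
        record = {"messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user},
            {"role": "assistant", "content": assistant},
        ]}
        f.write(json.dumps(record) + "\n")
```

    The same script, pointed at a different client's transcripts, produces that client's dataset — which is what makes the workflow repeatable rather than bespoke per engagement.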

    The Economics

    This is where things get compelling for agency business models.

    Running twenty clients on OpenAI's API at moderate usage (say, 500K tokens/day per client) costs roughly $280/month per client at GPT-4o pricing. That is $5,600/month in API costs alone, before your margin.

    Running twenty clients on a self-hosted base model with LoRA adapters costs the base inference infrastructure (one capable GPU server at $200–400/month) plus adapter storage ($50 total for all twenty adapters). Your total infrastructure cost is under $500/month for all twenty clients.

    The savings are not incremental. They are an order of magnitude. And the margin you recapture becomes your actual product differentiation: you are not reselling someone else's API. You are delivering proprietary models.
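    The arithmetic above, spelled out. All figures are taken from this article; the $400 server cost is the upper end of the quoted range:

```python
clients = 20
api_cost_per_client = 280          # $/month per client at GPT-4o pricing (article's figure)
api_total = clients * api_cost_per_client

gpu_server = 400                   # $/month, upper end of the $200-400 range quoted
adapter_storage = 50               # $/month for all twenty adapters combined
self_hosted_total = gpu_server + adapter_storage

print(api_total)                        # 5600
print(self_hosted_total)                # 450
print(api_total / self_hosted_total)    # ≈ 12x — an order of magnitude
```

    Note that the self-hosted total is flat: client twenty-one adds only an adapter file, while the API bill grows linearly with every client you sign.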

    How Ertas Studio Enables This

    Building this workflow from scratch requires stitching together training scripts, dataset pipelines, model registries, and deployment tooling. Ertas is designed to make this a managed experience.

    Per-project workspaces let you isolate each client's data and training runs. Your agency team sees all projects; each client only sees theirs.

    Vault handles client data ingestion and versioning. Upload documents, structured data, or conversation logs. Vault handles preprocessing and ensures data isolation between clients — critical for agencies where client confidentiality is non-negotiable.

    Studio provides a visual pipeline for LoRA training. Configure base model selection, hyperparameters, and evaluation criteria through the UI. Your project managers and junior staff can kick off training runs without writing Python scripts. Experiment tracking shows exactly which adapter version performs best.

    GGUF export is built in. One click to produce a deployment-ready model file, ready for Ollama or any compatible runtime.

    Start Building Your White-Label Practice

    Ertas early-access pricing is locked at $14.50/month — less than two days of a single client's API spend at the usage above. For agencies building a white-label AI practice, the ROI is measured in days, not months.

    Join the waitlist and start converting commodity AI reselling into a defensible, high-margin service.
