
90% Gross Margin AI Services: The Agency Model That Beats SaaS Economics
Most AI agencies run 50-60% gross margins because they're reselling API calls. Agencies using fine-tuned models on owned infrastructure hit 90%+ margins. Here's how the economics work.
Traditional consulting firms run 70-80% gross margins. SaaS companies target 80-85%. Most AI agencies? They're stuck at 50-60% -- and the reason is embarrassingly simple: they're reselling someone else's API calls and calling it a service.
Every client interaction that hits GPT-4o or Claude 3.5 Sonnet generates a variable cost. Every support ticket answered, every document summarized, every lead scored -- it all shows up on your API provider's invoice as COGS. The more successful your deployments are, the more they cost you. That is the opposite of how a healthy service business should work.
There is another model. Agencies that fine-tune per-client models on owned or rented infrastructure are running 88-92% gross margins consistently. The math is not complicated, but it requires rethinking what you are actually selling.
The Margin Problem: Why API Reselling Kills Your Economics
Let's start with what most agencies are doing today. You sign a client for $1,500/month to manage their AI chatbot. You deploy it on GPT-4o because it's the easiest path to production. The client's chatbot handles 3,000 conversations per month, averaging 800 tokens per interaction.
Your API cost for that single client: roughly $180-320/month depending on how many retries, context window expansions, and edge cases come through. That's 12-21% of revenue gone to a single line item you cannot negotiate or optimize.
Now multiply that across your client roster.
The API Margin Math at Scale
| Clients | Avg Monthly Revenue | Avg API Cost/Client | Total API COGS | Gross Margin |
|---|---|---|---|---|
| 5 | $7,500 | $280 | $1,400 | 81% |
| 10 | $15,000 | $280 | $2,800 | 81% |
| 15 | $22,500 | $280 | $4,200 | 81% |
| 25 | $37,500 | $280 | $7,000 | 81% |
At first glance, 81% looks decent. But $280/month is an average -- your high-volume clients are burning $400-600/month in API costs. And those numbers assume no growth in usage. When a client's chatbot goes from 3,000 to 8,000 conversations per month because it's actually working, your API bill scales linearly while your retainer stays flat.
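To make the usage-growth effect concrete, here is a minimal sketch using the article's own figures: a $1,500 retainer and $280/month of API cost at 3,000 conversations. It assumes API cost scales strictly linearly with conversation volume, as described above; the function name is illustrative.

```python
def api_gross_margin(retainer, base_api_cost, base_conversations, conversations):
    """Gross margin when API cost scales linearly with conversation volume."""
    api_cost = base_api_cost * conversations / base_conversations
    return (retainer - api_cost) / retainer


# Flat $1,500 retainer, $280 API cost at the 3,000-conversation baseline.
for volume in (3_000, 5_000, 8_000):
    margin = api_gross_margin(1_500, 280, 3_000, volume)
    print(f"{volume:>5} conversations/month -> {margin:.1%} gross margin")
```

Under these assumptions the same client drops from about 81% to about 50% gross margin as volume grows from 3,000 to 8,000 conversations, with no change in what you charge.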
The real-world gross margin for established agencies with growing clients tends to land at 55-65% after accounting for usage growth, rate limit workarounds, and the occasional client who decides to run batch processing through your chatbot endpoint.
There is also a structural problem: you do not control your largest cost input. OpenAI can raise prices, deprecate models, or change rate limits at any time. Your margin is someone else's pricing decision.
The Fine-Tuned Model Shift
Here is the alternative architecture: instead of routing every client request through a cloud API, you fine-tune per-client LoRA adapters on a base model and deploy them on infrastructure you control.
A LoRA adapter is a lightweight layer (typically 50-200MB) that modifies a base model's behavior for a specific client's domain. One base model -- say Llama 3.1 8B or Qwen 2.5 7B -- serves as the foundation. Each client gets their own adapter trained on their data: support tickets, product documentation, sales conversations, whatever the use case requires.
The inference runs on a VPS with a GPU or on dedicated hardware. There are no per-token charges. Your cost is fixed regardless of how many requests a client sends.
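For a rough sense of why adapters are so small, the sketch below counts the parameters LoRA adds. Each adapted weight matrix of shape (d_out, d_in) gains two low-rank factors, which adds rank × (d_out + d_in) parameters. The layer count and matrix shapes here are hypothetical round numbers, not the exact configuration of any specific model, and 16-bit storage is assumed.

```python
def lora_param_count(weight_shapes, rank):
    """Parameters added by LoRA: each adapted (d_out, d_in) matrix gains
    two low-rank factors, B (d_out x rank) and A (rank x d_in)."""
    return sum(rank * (d_out + d_in) for d_out, d_in in weight_shapes)


# Hypothetical shapes: 32 layers, four square 4096x4096 attention
# projections adapted per layer. Real models differ.
shapes = [(4096, 4096)] * 4 * 32

for rank in (16, 64):
    params = lora_param_count(shapes, rank)
    size_mib = params * 2 / 2**20  # 2 bytes per parameter at 16-bit precision
    print(f"rank {rank}: {params:,} params, about {size_mib:.0f} MiB")
```

With these assumed shapes, rank 64 works out to roughly 128 MiB, inside the range quoted above, while rank 16 is closer to 32 MiB. Adapting more layers or raising the rank moves the size up accordingly.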
The New Cost Structure
| Cost Item | Monthly Cost |
|---|---|
| Ertas Agency Pro plan | $14.50/seat |
| VPS with GPU (e.g., Hetzner, Lambda) | $50-80/mo |
| Total infrastructure (3-person team, one VPS) | $93.50-123.50 |
That is the total cost to serve all your clients. Five clients or twenty-five clients -- the infrastructure cost stays roughly the same until you saturate the GPU, at which point you add a second VPS for another $50-80/month.
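The step-function shape of this cost can be sketched as follows. The seat price and VPS price come from the table above; `clients_per_vps=20` is purely an assumption for illustration, since the point at which a GPU saturates depends on your workload.

```python
import math


def monthly_infra_cost(clients, seats=3, seat_price=14.50,
                       vps_price=50.0, clients_per_vps=20):
    """Fixed monthly infrastructure cost: seats plus enough GPU servers.

    clients_per_vps is a hypothetical saturation point, not a measured one.
    """
    servers = max(1, math.ceil(clients / clients_per_vps))
    return seats * seat_price + servers * vps_price


for clients in (5, 10, 15, 25):
    print(f"{clients:>2} clients -> ${monthly_infra_cost(clients):.2f}/month")
```

Cost only moves when the client count crosses a server boundary, so under these assumptions 5, 10, and 15 clients all cost $93.50/month and 25 clients cost $143.50/month.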
Margin Comparison: API vs Fine-Tuned
| Clients | API Model Revenue | API COGS | API Margin | Fine-Tuned Revenue | Fine-Tuned COGS | Fine-Tuned Margin |
|---|---|---|---|---|---|---|
| 5 | $7,500 | $1,400 | 81% | $7,500 | $94 | 98.7% |
| 10 | $15,000 | $2,800 | 81% | $15,000 | $94 | 99.4% |
| 15 | $22,500 | $4,200 | 81% | $22,500 | $94 | 99.6% |
| 25 | $37,500 | $7,000 | 81% | $37,500 | $144 | 99.6% |
Even if we conservatively call it 90% gross margin after factoring in occasional retraining compute, electricity, and bandwidth -- you are still running margins that most SaaS companies would envy. And unlike SaaS, you are not building and maintaining a product. You are deploying and managing models.
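One way to read the "conservative 90%" figure is as a COGS budget: how much total cost you can absorb before dropping below 90% gross margin. The sketch below applies that to the rows of the comparison table; the function name is illustrative.

```python
def cogs_budget(revenue, target_margin):
    """Maximum total COGS that still achieves the target gross margin."""
    return revenue * (1 - target_margin)


# (clients, monthly revenue, fine-tuned infrastructure COGS) from the table.
rows = [(5, 7_500, 94), (10, 15_000, 94), (15, 22_500, 94), (25, 37_500, 144)]

for clients, revenue, infra in rows:
    budget = cogs_budget(revenue, 0.90)
    print(f"{clients:>2} clients: ${budget:,.0f} COGS budget, "
          f"${budget - infra:,.0f} left after infrastructure")
```

At 5 clients that leaves about $656/month for retraining compute, electricity, and bandwidth before the margin falls below 90%, and the cushion grows with every client added.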
Why 90% Margins Are Structurally Sustainable
The key insight is that fine-tuned model costs are fixed, not variable. Adding a new client does not meaningfully increase your infrastructure spend. A LoRA adapter swap takes milliseconds. A single 7B parameter model running on an RTX 4090 or A10G handles 30-60 requests per second depending on context length -- more than enough for most agency workloads.
This creates a flywheel:
- Fixed costs don't scale with clients. Your 15th client costs you almost nothing to serve.
- Models improve with data. Each retraining cycle makes the client's model better, which increases perceived value, which reduces churn.
- Switching costs are high. A client whose AI is trained on their specific data, terminology, and processes is not going to switch to a competitor running generic GPT-4o.
- Usage growth is free. When a client doubles their chatbot traffic, your cost stays the same. Their satisfaction goes up because the model handles it without degradation.
Compare this to the API model, where client success directly erodes your margins.
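To sanity-check the claim that one GPU is more than enough for most agency workloads, here is a back-of-the-envelope load estimate. The client count and conversation volume come from earlier in the article; `requests_per_conversation` and `peak_factor` are assumptions chosen for illustration, so substitute your own measurements.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600


def peak_requests_per_second(clients, conversations_per_client,
                             requests_per_conversation=5, peak_factor=20):
    """Rough peak load: average monthly rate times a burstiness multiplier.

    requests_per_conversation and peak_factor are hypothetical values.
    """
    monthly_requests = clients * conversations_per_client * requests_per_conversation
    return monthly_requests / SECONDS_PER_MONTH * peak_factor


peak = peak_requests_per_second(clients=25, conversations_per_client=8_000)
print(f"estimated peak: {peak:.1f} req/s against a quoted 30-60 req/s")
```

Even with 25 clients at 8,000 conversations each and a 20x burst multiplier, the estimate stays under 8 requests per second, which is why extra traffic does not translate into extra cost in this model.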
Service Tiers That Protect Margins
The 90% margin only works if you price correctly. Here is a tiering structure that aligns incentives:
Setup Fee: $2,000-5,000 (One-Time)
This covers the initial data collection, cleaning, fine-tuning, evaluation, and deployment. It should be profitable on its own -- do not subsidize setup to win the retainer. The setup fee establishes the value of the custom model and covers your time investment.
Deliverables: cleaned training dataset, fine-tuned adapter, evaluation benchmarks, deployed API endpoint, documentation.
Monthly Retainer: $500-2,000/month
This is where your margins live. The retainer covers:
- Model monitoring and quality sampling (2-4 hours/month)
- Monthly performance reports to the client
- Minor prompt and system prompt adjustments
- Infrastructure upkeep and uptime guarantees
- Priority support for production issues
At $1,000/month with $6/month in marginal infrastructure cost per client, you are at 99.4% gross margin on the retainer. Even after allocating 4 hours of labor at $50/hour internal cost, you are still at 79.4% -- well above the API model.
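The retainer arithmetic above can be reproduced with a small helper, which also makes it easy to try other retainer levels. This is a sketch using the article's figures ($6/month marginal infrastructure, 4 hours at $50/hour); the function name is illustrative.

```python
def retainer_margin(retainer, infra_cost, labor_hours=0.0, hourly_cost=0.0):
    """Margin on a monthly retainer after infrastructure and optional labor."""
    costs = infra_cost + labor_hours * hourly_cost
    return (retainer - costs) / retainer


for retainer in (500, 1_000, 2_000):
    infra_only = retainer_margin(retainer, 6)
    with_labor = retainer_margin(retainer, 6, labor_hours=4, hourly_cost=50)
    print(f"${retainer:,}/month: {infra_only:.1%} infrastructure only, "
          f"{with_labor:.1%} with labor")
```

Note that labor, not infrastructure, dominates the result: at the $500 end of the retainer range the same 4 hours bring the margin down to 58.8%, so the monitoring time budget matters more than the GPU bill.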
Quarterly Model Refresh: $500-1,500
Every 90 days, retrain the adapter on new data. This is a separate line item because it involves real work: data collection, cleaning, fine-tuning, evaluation. But the compute cost on Ertas is negligible -- the value is in your expertise, not the GPU time.
Quarterly refreshes also serve as churn prevention. Each refresh makes the model more accurate, which makes the client more dependent on your service. A model trained on 18 months of cumulative data is substantially better than one trained on the initial dataset alone.
Reinvesting the Margin Advantage
The real power of 90% margins is what you can do with the excess. At 60% margins, most of your revenue goes to covering costs and paying salaries. At 90% margins, you have 30 percentage points of additional gross profit to deploy.
Smart agencies reinvest in three areas:
Client acquisition. You can afford to spend more to acquire a client because each client is worth more over their lifetime. If your LTV is $24,000 (2 years × $1,000/month) at 90% gross margin, you can spend $3,000-5,000 on acquisition and still have excellent unit economics.
Talent. Higher margins let you hire better people and pay them well, which improves service quality, which reduces churn, which improves LTV. This is the virtuous cycle that API-dependent agencies cannot access.
R&D. Experiment with new model architectures, build internal tooling, develop proprietary evaluation frameworks. These compound over time and create defensibility that "we use GPT-4o" never will.
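The client acquisition numbers above can be checked with a short unit-economics sketch. It uses the article's figures ($1,000/month for 24 months, $3,000-5,000 acquisition cost) and compares 60% and 90% gross margin; the function name is illustrative.

```python
def unit_economics(monthly_retainer, months, gross_margin, acquisition_cost):
    """Lifetime gross profit and its ratio to customer acquisition cost."""
    lifetime_revenue = monthly_retainer * months
    lifetime_gross_profit = lifetime_revenue * gross_margin
    return lifetime_gross_profit, lifetime_gross_profit / acquisition_cost


for cac in (3_000, 5_000):
    for margin in (0.60, 0.90):
        profit, ratio = unit_economics(1_000, 24, margin, cac)
        print(f"CAC ${cac:,}, margin {margin:.0%}: "
              f"${profit:,.0f} lifetime gross profit, {ratio:.1f}x CAC")
```

At 90% margin, a $5,000 acquisition cost still returns about 4.3x in lifetime gross profit; at 60% margin the same spend returns under 3x.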
The Transition Path
If you are currently running an API-dependent agency, here is the migration order:
- Identify your simplest client workload. Pick a client with a straightforward classification or Q&A task.
- Fine-tune a LoRA adapter on their data. Use Ertas to go from raw data to deployed model without writing training scripts.
- Run both systems in parallel for 30 days. Compare quality, latency, and cost side by side.
- Cut over and measure. Track the cost difference for one billing cycle.
- Repeat for the next client. Each migration gets faster because you are reusing the same base model and infrastructure.
Most agencies get the first fine-tuned model deployed in under a week, not counting the 30-day parallel run. By the fifth client, the process is down to a day or two including data cleaning.
The Bottom Line
The difference between a 60% margin agency and a 90% margin agency is not revenue -- it is cost structure. Both can charge the same rates. Both can serve the same clients. But the agency running fine-tuned models on fixed-cost infrastructure keeps an extra $0.30 of every dollar earned.
Over 12 months with 15 clients at $1,500/month average retainer, that is $270,000 in revenue, so a 30-point margin difference is worth roughly $81,000 in additional gross profit. That is a second full-time hire, or an aggressive marketing budget, or a six-month runway extension.
The agencies building on fine-tuned models are not doing anything exotic. They are doing the same work -- deploying AI solutions for business clients -- with a fundamentally better cost structure. The API-dependent agencies will either adopt this model or watch their margins compress as competition increases and clients become more price-sensitive.
The math is clear. The tooling exists. The only question is whether you make the switch now or later.
Ship AI that runs on your users' devices.
Ertas early bird pricing starts at $14.50/mo — locked in for life. Plans for builders and agencies.
Further Reading
- How to Cut Your AI Agency Costs by 90% with Fine-Tuned Local Models -- The detailed cost breakdown and migration playbook for switching from API to local inference.
- How to Price AI Services When You Fine-Tune Instead of Resell -- Pricing strategies that capture the value of custom models without leaving money on the table.
- Who Is the Ertas Agency Plan For? -- How Ertas Agency Pro supports multi-client model management at fixed cost.