
AI Agency Pricing Strategy: Subscription vs. Per-Token Pass-Through
How to price your AI agency services when the underlying costs are per-token. Compare subscription, per-token pass-through, and hybrid pricing models — and why fine-tuned local models unlock the best option.
If you run an AI agency, you have almost certainly stared at this problem: your clients want predictable monthly invoices, but your costs scale with every token processed through a cloud API. The mismatch between fixed-price expectations and variable-cost infrastructure is the central tension of AI agency economics in 2026.
Get pricing wrong and you either bleed margin on heavy-usage clients or lose deals because your quotes look too expensive. Get it right and you build a compounding, scalable business. This article breaks down the three dominant pricing models, their trade-offs, and why fine-tuned local models fundamentally change the calculus.
The Pricing Dilemma
Traditional software agencies price on time and materials or fixed project fees. Both models work because the marginal cost of running software is near zero — once the code is written, hosting costs are predictable.
AI agencies do not have this luxury. Every inference call to OpenAI, Anthropic, or Google costs real money. A client who sends 10,000 requests per day costs you dramatically more than one who sends 100. Yet both clients expect the same flat monthly rate.
This creates a dangerous dynamic. You either pad your pricing with enough buffer to cover worst-case usage (making you uncompetitive) or you price for average usage and hope no client goes heavy (risking negative margins).
Three Pricing Models Compared
1. Flat Subscription
The client pays a fixed monthly fee for access to your AI-powered product or service. Simple, predictable, and exactly what clients want.
Pros: Easy to sell, predictable revenue, clients love it, high perceived value.
Cons: You absorb all usage variance. A single high-usage client can destroy your margin for the month. Requires accurate usage forecasting, which is nearly impossible for new products.
Typical margin risk: If you price around typical usage, a meaningful minority of clients (roughly 20%, as a rule of thumb) will exceed your cost assumptions. At cloud API rates, a 3x usage spike on a single enterprise client can wipe out the profit from five normal clients.
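To make the flat-fee risk concrete, here is a minimal sketch of per-client and portfolio profit under a flat subscription. The $2,000 fee and $800 typical API cost are borrowed from the margin example later in this article; the usage multipliers and function names are hypothetical and only illustrate the mechanism.

```python
# Illustrative figures only: fee and baseline cost come from the article's
# later margin example; the usage multipliers are hypothetical.
MONTHLY_FEE = 2_000.0       # flat subscription fee per client (USD)
BASELINE_API_COST = 800.0   # API cost for a client at typical usage (USD)


def client_profit(usage_multiplier: float) -> float:
    """Monthly profit from one flat-fee client at a multiple of typical usage."""
    return MONTHLY_FEE - BASELINE_API_COST * usage_multiplier


def portfolio_profit(usage_multipliers: list[float]) -> float:
    """Total monthly profit across clients who all pay the same flat fee."""
    return sum(client_profit(m) for m in usage_multipliers)


if __name__ == "__main__":
    print(client_profit(1.0))                    # typical client
    print(client_profit(3.0))                    # client whose usage triples
    print(portfolio_profit([1.0] * 5 + [3.0]))   # five typical clients plus one heavy one
```

With these assumed numbers, the client at 3x usage moves from a profit to a loss while paying the same fee, which is the variance a flat subscription forces you to absorb.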
2. Per-Token Pass-Through
You charge the client based on actual token consumption, usually with a markup. Transparent and fair, but operationally complex.
Pros: Zero margin risk, costs always covered, scales naturally.
Cons: Clients hate unpredictable bills. Requires metering infrastructure. Creates friction — clients hesitate to use the product because every query costs money. Kills adoption and engagement.
Typical markup: 30-50% on API costs, which works out to a gross margin of roughly 23-33% of revenue. Total revenue is capped by client willingness to use the product.
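A short sketch of how a pass-through invoice could be computed, and how a markup on cost translates into gross margin on revenue. The token volume and the per-million-token price are hypothetical placeholders, not real provider rates; the 40% markup sits inside the 30-50% range quoted above.

```python
def pass_through_invoice(tokens: int, cost_per_million: float, markup: float) -> dict:
    """Bill a client for metered token usage at cost plus a markup.

    tokens            -- tokens consumed in the billing period
    cost_per_million  -- what the agency pays per million tokens (assumed rate)
    markup            -- fractional markup on cost, e.g. 0.4 for 40%
    """
    api_cost = tokens / 1_000_000 * cost_per_million
    invoice = api_cost * (1 + markup)
    return {"api_cost": api_cost, "invoice": invoice, "profit": invoice - api_cost}


def gross_margin_from_markup(markup: float) -> float:
    """Convert a markup on cost into gross margin as a share of revenue."""
    return markup / (1 + markup)


if __name__ == "__main__":
    # Hypothetical month: 50M tokens at an assumed $2 per million, 40% markup.
    print(pass_through_invoice(50_000_000, 2.0, 0.4))
    print(f"{gross_margin_from_markup(0.3):.1%}")
    print(f"{gross_margin_from_markup(0.5):.1%}")
```

The conversion matters when comparing models: a 30-50% markup is a smaller share of revenue than the same percentage would suggest if read as margin.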
3. Hybrid (Base + Overage)
A base subscription covers a usage tier, with per-token charges above the threshold. The compromise approach.
Pros: Predictable base revenue, protection against extreme usage, clients get some cost certainty.
Cons: Complex to explain and sell. Overage charges create negative surprise. Requires the same metering infrastructure as pass-through. Clients still feel penalised for using the product.
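The base-plus-overage structure can be sketched as a single function. The base fee, included allowance, and overage rate below are hypothetical values chosen for illustration; a real plan would set them from observed usage.

```python
def hybrid_invoice(
    tokens_used: int,
    base_fee: float,
    included_tokens: int,
    overage_per_million: float,
) -> float:
    """Monthly invoice for a base subscription with metered overage.

    The base fee covers usage up to included_tokens; anything above that
    threshold is billed per million tokens at overage_per_million.
    """
    overage_tokens = max(0, tokens_used - included_tokens)
    return base_fee + overage_tokens / 1_000_000 * overage_per_million


if __name__ == "__main__":
    # Assumed plan: $2,000 base, 50M tokens included, $10 per extra million.
    print(hybrid_invoice(30_000_000, 2_000.0, 50_000_000, 10.0))  # under the threshold
    print(hybrid_invoice(80_000_000, 2_000.0, 50_000_000, 10.0))  # 30M tokens over
```

Note that the function still needs an accurate `tokens_used` figure, which is why this model carries the same metering burden as pure pass-through.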
Why Subscription Wins — If You Can Make It Work
SaaS pricing guidance is broadly consistent on this point: subscription pricing with unlimited usage tends to drive the highest adoption, the lowest churn, and the best lifetime value. When clients do not worry about per-query costs, they integrate your AI deeper into their workflows. Deeper integration means higher switching costs and lower churn.
The only reason AI agencies avoid subscription pricing is cost risk. If your per-inference cost is variable and unpredictable, offering unlimited usage is a gamble.
This is where the model ownership shift changes everything.
How Fine-Tuned Local Models Make Subscription Safe
When you fine-tune a smaller open-source model — say a 7B or 8B parameter model trained on your client's specific domain — and deploy it on fixed-cost infrastructure, your cost structure transforms completely.
Cloud API cost structure: Variable. You pay per token. More usage means more cost. No ceiling.
Self-hosted fine-tuned model cost structure: Fixed. You pay for the server (or reserved GPU instance). Whether you run 100 inferences or 100,000, the monthly infrastructure cost stays the same, as long as the load fits within the hardware's capacity.
This is the unlock. With fixed infrastructure costs, subscription pricing becomes not just viable but optimal. Your cost per query falls as clients use the product more, because the infrastructure cost is amortised across more queries, and heavy usage no longer erodes your margin.
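The contrast between the two cost structures can be sketched with a break-even calculation. The $1,500 server cost matches the margin example in the next section; the $0.01 API cost per query is a hypothetical figure, and the sketch assumes the server has enough capacity for every volume shown.

```python
def api_monthly_cost(queries: int, api_cost_per_query: float) -> float:
    """Variable cost: grows linearly with query volume."""
    return queries * api_cost_per_query


def self_hosted_cost_per_query(server_cost: float, queries: int) -> float:
    """Fixed cost amortised over the month's queries (capacity limits ignored)."""
    return server_cost / queries


def break_even_queries(server_cost: float, api_cost_per_query: float) -> float:
    """Monthly volume above which the fixed-cost server is cheaper than the API."""
    return server_cost / api_cost_per_query


if __name__ == "__main__":
    SERVER_COST = 1_500.0        # USD per month, from the article's example
    API_COST_PER_QUERY = 0.01    # USD, hypothetical

    print(break_even_queries(SERVER_COST, API_COST_PER_QUERY))
    for volume in (10_000, 100_000, 1_000_000):
        print(
            volume,
            api_monthly_cost(volume, API_COST_PER_QUERY),
            self_hosted_cost_per_query(SERVER_COST, volume),
        )
```

Below the break-even volume the API is the cheaper option, so the fixed-cost argument depends on aggregate usage across clients being high enough to justify the server.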
Margin Analysis
Consider a concrete example. An AI agency serves 10 clients at $2,000/month each — $20,000 monthly revenue.
With cloud APIs: Average API cost per client is $800/month, but ranges from $200 to $3,000. Total API costs average $8,000 but can spike to $15,000. Gross margin swings between 25% and 60% month to month.
With self-hosted fine-tuned models: A single GPU server costs $1,500/month and handles all 10 clients comfortably. Gross margin is a stable 92.5% every month. No variance. No surprises.
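The margin figures in this example follow directly from the stated revenue and costs. A minimal sketch of the arithmetic, using only the numbers given above (the function name is illustrative):

```python
def gross_margin(revenue: float, cost: float) -> float:
    """Gross margin as a fraction of revenue."""
    return (revenue - cost) / revenue


if __name__ == "__main__":
    REVENUE = 10 * 2_000.0  # 10 clients at $2,000/month

    print(f"{gross_margin(REVENUE, 8_000.0):.1%}")   # cloud APIs, average month
    print(f"{gross_margin(REVENUE, 15_000.0):.1%}")  # cloud APIs, spike month
    print(f"{gross_margin(REVENUE, 1_500.0):.1%}")   # single self-hosted GPU server
```

These figures count only API or server spend as cost of revenue; staff time for fine-tuning, monitoring, and retraining is not included in either scenario.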
The fine-tuned model does not need to match GPT-4 on general benchmarks. It needs to be excellent at the specific tasks your clients need: classification, extraction, and generation within their domain. A well-tuned 8B model can match or outperform a general-purpose 70B model on narrow, domain-specific tasks, though this should be verified on each client's own evaluation set.
How Ertas Enables Fixed-Cost AI Infrastructure
Ertas is built for exactly this workflow. Use Ertas Studio to fine-tune domain-specific models on your client data, export optimised GGUF files, and deploy them on your own infrastructure or through Ertas Cloud.
The platform handles experiment tracking, model evaluation, and format conversion — the operational overhead that normally makes self-hosting impractical for agencies. You focus on client delivery while Ertas handles the ML engineering pipeline.
For agencies, this means you can confidently offer flat subscription pricing, knowing your costs are fixed and your margins are protected. No more spreadsheet gymnastics trying to forecast token usage. No more awkward overage conversations with clients.
The Bottom Line
The pricing model you choose shapes your entire business. Per-token pass-through protects your margin but limits your growth. Subscription pricing drives adoption and retention but requires cost certainty. Fine-tuned local models give you that cost certainty.
The agencies that will dominate the next phase of AI services are the ones that own their model infrastructure, offer simple subscription pricing, and reinvest the margin advantage into better client outcomes.
Ready to make subscription pricing viable for your agency? Join the Ertas waitlist and start building on fixed-cost AI infrastructure.