AI Agency Pricing Strategy: Subscription vs. Per-Token Pass-Through

If you run an AI agency, you have almost certainly stared at this problem: your clients want predictable monthly invoices, but your costs scale with every token processed through a cloud API. The mismatch between fixed-price expectations and variable-cost infrastructure is the central tension of AI agency economics in 2026.

Get pricing wrong and you either bleed margin on heavy-usage clients or lose deals because your quotes look too expensive. Get it right and you build a compounding, scalable business. This article breaks down the three dominant pricing models, their trade-offs, and why fine-tuned local models fundamentally change the calculus.

The Pricing Dilemma

Traditional software agencies price on time and materials or fixed project fees. Both models work because the marginal cost of running software is near zero — once the code is written, hosting costs are predictable.

AI agencies do not have this luxury. Every inference call to OpenAI, Anthropic, or Google costs real money. A client who sends 10,000 requests per day costs you dramatically more than one who sends 100. Yet both clients expect the same flat monthly rate.

This creates a dangerous dynamic. You either pad your pricing with enough buffer to cover worst-case usage (making you uncompetitive) or you price for average usage and hope no client goes heavy (risking negative margins).

Three Pricing Models Compared

1. Flat Subscription

The client pays a fixed monthly fee for access to your AI-powered product or service. Simple, predictable, and exactly what clients want.

Pros: Easy to sell, predictable revenue, clients love it, high perceived value.

Cons: You absorb all usage variance. A single high-usage client can destroy your margin for the month. Requires accurate usage forecasting, which is nearly impossible for new products.

Typical margin risk: If you price for median usage, by definition roughly half of your clients will exceed your cost assumptions. At cloud API rates, a 3x usage spike on a single enterprise client can wipe out the profit from five normal clients.
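To see how one heavy user moves the numbers under flat pricing, here is a minimal Python sketch. The fee, client count, and per-client costs are illustrative assumptions, not figures from real accounts.

```python
def flat_subscription_margin(fee: float, client_costs: list[float]) -> float:
    """Gross margin when every client pays the same flat monthly fee.

    fee: subscription price per client (hypothetical)
    client_costs: each client's API cost for the month (hypothetical)
    """
    revenue = fee * len(client_costs)
    cost = sum(client_costs)
    return (revenue - cost) / revenue


# Illustrative month: five clients at $2,000, each costing $800 in API calls.
normal = [800.0] * 5
# Same month, but one client's usage (and therefore cost) triples.
spike = [800.0] * 4 + [2400.0]

print(f"{flat_subscription_margin(2000, normal):.0%}")  # 60%
print(f"{flat_subscription_margin(2000, spike):.0%}")   # 44%
```

With these made-up figures, the client whose usage triples goes from a $1,200 monthly profit to a $400 loss, and the gross margin across all five clients drops from 60% to 44%.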

2. Per-Token Pass-Through

You charge the client based on actual token consumption, usually with a markup. Transparent and fair, but operationally complex.

Pros: Zero margin risk, costs always covered, scales naturally.

Cons: Clients hate unpredictable bills. Requires metering infrastructure. Creates friction — clients hesitate to use the product because every query costs money. Kills adoption and engagement.

Typical pricing: a 30-50% markup on API costs (a gross margin of roughly 23-33%), but total revenue is capped by how much clients are willing to use the product.
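Pass-through billing boils down to metering tokens and applying a markup. The sketch below assumes separate per-million-token prices for input and output tokens. The prices and the markup in the example are placeholders, not any provider's actual rates.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    """Metered token counts for one client over one billing period."""
    input_tokens: int
    output_tokens: int


def pass_through_invoice(usage: Usage, input_price_per_m: float,
                         output_price_per_m: float, markup: float) -> float:
    """Provider cost for the metered tokens plus a markup (0.4 means +40%).

    Prices are assumed to be quoted per million tokens.
    """
    provider_cost = (usage.input_tokens * input_price_per_m
                     + usage.output_tokens * output_price_per_m) / 1_000_000
    return round(provider_cost * (1 + markup), 2)


def markup_to_margin(markup: float) -> float:
    """Convert a markup on cost into gross margin on revenue."""
    return markup / (1 + markup)


# Placeholder rates: $3/M input tokens, $15/M output tokens, 40% markup.
bill = pass_through_invoice(Usage(20_000_000, 4_000_000), 3.0, 15.0, 0.4)
print(bill)                             # 168.0
print(f"{markup_to_margin(0.4):.1%}")   # 28.6%
```

Markup and margin are easy to conflate: a 40% markup on cost works out to a gross margin of about 29% on revenue.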

3. Hybrid (Base + Overage)

A base subscription covers a usage tier, with per-token charges above the threshold. The compromise approach.

Pros: Predictable base revenue, protection against extreme usage, clients get some cost certainty.

Cons: Complex to explain and sell. Overage charges create negative surprise. Requires the same metering infrastructure as pass-through. Clients still feel penalised for using the product.
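The hybrid invoice is a small extension of the same metering: a base fee covers an included allowance, and only tokens beyond it are billed. The allowance and overage rate below are hypothetical.

```python
def hybrid_invoice(tokens_used: int, base_fee: float, included_tokens: int,
                   overage_price_per_m: float) -> float:
    """Base fee covers `included_tokens`; usage above it is billed per million tokens."""
    overage_tokens = max(0, tokens_used - included_tokens)
    return base_fee + overage_tokens * overage_price_per_m / 1_000_000


# Hypothetical plan: $2,000/month includes 50M tokens, then $20 per million.
print(hybrid_invoice(40_000_000, 2_000, 50_000_000, 20.0))  # 2000.0
print(hybrid_invoice(65_000_000, 2_000, 50_000_000, 20.0))  # 2300.0
```

The `max(0, ...)` clamp is where the negative surprise lives: the invoice stays flat right up to the allowance and then starts climbing with every extra token.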

Why Subscription Wins — If You Can Make It Work

Conventional SaaS pricing wisdom holds that subscription pricing with unlimited usage drives the highest adoption, the lowest churn, and the best lifetime value. When clients do not worry about per-query costs, they integrate your AI deeper into their workflows. Deeper integration means higher switching costs and lower churn.

The only reason AI agencies avoid subscription pricing is cost risk. If your per-inference cost is variable and unpredictable, offering unlimited usage is a gamble.

This is where the model ownership shift changes everything.

How Fine-Tuned Local Models Make Subscription Safe

When you fine-tune a smaller open-source model — say a 7B or 8B parameter model trained on your client's specific domain — and deploy it on fixed-cost infrastructure, your cost structure transforms completely.

Cloud API cost structure: Variable. You pay per token. More usage means more cost. No ceiling.

Self-hosted fine-tuned model cost structure: Fixed. You pay for the server (or reserved GPU instance). Whether you run 100 inferences or 100,000, the monthly infrastructure cost stays the same, provided the volume stays within what that server can handle.

This is the unlock. With fixed infrastructure costs, subscription pricing becomes not just viable but optimal. Heavier usage no longer erodes your margin: the infrastructure cost is amortised across more queries, so your cost per query falls as clients use the product more.
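The difference between the two cost structures is easiest to see as cost per query. This sketch assumes a hypothetical $0.01 API cost per query and a hypothetical capacity of 2,000,000 queries per month for one $1,500 server. Real throughput depends on the model, the hardware, and request sizes.

```python
import math


def api_monthly_cost(queries: int, cost_per_query: float) -> float:
    """Variable cost: every query adds to the bill."""
    return queries * cost_per_query


def self_hosted_monthly_cost(queries: int, server_cost: float,
                             server_capacity: int) -> float:
    """Fixed cost per server; it only steps up when volume needs another server."""
    servers = max(1, math.ceil(queries / server_capacity))
    return servers * server_cost


# Hypothetical inputs: $0.01 per API query; one $1,500/month server handles 2M queries/month.
for q in (10_000, 100_000, 1_000_000):
    api = api_monthly_cost(q, 0.01)
    hosted = self_hosted_monthly_cost(q, 1_500, 2_000_000)
    print(f"{q:>9,} queries: API ${api / q:.4f}/query, "
          f"self-hosted ${hosted / q:.4f}/query")
```

With these assumptions the API's cost per query never changes, while the server's falls as volume grows. The API is cheaper below 150,000 queries a month ($1,500 / $0.01) and the fixed server wins above that. The `math.ceil` step models the caveat that the cost stays fixed only until you need a second server.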

Margin Analysis

Consider a concrete example. An AI agency serves 10 clients at $2,000/month each — $20,000 monthly revenue.

With cloud APIs: Average API cost per client is $800/month, but ranges from $200 to $3,000. Total API costs average $8,000 but can spike to $15,000. Gross margin is 60% in an average month and falls to 25% in a spike month.

With self-hosted fine-tuned models: A single GPU server costs $1,500/month and handles all 10 clients comfortably. Counting infrastructure only, gross margin is a stable 92.5% every month, with no variance and no surprises as long as combined usage stays within the server's capacity.
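The gross-margin figures above follow directly from the stated revenue and cost numbers. This short Python check reproduces them, counting only API or infrastructure costs.

```python
def gross_margin(revenue: float, cost: float) -> float:
    """Share of revenue left after direct delivery costs."""
    return (revenue - cost) / revenue


revenue = 10 * 2_000  # 10 clients at $2,000/month

print(f"Cloud APIs, average month: {gross_margin(revenue, 8_000):.1%}")   # 60.0%
print(f"Cloud APIs, spike month:   {gross_margin(revenue, 15_000):.1%}")  # 25.0%
print(f"Self-hosted GPU server:    {gross_margin(revenue, 1_500):.1%}")   # 92.5%
```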

The fine-tuned model does not need to match GPT-4 on general benchmarks. It needs to be excellent at the specific tasks your clients need: classification, extraction, and generation within their domain. On narrow, domain-specific tasks like these, a well-tuned 8B model can often outperform a general-purpose 70B model.

How Ertas Enables Fixed-Cost AI Infrastructure

Ertas is built for exactly this workflow. Use Ertas Studio to fine-tune domain-specific models on your client data, export optimised GGUF files, and deploy them on your own infrastructure or through Ertas Cloud.

The platform handles experiment tracking, model evaluation, and format conversion — the operational overhead that normally makes self-hosting impractical for agencies. You focus on client delivery while Ertas handles the ML engineering pipeline.

For agencies, this means you can confidently offer flat subscription pricing, knowing your costs are fixed and your margins are protected. No more spreadsheet gymnastics trying to forecast token usage. No more awkward overage conversations with clients.

The Bottom Line

The pricing model you choose shapes your entire business. Per-token pass-through protects your margin but limits your growth. Subscription pricing drives adoption and retention but requires cost certainty. Fine-tuned local models give you that cost certainty.

The agencies that will dominate the next phase of AI services are the ones that own their model infrastructure, offer simple subscription pricing, and reinvest the margin advantage into better client outcomes.

Ready to make subscription pricing viable for your agency? See Ertas pricing and start building on fixed-cost AI infrastructure.
