Pricing Your AI Agency Services: Flat-Rate vs. Per-Token When Using Self-Hosted Models

Most AI agencies inherited their pricing model from the API era: charge clients based on usage, pass through API costs with a markup. It works, but it caps your margins and makes revenue unpredictable.

Self-hosted models break this dynamic. Your cost is a fixed GPU expense, not a per-token variable. This creates pricing opportunities that API-dependent agencies cannot match.

This article extends the AI agency pricing strategy guide with specific pricing models for agencies running self-hosted fine-tuned models.

The Step-Function Insight

API costs are linear: more tokens, more cost. Self-hosted costs are a step function: a fixed cost per GPU tier, and near-zero marginal cost until that tier's capacity is used up.

This single fact changes everything about how you should price:

Factor	API-Based Agency	Self-Hosted Agency
Cost structure	Variable (per token)	Fixed (per GPU tier)
Margin on high-usage clients	Thin or negative	Excellent
Revenue predictability	Low	High
Pricing flexibility	Limited by COGS	Wide margin range
Client preference	Unpredictable bills	Predictable budgets

When your costs are fixed, every pricing model that charges more than your fixed cost produces margin. The question is not "can I afford to serve this client?" but "which pricing model maximises the value I capture?"
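The difference between the two cost curves can be made concrete with a short sketch. The API price of $10 per million tokens and the capacity of 500 million tokens per GPU per month are hypothetical placeholders chosen only for illustration; the $42/month operating cost is the single RTX 5090 figure used later in this article.

```python
import math

# Hypothetical assumptions, for illustration only.
API_PRICE_PER_MILLION = 10.0        # USD per million tokens (assumed)
TOKENS_PER_GPU_MONTH = 500_000_000  # monthly capacity of one GPU tier (assumed)
GPU_MONTHLY_COST = 42.0             # USD/month operating cost (article figure)


def api_cost(tokens: int) -> float:
    """Linear cost: every extra token adds the same amount."""
    return tokens / 1_000_000 * API_PRICE_PER_MILLION


def self_hosted_cost(tokens: int) -> float:
    """Step cost: pay for whole GPU tiers, however full each one is."""
    tiers = max(1, math.ceil(tokens / TOKENS_PER_GPU_MONTH))
    return tiers * GPU_MONTHLY_COST


for tokens in (50_000_000, 400_000_000, 600_000_000):
    print(
        f"{tokens:>12,} tokens  "
        f"API ${api_cost(tokens):>9,.2f}  "
        f"self-hosted ${self_hosted_cost(tokens):>6,.2f}"
    )
```

Under these assumptions the API bill rises with every token, while the self-hosted bill only changes when usage crosses into a second GPU.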

Pricing Model 1: Flat-Rate Monthly Retainer

How it works: Client pays a fixed monthly fee for unlimited AI usage within defined scope.

Example:

Contract review AI for a law firm: $5,000/month flat
Includes: unlimited contract reviews, monthly model retraining, support
Your cost: ~$200/month allocated (share of GPU, electricity, Ertas Studio seat)
Gross margin: 96%

When to use:

Clients with predictable, moderate-to-high usage
Enterprise clients who prefer budget certainty
Engagements where usage growth benefits you (client uses more → they get more value → they stay longer)

Risks:

A single client with extreme usage could saturate your GPU capacity
Mitigate by defining "unlimited within reasonable use" or setting a soft cap

Margin analysis at different client counts (1 × RTX 5090, $42/month operational):

Clients	Revenue (at $3,000/mo each)	GPU Cost	Gross Margin
3	$9,000	$42	99.5%
5	$15,000	$42	99.7%
10	$30,000	$42	99.9%

Even at conservative pricing, gross margins on compute are extraordinary once the GPU is paid off.
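The margin column above follows from one formula: gross margin = (revenue - cost) / revenue. A minimal sketch that recomputes it from the article's own figures ($3,000 per client, $42/month GPU operating cost); the function name is illustrative.

```python
GPU_MONTHLY_COST = 42     # USD/month, single RTX 5090 (article figure)
PRICE_PER_CLIENT = 3000   # USD/month flat retainer (article figure)


def gross_margin_pct(revenue: float, cost: float) -> float:
    """Gross margin as a percentage of revenue."""
    return 100 * (revenue - cost) / revenue


for clients in (3, 5, 10):
    revenue = clients * PRICE_PER_CLIENT
    margin = gross_margin_pct(revenue, GPU_MONTHLY_COST)
    print(f"{clients:>2} clients  ${revenue:>6,}/mo  {margin:.1f}% gross margin")
```

The same function reproduces the law firm example: `gross_margin_pct(5000, 200)` gives 96%.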

Pricing Model 2: Per-Seat Pricing

How it works: Client pays per user who has access to the AI tools.

Example:

AI-powered legal research assistant: $200/user/month
Law firm with 15 associates: $3,000/month
Your cost: ~$200/month allocated
Gross margin: 93%

When to use:

Products where usage scales with headcount
Clients who think in terms of per-employee software costs
When you want pricing to scale naturally as the client grows

Advantages:

Familiar pricing model for enterprise buyers (like SaaS)
Revenue grows automatically as the client adds users
Easy for clients to budget and approve

Monthly revenue by seat price and firm size:

Per-seat price	10-person firm	50-person firm	200-person firm
$100/seat	$1,000/mo	$5,000/mo	$20,000/mo
$200/seat	$2,000/mo	$10,000/mo	$40,000/mo
$500/seat	$5,000/mo	$25,000/mo	$100,000/mo

Your GPU cost is the same regardless of seat count (until you hit capacity limits). Per-seat pricing at large firms is wildly profitable.
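The per-seat figures are simple multiplication, but a small sketch makes it easy to test other seat prices or firm sizes. It uses the seat prices and firm sizes from the table above and the ~$200/month allocated cost from the legal research example; the function names are illustrative.

```python
SEAT_PRICES = (100, 200, 500)  # USD per seat per month (article figures)
FIRM_SIZES = (10, 50, 200)     # number of seats (article figures)


def per_seat_revenue(seat_price: int, seats: int) -> int:
    """Monthly revenue from one client on per-seat pricing."""
    return seat_price * seats


def per_seat_margin_pct(seat_price: int, seats: int, allocated_cost: float) -> float:
    """Gross margin percentage given a fixed allocated compute cost."""
    revenue = per_seat_revenue(seat_price, seats)
    return 100 * (revenue - allocated_cost) / revenue


for price in SEAT_PRICES:
    row = "  ".join(f"${per_seat_revenue(price, n):>7,}/mo" for n in FIRM_SIZES)
    print(f"${price}/seat  {row}")

# The 15-associate law firm at $200/seat with ~$200/month allocated cost.
print(f"{per_seat_margin_pct(200, 15, 200):.1f}% gross margin")
```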

Pricing Model 3: Per-Project or Per-Engagement

How it works: Client pays a fixed fee for a defined project (e.g., review a specific set of documents).

Example:

Due diligence review for an M&A transaction: $15,000 per deal
Includes: AI-assisted review of up to 5,000 documents, summary report, risk analysis
Your cost: 2-3 days of agency time + negligible compute
Gross margin: 70-80% (lower than retainer because it includes labour)
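To see how labour drives the per-project margin, here is a rough sketch. The $1,500 internal day rate is an assumption, not a figure from this article; it is chosen because it reproduces the stated 70-80% range for 2-3 days of work on a $15,000 fee. Compute is treated as zero, matching "negligible compute".

```python
PROJECT_FEE = 15_000      # USD per deal (article figure)
ASSUMED_DAY_RATE = 1_500  # USD per day of agency time (assumption)


def project_margin_pct(fee: float, labour_days: float, day_rate: float,
                       compute_cost: float = 0.0) -> float:
    """Gross margin percentage for a fixed-fee project including labour."""
    cost = labour_days * day_rate + compute_cost
    return 100 * (fee - cost) / fee


for days in (2, 3):
    margin = project_margin_pct(PROJECT_FEE, days, ASSUMED_DAY_RATE)
    print(f"{days} days of labour: {margin:.0f}% gross margin")
```

Substituting your own loaded day rate shows how quickly scope creep erodes the margin on a fixed fee.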

When to use:

Transaction-based work (M&A, litigation document review)
Clients who are not ready for a monthly commitment
High-value engagements where the output is clearly tied to a business outcome

Advantages:

Aligns pricing with value delivered (a $50M M&A deal justifies $15K for AI review)
No ongoing commitment required (lower barrier to entry)
Can lead to retainer engagements after proving value

Pricing Model 4: Hybrid (Base + Usage)

How it works: Client pays a base retainer for the platform/access, plus a per-unit fee for heavy usage.

Example:

Base: $2,000/month (includes platform access, model hosting, standard support)
Per-review: $25 per contract review beyond 100/month
Most clients stay within the base tier — the per-unit pricing is insurance against extreme usage
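The billing rule in this example can be sketched as a single function. The defaults are the figures above ($2,000 base, 100 reviews included, $25 per additional review); the function name is illustrative.

```python
def hybrid_invoice(reviews: int, base_fee: int = 2000,
                   included_reviews: int = 100, overage_fee: int = 25) -> int:
    """Monthly invoice: base retainer plus a fee for each review past the allowance."""
    overage = max(0, reviews - included_reviews)
    return base_fee + overage * overage_fee


for reviews in (60, 100, 140, 400):
    print(f"{reviews:>3} reviews -> ${hybrid_invoice(reviews):,}")
```

A client inside the allowance pays exactly the base fee, so the overage term only matters for the outliers it is meant to guard against.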

When to use:

When you need to protect against outlier usage patterns
When clients have variable but somewhat predictable workloads
As a middle ground for clients hesitant to commit to flat-rate

Worked Margin Examples at Each GPU Tier

Tier 1: Single RTX 5090 ($2,000 hardware, $42/month operation)

Scenario	Monthly Revenue	Monthly Cost	Gross Margin	Annual Profit
3 clients × $3,000 flat	$9,000	$42	99.5%	$107,496
5 clients × $2,000 flat	$10,000	$42	99.6%	$119,496
10 clients × $1,500 flat	$15,000	$42	99.7%	$179,496

Hardware payback: under one month at these revenue levels, counting compute costs only.

Tier 2: Dual RTX 5090 ($4,000 hardware, $84/month operation)

Scenario	Monthly Revenue	Monthly Cost	Gross Margin	Annual Profit
10 clients × $3,000 flat	$30,000	$84	99.7%	$358,992
15 clients × $2,000 flat	$30,000	$84	99.7%	$358,992
20 per-seat clients at $200/seat, avg 10 seats	$40,000	$84	99.8%	$478,992

Tier 3: A6000 ($4,500 hardware, $22/month operation)

Better for agencies needing 48 GB VRAM (larger models, more concurrent adapters):

Scenario	Monthly Revenue	Monthly Cost	Gross Margin	Annual Profit
15 clients × $2,500 flat	$37,500	$22	99.9%	$449,736
5 enterprise clients × $10,000 flat	$50,000	$22	>99.9%	$599,736

Note: These are gross margins on compute. Total agency margins include labour, software subscriptions, overhead, and client acquisition costs. Realistic net margins for a well-run agency: 40-60%.
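The annual profit column in each tier is (monthly revenue - monthly cost) × 12, and hardware payback is the hardware price divided by monthly gross profit. A minimal sketch using the Tier 1 and Tier 3 figures; like the tables, it covers compute only and ignores labour, software, and overhead. The function names are illustrative.

```python
def annual_compute_profit(monthly_revenue: float, monthly_cost: float) -> float:
    """Twelve months of revenue minus GPU operating cost."""
    return (monthly_revenue - monthly_cost) * 12


def payback_months(hardware_cost: float, monthly_revenue: float,
                   monthly_cost: float) -> float:
    """Months of gross compute profit needed to recover the hardware price."""
    return hardware_cost / (monthly_revenue - monthly_cost)


# (label, hardware USD, monthly revenue USD, monthly operating cost USD)
scenarios = [
    ("Tier 1: 3 clients x $3,000", 2000, 9000, 42),
    ("Tier 3: 15 clients x $2,500", 4500, 37500, 22),
]

for label, hardware, revenue, cost in scenarios:
    profit = annual_compute_profit(revenue, cost)
    months = payback_months(hardware, revenue, cost)
    print(f"{label}: ${profit:,.0f}/year, payback {months:.2f} months")
```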

Pricing for Regulated Industries

Legal and healthcare clients pay a compliance premium. They are not comparing your price to ChatGPT — they are comparing it to the cost of non-compliance (fines, malpractice risk, reputational damage).

Compliance premium guidelines:

Industry	Standard AI Pricing	With Compliance Premium
General business	$1,500-3,000/month	—
Legal services	—	$3,000-8,000/month
Healthcare	—	$4,000-10,000/month
Financial services	—	$5,000-12,000/month
Government/defence	—	$8,000-20,000/month

The compliance premium is justified because:

On-premise deployment requires more setup and maintenance
Compliance documentation and audit support add ongoing value
The alternative (cloud AI with compliance risk) is not actually an option for these clients
Data sovereignty guarantees have real, quantifiable value

The Pricing Conversation

When presenting pricing to a prospective client:

Lead with value, not cost. "This solution saves your associates 8 hours per week" is a stronger frame than "this costs $5,000/month."

Anchor to the alternative. "Hiring an ML team to build this in-house would cost $500K/year. Our solution delivers the same outcome for $60K/year."

Make the ROI obvious. "At $400/hour billing rates, saving 8 associate-hours per week = $166K/year in additional billable time. Our $60K annual fee delivers a 2.8x return."

Offer a pilot. "Start with a 3-month pilot at $X/month. If the ROI is not clear by month 3, we will part ways." This de-risks the decision for the client.
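The ROI figures quoted above can be recomputed for any client before the conversation. This sketch assumes 52 working weeks per year, which is what reproduces the $166K figure; the function names are illustrative, and the hours saved and billing rate should come from the client's own numbers.

```python
def annual_value_of_time_saved(hours_per_week: float, billing_rate: float,
                               weeks_per_year: int = 52) -> float:
    """Extra billable value per year if saved hours are billed at the given rate."""
    return hours_per_week * billing_rate * weeks_per_year


def roi_multiple(annual_value: float, annual_fee: float) -> float:
    """How many dollars of value the client gets per dollar of fee."""
    return annual_value / annual_fee


value = annual_value_of_time_saved(hours_per_week=8, billing_rate=400)
fee = 5000 * 12  # $5,000/month retainer, i.e. $60K/year
print(f"Value: ${value:,.0f}/year, return: {roi_multiple(value, fee):.1f}x")
```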
