
Pricing Your AI Agency Services: Flat-Rate vs. Per-Token When Using Self-Hosted Models
How self-hosted AI models change agency pricing strategy. Flat-rate, per-seat, and hybrid pricing models with worked margin examples at each GPU tier.
Most AI agencies inherited their pricing model from the API era: charge clients based on usage, pass through API costs with a markup. It works, but it caps your margins and makes revenue unpredictable.
Self-hosted models break this dynamic. Your cost is a fixed GPU expense, not a per-token variable. This creates pricing opportunities that API-dependent agencies cannot match.
This article extends the AI agency pricing strategy guide with specific pricing models for agencies running self-hosted fine-tuned models.
The Step-Function Insight
API costs are linear: more tokens, more cost. Self-hosted costs are step functions: fixed cost per GPU tier, zero marginal cost within that tier.
This single fact changes everything about how you should price:
| Pricing Model | API-Based Agency | Self-Hosted Agency |
|---|---|---|
| Cost structure | Variable (per token) | Fixed (per GPU tier) |
| Margin on high-usage clients | Thin or negative | Excellent |
| Revenue predictability | Low | High |
| Pricing flexibility | Limited by COGS | Wide margin range |
| Client preference | Unpredictable bills | Predictable budgets |
When your costs are fixed, every pricing model that charges more than your fixed cost produces margin. The question is not "can I afford to serve this client?" but "which pricing model maximises the value I capture?"
Pricing Model 1: Flat-Rate Monthly Retainer
How it works: Client pays a fixed monthly fee for unlimited AI usage within defined scope.
Example:
- Contract review AI for a law firm: $5,000/month flat
- Includes: unlimited contract reviews, monthly model retraining, support
- Your cost: ~$200/month allocated (share of GPU, electricity, Ertas Studio seat)
- Gross margin: 96%
When to use:
- Clients with predictable, moderate-to-high usage
- Enterprise clients who prefer budget certainty
- Engagements where usage growth benefits you (client uses more → they get more value → they stay longer)
Risks:
- A single client with extreme usage could saturate your GPU capacity
- Mitigate by defining "unlimited within reasonable use" or setting a soft cap
Margin analysis at different client counts (1 × RTX 5090, $42/month operational):
| Clients | Revenue (at $3,000/mo each) | GPU Cost | Gross Margin |
|---|---|---|---|
| 3 | $9,000 | $42 | 99.5% |
| 5 | $15,000 | $42 | 99.7% |
| 10 | $30,000 | $42 | 99.9% |
Even at conservative pricing, margins are extraordinary once the GPU is paid off.
Pricing Model 2: Per-Seat Pricing
How it works: Client pays per user who has access to the AI tools.
Example:
- AI-powered legal research assistant: $200/user/month
- Law firm with 15 associates: $3,000/month
- Your cost: ~$200/month allocated
- Gross margin: 93%
When to use:
- Products where usage scales with headcount
- Clients who think in terms of per-employee software costs
- When you want pricing to scale naturally as the client grows
Advantages:
- Familiar pricing model for enterprise buyers (like SaaS)
- Revenue grows automatically as the client adds users
- Easy for clients to budget and approve
Margin analysis:
| Per-seat price | 10-person firm | 50-person firm | 200-person firm |
|---|---|---|---|
| $100/seat | $1,000/mo | $5,000/mo | $20,000/mo |
| $200/seat | $2,000/mo | $10,000/mo | $40,000/mo |
| $500/seat | $5,000/mo | $25,000/mo | $100,000/mo |
Your GPU cost is the same regardless of seat count (until you hit capacity limits). Per-seat pricing at large firms is wildly profitable.
Pricing Model 3: Per-Project or Per-Engagement
How it works: Client pays a fixed fee for a defined project (e.g., review a specific set of documents).
Example:
- Due diligence review for an M&A transaction: $15,000 per deal
- Includes: AI-assisted review of up to 5,000 documents, summary report, risk analysis
- Your cost: 2-3 days of agency time + negligible compute
- Gross margin: 70-80% (lower than retainer because it includes labour)
When to use:
- Transaction-based work (M&A, litigation document review)
- Clients who are not ready for a monthly commitment
- High-value engagements where the output is clearly tied to a business outcome
Advantages:
- Aligns pricing with value delivered (a $50M M&A deal justifies $15K for AI review)
- No ongoing commitment required (lower barrier to entry)
- Can lead to retainer engagements after proving value
Pricing Model 4: Hybrid (Base + Usage)
How it works: Client pays a base retainer for the platform/access, plus a per-unit fee for heavy usage.
Example:
- Base: $2,000/month (includes platform access, model hosting, standard support)
- Per-review: $25 per contract review beyond 100/month
- Most clients stay within the base tier — the per-unit pricing is insurance against extreme usage
When to use:
- When you need to protect against outlier usage patterns
- When clients have variable but somewhat predictable workloads
- As a middle ground for clients hesitant to commit to flat-rate
Worked Margin Examples at Each GPU Tier
Tier 1: Single RTX 5090 ($2,000 hardware, $42/month operation)
| Scenario | Monthly Revenue | Monthly Cost | Gross Margin | Annual Profit |
|---|---|---|---|---|
| 3 clients × $3,000 flat | $9,000 | $42 | 99.5% | $107,496 |
| 5 clients × $2,000 flat | $10,000 | $42 | 99.6% | $119,496 |
| 10 clients × $1,500 flat | $15,000 | $42 | 99.7% | $179,496 |
Hardware ROI: 1-2 months.
Tier 2: Dual RTX 5090 ($4,000 hardware, $84/month operation)
| Scenario | Monthly Revenue | Monthly Cost | Gross Margin | Annual Profit |
|---|---|---|---|---|
| 10 clients × $3,000 flat | $30,000 | $84 | 99.7% | $359,808 |
| 15 clients × $2,000 flat | $30,000 | $84 | 99.7% | $359,808 |
| 20 per-seat at $200, avg 10 seats | $40,000 | $84 | 99.8% | $479,808 |
Tier 3: A6000 ($4,500 hardware, $22/month operation)
Better for agencies needing 48 GB VRAM (larger models, more concurrent adapters):
| Scenario | Monthly Revenue | Monthly Cost | Gross Margin | Annual Profit |
|---|---|---|---|---|
| 15 clients × $2,500 flat | $37,500 | $22 | 99.9% | $449,736 |
| 5 enterprise clients × $10,000 flat | $50,000 | $22 | 100.0% | $599,736 |
Note: These are gross margins on compute. Total agency margins include labour, software subscriptions, overhead, and client acquisition costs. Realistic net margins for a well-run agency: 40-60%.
Pricing for Regulated Industries
Legal and healthcare clients pay a compliance premium. They are not comparing your price to ChatGPT — they are comparing it to the cost of non-compliance (fines, malpractice risk, reputational damage).
Compliance premium guidelines:
| Industry | Standard AI Pricing | With Compliance Premium |
|---|---|---|
| General business | $1,500-3,000/month | — |
| Legal services | — | $3,000-8,000/month |
| Healthcare | — | $4,000-10,000/month |
| Financial services | — | $5,000-12,000/month |
| Government/defence | — | $8,000-20,000/month |
The compliance premium is justified because:
- On-premise deployment requires more setup and maintenance
- Compliance documentation and audit support add ongoing value
- The alternative (cloud AI with compliance risk) is not actually an option for these clients
- Data sovereignty guarantees have real, quantifiable value
The Pricing Conversation
When presenting pricing to a prospective client:
Lead with value, not cost. "This solution saves your associates 8 hours per week" is a stronger frame than "this costs $5,000/month."
Anchor to the alternative. "Hiring an ML team to build this in-house would cost $500K/year. Our solution delivers the same outcome for $60K/year."
Make the ROI obvious. "At $400/hour billing rates, saving 8 associate-hours per week = $166K/year in additional billable time. Our $60K annual fee delivers a 2.8x return."
Offer a pilot. "Start with a 3-month pilot at $X/month. If the ROI is not clear by month 3, we will part ways." This de-risks the decision for the client.
Ship AI that runs on your users' devices.
Ertas early bird pricing starts at $14.50/mo — locked in for life. Plans for builders and agencies.
Further Reading
- AI Agency Pricing Strategy — Comprehensive pricing frameworks for AI agencies
- The Real Cost of Self-Hosting AI Models — GPU pricing breakdown to inform your cost basis
Ship AI that runs on your users' devices.
Early bird pricing starts at $14.50/mo — locked in for life. Plans for builders and agencies.
Keep reading

Building a Recurring Revenue AI Service with Fine-Tuned Models
How to structure an AI agency offering around fine-tuned models that generates predictable monthly recurring revenue — covering service tiers, pricing models, and the retraining loop.

How to Price Fine-Tuning Services Profitably (Agency Rate Card)
A concrete rate card and pricing methodology for AI agencies offering fine-tuning services. Stop guessing on price — here's what to charge and how to explain it.

How to Scope a Custom AI Model Project (and What to Charge)
The discovery questions, project types, price ranges, and scope management strategies for custom AI model projects. How to scope correctly before you quote anything.