
Building a Recurring Revenue AI Service with Fine-Tuned Models
How to structure an AI agency offering around fine-tuned models that generates predictable monthly recurring revenue — covering service tiers, pricing models, and the retraining loop.
Most AI agencies are stuck in project mode. A client shows up, you build them something custom over 4-8 weeks, deliver it, send the final invoice, and then start hunting for the next client. Revenue looks like a saw-tooth wave — spike, drop, spike, drop. You're always three months from going broke.
This isn't a business model. It's freelancing with a fancier title.
The agencies that build real businesses — the ones that hit $50K, $100K, $500K in monthly revenue and stay there — figured out the same thing SaaS companies figured out a decade ago: recurring revenue changes everything. Predictable income means you can hire, invest in tooling, and stop treating every month like a survival exercise.
Fine-tuned models are uniquely suited to a recurring revenue model. Here's why and how to build it.
Why Fine-Tuned Models Are Natural Recurring Revenue
A fine-tuned model isn't a one-time product. It's a living system that needs:
- Ongoing hosting and serving. The model runs 24/7. Someone needs to keep it running.
- Regular retraining. The world changes. Client needs evolve. Production data reveals gaps. Models that aren't retrained degrade.
- Monitoring. Accuracy drift, latency spikes, usage patterns — someone needs to watch the dashboards.
- Optimization. New base models release. Better fine-tuning techniques emerge. Inference costs drop. Clients want the benefits.
Every one of these is a service you can charge for monthly. The client can't easily do these things themselves (that's why they hired you), and stopping any of them degrades the value they're getting.
Compare this to a one-time API integration or a prompt engineering engagement. Those are projects with clear endpoints. A fine-tuned model is infrastructure that requires ongoing care. That's your retainer.
The Revenue Model
The structure is straightforward:
Initial setup fee: One-time payment for dataset creation, first fine-tune, evaluation, and deployment. This covers your upfront labor and gets the client to value.
Monthly retainer: Ongoing fee for hosting, monitoring, retraining, and optimization. This is your MRR.
The setup fee typically ranges from $5,000 to $15,000 depending on complexity:
- Simple (single task, clean data, standard model): $5K-$7K. Example: a customer support classifier fine-tuned on the client's ticket categories.
- Moderate (multiple tasks, data cleaning needed, custom evaluation): $8K-$12K. Example: a legal document review model that classifies, extracts entities, and flags risks.
- Complex (multi-model pipeline, custom data collection, extensive evaluation): $12K-$15K+. Example: a financial analysis system with separate models for data extraction, analysis, and report generation.
The monthly retainer is where the business lives. Let's talk about how to structure it.
Service Tier Structure
Three tiers work well for most agencies. Fewer than three and you leave money on the table. More than three and you create operational complexity that eats your margins.
Tier 1: Essentials — $2,000/month
What the client gets:
- Model hosting and serving (99.5% uptime SLA)
- Basic monitoring (uptime, latency, error rates)
- Quarterly retraining with production data
- Monthly usage report
- Email support (next business day response)
What it costs you:
- Ertas platform: ~$14.50/month
- GPU compute share: ~$50-150/month (shared infrastructure across clients)
- Staff time: ~2-3 hours/quarter for retraining, ~1 hour/month for monitoring
- Effective monthly labor cost: ~$200-400
Your margin: 75-85%
Who buys this: Small businesses with straightforward use cases. They need the model to work and don't want to think about it. Low-touch, high-margin.
Tier 2: Growth — $5,000/month
What the client gets:
- Everything in Tier 1, plus:
- Dedicated model instance (not shared infrastructure)
- Monthly retraining with performance analysis
- Detailed evaluation reports with improvement metrics
- Priority support (4-hour response SLA)
- Quarterly strategy call to discuss model performance and opportunities
What it costs you:
- Ertas platform: ~$14.50/month
- Dedicated GPU allocation: ~$200-400/month
- Staff time: ~4-6 hours/month for retraining, eval, reports
- Effective monthly labor cost: ~$600-1,000
Your margin: 70-80%
Who buys this: Mid-market companies where the model is a meaningful part of their operation. They care about performance improvement over time and want proof it's getting better.
Tier 3: Enterprise — $10,000+/month
What the client gets:
- Everything in Tier 2, plus:
- Multiple models or multi-model pipeline
- Continuous improvement (bi-weekly retraining cycles)
- Custom evaluation frameworks
- Dedicated account manager
- Monthly performance review with stakeholders
- Priority deployment for urgent changes
- Custom integration support
What it costs you:
- Ertas platform: ~$14.50/month (per model instance)
- GPU allocation: ~$500-1,500/month
- Staff time: ~15-25 hours/month (account management, retraining, custom work)
- Effective monthly labor cost: ~$2,000-4,000
Your margin: 60-75%
Who buys this: Larger companies or companies where AI is core to their business. High-touch, but the absolute margin per client is the largest.
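To sanity-check the tier margins, here is a minimal sketch that sums the cost lines listed for each tier and computes the resulting margin range. The `Tier` class and `margin_range` function are hypothetical names for illustration, and the Enterprise price is assumed to be the $10,000 floor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    price: float                # monthly retainer in USD
    platform: float             # monthly platform fee in USD
    gpu: tuple[float, float]    # (low, high) monthly GPU cost
    labor: tuple[float, float]  # (low, high) monthly labor cost


def margin_range(tier: Tier) -> tuple[float, float]:
    """Return (worst-case, best-case) gross margin as a fraction of price."""
    high_cost = tier.platform + tier.gpu[1] + tier.labor[1]
    low_cost = tier.platform + tier.gpu[0] + tier.labor[0]
    return (tier.price - high_cost) / tier.price, (tier.price - low_cost) / tier.price


# Cost figures copied from the tier descriptions above.
TIERS = [
    Tier("Essentials", 2_000, 14.50, (50, 150), (200, 400)),
    Tier("Growth", 5_000, 14.50, (200, 400), (600, 1_000)),
    Tier("Enterprise", 10_000, 14.50, (500, 1_500), (2_000, 4_000)),  # assumed floor price
]

for tier in TIERS:
    worst, best = margin_range(tier)
    print(f"{tier.name}: {worst:.0%} to {best:.0%}")
```

With these inputs, the Enterprise worst case at exactly $10,000 falls below the quoted 60-75% range. That is one reason to price high-touch accounts above the floor rather than at it.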
The Retraining Loop as Value Driver
Here's the critical insight that makes the recurring model work: the retraining loop is what justifies the retainer.
Without retraining, a client could argue they should just pay for hosting (which is cheap) and call it a day. The retraining loop creates compounding value:
- Collect production data. Every inference request generates data: inputs, outputs, and (ideally) feedback on whether the output was useful.
- Analyze performance. Identify categories where the model underperforms, new patterns in user queries, and edge cases that weren't in the original training set.
- Retrain. Incorporate the new data into the training set. Fine-tune the adapter on the expanded dataset. Evaluate against the golden test set.
- Show improvement. Present the client with concrete metrics: "Last month, accuracy on refund-related queries improved from 89% to 94% after incorporating 342 new training examples from production."
- Justify the retainer. The client sees measurable improvement every month. Their model gets better over time, not worse. That's worth $2K-$10K/month.
This creates a flywheel. More production usage generates more training data, which enables better retraining, which improves the model, which drives more production usage. The client's model gets better the more they use it — and they need your service to make that happen.
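The "evaluate against the golden test set" step in the loop above can be sketched as a simple promotion gate. This is an illustration, not a prescribed pipeline: the function names, the toy keyword "models", and the 1-point regression tolerance are all assumptions.

```python
from collections import defaultdict
from typing import Callable

Example = tuple[str, str, str]  # (category, input_text, expected_label)


def accuracy_by_category(predict: Callable[[str], str],
                         golden: list[Example]) -> dict[str, float]:
    """Score a model on the golden set, broken down by category."""
    correct: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for category, text, expected in golden:
        total[category] += 1
        if predict(text) == expected:
            correct[category] += 1
    return {c: correct[c] / total[c] for c in total}


def should_promote(current: dict[str, float], candidate: dict[str, float],
                   max_regression: float = 0.01) -> bool:
    """Promote only if no category regresses beyond the tolerance and one improves."""
    no_regression = all(candidate.get(c, 0.0) >= current[c] - max_regression
                        for c in current)
    improved = any(candidate.get(c, 0.0) > current[c] for c in current)
    return no_regression and improved


# Toy golden set and keyword "models" standing in for real fine-tuned adapters.
golden = [
    ("refund", "I want my money back", "refund"),
    ("refund", "charge me back please", "refund"),
    ("shipping", "where is my parcel", "shipping"),
    ("shipping", "tracking number?", "shipping"),
]


def current_model(text: str) -> str:
    return "refund" if "money" in text else "shipping"


def candidate_model(text: str) -> str:
    return "refund" if ("money" in text or "charge" in text) else "shipping"


current_scores = accuracy_by_category(current_model, golden)
candidate_scores = accuracy_by_category(candidate_model, golden)
print(should_promote(current_scores, candidate_scores))
```

The per-category breakdown matters because it produces the numbers that go into the client report, and it catches a retrain that helps one query type while quietly hurting another.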
Quantifying the Improvement
Always show the numbers. Not vague statements like "the model improved." Concrete metrics:
Monthly Performance Report — March 2026
Retraining Summary:
- New training examples added: 487
- Source: production data with human feedback labels
- Training time: 2.3 hours on A100
Performance Comparison:
| Metric | Feb 2026 | Mar 2026 | Change |
|---------------------|----------|----------|---------|
| Overall accuracy | 92.3% | 94.1% | +1.8% |
| Refund queries | 89.1% | 94.2% | +5.1% |
| Shipping queries | 95.7% | 96.1% | +0.4% |
| Hallucination rate | 3.4% | 2.1% | -1.3% |
| Avg response time | 1.4s | 1.2s | -0.2s |
Cost Impact:
- Estimated API cost equivalent: $4,200/month
- Your current cost: $2,000/month (Tier 1)
- Monthly savings: $2,200
- Cumulative savings since deployment: $14,800
That last section — cost comparison — is the strongest retention tool you have. Every month, show the client how much they'd be paying for equivalent API access. The gap between API cost and your retainer is the value you're delivering.
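If you produce this report every month for every client, it is worth generating the comparison table from raw metrics instead of typing it by hand. A minimal sketch, using the figures from the sample report; `change_row` and `monthly_savings` are hypothetical helper names.

```python
def change_row(name: str, before: float, after: float, unit: str = "%") -> str:
    """Format one table row with a signed change rounded to one decimal place."""
    delta = after - before
    return f"| {name} | {before}{unit} | {after}{unit} | {delta:+.1f}{unit} |"


def monthly_savings(api_equivalent: float, retainer: float) -> float:
    """Gap between the estimated API cost and the retainer."""
    return api_equivalent - retainer


# Figures copied from the sample report above.
metrics = [
    ("Overall accuracy", 92.3, 94.1, "%"),
    ("Refund queries", 89.1, 94.2, "%"),
    ("Shipping queries", 95.7, 96.1, "%"),
    ("Hallucination rate", 3.4, 2.1, "%"),
    ("Avg response time", 1.4, 1.2, "s"),
]

lines = ["| Metric | Feb 2026 | Mar 2026 | Change |", "|---|---|---|---|"]
lines += [change_row(*m) for m in metrics]
report = "\n".join(lines)

print(report)
print(f"Monthly savings: ${monthly_savings(4_200, 2_000):,.0f}")
```

Note that the accuracy changes are differences in percentage points, not relative improvements. Say so in the report if the client is likely to confuse the two.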
Pricing the Initial Build
The setup fee needs to cover your costs and communicate value. Here's how to think about it:
Your actual costs for initial build:
- Dataset creation/cleaning: 10-30 hours of labor
- Fine-tuning runs (experimentation + final): $20-100 in compute
- Evaluation and QA: 4-8 hours of labor
- Deployment and integration: 4-8 hours of labor
- Total labor: roughly 20-50 hours (the items above sum to 18-46)
At a loaded cost of $75-100/hour for a skilled engineer, your cost is $1,500-$5,000. Pricing at $5K-$15K gives you a healthy margin on the build and doesn't create sticker shock relative to the ongoing retainer.
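The build-cost arithmetic can be checked with a short sketch. It sums the itemized hours (18-46, which the estimate above rounds to 20-50) and adds compute. Pairing the cheapest build with the $5K price and the most expensive build with the $15K price is an assumption made for illustration.

```python
# Itemized estimates from the list above, as (low, high) ranges.
LABOR_HOURS = {
    "dataset creation/cleaning": (10, 30),
    "evaluation and QA": (4, 8),
    "deployment and integration": (4, 8),
}
COMPUTE_USD = (20, 100)   # fine-tuning runs
HOURLY_RATE = (75, 100)   # loaded cost of a skilled engineer


def build_cost_range() -> tuple[float, float]:
    """Return (low, high) total build cost in USD."""
    low_hours = sum(low for low, _ in LABOR_HOURS.values())
    high_hours = sum(high for _, high in LABOR_HOURS.values())
    low = low_hours * HOURLY_RATE[0] + COMPUTE_USD[0]
    high = high_hours * HOURLY_RATE[1] + COMPUTE_USD[1]
    return low, high


def build_margin(price: float, cost: float) -> float:
    """Gross margin on the setup fee as a fraction of price."""
    return (price - cost) / price


low_cost, high_cost = build_cost_range()
print(f"Build cost: ${low_cost:,.0f} to ${high_cost:,.0f}")
print(f"Simple build at $5K:   {build_margin(5_000, low_cost):.0%}")
print(f"Complex build at $15K: {build_margin(15_000, high_cost):.0%}")
```

Run it with your own hours and rates before quoting. A complex build sold at a simple-build price leaves almost no margin.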
Don't Underprice the Build
Some agencies price the initial build at cost (or even below) to "get the client in the door" for the retainer. This is a mistake for two reasons:
- It attracts clients who are price-sensitive, and those clients will fight you on the retainer too.
- It signals that your work isn't valuable, which makes the retainer harder to justify.
Charge a fair price for the build. If the client balks at $8K for the setup, they're going to balk at $5K/month for the retainer. Better to find out now.
Your Cost Structure
Let's be transparent about what this business actually costs to run:
Per-client variable costs:
- Ertas platform: $14.50/month
- GPU compute: $50-1,500/month depending on tier (shared infrastructure at Tier 1, dedicated allocation above)
- Staff time: 1-25 hours/month depending on tier
Fixed costs (don't scale per client):
- Your infrastructure (monitoring tools, CI/CD, etc.): $200-500/month
- Your time managing the business: priceless (and uncounted in margin calculations)
The math at 10 clients:
| Metric | Amount |
|---|---|
| 4 × Tier 1 @ $2K | $8,000 |
| 4 × Tier 2 @ $5K | $20,000 |
| 2 × Tier 3 @ $10K | $20,000 |
| Monthly revenue | $48,000 |
| Total Ertas fees | $145 |
| Total GPU compute | $3,000 |
| Total staff hours | ~80 hours |
| Staff cost (@ $80/hr) | $6,400 |
| Fixed overhead | $500 |
| Total costs | $10,045 |
| Gross margin | $37,955 (79%) |
79% gross margin at $48K MRR with 10 clients. That assumes about 80 staff hours a month on model operations, roughly half of one full-time engineer. In practice, many agencies run leaner than this in the early stages.
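The table is easy to reproduce, which makes it easy to rerun with your own client mix. A minimal sketch using the same inputs; `portfolio_margin` is a hypothetical helper, and the GPU total, staff hours, and overhead are the table's figures passed in as parameters.

```python
PLATFORM_FEE = 14.50  # per client, per month
STAFF_RATE = 80       # USD per hour

# (number of clients, monthly retainer) per tier, as in the table above.
clients = {"Tier 1": (4, 2_000), "Tier 2": (4, 5_000), "Tier 3": (2, 10_000)}


def portfolio_margin(clients: dict[str, tuple[int, int]], gpu_total: float,
                     staff_hours: float, fixed_overhead: float):
    """Return (revenue, costs, gross margin fraction) for a client mix."""
    revenue = sum(count * price for count, price in clients.values())
    n_clients = sum(count for count, _ in clients.values())
    costs = (n_clients * PLATFORM_FEE + gpu_total
             + staff_hours * STAFF_RATE + fixed_overhead)
    return revenue, costs, (revenue - costs) / revenue


revenue, costs, margin = portfolio_margin(
    clients, gpu_total=3_000, staff_hours=80, fixed_overhead=500
)
print(f"Revenue ${revenue:,.0f}, costs ${costs:,.0f}, gross margin {margin:.0%}")
```

Change the tier counts or the staff hours to see how sensitive the margin is. Staff time is the largest cost line, so it is the first number to stress-test.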
The Sales Pitch Framework
When you're sitting across from a potential client, here's the framework that closes deals:
Step 1: Quantify their current spend. "How many API calls are you making per month? At what cost?" Most businesses spending $3K+/month on API calls are candidates.
Step 2: Show the savings. "You're spending $5,000/month on OpenAI API calls for customer support. We can build you a fine-tuned model that runs locally for $2,000/month — same quality, no data leaving your infrastructure, and it gets better every month."
Step 3: Add the privacy angle. "Right now, every customer conversation goes through a third-party API. With a local model, your data stays on your infrastructure. For a company handling [healthcare/financial/legal] data, that's not just a nice-to-have."
Step 4: Show the improvement curve. "API models don't get better at your specific task over time. Ours do. Every month, we retrain on your production data and show you the metrics. Three months in, the model knows your business better than any general-purpose API ever will."
Step 5: Make the ask. "The setup is $[X] and the monthly service is $[Y]. We can have your first model in production within 3 weeks."
Retention Strategies
Getting clients is hard. Keeping them should be easy if you do these things:
Monthly performance reports. Not optional. Every client, every month, gets a report showing how their model is performing and how it's improving. This is the single most effective retention tool.
Expand to additional use cases. Your Tier 1 client with a customer support model? Three months in, ask them: "Would you like to add a product recommendation model? Same infrastructure, same monthly cadence. We can add it to your plan for an additional $1,500/month." Expansion revenue is easier than new client acquisition.
Make yourself indispensable. The more production data flows through your system, the harder it is for the client to leave. Not because you're locking them in — they own their model and data — but because the switching cost of rebuilding the training pipeline, evaluation framework, and monitoring infrastructure with another provider is real.
Quarterly business reviews. For Tier 2 and Tier 3 clients, schedule a quarterly call with their decision-maker (not just their technical contact). Show the cumulative value delivered. Discuss their roadmap. Propose ways AI can help with their next challenge. This is how $5K/month clients become $15K/month clients.
Annual pricing. Offer a 10-15% discount for annual commitments. This reduces churn and gives you predictable revenue for planning purposes. A client paying $54K/year on an annual contract is worth more than one paying $5K/month who might churn at month 7.
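The annual-versus-monthly comparison comes down to a few lines of arithmetic. In this sketch, "churn at month 7" is read as the client paying for seven months, which is an assumption. The function names are illustrative.

```python
import math


def annual_contract_value(monthly_price: int, discount_pct: int) -> float:
    """Revenue from a 12-month commitment at a percentage discount."""
    return monthly_price * 12 * (100 - discount_pct) / 100


def monthly_client_value(monthly_price: int, months_retained: int) -> int:
    """Revenue from a month-to-month client who stays a given number of months."""
    return monthly_price * months_retained


annual = annual_contract_value(5_000, 10)
churned = monthly_client_value(5_000, 7)   # assumes seven paid months
breakeven_months = math.ceil(annual / 5_000)

print(f"Annual contract: ${annual:,.0f}")
print(f"Monthly client churning at month 7: ${churned:,.0f}")
print(f"Months a monthly client must stay to match: {breakeven_months}")
```

Under these assumptions, a month-to-month client has to stay 11 months to out-earn the discounted annual contract, so the discount only costs you money on clients who would have stayed nearly the full year anyway.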
Ship AI that runs on your users' devices.
Ertas early bird pricing starts at $14.50/mo — locked in for life. Plans for builders and agencies.
The Compounding Advantage
The beautiful thing about this model is that it compounds. Each month your operations get more efficient:
- Retraining pipelines get automated further
- Evaluation frameworks get reused across clients
- You develop domain expertise that makes new client onboarding faster
- Your base model infrastructure serves more clients without proportional cost increases
Month 1, you spend 20 hours on a client's retraining cycle. Month 6, it's 4 hours because you've automated everything. But the retainer stays the same. Your margin improves without you raising prices.
At 20 clients, your fixed costs are spread thin, your per-client variable costs are optimized, and your gross margin approaches 85%. That's the economics of a productized service built on fine-tuned models.
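To see how automation moves the margin while the retainer stays flat, here is a small sketch. The $5,000 retainer is an assumed example, the $80/hour rate is the staff cost used earlier, and platform and GPU costs are left out to isolate the labor effect.

```python
def labor_margin(retainer: float, hours: float, hourly_rate: float = 80.0) -> float:
    """Margin after labor only, as a fraction of the retainer."""
    return (retainer - hours * hourly_rate) / retainer


RETAINER = 5_000  # assumed example retainer in USD per month

month_1 = labor_margin(RETAINER, hours=20)  # manual retraining cycle
month_6 = labor_margin(RETAINER, hours=4)   # mostly automated cycle

print(f"Month 1: {month_1:.0%} after labor")
print(f"Month 6: {month_6:.0%} after labor")
```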
Stop trading hours for dollars. Start building recurring revenue.
For more on agency pricing, read our guides on productized AI fine-tuning services, AI agency pricing strategy, and pricing self-hosted models for clients.