
Building AI Features in Your SaaS: When to Stop Calling the OpenAI API
Adding AI features to your SaaS through the OpenAI API is the fastest way to ship. But at some usage level, the economics break. Here's how to identify that threshold and what to do about it.
The OpenAI API is the right starting point for almost every SaaS AI feature. It ships fast, requires no infrastructure, and GPT-4o's quality is hard to argue with during the prototyping phase.
But there is a point — a specific point that you can calculate — where calling the OpenAI API is the wrong architecture. This is the analysis you need to run before that point arrives.
Why You Start With the OpenAI API (And Should)
For new AI features, the OpenAI API is nearly always the right first step:
- No ML expertise required
- No infrastructure to manage
- Quality is excellent out of the box
- Iteration is fast — change the prompt, deploy, done
This is not a mistake. It is correct engineering. Building on the OpenAI API validates that the feature is worth building before you invest in more complex infrastructure.
The mistake is treating the OpenAI API as a permanent architecture when your usage grows.
The Economic Inflection Point
Your infrastructure costs (servers, databases) typically scale sub-linearly: a server that costs AU$200/month can handle 10x the traffic of one that costs AU$20/month.
AI costs via the API do not work this way. They scale linearly with usage: every additional query costs the same marginal amount. At low usage, this is invisible. At high usage, it becomes a significant fraction of your COGS.
The calculation:
For each AI feature, estimate:
- Average tokens per request (input + output)
- Average requests per active user per month
- Current MAU (monthly active users)
- Projected MAU in 12 months
Example:
- AI email draft feature: 800 tokens per draft
- Users generate 15 drafts/month on average
- 1,000 MAU today, 10,000 MAU in 12 months
- GPT-4o cost: ~AU$0.02 per draft
| MAU | Monthly drafts | GPT-4o cost |
|---|---|---|
| 1,000 | 15,000 | AU$300/month |
| 5,000 | 75,000 | AU$1,500/month |
| 10,000 | 150,000 | AU$3,000/month |
| 50,000 | 750,000 | AU$15,000/month |
At 1,000 MAU, AU$300/month is fine. At 50,000 MAU, AU$15,000/month in a single API line item is a real COGS problem.
Now run the same math with a self-hosted fine-tuned 7B model:
- Server cost (GPU instance or owned hardware): AU$500-1,500/month fixed
- Per-request cost after infrastructure: ~AU$0
| MAU | Monthly drafts | Local model cost |
|---|---|---|
| 1,000 | 15,000 | AU$800/month (infrastructure) |
| 5,000 | 75,000 | AU$800/month |
| 10,000 | 150,000 | AU$800/month |
| 50,000 | 750,000 | AU$1,200/month (larger server) |
The crossover for this example is around 3,000 MAU: at AU$0.02 per draft and 15 drafts per user, roughly 2,700 users' worth of API calls match AU$800/month of fixed infrastructure. Below that, the OpenAI API is cheaper because there is no fixed cost to carry at low volume. Above it, the self-hosted model is cheaper.
The crossover point is when you want the migration finished, not when you start thinking about it. Migration takes 4-8 weeks, so start planning at around 40% of your crossover MAU.
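If you want to run the same calculation for your own feature, here is a minimal Python sketch using the example numbers above. Every constant is an assumption; replace it with your own figures.

```python
# Rough crossover estimate between per-request API pricing and a fixed-cost
# self-hosted server. All figures are illustrative, taken from the example above.

API_COST_PER_REQUEST_AUD = 0.02    # ~GPT-4o cost per 800-token email draft
SELF_HOSTED_FIXED_AUD = 800.0      # monthly GPU server / managed inference cost
REQUESTS_PER_USER_PER_MONTH = 15   # drafts per active user per month

def monthly_api_cost(mau: int) -> float:
    return mau * REQUESTS_PER_USER_PER_MONTH * API_COST_PER_REQUEST_AUD

def crossover_mau() -> float:
    # MAU at which the monthly API bill equals the fixed self-hosted cost
    return SELF_HOSTED_FIXED_AUD / (REQUESTS_PER_USER_PER_MONTH * API_COST_PER_REQUEST_AUD)

if __name__ == "__main__":
    xover = crossover_mau()
    print(f"Crossover: ~{xover:,.0f} MAU")                # ~2,700 MAU with these numbers
    print(f"Start planning at: ~{0.4 * xover:,.0f} MAU")  # 40% of crossover
    for mau in (1_000, 5_000, 10_000, 50_000):
        print(f"{mau:>6} MAU -> API cost AU${monthly_api_cost(mau):,.0f}/month")
```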
What the Migration Actually Involves
Migrating from OpenAI API to a self-hosted model is not as complex as it sounds, because:
- Ollama and similar tools expose an OpenAI-compatible API
- Your application code changes only the base URL and model name (see the sketch after this list)
- No application architecture changes required
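To make the size of that change concrete, here is a minimal sketch assuming the openai Python SDK (v1.x) pointed at Ollama's OpenAI-compatible endpoint. The fine-tuned model name is a hypothetical placeholder.

```python
# Minimal sketch of the API swap. Assumes a local Ollama server is running and
# a fine-tuned model has been pulled into it; "email-drafter" is hypothetical.
from openai import OpenAI

# Before: hosted OpenAI
# client = OpenAI()   # reads OPENAI_API_KEY, talks to api.openai.com
# model = "gpt-4o"

# After: self-hosted model behind Ollama's OpenAI-compatible endpoint
client = OpenAI(
    base_url="http://localhost:11434/v1",  # Ollama's OpenAI-compatible API
    api_key="ollama",                      # required by the SDK, ignored by Ollama
)
model = "email-drafter"

response = client.chat.completions.create(
    model=model,
    messages=[{"role": "user", "content": "Draft a follow-up email about the Q3 renewal."}],
)
print(response.choices[0].message.content)
```

The rest of your request handling, retries, and prompt templates stay exactly as they are.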
What actually takes time:
- Model selection and evaluation — choosing a base model that handles your task well
- Fine-tuning — training on your feature-specific use case (for simple tasks, a good prompt on the base model may be enough and this step can be skipped)
- Infrastructure setup — deploying the inference server with proper scaling and monitoring
- Quality validation — verifying the local model matches OpenAI output quality on your specific task
- Gradual rollout — routing a small percentage of traffic to the local model before full migration (a sketch follows this list)
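The rollout step can be as simple as deterministic percentage bucketing with a fallback to the hosted API. A sketch, with model names, hosts, and the rollout percentage as assumptions:

```python
# Illustrative gradual-rollout sketch: send a small, configurable share of
# traffic to the local model and fall back to OpenAI on errors.
import hashlib
from openai import OpenAI

openai_client = OpenAI()  # hosted GPT-4o
local_client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

LOCAL_ROLLOUT_PERCENT = 5  # start small, raise it as quality data comes in

def use_local_model(user_id: str) -> bool:
    # Deterministic bucketing: a given user always hits the same backend,
    # which keeps quality comparisons and support tickets coherent.
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return bucket < LOCAL_ROLLOUT_PERCENT

def complete(client: OpenAI, model: str, prompt: str) -> str:
    resp = client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content

def draft_email(user_id: str, prompt: str) -> str:
    if use_local_model(user_id):
        try:
            return complete(local_client, "email-drafter", prompt)  # hypothetical model
        except Exception:
            pass  # fall back to the hosted API rather than failing the request
    return complete(openai_client, "gpt-4o", prompt)
```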
Quality: The Real Concern
The concern most SaaS tech leads have about migrating off OpenAI is quality. "GPT-4o is better, our users will notice."
This concern is valid for some tasks and overstated for others.
Tasks where quality concern is valid:
- Open-ended creative generation (GPT-4o is genuinely better at zero-shot creative work)
- Complex multi-step reasoning tasks
- Tasks requiring broad world knowledge
Tasks where a fine-tuned 7B model matches or beats GPT-4o:
- Tasks within a well-defined, narrow domain
- Repetitive formatting and extraction tasks
- Content generation for a consistent style/voice
- Classification and routing
For SaaS AI features, most tasks fall in the second category. An email drafting feature for a specific CRM will produce better results with a fine-tuned 7B model trained on good email examples than with GPT-4o and a generic prompt, because the fine-tuned model has learned the specific patterns that make an email good in that context.
Run an evaluation before committing to the migration decision. Generate 100 examples with GPT-4o (your current output) and 100 examples with your candidate local model. Score them blind against your quality criteria. The results usually surprise people.
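A minimal sketch of that blind comparison, assuming the outputs from both models are already saved to local JSON files. File names and the 1-5 scale are illustrative.

```python
# Blind scoring: reviewers see prompt/output pairs in random order with no
# model labels; labels are only used to aggregate scores at the end.
import json
import random

with open("gpt4o_outputs.json") as f:        # [{"prompt": ..., "output": ...}, ...]
    gpt4o = json.load(f)
with open("local_model_outputs.json") as f:  # same prompts, candidate model outputs
    local = json.load(f)

samples = (
    [{"model": "gpt-4o", **s} for s in gpt4o]
    + [{"model": "local", **s} for s in local]
)
random.shuffle(samples)

scores = {"gpt-4o": [], "local": []}
for i, sample in enumerate(samples, 1):
    print(f"\n--- Sample {i}/{len(samples)} ---")
    print("PROMPT:", sample["prompt"])
    print("OUTPUT:", sample["output"])
    score = int(input("Score 1-5 against your quality criteria: "))
    scores[sample["model"]].append(score)  # model name is never shown to the reviewer

for model, vals in scores.items():
    print(f"{model}: mean {sum(vals) / len(vals):.2f} over {len(vals)} samples")
```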
The Hybrid Architecture
You do not have to choose one or the other permanently. A practical hybrid architecture:
- Tier 1 requests (high volume, well-defined tasks): local fine-tuned model
- Tier 2 requests (lower volume, complex or unusual tasks): GPT-4o API
Implement routing logic based on query complexity signals (length, topic classification, user tier). Simple, high-volume requests go local. Complex edge cases go to OpenAI. This captures most of the cost savings while protecting quality on complex queries.
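What that routing can look like in practice, with thresholds, keywords, and model names as placeholder assumptions:

```python
# Illustrative hybrid router: cheap signals decide whether a request goes to
# the local fine-tuned model or to GPT-4o.
from openai import OpenAI

openai_client = OpenAI()  # hosted GPT-4o for complex edge cases
local_client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

COMPLEX_KEYWORDS = {"legal", "contract", "compliance", "escalation"}
MAX_LOCAL_PROMPT_CHARS = 2_000

def route(prompt: str, user_tier: str) -> tuple[OpenAI, str]:
    """Return (client, model) for this request based on simple complexity signals."""
    looks_complex = (
        len(prompt) > MAX_LOCAL_PROMPT_CHARS
        or any(kw in prompt.lower() for kw in COMPLEX_KEYWORDS)
        or user_tier == "enterprise"  # protect quality for high-value accounts
    )
    if looks_complex:
        return openai_client, "gpt-4o"
    return local_client, "email-drafter"  # hypothetical fine-tuned model

def generate(prompt: str, user_tier: str = "standard") -> str:
    client, model = route(prompt, user_tier)
    resp = client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content
```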
Infrastructure Options for Self-Hosting
Lowest friction: Ertas cloud deployment. Your fine-tuned model deployed to managed infrastructure, served via OpenAI-compatible API. No server management, no scaling concerns.
Moderate complexity: GPU cloud instance (Lambda Labs, Vast.ai, or a major cloud GPU VM). You manage the Ollama installation and model loading, but the underlying hardware is managed. Good for teams with some ops capability.
Maximum control: Owned hardware. An RTX 4090 workstation or Mac Studio deployed on-premise or in a colocation facility. Highest fixed cost, lowest per-query cost at high volume, full control over the inference environment.
The right choice depends on your team's ops expertise and volume. Most SaaS teams at the migration threshold should start with managed deployment and migrate to owned hardware when the volume justifies the operational investment.
When Not to Migrate
Some SaaS AI features should stay on OpenAI indefinitely:
- Features used by < 1% of your user base (not worth the migration effort)
- Features where the complexity and novelty of the tasks genuinely require frontier model capabilities
- Features that are being replaced or sunset in the next 12 months
- Features where quality risk outweighs cost savings (high-stakes user-facing decisions)
Be honest about which category each feature falls into before investing migration effort.
The Business Case to Present Internally
If you need to justify this migration internally, the calculation is straightforward:
- Current monthly AI API cost at projected scale: [X]
- Projected monthly infrastructure cost after migration: [Y]
- Migration effort: [Z person-weeks] × loaded cost per developer-week
- Break-even (months): migration cost ÷ monthly savings (X - Y)
- 12-month net savings: (X - Y) × 12 − migration cost
At typical SaaS growth trajectories, a migration that starts at the crossover point pays for itself in 3-6 months and produces 5-8x cost savings over 24 months.
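A worked version of that arithmetic with hypothetical figures (substitute your own):

```python
# Break-even worked example. Every number here is an illustrative assumption.
api_cost_at_scale = 6_000   # X: projected monthly OpenAI bill, AU$
self_hosted_cost = 1_000    # Y: projected monthly infrastructure, AU$
migration_weeks = 6         # Z: person-weeks of engineering effort
weekly_dev_cost = 3_000     # loaded cost per person-week, AU$

monthly_savings = api_cost_at_scale - self_hosted_cost   # AU$5,000
migration_cost = migration_weeks * weekly_dev_cost       # AU$18,000
break_even_months = migration_cost / monthly_savings     # 3.6 months
net_savings_12m = monthly_savings * 12 - migration_cost  # AU$42,000

print(f"Break-even: {break_even_months:.1f} months")
print(f"12-month net savings: AU${net_savings_12m:,.0f}")
```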
Further Reading
- The Hidden Cost of Per-Token AI Pricing — The full economics of API pricing at scale
- Fine-Tune Once, Charge Monthly: The Productized AI Service Model — For agencies building AI into client products
- Running AI Models Locally — Deployment options and hardware guide
Ship AI that runs on your users' devices.
Early bird pricing starts at $14.50/mo — locked in for life. Plans for builders and agencies.