
Building AI Features in Your SaaS: When to Stop Calling the OpenAI API
Adding AI features to your SaaS through the OpenAI API is the fastest way to ship. But at some usage level, the economics break. Here's how to identify that threshold and what to do about it.
The OpenAI API is the right starting point for almost every SaaS AI feature. It ships fast, requires no infrastructure, and GPT-4o's quality is hard to argue with during the prototyping phase.
But there is a point — a specific point that you can calculate — where calling the OpenAI API is the wrong architecture. This is the analysis you need to run before that point arrives.
Why You Start With the OpenAI API (And Should)
For new AI features, the OpenAI API is nearly always the right first step:
- No ML expertise required
- No infrastructure to manage
- Quality is excellent out of the box
- Iteration is fast — change the prompt, deploy, done
This is not a mistake. It is correct engineering. Building on the OpenAI API validates that the feature is worth building before you invest in more complex infrastructure.
The mistake is treating the OpenAI API as a permanent architecture when your usage grows.
The Economic Inflection Point
Your infrastructure costs (servers, databases) typically scale sub-linearly: a server that costs AU$200/month can handle 10x the traffic of one that costs AU$20/month.
AI costs via the API do not work this way. They scale linearly with usage: every additional query costs the same marginal amount. At low usage, this is invisible. At high usage, it becomes a significant fraction of your COGS.
The calculation:
For each AI feature, estimate:
- Average tokens per request (input + output)
- Average requests per active user per month
- Current MAU (monthly active users)
- Projected MAU in 12 months
Example:
- AI email draft feature: 800 tokens per draft
- Users generate 15 drafts/month on average
- 1,000 MAU today, 10,000 MAU in 12 months
- GPT-4o cost: ~AU$0.02 per draft
| MAU | Monthly drafts | GPT-4o cost |
|---|---|---|
| 1,000 | 15,000 | AU$300/month |
| 5,000 | 75,000 | AU$1,500/month |
| 10,000 | 150,000 | AU$3,000/month |
| 50,000 | 750,000 | AU$15,000/month |
At 1,000 MAU, AU$300/month is fine. At 50,000 MAU, AU$15,000/month in a single API line item is a real COGS problem.
Now run the same math with a self-hosted fine-tuned 7B model:
- Server cost (GPU instance or owned hardware): AU$500-1,500/month fixed
- Per-request cost after infrastructure: ~AU$0
| MAU | Monthly drafts | Local model cost |
|---|---|---|
| 1,000 | 15,000 | AU$800/month (infrastructure) |
| 5,000 | 75,000 | AU$800/month |
| 10,000 | 150,000 | AU$800/month |
| 50,000 | 750,000 | AU$1,200/month (larger server) |
The crossover for this example is around 3,000 MAU: at AU$0.02 per draft and 15 drafts per user, roughly 2,700 users' worth of API calls match AU$800/month of fixed infrastructure. Below that, the OpenAI API is cheaper because there is no fixed cost to carry at low volume. Above it, the self-hosted model is cheaper.
The crossover point is when you want the migration finished, not when you start thinking about it. Migration takes 4-8 weeks, so start planning at around 40% of your crossover MAU.
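If you want to run the same calculation for your own feature, here is a minimal Python sketch using the example numbers above. Every constant is an assumption; replace it with your own figures.

```python
# Rough crossover estimate between per-request API pricing and a fixed-cost
# self-hosted server. All figures are illustrative, taken from the example above.

API_COST_PER_REQUEST_AUD = 0.02    # ~GPT-4o cost per 800-token email draft
SELF_HOSTED_FIXED_AUD = 800.0      # monthly GPU server / managed inference cost
REQUESTS_PER_USER_PER_MONTH = 15   # drafts per active user per month

def monthly_api_cost(mau: int) -> float:
    return mau * REQUESTS_PER_USER_PER_MONTH * API_COST_PER_REQUEST_AUD

def crossover_mau() -> float:
    # MAU at which the monthly API bill equals the fixed self-hosted cost
    return SELF_HOSTED_FIXED_AUD / (REQUESTS_PER_USER_PER_MONTH * API_COST_PER_REQUEST_AUD)

if __name__ == "__main__":
    xover = crossover_mau()
    print(f"Crossover: ~{xover:,.0f} MAU")                # ~2,700 MAU with these numbers
    print(f"Start planning at: ~{0.4 * xover:,.0f} MAU")  # 40% of crossover
    for mau in (1_000, 5_000, 10_000, 50_000):
        print(f"{mau:>6} MAU -> API cost AU${monthly_api_cost(mau):,.0f}/month")
```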
What the Migration Actually Involves
Migrating from OpenAI API to a self-hosted model is not as complex as it sounds, because:
- Ollama and similar tools expose an OpenAI-compatible API
- Your application code changes only the base URL and model name (see the sketch after this list)
- No application architecture changes required
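To make the size of that change concrete, here is a minimal sketch assuming the openai Python SDK (v1.x) pointed at Ollama's OpenAI-compatible endpoint. The fine-tuned model name is a hypothetical placeholder.

```python
# Minimal sketch of the API swap. Assumes a local Ollama server is running and
# a fine-tuned model has been pulled into it; "email-drafter" is hypothetical.
from openai import OpenAI

# Before: hosted OpenAI
# client = OpenAI()   # reads OPENAI_API_KEY, talks to api.openai.com
# model = "gpt-4o"

# After: self-hosted model behind Ollama's OpenAI-compatible endpoint
client = OpenAI(
    base_url="http://localhost:11434/v1",  # Ollama's OpenAI-compatible API
    api_key="ollama",                      # required by the SDK, ignored by Ollama
)
model = "email-drafter"

response = client.chat.completions.create(
    model=model,
    messages=[{"role": "user", "content": "Draft a follow-up email about the Q3 renewal."}],
)
print(response.choices[0].message.content)
```

The rest of your request handling, retries, and prompt templates stay exactly as they are.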
What actually takes time:
- Model selection and evaluation — choosing a base model that handles your task well
- Fine-tuning — training on your feature-specific use case (for simple tasks, a good prompt on the base model may be enough and this step can be skipped)
- Infrastructure setup — deploying the inference server with proper scaling and monitoring
- Quality validation — verifying the local model matches OpenAI output quality on your specific task
- Gradual rollout — routing a small percentage of traffic to the local model before full migration (a sketch follows this list)
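The rollout step can be as simple as deterministic percentage bucketing with a fallback to the hosted API. A sketch, with model names, hosts, and the rollout percentage as assumptions:

```python
# Illustrative gradual-rollout sketch: send a small, configurable share of
# traffic to the local model and fall back to OpenAI on errors.
import hashlib
from openai import OpenAI

openai_client = OpenAI()  # hosted GPT-4o
local_client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

LOCAL_ROLLOUT_PERCENT = 5  # start small, raise it as quality data comes in

def use_local_model(user_id: str) -> bool:
    # Deterministic bucketing: a given user always hits the same backend,
    # which keeps quality comparisons and support tickets coherent.
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return bucket < LOCAL_ROLLOUT_PERCENT

def complete(client: OpenAI, model: str, prompt: str) -> str:
    resp = client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content

def draft_email(user_id: str, prompt: str) -> str:
    if use_local_model(user_id):
        try:
            return complete(local_client, "email-drafter", prompt)  # hypothetical model
        except Exception:
            pass  # fall back to the hosted API rather than failing the request
    return complete(openai_client, "gpt-4o", prompt)
```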
Quality: The Real Concern
The concern most SaaS tech leads have about migrating off OpenAI is quality. "GPT-4o is better, our users will notice."
This concern is valid for some tasks and overstated for others.
Tasks where quality concern is valid:
- Open-ended creative generation (GPT-4o is genuinely better at zero-shot creative work)
- Complex multi-step reasoning tasks
- Tasks requiring broad world knowledge
Tasks where a fine-tuned 7B model matches or beats GPT-4o:
- Tasks within a well-defined, narrow domain
- Repetitive formatting and extraction tasks
- Content generation for a consistent style/voice
- Classification and routing
For SaaS AI features, most tasks fall in the second category. An email drafting feature for a specific CRM will produce better results with a fine-tuned 7B model trained on good email examples than with GPT-4o and a generic prompt, because the fine-tuned model has learned the specific patterns that make an email good in that context.
Run an evaluation before committing to the migration decision. Generate 100 examples with GPT-4o (your current output) and 100 examples with your candidate local model. Score them blind against your quality criteria. The results usually surprise people.
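A minimal sketch of that blind comparison, assuming the outputs from both models are already saved to local JSON files. File names and the 1-5 scale are illustrative.

```python
# Blind scoring: reviewers see prompt/output pairs in random order with no
# model labels; labels are only used to aggregate scores at the end.
import json
import random

with open("gpt4o_outputs.json") as f:        # [{"prompt": ..., "output": ...}, ...]
    gpt4o = json.load(f)
with open("local_model_outputs.json") as f:  # same prompts, candidate model outputs
    local = json.load(f)

samples = (
    [{"model": "gpt-4o", **s} for s in gpt4o]
    + [{"model": "local", **s} for s in local]
)
random.shuffle(samples)

scores = {"gpt-4o": [], "local": []}
for i, sample in enumerate(samples, 1):
    print(f"\n--- Sample {i}/{len(samples)} ---")
    print("PROMPT:", sample["prompt"])
    print("OUTPUT:", sample["output"])
    score = int(input("Score 1-5 against your quality criteria: "))
    scores[sample["model"]].append(score)  # model name is never shown to the reviewer

for model, vals in scores.items():
    print(f"{model}: mean {sum(vals) / len(vals):.2f} over {len(vals)} samples")
```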
The Hybrid Architecture
You do not have to choose one or the other permanently. A practical hybrid architecture:
- Tier 1 requests (high volume, well-defined tasks): local fine-tuned model
- Tier 2 requests (lower volume, complex or unusual tasks): GPT-4o API
Implement routing logic based on query complexity signals (length, topic classification, user tier). Simple, high-volume requests go local. Complex edge cases go to OpenAI. This captures most of the cost savings while protecting quality on complex queries.
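What that routing can look like in practice, with thresholds, keywords, and model names as placeholder assumptions:

```python
# Illustrative hybrid router: cheap signals decide whether a request goes to
# the local fine-tuned model or to GPT-4o.
from openai import OpenAI

openai_client = OpenAI()  # hosted GPT-4o for complex edge cases
local_client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

COMPLEX_KEYWORDS = {"legal", "contract", "compliance", "escalation"}
MAX_LOCAL_PROMPT_CHARS = 2_000

def route(prompt: str, user_tier: str) -> tuple[OpenAI, str]:
    """Return (client, model) for this request based on simple complexity signals."""
    looks_complex = (
        len(prompt) > MAX_LOCAL_PROMPT_CHARS
        or any(kw in prompt.lower() for kw in COMPLEX_KEYWORDS)
        or user_tier == "enterprise"  # protect quality for high-value accounts
    )
    if looks_complex:
        return openai_client, "gpt-4o"
    return local_client, "email-drafter"  # hypothetical fine-tuned model

def generate(prompt: str, user_tier: str = "standard") -> str:
    client, model = route(prompt, user_tier)
    resp = client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content
```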
Infrastructure Options for Self-Hosting
Lowest friction: Ertas cloud deployment. Your fine-tuned model deployed to managed infrastructure, served via OpenAI-compatible API. No server management, no scaling concerns.
Moderate complexity: GPU cloud instance (Lambda Labs, Vast.ai, or a major cloud GPU VM). You manage the Ollama installation and model loading, but the underlying hardware is managed. Good for teams with some ops capability.
Maximum control: Owned hardware. An RTX 4090 workstation or Mac Studio deployed on-premise or in a colocation facility. Highest fixed cost, lowest per-query cost at high volume, full control over the inference environment.
The right choice depends on your team's ops expertise and volume. Most SaaS teams at the migration threshold should start with managed deployment and migrate to owned hardware when the volume justifies the operational investment.
When Not to Migrate
Some SaaS AI features should stay on OpenAI indefinitely:
- Features used by < 1% of your user base (not worth the migration effort)
- Features where the complexity and novelty of the tasks genuinely require frontier model capabilities
- Features that are being replaced or sunset in the next 12 months
- Features where quality risk outweighs cost savings (high-stakes user-facing decisions)
Be honest about which category each feature falls into before investing migration effort.
The Business Case to Present Internally
If you need to justify this migration internally, the calculation is straightforward:
- Current monthly AI API cost at projected scale: [X]
- Projected monthly infrastructure cost after migration: [Y]
- Migration effort: [Z person-weeks] × loaded cost per developer-week
- Break-even (months): migration cost ÷ monthly savings (X - Y)
- 12-month net savings: (X - Y) × 12 − migration cost
At typical SaaS growth trajectories, a migration that starts at the crossover point pays for itself in 3-6 months and produces 5-8x cost savings over 24 months.
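A worked version of that arithmetic with hypothetical figures (substitute your own):

```python
# Break-even worked example. Every number here is an illustrative assumption.
api_cost_at_scale = 6_000   # X: projected monthly OpenAI bill, AU$
self_hosted_cost = 1_000    # Y: projected monthly infrastructure, AU$
migration_weeks = 6         # Z: person-weeks of engineering effort
weekly_dev_cost = 3_000     # loaded cost per person-week, AU$

monthly_savings = api_cost_at_scale - self_hosted_cost   # AU$5,000
migration_cost = migration_weeks * weekly_dev_cost       # AU$18,000
break_even_months = migration_cost / monthly_savings     # 3.6 months
net_savings_12m = monthly_savings * 12 - migration_cost  # AU$42,000

print(f"Break-even: {break_even_months:.1f} months")
print(f"12-month net savings: AU${net_savings_12m:,.0f}")
```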
Further Reading
- The Hidden Cost of Per-Token AI Pricing — The full economics of API pricing at scale
- Fine-Tune Once, Charge Monthly: The Productized AI Service Model — For agencies building AI into client products
- Running AI Models Locally — Deployment options and hardware guide
Ship AI that runs on your users' devices.
Early bird pricing starts at $14.50/mo — locked in for life. Plans for builders and agencies.