
The Indie Dev's Guide to AI Model Costs in 2026
A comprehensive comparison of AI model costs in 2026 — from cloud APIs to self-hosted open-source models. Find the cheapest way to add AI to your indie app.
Adding AI to your indie app has never been easier. The tooling is mature, the models are capable, and every tutorial makes it look like plugging in an API key is all you need. What those tutorials do not cover is the bill that arrives at the end of the month — and how it scales as your app grows.
This guide is the cost comparison I wish I had when I started. It covers every major option available to indie developers in 2026, from cloud APIs to self-hosted open-source models, with real numbers at real scale.
The Landscape of AI Pricing in 2026
AI pricing has evolved significantly. Cloud API prices have dropped from their 2023-2024 peaks, but they are still per-token — meaning your costs scale linearly with usage. Meanwhile, open-source models have reached a quality level where a fine-tuned 7-8B parameter model can match or beat cloud APIs on specific tasks.
The choice is no longer "cloud vs. bad open-source." It is "cloud convenience vs. self-hosted economics." Both are viable. The right answer depends on your scale.
Cloud API Tier Comparison
Here is what the major cloud APIs cost per million tokens in early 2026 for their most commonly used tiers.
| Provider | Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|---|
| OpenAI | GPT-4o | $2.50 | $10.00 |
| OpenAI | GPT-4o-mini | $0.15 | $0.60 |
| Anthropic | Claude 3.5 Sonnet | $3.00 | $15.00 |
| Anthropic | Claude 3.5 Haiku | $0.80 | $4.00 |
| Google | Gemini 1.5 Pro | $1.25 | $5.00 |
| Google | Gemini 1.5 Flash | $0.075 | $0.30 |
| Together AI | Llama 3.3 70B | $0.88 | $0.88 |
| Together AI | Llama 3.3 8B | $0.18 | $0.18 |
These prices look small until you do the multiplication. A typical AI-powered app interaction involves 500-1,000 input tokens and 200-500 output tokens. At 1,000 daily active users making 5 requests each, with per-request volumes near the upper end of those ranges (1,000 input and 400 output tokens), you are processing roughly 5 million input tokens and 2 million output tokens per day.
With GPT-4o, that is $12.50 + $20.00 = $32.50 per day, or roughly $975 per month. With GPT-4o-mini, it drops to about $1.95 per day, or $58.50 per month. The cheaper models are dramatically more affordable, but you trade capability for cost.
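The same arithmetic works for any provider in the table. A minimal sketch of the estimate, using the daily token volumes and per-million prices above:

```python
def monthly_cost(input_tokens_per_day: float, output_tokens_per_day: float,
                 input_price_per_m: float, output_price_per_m: float,
                 days: int = 30) -> float:
    """Estimate a monthly cloud API bill from daily token volumes."""
    daily = (input_tokens_per_day * input_price_per_m
             + output_tokens_per_day * output_price_per_m) / 1_000_000
    return daily * days

# 1,000 DAU x 5 requests: ~5M input and ~2M output tokens per day.
gpt4o = monthly_cost(5_000_000, 2_000_000, 2.50, 10.00)      # ~$975.00
gpt4o_mini = monthly_cost(5_000_000, 2_000_000, 0.15, 0.60)  # ~$58.50
```

Swap in any row of the pricing table to compare providers against your own traffic profile.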
Self-Hosted Options
Self-hosting means running open-source models on your own hardware or rented GPU servers. The two most common approaches in 2026 are Ollama and raw llama.cpp.
Ollama provides a clean interface for running quantised models. It handles model management, serves an OpenAI-compatible API, and works on consumer hardware. A MacBook Pro with 32GB RAM can run an 8B model at useful speeds. A $50/month cloud GPU (RTX 4090 or equivalent) can serve hundreds of concurrent users.
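Because Ollama exposes an OpenAI-compatible endpoint (under /v1 on its default port 11434), migrating existing client code is usually just a base URL and model name change. A minimal sketch that builds the request without sending it; the model tag is a placeholder for whatever you have pulled locally:

```python
import json

OLLAMA_BASE_URL = "http://localhost:11434/v1"  # Ollama's default port

def build_chat_request(model: str, user_message: str) -> dict:
    """Build an OpenAI-style chat completion request for a local Ollama server."""
    return {
        "url": f"{OLLAMA_BASE_URL}/chat/completions",
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({
            "model": model,  # placeholder tag for a locally pulled model
            "messages": [{"role": "user", "content": user_message}],
        }),
    }

req = build_chat_request("llama3:8b", "Summarise this support ticket: ...")
```

The same request shape works against the cloud APIs, which is what makes the migration path incremental rather than a rewrite.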
llama.cpp is the lower-level option. More configuration, more performance tuning, but maximum control over inference parameters and memory usage.
The key cost difference: self-hosted pricing is per-server, not per-token. Whether you run 1,000 inferences or 1,000,000, the server costs the same.
| Setup | Monthly Cost | Capacity (req/day) | Cost at 5K req/day |
|---|---|---|---|
| Cloud GPU (RTX 4090) | $50-80 | 10,000-50,000 | $50-80 |
| Cloud GPU (A100 40GB) | $150-300 | 50,000-200,000 | $150-300 |
| Mac Mini M4 Pro (own) | ~$15 electricity | 5,000-15,000 | ~$15 |
| Consumer PC + RTX 4090 (own) | ~$20 electricity | 15,000-50,000 | ~$20 |
At 5,000 requests per day with an 8B model, self-hosting costs between $15 and $80 per month. The equivalent cloud API cost with GPT-4o-mini would be roughly $58.50 per month. The crossover point where self-hosting becomes cheaper depends on your specific usage pattern, but it generally happens around 2,000-3,000 daily requests.
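The crossover is easy to estimate for your own numbers. A rough break-even sketch, using GPT-4o-mini prices from the table above and assumed per-request token counts of 1,000 input and 400 output:

```python
def break_even_requests_per_day(server_monthly: float,
                                in_tokens: int, out_tokens: int,
                                in_price_per_m: float, out_price_per_m: float,
                                days: int = 30) -> float:
    """Daily request volume above which a fixed-cost server beats per-token pricing."""
    per_request = (in_tokens * in_price_per_m
                   + out_tokens * out_price_per_m) / 1_000_000
    return server_monthly / (per_request * days)

# GPT-4o-mini at 1,000 input / 400 output tokens per request:
cloud_gpu = break_even_requests_per_day(50, 1000, 400, 0.15, 0.60)  # ~4,270/day
own_box = break_even_requests_per_day(20, 1000, 400, 0.15, 0.60)    # ~1,710/day
```

Where exactly your break-even lands depends on hosting cost and per-request token counts; the cases above bracket the typical range.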
The Fine-Tuning Sweet Spot
Here is the insight that changes the economics entirely: a fine-tuned small model can outperform a general-purpose large model on your specific tasks.
A general-purpose model like GPT-4o is designed to handle everything — creative writing, code generation, mathematical reasoning, casual conversation. Your app probably needs it to do one or two things well. Classification, entity extraction, structured output generation, domain-specific Q&A.
When you fine-tune a 7-8B model on examples of exactly what your app needs, it learns to do that specific task with high accuracy. You trade general capability (which you do not need) for specialised performance (which you do) at a fraction of the cost.
The practical result: a fine-tuned Llama 3.3 8B or Qwen 2.5 7B running on a $50/month GPU server can match or beat GPT-4o on your specific task while costing roughly 90% less at scale.
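Preparing those task examples is mostly data wrangling. A minimal sketch of a chat-format JSONL training file in the style most fine-tuning tools accept; the task and labels below are illustrative placeholders, and your tool's exact schema may differ:

```python
import json

# Each JSONL line is one training example: the prompt your app will
# send, paired with the exact output you want the model to produce.
examples = [
    {"messages": [
        {"role": "user", "content": "Classify this ticket: 'App crashes on login'"},
        {"role": "assistant", "content": "bug_report"},
    ]},
    {"messages": [
        {"role": "user", "content": "Classify this ticket: 'Can I get an invoice?'"},
        {"role": "assistant", "content": "billing_question"},
    ]},
]

with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```

A few hundred to a few thousand examples of this shape, drawn from real traffic, is typically where task-specific fine-tuning starts to pay off.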
Cost-Per-User Analysis at Different Scales
Let's map this out across growth stages, assuming a typical app with 5 AI interactions per user per day.
| Users (DAU) | Cloud API (GPT-4o-mini) | Self-Hosted (8B, cloud GPU) | Cost per User (Cloud) | Cost per User (Self-Hosted) |
|---|---|---|---|---|
| 100 | $5.85/mo | $50/mo | $0.059 | $0.500 |
| 500 | $29.25/mo | $50/mo | $0.059 | $0.100 |
| 1,000 | $58.50/mo | $50/mo | $0.059 | $0.050 |
| 5,000 | $292.50/mo | $80/mo | $0.059 | $0.016 |
| 10,000 | $585.00/mo | $150/mo | $0.059 | $0.015 |
| 50,000 | $2,925/mo | $300/mo | $0.059 | $0.006 |
The pattern is clear. Cloud API costs scale linearly — your per-user cost is constant regardless of scale. Self-hosted costs are front-loaded — expensive per user at low scale, dramatically cheaper at high scale.
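The per-user numbers in the table fall straight out of the two pricing models. A small sketch, taking the per-1,000-DAU API cost and server prices from the tables above:

```python
def per_user_cost_cloud(dau: int, monthly_per_1k_dau: float = 58.50) -> float:
    """Per-token pricing: cost per user is flat regardless of scale."""
    return monthly_per_1k_dau / 1000

def per_user_cost_self_hosted(dau: int, server_monthly: float) -> float:
    """Fixed server pricing: cost per user falls as users grow."""
    return server_monthly / dau

per_user_cost_cloud(5_000)               # ~$0.059 at any scale
per_user_cost_self_hosted(5_000, 80.0)   # ~$0.016
```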
When Cloud APIs Still Make Sense
Cloud APIs are not always the wrong choice. They are the right choice when:
- You have fewer than 100 daily users. The operational overhead of self-hosting is not worth the savings.
- You are still prototyping. Use cloud APIs to validate that AI adds value before investing in infrastructure.
- You need frontier-level capability. For tasks that genuinely require GPT-4o or Claude 3.5 Sonnet-class reasoning, cloud APIs provide capability that open-source models have not yet matched.
- You have no ML experience and no time to learn. Fine-tuning has a learning curve. If you need to ship this week, use an API.
When to Switch to Self-Hosted
The trigger to switch is usually economic, but not always. Consider self-hosting when:
- Your monthly API bill exceeds $200 and is growing.
- You need predictable costs for pricing your own product.
- Your clients or users require data privacy guarantees.
- You are experiencing rate limiting or latency issues with cloud APIs.
- You want to eliminate a critical single point of failure.
The migration does not have to be all-or-nothing. Start by self-hosting your highest-volume, most cost-sensitive AI task. Keep cloud APIs for low-volume tasks where convenience outweighs cost.
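That incremental migration can be as simple as a routing table in front of your model calls. A hypothetical sketch; the task names and backend labels are placeholders, not part of any real API:

```python
# High-volume, cost-sensitive tasks go to the self-hosted model;
# everything else stays on the cloud API for convenience.
SELF_HOSTED_TASKS = {"classification", "entity_extraction"}

def pick_backend(task: str) -> str:
    """Route a task to the cheapest backend that can handle it."""
    return "self_hosted" if task in SELF_HOSTED_TASKS else "cloud_api"

pick_backend("classification")    # -> "self_hosted"
pick_backend("creative_writing")  # -> "cloud_api"
```

As you gain confidence in the fine-tuned model, you move tasks into the self-hosted set one at a time.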
How Ertas Fits In
Ertas makes the transition from cloud APIs to self-hosted models practical for indie developers. Ertas Studio handles fine-tuning without requiring ML expertise, and exports optimised GGUF models ready for deployment with Ollama or llama.cpp.
Ready to cut your AI costs? Join the Ertas waitlist and start building on infrastructure you control.
Further Reading

Your Vibe-Coded App Hit 1,000 Users — Now What?
You shipped fast with Cursor and Bolt. Users love it. But your OpenAI bill just crossed $200/month and it's climbing. Here's the cost survival guide for vibe-coded apps hitting real scale.

From Prototype to Product: Replacing API Calls with Fine-Tuned Models
Your Lovable/Bolt prototype works. Users are signing up. But every API call eats your margin. Here's the step-by-step playbook for migrating from cloud APIs to fine-tuned local models in production.

The Vibecoder's Guide to AI Unit Economics: When Free Tiers Stop Being Free
OpenAI's free tier got you started. But at scale, you're spending $5K/month on Opus for tasks Haiku could handle. Here's how to think about AI costs like a founder, not a hobbyist.