
Bolt.new Apps and the OpenAI Cost Cliff: What Happens at Scale
Bolt.new makes it easy to add AI features. Here's exactly what happens to your OpenAI bill as users grow — and how to replace it with a fine-tuned local model at flat cost.
Bolt.new is excellent for shipping fast. You describe what you want, Bolt generates the full-stack app, and you are deployed in a few hours. The generated code is clean, the architecture is reasonable, and the AI features work out of the box.
But there is a structural problem baked into every Bolt.new app that uses OpenAI. It does not show up during development. It does not show up at launch. It shows up around month three, when you have a few hundred users and your API dashboard looks worse every week.
The Bolt.new Happy Path
Here is how AI features get into Bolt.new apps. You describe your app: "A writing assistant that helps users improve their content with AI suggestions." Bolt generates the app, including a backend endpoint that calls the OpenAI chat completions API with your system prompt. The code looks something like this:
// Generated by Bolt.new
const response = await openai.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [
    { role: "system", content: "You are a writing assistant..." },
    { role: "user", content: userContent }
  ]
});
Clean, functional, and exactly right for validating the idea. You ship it, users try it, feedback is positive. You are off to the races.
The problem is what happens next.
Where the Bill Hits
Let us trace the actual numbers for a Bolt.new writing assistant app.
The assumptions:
- gpt-4o-mini for cost efficiency
- Average request: 300 tokens input + 400 tokens output = 700 tokens
- gpt-4o-mini pricing: $0.15/1M input tokens, $0.60/1M output tokens
- Cost per request: ~$0.000045 input + ~$0.00024 output = ~$0.000285
- Average user: 40 requests per month
| Users | Monthly API Requests | Monthly OpenAI Cost |
|---|---|---|
| 100 | 4,000 | $1.14 |
| 500 | 20,000 | $5.70 |
| 1,000 | 40,000 | $11.40 |
| 3,000 | 120,000 | $34.20 |
| 5,000 | 200,000 | $57.00 |
| 10,000 | 400,000 | $114.00 |
| 50,000 | 2,000,000 | $570.00 |
These numbers look manageable. The problem is that (a) these are the best-case estimates using the cheapest capable model, and (b) the cost scales linearly while user growth is the goal.
If you are building something with higher-value features (GPT-4o instead of gpt-4o-mini, longer prompts, more frequent calls), multiply these numbers by 10-20x.
For a more realistic production app using GPT-4o with heavier prompts (roughly 700 input tokens and 400 output tokens per request):
- $2.50/1M input, $10.00/1M output
- Cost per request: ~$0.00175 + $0.004 = ~$0.0058
- At 10,000 users × 40 requests/month: ~$2,320/month
That is the cost cliff.
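If you want to sanity-check these estimates against your own prompt sizes and usage, the arithmetic fits in a few lines. This is a rough sketch using the published per-million-token prices above; swap in your own token counts and request volumes:

// Rough monthly cost model for the estimates above (prices in USD per 1M tokens)
function monthlyCost({ users, requestsPerUser, inputTokens, outputTokens, inputPrice, outputPrice }) {
  const perRequest =
    (inputTokens / 1_000_000) * inputPrice +
    (outputTokens / 1_000_000) * outputPrice;
  return users * requestsPerUser * perRequest;
}

// gpt-4o-mini, baseline assumptions: ~$114/month at 10,000 users
console.log(monthlyCost({ users: 10_000, requestsPerUser: 40, inputTokens: 300, outputTokens: 400, inputPrice: 0.15, outputPrice: 0.60 }));

// GPT-4o with heavier prompts: roughly $2,300/month at 10,000 users
console.log(monthlyCost({ users: 10_000, requestsPerUser: 40, inputTokens: 700, outputTokens: 400, inputPrice: 2.50, outputPrice: 10.00 }));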
Why Bolt.new Makes This Worse
Bolt.new's speed makes it dangerously easy to add AI features everywhere. You prompt: "Add an AI summary to each dashboard view." Bolt adds it. "Add AI-powered suggestions to the sidebar." Bolt adds it. "Make the search bar use AI to understand intent." Bolt adds it.
Each addition is another API call per user session. By the time your app is polished, you might have 4-6 AI touchpoints per user per session. Each one is another linear scaling cost.
The ease of addition becomes a liability at scale. You have built an app where AI is deeply integrated — which is great for UX, terrible for margins.
The Fix: Fine-Tune Once, Run Locally
The solution is to replace the OpenAI API call with a fine-tuned local model. The quality is equivalent for your specific use case; the cost structure is fundamentally different.
Here is the process:
Step 1: Collect training data from your existing API logs.
If your app has been running for 2-4 weeks with real users, you have the data you need. Export your API call logs and extract the input/output pairs. Filter for cases where users engaged with the AI output (did not immediately retry, continued using the app). Format as JSONL:
{"instruction": "Improve the following paragraph for clarity:", "input": "user paragraph here", "output": "improved paragraph here"}
Aim for 400-800 examples. Quality matters more than quantity.
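The exact export format depends on how your backend logs requests, so treat this as a sketch: it assumes each log entry is a JSON line with the user's input, the model's output, and a flag for whether the user kept the result. The field names are placeholders; adapt them to whatever your app actually logs.

// Sketch: convert exported API call logs into fine-tuning JSONL
// Assumed log line shape: {"userContent": "...", "completion": "...", "accepted": true}
import { readFileSync, writeFileSync } from "node:fs";

const lines = readFileSync("api-logs.jsonl", "utf8").split("\n").filter(Boolean);

const examples = lines
  .map((line) => JSON.parse(line))
  .filter((entry) => entry.accepted) // keep only outputs users actually engaged with
  .map((entry) => ({
    instruction: "Improve the following paragraph for clarity:",
    input: entry.userContent,
    output: entry.completion,
  }));

writeFileSync("training-data.jsonl", examples.map((ex) => JSON.stringify(ex)).join("\n"));
console.log(`Wrote ${examples.length} training examples`);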
Step 2: Fine-tune in Ertas (30-90 minutes).
Upload the JSONL to Ertas, select Qwen 2.5 7B as the base model, configure training settings. The visual interface handles the rest. Training takes 45-90 minutes. Download the GGUF file.
Step 3: Deploy Ollama on a VPS.
Spin up a Hetzner CX32 or CX42 ($14-26/month). Install Ollama, create a Modelfile for your GGUF, start serving.
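Before pointing production traffic at it, it is worth confirming the server is reachable from your app's environment. A minimal check against Ollama's model-listing endpoint (replace your-vps-ip with your VPS address):

// Confirm the Ollama server is up and your fine-tuned model is registered
const res = await fetch("http://your-vps-ip:11434/api/tags");
const { models } = await res.json();
console.log(models.map((m) => m.name)); // should include your fine-tuned model's name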
Step 4: Update your Bolt.new app code.
This is the part that surprises most developers: it is often a one-line change. Ollama serves an OpenAI-compatible API. Update the baseURL in your OpenAI client:
// Before (OpenAI):
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

// After (Ollama — one line changes):
const openai = new OpenAI({
  apiKey: "not-needed", // Ollama doesn't require auth by default
  baseURL: "http://your-vps-ip:11434/v1",
});

// The rest of your code stays exactly the same
const response = await openai.chat.completions.create({
  model: "my-fine-tuned-model", // your model name in Ollama
  messages: [...],
});
Your existing Bolt.new generated code works unchanged. Only the client configuration updates.
Cost After Migration
| Scenario | OpenAI API (Monthly, gpt-4o-mini to GPT-4o) | Ertas + VPS (Monthly) |
|---|---|---|
| 1,000 users | $11-232 | $40.50 |
| 5,000 users | $57-1,160 | $40.50 |
| 10,000 users | $114-2,320 | $40.50 |
| 50,000 users | $570-11,600 | $66.50 (larger VPS) |
The fine-tuned local model costs: $14.50/month (Ertas Builder, Early Bird) + $26/month (VPS). Total: $40.50/month regardless of request volume.
Break-even: At the baseline assumption of 40 requests per user per month, a gpt-4o-mini app breaks even around 3,500 users. With several AI touchpoints per session pushing usage to 200+ requests per user per month, break-even drops to roughly 500-700 users. For GPT-4o at that usage level, break-even is well under 100 users.
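The break-even point is simply the flat monthly cost divided by your per-user API spend. A quick sketch using the per-request costs worked out above:

// Users needed before the flat-cost setup beats the API bill
const flatMonthlyCost = 40.50; // Ertas + VPS
const breakEvenUsers = (costPerRequest, requestsPerUserPerMonth) =>
  Math.ceil(flatMonthlyCost / (costPerRequest * requestsPerUserPerMonth));

console.log(breakEvenUsers(0.000285, 40));  // gpt-4o-mini, baseline usage: ~3,553 users
console.log(breakEvenUsers(0.000285, 200)); // gpt-4o-mini, heavy usage: ~711 users
console.log(breakEvenUsers(0.0058, 200));   // GPT-4o, heavy usage: ~35 users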
Will Quality Suffer?
For narrow, domain-specific tasks — which is what Bolt.new AI features almost always are — no. A 7B model fine-tuned on 500-800 examples of your specific task will perform at 90-95% of GPT-4 accuracy on that task.
The caveat: if your app requires broad reasoning, creative writing at a high level, or tasks that genuinely need frontier model intelligence, the trade-off is different. Most Bolt.new AI features are extraction, classification, summarization, or style matching — all tasks where fine-tuned small models excel.
You can verify before committing: use Ertas's evaluation tools to benchmark your fine-tuned model against a held-out test set, with GPT-4 outputs as the reference. If quality is within acceptable range, ship the migration.
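If you want a quick manual spot-check alongside Ertas's evaluation tools, the same idea fits in a short script: run the held-out prompts through the local model and store each output next to its GPT-4 reference for side-by-side review. The file names and model name below are placeholders:

// Sketch: pair fine-tuned outputs with GPT-4 references for manual review
// Assumed test-set.jsonl line shape: {"input": "...", "reference": "..."}
import { readFileSync, writeFileSync } from "node:fs";
import OpenAI from "openai";

const local = new OpenAI({ apiKey: "not-needed", baseURL: "http://your-vps-ip:11434/v1" });
const testSet = readFileSync("test-set.jsonl", "utf8").split("\n").filter(Boolean).map((l) => JSON.parse(l));

const results = [];
for (const { input, reference } of testSet) {
  const response = await local.chat.completions.create({
    model: "my-fine-tuned-model",
    messages: [{ role: "user", content: input }],
  });
  results.push({ input, reference, fineTuned: response.choices[0].message.content });
}

writeFileSync("eval-results.json", JSON.stringify(results, null, 2));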
Ship AI that runs on your users' devices.
Ertas early bird pricing starts at $14.50/mo — locked in for life. Plans for builders and agencies.
Further Reading
- Vibecoder AI Cost Guide: All Platforms — How every major builder platform hits the AI cost cliff
- Lovable App AI Cost Problem — Same problem, different platform
- Vibe-Coded App AI Costs Scaling — The full cost cliff breakdown at 10K users
- 7B Model Beats API Call — When fine-tuned small models match GPT-4 for narrow tasks
- Flat-Cost AI Architecture for Indie Apps — Designing for sub-linear AI costs from the start