    Bolt.new Apps and the OpenAI Cost Cliff: What Happens at Scale

    Bolt.new makes it easy to add AI features. Here's exactly what happens to your OpenAI bill as users grow — and how to replace it with a fine-tuned local model at flat cost.

    Ertas Team

    Bolt.new is excellent for shipping fast. You describe what you want, Bolt generates the full-stack app, and you are deployed in a few hours. The generated code is clean, the architecture is reasonable, and the AI features work out of the box.

    But there is a structural problem baked into every Bolt.new app that uses OpenAI. It does not show up during development. It does not show up at launch. It shows up around month three, when you have a few hundred users and your API dashboard looks worse every week.

    The Bolt.new Happy Path

    Here is how AI features get into Bolt.new apps. You describe your app: "A writing assistant that helps users improve their content with AI suggestions." Bolt generates the app, including a backend endpoint that calls the OpenAI chat completions API with your system prompt. The code looks something like this:

    // Generated by Bolt.new
    const response = await openai.chat.completions.create({
      model: "gpt-4o-mini",
      messages: [
        { role: "system", content: "You are a writing assistant..." },
        { role: "user", content: userContent }
      ]
    });
    

    Clean, functional, and exactly right for validating the idea. You ship it, users try it, feedback is positive. You are off to the races.

    The problem is what happens next.

    Where the Bill Hits

    Let us trace the actual numbers for a Bolt.new writing assistant app.

    The assumptions:

    • gpt-4o-mini for cost efficiency
    • Average request: 300 tokens input + 400 tokens output = 700 tokens
    • gpt-4o-mini pricing: $0.15/1M input tokens, $0.60/1M output tokens
    • Cost per request: ~$0.000045 (input) + ~$0.00024 (output) = ~$0.000285
    • Average user: 40 requests per month
    Users      Monthly API Requests    Monthly OpenAI Cost
    100        4,000                   $1.14
    500        20,000                  $5.70
    1,000      40,000                  $11.40
    3,000      120,000                 $34.20
    5,000      200,000                 $57.00
    10,000     400,000                 $114.00
    50,000     2,000,000               $570.00
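    These figures follow mechanically from the assumptions above. A quick sketch to reproduce them (prices are the gpt-4o-mini rates quoted in this post — verify against OpenAI's current pricing before relying on them):

```javascript
// Reproduce the cost table above from the stated assumptions.
const INPUT_PRICE = 0.15 / 1_000_000;   // $ per input token (gpt-4o-mini)
const OUTPUT_PRICE = 0.60 / 1_000_000;  // $ per output token
const COST_PER_REQUEST = 300 * INPUT_PRICE + 400 * OUTPUT_PRICE; // ≈ $0.000285

function monthlyCost(users, requestsPerUser = 40) {
  return users * requestsPerUser * COST_PER_REQUEST;
}

console.log(monthlyCost(100).toFixed(2));    // "1.14"
console.log(monthlyCost(50_000).toFixed(2)); // "570.00"
```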

    These numbers look manageable. The problem is that (a) these are the best-case estimates using the cheapest capable model, and (b) the cost scales linearly while user growth is the goal.

    If you are building something with higher-value features (GPT-4o instead of gpt-4o-mini, longer prompts, more frequent calls), multiply these numbers by 10-20x.

    For a more realistic production app using GPT-4o at 700 tokens per request:

    • $2.50/1M input, $10.00/1M output
    • Cost per request: ~$0.00075 + $0.004 = ~$0.00475
    • At 10,000 users × 40 requests/month (400,000 requests): ~$1,900/month

    That is the cost cliff.

    Why Bolt.new Makes This Worse

    Bolt.new's speed makes it dangerously easy to add AI features everywhere. You prompt: "Add an AI summary to each dashboard view." Bolt adds it. "Add AI-powered suggestions to the sidebar." Bolt adds it. "Make the search bar use AI to understand intent." Bolt adds it.

    Each addition is another API call per user session. By the time your app is polished, you might have 4-6 AI touchpoints per user per session. Each one is another linear scaling cost.

    The ease of addition becomes a liability at scale. You have built an app where AI is deeply integrated — which is great for UX, terrible for margins.

    The Fix: Fine-Tune Once, Run Locally

    The solution is to replace the OpenAI API call with a fine-tuned local model. The quality is equivalent for your specific use case; the cost structure is fundamentally different.

    Here is the process:

    Step 1: Collect training data from your existing API logs.

    If your app has been running for 2-4 weeks with real users, you have the data you need. Export your API call logs and extract the input/output pairs. Filter for cases where users engaged with the AI output (did not immediately retry, continued using the app). Format as JSONL:

    {"instruction": "Improve the following paragraph for clarity:", "input": "user paragraph here", "output": "improved paragraph here"}
    

    Aim for 400-800 examples. Quality matters more than quantity.
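    A minimal sketch of that conversion, assuming your export is an array of objects with `prompt`, `completion`, and a `userRetried` flag — those field names are hypothetical, so match them to your own log shape:

```javascript
// Convert exported API logs into JSONL training rows.
// Field names (prompt, completion, userRetried) are assumptions — adapt to your logs.
function logsToJsonl(logs) {
  return logs
    .filter((log) => !log.userRetried) // keep only outputs users accepted
    .map((log) =>
      JSON.stringify({
        instruction: "Improve the following paragraph for clarity:",
        input: log.prompt,
        output: log.completion,
      })
    )
    .join("\n");
}

// Usage: fs.writeFileSync("train.jsonl", logsToJsonl(exportedLogs));
```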

    Step 2: Fine-tune in Ertas (45-90 minutes).

    Upload the JSONL to Ertas, select Qwen 2.5 7B as the base model, configure training settings. The visual interface handles the rest. Training takes 45-90 minutes. Download the GGUF file.

    Step 3: Deploy Ollama on a VPS.

    Spin up a Hetzner CX32 or CX42 ($14-26/month). Install Ollama, create a Modelfile for your GGUF, start serving.
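    A minimal Modelfile for the downloaded GGUF might look like this (the filename and system prompt are placeholders — use your own):

```
FROM ./my-fine-tuned-model.gguf
SYSTEM "You are a writing assistant..."
```

    Register it with `ollama create my-fine-tuned-model -f Modelfile`; Ollama then serves it on port 11434.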

    Step 4: Update your Bolt.new app code.

    This is the part that surprises most developers: it is often a one-line change. Ollama serves an OpenAI-compatible API. Update the baseURL in your OpenAI client:

    // Before (OpenAI):
    const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
    
    // After (Ollama — one line changes):
    const openai = new OpenAI({
      apiKey: "not-needed", // Ollama doesn't require auth by default
      baseURL: "http://your-vps-ip:11434/v1",
    });
    
    // The rest of your code stays exactly the same
    const response = await openai.chat.completions.create({
      model: "my-fine-tuned-model", // your model name in Ollama
      messages: [...],
    });
    

    Your existing Bolt.new generated code works unchanged. Only the client configuration updates.
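    One pattern worth considering during the cutover (an addition of ours, not something Bolt generates): drive the client config from an environment variable, so you can flip back to OpenAI instantly if anything looks off.

```javascript
// Choose a client config based on environment. If OLLAMA_BASE_URL is set,
// route to your VPS; otherwise fall back to OpenAI unchanged.
function clientConfig(env) {
  if (env.OLLAMA_BASE_URL) {
    return { apiKey: "not-needed", baseURL: env.OLLAMA_BASE_URL };
  }
  return { apiKey: env.OPENAI_API_KEY };
}

// const openai = new OpenAI(clientConfig(process.env));
```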

    Cost After Migration

    Scenario        OpenAI API (Monthly)    Ertas + VPS (Monthly)
    1,000 users     $11-190                 $40.50
    5,000 users     $57-950                 $40.50
    10,000 users    $114-1,900              $40.50
    50,000 users    $570-9,500              $66.50 (larger VPS)

    The fine-tuned local model costs: $14.50/month (Ertas Builder, Early Bird) + $26/month (VPS). Total: $40.50/month regardless of request volume.

    Break-even: at the baseline 40 requests per user per month, gpt-4o-mini spend crosses the $40.50 flat cost at roughly 3,500 users. For GPT-4o at the same usage, break-even is closer to 200 users — and heavier per-user usage pulls both numbers down further.
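    The break-even point falls straight out of the per-request numbers above:

```javascript
// Users needed before the flat local-hosting cost ($40.50/mo) beats
// per-request API spend, given the per-request costs derived earlier.
const FLAT_MONTHLY = 40.5;

function breakEvenUsers(costPerRequest, requestsPerUserPerMonth = 40) {
  return Math.ceil(FLAT_MONTHLY / (costPerRequest * requestsPerUserPerMonth));
}

console.log(breakEvenUsers(0.000285)); // gpt-4o-mini: 3553
console.log(breakEvenUsers(0.00475)); // GPT-4o: 214
```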

    Will Quality Suffer?

    For narrow, domain-specific tasks — which is what Bolt.new AI features almost always are — no. A 7B model fine-tuned on 400-800 examples of your specific task will perform at 90-95% of GPT-4 accuracy on that task.

    The caveat: if your app requires broad reasoning, creative writing at a high level, or tasks that genuinely need frontier model intelligence, the trade-off is different. Most Bolt.new AI features are extraction, classification, summarization, or style matching — all tasks where fine-tuned small models excel.

    You can verify before committing: use Ertas's evaluation tools to benchmark your fine-tuned model against a held-out test set, with GPT-4 outputs as the reference. If quality is within acceptable range, ship the migration.


    Ship AI that runs on your users' devices.

    Ertas early bird pricing starts at $14.50/mo — locked in for life. Plans for builders and agencies.
