
v0 App AI Features at Flat Cost — No Per-Token Pricing
v0 by Vercel makes AI features easy with the Vercel AI SDK. Here's how to replace per-token cloud API costs with a fine-tuned local model at flat monthly cost.
v0 by Vercel generates production-quality React components from natural language. When you add AI features to those components, the natural path is the Vercel AI SDK — which, by default, routes to OpenAI or Anthropic. Per-token. Every request.
For prototypes and early-stage apps, this is fine. For apps with real users, the cost curve becomes a problem quickly, because the Vercel AI SDK makes streaming AI responses easy to add everywhere.
How v0 Apps Typically Use AI
The Vercel AI SDK is the standard for AI features in v0-generated apps. The pattern is clean:
```ts
// app/api/chat/route.ts — typical v0 AI feature
import { openai } from "@ai-sdk/openai";
import { streamText } from "ai";

export async function POST(req: Request) {
  const { messages } = await req.json();

  const result = streamText({
    model: openai("gpt-4o-mini"),
    messages,
    system: "You are a helpful assistant for [your domain].",
  });

  return result.toDataStreamResponse();
}
```
This is excellent code. It streams responses, handles errors, and works with React's streaming patterns. The v0-generated frontend component consumes this stream directly.
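For reference, the consuming side of that route is typically a client component built on the SDK's useChat hook. A minimal sketch follows; your v0-generated component will have richer markup, but the data flow is the same:

```tsx
"use client";
// Minimal sketch of the consuming component. A v0-generated UI
// is richer, but it talks to the same route the same way.
import { useChat } from "@ai-sdk/react";

export function Chat() {
  const { messages, input, handleInputChange, handleSubmit } = useChat({
    api: "/api/chat", // the route shown above
  });

  return (
    <form onSubmit={handleSubmit}>
      {messages.map((m) => (
        <p key={m.id}>
          {m.role}: {m.content}
        </p>
      ))}
      <input value={input} onChange={handleInputChange} />
    </form>
  );
}
```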
The issue is that every streamText or generateText call is a cloud API request. The SDK abstracts away the provider, which is convenient during development and easy to forget about until the first real bill arrives.
Vercel AI SDK Costs at Scale
Streaming responses also tend to cost slightly more than single-shot ones: the serverless function stays open for the full generation (billable execution time on Vercel), and streaming chat UIs encourage longer outputs, so you pay for more output tokens per exchange.
Assume a typical v0 AI feature: chat-style interaction, 400 tokens input + 600 tokens output per exchange, using gpt-4o-mini:
| Monthly Active Users | Sessions/User | Exchanges/Session | Monthly API Cost |
|---|---|---|---|
| 200 | 4 | 5 | $9.60 |
| 1,000 | 4 | 5 | $48 |
| 5,000 | 4 | 5 | $240 |
| 20,000 | 4 | 5 | $960 |
| 100,000 | 4 | 5 | $4,800 |
These are gpt-4o-mini estimates. Upgrade to gpt-4o and multiply by roughly 14x.
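The table's figures are this article's estimates; to model your own app, the arithmetic has a simple shape. The sketch below uses gpt-4o-mini's published per-million-token rates at the time of writing and assumes conversation history is resent on every exchange (chat completion APIs are stateless), which is part of why real bills usually land above naive per-exchange math. Add your system prompt and any retrieved context to the input side.

```ts
// A sketch, not the article's exact model: estimate monthly spend
// from your own traffic numbers. Swap in your model's rates.
const INPUT_PRICE_PER_TOKEN = 0.15 / 1_000_000;
const OUTPUT_PRICE_PER_TOKEN = 0.60 / 1_000_000;

function estimateMonthlyCost(opts: {
  users: number;
  sessionsPerUser: number;
  exchangesPerSession: number;
  inputTokensPerExchange: number;  // new user/system tokens each turn
  outputTokensPerExchange: number; // generated tokens each turn
}): number {
  let sessionCost = 0;
  for (let k = 0; k < opts.exchangesPerSession; k++) {
    // turn k resends k previous turns as conversation history
    const history =
      k * (opts.inputTokensPerExchange + opts.outputTokensPerExchange);
    sessionCost +=
      (opts.inputTokensPerExchange + history) * INPUT_PRICE_PER_TOKEN +
      opts.outputTokensPerExchange * OUTPUT_PRICE_PER_TOKEN;
  }
  return opts.users * opts.sessionsPerUser * sessionCost;
}

// Example: estimateMonthlyCost({ users: 5_000, sessionsPerUser: 4,
//   exchangesPerSession: 5, inputTokensPerExchange: 400,
//   outputTokensPerExchange: 600 })
```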
The v0 Deployment Advantage
Here is something specific to v0/Vercel apps: they deploy to Vercel's edge network, with serverless functions handling API routes. This architecture actually helps with the local model migration.
Your AI API route can call any HTTP endpoint. Instead of calling api.openai.com, it calls your Ollama VPS. The serverless function does not care where the request goes — it makes an HTTP call and returns the response to the client.
This means the migration is entirely contained in the API route file. Your React components do not change. Your streaming behavior does not change. Only the model provider changes.
Replacing the AI SDK Backend with a Fine-Tuned Local Model
The Vercel AI SDK has native support for OpenAI-compatible APIs via the createOpenAI function:
```ts
// Before — using OpenAI directly:
import { openai } from "@ai-sdk/openai";

const model = openai("gpt-4o-mini");
```

```ts
// After — using your fine-tuned Ollama model:
import { createOpenAI } from "@ai-sdk/openai";
import { streamText } from "ai";

const ollama = createOpenAI({
  baseURL: process.env.OLLAMA_BASE_URL, // e.g. http://your-vps:11434/v1
  apiKey: "not-required", // Ollama ignores the key, but the SDK expects one
});

const model = ollama("your-fine-tuned-model-name");

// The rest of your route code stays exactly the same:
const result = streamText({
  model, // just this variable changes
  messages,
  system: "...",
});
```
Set OLLAMA_BASE_URL as a Vercel environment variable. Your streaming implementation works unchanged — Ollama supports Server-Sent Events streaming in the same format as OpenAI.
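During cutover it can be worth hedging on the provider itself. A sketch (not something v0 generates): route to Ollama only when the environment variable is present, so the app keeps working in environments where it isn't set yet:

```ts
import { createOpenAI, openai } from "@ai-sdk/openai";

// Fall back to the cloud provider until OLLAMA_BASE_URL is configured.
const model = process.env.OLLAMA_BASE_URL
  ? createOpenAI({
      baseURL: process.env.OLLAMA_BASE_URL,
      apiKey: "not-required", // Ollama ignores the key
    })("your-fine-tuned-model-name")
  : openai("gpt-4o-mini");
```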
Streaming: Does It Work Locally?
Yes. Ollama supports streaming in the OpenAI SSE format. The Vercel AI SDK consumes it correctly. Your frontend streaming component sees no difference — same event format, same data structure.
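If you want to verify that compatibility before wiring anything into the SDK, a quick smoke test against the endpoint looks something like this (a sketch assuming Node 18+, OLLAMA_BASE_URL pointing at Ollama's /v1 path, and a placeholder model name):

```ts
// Smoke test: stream a chat completion straight from Ollama's
// OpenAI-compatible endpoint and print the raw SSE frames.
const res = await fetch(`${process.env.OLLAMA_BASE_URL}/chat/completions`, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "your-fine-tuned-model-name", // placeholder
    messages: [{ role: "user", content: "Say hello in five words." }],
    stream: true,
  }),
});

const decoder = new TextDecoder();
for await (const chunk of res.body!) {
  // Each frame is "data: {...json delta...}", ending with "data: [DONE]",
  // the same wire format the OpenAI API streams.
  process.stdout.write(decoder.decode(chunk, { stream: true }));
}
```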
Performance consideration: local Ollama on a CPU VPS streams at 15-25 tokens/second, versus 50-100+ tokens/second from the cloud APIs. For most chat-style use cases, 15-25 tokens/second reads like fast typing and users won't register it as slow. For long-form generation (>500 token outputs), the difference becomes noticeable.
If latency matters: a GPU VPS ($60-120/month) streams at 40-80 tokens/second. Still flat cost, significantly faster.
Cost Comparison
| Solution | Monthly Cost at 5K Users | Monthly Cost at 20K Users |
|---|---|---|
| Vercel AI SDK + gpt-4o-mini | $240 | $960 |
| Vercel AI SDK + gpt-4o | $3,360 | $13,440 |
| Ertas fine-tuned + Ollama CPU VPS | $40.50 | $40.50 |
| Ertas fine-tuned + Ollama GPU VPS | $140-190 | $140-190 |
The fine-tuned local model approach is cheaper at 5K users against gpt-4o-mini, and dramatically cheaper against gpt-4o. The GPU VPS option provides significantly better performance while still being far cheaper than cloud API at scale.
Getting Your Training Data from a v0 App
If your v0 app has been running for 2-4 weeks with real users, every exchange passing through your API routes is potential training data. Add a few lines of logging to capture it, then extract the logs to build your dataset:
```ts
// Add to your API route to log training data. Using streamText's
// onFinish callback means the response still streams to the client,
// with logging happening server-side after generation completes.
const result = streamText({
  model,
  messages,
  onFinish: async ({ text }) => {
    // db is your own data layer (e.g. a Prisma-style client)
    await db.trainingLogs.create({
      input: messages[messages.length - 1].content,
      output: text,
      timestamp: new Date(),
      accepted: true, // assume accepted until user signals otherwise
    });
  },
});

return result.toDataStreamResponse();
```
After 2-4 weeks, you will have 500-2,000 logged interactions. Filter for quality (minimum session time, no immediate retry), export as JSONL, and upload to Ertas.
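A sketch of that export step, reusing the db layer from the logging snippet. The exact JSONL schema your fine-tuning target expects is an assumption here; the messages-array format shown is the common one, and the length check stands in for whatever quality filter you settle on:

```ts
// scripts/export-training-data.ts — export accepted logs as JSONL.
import { writeFileSync } from "node:fs";

const logs = await db.trainingLogs.findMany({ where: { accepted: true } });

const jsonl = logs
  // basic quality filter: drop trivially short or empty exchanges
  .filter((log) => log.input.trim().length > 10 && log.output.trim().length > 20)
  .map((log) =>
    JSON.stringify({
      messages: [
        { role: "user", content: log.input },
        { role: "assistant", content: log.output },
      ],
    })
  )
  .join("\n");

writeFileSync("training-data.jsonl", jsonl);
```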
Further Reading
- Vibecoder AI Cost Guide: All Platforms — Every major platform's AI cost cliff
- Flat-Cost AI Architecture for Indie Apps — Designing for sub-linear AI costs
- Vibe-Coded App AI Costs Scaling — What the cost cliff looks like at 10K users
- Running AI Models Locally — Ollama setup and configuration
- 7B Model Beats API Call — When fine-tuned small models are good enough
Ship AI that runs on your users' devices.
Early bird pricing starts at $14.50/mo — locked in for life. Plans for builders and agencies.
Keep reading

Bubble No-Code App + Local AI: Ship AI Features Without API Bills
Bubble's OpenAI plugin and API connector generate per-token costs at scale. Here's how to replace them with a fine-tuned local model using Ollama's OpenAI-compatible API.

MCP Servers + Local Models: Zero API Costs for Domain-Specific AI Tools
The combination of MCP servers and fine-tuned local models eliminates per-token costs for AI tools built on Claude, Cursor, and other MCP-compatible clients. Here's the cost math and the architecture.

Bolt.new Apps and the OpenAI Cost Cliff: What Happens at Scale
Bolt.new makes it easy to add AI features. Here's exactly what happens to your OpenAI bill as users grow — and how to replace it with a fine-tuned local model at flat cost.