    v0 App AI Features at Flat Cost — No Per-Token Pricing

    v0 by Vercel makes AI features easy with the Vercel AI SDK. Here's how to replace per-token cloud API costs with a fine-tuned local model at flat monthly cost.

    Ertas Team

    v0 by Vercel generates production-quality React components from natural language. When you add AI features to those components, the natural path is the Vercel AI SDK — which, by default, routes to OpenAI or Anthropic. Per-token. Every request.

    For prototypes and early-stage apps, this is fine. For apps with real users, the cost curve becomes a problem quickly, because the Vercel AI SDK makes streaming AI responses easy to add everywhere.

    How v0 Apps Typically Use AI

    The Vercel AI SDK is the standard for AI features in v0-generated apps. The pattern is clean:

    // app/api/chat/route.ts — typical v0 AI feature
    import { openai } from "@ai-sdk/openai";
    import { streamText } from "ai";
    
    export async function POST(req: Request) {
      const { messages } = await req.json();
    
      const result = streamText({
        model: openai("gpt-4o-mini"),
        messages,
        system: "You are a helpful assistant for [your domain].",
      });
    
      return result.toDataStreamResponse();
    }
    

    This is excellent code. It streams responses and works with React's streaming patterns, and the v0-generated frontend component consumes it without modification.

    The issue is that every streamText or generateText call is a cloud API request. The SDK abstracts away the cost, which is convenient for development and inconvenient when your billing dashboard arrives.

    Vercel AI SDK Costs at Scale

    Streaming responses are slightly more expensive than single-call responses because they maintain a connection for the full generation duration and often generate longer outputs (users stop reading when streaming stops, so the model tends to generate more tokens).

    Assume a typical v0 AI feature: chat-style interaction, 400 tokens input + 600 tokens output per exchange, using gpt-4o-mini:

    | Monthly Active Users | Sessions/User | Exchanges/Session | Monthly API Cost |
    |---|---|---|---|
    | 200 | 4 | 5 | $9.60 |
    | 1,000 | 4 | 5 | $48 |
    | 5,000 | 4 | 5 | $240 |
    | 20,000 | 4 | 5 | $960 |
    | 100,000 | 4 | 5 | $4,800 |

    These are gpt-4o-mini estimates. Upgrade to gpt-4o and multiply by ~14x.
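To make the table's arithmetic explicit, here is a small sketch. The ~$0.0024 per-exchange figure is back-derived from the rows above (e.g. $9.60 across 200 × 4 × 5 exchanges), not an official price quote, and the function name is illustrative:

```typescript
// Illustrative helper: reproduce the table's monthly cost estimates.
// costPerExchange (~$0.0024 for 400 input + 600 output tokens on
// gpt-4o-mini) is back-derived from the table, not a quoted price.
function monthlyApiCost(
  monthlyActiveUsers: number,
  sessionsPerUser: number = 4,
  exchangesPerSession: number = 5,
  costPerExchange: number = 0.0024,
): number {
  return monthlyActiveUsers * sessionsPerUser * exchangesPerSession * costPerExchange;
}

monthlyApiCost(200);   // ≈ $9.60
monthlyApiCost(5_000); // ≈ $240
```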

    The v0 Deployment Advantage

    Here is something specific to v0/Vercel apps: they deploy to Vercel's edge network, with serverless functions handling API routes. This architecture actually helps with the local model migration.

    Your AI API route can call any HTTP endpoint. Instead of calling api.openai.com, it calls your Ollama VPS. The serverless function does not care where the request goes — it makes an HTTP call and returns the response to the client.

    This means the migration is entirely contained in the API route file. Your React components do not change. Your streaming behavior does not change. Only the model provider changes.

    Replacing the AI SDK Backend with a Fine-Tuned Local Model

    The Vercel AI SDK has native support for OpenAI-compatible APIs via the createOpenAI function:

    // Before — using OpenAI directly:
    import { openai } from "@ai-sdk/openai";
    const model = openai("gpt-4o-mini");
    
    // After — using your fine-tuned Ollama model:
    import { createOpenAI } from "@ai-sdk/openai";
    
    const ollama = createOpenAI({
      baseURL: process.env.OLLAMA_BASE_URL, // http://your-vps:11434/v1
      apiKey: "not-required",
    });
    
    const model = ollama("your-fine-tuned-model-name");
    
    // The rest of your route code stays exactly the same:
    const result = streamText({
      model, // just this variable changes
      messages,
      system: "...",
    });
    

    Set OLLAMA_BASE_URL as a Vercel environment variable. Your streaming implementation works unchanged — Ollama supports Server-Sent Events streaming in the same format as OpenAI.
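Assuming you use the Vercel CLI (the dashboard works too), setting that variable looks like this:

```shell
# Add the Ollama endpoint as a Vercel environment variable.
# Run from the linked project directory; paste the URL when prompted.
vercel env add OLLAMA_BASE_URL production
# value: http://your-vps:11434/v1
```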

    Streaming: Does It Work Locally?

    Yes. Ollama supports streaming in the OpenAI SSE format. The Vercel AI SDK consumes it correctly. Your frontend streaming component sees no difference — same event format, same data structure.

    Performance consideration: local Ollama on a CPU VPS streams at 15-25 tokens/second. Cloud API streams at 50-100+ tokens/second. For most use cases, 15-25 tokens/second is imperceptible to users (it feels like fast typing). For long-form generation (>500 token outputs), the difference becomes noticeable.

    If latency matters: a GPU VPS ($60-120/month) streams at 40-80 tokens/second. Still flat cost, significantly faster.
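A quick back-of-envelope, using the 600-token output from the earlier sizing example and the throughput ranges above (the helper is illustrative):

```typescript
// Illustrative: how long a full response takes to stream at a given rate.
function streamSeconds(outputTokens: number, tokensPerSecond: number): number {
  return outputTokens / tokensPerSecond;
}

streamSeconds(600, 20); // 30s on a CPU VPS (mid-range of 15-25 tok/s)
streamSeconds(600, 60); // 10s on a GPU VPS (mid-range of 40-80 tok/s)
```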

    Cost Comparison

    | Solution | Monthly Cost at 5K Users | Monthly Cost at 20K Users |
    |---|---|---|
    | Vercel AI SDK + gpt-4o-mini | $240 | $960 |
    | Vercel AI SDK + gpt-4o | $3,360 | $13,440 |
    | Ertas fine-tuned + Ollama CPU VPS | $40.50 | $40.50 |
    | Ertas fine-tuned + Ollama GPU VPS | $140-190 | $140-190 |

    The fine-tuned local model approach is cheaper at 5K users against gpt-4o-mini, and dramatically cheaper against gpt-4o. The GPU VPS option provides significantly better performance while still being far cheaper than cloud API at scale.
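One way to frame the comparison is a break-even point. Using the earlier table's assumptions (4 sessions/user, 5 exchanges/session, ~$0.0024 per exchange, a figure back-derived from that table rather than quoted pricing):

```typescript
// Illustrative: monthly active users at which a flat-cost VPS matches
// per-token billing, given a per-user monthly API cost.
function breakEvenUsers(
  flatMonthlyCost: number,
  costPerUserPerMonth: number,
): number {
  return flatMonthlyCost / costPerUserPerMonth;
}

// 4 sessions x 5 exchanges x ~$0.0024 = ~$0.048 per user per month
breakEvenUsers(40.5, 4 * 5 * 0.0024); // ≈ 844 users
```

Past roughly 850 monthly active users on these assumptions, the flat-cost CPU VPS is already the cheaper option against gpt-4o-mini.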

    Getting Your Training Data from a v0 App

    If your v0 app has been running for 2-4 weeks with real users, you can log requests and responses in your API routes and extract them to build your training dataset:

    // Add to your API route to log training data.
    // streamText's onFinish callback fires after generation completes,
    // so logging never blocks the streamed response:
    const result = streamText({
      model,
      messages,
      onFinish: async ({ text }) => {
        // Log the full interaction for training data
        await db.trainingLogs.create({
          input: messages[messages.length - 1].content,
          output: text,
          timestamp: new Date(),
          accepted: true, // assume accepted until user signals otherwise
        });
      },
    });
    

    After 2-4 weeks, you will have 500-2,000 logged interactions. Filter for quality (minimum session time, no immediate retry), export as JSONL, and upload to Ertas.
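A minimal export sketch, assuming the log shape from the snippet above; the chat-style messages layout is a common fine-tuning format, not necessarily the exact schema Ertas requires:

```typescript
// Illustrative: convert accepted logged interactions into JSONL rows.
interface TrainingLog {
  input: string;
  output: string;
  accepted: boolean;
}

function toJsonl(logs: TrainingLog[]): string {
  return logs
    .filter((log) => log.accepted) // drop interactions the user rejected
    .map((log) =>
      JSON.stringify({
        messages: [
          { role: "user", content: log.input },
          { role: "assistant", content: log.output },
        ],
      }),
    )
    .join("\n");
}
```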


    Ship AI that runs on your users' devices.

    Ertas early bird pricing starts at $14.50/mo — locked in for life. Plans for builders and agencies.
