Bolt.new + Ertas

    Deploy fine-tuned models as the AI backend for applications built with Bolt.new, replacing expensive cloud API calls with local inference that scales without per-token costs.

    Overview

    Bolt.new has transformed how developers bootstrap full-stack applications by letting them describe an app in natural language and receive a complete, deployable project in minutes. From SaaS dashboards to internal tools, Bolt.new generates frontend components, backend API routes, database schemas, and deployment configurations — dramatically compressing the time from concept to working prototype. It has become the tool of choice for indie developers, hackathon teams, and agencies that need to ship MVPs fast.

    The AI features baked into Bolt-generated applications almost always depend on cloud API providers like OpenAI or Anthropic. Every chat interface, content generator, or intelligent feature in a Bolt-built app sends requests to these external services, accumulating per-token charges that grow linearly with user adoption. For indie developers launching a product, this creates an uncomfortable dynamic: the more successful the app becomes, the more unsustainable the AI costs. A chatbot handling 10,000 conversations per month can easily consume the entire revenue of an early-stage product in API fees alone.
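To make the scaling dynamic concrete, here is an illustrative back-of-the-envelope comparison. Every number below is an assumption for the sketch, not a quoted rate:

```typescript
// Illustrative cost comparison: per-token cloud pricing vs. a fixed
// monthly inference server. All numbers here are assumptions.

interface CostInputs {
  conversationsPerMonth: number;
  tokensPerConversation: number; // prompt + completion combined
  cloudPricePerMillionTokens: number; // USD, hypothetical blended rate
  fixedServerCostPerMonth: number; // USD, hypothetical self-hosted server
}

function monthlyCloudCost(c: CostInputs): number {
  const totalTokens = c.conversationsPerMonth * c.tokensPerConversation;
  return (totalTokens / 1_000_000) * c.cloudPricePerMillionTokens;
}

const example: CostInputs = {
  conversationsPerMonth: 10_000,
  tokensPerConversation: 2_000,
  cloudPricePerMillionTokens: 10,
  fixedServerCostPerMonth: 50,
};

// Cloud: 10,000 * 2,000 = 20M tokens -> $200/mo, growing with adoption.
// Self-hosted: flat $50/mo regardless of conversation volume.
console.log(monthlyCloudCost(example)); // 200
```

The point is not the specific prices but the shape of the curves: the cloud line grows linearly with usage while the self-hosted line stays flat.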

    How Ertas Integrates

    Ertas solves the cost scaling problem by letting you fine-tune a model specifically for your application's use case and deploy it on your own infrastructure. Instead of paying for a general-purpose model to handle your app's narrow task — whether that is answering questions about your product, generating specific content formats, or classifying user inputs — you train a smaller, specialized model that performs the same task at a fraction of the computational cost. Ertas Studio handles the entire training pipeline, from dataset preparation to experiment tracking to GGUF export.

    The switch in a Bolt-generated codebase is minimal. Bolt typically scaffolds AI features using the OpenAI SDK or direct fetch calls to the chat completions endpoint. Since Ollama exposes an OpenAI-compatible version of this API, you only need to change the base URL in your environment configuration — from https://api.openai.com/v1 to http://localhost:11434/v1 — and update the model name. No changes to request formatting, response parsing, or streaming logic are required. Your Bolt-built app continues to work exactly as before, but inference now runs on hardware you control at a fixed cost.
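The endpoint swap can be sketched as a small config resolver. The env var names and the model name below are hypothetical examples, not Bolt conventions — use whatever names the generated .env actually defines:

```typescript
// Resolve the AI backend from environment, defaulting to local Ollama.
// OPENAI_BASE_URL / OPENAI_API_KEY / AI_MODEL are hypothetical names.
function resolveBackend(env: Record<string, string | undefined>) {
  return {
    baseURL: env.OPENAI_BASE_URL ?? "http://localhost:11434/v1",
    // Ollama does not validate the key, but the OpenAI SDK requires one.
    apiKey: env.OPENAI_API_KEY ?? "ollama",
    model: env.AI_MODEL ?? "my-finetuned-model", // hypothetical model name
  };
}

// With no overrides, the app talks to local Ollama. In the Bolt app this
// config feeds the existing OpenAI SDK client unchanged, e.g.:
//   const client = new OpenAI({ baseURL, apiKey });
const backend = resolveBackend({});
console.log(backend.baseURL); // http://localhost:11434/v1
```

Because the request and response shapes are identical, the rest of the generated code — streaming handlers, message formatting, error handling — never sees the difference.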

    Getting Started

    1.

      Identify the AI task in your Bolt-built app

      Determine what your application's AI feature actually does — chat responses, content generation, classification, summarization — and collect representative input-output examples that define the expected behavior.
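Collecting examples can be as simple as serializing input–output pairs into chat-format JSONL. The exact dataset schema Ertas Studio accepts is an assumption here; chat-style JSONL is shown because it is a common fine-tuning format:

```typescript
// Serialize representative input/output pairs as chat-format JSONL.
// The schema is an assumed common format, not a confirmed Ertas schema.

interface Example {
  input: string;
  output: string;
}

function toJsonl(examples: Example[], systemPrompt: string): string {
  return examples
    .map((ex) =>
      JSON.stringify({
        messages: [
          { role: "system", content: systemPrompt },
          { role: "user", content: ex.input },
          { role: "assistant", content: ex.output },
        ],
      })
    )
    .join("\n");
}

const dataset = toJsonl(
  [
    {
      input: "How do I reset my password?",
      output: "Go to Settings, then Security, then Reset password.",
    },
  ],
  "You answer questions about the Acme app." // hypothetical system prompt
);
console.log(dataset);
```

A few hundred high-quality pairs that cover the real distribution of user inputs usually matter more than sheer dataset size for a narrow task.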

    2.

      Fine-tune a task-specific model in Ertas Studio

      Upload your examples as a training dataset, select an appropriately sized base model for your task complexity, and run a fine-tuning job. A focused 7B or 8B parameter model often matches cloud API quality for narrow tasks.

    3.

      Deploy the model with Ollama

      Export the trained model as GGUF, deploy it with Ollama, and verify inference works correctly. Test with the same prompts your Bolt app sends to confirm response format compatibility.
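A quick smoke test can confirm the deployed model answers in the OpenAI-compatible shape the Bolt app expects. This is a sketch assuming Ollama's default port; the model name is a hypothetical example:

```typescript
// Smoke-test sketch: send the same kind of prompt the Bolt app sends to
// the local Ollama endpoint and verify the response shape.

const OLLAMA_URL = "http://localhost:11434/v1/chat/completions";

function buildRequest(model: string, prompt: string) {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model,
      messages: [{ role: "user", content: prompt }],
    }),
  };
}

async function smokeTest(): Promise<void> {
  const res = await fetch(OLLAMA_URL, buildRequest("my-finetuned-model", "ping"));
  const data = await res.json();
  // The OpenAI-compatible shape the Bolt app's parsing code relies on:
  if (typeof data.choices?.[0]?.message?.content !== "string") {
    throw new Error("unexpected response shape");
  }
  console.log("ok:", data.choices[0].message.content);
}
```

Running this with the real prompts from your app catches format mismatches before any frontend code is touched.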

    4.

      Update the Bolt app's endpoint configuration

      Change the API base URL in your Bolt-generated code from the cloud provider to your Ollama address. Update the model name in your environment variables. No other code changes are needed since Ollama uses the same API format.

    5.

      Deploy and monitor in production

      Deploy your updated Bolt app with the local inference backend. Monitor response latency and quality through Ertas Cloud's dashboard, and scale your inference server independently of your application server as traffic grows.
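Latency monitoring can start as a thin wrapper around the inference call. This is a minimal sketch with an in-memory metric sink standing in for whatever dashboard or metrics backend you actually use:

```typescript
// Minimal latency-monitoring sketch: time each inference call and keep
// the samples so they can be forwarded to a dashboard. The in-memory
// array is a placeholder for a real metrics sink.

const latenciesMs: number[] = [];

async function withLatency<T>(call: () => Promise<T>): Promise<T> {
  const start = Date.now();
  try {
    return await call();
  } finally {
    latenciesMs.push(Date.now() - start);
  }
}

// p95 latency is a common signal for deciding when to scale the
// inference server independently of the app server.
function p95(samples: number[]): number {
  const sorted = [...samples].sort((a, b) => a - b);
  return sorted[Math.min(sorted.length - 1, Math.floor(sorted.length * 0.95))];
}

// Usage (wrapping a hypothetical inference call):
//   const reply = await withLatency(() => client.chat.completions.create(req));
```

Because the wrapper sits outside the request/response logic, it works identically whether the backend is a cloud API or local Ollama.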

    Benefits

    • Decouple AI costs from user growth — inference costs stay fixed regardless of how many users adopt your app
    • Fine-tuned smaller models match cloud API quality for focused tasks at a fraction of the compute
    • Drop-in replacement requires only a base URL change in your Bolt-generated codebase
    • No vendor lock-in to cloud AI providers — switch models or providers without application changes
    • User data stays on your infrastructure instead of being sent to third-party API providers
    • Predictable monthly infrastructure cost makes unit economics viable for early-stage products
