
    From Cursor to Production: Deploying AI Features Without Vendor Lock-In

    You built your app with Cursor and plugged in OpenAI. Now you're locked in. Here's how to deploy AI features that you own — from prototype to production without vendor dependency.

    Ertas Team

    You built something real. Maybe you used Cursor, maybe Copilot, maybe you just hammered out code in VS Code with an AI assistant on the side. Regardless of how you got here, your app works and it has AI features that users love. There is just one problem — every AI call goes through OpenAI's API, and you do not control any of it.

    This is the vibe coder's trap. The prototyping experience is so smooth that you do not notice the dependency until it matters. Then your API key gets rate-limited during a traffic spike, or OpenAI deprecates the model you fine-tuned against, or your monthly bill doubles because they adjusted pricing. Suddenly the foundation of your product is a service you have zero control over.

    Let's fix that.

    How Vibe-Coded Apps Get Locked In

    The lock-in happens gradually and on multiple levels. Understanding the layers is the first step to escaping them.

    When you use Cursor or similar AI coding tools to build your app, the generated code naturally uses the OpenAI SDK. It is the default suggestion, the most documented path, and the one with the most Stack Overflow answers. Within a few sessions, your codebase has openai as a core dependency, your prompts are tuned for GPT-4's specific behaviour, and your error handling is built around OpenAI's response format.

    None of this is malicious. It is simply the path of least resistance. But it creates a dependency that becomes more expensive to unwind with every commit.

    The Three Types of Vendor Lock-In

    1. API Format Lock-In

    Your code is structured around a specific API contract — request format, response schema, error codes, streaming protocol. Switching providers means rewriting every integration point, updating error handling, and testing edge cases you never thought about.

    This is the most visible form of lock-in and, fortunately, the easiest to solve.

    2. Model Behaviour Lock-In

    This is the insidious one. Your prompts, your few-shot examples, your output parsing logic — all of it is tuned for how a specific model responds. GPT-4 has particular tendencies in how it formats output, how it handles ambiguity, and how it follows instructions. Switch to Claude or Gemini and your carefully crafted prompts produce different results.
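
    To make this concrete, here is a hypothetical parsing helper of the kind a vibe-coded app accumulates. It assumes the model always wraps its JSON in a fenced block, which is exactly the sort of habit that may not survive a model switch:

    // Hypothetical example of behaviour lock-in: this parser assumes the model
    // always wraps its JSON in a ```json fence. Another model (or a model update)
    // may return bare JSON and silently break it.
    function parseTags(modelOutput: string): string[] {
      const match = modelOutput.match(/```json\n([\s\S]*?)```/);
      if (!match) throw new Error("Unexpected output format");
      return JSON.parse(match[1]).tags;
    }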

    Every prompt you write without thinking about portability digs this trench deeper.

    3. Pricing Lock-In

    You have architected your product around a certain cost-per-query assumption. Your free tier, your pricing page, your unit economics all assume OpenAI's current pricing. When they change prices — and they will, in either direction — your business model is at their mercy.

    This is the lock-in that kills businesses. Not because the technology fails, but because the economics shift under you.

    How Owning Your Model Eliminates All Three

    When you run your own fine-tuned model, all three types of lock-in dissolve.

    API format: You choose the inference server. Ollama, vLLM, llama.cpp — all support the OpenAI-compatible API format. Your existing code works with minimal changes. You control the API contract and it never changes unless you change it.

    Model behaviour: A fine-tuned model is trained on your data, for your specific tasks. Its behaviour is stable and under your control. No surprise changes from a provider updating their model. No degradation from a new version that is "better on benchmarks" but worse for your use case.

    Pricing: Your costs are your infrastructure costs. A GPU server costs the same whether you run 1,000 or 100,000 inferences per day. Your unit economics are predictable and entirely within your control.

    The Migration Path from OpenAI Dependency to Self-Hosted

    Here is the practical path for moving an existing app from OpenAI to a self-hosted model.

    Step 1: Audit Your AI Integration Points

    Catalogue every place your code calls the OpenAI API. Note what each call does — classification, generation, extraction, embedding. Most apps have fewer distinct AI tasks than you think, often just three to five core operations.
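
    If it helps, the output of that audit can be as simple as a small inventory in code. The task names and file paths below are made up; the shape is what matters:

    // Hypothetical inventory of AI call sites -- swap in your own tasks and paths.
    const aiTasks = {
      summariseTicket: { endpoint: "chat.completions", callSite: "src/support/summary.ts" },
      classifyIntent:  { endpoint: "chat.completions", callSite: "src/routing/intent.ts" },
      searchEmbedding: { endpoint: "embeddings",       callSite: "src/search/index.ts" },
    };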

    Step 2: Collect Your Training Data

    Every successful AI interaction in your app is training data. Export your prompt-completion pairs, filter for quality, and format them for fine-tuning. If you have been logging API calls (and you should have been), you already have a dataset.
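
    As a sketch, assuming you log each call with its prompt, its completion, and some signal of user satisfaction (the field names below are invented), turning those logs into chat-format JSONL takes only a few lines:

    import { writeFileSync } from "node:fs";

    // Assumed log shape -- adapt the fields to however you log your API calls.
    interface LoggedCall {
      system: string;
      prompt: string;
      completion: string;
      thumbsUp?: boolean; // user feedback signal, if you collect one
    }

    // Keep only the interactions users were happy with and emit chat-format JSONL,
    // one training example per line.
    function toTrainingJsonl(calls: LoggedCall[]): string {
      return calls
        .filter((call) => call.thumbsUp !== false)
        .map((call) =>
          JSON.stringify({
            messages: [
              { role: "system", content: call.system },
              { role: "user", content: call.prompt },
              { role: "assistant", content: call.completion },
            ],
          }),
        )
        .join("\n");
    }

    // Illustrative placeholder data; replace with an export from your own logs.
    const loggedCalls: LoggedCall[] = [
      {
        system: "You are a support assistant.",
        prompt: "Customer asks how to reset their password.",
        completion: "Go to Settings > Security and choose Reset password.",
        thumbsUp: true,
      },
    ];

    writeFileSync("training-data.jsonl", toTrainingJsonl(loggedCalls));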

    Step 3: Fine-Tune a Base Model

    Take a capable open-source base model — Llama 3.1 8B or Qwen 2.5 7B are excellent starting points — and fine-tune it on your collected data. The model does not need to be a generalist. It needs to be excellent at the specific tasks your app requires.

    Step 4: Deploy with OpenAI SDK Compatibility

    This is the key insight that makes migration painless. Ollama and similar inference servers expose an OpenAI-compatible API endpoint. You change the base URL in your OpenAI SDK configuration and point it at your local server. Your existing code — the prompts, the response parsing, the error handling — works without modification.

    import OpenAI from "openai";

    // Before: locked to OpenAI
    const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
    
    // After: your own model, same code
    const openai = new OpenAI({
      baseURL: "http://localhost:11434/v1",
      apiKey: "not-needed",
    });
    

    That is the entire migration at the SDK level. The rest is ensuring your fine-tuned model handles your specific tasks as well as or better than the cloud model did.
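
    To illustrate, a call site looks identical after the switch. The only thing that changes is the model name, which below is a hypothetical tag for your fine-tuned model as loaded into Ollama:

    // Same request your app already makes -- only the model name is different.
    const completion = await openai.chat.completions.create({
      model: "my-finetuned-model", // hypothetical: whatever you named the model in Ollama
      messages: [{ role: "user", content: "Summarise this support ticket: ..." }],
    });

    console.log(completion.choices[0].message.content);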

    Step 5: Validate and Cut Over

    Run your test suite against the self-hosted model. Compare outputs. For most domain-specific tasks, a well-tuned 8B model matches or exceeds GPT-4 performance because it is specialised rather than general-purpose.
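
    One way to run that comparison is to replay your real prompts through both endpoints and diff the answers. The sketch below sets up the two clients side by side; the model names are placeholders:

    import OpenAI from "openai";

    // Two clients, one SDK: the cloud model you are leaving and the model you own.
    const cloud = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
    const local = new OpenAI({ baseURL: "http://localhost:11434/v1", apiKey: "not-needed" });

    // Replay a prompt through both models and return the answers for comparison.
    async function compareOutputs(prompt: string) {
      const messages = [{ role: "user" as const, content: prompt }];
      const [cloudRes, localRes] = await Promise.all([
        cloud.chat.completions.create({ model: "gpt-4o", messages }),
        local.chat.completions.create({ model: "my-finetuned-model", messages }),
      ]);
      return {
        prompt,
        cloud: cloudRes.choices[0].message.content,
        local: localRes.choices[0].message.content,
      };
    }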

    OpenAI SDK Compatibility with Ollama

    The OpenAI SDK compatibility layer deserves special emphasis because it is what makes this migration practical for indie developers. You do not need to rewrite your application. You do not need a new SDK. You change a URL and optionally an API key.

    Ollama's compatibility layer covers chat completions and embeddings, including streamed responses — which accounts for nearly all of the AI usage in a typical indie app. Response formats match the OpenAI spec, so your existing parsing code works unchanged.
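
    Streaming in particular carries over with no extra work. Assuming the openai client from the snippet above is pointed at your local server, a streamed response reads exactly the way it does against OpenAI:

    // Streamed completion against your own model, using the same SDK and the same loop.
    const stream = await openai.chat.completions.create({
      model: "my-finetuned-model", // hypothetical local model name
      messages: [{ role: "user", content: "Draft a reply to this review." }],
      stream: true,
    });

    for await (const chunk of stream) {
      process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
    }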

    This compatibility is not an accident. The open-source inference ecosystem deliberately adopted the OpenAI API format as a standard, specifically to make migration frictionless.

    Making It Real with Ertas

    Ertas streamlines the path from cloud dependency to model ownership. Use Ertas Studio to fine-tune a model on your app's specific tasks, export an optimised GGUF file, and deploy it with Ollama or any compatible inference server.

    The platform handles the ML engineering complexity — dataset preparation, training configuration, evaluation, and export — so you can focus on what you are good at: building the product.

    Ready to own your AI stack? Join the Ertas waitlist and deploy AI features you control.
