
Ollama's OpenAI-Compatible API: Drop Your Fine-Tuned Model Into Any OpenAI Integration
Ollama exposes an OpenAI-compatible REST API. Any code written for the OpenAI SDK — LangChain, LlamaIndex, your own app — works with your fine-tuned local model by changing one URL. Here's what to know.
Ollama exposes an OpenAI-compatible REST API at http://localhost:11434/v1. This means every library, framework, and application that integrates with OpenAI can point at your local fine-tuned model with a one-line change.
No new SDKs. No API wrapper code. Just change the baseURL and the model name.
What "OpenAI-Compatible" Actually Means
Ollama implements the OpenAI Chat Completions API format:
- `POST /v1/chat/completions` — the primary endpoint
- Request body format identical to OpenAI's (`model`, `messages`, `temperature`, `max_tokens`, `stream`, etc.)
- Response format identical (`choices`, `message.content`, `usage`, etc.)

Not every OpenAI feature is implemented. Supported:

- Chat completions (most important)
- Streaming via `stream: true`
- Embeddings via `POST /v1/embeddings` (select models)
- Model listing via `GET /v1/models` (this and embeddings are exercised in the sketch below)
Not supported:
- Fine-tuning via API (you do this in Ertas instead)
- Image generation
- Voice/audio APIs
- Assistants API (the stateful threads/runs interface)
- OpenAI-specific features like moderation or JSON mode with strict schema
For inference (the vast majority of use cases), the compatibility is effectively complete.
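As a quick smoke test of the supported surface, the stock OpenAI SDK can hit the models and embeddings endpoints directly. A minimal sketch (Node 18+, ESM); it assumes an embedding-capable model such as `nomic-embed-text` has already been pulled with `ollama pull`:

```javascript
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'http://localhost:11434/v1',
  apiKey: 'ollama', // required by the SDK, ignored by Ollama
});

// GET /v1/models — list every model available locally
const models = await client.models.list();
for (const m of models.data) console.log(m.id);

// POST /v1/embeddings — only works for embedding-capable models
const emb = await client.embeddings.create({
  model: 'nomic-embed-text', // assumption: pulled beforehand
  input: 'Local inference, OpenAI-shaped API.',
});
console.log(emb.data[0].embedding.length); // vector dimension
```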
The One-Line Migration
JavaScript / Node.js:
```javascript
// Before (OpenAI cloud)
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY
});

// After (Ollama local model) — one change
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'http://localhost:11434/v1',
  apiKey: 'ollama' // Required field but not validated by Ollama
});

// Your generation code is UNCHANGED
const response = await client.chat.completions.create({
  model: 'your-fine-tuned-model', // This changes to your Ollama model name
  messages: [
    { role: 'user', content: 'Your prompt here' }
  ]
});

console.log(response.choices[0].message.content);
```
Python:
```python
# Before
from openai import OpenAI

client = OpenAI(api_key="sk-...")

# After — one change
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama"  # Not validated
)

# Same generation code
response = client.chat.completions.create(
    model="your-fine-tuned-model",
    messages=[{"role": "user", "content": "Your prompt"}]
)
print(response.choices[0].message.content)
```
Curl:
```bash
# Before
curl https://api.openai.com/v1/chat/completions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello"}]}'

# After — change URL and model
curl http://localhost:11434/v1/chat/completions \
  -H "Authorization: Bearer ollama" \
  -H "Content-Type: application/json" \
  -d '{"model": "your-fine-tuned-model", "messages": [{"role": "user", "content": "Hello"}]}'
```
Frameworks and Libraries That Work Immediately
Because Ollama is OpenAI-compatible, all of these work with zero code changes beyond the baseURL:
| Framework / Library | Configuration |
|---|---|
| LangChain (JS/Python) | `ChatOpenAI(base_url="http://localhost:11434/v1")` in Python; `configuration: { baseURL: "http://localhost:11434/v1" }` in JS |
| LlamaIndex | `OpenAI(api_base="http://localhost:11434/v1")` |
| Vercel AI SDK | `createOpenAI({ baseURL: "http://localhost:11434/v1" })` |
| OpenAI Agents SDK | Set `OPENAI_BASE_URL` environment variable |
| Instructor | Pass OpenAI client with Ollama baseURL |
| DSPy | `lm = dspy.LM("ollama/your-model")` |
| Semantic Kernel | OpenAI connector with custom endpoint |
| Flowise | OpenAI node with base path override |
| n8n | OpenAI node with baseURL override |
Any tool that supports an "OpenAI with custom base URL" option works; tools with a hardcoded OpenAI URL do not.
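As one concrete case, here is a minimal LangChain JS sketch; it assumes the `@langchain/openai` package and passes the Ollama base URL through the client `configuration` option:

```javascript
import { ChatOpenAI } from '@langchain/openai';

// Point LangChain's OpenAI chat model at the local Ollama server.
const model = new ChatOpenAI({
  model: 'your-fine-tuned-model',
  apiKey: 'ollama', // required field, not validated by Ollama
  configuration: { baseURL: 'http://localhost:11434/v1' },
});

const reply = await model.invoke('Summarize this ticket in one sentence.');
console.log(reply.content);
```

The rest of your chain — prompts, parsers, retrievers — stays exactly as it was against the OpenAI cloud.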
Remote Ollama Server
When your Ollama runs on a VPS (not localhost), you need to expose it:
On the VPS:
```bash
# By default Ollama binds to 127.0.0.1 only;
# setting OLLAMA_HOST makes it listen on all interfaces
OLLAMA_HOST=0.0.0.0:11434 ollama serve
```
Security note: Never expose Ollama's port directly to the internet. Put Nginx in front with basic auth or API key validation:
```nginx
server {
    listen 443 ssl;
    server_name ollama.yourdomain.com;
    # ssl_certificate and ssl_certificate_key directives go here

    location /v1/ {
        # Simple API key check
        if ($http_authorization != "Bearer your-secret-key") {
            return 401 '{"error": "Unauthorized"}';
        }
        proxy_pass http://localhost:11434/v1/;
    }
}
```
Then in your code:
```javascript
const client = new OpenAI({
  baseURL: 'https://ollama.yourdomain.com/v1',
  apiKey: 'your-secret-key'
});
```
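To confirm the proxy behaves as intended, a quick check from any machine (a hypothetical sketch; substitute your own domain and key):

```javascript
// Requests without the key should be rejected by Nginx...
const unauthorized = await fetch('https://ollama.yourdomain.com/v1/models');
console.log(unauthorized.status); // 401

// ...and requests with the key should reach Ollama
const authorized = await fetch('https://ollama.yourdomain.com/v1/models', {
  headers: { Authorization: 'Bearer your-secret-key' },
});
console.log(authorized.status); // 200
```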
This gives you a fully secured, remotely accessible fine-tuned model API that is OpenAI-compatible. Deploy your Ollama server once, use it from any client.
Streaming Responses
Streaming works identically to OpenAI:
```javascript
const stream = await client.chat.completions.create({
  model: 'your-fine-tuned-model',
  messages: [{ role: 'user', content: 'Generate a long document...' }],
  stream: true
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? '');
}
```
For UI applications (Next.js with Vercel AI SDK, React with streaming):
```javascript
// Vercel AI SDK + Ollama
import { createOpenAI } from '@ai-sdk/openai';
import { streamText } from 'ai';

const ollama = createOpenAI({
  baseURL: 'http://localhost:11434/v1',
  apiKey: 'ollama',
});

// In your API route
const result = streamText({
  model: ollama('your-fine-tuned-model'),
  messages: [...],
});

return result.toDataStreamResponse();
```
The streaming format is identical to OpenAI's, so your existing streaming UI components work unchanged.
Ship AI that runs on your users' devices.
Ertas early bird pricing starts at $14.50/mo — locked in for life. Plans for builders and agencies.
Further Reading
- Claude Desktop Local Model Setup — Using Ollama with Claude Desktop via MCP
- MCP Server Zero API Costs — The cost case for local inference
- LangChain + Fine-Tuned Local Model — LangChain integration in depth
- Bootstrap AI SaaS Without API Costs — The unit economics of local models