
Ollama's OpenAI-Compatible API: Drop Your Fine-Tuned Model Into Any OpenAI Integration
Ollama exposes an OpenAI-compatible REST API. Any code written for the OpenAI SDK — LangChain, LlamaIndex, your own app — works with your fine-tuned local model by changing one URL. Here's what to know.
Ollama exposes an OpenAI-compatible REST API at http://localhost:11434/v1. This means every library, framework, and application that integrates with OpenAI can point at your local fine-tuned model with a one-line change.
No new SDKs. No API wrapper code. Just change the baseURL and the model name.
What "OpenAI-Compatible" Actually Means
Ollama implements the OpenAI Chat Completions API format:
- `POST /v1/chat/completions` — the primary endpoint
- Request body format identical to OpenAI's (`model`, `messages`, `temperature`, `max_tokens`, `stream`, etc.)
- Response format identical (`choices`, `message.content`, `usage`, etc.)

Not every OpenAI feature is implemented. Supported:

- Chat completions (most important)
- Streaming via `stream: true`
- Embeddings via `POST /v1/embeddings` (select models)
- Model listing via `GET /v1/models` (this and embeddings are exercised in the sketch below)
Not supported:
- Fine-tuning via API (you do this in Ertas instead)
- Image generation
- Voice/audio APIs
- Assistants API (the stateful threads/runs interface)
- OpenAI-specific features like moderation or JSON mode with strict schema
For inference (the vast majority of use cases), the compatibility is effectively complete.
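As a quick smoke test of the supported surface, the stock OpenAI SDK can hit the models and embeddings endpoints directly. A minimal sketch (Node 18+, ESM); it assumes an embedding-capable model such as `nomic-embed-text` has already been pulled with `ollama pull`:

```javascript
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'http://localhost:11434/v1',
  apiKey: 'ollama', // required by the SDK, ignored by Ollama
});

// GET /v1/models — list every model available locally
const models = await client.models.list();
for (const m of models.data) console.log(m.id);

// POST /v1/embeddings — only works for embedding-capable models
const emb = await client.embeddings.create({
  model: 'nomic-embed-text', // assumption: pulled beforehand
  input: 'Local inference, OpenAI-shaped API.',
});
console.log(emb.data[0].embedding.length); // vector dimension
```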
The One-Line Migration
JavaScript / Node.js:
```javascript
// Before (OpenAI cloud)
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY
});

// After (Ollama local model) — one change
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'http://localhost:11434/v1',
  apiKey: 'ollama' // Required field but not validated by Ollama
});

// Your generation code is UNCHANGED
const response = await client.chat.completions.create({
  model: 'your-fine-tuned-model', // This changes to your Ollama model name
  messages: [
    { role: 'user', content: 'Your prompt here' }
  ]
});

console.log(response.choices[0].message.content);
```
Python:
```python
# Before
from openai import OpenAI

client = OpenAI(api_key="sk-...")

# After — one change
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama"  # Not validated
)

# Same generation code
response = client.chat.completions.create(
    model="your-fine-tuned-model",
    messages=[{"role": "user", "content": "Your prompt"}]
)
print(response.choices[0].message.content)
```
Curl:
```bash
# Before
curl https://api.openai.com/v1/chat/completions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello"}]}'

# After — change URL and model
curl http://localhost:11434/v1/chat/completions \
  -H "Authorization: Bearer ollama" \
  -H "Content-Type: application/json" \
  -d '{"model": "your-fine-tuned-model", "messages": [{"role": "user", "content": "Hello"}]}'
```
Frameworks and Libraries That Work Immediately
Because Ollama is OpenAI-compatible, all of these work with zero code changes beyond the baseURL:
| Framework / Library | Configuration |
|---|---|
| LangChain (JS/Python) | `ChatOpenAI(base_url="http://localhost:11434/v1")` in Python; `configuration: { baseURL: "http://localhost:11434/v1" }` in JS |
| LlamaIndex | `OpenAI(api_base="http://localhost:11434/v1")` |
| Vercel AI SDK | `createOpenAI({ baseURL: "http://localhost:11434/v1" })` |
| OpenAI Agents SDK | Set `OPENAI_BASE_URL` environment variable |
| Instructor | Pass OpenAI client with Ollama baseURL |
| DSPy | `lm = dspy.LM("ollama/your-model")` |
| Semantic Kernel | OpenAI connector with custom endpoint |
| Flowise | OpenAI node with base path override |
| n8n | OpenAI node with baseURL override |
Any tool that supports an "OpenAI with custom base URL" option works; tools with a hardcoded OpenAI URL do not.
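As one concrete case, here is a minimal LangChain JS sketch; it assumes the `@langchain/openai` package and passes the Ollama base URL through the client `configuration` option:

```javascript
import { ChatOpenAI } from '@langchain/openai';

// Point LangChain's OpenAI chat model at the local Ollama server.
const model = new ChatOpenAI({
  model: 'your-fine-tuned-model',
  apiKey: 'ollama', // required field, not validated by Ollama
  configuration: { baseURL: 'http://localhost:11434/v1' },
});

const reply = await model.invoke('Summarize this ticket in one sentence.');
console.log(reply.content);
```

The rest of your chain — prompts, parsers, retrievers — stays exactly as it was against the OpenAI cloud.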
Remote Ollama Server
When your Ollama runs on a VPS (not localhost), you need to expose it:
On the VPS:
```bash
# By default Ollama binds to 127.0.0.1 only;
# setting OLLAMA_HOST makes it listen on all interfaces
OLLAMA_HOST=0.0.0.0:11434 ollama serve
```
Security note: Never expose Ollama's port directly to the internet. Put Nginx in front with basic auth or API key validation:
```nginx
server {
    listen 443 ssl;
    server_name ollama.yourdomain.com;
    # ssl_certificate and ssl_certificate_key directives go here

    location /v1/ {
        # Simple API key check
        if ($http_authorization != "Bearer your-secret-key") {
            return 401 '{"error": "Unauthorized"}';
        }
        proxy_pass http://localhost:11434/v1/;
    }
}
```
Then in your code:
```javascript
const client = new OpenAI({
  baseURL: 'https://ollama.yourdomain.com/v1',
  apiKey: 'your-secret-key'
});
```
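To confirm the proxy behaves as intended, a quick check from any machine (a hypothetical sketch; substitute your own domain and key):

```javascript
// Requests without the key should be rejected by Nginx...
const unauthorized = await fetch('https://ollama.yourdomain.com/v1/models');
console.log(unauthorized.status); // 401

// ...and requests with the key should reach Ollama
const authorized = await fetch('https://ollama.yourdomain.com/v1/models', {
  headers: { Authorization: 'Bearer your-secret-key' },
});
console.log(authorized.status); // 200
```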
This gives you a fully secured, remotely accessible fine-tuned model API that is OpenAI-compatible. Deploy your Ollama server once, use it from any client.
Streaming Responses
Streaming works identically to OpenAI:
```javascript
const stream = await client.chat.completions.create({
  model: 'your-fine-tuned-model',
  messages: [{ role: 'user', content: 'Generate a long document...' }],
  stream: true
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? '');
}
```
For UI applications (Next.js with Vercel AI SDK, React with streaming):
```javascript
// Vercel AI SDK + Ollama
import { createOpenAI } from '@ai-sdk/openai';
import { streamText } from 'ai';

const ollama = createOpenAI({
  baseURL: 'http://localhost:11434/v1',
  apiKey: 'ollama',
});

// In your API route
const result = streamText({
  model: ollama('your-fine-tuned-model'),
  messages: [...],
});

return result.toDataStreamResponse();
```
The streaming format is identical to OpenAI's, so your existing streaming UI components work unchanged.
Ship AI that runs on your users' devices.
Ertas early bird pricing starts at $14.50/mo — locked in for life. Plans for builders and agencies.
Further Reading
- Claude Desktop Local Model Setup — Using Ollama with Claude Desktop via MCP
- MCP Server Zero API Costs — The cost case for local inference
- LangChain + Fine-Tuned Local Model — LangChain integration in depth
- Bootstrap AI SaaS Without API Costs — The unit economics of local models