
    Ollama's OpenAI-Compatible API: Drop Your Fine-Tuned Model Into Any OpenAI Integration

Ollama exposes an OpenAI-compatible REST API. Any code written for the OpenAI SDK — LangChain, LlamaIndex, your own app — works with your fine-tuned local model by changing one URL. Here's what to know.

Ertas Team

    Ollama exposes an OpenAI-compatible REST API at http://localhost:11434/v1. This means every library, framework, and application that integrates with OpenAI can point at your local fine-tuned model with a one-line change.

    No new SDKs. No API wrapper code. Just change the baseURL and the model name.

    What "OpenAI-Compatible" Actually Means

    Ollama implements the OpenAI Chat Completions API format:

    • POST /v1/chat/completions — the primary endpoint
    • Request body format identical to OpenAI's (model, messages, temperature, max_tokens, stream, etc.)
    • Response format identical (choices, message.content, usage, etc.)
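
Because the wire format matches, you can exercise the endpoint with plain fetch and no SDK at all. A minimal sketch (the model name is a placeholder for whatever you named your model in Ollama):

// Raw request against Ollama's OpenAI-compatible endpoint (no SDK).
// 'your-fine-tuned-model' is a placeholder for your Ollama model name.
const res = await fetch('http://localhost:11434/v1/chat/completions', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    model: 'your-fine-tuned-model',
    messages: [{ role: 'user', content: 'Hello' }],
    temperature: 0.7,
  }),
});

const data = await res.json();
// Response shape matches OpenAI: choices, message.content, usage
console.log(data.choices[0].message.content);
console.log(data.usage); // prompt_tokens, completion_tokens, total_tokens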

    Not every OpenAI feature is implemented. Supported:

    • Chat completions (most important)
    • Streaming via stream: true
    • Embeddings via POST /v1/embeddings (select models)
    • Model listing via GET /v1/models

    Not supported:

    • Fine-tuning via API (you do this in Ertas instead)
    • Image generation
    • Voice/audio APIs
    • Assistants API (the stateful threads/runs interface)
    • OpenAI-specific features like moderation or JSON mode with strict schema

For inference (the vast majority of use cases), compatibility is effectively complete.
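
The supported surface also covers retrieval pipelines. A short sketch of model listing and embeddings, assuming you have pulled an embedding-capable model such as nomic-embed-text (swap in your own):

import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'http://localhost:11434/v1',
  apiKey: 'ollama', // required by the SDK, ignored by Ollama
});

// GET /v1/models — lists every model pulled into Ollama
const models = await client.models.list();
console.log(models.data.map((m) => m.id));

// POST /v1/embeddings — works only for embedding-capable models
const emb = await client.embeddings.create({
  model: 'nomic-embed-text', // assumption: you pulled this model
  input: 'Text to embed',
});
console.log(emb.data[0].embedding.length); // vector dimensionality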

    The One-Line Migration

    JavaScript / Node.js:

    // Before (OpenAI cloud)
    import OpenAI from 'openai';
    const client = new OpenAI({
      apiKey: process.env.OPENAI_API_KEY
    });
    
    // After (Ollama local model) — one change
    import OpenAI from 'openai';
    const client = new OpenAI({
      baseURL: 'http://localhost:11434/v1',
      apiKey: 'ollama' // Required field but not validated by Ollama
    });
    
    // Your generation code is UNCHANGED
    const response = await client.chat.completions.create({
      model: 'your-fine-tuned-model', // This changes to your Ollama model name
      messages: [
        { role: 'user', content: 'Your prompt here' }
      ]
    });
    
    console.log(response.choices[0].message.content);
    

    Python:

    # Before
    from openai import OpenAI
    client = OpenAI(api_key="sk-...")
    
    # After — one change
    from openai import OpenAI
    client = OpenAI(
        base_url="http://localhost:11434/v1",
        api_key="ollama"  # Not validated
    )
    
    # Same generation code
    response = client.chat.completions.create(
        model="your-fine-tuned-model",
        messages=[{"role": "user", "content": "Your prompt"}]
    )
    print(response.choices[0].message.content)
    

    Curl:

    # Before
    curl https://api.openai.com/v1/chat/completions \
      -H "Authorization: Bearer $OPENAI_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello"}]}'
    
    # After — change URL and model
    curl http://localhost:11434/v1/chat/completions \
      -H "Authorization: Bearer ollama" \
      -H "Content-Type: application/json" \
      -d '{"model": "your-fine-tuned-model", "messages": [{"role": "user", "content": "Hello"}]}'
    

    Frameworks and Libraries That Work Immediately

    Because Ollama is OpenAI-compatible, all of these work with zero code changes beyond the baseURL:

| Framework / Library | Configuration |
| --- | --- |
| LangChain (JS/Python) | ChatOpenAI with the base URL http://localhost:11434/v1 (sketch below) |
| LlamaIndex | OpenAI(api_base="http://localhost:11434/v1") |
| Vercel AI SDK | createOpenAI({ baseURL: "http://localhost:11434/v1" }) |
| OpenAI Agents SDK | Set the OPENAI_BASE_URL environment variable |
| Instructor | Pass an OpenAI client configured with the Ollama baseURL |
| DSPy | lm = dspy.LM("ollama/your-model") |
| Semantic Kernel | OpenAI connector with a custom endpoint |
| Flowise | OpenAI node with a base path override |
| n8n | OpenAI node with a baseURL override |
Any tool that supports an "OpenAI with a custom base URL" option works; tools that hardcode the OpenAI URL do not.
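
As a concrete example, here is the LangChain row from the table above as a minimal JS sketch (assumes @langchain/openai is installed; the model name is a placeholder):

import { ChatOpenAI } from '@langchain/openai';

const model = new ChatOpenAI({
  model: 'your-fine-tuned-model',
  apiKey: 'ollama', // required field, not validated by Ollama
  configuration: { baseURL: 'http://localhost:11434/v1' },
});

const reply = await model.invoke('Your prompt here');
console.log(reply.content);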

    Remote Ollama Server

    When your Ollama runs on a VPS (not localhost), you need to expose it:

    On the VPS:

# Ollama binds to 127.0.0.1 by default; set OLLAMA_HOST to listen on all interfaces
OLLAMA_HOST=0.0.0.0:11434 ollama serve
    

    Security note: Never expose Ollama's port directly to the internet. Put Nginx in front with basic auth or API key validation:

server {
    listen 443 ssl;
    server_name ollama.yourdomain.com;
    # ssl_certificate and ssl_certificate_key directives go here

    location /v1/ {
        # Simple shared-secret check on the Authorization header
        if ($http_authorization != "Bearer your-secret-key") {
            return 401 '{"error": "Unauthorized"}';
        }
        proxy_pass http://localhost:11434/v1/;
    }
}
    

    Then in your code:

    const client = new OpenAI({
      baseURL: 'https://ollama.yourdomain.com/v1',
      apiKey: 'your-secret-key'
    });
    

This gives you a key-protected, remotely accessible, OpenAI-compatible API for your fine-tuned model. Deploy your Ollama server once, use it from any client.
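
Since the only difference between local and remote is the base URL, a convenient pattern is to read both values from the environment. A sketch (the variable names are illustrative, not an Ollama convention):

import OpenAI from 'openai';

// LLM_BASE_URL / LLM_API_KEY are illustrative names; pick your own.
const client = new OpenAI({
  baseURL: process.env.LLM_BASE_URL ?? 'http://localhost:11434/v1',
  apiKey: process.env.LLM_API_KEY ?? 'ollama',
});

// The same client now works against localhost in development and the
// Nginx-fronted VPS in production, with no code changes.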

    Streaming Responses

    Streaming works identically to OpenAI:

    const stream = await client.chat.completions.create({
      model: 'your-fine-tuned-model',
      messages: [{ role: 'user', content: 'Generate a long document...' }],
      stream: true
    });
    
    for await (const chunk of stream) {
      process.stdout.write(chunk.choices[0]?.delta?.content ?? '');
    }
    

    For UI applications (Next.js with Vercel AI SDK, React with streaming):

    // Vercel AI SDK + Ollama
    import { createOpenAI } from '@ai-sdk/openai';
    import { streamText } from 'ai';
    
    const ollama = createOpenAI({
      baseURL: 'http://localhost:11434/v1',
      apiKey: 'ollama',
    });
    
    // In your API route
    const result = streamText({
      model: ollama('your-fine-tuned-model'),
      messages: [...],
    });
    
    return result.toDataStreamResponse();
    

    The streaming format is identical to OpenAI's, so your existing streaming UI components work unchanged.


    Ship AI that runs on your users' devices.

    Ertas early bird pricing starts at $14.50/mo — locked in for life. Plans for builders and agencies.
