Ollama + Ertas

    Deploy Ertas-trained models through Ollama for fast, private local inference with a simple CLI and OpenAI-compatible API.

    Overview

    Ollama simplifies local model deployment by packaging model weights, configuration, and runtime into a single streamlined tool. With a familiar CLI inspired by container workflows, Ollama lets developers pull and run large language models on their own hardware without configuring complex inference servers or managing GPU drivers manually. Its built-in OpenAI-compatible REST API means existing application code can switch to local inference with a single endpoint change.

    For teams that have invested in fine-tuning custom models with Ertas, Ollama provides the fastest path from trained weights to a running inference endpoint. The combination of Ertas for training and Ollama for serving creates a fully local AI pipeline where sensitive data never leaves your infrastructure, making it ideal for regulated industries and privacy-conscious organizations.

    How Ertas Integrates

    After a training job completes in Ertas Studio, you can download your fine-tuned model directly from the platform in GGUF format, which Ollama supports natively. Ertas also provides a downloadable Modelfile with the correct template, system prompt, and quantization settings baked in, so you can register the model with Ollama in a single step. The download preserves chat templates, stop tokens, and any custom parameters you configured during training.
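    For orientation, a Modelfile of this kind typically looks like the sketch below. The GGUF path, system prompt, template, and parameter values here are illustrative placeholders, not the exact output of Ertas; the file you download reflects your own training configuration.

```
# Illustrative Modelfile sketch; the path, prompt, template, and
# parameter values are placeholders, not Ertas-generated output.
FROM ./my-model.Q4_K_M.gguf

SYSTEM "You are a concise, factual assistant."

TEMPLATE """{{ .System }}
{{ .Prompt }}"""

PARAMETER temperature 0.2
PARAMETER stop "<|end|>"
```

    The `FROM` line points at the downloaded GGUF weights, while `TEMPLATE`, `SYSTEM`, and `PARAMETER` carry over the chat template, system prompt, and runtime settings from training.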

    Once deployed, Ertas Cloud can monitor your Ollama instances for health, throughput, and latency metrics. You can manage multiple Ollama endpoints from the Ertas dashboard, route traffic between model versions for A/B testing, and roll back to previous checkpoints without restarting the server. This tight feedback loop between training and serving lets teams iterate on model quality with minimal operational overhead.

    Getting Started

    1. Download the model in GGUF format

      After fine-tuning in Ertas Studio, download the model in GGUF format at your preferred quantization level (Q4_K_M, Q5_K_M, Q8_0, or full precision).

    2. Download the Ollama Modelfile

      Ertas provides a ready-made Modelfile alongside your GGUF download that includes the correct chat template, system prompt, and runtime parameters.

    3. Register the model with Ollama

      Run a single CLI command to create the Ollama model from the generated Modelfile and GGUF weights.

    4. Start the inference server

      Launch Ollama to serve your model locally. The OpenAI-compatible API is available immediately at localhost:11434.

    5. Connect your application

      Point your application to the local Ollama endpoint. Any OpenAI SDK or HTTP client works out of the box with no code changes beyond the base URL.

    bash
    # After downloading the GGUF model and Modelfile from Ertas Studio,
    # create an Ollama model from the downloaded files
    ollama create my-model -f ./models/Modelfile
    
    # Run the model locally
    ollama run my-model "Summarize this patient report"
    
    # Or use the OpenAI-compatible API
    curl http://localhost:11434/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{
        "model": "my-model",
        "messages": [{"role": "user", "content": "Hello"}]
      }'
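    The same endpoint works from application code. Below is a minimal stdlib-only Python sketch of step 5; any OpenAI SDK works equally well by pointing its base URL at http://localhost:11434/v1. The model name `my-model` matches the `ollama create` example above and is an assumption, as is the prompt text.

```python
import json
import urllib.request

# Ollama's default local OpenAI-compatible endpoint.
OLLAMA_URL = "http://localhost:11434/v1/chat/completions"

# OpenAI-style chat request; "my-model" is whatever name you
# registered with `ollama create` (placeholder here).
payload = {
    "model": "my-model",
    "messages": [{"role": "user", "content": "Hello"}],
}

def chat(payload, url=OLLAMA_URL):
    """POST a chat completion request and return the reply text."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

    Because the request and response shapes follow the OpenAI chat completions format, swapping this for an OpenAI SDK client is only a base-URL change.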

    Benefits

    • Deploy fine-tuned models locally with a single CLI command
    • OpenAI-compatible API for drop-in replacement in existing applications
    • No data leaves your infrastructure during inference
    • Automatic Modelfile generation with correct chat templates and parameters
    • Support for multiple quantization levels to balance speed and quality
    • Monitor Ollama instances from the Ertas Cloud dashboard
