Ollama + Ertas
Deploy Ertas-trained models through Ollama for fast, private local inference with a simple CLI and OpenAI-compatible API.
Overview
Ollama simplifies local model deployment by packaging model weights, configuration, and runtime into a single streamlined tool. With a familiar CLI inspired by container workflows, Ollama lets developers pull and run large language models on their own hardware without configuring complex inference servers or managing GPU drivers manually. Its built-in OpenAI-compatible REST API means existing application code can switch to local inference with a single endpoint change.
For teams that have invested in fine-tuning custom models with Ertas, Ollama provides the fastest path from trained weights to a running inference endpoint. The combination of Ertas for training and Ollama for serving creates a fully local AI pipeline where sensitive data never leaves your infrastructure, making it ideal for regulated industries and privacy-conscious organizations.
How Ertas Integrates
After a training job completes in Ertas Studio, you can download your fine-tuned model directly from the platform in GGUF format, which Ollama supports natively. Ertas also provides a downloadable Modelfile with the correct template, system prompt, and quantization settings baked in, so you can register the model with Ollama in a single step. The download preserves chat templates, stop tokens, and any custom parameters you configured during training.
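A generated Modelfile might look something like the following. This is an illustrative sketch, not the exact file Ertas emits; the FROM, TEMPLATE, SYSTEM, and PARAMETER directives are standard Ollama Modelfile syntax, and the file name, template tokens, and parameter values shown here are assumptions.

```
# Illustrative Modelfile — the file Ertas generates for your model will differ
FROM ./my-model.Q4_K_M.gguf

# Chat template preserved from training (a generic chat format shown here)
TEMPLATE """{{ if .System }}<|system|>{{ .System }}<|end|>{{ end }}<|user|>{{ .Prompt }}<|end|><|assistant|>"""

SYSTEM """You are a helpful assistant."""

PARAMETER stop "<|end|>"
PARAMETER temperature 0.7
```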
Once deployed, Ertas Cloud can monitor your Ollama instances for health, throughput, and latency metrics. You can manage multiple Ollama endpoints from the Ertas dashboard, route traffic between model versions for A/B testing, and roll back to previous checkpoints without restarting the server. This tight feedback loop between training and serving lets teams iterate on model quality with minimal operational overhead.
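Health monitoring of this kind can be built on Ollama's own HTTP API: `GET /api/tags` lists installed models and doubles as a liveness probe. Below is a minimal sketch of such a check, assuming the default Ollama port; a stub HTTP server stands in for a real instance so the example is self-contained (`check_endpoint` and the stub are illustrative names, not part of Ertas or Ollama).

```python
import json
import threading
import time
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

def check_endpoint(base_url, timeout=2.0):
    """Probe an Ollama instance: list installed models via /api/tags
    and report round-trip latency in milliseconds."""
    start = time.monotonic()
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        body = json.load(resp)
    latency_ms = (time.monotonic() - start) * 1000
    models = [m["name"] for m in body.get("models", [])]
    return {"healthy": True, "latency_ms": latency_ms, "models": models}

# --- demo against a stub server standing in for Ollama at localhost:11434 ---
class _Stub(BaseHTTPRequestHandler):
    def do_GET(self):
        payload = json.dumps({"models": [{"name": "my-model:latest"}]}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):  # silence per-request logging
        pass

server = HTTPServer(("127.0.0.1", 0), _Stub)  # ephemeral port
threading.Thread(target=server.serve_forever, daemon=True).start()
status = check_endpoint(f"http://127.0.0.1:{server.server_port}")
server.shutdown()
print(status["healthy"], status["models"])
```

Against a real deployment you would pass `http://localhost:11434` (or the host of each managed instance) as `base_url`.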
Getting Started
1. Download the model in GGUF format
After fine-tuning in Ertas Studio, download the model in GGUF format with your preferred quantization level (Q4_K_M, Q5_K_M, Q8_0, or full precision) from the platform.
2. Download the Ollama Modelfile
Ertas provides a ready-made Modelfile alongside your GGUF download that includes the correct chat template, system prompt, and runtime parameters.
3. Register the model with Ollama
Run a single CLI command to create the Ollama model from the generated Modelfile and GGUF weights.
4. Start the inference server
Launch Ollama to serve your model locally. The OpenAI-compatible API is available immediately at localhost:11434.
5. Connect your application
Point your application to the local Ollama endpoint. Any OpenAI SDK or HTTP client works out of the box with no code changes beyond the base URL.
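The last step can be sketched with nothing but the Python standard library: only the base URL distinguishes a local Ollama call from a hosted OpenAI one. The helper name and system prompt below are illustrative, and the default Ollama port is assumed.

```python
import json
import urllib.request

OLLAMA_BASE = "http://localhost:11434/v1"  # default local Ollama endpoint

def build_chat_request(model, user_content, system=None):
    """Build an OpenAI-compatible chat completion request aimed at a
    local Ollama server instead of a hosted API."""
    messages = []
    if system:
        messages.append({"role": "system", "content": system})
    messages.append({"role": "user", "content": user_content})
    body = json.dumps({"model": model, "messages": messages}).encode()
    return urllib.request.Request(
        f"{OLLAMA_BASE}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request("my-model", "Summarize this patient report")
# With Ollama running, send it like any other HTTP request:
#   with urllib.request.urlopen(req) as resp:
#       reply = json.load(resp)["choices"][0]["message"]["content"]
print(req.full_url)
```

An OpenAI SDK client works the same way: construct it with `base_url="http://localhost:11434/v1"` and any placeholder API key, and the rest of your application code is unchanged.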
# After downloading the GGUF model and Modelfile from Ertas Studio,
# create an Ollama model from the downloaded files
ollama create my-model -f ./models/Modelfile
# Run the model locally
ollama run my-model "Summarize this patient report"
# Or use the OpenAI-compatible API
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "my-model",
        "messages": [{"role": "user", "content": "Hello"}]
      }'
Benefits
- Deploy fine-tuned models locally with a single CLI command
- OpenAI-compatible API for drop-in replacement in existing applications
- No data leaves your infrastructure during inference
- Automatic Modelfile generation with correct chat templates and parameters
- Support for multiple quantization levels to balance speed and quality
- Monitor Ollama instances from the Ertas Cloud dashboard
Related Resources
Fine-Tuning
GGUF
Inference
LoRA
Getting Started with Ertas: Fine-Tune and Deploy Custom AI Models
Privacy-Conscious AI Development: Fine-Tune in the Cloud, Run on Your Terms
Running AI Models Locally: The Complete Guide to Local LLM Inference
GDPR-Compliant AI: How to Use LLMs Without Sharing User Data
Self-Hosted AI for Indie Apps: Replace GPT-4 with Your Own Model
Hugging Face
Jan
llama.cpp
LM Studio
Open WebUI
Ertas for Healthcare
Ertas for Customer Support
Ertas for Legal
Ertas for Finance
Ertas for Indie Developers & Vibe-Coded Apps