LocalAI + Ertas
Deploy fine-tuned models from Ertas through LocalAI's OpenAI-compatible API server, providing a self-hosted drop-in replacement for OpenAI that works with any application or library expecting the OpenAI API format.
Overview
LocalAI is an open-source, self-hosted API server that provides a drop-in replacement for the OpenAI API specification. It supports text generation, embeddings, audio transcription, image generation, and function calling — all through the same API endpoints and request formats that applications use to talk to OpenAI. This means any application, SDK, or tool built for the OpenAI API can be redirected to LocalAI by simply changing the base URL, with no code changes required.
LocalAI supports multiple inference backends including llama.cpp, whisper.cpp, and diffusion models, running on both CPU and GPU hardware. It handles model management, automatic GGUF downloading from Hugging Face, and serves multiple models concurrently. For organizations that want to migrate from cloud AI APIs to self-hosted models — for cost control, data privacy, or regulatory compliance — LocalAI provides the simplest path: keep your existing application code and swap the API endpoint to a server running on your own infrastructure.
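The "same request format" point can be made concrete with a sketch of the chat completion payload: the body and path are identical for OpenAI and LocalAI, and only the host differs. The model name "my-ertas-model" and the default LocalAI port 8080 are placeholder assumptions here.

```python
import json

# Illustrative: the OpenAI-format chat completion payload is unchanged when
# redirecting to LocalAI — only the host portion of the URL differs.
OPENAI_URL = "https://api.openai.com/v1/chat/completions"
LOCALAI_URL = "http://localhost:8080/v1/chat/completions"  # default LocalAI port

def build_chat_request(model: str, user_message: str) -> dict:
    """Build an OpenAI-format chat completion request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "temperature": 0.7,
    }

payload = build_chat_request("my-ertas-model", "Summarize our refund policy.")
print(json.dumps(payload, indent=2))
```

The same payload posted to either URL is a valid request, which is why swapping the base URL is the whole migration.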
How Ertas Integrates
Ertas Studio produces fine-tuned models optimized for your specific use case, and LocalAI makes those models instantly accessible to every tool and application in your stack that speaks the OpenAI API protocol. After fine-tuning a model on your domain data in Ertas — customer support conversations, coding patterns, document processing examples, or specialized content — you export it in GGUF format and configure it as a model in LocalAI. From that point, any application calling your LocalAI endpoint gets responses from your fine-tuned model.
This combination is particularly powerful for teams replacing OpenAI API usage with self-hosted fine-tuned models. Rather than rewriting application code, you deploy LocalAI with your Ertas-trained model and redirect API calls. Customer support bots, document processors, coding tools, and internal applications all continue working with their existing OpenAI client libraries — but the responses now come from a model specifically trained on your data, running on your hardware, with no per-token costs and complete data privacy. Ertas handles the intelligence customization, and LocalAI handles the seamless API compatibility.
Getting Started
1. Fine-tune a model for your use case in Ertas Studio
Curate a domain-specific dataset and fine-tune a model in Ertas Studio. Whether you're building a customer support bot, a coding assistant, or a content generation tool, train the model on examples that represent your quality standards.
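A common way to structure such a dataset is chat-style JSONL, one training example per line. This layout is illustrative — check Ertas Studio's documentation for the exact format it ingests.

```python
import json

# Chat-style JSONL training examples (a widely used convention; verify the
# exact schema Ertas Studio expects). Each line is one complete conversation.
examples = [
    {
        "messages": [
            {"role": "system", "content": "You are a support agent for Acme."},
            {"role": "user", "content": "How do I reset my password?"},
            {"role": "assistant", "content": "Open Settings > Security and choose 'Reset password'."},
        ]
    },
]

with open("train.jsonl", "w", encoding="utf-8") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```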
2. Export the model in GGUF format
Export the fine-tuned model from Ertas in GGUF format with an appropriate quantization level. Choose Q4_K_M for memory-constrained environments or Q8_0 for maximum quality on hardware with sufficient RAM.
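A rough back-of-envelope memory estimate can guide the quantization choice. The bits-per-weight figures below are approximations (real GGUF files mix tensor types), and the estimate covers weights only, not the KV cache.

```python
# Approximate bits per weight for common GGUF quantization levels.
# These are rough figures for sizing, not exact GGUF file sizes.
BITS_PER_WEIGHT = {"Q4_K_M": 4.8, "Q8_0": 8.5, "F16": 16.0}

def estimated_size_gib(n_params_billion: float, quant: str) -> float:
    """Estimate weight memory in GiB for a model at a given quantization."""
    bits = BITS_PER_WEIGHT[quant]
    return n_params_billion * 1e9 * bits / 8 / (1024 ** 3)

for quant in ("Q4_K_M", "Q8_0"):
    print(f"7B model at {quant}: ~{estimated_size_gib(7, quant):.1f} GiB")
```

By this estimate a 7B model needs roughly 4 GiB at Q4_K_M versus roughly 7 GiB at Q8_0, which is the trade-off the step above describes.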
3. Configure LocalAI with your model
Install LocalAI and add your GGUF model to its models directory. Create a model configuration YAML file specifying context length, prompt template, and inference parameters matching your model's requirements.
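A minimal model configuration might look like the one written below. The field names follow LocalAI's YAML model-config convention (`name`, `context_size`, `parameters.model`), but verify them against the LocalAI documentation for your version; "ertas-support" and the GGUF filename are placeholders.

```python
from pathlib import Path
from textwrap import dedent

# Sketch of a LocalAI model config (verify field names against the LocalAI
# docs for your version). The model name and filename are placeholders.
config = dedent("""\
    name: ertas-support
    context_size: 4096
    parameters:
      model: ertas-support.Q4_K_M.gguf
      temperature: 0.7
""")

models_dir = Path("models")
models_dir.mkdir(exist_ok=True)
(models_dir / "ertas-support.yaml").write_text(config, encoding="utf-8")
print(config)
```

The `name` field is what client applications pass as the `model` parameter in their requests.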
4. Redirect existing applications to LocalAI
Update the base URL in your OpenAI client configurations to point to your LocalAI server. Applications using the openai Python package, Node.js SDK, or REST API calls will work without code changes — only the endpoint and model name need updating.
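For the official openai Python SDK (v1+), the redirect can even be done without touching code at all, since the SDK reads its base URL and key from the environment. The URL, model name, and key below are placeholders; LocalAI ignores the key unless you configure API-key authentication.

```python
import os

# The openai Python SDK (v1+) reads these environment variables, so existing
# applications redirect to LocalAI with no code changes. Values are placeholders.
os.environ["OPENAI_BASE_URL"] = "http://localhost:8080/v1"
os.environ["OPENAI_API_KEY"] = "sk-local-placeholder"  # ignored by LocalAI by default

# Equivalent explicit form, for code that configures the client directly:
# from openai import OpenAI
# client = OpenAI(base_url="http://localhost:8080/v1", api_key="sk-local-placeholder")
# reply = client.chat.completions.create(
#     model="ertas-support",  # the name from your LocalAI model config
#     messages=[{"role": "user", "content": "Hello"}],
# )
```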
5. Scale and monitor your deployment
Monitor response latency and quality in production. Use LocalAI's multi-model support to serve different fine-tuned models for different tasks. When you improve a model in Ertas, swap the GGUF file to upgrade without changing any application code.
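The swap step can be sketched as an atomic file replacement, so the server never sees a half-copied GGUF. Whether LocalAI picks up new weights without a restart depends on its configuration; restarting the backend for that model after the swap is the safe assumption.

```python
import os
import shutil
from pathlib import Path

def swap_model(new_gguf: str, live_gguf: str) -> None:
    """Atomically replace the live GGUF with a newly exported one.

    Copy next to the destination first, then rename: os.replace is atomic
    when source and destination are on the same filesystem.
    """
    live = Path(live_gguf)
    tmp = live.with_suffix(".gguf.tmp")
    shutil.copy2(new_gguf, tmp)
    os.replace(tmp, live)
```

Because the model name in the LocalAI config is unchanged, client applications keep requesting the same `model` and receive the upgraded weights.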
Benefits
- Zero application code changes — drop-in replacement for OpenAI API endpoints
- Complete data sovereignty with all inference running on your own infrastructure
- No per-token API costs regardless of request volume or number of applications
- Serve multiple fine-tuned models simultaneously for different use cases
- Compatible with every OpenAI SDK, library, and tool in any programming language
- Simple model upgrades — swap the GGUF file when new fine-tuned versions are ready