GPT4All + Ertas
Fine-tune models in Ertas Studio and deploy them through GPT4All for private, offline inference with a desktop chat interface, local document retrieval, and Python API.
Overview
GPT4All, developed by Nomic AI, is a privacy-focused desktop application for running large language models entirely on consumer hardware. It supports GGUF models out of the box and provides a curated model library, a chat interface, and a local document retrieval system called LocalDocs that lets you chat with your own files without uploading them anywhere. GPT4All runs on Windows, macOS, and Linux, with optimized inference for CPUs and Apple Silicon — making it one of the most accessible entry points for local AI.
Beyond the desktop app, GPT4All offers a Python SDK for embedding local inference into scripts, pipelines, and applications with minimal code, and the desktop app can expose a local API server that accepts OpenAI-style requests. The combination of a user-friendly GUI for everyday use and a developer-ready API for automation makes GPT4All a versatile deployment target for fine-tuned models, particularly in organizations where data privacy is a hard requirement.
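As a rough sketch of talking to that local server, the snippet below builds an OpenAI-style chat completion payload. Everything here is an assumption to verify against your setup: that the API server is enabled in GPT4All's settings, that it listens on its default port 4891, and that the model name matches the GGUF file loaded in the app.

```python
# Sketch: an OpenAI-style chat completion request for GPT4All's local server.
# Assumptions: server enabled in GPT4All settings, default port 4891,
# model name matching the loaded GGUF file.
import json


def build_chat_request(model: str, prompt: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-style chat completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


payload = build_chat_request("my-model-Q4_K_M.gguf", "Hello!")
print(json.dumps(payload, indent=2))

# To actually send the request once the GPT4All local server is running:
# import urllib.request
# req = urllib.request.Request(
#     "http://localhost:4891/v1/chat/completions",
#     data=json.dumps(payload).encode(),
#     headers={"Content-Type": "application/json"},
# )
# print(urllib.request.urlopen(req).read().decode())
```

Because the payload follows the OpenAI wire format, existing OpenAI client code can often be pointed at the local endpoint with only a base-URL change.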
How Ertas Integrates
After fine-tuning a model in Ertas Studio, you can export it in GGUF format and load it directly into GPT4All. The process is simple: download the GGUF file from Ertas, place it in GPT4All's models directory, and the model appears in the app's model selector. GPT4All reads the embedded GGUF metadata to configure the chat template and inference parameters, so no manual setup is required. You can then use the model through the chat interface or the Python API for programmatic access.
This workflow is especially powerful when combined with GPT4All's LocalDocs feature. Fine-tune a domain-specific model in Ertas — for example, a medical terminology model or a legal analysis model — then pair it with relevant local documents in GPT4All for retrieval-augmented generation. The model's domain expertise from Ertas fine-tuning combines with real-time document context from LocalDocs, delivering highly accurate responses while keeping all data on the user's machine.
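The real LocalDocs feature lives inside the GPT4All desktop app; the hand-rolled sketch below only illustrates the retrieval-augmented pattern it implements — pull relevant local snippets, prepend them to the prompt, then hand the prompt to the fine-tuned model. The keyword-overlap ranking and all document strings here are illustrative stand-ins, not GPT4All's actual retrieval.

```python
# Illustrative sketch of the LocalDocs idea: retrieve relevant local snippets
# and prepend them to the prompt before calling the fine-tuned model.
# (Naive keyword-overlap ranking; GPT4All's real LocalDocs uses embeddings.)

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank documents by keyword overlap with the query, return top k."""
    q = set(query.lower().split())
    scored = sorted(
        docs,
        key=lambda d: len(q & set(d.lower().split())),
        reverse=True,
    )
    return scored[:k]


def build_prompt(query: str, docs: list[str]) -> str:
    """Assemble a retrieval-augmented prompt from the top-ranked snippets."""
    context = "\n".join(retrieve(query, docs))
    return f"Use the context to answer.\n\nContext:\n{context}\n\nQuestion: {query}"


docs = [
    "Trial NCT-001 showed a 12% reduction in symptoms.",
    "The office closes at 5pm on Fridays.",
]
prompt = build_prompt("What reduction did the trial show?", docs)
# The assembled prompt can now be passed to GPT4All's generate() call.
print(prompt)
```

With a domain fine-tuned model from Ertas, the model interprets the retrieved context with its specialized vocabulary, while all documents stay on the local machine.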
Getting Started
1. Fine-tune in Ertas Studio
   Upload your JSONL dataset to Ertas Studio and run a fine-tuning job using LoRA or QLoRA. Monitor training metrics and evaluate against your validation set before exporting.
2. Export as GGUF
   Download the trained model in GGUF format from Ertas Studio. For GPT4All on CPU-only machines, Q4_K_M quantization offers the best balance of speed and quality.
3. Add model to GPT4All
   Place the downloaded GGUF file in GPT4All's models directory (typically ~/.local/share/nomic.ai/GPT4All/ on Linux or the equivalent on your OS). The model appears in the model selector on next launch.
4. Configure LocalDocs (optional)
   Point GPT4All's LocalDocs feature at a folder of relevant documents to enable retrieval-augmented generation alongside your fine-tuned model's domain knowledge.
5. Chat or use the Python API
   Interact with your fine-tuned model through the desktop chat interface, or use the GPT4All Python SDK to integrate local inference into your applications and scripts.
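Steps 2 and 3 above amount to copying a single file. A minimal shell sketch, with two stated assumptions: a temporary directory stands in for GPT4All's real models directory (on Linux, ~/.local/share/nomic.ai/GPT4All/), and the filename is a placeholder for whatever Ertas Studio exported.

```shell
# Sketch: install an Ertas-exported GGUF model into GPT4All's models directory.
# A temp directory stands in for the real path so this runs anywhere;
# on Linux the real target is ~/.local/share/nomic.ai/GPT4All/.
WORK_DIR="$(mktemp -d)"
cd "$WORK_DIR"

MODEL_DIR="$WORK_DIR/nomic.ai/GPT4All"   # stand-in for the real models directory
mkdir -p "$MODEL_DIR"

# Stand-in for the GGUF file downloaded from Ertas Studio
touch my-model-Q4_K_M.gguf

# Copy (or move) the model into place; GPT4All lists it on next launch
cp my-model-Q4_K_M.gguf "$MODEL_DIR/"
ls "$MODEL_DIR"
```

No registration step is needed: GPT4All scans the directory at startup and reads the chat template and defaults from the GGUF metadata.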
```python
# After downloading the GGUF model from Ertas Studio,
# use the GPT4All Python SDK for local inference
from gpt4all import GPT4All

# Point to your Ertas-exported GGUF model
model = GPT4All(
    model_name="my-model-Q4_K_M.gguf",
    model_path="./models/",
    allow_download=False,
)

with model.chat_session():
    response = model.generate(
        "Summarize the key findings from this clinical trial",
        max_tokens=512,
        temp=0.7,
    )
    print(response)
```

Benefits
- Privacy-first design ensures all data stays on-device during inference
- LocalDocs feature combines fine-tuned model expertise with document retrieval
- Python SDK for scripting, plus an optional local API server that accepts OpenAI-style requests
- Curated model library for quick benchmarking against your fine-tuned model
- Lightweight and optimized for CPU inference on consumer hardware
- Cross-platform support for Windows, macOS, and Linux
Related Resources
Tags: Fine-Tuning, GGUF, Inference, QLoRA, Quantization

Articles:
- Getting Started with Ertas: Fine-Tune and Deploy Custom AI Models
- Privacy-Conscious AI Development: Fine-Tune in the Cloud, Run on Your Terms
- Self-Hosted AI for Indie Apps: Replace GPT-4 with Your Own Model

Related integrations: Jan, llama.cpp, LM Studio, Ollama

Use cases:
- Ertas for Healthcare
- Ertas for Customer Support
- Ertas for Indie Developers & Vibe-Coded Apps