LM Studio + Ertas

    Export fine-tuned GGUF models from Ertas Studio and load them into LM Studio for local inference with an intuitive chat interface, OpenAI-compatible API, and hardware-aware performance tuning.

    Overview

    LM Studio is a desktop application that makes running large language models locally as simple as using a native chat app. It provides a visual model browser, automatic hardware detection, and a built-in chat interface that rivals cloud-hosted AI assistants in usability. Under the hood, LM Studio uses llama.cpp for inference, supporting a wide range of GGUF-quantized models on CPUs, NVIDIA GPUs, AMD GPUs, and Apple Silicon with automatic GPU offloading and memory management.

    Beyond the chat interface, LM Studio exposes a local OpenAI-compatible API server, enabling developers to build applications against their local models using the same SDKs and libraries they would use with cloud APIs. The combination of a user-friendly GUI for exploration and a developer-ready API for integration makes LM Studio one of the most versatile tools in the local AI ecosystem, serving both technical and non-technical users on the same team.
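Because the local server speaks the OpenAI chat-completions wire format, a request can be assembled with nothing beyond the standard library. A minimal sketch, assuming the default port 1234 and a model imported under the identifier `my-model` (LM Studio assigns the actual identifier on import):

```python
import json
import urllib.request

# Base URL of LM Studio's local server (1234 is its default port).
BASE_URL = "http://localhost:1234/v1"

def build_chat_request(model: str, messages: list[dict]) -> urllib.request.Request:
    """Build an OpenAI-style chat-completions request for the local server."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request(
    "my-model",  # identifier assigned by LM Studio on import (assumption)
    [{"role": "user", "content": "Summarize this report"}],
)
# urllib.request.urlopen(req) sends it once the server toggle is on.
```

The same request works unchanged against a cloud OpenAI-compatible endpoint, which is the point: swapping `BASE_URL` is the only change needed to move an application between local and hosted inference.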

    How Ertas Integrates

    After fine-tuning a model in Ertas Studio, you can download the trained weights in GGUF format with your preferred quantization level. The exported GGUF file is fully self-contained with embedded tokenizer configuration and chat templates, so LM Studio recognizes the model's capabilities immediately upon import. Simply drag the downloaded GGUF file into LM Studio's models directory or use the file import dialog, and the model appears in the local model list ready for conversation.

    This workflow creates a seamless bridge between cloud-based fine-tuning and local deployment. Teams can iterate on model quality in Ertas Studio using cloud GPUs, export the best checkpoint, and distribute the GGUF file to team members who run it locally in LM Studio without needing any ML infrastructure. Non-technical stakeholders can evaluate fine-tuned models through LM Studio's chat UI, providing feedback that informs the next training iteration in Ertas.

    Getting Started

1. Fine-tune your model in Ertas Studio

      Upload your JSONL training data to Ertas Studio, configure your training run on the visual canvas, and launch fine-tuning on managed cloud GPUs.

2. Export as GGUF

      Once training completes, download the model in GGUF format. Choose a quantization level that matches your local hardware — Q4_K_M for most consumer machines, Q8_0 for higher quality on powerful hardware.

3. Import into LM Studio

      Open LM Studio and drag the downloaded GGUF file into the models directory, or use File → Import Model. LM Studio detects the architecture, chat template, and parameters automatically.

4. Configure inference settings

      Adjust context length, temperature, GPU layer offloading, and thread count in LM Studio's settings panel. LM Studio provides hardware-aware defaults based on your system's available memory and compute.

5. Chat and evaluate

      Start a conversation with your fine-tuned model through LM Studio's chat interface. Test domain-specific prompts and compare outputs against your baseline to validate training quality.

6. Enable the local API server

      Toggle on LM Studio's local server to expose an OpenAI-compatible endpoint at localhost:1234. Point your applications to this endpoint for fully local, private inference.
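To pick a quantization level in step 2, a back-of-the-envelope estimate is often enough: GGUF file size is roughly parameter count times bits per weight. The bits-per-weight figures below are rough averages for llama.cpp quant types (K-quants mix bit widths across tensors), not exact values:

```python
# Approximate effective bits per weight for common GGUF quant types
# (rough averages; actual files vary by architecture and metadata).
BITS_PER_WEIGHT = {
    "Q4_K_M": 4.8,   # good default for most consumer machines
    "Q8_0": 8.5,     # near-lossless, larger files
    "F16": 16.0,     # unquantized half precision
}

def estimated_gguf_gb(n_params: float, quant: str) -> float:
    """Rough GGUF file size in GB for a model with n_params weights."""
    return n_params * BITS_PER_WEIGHT[quant] / 8 / 1e9

for quant in ("Q4_K_M", "Q8_0", "F16"):
    print(f"7B model at {quant}: ~{estimated_gguf_gb(7e9, quant):.1f} GB")
```

A 7B model lands around 4 GB at Q4_K_M versus roughly 7 GB at Q8_0, which is why Q4_K_M is the usual choice for machines with 16 GB of RAM or less.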
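The GPU offloading setting in step 4 can be sanity-checked the same way: the number of transformer layers that fit in VRAM is roughly free VRAM divided by per-layer weight size. A sketch under simplifying assumptions (uniformly sized layers, a flat headroom reserve for the KV cache and scratch buffers):

```python
def max_gpu_layers(total_layers: int, model_bytes: float,
                   free_vram_bytes: float, headroom: float = 0.10) -> int:
    """Rough upper bound on how many layers to offload to the GPU.

    Assumes layers are uniformly sized and reserves `headroom`
    of VRAM for the KV cache and scratch buffers (an assumption;
    real KV-cache size grows with context length).
    """
    per_layer = model_bytes / total_layers
    usable = free_vram_bytes * (1 - headroom)
    return min(total_layers, int(usable // per_layer))

# e.g. a ~4.2 GB Q4_K_M 7B model (32 layers) on a GPU with 8 GB free:
layers = max_gpu_layers(32, 4.2e9, 8e9)  # all 32 layers fit
```

LM Studio's hardware-aware defaults do this calculation for you; the estimate is mainly useful for predicting whether a given quantization will run fully on-GPU before you download it.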

```bash
# After downloading your GGUF model from Ertas Studio,
# copy it into LM Studio's models directory
# (default location; the exact path may vary by LM Studio version and OS)
mkdir -p ~/.lmstudio/models/my-model
cp ./my-model-Q4_K_M.gguf ~/.lmstudio/models/my-model/

# LM Studio auto-detects the model on next launch.
# Once loaded, the local API is available at:
curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-model",
    "messages": [{"role": "user", "content": "Summarize this report"}]
  }'
```
Copy your Ertas-exported GGUF model into LM Studio's models directory and query it via the local OpenAI-compatible API.

    Benefits

    • Intuitive drag-and-drop model import with zero configuration required
    • Hardware-aware defaults that automatically optimize GPU offloading and threading
    • Built-in chat interface for non-technical team members to evaluate fine-tuned models
    • OpenAI-compatible local API server for seamless application integration
    • Cross-platform support for Windows, macOS, and Linux with Apple Silicon optimization
    • No command-line knowledge required for model deployment and testing
