LM Studio + Ertas

    Export fine-tuned GGUF models from Ertas Studio and load them into LM Studio for local inference with an intuitive chat interface, OpenAI-compatible API, and hardware-aware performance tuning.

    Overview

    LM Studio is a desktop application that makes running large language models locally as simple as using a native chat app. It provides a visual model browser, automatic hardware detection, and a built-in chat interface that rivals cloud-hosted AI assistants in usability. Under the hood, LM Studio uses llama.cpp for inference, supporting a wide range of GGUF-quantized models on CPUs, NVIDIA GPUs, AMD GPUs, and Apple Silicon with automatic GPU offloading and memory management.

    Beyond the chat interface, LM Studio exposes a local OpenAI-compatible API server, enabling developers to build applications against their local models using the same SDKs and libraries they would use with cloud APIs. The combination of a user-friendly GUI for exploration and a developer-ready API for integration makes LM Studio one of the most versatile tools in the local AI ecosystem, serving both technical and non-technical users on the same team.
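Because the local server speaks the OpenAI chat-completions wire format, a request can be assembled with nothing beyond the standard library. A minimal sketch, assuming the default port 1234 and a model imported under the identifier `my-model` (LM Studio assigns the actual identifier on import):

```python
import json
import urllib.request

# Base URL of LM Studio's local server (1234 is its default port).
BASE_URL = "http://localhost:1234/v1"

def build_chat_request(model: str, messages: list[dict]) -> urllib.request.Request:
    """Build an OpenAI-style chat-completions request for the local server."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request(
    "my-model",  # identifier assigned by LM Studio on import (assumption)
    [{"role": "user", "content": "Summarize this report"}],
)
# urllib.request.urlopen(req) sends it once the server toggle is on.
```

The same request works unchanged against a cloud OpenAI-compatible endpoint, which is the point: swapping `BASE_URL` is the only change needed to move an application between local and hosted inference.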

    How Ertas Integrates

    After fine-tuning a model in Ertas Studio, you can download the trained weights in GGUF format with your preferred quantization level. The exported GGUF file is fully self-contained with embedded tokenizer configuration and chat templates, so LM Studio recognizes the model's capabilities immediately upon import. Simply drag the downloaded GGUF file into LM Studio's models directory or use the file import dialog, and the model appears in the local model list ready for conversation.

    This workflow creates a seamless bridge between cloud-based fine-tuning and local deployment. Teams can iterate on model quality in Ertas Studio using cloud GPUs, export the best checkpoint, and distribute the GGUF file to team members who run it locally in LM Studio without needing any ML infrastructure. Non-technical stakeholders can evaluate fine-tuned models through LM Studio's chat UI, providing feedback that informs the next training iteration in Ertas.

    Getting Started

1. Fine-tune your model in Ertas Studio

      Upload your JSONL training data to Ertas Studio, configure your training run on the visual canvas, and launch fine-tuning on managed cloud GPUs.

2. Export as GGUF

      Once training completes, download the model in GGUF format. Choose a quantization level that matches your local hardware — Q4_K_M for most consumer machines, Q8_0 for higher quality on powerful hardware.

3. Import into LM Studio

      Open LM Studio and drag the downloaded GGUF file into the models directory, or use File → Import Model. LM Studio detects the architecture, chat template, and parameters automatically.

4. Configure inference settings

      Adjust context length, temperature, GPU layer offloading, and thread count in LM Studio's settings panel. LM Studio provides hardware-aware defaults based on your system's available memory and compute.

5. Chat and evaluate

      Start a conversation with your fine-tuned model through LM Studio's chat interface. Test domain-specific prompts and compare outputs against your baseline to validate training quality.

6. Enable the local API server

      Toggle on LM Studio's local server to expose an OpenAI-compatible endpoint at localhost:1234. Point your applications to this endpoint for fully local, private inference.
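To pick a quantization level in step 2, a back-of-the-envelope estimate is often enough: GGUF file size is roughly parameter count times bits per weight. The bits-per-weight figures below are rough averages for llama.cpp quant types (K-quants mix bit widths across tensors), not exact values:

```python
# Approximate effective bits per weight for common GGUF quant types
# (rough averages; actual files vary by architecture and metadata).
BITS_PER_WEIGHT = {
    "Q4_K_M": 4.8,   # good default for most consumer machines
    "Q8_0": 8.5,     # near-lossless, larger files
    "F16": 16.0,     # unquantized half precision
}

def estimated_gguf_gb(n_params: float, quant: str) -> float:
    """Rough GGUF file size in GB for a model with n_params weights."""
    return n_params * BITS_PER_WEIGHT[quant] / 8 / 1e9

for quant in ("Q4_K_M", "Q8_0", "F16"):
    print(f"7B model at {quant}: ~{estimated_gguf_gb(7e9, quant):.1f} GB")
```

A 7B model lands around 4 GB at Q4_K_M versus roughly 7 GB at Q8_0, which is why Q4_K_M is the usual choice for machines with 16 GB of RAM or less.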
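The GPU offloading setting in step 4 can be sanity-checked the same way: the number of transformer layers that fit in VRAM is roughly free VRAM divided by per-layer weight size. A sketch under simplifying assumptions (uniformly sized layers, a flat headroom reserve for the KV cache and scratch buffers):

```python
def max_gpu_layers(total_layers: int, model_bytes: float,
                   free_vram_bytes: float, headroom: float = 0.10) -> int:
    """Rough upper bound on how many layers to offload to the GPU.

    Assumes layers are uniformly sized and reserves `headroom`
    of VRAM for the KV cache and scratch buffers (an assumption;
    real KV-cache size grows with context length).
    """
    per_layer = model_bytes / total_layers
    usable = free_vram_bytes * (1 - headroom)
    return min(total_layers, int(usable // per_layer))

# e.g. a ~4.2 GB Q4_K_M 7B model (32 layers) on a GPU with 8 GB free:
layers = max_gpu_layers(32, 4.2e9, 8e9)  # all 32 layers fit
```

LM Studio's hardware-aware defaults do this calculation for you; the estimate is mainly useful for predicting whether a given quantization will run fully on-GPU before you download it.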

```bash
# After downloading your GGUF model from Ertas Studio,
# copy it into LM Studio's models directory
# (default location; the exact path may vary by LM Studio version and OS)
mkdir -p ~/.lmstudio/models/my-model
cp ./my-model-Q4_K_M.gguf ~/.lmstudio/models/my-model/

# LM Studio auto-detects the model on next launch.
# Once loaded, the local API is available at:
curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-model",
    "messages": [{"role": "user", "content": "Summarize this report"}]
  }'
```
Copy your Ertas-exported GGUF model into LM Studio's models directory and query it via the local OpenAI-compatible API.

    Benefits

    • Intuitive drag-and-drop model import with zero configuration required
    • Hardware-aware defaults that automatically optimize GPU offloading and threading
    • Built-in chat interface for non-technical team members to evaluate fine-tuned models
    • OpenAI-compatible local API server for seamless application integration
    • Cross-platform support for Windows, macOS, and Linux with Apple Silicon optimization
    • No command-line knowledge required for model deployment and testing
