LM Studio Server API + Ertas

    Serve Ertas-trained models as local API endpoints using LM Studio's built-in server mode for application integration, development, and testing.

    Overview

    LM Studio is a desktop application for discovering, downloading, and running local language models. While it is widely known for its chat interface, LM Studio's server mode is equally powerful: it turns any loaded model into a fully functional OpenAI-compatible API server running on localhost. This local server exposes /v1/chat/completions, /v1/completions, and /v1/embeddings endpoints that are drop-in compatible with the OpenAI SDK, making it trivial to repoint any application from a cloud API to a local model.
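
    As a sketch of that compatibility, a plain `fetch` call against the chat completions endpoint needs no SDK at all. The model name `my-ertas-model` below is a placeholder for whatever model you have loaded:

    ```typescript
    // Placeholder model name; substitute the id LM Studio shows for your GGUF.
    interface ChatMessage {
      role: "system" | "user" | "assistant";
      content: string;
    }

    // Pure helper: build an OpenAI-style chat completion request body.
    function buildChatRequest(model: string, messages: ChatMessage[]) {
      return { model, messages, temperature: 0.7 };
    }

    // POST to the local server; no real API key or SDK required.
    async function askLocalModel(prompt: string): Promise<string> {
      const res = await fetch("http://localhost:1234/v1/chat/completions", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify(
          buildChatRequest("my-ertas-model", [{ role: "user", content: prompt }])
        ),
      });
      const data = await res.json();
      return data.choices[0].message.content;
    }
    ```

    The same request body works unchanged against OpenAI's hosted endpoint, which is what makes local-first development with LM Studio low-friction.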

    LM Studio's server mode is particularly valuable for development and testing workflows. Instead of burning API credits while iterating on prompts and application logic, developers can run their fine-tuned model locally through LM Studio and test against the same API contract they will use in production. The server provides request logging, performance metrics, and GPU utilization monitoring — giving developers visibility into how their model performs under different load patterns and context lengths. For teams that need a user-friendly way to serve models locally without managing Docker containers or CLI tools, LM Studio Server provides a one-click solution.

    How Ertas Integrates

    After fine-tuning a model in Ertas Studio, you download the GGUF file and load it directly into LM Studio. From there, enabling server mode is a single toggle — LM Studio immediately starts serving the model on a configurable port with full OpenAI API compatibility. Any application, framework, or tool that supports the OpenAI API can connect to your Ertas-trained model without code changes beyond updating the base URL.
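
    One way to make that base-URL switch painless is to resolve it from configuration once. A minimal sketch, assuming hypothetical `LLM_BASE_URL` and `LLM_API_KEY` environment variable names:

    ```typescript
    // Resolve the endpoint once, so swapping between cloud and local
    // deployments never touches application code. Variable names are
    // hypothetical; use whatever convention your project follows.
    function resolveConfig(env: Record<string, string | undefined>) {
      return {
        baseURL: env.LLM_BASE_URL ?? "http://localhost:1234/v1", // LM Studio default
        apiKey: env.LLM_API_KEY ?? "lm-studio", // LM Studio ignores the key value
      };
    }

    // Usage with the OpenAI SDK (not imported here):
    // const client = new OpenAI(resolveConfig(process.env));
    ```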

    This integration path is especially useful during the development phase of AI applications. Teams can fine-tune multiple model variants in Ertas Studio — different base models, different LoRA configurations, different quantization levels — and quickly switch between them in LM Studio to compare outputs. LM Studio's conversation view lets you test the model interactively while the server mode simultaneously serves it to your application. Once you have identified the best model configuration, you can deploy it to a production inference server like vLLM or Ertas Cloud while keeping LM Studio as your local development and debugging tool.
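
    The variant-comparison loop described above can be sketched as follows; the quantization-suffixed model names are hypothetical examples of Ertas exports, and you should use whatever ids LM Studio actually reports:

    ```typescript
    // Pure helper: one identical request body per model variant.
    interface ChatRequest {
      model: string;
      messages: { role: string; content: string }[];
      temperature: number;
    }

    function buildComparisonRequests(prompt: string, models: string[]): ChatRequest[] {
      return models.map((model) => ({
        model,
        messages: [{ role: "user", content: prompt }],
        temperature: 0, // near-deterministic output makes variants easier to compare
      }));
    }

    // Send the same prompt to each loaded variant and collect the replies.
    async function compareVariants(prompt: string, models: string[]) {
      const replies: Record<string, string> = {};
      for (const req of buildComparisonRequests(prompt, models)) {
        const res = await fetch("http://localhost:1234/v1/chat/completions", {
          method: "POST",
          headers: { "Content-Type": "application/json" },
          body: JSON.stringify(req),
        });
        const data = await res.json();
        replies[req.model] = data.choices[0].message.content;
      }
      return replies;
    }

    // Example (hypothetical variant ids):
    // await compareVariants("Summarize clause 4.", ["ertas-legal-7b-q4", "ertas-legal-7b-q8"]);
    ```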

    Getting Started

    1. Export your model from Ertas Studio

      Download the fine-tuned model in GGUF format from Ertas Studio. Choose the quantization level that balances quality and speed for your hardware.

    2. Load the model in LM Studio

      Open LM Studio and load your GGUF file. Configure the context length, GPU layers, and other inference parameters in the model settings panel.

    3. Enable server mode

      Toggle the server mode in LM Studio's server tab. The API server starts on localhost:1234 by default, exposing OpenAI-compatible endpoints.

    4. Connect your application

      Point your application to http://localhost:1234/v1 as the base URL. Use any OpenAI SDK or HTTP client — the API contract is identical to OpenAI's.

    5. Monitor and iterate

      Use LM Studio's built-in logging and metrics to monitor request latency, token throughput, and GPU utilization. Swap models without restarting the server to compare performance.
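
    Once server mode is on, the steps above can be smoke-tested from code: listing the loaded models confirms the endpoint is reachable before you wire up your application. A minimal sketch against the OpenAI-style models endpoint:

    ```typescript
    const BASE_URL = "http://localhost:1234/v1"; // default port; configurable in LM Studio

    // Pure helper: build the models-list URL, tolerating a trailing slash.
    function modelsEndpoint(baseUrl: string): string {
      return `${baseUrl.replace(/\/+$/, "")}/models`;
    }

    // GET /v1/models returns the ids of the models LM Studio has loaded.
    async function listModels(): Promise<string[]> {
      const res = await fetch(modelsEndpoint(BASE_URL));
      if (!res.ok) {
        throw new Error(`Server returned ${res.status} -- is server mode enabled?`);
      }
      const data = await res.json();
      return data.data.map((m: { id: string }) => m.id);
    }
    ```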

    ```typescript
    import OpenAI from "openai";

    // Connect to LM Studio's local server running your Ertas-trained model
    const client = new OpenAI({
      baseURL: "http://localhost:1234/v1",
      apiKey: "lm-studio", // LM Studio doesn't require a real key
    });

    async function analyzeContract(text: string) {
      const response = await client.chat.completions.create({
        model: "ertas-legal-7b",
        messages: [
          { role: "system", content: "You are a contract analyst. Extract key terms and obligations." },
          { role: "user", content: `Analyze this contract clause:\n\n${text}` },
        ],
        temperature: 0.1,
        max_tokens: 1024,
      });

      return response.choices[0].message.content;
    }

    // Works identically to calling OpenAI's API
    const analysis = await analyzeContract("The Licensee shall pay...");
    console.log(analysis);
    ```

    Use LM Studio's local server with the standard OpenAI TypeScript SDK to integrate your Ertas-trained model into any application.
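
    For chat-style UIs you will usually want streaming as well. The same endpoint accepts `stream: true` and returns server-sent events; as a sketch, here is a version that parses the event stream without the SDK (the parsing helper is pure, and real code would also buffer JSON split across reads):

    ```typescript
    // Pure helper: extract the token text from one SSE "data: ..." line.
    // Returns "" for comments, blank lines, and the final [DONE] sentinel.
    function parseSseLine(line: string): string {
      if (!line.startsWith("data: ")) return "";
      const payload = line.slice(6).trim();
      if (payload === "[DONE]") return "";
      try {
        const json = JSON.parse(payload);
        return json.choices?.[0]?.delta?.content ?? "";
      } catch {
        return ""; // partial JSON; a robust client would buffer across reads
      }
    }

    // Stream a completion and print tokens as they arrive.
    async function streamCompletion(prompt: string): Promise<void> {
      const res = await fetch("http://localhost:1234/v1/chat/completions", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({
          model: "ertas-legal-7b",
          messages: [{ role: "user", content: prompt }],
          stream: true,
        }),
      });
      const reader = res.body!.getReader();
      const decoder = new TextDecoder();
      let buffer = "";
      for (;;) {
        const { done, value } = await reader.read();
        if (done) break;
        buffer += decoder.decode(value, { stream: true });
        const lines = buffer.split("\n");
        buffer = lines.pop() ?? ""; // keep any incomplete trailing line
        for (const line of lines) process.stdout.write(parseSseLine(line));
      }
    }
    ```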

    Benefits

    • One-click server mode with zero CLI or Docker configuration
    • Full OpenAI API compatibility for seamless application integration
    • Built-in request logging and performance metrics for debugging
    • Hot-swap models without restarting the server during development
    • GPU layer offloading controls for optimal performance on any hardware
    • Interactive chat and API server running simultaneously for testing
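
    The /v1/embeddings endpoint mentioned in the overview follows the same request pattern. As a sketch, assuming an embedding-capable model is loaded (the model name below is a placeholder), here is a batch embedding call paired with a cosine similarity helper:

    ```typescript
    // Pure helper: cosine similarity between two equal-length vectors.
    function cosineSimilarity(a: number[], b: number[]): number {
      let dot = 0;
      let normA = 0;
      let normB = 0;
      for (let i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
      }
      return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // POST /v1/embeddings with a batch of texts; model name is a placeholder.
    async function embed(texts: string[]): Promise<number[][]> {
      const res = await fetch("http://localhost:1234/v1/embeddings", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ model: "my-embedding-model", input: texts }),
      });
      const data = await res.json();
      return data.data.map((d: { embedding: number[] }) => d.embedding);
    }
    ```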
