Text Generation Web UI + Ertas

    Load Ertas-trained GGUF models into oobabooga's Text Generation Web UI for advanced inference with multiple backends, character presets, extension support, and a Gradio-based interface.

    Overview

    Text Generation Web UI (commonly known as oobabooga) is one of the most feature-rich open-source interfaces for running large language models locally. Built on Gradio, it provides a browser-based UI with support for multiple inference backends including llama.cpp, ExLlamaV2, Transformers, and AutoGPTQ. The interface offers chat mode, instruct mode, notebook mode, and a comprehensive set of generation parameters, making it a powerful workbench for model evaluation, prompt engineering, and creative text generation.

    The tool's extension system adds capabilities like long-term memory, web search, voice input/output, multimodal vision, and API endpoints. For teams evaluating fine-tuned models, Text Generation Web UI's ability to load multiple models and switch between them in the same session makes it invaluable for A/B testing and quality comparison. Its rich parameter controls — including samplers, repetition penalties, and grammar constraints — allow thorough testing of model behavior across different generation configurations.

    How Ertas Integrates

    After completing a fine-tuning job in Ertas Studio, you can download the model in GGUF format and load it directly into Text Generation Web UI's llama.cpp backend. Place the GGUF file in the tool's models directory, select it from the Model tab, and configure the inference parameters. The UI detects the model architecture from the GGUF metadata embedded by Ertas during export and uses it to suggest sensible defaults for context length and GPU layer offloading, along with a thread count derived from your hardware.
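
    If you want to confirm what that metadata contains (architecture, context length, tokenizer settings), you can inspect the file before loading it. The sketch below assumes the gguf package from PyPI, which ships a small gguf-dump utility; the file path matches the example used later in this guide.

    bash
    # Inspect the GGUF header and metadata keys that the loader reads.
    # Exact key names vary by architecture, e.g. llama.context_length.
    pip install gguf
    gguf-dump ./text-generation-webui/models/my-model-Q4_K_M.gguf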

    Text Generation Web UI is particularly valuable during the fine-tuning iteration cycle with Ertas. Because you can switch models from the Model tab without restarting the server, you can load the base model and your fine-tuned version in the same session and run the same prompts through each to directly observe the impact of training. The notebook mode provides a scratchpad for testing complex prompts, while the API extension exposes an OpenAI-compatible endpoint for automated evaluation scripts. This makes the tool an ideal complement to Ertas for teams that need thorough model evaluation before production deployment.
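
    As a starting point for that kind of automated comparison, the sketch below runs a fixed prompt set against whatever model is currently loaded and writes the outputs to a labelled file; run it once per model, switching models in the Model tab between runs. It assumes the API extension is listening on its default port 5000; prompts.txt and the output file naming are illustrative conventions, not part of Text Generation Web UI.

    bash
    # Send each line of prompts.txt to the OpenAI-compatible endpoint and
    # collect the responses; requires curl and jq.
    MODEL_TAG=${1:-finetuned}   # label used only for the output file name
    while IFS= read -r prompt; do
      curl -s http://localhost:5000/v1/chat/completions \
        -H "Content-Type: application/json" \
        -d "$(jq -n --arg p "$prompt" \
              '{messages: [{role: "user", content: $p}], max_tokens: 256, temperature: 0.7}')" \
        | jq -r '.choices[0].message.content' >> "outputs-${MODEL_TAG}.txt"
    done < prompts.txt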

    Getting Started

    1. Fine-tune your model in Ertas Studio

      Configure and run your training job on the Ertas canvas with your JSONL dataset. Monitor loss curves and validation metrics throughout the training process.

    2. Export as GGUF

      Download your fine-tuned model in GGUF format from Ertas Studio. Choose a quantization level that matches your evaluation hardware.

    3. Place model in the models directory

      Copy the downloaded GGUF file into Text Generation Web UI's models/ directory. The tool scans this directory on startup and when you click Refresh in the Model tab.

    4. Load model with llama.cpp backend

      In the Model tab, select your model from the dropdown and choose the llama.cpp loader. Configure GPU layers, context size, and thread count, then click Load.

    5. Evaluate in chat and notebook modes

      Switch between chat mode for conversational testing and notebook mode for free-form prompt experimentation. Adjust sampling parameters to explore model behavior under different generation settings.

    6. Enable the API extension

      Activate the OpenAI-compatible API extension to serve your model over HTTP. Use this endpoint for automated evaluation scripts or to integrate with other development tools.

    bash
    # After downloading the GGUF model from Ertas Studio,
    # copy it to the text-generation-webui models directory
    cp ./my-model-Q4_K_M.gguf ./text-generation-webui/models/
    
    # Launch Text Generation Web UI with the API extension enabled
    cd text-generation-webui
    python server.py --model my-model-Q4_K_M.gguf \
      --loader llama.cpp \
      --n-gpu-layers 35 \
      --api \
      --listen
    
    # The web UI is available at http://localhost:7860
    # The API endpoint is available at http://localhost:5000
    Load your Ertas-exported GGUF model in Text Generation Web UI with the llama.cpp backend and API extension for evaluation and serving.
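
    Once the server is up, a quick smoke test confirms the model is reachable through the API extension. The request below uses standard OpenAI-style fields; which additional sampler parameters the extension forwards to the loader depends on your Text Generation Web UI version, so treat anything beyond the basics as version-dependent.

    bash
    # List the models the API reports as available
    curl -s http://localhost:5000/v1/models

    # Send a single test generation; the "model" field is generally ignored
    # in favor of whatever model is currently loaded in the UI
    curl -s http://localhost:5000/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{
            "messages": [{"role": "user", "content": "Summarize your fine-tuning domain in one sentence."}],
            "max_tokens": 200,
            "temperature": 0.7,
            "top_p": 0.9
          }'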

    Benefits

    • Multiple inference backends (llama.cpp, ExLlamaV2, Transformers) for flexibility
    • Side-by-side model comparison for evaluating fine-tuning improvements
    • Rich sampling parameter controls for thorough model behavior testing
    • Extension ecosystem with long-term memory, web search, and vision support
    • Notebook mode for free-form prompt engineering and experimentation
    • Browser-based UI accessible from any device on the local network
