Text Generation Web UI + Ertas
Load Ertas-trained GGUF models into oobabooga's Text Generation Web UI for advanced inference with multiple backends, character presets, extension support, and a Gradio-based interface.
Overview
Text Generation Web UI (commonly known as oobabooga) is one of the most feature-rich open-source interfaces for running large language models locally. Built on Gradio, it provides a browser-based UI with support for multiple inference backends including llama.cpp, ExLlamaV2, Transformers, and AutoGPTQ. The interface offers chat mode, instruct mode, notebook mode, and a comprehensive set of generation parameters, making it a powerful workbench for model evaluation, prompt engineering, and creative text generation.
The tool's extension system adds capabilities like long-term memory, web search, voice input/output, multimodal vision, and API endpoints. For teams evaluating fine-tuned models, Text Generation Web UI's ability to load multiple models and switch between them in the same session makes it invaluable for A/B testing and quality comparison. Its rich parameter controls — including samplers, repetition penalties, and grammar constraints — allow thorough testing of model behavior across different generation configurations.
How Ertas Integrates
After completing a fine-tuning job in Ertas Studio, you can download the model in GGUF format and load it directly into Text Generation Web UI's llama.cpp backend. Place the GGUF file in the tool's models directory, select it from the Model tab, and configure the inference parameters. The UI automatically detects the model architecture and provides sensible defaults for context length, GPU layer offloading, and thread allocation based on the GGUF metadata embedded by Ertas during export.
Text Generation Web UI is particularly valuable during the fine-tuning iteration cycle with Ertas. Its side-by-side comparison features let you load a base model and your fine-tuned version simultaneously, running the same prompts through both to directly observe the impact of training. The notebook mode provides a scratchpad for testing complex prompts, while the API extension exposes an OpenAI-compatible endpoint for automated evaluation scripts. This makes the tool an ideal complement to Ertas for teams that need thorough model evaluation before production deployment.
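The automated-evaluation workflow described above can be sketched in a few lines of Python. This is a minimal sketch, not a definitive client: it assumes the API extension is serving an OpenAI-compatible chat completions endpoint at `http://localhost:5000/v1`, and the model names and helper functions (`build_chat_request`, `query`) are illustrative, not part of any official SDK.

```python
import json
from urllib import request

# Assumed endpoint exposed by the OpenAI-compatible API extension.
API_URL = "http://localhost:5000/v1/chat/completions"


def build_chat_request(prompt, model, temperature=0.7, max_tokens=256):
    """Build an OpenAI-compatible chat completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "max_tokens": max_tokens,
    }


def query(prompt, model):
    """POST a prompt to the local endpoint and return the reply text."""
    payload = json.dumps(build_chat_request(prompt, model)).encode("utf-8")
    req = request.Request(
        API_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Usage sketch: run the same prompt through the base model and the
# fine-tuned export to observe the impact of training. The model names
# here are placeholders for whatever is loaded in the Web UI.
# for model in ("base-model-Q4_K_M", "my-model-Q4_K_M"):
#     print(f"--- {model} ---")
#     print(query("Summarize our refund policy in two sentences.", model))
```

Because the payload builder is a pure function, the same request bodies can be reused from a test harness or batch evaluation script without the Web UI running.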
Getting Started
1. Fine-tune your model in Ertas Studio
Configure and run your training job on the Ertas canvas with your JSONL dataset. Monitor loss curves and validation metrics throughout the training process.
2. Export as GGUF
Download your fine-tuned model in GGUF format from Ertas Studio. Choose a quantization level that matches your evaluation hardware.
3. Place the model in the models directory
Copy the downloaded GGUF file into Text Generation Web UI's models/ directory. The tool scans this directory on startup and when you click Refresh in the Model tab.
4. Load the model with the llama.cpp backend
In the Model tab, select your model from the dropdown and choose the llama.cpp loader. Configure GPU layers, context size, and thread count, then click Load.
5. Evaluate in chat and notebook modes
Switch between chat mode for conversational testing and notebook mode for free-form prompt experimentation. Adjust sampling parameters to explore model behavior under different generation settings.
6. Enable the API extension
Activate the OpenAI-compatible API extension to serve your model over HTTP. Use this endpoint for automated evaluation scripts or to integrate with other development tools.
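Step 5's parameter exploration can also be scripted rather than clicked through, using the endpoint from step 6. The sketch below, under the assumption that the API accepts standard sampling fields such as `temperature` and `top_p` in the request body (and, as an unverified assumption, `repetition_penalty` as an extra field), enumerates a grid of sampling configurations to sweep during evaluation; `sampling_grid` is an illustrative helper, not part of the Web UI.

```python
from itertools import product


def sampling_grid(temperatures, top_ps, repetition_penalties=(1.0,)):
    """Enumerate sampling configurations to sweep during evaluation."""
    return [
        {"temperature": t, "top_p": p, "repetition_penalty": r}
        for t, p, r in product(temperatures, top_ps, repetition_penalties)
    ]


# Example sweep: 2 temperatures x 2 top_p values = 4 configurations.
# Each config would be merged into the request body sent to the
# OpenAI-compatible endpoint (assumed at http://localhost:5000/v1),
# alongside "model" and "messages", one request per configuration.
configs = sampling_grid(temperatures=(0.2, 0.8), top_ps=(0.9, 1.0))
print(len(configs))
```

Running the same prompt across the grid makes regressions in sampling robustness visible early, before the model is promoted to production.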
# After downloading the GGUF model from Ertas Studio,
# copy it to the text-generation-webui models directory
cp ./my-model-Q4_K_M.gguf ./text-generation-webui/models/
# Launch Text Generation Web UI with the API extension enabled
cd text-generation-webui
python server.py --model my-model-Q4_K_M.gguf \
--loader llama.cpp \
--n-gpu-layers 35 \
--api \
--listen
# The web UI is available at http://localhost:7860
# The API endpoint is available at http://localhost:5000
Benefits
- Multiple inference backends (llama.cpp, ExLlamaV2, Transformers) for flexibility
- Side-by-side model comparison for evaluating fine-tuning improvements
- Rich sampling parameter controls for thorough model behavior testing
- Extension ecosystem with long-term memory, web search, and vision support
- Notebook mode for free-form prompt engineering and experimentation
- Browser-based UI accessible from any device on the local network
Related Resources
Fine-Tuning
GGUF
Inference
LoRA
Quantization
Getting Started with Ertas: Fine-Tune and Deploy Custom AI Models
Introducing Ertas Studio: A Visual Canvas for Fine-Tuning AI Models
Self-Hosted AI for Indie Apps: Replace GPT-4 with Your Own Model
KoboldCpp
llama.cpp
Ollama
Ertas for SaaS Product Teams
Ertas for Customer Support
Ertas for ML Engineers & Fine-Tuning Practitioners
Ship AI that runs on your users' devices.
Early bird pricing starts at $14.50/mo — locked in for life. Plans for builders and agencies.