Fine-Tune OpenChat with Ertas

    A 7-billion-parameter model fine-tuned from Mistral 7B using Conditioned Reinforcement Learning Fine-Tuning (C-RLFT), achieving GPT-3.5-level performance through a novel mixed-quality data training approach.


    Overview

    OpenChat 3.5 is an open-source language model developed by the OpenChat team, fine-tuned from Mistral 7B using an innovative training methodology called Conditioned Reinforcement Learning Fine-Tuning (C-RLFT). Released in November 2023, OpenChat 3.5 was one of the first 7B models to achieve performance comparable to ChatGPT (GPT-3.5 Turbo) on standardized benchmarks, a significant milestone for open-source language models.

    The key innovation behind OpenChat is its approach to handling mixed-quality training data. Rather than requiring all training data to be high quality (expensive to curate) or treating all data equally (which degrades quality), C-RLFT assigns quality conditions to different data sources and trains the model to generate responses conditioned on the desired quality level. During inference, the model is conditioned to produce the highest quality output, effectively learning to distinguish and produce superior responses.

    OpenChat 3.5 was trained on a mixture of data sources including ShareGPT conversations, open-source instruction datasets, and code datasets. The C-RLFT approach allows the model to benefit from all this data — even lower-quality examples — while still generating high-quality outputs when prompted with the appropriate quality conditioning.

    The model uses Mistral 7B's architecture, inheriting sliding window attention and grouped-query attention, and was trained with an 8K token context window. It is released under the Apache 2.0 license, making it freely available for commercial use.

    Key Features

    Conditioned Reinforcement Learning Fine-Tuning (C-RLFT) is OpenChat's core methodological contribution. The technique prepends quality-condition tokens to training examples based on their estimated quality level. High-quality examples (e.g., GPT-4-generated responses) receive a positive quality condition, while lower-quality examples receive a different condition. During training, the model learns the association between quality conditions and response quality. At inference time, always using the positive quality condition prompts the model to generate its best responses.
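    In OpenChat's published chat format, the quality condition appears as a prefix on each turn (high-quality data is tagged "GPT4 Correct"). A minimal sketch of how a single-turn prompt is assembled:

```python
def format_turn(user_msg: str, condition: str = "GPT4 Correct") -> str:
    """Build a single-turn prompt in OpenChat's conversation format.

    The `condition` prefix acts as the quality-condition token:
    high-quality training data is tagged "GPT4 Correct", while
    lower-quality sources receive a different prefix. At inference
    time the high-quality prefix is always used, steering the model
    toward its best responses.
    """
    return f"{condition} User: {user_msg}<|end_of_turn|>{condition} Assistant:"

prompt = format_turn("Explain C-RLFT in one sentence.")
# The model then generates its completion after "GPT4 Correct Assistant:"
```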

    This approach solves a practical problem: high-quality training data is scarce and expensive, but large amounts of mixed-quality data are readily available. C-RLFT allows models to learn from all available data without being contaminated by lower-quality examples, effectively increasing the useful training data volume without sacrificing output quality.
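    Conceptually, the training objective can be viewed as a quality-weighted average of per-example losses, so lower-quality data still contributes signal without dominating. The sketch below illustrates only the weighting idea; the weights and normalization are made up here, not taken from the C-RLFT paper:

```python
def crlft_weighted_loss(per_example_losses, quality_weights):
    """Quality-weighted average of per-example SFT losses (illustrative).

    per_example_losses: standard cross-entropy loss per training example.
    quality_weights: per-example weight derived from the data source's
    quality condition, e.g. 1.0 for GPT-4-generated responses and a
    smaller value for lower-quality sources (values are illustrative).
    """
    assert len(per_example_losses) == len(quality_weights)
    total = sum(w * loss for w, loss in zip(quality_weights, per_example_losses))
    return total / sum(quality_weights)

# Two examples: a high-quality one (weight 1.0) and a noisier one (0.5).
loss = crlft_weighted_loss([2.0, 4.0], [1.0, 0.5])
```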

    OpenChat 3.5 achieves particularly strong performance on conversational and reasoning benchmarks. On MT-Bench (a multi-turn conversation benchmark), it scores competitively with GPT-3.5 and significantly above other 7B models. The model also performs well on coding tasks and mathematical reasoning, benefiting from code and math data included in the training mixture.

    Fine-Tuning with Ertas

    OpenChat 3.5 is an excellent base model for fine-tuning in Ertas Studio. Built on Mistral 7B, it requires the same modest resources: 8-10GB VRAM for QLoRA fine-tuning, achievable on consumer GPUs including the RTX 3080, RTX 4070 Ti, or Apple Silicon with 16GB unified memory. The model's pre-existing high-quality alignment means less fine-tuning data is needed to achieve good results.

    Since OpenChat is already trained to produce high-quality responses, domain-specific fine-tuning in Ertas Studio serves to specialize rather than align the model. Upload your domain dataset, select OpenChat 3.5 as the base model, and configure LoRA parameters. Recommended settings include LoRA rank 16-32, learning rate 2e-4, and 2-3 training epochs. Small datasets of 2,000-10,000 examples typically produce strong results.
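    Ertas Studio exposes these settings as form fields. Written out as a plain configuration for reference (the key names below are illustrative, not Ertas's actual API), the recommended starting point looks like:

```python
# Illustrative QLoRA settings for fine-tuning OpenChat 3.5.
# Key names are hypothetical; Ertas Studio presents these as form fields.
openchat_finetune_config = {
    "base_model": "openchat/openchat_3.5",
    "method": "qlora",        # 4-bit quantized base weights + LoRA adapters
    "lora_rank": 16,          # recommended range: 16-32
    "lora_alpha": 32,         # a common convention is 2x the rank
    "learning_rate": 2e-4,
    "epochs": 3,              # recommended range: 2-3
    "dataset_size_hint": "2,000-10,000 examples typically suffice",
}
```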

    After training, export to GGUF for deployment through Ollama or llama.cpp. OpenChat's 7B size means the resulting models are compact and fast — approximately 4.4GB at Q4_K_M — suitable for deployment on edge devices, laptops, and cost-sensitive cloud instances. The combination of GPT-3.5-class quality with 7B-class resource requirements makes fine-tuned OpenChat models exceptionally cost-effective.
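    For serving through Ollama, a minimal Modelfile can point at the exported GGUF file. The file name below is a placeholder, and the TEMPLATE line reproduces OpenChat's published chat format:

```
# Modelfile — file name is a placeholder for your exported adapter-merged model
FROM ./openchat-3.5-finetuned.Q4_K_M.gguf
PARAMETER temperature 0.7
TEMPLATE "GPT4 Correct User: {{ .Prompt }}<|end_of_turn|>GPT4 Correct Assistant:"
```

    Register and run it with `ollama create my-openchat -f Modelfile` followed by `ollama run my-openchat`.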

    Use Cases

    OpenChat excels as a general-purpose conversational assistant where cost efficiency matters. Its GPT-3.5-class quality at 7B-class resource requirements makes it ideal for applications that previously required API access to proprietary models: customer-facing chatbots, content generation tools, email and writing assistants, and interactive help systems.

    The model is particularly well-suited for startups and small businesses that want to deploy conversational AI without ongoing API costs. A fine-tuned OpenChat model running locally through Ollama provides unlimited inference at zero marginal cost, with quality sufficient for most conversational applications.

    OpenChat also serves as a strong baseline for comparing training methodologies. The C-RLFT approach is applicable to other base models, and researchers use OpenChat as a reference implementation when developing new fine-tuning techniques. The model's reproducible training pipeline and clear documentation make it valuable for academic research on alignment and fine-tuning strategies.

    Hardware Requirements

    OpenChat 3.5 has identical hardware requirements to Mistral 7B. At Q4_K_M quantization, it requires approximately 4.4GB of RAM — comfortable on any machine with 8GB RAM, any GPU with 6GB+ VRAM, and Apple Silicon with 8GB unified memory. At Q8_0, expect approximately 7.7GB. Full FP16 requires approximately 14.5GB VRAM.
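    These figures follow directly from parameter count times bits per weight. A quick back-of-the-envelope estimate (the effective bit widths below are approximate, and runtime use adds KV cache and framework overhead on top of the file size):

```python
def gguf_size_gb(n_params_billion: float, bits_per_weight: float) -> float:
    """Rough GGUF file size: parameters x effective bits per weight.

    Approximate effective bits per weight: Q4_K_M ~4.85, Q8_0 ~8.5,
    F16 = 16. Actual files vary slightly by tensor layout.
    """
    return n_params_billion * 1e9 * bits_per_weight / 8 / 1e9

q4_size = gguf_size_gb(7.2, 4.85)   # roughly 4.4 GB for a ~7.2B model
f16_size = gguf_size_gb(7.2, 16.0)  # roughly 14.4 GB
```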

    Inference performance is excellent: 50-70 tokens per second on RTX 4090, 20-35 tokens per second on RTX 4070, and 15-25 tokens per second on M2 Pro. CPU inference yields 5-12 tokens per second on modern hardware, making OpenChat usable without a dedicated GPU for lighter workloads.

    For fine-tuning with QLoRA in Ertas Studio, 8-10GB VRAM is sufficient. Full LoRA requires 16-18GB. The model's small size enables rapid training iteration — expect 30-90 minutes for a complete fine-tuning run with 5,000-10,000 examples on a single consumer GPU. This fast turnaround supports iterative development workflows where you refine the dataset and retrain multiple times.

    Supported Quantizations

    Q4_0, Q4_K_M, Q5_K_M, Q6_K, Q8_0, F16
