Fine-Tune Yi with Ertas
Yi is 01.AI's bilingual Chinese-English model family, available in 6B, 9B, and 34B sizes and known for strong performance on both Chinese and English benchmarks along with solid instruction-following capabilities.
Overview
Yi is a family of bilingual large language models developed by 01.AI, the AI company founded by Dr. Kai-Fu Lee. The Yi series was among the first Chinese-developed open-weight models to achieve globally competitive performance, consistently ranking near the top of independent benchmarks like the Open LLM Leaderboard and the Chatbot Arena.
The current generation includes Yi-1.5 models in 6B, 9B, and 34B sizes, trained on approximately 3.6 trillion tokens of high-quality multilingual data with a strong emphasis on Chinese and English content. The 34B model, in particular, punches well above its weight — it frequently outperforms 70B-class models on Chinese language tasks and competes strongly with them on English tasks as well.
Architecturally, Yi uses a standard dense transformer decoder with grouped-query attention, SwiGLU activations, and RoPE positional embeddings. The models support a 200K token context window through YaRN-based context extension, enabling processing of extremely long documents — one of the longest context windows available in the sub-40B parameter class.
The Yi-1.5 models are released under the Apache 2.0 license, making them freely available for commercial use. The models have been particularly popular in Chinese-speaking markets and among developers building bilingual applications that serve Chinese and English users.
Key Features
Bilingual excellence is Yi's defining strength. The models were trained with a carefully balanced mixture of Chinese and English data, producing models that are genuinely fluent in both languages rather than being primarily English-centric with Chinese as an afterthought. The tokenizer uses a 64K vocabulary optimized for efficient encoding of both Chinese characters and English text.
The 200K token context window is exceptional for models in this size range. This enables processing of book-length Chinese documents, extensive code repositories, and very long conversation histories. The YaRN-based scaling approach maintains quality even at extreme context lengths, making Yi a strong choice for document-heavy applications.
Yi demonstrates particularly strong performance on tasks requiring cultural understanding and nuanced language use. Chinese language tasks often involve cultural context, idiomatic expressions, and stylistic conventions that English-centric models handle poorly. Yi's training data includes extensive Chinese literary, technical, and conversational content, producing responses that feel natural and culturally appropriate.
Fine-Tuning with Ertas
Yi models are popular fine-tuning targets in Ertas Studio, especially for building bilingual Chinese-English applications. The 6B model requires 6-10GB VRAM with QLoRA, the 9B needs 8-12GB, and the 34B needs 20-24GB — all accessible on standard GPU hardware. The 9B model offers a particularly sweet spot for bilingual fine-tuning, providing strong quality with moderate resource requirements.
For bilingual fine-tuning, prepare your dataset with examples in both Chinese and English. Ertas Studio's data processing pipeline handles the mixed-language tokenization automatically. The Yi tokenizer's balanced vocabulary means both languages train efficiently without one dominating the gradient updates. Include a mix of Chinese-only, English-only, and cross-language tasks (such as translation or bilingual summarization) for the best results.
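A bilingual dataset along these lines can be sketched as a chat-style JSONL file. The filename and the exact message schema here are illustrative assumptions (check your Ertas Studio pipeline for the expected format); the point is the mix of Chinese-only, English-only, and cross-language examples.

```python
import json

# Illustrative bilingual SFT examples: one Chinese-only, one English-only,
# and one cross-language (translation) task.
examples = [
    {"messages": [
        {"role": "user", "content": "用一句话解释什么是微调。"},
        {"role": "assistant",
         "content": "微调是在预训练模型的基础上，用特定领域数据继续训练以适应具体任务。"}]},
    {"messages": [
        {"role": "user", "content": "Explain fine-tuning in one sentence."},
        {"role": "assistant",
         "content": "Fine-tuning continues training a pretrained model on task-specific data."}]},
    {"messages": [
        {"role": "user", "content": "Translate to English: 模型在两种语言上都表现出色。"},
        {"role": "assistant",
         "content": "The model performs well in both languages."}]},
]

# ensure_ascii=False keeps Chinese characters readable in the output file.
with open("bilingual_sft.jsonl", "w", encoding="utf-8") as f:
    for ex in examples:
        f.write(json.dumps(ex, ensure_ascii=False) + "\n")
```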
After training, export to GGUF format for deployment. The Yi 34B at Q4_K_M quantization produces a model of approximately 20GB that delivers exceptional bilingual capability — competitive with much larger models on Chinese tasks. Deploy through Ollama or llama.cpp, both of which support Yi's chat template natively.
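The export-and-quantize step typically goes through llama.cpp's conversion tooling. The helper below just assembles the two commands as argv lists; the script and binary names (`convert_hf_to_gguf.py`, `llama-quantize`) match recent llama.cpp releases but may differ in older checkouts, and the paths are placeholders for your own.

```python
def gguf_export_commands(model_dir: str, out_name: str, quant: str = "Q4_K_M"):
    """Build the llama.cpp HF->GGUF conversion and quantization commands.

    Returns two argv lists: first convert the fine-tuned HF checkpoint to
    an FP16 GGUF, then quantize that file to the requested scheme.
    """
    f16_file = f"{out_name}-f16.gguf"
    quant_file = f"{out_name}-{quant}.gguf"
    convert = ["python", "convert_hf_to_gguf.py", model_dir,
               "--outfile", f16_file, "--outtype", "f16"]
    quantize = ["./llama-quantize", f16_file, quant_file, quant]
    return convert, quantize

convert_cmd, quantize_cmd = gguf_export_commands("./yi-34b-finetuned", "yi-34b")
print(convert_cmd)
print(quantize_cmd)
# Run each with subprocess.run(cmd, check=True) from a llama.cpp checkout.
```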
Use Cases
Yi is the top choice for applications serving Chinese-speaking users or requiring bilingual Chinese-English capability. Customer service platforms, content generation systems, and conversational AI for the Chinese market all benefit from Yi's natural Chinese fluency. The model understands Chinese cultural context, business etiquette, and communication styles in ways that most Western-developed models do not.
Bilingual applications are a major use case: translation between Chinese and English, cross-language information retrieval, bilingual content creation, and international business communication tools. Fine-tuned Yi models can serve as interpreters that understand domain-specific terminology in both languages.
The 200K context window makes Yi especially valuable for Chinese document processing: analyzing lengthy government documents, legal contracts, technical manuals, and literary works. Combined with RAG systems, Yi can serve as an intelligent assistant for Chinese-language knowledge bases, research archives, and enterprise document management systems.
Hardware Requirements
Yi 6B at Q4_K_M quantization requires approximately 3.8GB of RAM, suitable for laptops and consumer GPUs. The 9B model needs about 5.5GB, and the 34B needs about 20GB. The 34B model at Q4_K_M runs well on RTX 4090 24GB or Apple M-series Macs with 32GB unified memory, delivering 15-25 tokens per second.
At Q8_0 quantization, the 6B needs about 6.5GB, the 9B about 9.7GB, and the 34B about 36GB. Full FP16 inference for the 34B requires approximately 68GB VRAM, fitting on a single A100 80GB. The 6B and 9B models at FP16 require 12GB and 18GB respectively, easily accommodated by consumer GPUs.
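The figures above follow from a simple bits-per-weight calculation. As a sanity check, the sketch below reproduces them approximately; the effective bits-per-weight values (Q4_K_M averages above 4 bits because it mixes block sizes) and the parameter counts are assumptions, and real memory use adds a KV cache and runtime buffers on top.

```python
def model_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Weights-only size estimate: parameters x bits per weight.
    Excludes KV cache and runtime buffers (budget an extra 1-2 GB)."""
    return n_params * bits_per_weight / 8 / 1e9

# Approximate effective bits per weight for common schemes.
SCHEMES = {"Q4_K_M": 4.85, "Q8_0": 8.5, "FP16": 16.0}

for name, params in [("Yi-6B", 6.1e9), ("Yi-9B", 8.8e9), ("Yi-34B", 34.4e9)]:
    sizes = {q: round(model_size_gb(params, b), 1) for q, b in SCHEMES.items()}
    print(name, sizes)
```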
For fine-tuning in Ertas Studio, the 6B needs 6-10GB VRAM (QLoRA), the 9B needs 8-12GB, and the 34B needs 20-24GB. The 34B model, despite its higher resource requirements, is highly recommended for production bilingual applications due to its significant quality advantage over the smaller variants.