Fine-Tune Command R with Ertas

    Cohere's enterprise-focused model family in 35B and 104B sizes, purpose-built for retrieval-augmented generation (RAG) with native citation support, tool use, and multilingual capability across 10+ languages.


    Overview

    Command R is Cohere's family of open-weight enterprise models, designed specifically for retrieval-augmented generation (RAG) and production deployment scenarios. The family includes Command R (35B parameters) and Command R+ (104B parameters), both optimized for tasks that involve grounding model outputs in retrieved documents — a critical requirement for enterprise AI applications where accuracy and traceability are paramount.

    Unlike general-purpose models that treat RAG as an afterthought, Command R was architected from the ground up for grounded generation. The models include native citation capabilities — when generating responses based on provided documents, Command R automatically produces inline citations pointing to the specific source passages that support each claim. This built-in grounding mechanism significantly reduces hallucination and provides users with verifiable references.

    Command R supports a 128K token context window, enabling processing of many retrieved documents simultaneously. The model was trained on data spanning 10+ languages with particular strength in English, French, Spanish, Italian, German, Portuguese, Japanese, Korean, Arabic, and Chinese. The 35B model offers an excellent balance of quality and efficiency for production RAG systems.

    Both models are released under the CC-BY-NC license for research and non-commercial use, with a separate commercial license available from Cohere. The models have found strong adoption in enterprise environments where RAG quality, citation accuracy, and multilingual support are critical requirements.

    Key Features

    Native citation generation is Command R's most distinctive feature. When provided with a set of source documents and a query, the model generates responses with inline citations that reference specific passages from the provided documents. This is not a post-processing step — the model was trained to produce citations as an integral part of its generation process, resulting in more accurate and natural citation placement than bolt-on citation systems.
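The citation metadata described above can be rendered into inline markers with a few lines of code. This is a minimal sketch assuming citations arrive as character spans with `start`/`end` offsets and `document_ids`, which mirrors Cohere's API response shape; adapt the field names to whatever your serving stack actually emits.

```python
# Sketch of rendering Command R-style inline citations. Field names
# (start, end, document_ids) follow Cohere's API shape but are
# assumptions here -- verify against your serving layer's output.

def render_citations(text, citations):
    """Insert [doc_id] markers after each cited span. start/end are
    character offsets into `text`."""
    # Process right-to-left so earlier offsets stay valid as we insert.
    for cite in sorted(citations, key=lambda c: c["end"], reverse=True):
        marker = "[" + ",".join(cite["document_ids"]) + "]"
        text = text[:cite["end"]] + marker + text[cite["end"]:]
    return text

response_text = "Revenue grew 12% in Q3, driven by the EMEA region."
citations = [
    {"start": 0, "end": 22, "document_ids": ["doc_0"]},
    {"start": 34, "end": 49, "document_ids": ["doc_1"]},
]
print(render_citations(response_text, citations))
# -> Revenue grew 12% in Q3[doc_0], driven by the EMEA region[doc_1].
```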

    Tool use is deeply integrated into Command R's capabilities. The model can plan multi-step tool interactions, handle tool call results, and synthesize information from multiple tool calls into coherent responses. This is designed for enterprise workflows where the model needs to interact with databases, APIs, search engines, and other business systems.
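A tool-use integration typically boils down to a dispatch loop: the model emits structured tool calls, your code executes them, and the results are fed back for the final response. The sketch below assumes a call shape of `{"name": ..., "parameters": {...}}`, which mirrors Cohere's tool-use format; `search_orders` is a hypothetical stand-in for a real business system.

```python
# Minimal dispatch loop for Command R-style tool calls. The call shape
# ({"name": ..., "parameters": {...}}) mirrors Cohere's tool-use format;
# treat the exact field names as assumptions for your own stack.

def search_orders(customer_id):
    # Hypothetical lookup; stands in for a real database or API call.
    return [{"order_id": "A-1001", "status": "shipped"}]

TOOLS = {"search_orders": search_orders}

def run_tool_calls(tool_calls):
    """Execute each requested tool and collect results to feed back
    into the model for the final grounded response."""
    results = []
    for call in tool_calls:
        fn = TOOLS[call["name"]]
        output = fn(**call["parameters"])
        results.append({"call": call, "outputs": output})
    return results

results = run_tool_calls(
    [{"name": "search_orders", "parameters": {"customer_id": "C-42"}}]
)
```

In a multi-step workflow you would loop: send results back to the model, and repeat until it stops requesting tools and produces a final answer.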

    The grounded generation pipeline supports a specific input format where documents are provided alongside the user query. The model processes both the query and documents, generates a response grounded in the provided information, and produces structured citation metadata alongside the response text. This structured output simplifies integration with enterprise applications that need to display citations and link back to source documents.
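A grounded request can be assembled as a query plus a list of document dicts. This sketch uses the `title`/`snippet` field names from Cohere's examples, but the documents are free-form key-value records, so your serving layer may accept other keys.

```python
# Sketch of the documents-plus-query input for Command R's grounded
# mode. Field names ("title", "snippet") follow Cohere's examples;
# documents are free-form dicts, so other keys may work too.

def build_grounded_request(query, docs):
    return {
        "message": query,
        "documents": [
            {"id": f"doc_{i}", "title": d["title"], "snippet": d["text"]}
            for i, d in enumerate(docs)
        ],
    }

payload = build_grounded_request(
    "What is our refund window?",
    [{"title": "Refund policy",
      "text": "Refunds are accepted within 30 days."}],
)
```

The `id` assigned to each document is what the citation metadata in the response refers back to, which is what lets an application link claims to source passages.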

    Fine-Tuning with Ertas

    Command R (35B) is a practical fine-tuning target in Ertas Studio, particularly for organizations building custom RAG systems. QLoRA fine-tuning requires approximately 20-28GB VRAM, achievable on an RTX 4090 24GB (tight) or A6000 48GB (comfortable). The 104B Command R+ requires approximately 60-70GB VRAM with QLoRA, fitting on A100 80GB.
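The VRAM figures above can be sanity-checked with back-of-envelope arithmetic: 4-bit weights take roughly 0.5 bytes per parameter, and only the small LoRA adapters carry FP16 weights and optimizer state. The constants below (adapter size, activation overhead) are rough assumptions for illustration, not measured numbers from Ertas Studio.

```python
# Back-of-envelope QLoRA VRAM estimate. All constants are rough
# assumptions: 4-bit base weights, ~200M LoRA parameters, and a flat
# overhead for activations, KV cache, and CUDA context.

def qlora_vram_gb(params_b, lora_params_m=200, overhead_gb=4.0):
    base = params_b * 0.5                  # 4-bit weights: ~0.5 bytes/param
    adapters = lora_params_m / 1000 * 2    # FP16 LoRA weights
    optimizer = lora_params_m / 1000 * 8   # Adam states, adapters only
    return base + adapters + optimizer + overhead_gb

print(round(qlora_vram_gb(35), 1))   # lands inside the 20-28GB band above
```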

    For RAG-focused fine-tuning, prepare your dataset with examples that include source documents, queries, and grounded responses with citations. Ertas Studio supports this structured format, allowing you to fine-tune Command R to cite your organization's specific document types — internal knowledge bases, product documentation, legal documents, or technical manuals. The model's existing citation capability means even small fine-tuning datasets (1,000-5,000 examples) can significantly improve citation accuracy for your specific domain.
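A training record pairing documents, a query, and a cited response might look like the following. The exact schema Ertas Studio expects may differ; the field names here are illustrative assumptions chosen to match the citation format shown earlier in this article.

```python
# One hedged example of a grounded fine-tuning record. The schema is
# an assumption for illustration; check your tool's dataset spec.

import json

record = {
    "documents": [
        {"id": "doc_0", "title": "SLA policy",
         "text": "Priority-1 incidents get a 1-hour response SLA."}
    ],
    "query": "What is the response SLA for P1 incidents?",
    "response": "Priority-1 incidents have a 1-hour response SLA.",
    "citations": [{"start": 0, "end": 48, "document_ids": ["doc_0"]}],
}

# Training data is typically stored as one JSON object per line (JSONL).
line = json.dumps(record)
```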

    After training, export to GGUF format for local deployment. Command R 35B at Q4_K_M produces a model of approximately 20GB. Deploy through Ollama or llama.cpp and integrate with your RAG pipeline. The local deployment ensures that sensitive enterprise documents never leave your infrastructure while benefiting from high-quality grounded generation.
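For the Ollama deployment step, the exported GGUF is registered via a Modelfile. `FROM` and `PARAMETER` are standard Modelfile directives; the file path and parameter values below are placeholders, so verify the template against your Ollama version.

```python
# Sketch of generating an Ollama Modelfile for an exported GGUF. The
# path is a placeholder; FROM and PARAMETER are real Modelfile
# directives, but check the values against your Ollama version.

def make_modelfile(gguf_path, temperature=0.3, num_ctx=8192):
    return (
        f"FROM {gguf_path}\n"
        f"PARAMETER temperature {temperature}\n"
        f"PARAMETER num_ctx {num_ctx}\n"
    )

modelfile = make_modelfile("./command-r-35b-finetuned.Q4_K_M.gguf")
# Write this to ./Modelfile, then register the model with:
#   ollama create my-command-r -f Modelfile
```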

    Use Cases

    Command R is the premier model for enterprise RAG applications where citation accuracy and document grounding are non-negotiable. Legal firms use it to generate research memos with citations to case law and statutes. Healthcare organizations use it to produce clinical summaries grounded in patient records and medical literature. Financial institutions use it to generate analyst reports with citations to source data and regulatory filings.

    Customer support systems benefit from Command R's grounded generation — the model can answer customer questions based on product documentation and knowledge bases, providing citations that support agents can verify. This reduces hallucination risk in customer-facing applications and provides an audit trail for compliance.

    Multilingual enterprise deployments are another strong use case. Organizations operating across language regions can use a single Command R deployment to handle RAG queries in 10+ languages, with consistent citation quality across all supported languages. This is particularly valuable for global enterprises with multilingual knowledge bases.

    Hardware Requirements

    Command R (35B) at Q4_K_M quantization requires approximately 20GB of memory, making it suitable for systems with 32GB RAM, GPUs such as the RTX 4090 24GB or A5000 24GB, or Apple M-series machines with 32GB+ unified memory. At Q8_0, expect approximately 37GB. Full FP16 inference requires approximately 70GB, fitting on an A100 80GB.

    Command R+ (104B) at Q4_K_M requires approximately 60GB, necessitating A100 80GB or multi-GPU setups. At Q8_0, the requirement grows to approximately 110GB, typically requiring 2x A100 80GB. The 104B model delivers significantly higher quality, especially on complex multi-document reasoning, but the 35B model offers better cost-efficiency for most RAG applications.
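The footprint figures above follow directly from bits-per-weight arithmetic. This sketch uses approximate community figures for each quantization type (not exact values) to reproduce the sizes quoted for both model variants.

```python
# Rough GGUF size estimator from bits-per-weight. The bpw values per
# quant type are approximate community figures, not exact constants.

BPW = {"Q4_K_M": 4.8, "Q8_0": 8.5, "F16": 16.0}

def gguf_size_gb(params_b, quant):
    # params in billions * bits per weight / 8 bits per byte -> GB
    return params_b * BPW[quant] / 8

for quant in ("Q4_K_M", "Q8_0", "F16"):
    print(quant, round(gguf_size_gb(35, quant), 1))
```

Running the same estimate for the 104B model gives roughly 62GB at Q4_K_M and 110GB at Q8_0, matching the figures above.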

    For fine-tuning in Ertas Studio, Command R 35B needs 20-28GB VRAM with QLoRA (A6000 48GB recommended), and Command R+ 104B needs 60-70GB with QLoRA (A100 80GB). For most organizations, fine-tuning the 35B variant provides the best balance of quality and training efficiency.

    Supported Quantizations

    Q4_0, Q4_K_M, Q5_K_M, Q6_K, Q8_0, F16
