Fine-Tune DeepSeek-R1 with Ertas
DeepSeek's dedicated reasoning model trained with reinforcement learning to perform extended chain-of-thought reasoning, available in distilled sizes from 1.5B to 70B and the full 671B mixture-of-experts architecture.
Overview
DeepSeek-R1, released in January 2025, is a dedicated reasoning model that uses extended chain-of-thought (CoT) processing to solve complex problems. Unlike standard instruction-tuned models that generate answers directly, R1 produces detailed internal reasoning traces — thinking through problems step by step — before arriving at its final answer. This approach yields dramatic improvements on tasks requiring mathematical reasoning, logical deduction, code generation, and scientific problem-solving.
The full DeepSeek-R1 model uses a 671B-parameter mixture-of-experts architecture (the same base as DeepSeek-V3) with approximately 37B parameters active per forward pass. However, DeepSeek also released a series of distilled variants created by training smaller dense models (Qwen and Llama-based) on R1's reasoning traces. These distilled models range from 1.5B to 70B parameters and retain much of the full model's reasoning capability at dramatically lower computational cost.
The training methodology is particularly innovative. DeepSeek-R1 was trained using large-scale reinforcement learning (RL) with minimal supervised fine-tuning (SFT), allowing the model to develop its own reasoning strategies rather than imitating human-written chain-of-thought examples. A precursor model, DeepSeek-R1-Zero, was trained with pure RL and no SFT at all, demonstrating that reasoning capabilities can emerge from reward signals alone.
DeepSeek-R1 matches or exceeds OpenAI's o1 on several benchmarks including AIME 2024 (math competitions), Codeforces (competitive programming), and GPQA Diamond (graduate-level science questions). The model and its distilled variants are released under the MIT license.
Key Features
Extended chain-of-thought reasoning is R1's defining feature. When given a complex problem, the model generates internal reasoning traces that can span hundreds or thousands of tokens before producing its final answer. These traces include hypothesis generation, self-correction, verification steps, and alternative approach exploration — mimicking how expert humans approach difficult problems. Users can observe the reasoning process in real time, providing transparency into the model's decision-making.
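In the R1 family, the reasoning trace is emitted between `<think>` and `</think>` tags ahead of the final answer, so applications typically separate the two before display. A minimal sketch of that split (the `sample` string is an illustrative stand-in, not real model output):

```python
import re

def split_reasoning(raw_output: str) -> tuple[str, str]:
    """Separate the <think>...</think> reasoning trace from the final answer."""
    match = re.search(r"<think>(.*?)</think>", raw_output, re.DOTALL)
    if match is None:
        return "", raw_output.strip()          # no trace emitted
    reasoning = match.group(1).strip()
    answer = raw_output[match.end():].strip()  # everything after the trace
    return reasoning, answer

# Illustrative output shape, not an actual R1 generation.
sample = "<think>2+2: add the units digits, 4.</think>\nThe answer is 4."
trace, answer = split_reasoning(sample)
```

Keeping the trace available (rather than discarding it) is what enables the transparency described above — a UI can show it collapsed by default and expand it on demand.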
The distilled model series is exceptionally valuable for the open-source community. DeepSeek distilled R1's reasoning capabilities into six smaller models: R1-Distill-Qwen-1.5B, R1-Distill-Qwen-7B, R1-Distill-Llama-8B, R1-Distill-Qwen-14B, R1-Distill-Qwen-32B, and R1-Distill-Llama-70B. The 32B distilled model, in particular, is a standout — it achieves reasoning performance that rivals much larger models at a fraction of the compute cost.
R1 also demonstrates strong performance on tasks that benefit from deliberative thinking: complex code debugging, multi-step mathematical proofs, scientific hypothesis evaluation, and strategic planning. The model tends to allocate more reasoning tokens to harder problems while responding quickly to simpler queries.
Fine-Tuning with Ertas
Fine-tuning DeepSeek-R1 distilled models in Ertas Studio is an effective way to create domain-specific reasoning models. The distilled 7B and 8B variants are the most popular starting points, requiring 8-12GB VRAM with QLoRA and fitting on standard consumer GPUs. The 14B distilled model needs approximately 10-14GB VRAM, and the exceptional 32B distilled variant requires 20-28GB VRAM.
For best results when fine-tuning R1 models, include chain-of-thought reasoning traces in your training data. Ertas Studio supports datasets with explicit thinking tokens, where each training example includes both the reasoning process and the final answer. This teaches the model to apply R1-style reasoning to your specific domain — for instance, training on step-by-step medical diagnostic reasoning, legal argument chains, or engineering design rationales.
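A training example for this style of fine-tuning pairs a prompt with an output that contains both the reasoning trace and the final answer. The record below is a hypothetical sketch — the field names and exact schema Ertas Studio expects are not specified here — but it illustrates the shape of a JSONL dataset with explicit thinking tokens:

```python
import json

# Hypothetical record: field names ("instruction", "output") are illustrative;
# consult your training tool's dataset documentation for the exact schema.
example = {
    "instruction": "A patient presents with fever and a stiff neck. "
                   "What should be ruled out first?",
    "output": (
        "<think>Fever plus nuchal rigidity is the classic meningitis picture; "
        "bacterial meningitis is rapidly fatal if missed, so it takes priority "
        "over viral causes.</think>\n"
        "Bacterial meningitis should be ruled out first."
    ),
}

# JSONL convention: one JSON object per line in the training file.
line = json.dumps(example, ensure_ascii=False)
restored = json.loads(line)
```

The key design point is that the `<think>` span teaches the model *how* to reason in your domain, while the text after it teaches the final-answer format.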
After fine-tuning, Ertas Studio exports to GGUF format. R1 distilled models work well with standard quantization formats. A Q4_K_M quantized R1-Distill-Qwen-32B at approximately 19GB is a powerful reasoning model that runs on a single 24GB GPU or a Mac with 32GB RAM, delivering sophisticated reasoning capabilities in a locally deployable package.
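A quick sanity check after export is to confirm the file really is GGUF: the format begins with the four ASCII bytes `GGUF` followed by a little-endian 32-bit version number. A minimal stdlib sketch (the demo writes a synthetic 8-byte header; a real export would be the full multi-gigabyte model file):

```python
import os
import struct
import tempfile

def gguf_version(path: str) -> int:
    """Return the GGUF format version, or raise if the file is not GGUF."""
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != b"GGUF":
            raise ValueError(f"not a GGUF file (magic={magic!r})")
        (version,) = struct.unpack("<I", f.read(4))  # little-endian uint32
        return version

# Demo with a synthetic header, just to exercise the check.
path = os.path.join(tempfile.gettempdir(), "demo.gguf")
with open(path, "wb") as f:
    f.write(b"GGUF" + struct.pack("<I", 3))

version = gguf_version(path)
```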
Use Cases
DeepSeek-R1 excels in any application where accuracy and depth of reasoning matter more than response speed. Mathematical problem-solving is its strongest suit — the model can tackle competition-level math, symbolic computation, and quantitative analysis with high reliability. It is ideal for educational platforms, STEM tutoring systems, and research assistance tools.
Code generation and debugging benefit significantly from R1's reasoning approach. The model can analyze complex codebases, identify subtle bugs, reason about algorithmic complexity, and generate correct implementations for challenging programming problems. Fine-tuned R1 variants make excellent code review assistants that can explain their reasoning for each identified issue.
The distilled variants are suitable for applications requiring on-premise reasoning capability: financial analysis with step-by-step calculation verification, legal document review with explicit reasoning chains, medical decision support with transparent diagnostic logic, and engineering calculations with verifiable derivations.
Hardware Requirements
The distilled R1 models have standard hardware requirements for their parameter counts: the 1.5B at Q4_K_M needs about 1.1GB, the 7B/8B models need about 4.5-5GB, the 14B needs about 8.5GB, the 32B needs about 19GB, and the 70B needs about 40GB. However, reasoning tasks generate significantly more tokens than standard tasks (often 5-10x more), so sustained generation throughput matters as much as simply fitting the model in memory.
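These on-disk figures follow from simple arithmetic: size ≈ parameters × bits-per-weight ÷ 8. The sketch below assumes roughly 4.85 effective bits per weight for Q4_K_M, a commonly cited approximation rather than an exact constant; very small models (like the 1.5B) come out larger in practice because embeddings and metadata are a bigger share of the file:

```python
def quantized_size_gb(params_billion: float, bits_per_weight: float = 4.85) -> float:
    """Rough file size of a quantized model: params * bits / 8, in GB.

    4.85 bits/weight is an approximate average for Q4_K_M mixed quantization;
    small models run larger than this estimate due to embedding overhead.
    """
    return params_billion * bits_per_weight / 8

sizes = {p: round(quantized_size_gb(p), 1) for p in (1.5, 8, 14, 32, 70)}
```

For example, the 32B distill comes out near 19GB and the 70B near 42GB, matching the figures above within rounding.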
The full 671B MoE model at Q4_K_M requires approximately 370GB, demanding large multi-GPU setups (e.g., 8x A100 80GB). The 37B active parameter count means generation speed is reasonable once loaded, comparable to a 37B dense model, but the memory footprint is substantial.
For fine-tuning in Ertas Studio, recommended configurations are: 7B/8B distilled variants need 8-12GB VRAM, 14B needs 12-16GB, 32B needs 20-28GB, and 70B needs 40-48GB with QLoRA. The 32B distilled variant offers the best quality-to-resource ratio for reasoning tasks.
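The QLoRA figures above can be approximated with a back-of-the-envelope heuristic: 4-bit base weights plus overhead for LoRA adapters, optimizer state, and activations. This is a rough rule of thumb of my own construction, not a formula from Ertas — real usage varies with sequence length, batch size, and LoRA rank:

```python
def qlora_vram_gb(params_billion: float) -> float:
    """Very rough QLoRA VRAM estimate. Heuristic only: actual usage depends
    heavily on sequence length, batch size, and LoRA rank."""
    base_weights = params_billion * 4.5 / 8   # ~4-bit quantized base weights, GB
    overhead = 0.2 * base_weights + 4.0       # adapters + optimizer + activations
    return base_weights + overhead
```

Plugging in 7B and 32B lands inside the 8-12GB and 20-28GB ranges quoted above; treat anything this estimator returns as a floor, not a guarantee.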