Fine-Tune Magistral with Ertas
Mistral AI's dedicated reasoning model line — Magistral Medium 1.2 (magistral-medium-2509) and Magistral Small 1.2 (magistral-small-2509) — focused on extended chain-of-thought capability before the lineage was unified into Mistral Small 4.
Overview
Magistral is Mistral AI's dedicated reasoning model line, originally released in 2025 as the company's response to the dedicated-reasoning trend established by DeepSeek-R1 and QwQ-32B. The line includes Magistral Small and Magistral Medium variants, with the latest publicly documented versions being Magistral Medium 1.2 (`magistral-medium-2509`) and Magistral Small 1.2 (`magistral-small-2509`), released in September 2025.
The Magistral lineage emphasizes extended chain-of-thought reasoning trained with reinforcement learning, similar in spirit to DeepSeek-R1's training methodology but with Mistral's distinct post-training pipeline and European deployment-focused positioning. Magistral models target use cases where reasoning depth matters more than response speed: mathematical problem solving, scientific analysis, complex code generation, and structured deliberation tasks.
In March 2026, Mistral announced the consolidation of its model lineage: Magistral (reasoning), Devstral (coding agents), and Mistral Small (instruct) were unified into a single Mistral Small 4 checkpoint. This consolidation marks the end of Magistral as a separate product line, but Magistral Medium and Small variants remain available for deployment scenarios where teams prefer the dedicated reasoning behavior over Mistral Small 4's hybrid approach.
For teams evaluating Mistral's reasoning capability in 2026, Mistral Small 4 is the recommended forward path. Magistral remains documented and supported for stable production deployments that adopted the line before the consolidation.
Key Features
Dedicated reasoning training is the original Magistral differentiator. Trained with reinforcement learning emphasizing chain-of-thought generation, Magistral models produce extended reasoning traces before final answers — similar to DeepSeek-R1 and QwQ-32B in pattern, with Mistral's specific post-training characteristics.
European deployment positioning is a meaningful advantage for some teams. Mistral AI is EU-headquartered with strong data sovereignty positioning, making Magistral attractive to European organizations subject to regulatory or political preference for non-US/non-Chinese AI providers. This positioning carries forward to Mistral Small 4 as well.
The Small/Medium tier structure gives deployment flexibility. Magistral Small handles general reasoning workloads at single-GPU deployment cost; Magistral Medium delivers higher peak quality at multi-GPU server scale. This range allows teams to match the reasoning model size to their actual deployment infrastructure.
Mistral's post-training expertise shows in Magistral's instruction-following stability and tool-use fidelity. While dedicated reasoning models can be unstable in agentic deployments (the reasoning mode can interfere with structured outputs), Magistral has been engineered for production reliability rather than purely for benchmark performance.
Fine-Tuning with Ertas
Magistral Small fine-tunes well in Ertas Studio with QLoRA on a 24-48GB GPU at typical sequence lengths. Magistral Medium requires multi-GPU server fine-tuning given its larger parameter count.
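Ertas Studio handles this configuration internally; purely as an illustration of the underlying technique, here is a minimal QLoRA sketch using Hugging Face transformers and peft. The model ID and hyperparameters are assumptions for the sketch, not Ertas defaults.

```python
# Minimal QLoRA setup sketch for Magistral Small (illustrative only;
# Ertas Studio manages an equivalent configuration internally).
# The model ID and all hyperparameters are assumptions, not Ertas defaults.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

MODEL_ID = "mistralai/Magistral-Small-2509"  # assumed Hugging Face repo name

# 4-bit NF4 quantization keeps the frozen base weights small enough
# for a single 24-48GB GPU.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# LoRA adapters on the attention and MLP projections; only these small
# matrices are trained while the quantized base stays frozen.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of total params
```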
For reasoning-mode fine-tuning specifically, Ertas Studio supports training data formats with explicit chain-of-thought traces. Including thinking traces in your training data preserves the dedicated reasoning behavior in the fine-tuned model rather than letting it collapse into direct-response mode.
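The exact record schema Ertas expects is not reproduced here, so the JSONL shape below is an assumption for illustration. The key point is that the assistant target retains an explicit reasoning trace (shown with Magistral-style `[THINK]...[/THINK]` markers) ahead of the final answer.

```python
# Hypothetical training-record shape for reasoning-mode fine-tuning.
# The JSONL schema and message layout are assumptions for illustration;
# consult Ertas Studio's data-format docs for the exact keys.
import json

example = {
    "messages": [
        {"role": "user", "content": "What is 17 * 23?"},
        {
            "role": "assistant",
            # Keeping an explicit reasoning trace in the target text is
            # what preserves chain-of-thought behavior after fine-tuning.
            "content": (
                "[THINK]17 * 23 = 17 * 20 + 17 * 3 "
                "= 340 + 51 = 391.[/THINK]\n"
                "17 * 23 = 391."
            ),
        },
    ]
}

with open("train.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(example) + "\n")
```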
After training, Ertas Studio exports Magistral fine-tunes to GGUF format with full Mistral chat template preservation. Deployment via Ollama, llama.cpp, or vLLM works straightforwardly with the same configuration patterns as base Mistral models.
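As a sketch of the deployment side, an exported GGUF loads with llama-cpp-python like any other Mistral-family GGUF. The file path and context size below are placeholders, not Ertas output defaults.

```python
# Load an exported Magistral GGUF with llama-cpp-python (sketch; the
# file path and context size are placeholders, not Ertas output defaults).
from llama_cpp import Llama

llm = Llama(
    model_path="magistral-small-finetune-Q4_K_M.gguf",  # placeholder path
    n_ctx=8192,        # reasoning traces run long; allow generous context
    n_gpu_layers=-1,   # offload all layers to GPU if VRAM allows
)

# The chat template preserved in the GGUF metadata is applied by
# create_chat_completion, so prompting works as with base Mistral GGUFs.
result = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
    max_tokens=2048,
)
print(result["choices"][0]["message"]["content"])
```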
For most teams considering new reasoning-focused fine-tuning projects in 2026, Mistral Small 4 is the recommended starting point rather than Magistral — the unified architecture is more operationally efficient and matches or exceeds Magistral on reasoning benchmarks. Magistral fine-tuning remains valid for teams with existing pipelines or deployment investments in the Magistral line.
Use Cases
Magistral's primary use cases in 2026 are stable production deployments that standardized on the line before the Mistral Small 4 consolidation. Teams that fine-tuned on Magistral often value continuity over migration costs, particularly when their downstream evaluation pipelines and prompt patterns are tuned to Magistral-specific behavior.
For European organizations with strict data sovereignty requirements, Magistral (and now Mistral Small 4) remains an attractive choice. Self-hosted deployment on European infrastructure provides full data control while leveraging Mistral's ecosystem and support relationships.
Dedicated reasoning workloads — math, scientific analysis, complex code generation — benefit from Magistral's extended chain-of-thought capability. While the unified-thinking-mode approach in Mistral Small 4 is operationally simpler, dedicated reasoning models still have advantages in specialized scenarios where reasoning is the only task and the latency hit is acceptable.
Hardware Requirements
Magistral Small at Q4_K_M typically requires 12-20GB of memory depending on the specific variant (Small 1.2 sits in the middle of this band), so it fits on a single 24GB GPU with headroom.
Magistral Medium at Q4_K_M requires substantially more — typically 60-100GB depending on the variant — and benefits from multi-GPU deployment for production serving.
For fine-tuning in Ertas Studio: Magistral Small QLoRA needs 16-28GB VRAM, fitting on a single 24-32GB GPU. Magistral Medium QLoRA requires 80-120GB VRAM, typically split across two 48GB GPUs or deployed on a single 80GB GPU with aggressive memory management. For new fine-tuning projects, Mistral Small 4 (with its 6B active MoE architecture) offers substantially better training economics.
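As a back-of-the-envelope check on these numbers, 4-bit NF4 weights take roughly 0.55 bytes per parameter after quantization scales, plus a fixed budget for adapters, optimizer state, and activations. The constants below are coarse assumptions for illustration, not Ertas's sizing logic.

```python
# Back-of-the-envelope QLoRA VRAM estimate. The constants are coarse
# assumptions for illustration; real usage varies with sequence length,
# batch size, and framework overhead.
def qlora_vram_gb(params_b: float, overhead_gb: float = 6.0) -> float:
    """Rough VRAM need in GB for QLoRA on a params_b-billion-parameter model."""
    base_weights = params_b * 0.55     # ~0.55 bytes/param for 4-bit NF4 + scales
    return base_weights + overhead_gb  # adapters, optimizer state, activations

# A ~24B-parameter Magistral Small lands inside the 16-28GB band quoted above.
print(f"Small (~24B): ~{qlora_vram_gb(24):.0f} GB")
```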