Mistral Small 4
Fine-tuning accessibility: Excellent
Mistral Small 4's MoE architecture activates only 6B of its 119B total parameters per token, which makes it exceptionally efficient to fine-tune. QLoRA fits comfortably on a single 24GB GPU at typical sequence lengths, substantially more accessible than fine-tuning equivalent-quality dense models in the 30B-70B range, which typically require 48GB+ of GPU memory. The unified architecture (covering reasoning, coding, and instruction-following use cases) means a single fine-tune can handle cross-domain tasks. The Apache 2.0 license imposes no usage restrictions or copyleft obligations, only standard notice preservation on redistribution.
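A minimal QLoRA sketch using the Hugging Face transformers, peft, and bitsandbytes stack is shown below. The checkpoint id and target module names are placeholders, and the hyperparameters are illustrative rather than tuned; check the released config for the actual layer names before fine-tuning.

```python
# Minimal QLoRA sketch: 4-bit base weights plus LoRA adapters on the
# attention projections. Checkpoint id and module names are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

MODEL_ID = "mistralai/Mistral-Small-4"  # placeholder id

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                   # quantize frozen base weights to 4-bit
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    # Attention projections only; expert and router weights stay frozen.
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

From here, the adapted model can be passed to any standard trainer; only the LoRA adapter weights are updated, which is what keeps the memory footprint small.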
Strengths
- QLoRA fine-tuning fits on a single 24GB GPU at full sequence length
- Only 6B parameters are active at inference, so fine-tuned models deploy and serve quickly
- Apache 2.0 license with no commercial restrictions
- A single fine-tune handles reasoning, coding, and instruction-following tasks
Trade-offs
- MoE expert routing requires platform-aware fine-tuning configuration (handled automatically in Ertas Studio); see the routing-aware target selection sketch after this list
- Q4_K_M deployment footprint (~65GB) is much larger than the active parameter count suggests, since all 119B total parameters must be resident in memory even though only 6B are active per token
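As a rough illustration of routing-aware configuration, the helper below scans a loaded checkpoint for linear layers and skips anything that looks like a router gate, leaving the routing weights frozen. The skip list and module names are assumptions; the exact layer names depend on the released implementation.

```python
# Hypothetical helper: build a LoRA target list from a loaded MoE model
# while leaving suspected router/gating modules untouched. Leaf names vary
# by checkpoint, so the skip list here is a guess, not the official names.
import torch.nn as nn

def lora_targets_excluding_router(model, skip_names=("gate", "router")):
    targets = set()
    for name, module in model.named_modules():
        if not isinstance(module, nn.Linear):
            continue
        leaf = name.split(".")[-1]
        if leaf in skip_names:  # exact match, so e.g. "gate_proj" is kept
            continue
        targets.add(leaf)
    return sorted(targets)

# Usage with the model from the earlier sketch:
#   lora_config = LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM",
#                            target_modules=lora_targets_excluding_router(model))
```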