Fine-Tune Phi-4 with Ertas
Microsoft's 14-billion parameter small language model that emphasizes reasoning quality through synthetic data training, achieving performance competitive with models several times its size on math and logic benchmarks.
Overview
Phi-4 is Microsoft's latest entry in the Phi small language model series, released in December 2024. With 14 billion parameters, Phi-4 was specifically designed to maximize reasoning capability relative to model size. Microsoft achieved this through a training methodology that heavily emphasizes synthetic data — carefully generated training examples that target specific reasoning patterns, mathematical problem-solving, and logical deduction.
The model demonstrates remarkable benchmark performance for its size class. On mathematical reasoning benchmarks like MATH and GSM8K, Phi-4 competes with models in the 70B+ parameter range and even approaches some frontier models. This makes it particularly valuable for applications where strong reasoning is required but computational resources are limited.
Phi-4 uses a dense transformer architecture with 40 layers, a hidden dimension of 5120, and 40 attention heads. It supports a context window of 16K tokens and uses the tiktoken tokenizer with a 100K vocabulary. The architecture includes standard modern features like RoPE positional embeddings and grouped-query attention.
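The headline architecture numbers above can be collected into a small summary sketch. The field names below are illustrative for this document only, not the official Hugging Face config keys:

```python
# Illustrative summary of Phi-4's architecture as described above.
# Key names are for this sketch only, not the official config schema.
PHI4_ARCH = {
    "num_layers": 40,
    "hidden_dim": 5120,
    "num_attention_heads": 40,
    "head_dim": 5120 // 40,          # 128 dims per head
    "context_window": 16_384,        # 16K tokens
    "vocab_size": 100_000,           # ~100K tiktoken vocabulary
    "positional_encoding": "rope",
    "attention": "grouped-query",
}
```

This is just a reference card for the figures quoted in the text; consult the model's published config for authoritative values.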
The model is released under the MIT license, making it one of the most permissively licensed high-quality models available. This has encouraged broad adoption in both research and commercial applications, particularly in domains requiring structured reasoning.
Key Features
Phi-4's standout feature is its reasoning capability, achieved through Microsoft's innovative synthetic data training pipeline. Rather than relying solely on web-scraped text, the training data includes millions of synthetically generated question-answer pairs, step-by-step mathematical proofs, logic puzzles, and code reasoning traces. This targeted training approach produces a model that reasons more reliably than models trained primarily on natural text.
The model demonstrates particularly strong performance on structured tasks: mathematical problem-solving, code generation with logical constraints, scientific reasoning, and formal logic. On the MATH benchmark, Phi-4 achieves scores that rival much larger proprietary models such as GPT-4 Turbo, despite being a small fraction of their presumed size; exact parameter counts for frontier models are not public, so precise ratios are unknowable.
Phi-4 also shows improved instruction-following compared to Phi-3, with better adherence to output format requirements, more consistent handling of multi-step instructions, and reduced tendency to hallucinate. The chat-tuned variant supports system prompts and multi-turn conversations effectively.
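Multi-turn prompts for the chat-tuned variant follow a ChatML-style template. The special tokens below (`<|im_start|>`, `<|im_sep|>`, `<|im_end|>`) match the template commonly reported for Phi-4, but they are an assumption here; verify against the model's own tokenizer config before relying on them:

```python
def build_phi4_prompt(messages):
    """Render a list of {"role", "content"} dicts into a Phi-4-style
    ChatML prompt. Token names are assumptions; check the tokenizer."""
    parts = []
    for msg in messages:
        parts.append(
            f"<|im_start|>{msg['role']}<|im_sep|>{msg['content']}<|im_end|>"
        )
    # Leave the assistant turn open so the model generates the reply.
    parts.append("<|im_start|>assistant<|im_sep|>")
    return "".join(parts)

prompt = build_phi4_prompt([
    {"role": "system", "content": "You are a careful math tutor."},
    {"role": "user", "content": "What is 12 * 13?"},
])
```

In practice you would let the tokenizer's built-in chat template do this rendering; the sketch only shows the shape of a system-plus-user turn.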
Fine-Tuning with Ertas
Phi-4 is an excellent candidate for fine-tuning in Ertas Studio, particularly for applications requiring domain-specific reasoning. At 14B parameters, it sits in a sweet spot — large enough to capture complex patterns but small enough for efficient QLoRA training on a single 24GB GPU. With 4-bit quantization, fine-tuning requires approximately 10-14GB VRAM, achievable on an RTX 4090, RTX 3090, or A5000.
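A back-of-the-envelope check of the VRAM figure, assuming 4-bit quantized base weights plus a rough fixed allowance for the adapter, optimizer state, activations, and framework buffers (the overhead term is an assumption, not a measured value):

```python
def qlora_vram_estimate_gb(n_params: float,
                           bits_per_weight: float = 4.0,
                           overhead_gb: float = 4.0) -> float:
    """Rough QLoRA VRAM estimate: quantized base weights plus a fixed
    allowance for adapter weights, optimizer state, and activations."""
    weights_gb = n_params * bits_per_weight / 8 / 1e9
    return weights_gb + overhead_gb

# 14B parameters in 4-bit is ~7 GB of weights; with a few GB of
# training overhead this lands in the 10-14 GB range quoted above.
print(round(qlora_vram_estimate_gb(14e9), 1))
```

Actual usage depends on sequence length, batch size, and gradient checkpointing, so treat this as a sizing sanity check rather than a guarantee.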
In Ertas Studio, upload your reasoning-focused dataset (chain-of-thought examples work particularly well with Phi-4), select the model, and configure LoRA parameters. The model responds well to relatively low LoRA ranks (8-32) for reasoning tasks, keeping adapter sizes small and training fast. A typical fine-tuning run on 10,000 examples completes in 1-2 hours on a single GPU.
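To see why low ranks keep adapters small: each adapted weight matrix gains an A (d_in x r) and a B (r x d_out) factor, adding r * (d_in + d_out) parameters. Assuming adapters on the four attention projections of every layer (a common default; Ertas Studio's actual target modules may differ) and treating all projections as 5120 x 5120 for simplicity (with grouped-query attention the K/V projections are actually smaller):

```python
def lora_param_count(rank: int, d_model: int = 5120,
                     n_layers: int = 40, mats_per_layer: int = 4) -> int:
    """Parameters added by LoRA: each adapted d x d matrix gains an
    A (d x r) and a B (r x d) factor, i.e. 2 * r * d parameters."""
    per_matrix = 2 * rank * d_model
    return per_matrix * mats_per_layer * n_layers

for r in (8, 16, 32):
    print(f"rank {r}: ~{lora_param_count(r) / 1e6:.1f}M adapter params")
```

Even at rank 32 the adapter is on the order of 50M parameters, well under 1% of the 14B base model, which is why training is fast and the exported adapters stay small.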
After training, export to GGUF format. Phi-4's 14B size quantizes efficiently — at Q4_K_M, the resulting model is approximately 8.5GB, small enough to run on a laptop. This makes Phi-4 ideal for creating specialized reasoning models that can be deployed anywhere without cloud dependencies.
Use Cases
Phi-4 excels in applications requiring structured reasoning: mathematical tutoring systems, scientific analysis tools, code review and debugging assistants, and decision-support systems. Its strong performance on logic tasks makes it particularly suitable for rule-based processing, compliance checking, and structured data extraction.
The model is an excellent choice for educational technology applications, where step-by-step problem-solving explanations are valued. Fine-tuned Phi-4 can serve as a math tutor, a science explainer, or a programming instructor, providing detailed reasoning traces that help users understand the solution process.
For enterprise deployments, Phi-4 offers a compelling combination of strong reasoning with manageable resource requirements. It is well-suited for document analysis pipelines that require logical inference, automated report generation with data-driven conclusions, and quality assurance workflows that need to verify logical consistency.
Hardware Requirements
Phi-4 at Q4_K_M quantization requires approximately 8.5GB of RAM, making it comfortable to run on systems with 16GB RAM, most modern GPUs with 10GB+ VRAM, and Apple Silicon Macs with 16GB unified memory. At Q8_0, expect approximately 15GB, still manageable on a 24GB GPU or 32GB system.
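The memory figures follow directly from each scheme's average bits per weight. Q4_K_M averages roughly 4.85 bits because it mixes precisions across tensors, and Q8_0 works out to about 8.5 bits once per-block scales are included; both bit rates are approximate:

```python
def quantized_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate in-memory model size from parameter count and the
    quantization scheme's average bits per weight."""
    return n_params * bits_per_weight / 8 / 1e9

# Approximate average bits/weight for common llama.cpp schemes.
for name, bpw in [("Q4_K_M", 4.85), ("Q8_0", 8.5), ("F16", 16.0)]:
    print(f"{name}: ~{quantized_size_gb(14e9, bpw):.1f} GB")
```

These estimates cover weights only; the runtime also allocates a KV cache and scratch buffers, which is why a little headroom beyond the file size is needed.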
Full FP16 inference requires approximately 28GB VRAM, fitting on a single A6000 48GB or A100 40GB. Inference speed on consumer hardware is excellent — expect 30-50 tokens per second on an RTX 4090 at Q4_K_M and 10-20 tokens per second on an M2 Pro MacBook with 32GB RAM.
For fine-tuning with QLoRA in Ertas Studio, 12-16GB VRAM is sufficient (RTX 4070 Ti, RTX 4080, RTX 4090). Full LoRA fine-tuning requires approximately 20-24GB VRAM. The model's moderate size allows for rapid iteration during the fine-tuning process.