
微调 Phi-4:微软最佳企业任务小型模型
Phi-4 14B 在数学基准上超越 GPT-4,同时在本地硬件上运行速度快 15 倍。以下是如何为分类、提取和结构化输出任务微调它。
微软 Phi-4 是一个 14B 参数模型,在 MATH 基准上得分 84.8%——高于 GPT-4 的 84.3%。一个小到可以在单个消费级 GPU 上运行的模型,在数学推理上超越了万亿参数模型。
为什么企业选 Phi-4
- **数学推理:**MATH 84.8%,GSM8K 93.2%
- **结构化输出:**开箱即用 96% JSON Schema 合规性
- **指令跟随:**可靠追踪多部分指令
- **代码生成:**HumanEval 82.6%
最佳企业用例
金融文档处理
微调后:96% 收入表行项提取准确率,98% 数值计算准确率。
复杂分类法分类
32 类支持工单分类:Phi-4 94%,Llama 3.3 8B 89%,GPT-4o 87%。
结构化数据提取
合同条款提取:Phi-4 93% 字段级准确率,97% JSON 有效性。
量化推荐
| 量化 | 准确率 | JSON 有效性 | 模型大小 |
|---|---|---|---|
| Q5_K_M | 92.8% | 96.8% | 10 GB |
| Q4_K_M | 92.1% | 96.2% | 8.5 GB |
Q5_K_M 比 FP16 仅损失 0.4% 准确率,快 73%,小 64%。
Ship AI that runs on your users' devices.
Ertas early bird pricing starts at $14.50/mo — locked in for life. Plans for builders and agencies.
延伸阅读
Ship AI that runs on your users' devices.
Early bird pricing starts at $14.50/mo — locked in for life. Plans for builders and agencies.
Keep reading

Fine-Tuning Qwen 2.5 for Multilingual Applications
Qwen 2.5 covers 29 languages with 18 trillion training tokens. Here's how to fine-tune it for multilingual classification, support, and content generation without separate models per language.

Fine-Tuning Gemma 3: Google's Lightweight Model for On-Device Deployment
Gemma 3 is optimized for on-device inference — phones, tablets, edge hardware. Here's how to fine-tune it for mobile AI features and IoT applications that run without a server.

On-Device Tool Calling 2026: Qwen3-4B vs Gemma 4 E4B vs Phi-4-Mini
We benchmarked the three best on-device tool-calling bases of 2026 — Qwen3-4B, Gemma 4 E4B, and Phi-4-Mini — across BFCL v4, real mobile latency, and post-fine-tune accuracy. Each wins a different scenario; here's how to pick.