
微调 Gemma 3:Google 为端侧部署优化的轻量模型
Gemma 3 为端侧推理优化——手机、平板、边缘硬件。以下是如何为无需服务器运行的移动 AI 功能和 IoT 应用微调它。
在手机、Raspberry Pi 或 IoT 网关上运行 AI——完全不经过服务器——改变了可能性。无网络往返延迟。无随用户增长的 API 费用。无互联网依赖。数据完全不离开设备。
4B 模型是大多数端侧部署的目标。Q4_K_M 量化下不到 3 GB RAM——完全在现代智能手机、Raspberry Pi 5 或浏览器标签页能力范围内。
推理速度对比(Q4_K_M)
| 硬件 | Gemma 3 4B | Llama 3.2 3B |
|---|---|---|
| iPhone 15 Pro (ANE) | 28 t/s | 24 t/s |
| Pixel 8 Pro (GPU) | 22 t/s | 19 t/s |
| Raspberry Pi 5 (CPU) | 6.4 t/s | 5.5 t/s |
| M2 MacBook Air (GPU) | 48 t/s | 41 t/s |
| 浏览器 (WebLLM) | 12 t/s | 10 t/s |
Gemma 3 在所有端侧目标上快 15-30%。
端侧任务的数据集策略
**短输入,简洁输出。**端侧上每个 token 都有延迟和内存成本:
{"instruction": "Classify intent", "input": "Where's my order?", "output": "order_status"}
包含真实设备使用的边缘情况——更多打字错误、缩写、非正式语言。
集成模式
- React Native + llama.rn:分类延迟 80-150ms
- iOS Core ML:ANE 上 32-35 t/s
- Android NNAPI:NPU 推理功耗低 50-60%
- 浏览器 WebLLM:模型下载一次并缓存
- Raspberry Pi llama.cpp:IoT 和边缘部署
真实设备基准
微调 Gemma 3 4B(Q4_K_M)12 类意图分类任务:
| 设备 | 准确率 | 延迟(平均) |
|---|---|---|
| iPhone 15 Pro | 94% | 65ms |
| Pixel 8 Pro | 94% | 85ms |
| Raspberry Pi 5 | 94% | 280ms |
| 浏览器 (Chrome, M2) | 94% | 110ms |
200 美元手机上的端侧模型延迟击败世界最佳 API 10 倍。
Ship AI that runs on your users' devices.
Ertas early bird pricing starts at $14.50/mo — locked in for life. Plans for builders and agencies.
延伸阅读
Ship AI that runs on your users' devices.
Early bird pricing starts at $14.50/mo — locked in for life. Plans for builders and agencies.
Keep reading

Fine-Tuning Phi-4: Microsoft's Best Small Model for Enterprise Tasks
Phi-4 14B outperforms GPT-4 on math benchmarks while running 15x faster on local hardware. Here's how to fine-tune it for classification, extraction, and structured output tasks.

Fine-Tuning Qwen 2.5 for Multilingual Applications
Qwen 2.5 covers 29 languages with 18 trillion training tokens. Here's how to fine-tune it for multilingual classification, support, and content generation without separate models per language.

SmolLM2 and Sub-3B Models: Fine-Tuning for Edge and Mobile
Sub-3B parameter models run on phones, Raspberry Pis, and browser tabs. Here's how to fine-tune SmolLM2, Phi-3.5 Mini, and Qwen 2.5 0.5B for edge deployment where every megabyte counts.