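Fine-Tuning on Apple Silicon: Running Custom Models on M-Series Macs
A practical guide to deploying fine-tuned AI models on Apple Silicon Macs. It covers M4 hardware capabilities, the unified memory advantage, Ollama and MLX setup, quantization choices, and Core ML LoRA adapter support.
Apple Silicon has an advantage for local AI inference that most people underestimate: unified memory. The CPU, GPU, and Neural Engine share a single memory pool, so no data has to be copied between separate VRAM and system RAM.
If you own an M-series Mac, you already own usable AI inference hardware.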
What Your Mac Can Run
| Mac | Unified memory | Recommended models | Expected speed |
|---|---|---|---|
| M1/M2/M3/M4 (base) | 8-16 GB | 1-3B quantized; 7B Q4 (tight fit) | ~15-25 tok/s |
| M1/M2/M3/M4 Pro | 18-24 GB | 7-8B Q5/Q8; 13B Q4 | ~25-35 tok/s |
| M1/M2/M3/M4 Max | 32-128 GB | 13B Q8; 70B Q4 | ~15-30 tok/s |
| M2/M4 Ultra | 64-192 GB | 70B Q8; multiple models at once | ~20-35 tok/s |
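The pairings in the table can be sanity-checked with some back-of-the-envelope arithmetic. The sketch below is only a rough estimate: the nominal bits per weight, the 20% overhead for KV cache and runtime, and the 75% usable-memory fraction are all illustrative assumptions, not measured values, and real GGUF files are somewhat larger than the nominal bit width suggests.

```python
# Rough sizing sketch. Bits per weight, overhead, and usable fraction
# are illustrative assumptions, not measured figures.
NOMINAL_BITS = {"Q4": 4, "Q5": 5, "Q8": 8}


def estimate_model_gb(params_billion: float, quant: str, overhead: float = 1.2) -> float:
    """Approximate memory in GB: weights plus an assumed 20% for KV cache and runtime."""
    bits = NOMINAL_BITS[quant]
    weight_gb = params_billion * bits / 8  # 1B parameters at 8 bits is about 1 GB
    return round(weight_gb * overhead, 1)


def fits(params_billion: float, quant: str, unified_memory_gb: float,
         usable_fraction: float = 0.75) -> bool:
    """Assume only part of unified memory is available to the model (hypothetical 75%)."""
    return estimate_model_gb(params_billion, quant) <= unified_memory_gb * usable_fraction


if __name__ == "__main__":
    for size, quant, mem in [(7, "Q4", 8), (13, "Q4", 24), (70, "Q4", 64), (70, "Q8", 192)]:
        gb = estimate_model_gb(size, quant)
        print(f"{size}B {quant} on {mem} GB: ~{gb} GB, fits={fits(size, quant, mem)}")
```

Treat the result as a first filter only; context length and other running apps change the real headroom considerably.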
Deployment Options
Option 1: Ollama (simplest)
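```shell
# Install Ollama with Homebrew
brew install ollama
# Build a local model from a Modelfile in the current directory
ollama create my-model -f Modelfile
# Start an interactive session with the new model
ollama run my-model
```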
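The `ollama create` step expects a Modelfile. As a minimal sketch, the helper below renders one that points at a local GGUF export. The path `./my-model.gguf`, the system prompt, and the temperature are placeholders chosen for illustration; only the `FROM`, `PARAMETER`, and `SYSTEM` instructions come from Ollama's Modelfile format.

```python
def render_modelfile(gguf_path: str, system_prompt: str, temperature: float = 0.7) -> str:
    """Return Modelfile text that points Ollama at a local GGUF file."""
    return (
        f"FROM {gguf_path}\n"
        f"PARAMETER temperature {temperature}\n"
        f'SYSTEM """{system_prompt}"""\n'
    )


if __name__ == "__main__":
    # "./my-model.gguf" is a placeholder for your exported weights.
    print(render_modelfile("./my-model.gguf", "You are a concise support assistant."), end="")
```

Redirecting the output to a file named `Modelfile` should be enough for `ollama create my-model -f Modelfile` to pick it up.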
Option 2: MLX (Apple-native performance)
Apple's own ML framework, with native loading of LoRA adapters.
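One way to load a LoRA adapter under MLX is through the `mlx-lm` Python package, sketched below. This assumes `mlx-lm` is installed (`pip install mlx-lm`) and that your adapter directory holds weights in the format `mlx-lm` expects; the function name and its arguments are hypothetical wrappers, not part of the library.

```python
def generate_with_adapter(model_id: str, adapter_dir: str, prompt: str,
                          max_tokens: int = 200) -> str:
    """Load a base model plus LoRA adapter weights with mlx-lm and generate a reply."""
    # Imported lazily so this file can still be imported on machines without MLX.
    from mlx_lm import load, generate

    # adapter_path tells mlx-lm to apply the LoRA weights on top of the base model.
    model, tokenizer = load(model_id, adapter_path=adapter_dir)
    return generate(model, tokenizer, prompt=prompt, max_tokens=max_tokens)
```

A call might look like `generate_with_adapter("<base model path or repo>", "./adapters", "Hello")`, where both paths are whatever your fine-tuning run produced.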
Option 3: llama.cpp (maximum control)
Custom batch sizes and thread configuration, with Metal GPU acceleration.
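End-to-End Workflow
- Fine-tune on Ertas cloud GPUs
- Export to GGUF (Q5_K_M recommended for Macs with 24 GB or more)
- Load into Ollama
- Integrate into your stack
- Run at zero marginal cost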
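For the integration step, one possible approach is to call Ollama's local HTTP API from your application. The sketch below assumes Ollama is running on its default port (11434) and that a model named `my-model` has already been created; the helper names `build_request` and `ask` are hypothetical.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request for a locally running Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt and return the generated text from the JSON response."""
    with urllib.request.urlopen(build_request(model, prompt), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

With the server up, `ask("my-model", "Classify this ticket: ...")` returns the model's reply as a plain string, using only the standard library.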
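For indie developers, the gap between cloud API fees ($500-2,000/month) and local inference ($10-15/month in electricity) is the difference between a viable business and burning cash.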
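The comparison is easy to work through with your own numbers. The sketch below uses the monthly ranges quoted above; the $2,000 hardware price is a hypothetical figure chosen only to illustrate a payback calculation, and it ignores fine-tuning costs.

```python
def monthly_savings(cloud_api_usd: float, electricity_usd: float) -> float:
    """Monthly amount saved by moving inference from a cloud API to a local Mac."""
    return cloud_api_usd - electricity_usd


def payback_months(hardware_usd: float, cloud_api_usd: float, electricity_usd: float) -> float:
    """Months until the hardware pays for itself; infinite if nothing is saved."""
    savings = monthly_savings(cloud_api_usd, electricity_usd)
    if savings <= 0:
        return float("inf")
    return hardware_usd / savings


if __name__ == "__main__":
    HARDWARE_USD = 2000  # hypothetical Mac price, not a figure from this article
    for cloud, power in [(500, 15), (2000, 10)]:
        months = payback_months(HARDWARE_USD, cloud, power)
        print(f"cloud ${cloud}/mo vs power ${power}/mo: "
              f"saves ${monthly_savings(cloud, power)}/mo, payback in {months:.1f} months")
```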