
用 $50/月构建 AI SaaS:微调本地技术栈
你不需要 $10K/月的 API 成本来发布 AI 功能。这里是完整的技术栈——微调模型、Ollama、$30 VPS——让生产 AI SaaS 总成本不到 $50/月。
每个人谈论 AI SaaS 都好像你需要风险资本来支付 API 账单。你不需要。你需要一个微调模型、一台 $30 的服务器,和停止按 token 付费给 OpenAI 的意愿。
完整技术栈
基础模型选择
Llama 3.3 8B — 默认选择。Qwen 2.5 7B — 结构化输出更好。Phi-4(3.8B) — 微软的小而强模型。
推荐:从 Llama 3.3 8B 开始。
用 Ertas 微调
成本:$14.50/月(Builder 方案)
GGUF 导出和量化
Q5_K_M 是最佳点。 质量差异在测量噪声范围内,但模型显著更小更快。
VPS:你的 AI 服务 器
成本:$20-30/月
Hetzner CAX31(ARM,8 vCPU,32 GB RAM) — 约$16/月。
Ollama:推理服务器
成本:免费(开源)
curl -fsSL https://ollama.com/install.sh | sh
连接你的应用
const response = await openai.chat.completions.create({
model: "myapp-model",
messages: [{ role: "user", content: userPrompt }],
}, {
baseURL: "http://your-server-ip:11434/v1",
apiKey: "ollama",
});
改了两行。相同 SDK。相同响应格式。你的应用不知道区别。
完整成本明细
| 项目 | 月成本 |
|---|---|
| Ertas Builder 方案 | $14.50 |
| Hetzner CAX31 VPS | 约$16 |
| Ollama | $0 |
| 总计 | 约$30.50/月 |
$50/月。生产 AI 推理。无按 token 收费。
真正重要的数学
以 $9.99/月订阅、2,000 MAU 和 12% 付费转化率:
API 方式:收入 $2,398/月,AI 成本 $1,200/月,利润 $1,198/月(50%) $50 技术栈:收入 $2,398/月,AI 成本 $50/月,利润 $2,348/月(98%)
那额外的 $1,150/月是"勉强能用的副项目「和」养活你的生意"之间的区别。
Ship AI that runs on your users' devices.
Ertas early bird pricing starts at $14.50/mo — locked in for life. Plans for builders and agencies.
延伸阅读
Ship AI that runs on your users' devices.
Early bird pricing starts at $14.50/mo — locked in for life. Plans for builders and agencies.
Keep reading

Your Vibe-Coded App Hit 1,000 Users — Now What?
You shipped fast with Cursor and Bolt. Users love it. But your OpenAI bill just crossed $200/month and it's climbing. Here's the cost survival guide for vibe-coded apps hitting real scale.

The Vibecoder's Guide to AI Unit Economics: When Free Tiers Stop Being Free
OpenAI's free tier got you started. But at scale, you're spending $5K/month on Opus for tasks Haiku could handle. Here's how to think about AI costs like a founder, not a hobbyist.

Your Vibe-Coded App Hit 10K Users. Now Your AI Bill Is $3K/Month.
Vibe-coded apps with AI features face a brutal cost cliff at scale. Here's how indie developers are cutting AI costs by 95% with fine-tuned local models — without rewriting their apps.