
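OTA Model Updates: Keeping On-Device AI Current
How to push model updates to users without going through an app store release. Version checks, background downloads, rollback strategies, and the infrastructure for OTA model distribution.
On-device AI models are not static. Your training data improves, your fine-tuning gets better, and new base models are released. Updating a model should not require shipping a full app update through the App Store.
OTA (over-the-air) model updates let you push new GGUF files to users independently of the app binary. The app checks for updates, downloads the new model in the background, and then switches over seamlessly.
Architecture
Model Manifest
Host a JSON manifest on your CDN alongside the model files: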
{
  "current_version": "2.1.0",
  "models": {
    "1b": {
      "url": "https://cdn.example.com/models/v2.1.0/model-1b-q4.gguf",
      "size_bytes": 612000000,
      "sha256": "a1b2c3d4e5f6...",
      "min_app_version": "3.0.0",
      "release_notes": "Improved classification accuracy"
    },
    "3b": {
      "url": "https://cdn.example.com/models/v2.1.0/model-3b-q4.gguf",
      "size_bytes": 1740000000,
      "sha256": "f6e5d4c3b2a1...",
      "min_app_version": "3.0.0",
      "release_notes": "Better conversation quality"
    }
  },
  "rollback_version": "2.0.0",
  "rollback_url_1b": "https://cdn.example.com/models/v2.0.0/model-1b-q4.gguf",
  "rollback_url_3b": "https://cdn.example.com/models/v2.0.0/model-3b-q4.gguf"
}
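The manifest tells the app what the latest version is, where to download it, how to verify it, and what to fall back to if something goes wrong.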
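On the client, the manifest can be decoded into plain `Codable` types. The following is a minimal sketch, not the article's actual implementation: the type names `ModelManifest` and `ModelEntry` and the helper `decodeManifest` are assumptions chosen for illustration, and only a subset of the manifest fields is mapped (unmapped keys such as the rollback URLs are simply ignored by the decoder).

```swift
import Foundation

// Hypothetical types mirroring the manifest; the names are assumptions.
struct ModelEntry: Codable {
    let url: URL
    let sizeBytes: Int64
    let sha256: String
    let minAppVersion: String
    let releaseNotes: String

    // Map the manifest's snake_case keys onto Swift property names.
    enum CodingKeys: String, CodingKey {
        case url, sha256
        case sizeBytes = "size_bytes"
        case minAppVersion = "min_app_version"
        case releaseNotes = "release_notes"
    }
}

struct ModelManifest: Codable {
    let currentVersion: String
    let models: [String: ModelEntry]   // keyed by tier, e.g. "1b" or "3b"
    let rollbackVersion: String

    enum CodingKeys: String, CodingKey {
        case models
        case currentVersion = "current_version"
        case rollbackVersion = "rollback_version"
    }
}

// A trimmed-down copy of the manifest, used to check that the mapping works.
let sampleManifest = """
{
  "current_version": "2.1.0",
  "models": {
    "1b": {
      "url": "https://cdn.example.com/models/v2.1.0/model-1b-q4.gguf",
      "size_bytes": 612000000,
      "sha256": "a1b2c3d4e5f6...",
      "min_app_version": "3.0.0",
      "release_notes": "Improved classification accuracy"
    }
  },
  "rollback_version": "2.0.0"
}
"""

// Decodes a manifest from its JSON text; throws if a mapped field is missing or mistyped.
func decodeManifest(_ json: String) throws -> ModelManifest {
    try JSONDecoder().decode(ModelManifest.self, from: Data(json.utf8))
}
```

Declaring `url` as `URL` rather than `String` means a malformed download URL fails at decode time instead of at download time.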
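Update Check Flow
[App launch] -> [Fetch manifest from CDN]
  -> [Compare local version with manifest version]
  -> [If a newer version exists]:
       -> [Check for Wi-Fi + sufficient storage]
       -> [Download new model in the background]
       -> [Verify SHA256]
       -> [Switch models at the start of the next session]
  -> [If versions match]: [No action]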
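The decision part of this flow can be sketched as a pure function, which keeps it easy to unit test. This is an illustration under stated assumptions: `UpdateContext`, `UpdateDecision`, `isNewer`, and `decide` are hypothetical names, the `min_app_version` gate is inferred from the manifest field, and the 2x storage headroom is an arbitrary safety margin rather than a recommendation from the article.

```swift
import Foundation

// Hypothetical inputs for the decision; none of these names come from a real SDK.
struct UpdateContext {
    let installedModelVersion: String
    let appVersion: String
    let isOnWiFi: Bool
    let freeBytes: Int64
}

enum UpdateDecision: Equatable {
    case upToDate
    case appTooOld
    case waitForWiFi
    case insufficientStorage
    case download
}

// True when `candidate` is a higher dotted version than `installed`.
// Numeric comparison keeps "2.10.0" above "2.9.0"; plain string comparison does not.
func isNewer(_ candidate: String, than installed: String) -> Bool {
    candidate.compare(installed, options: .numeric) == .orderedDescending
}

func decide(manifestVersion: String, minAppVersion: String, sizeBytes: Int64,
            context: UpdateContext) -> UpdateDecision {
    guard isNewer(manifestVersion, than: context.installedModelVersion) else {
        return .upToDate
    }
    // Assumption: min_app_version gates models that need newer app code.
    if isNewer(minAppVersion, than: context.appVersion) { return .appTooOld }
    if !context.isOnWiFi { return .waitForWiFi }
    // Assumption: require twice the model size as headroom for the temporary file.
    if context.freeBytes < sizeBytes * 2 { return .insufficientStorage }
    return .download
}
```

Returning a distinct case for each blocked condition lets the caller decide whether to retry later (for example, when Wi-Fi becomes available) or to prompt the user to update the app.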
Implementation
// iOS: check for a model update
import Foundation

class ModelUpdater {
    private let manifestURL = URL(string: "https://cdn.example.com/manifest.json")!

    func checkForUpdate() async -> ModelUpdate? {
        // Any network or decoding failure is treated as "no update available"
        guard let data = try? await URLSession.shared.data(from: manifestURL).0,
              let manifest = try? JSONDecoder().decode(ModelManifest.self, from: data)
        else { return nil }

        let currentVersion = UserDefaults.standard.string(forKey: "model_version") ?? "0.0.0"

        // Compare numerically: a plain string comparison would rank "2.10.0" below "2.9.0"
        guard manifest.currentVersion.compare(currentVersion, options: .numeric) == .orderedDescending,
              let model = manifest.models[selectedTier]  // no entry for this tier: nothing to do
        else { return nil }

        return ModelUpdate(
            version: manifest.currentVersion,
            url: model.url,
            size: model.sizeBytes,
            hash: model.sha256
        )
    }
}
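// Android: check for updates at app launch
class ModelUpdater(private val context: Context) {
    suspend fun checkForUpdate(): ModelUpdate? = withContext(Dispatchers.IO) {
        val manifest = fetchManifest() ?: return@withContext null
        // getString() returns a nullable String even when a default is supplied
        val currentVersion = prefs.getString("model_version", "0.0.0") ?: "0.0.0"
        if (!isNewer(manifest.currentVersion, currentVersion)) return@withContext null
        // Skip the update if the manifest has no entry for this tier
        val model = manifest.models[selectedTier] ?: return@withContext null
        ModelUpdate(
            version = manifest.currentVersion,
            url = model.url,
            sizeBytes = model.sizeBytes,
            sha256 = model.sha256
        )
    }

    // Compare dotted versions numerically; comparing the raw strings
    // would rank "2.10.0" below "2.9.0"
    private fun isNewer(candidate: String, installed: String): Boolean {
        val a = candidate.split(".").map { it.toIntOrNull() ?: 0 }
        val b = installed.split(".").map { it.toIntOrNull() ?: 0 }
        for (i in 0 until maxOf(a.size, b.size)) {
            val diff = a.getOrElse(i) { 0 } - b.getOrElse(i) { 0 }
            if (diff != 0) return diff > 0
        }
        return false
    }
}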
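Background Downloads
Model downloads should run in the background without blocking the user:
iOS: Background URLSession
// A background session should be created once per identifier, so keep it in a lazy property
private lazy var backgroundSession: URLSession = {
    let config = URLSessionConfiguration.background(
        withIdentifier: "com.app.model-download"
    )
    return URLSession(configuration: config, delegate: self, delegateQueue: nil)
}()

func downloadUpdate(_ update: ModelUpdate) {
    backgroundSession.downloadTask(with: update.url).resume()
}

// The delegate handles completion even if the app was suspended during the download
func urlSession(_ session: URLSession, downloadTask: URLSessionDownloadTask,
                didFinishDownloadingTo location: URL) {
    let downloaded = modelDirectory.appendingPathComponent("model-new.gguf")
    let pending = modelDirectory.appendingPathComponent("model-pending.gguf")
    do {
        // moveItem fails if the destination exists, so clear leftovers from earlier attempts.
        // The temporary file at `location` is deleted once this method returns, so move it first.
        try? FileManager.default.removeItem(at: downloaded)
        try FileManager.default.moveItem(at: location, to: downloaded)
        guard verifyHash(downloaded, expected: pendingUpdate.sha256) else {
            try? FileManager.default.removeItem(at: downloaded)
            return
        }
        // Only a verified file becomes model-pending.gguf; the swap happens at the next session start
        try? FileManager.default.removeItem(at: pending)
        try FileManager.default.moveItem(at: downloaded, to: pending)
        UserDefaults.standard.set(pendingUpdate.version, forKey: "pending_model_version")
    } catch {
        try? FileManager.default.removeItem(at: downloaded)
    }
}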
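The delegate above calls `verifyHash`, which the article does not show. One possible sketch, assuming Apple's CryptoKit is available (iOS 13 or later), streams the file through SHA-256 in chunks so a multi-gigabyte model never has to fit in memory. The helper name `sha256Hex` and the 1 MiB chunk size are assumptions made for this example.

```swift
import CryptoKit
import Foundation

// Streams the file through SHA-256 in 1 MiB chunks.
// Returns nil if the file cannot be opened. A read error partway through ends
// the loop early, which yields a digest that will not match the expected hash.
func sha256Hex(of fileURL: URL) -> String? {
    guard let handle = try? FileHandle(forReadingFrom: fileURL) else { return nil }
    defer { try? handle.close() }

    var hasher = SHA256()
    while let chunk = try? handle.read(upToCount: 1 << 20), !chunk.isEmpty {
        hasher.update(data: chunk)
    }
    // Render the digest bytes as lowercase hex, matching the manifest's format.
    return hasher.finalize().map { String(format: "%02x", $0) }.joined()
}

// Compares the file's digest with the manifest's hash, ignoring letter case.
func verifyHash(_ fileURL: URL, expected: String) -> Bool {
    sha256Hex(of: fileURL) == expected.lowercased()
}
```

Hashing a large file takes noticeable time, so it may be worth running the check off the main thread.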
Android: WorkManager
class ModelDownloadWorker(
    context: Context, params: WorkerParameters
) : CoroutineWorker(context, params) {
    override suspend fun doWork(): Result {
        val url = inputData.getString("url") ?: return Result.failure()
        val expectedHash = inputData.getString("hash") ?: return Result.failure()
        val tempFile = File(applicationContext.cacheDir, "model-new.gguf")

        // Download, reporting progress to observers
        downloadFile(url, tempFile) { progress ->
            setProgress(workDataOf("progress" to progress))
        }

        // Verify: discard the file if its hash does not match the manifest
        if (!tempFile.sha256().equals(expectedHash, ignoreCase = true)) {
            tempFile.delete()
            return Result.failure()
        }

        // Stage for the swap; renameTo() reports failure through its return value
        val destination = File(applicationContext.filesDir, "model-pending.gguf")
        destination.delete()
        if (!tempFile.renameTo(destination)) {
            tempFile.delete()
            return Result.failure()
        }
        return Result.success()
    }
}

// Schedule the download
fun scheduleModelDownload(url: String, hash: String) {
    val request = OneTimeWorkRequestBuilder<ModelDownloadWorker>()
        .setConstraints(
            Constraints.Builder()
                .setRequiredNetworkType(NetworkType.UNMETERED) // unmetered networks only (typically Wi-Fi)
                .setRequiresStorageNotLow(true)
                .build()
        )
        .setInputData(workDataOf("url" to url, "hash" to hash))
        .build()
    WorkManager.getInstance(context).enqueue(request)
}
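Model Swapping
Do not swap the model file while the model is loaded. Swap at a safe point instead:
Safe Swap Strategy
- Download completes: the new model is saved as model-pending.gguf
- At the next app launch (or at the start of the next chat session):
  a. Unload the current model
  b. Rename model-current.gguf to model-previous.gguf
  c. Rename model-pending.gguf to model-current.gguf
  d. Load the new model
  e. Update the stored version number
- If the new model fails to load: fall back to model-previous.gguf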
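func swapModelIfPending() throws {
    let fm = FileManager.default
    let pendingPath = modelDirectory.appendingPathComponent("model-pending.gguf")
    let currentPath = modelDirectory.appendingPathComponent("model-current.gguf")
    let previousPath = modelDirectory.appendingPathComponent("model-previous.gguf")
    guard fm.fileExists(atPath: pendingPath.path) else { return }

    // The download step stored this under "pending_model_version"
    let pendingVersion = UserDefaults.standard.string(forKey: "pending_model_version")

    // Unload the current model
    engine.unload()

    // Rotate files
    try? fm.removeItem(at: previousPath)                  // remove the old backup
    try? fm.moveItem(at: currentPath, to: previousPath)   // back up the current model
    do {
        try fm.moveItem(at: pendingPath, to: currentPath) // promote the pending model
    } catch {
        // Promotion failed: restore the backup so a current model still exists
        try? fm.moveItem(at: previousPath, to: currentPath)
        throw error
    }
    UserDefaults.standard.removeObject(forKey: "pending_model_version")

    // Try to load the new model
    do {
        try engine.load(at: currentPath.path)
        // Success: update the stored version number
        UserDefaults.standard.set(pendingVersion, forKey: "model_version")
    } catch {
        // Roll back to the previous model
        try? fm.removeItem(at: currentPath)
        try? fm.moveItem(at: previousPath, to: currentPath)
        try engine.load(at: currentPath.path)
    }
}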
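Rollback Strategy
Always keep the previous model version:
- Local rollback: keep model-previous.gguf on the device. If the new model fails to load or its quality is poor, fall back immediately.
- Remote rollback: include rollback URLs in the manifest. If you discover a quality problem with a model, update the manifest to point at the previous version. Every app will then "update" to the older, working model. Note that a client that only accepts a strictly higher version number will ignore a manifest that moves backwards, so either republish the old weights under a higher version number or have the client honor rollback_version.
- Automatic rollback: if the app detects inference failures or crashes after a model swap, it falls back to the previous version automatically.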
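Automatic rollback needs some way to notice that things went wrong after a swap. One simple approach, sketched here under assumptions, is to count failures since the last swap and trigger a rollback once a threshold is reached. `RollbackGuard`, its method names, the `failures_since_model_swap` key, and the threshold are all hypothetical; how a failure is detected (an inference error, or an unclean exit noticed at the next launch) is left to the app.

```swift
import Foundation

// Hypothetical helper: counts failures since the last model swap and
// signals a rollback once a threshold is reached.
struct RollbackGuard {
    private static let key = "failures_since_model_swap"  // assumed key name

    let defaults: UserDefaults
    let maxFailures: Int

    // Call right after a successful swap to start a fresh observation window.
    func markSwapped() {
        defaults.set(0, forKey: Self.key)
    }

    // Call when inference fails, or at launch if the previous run did not exit cleanly.
    // Returns true when the app should restore model-previous.gguf.
    func recordFailure() -> Bool {
        let count = defaults.integer(forKey: Self.key) + 1
        defaults.set(count, forKey: Self.key)
        return count >= maxFailures
    }

    // Call after a healthy session so isolated failures do not accumulate forever.
    func recordSuccess() {
        defaults.set(0, forKey: Self.key)
    }
}
```

Persisting the counter in `UserDefaults` matters because a crash wipes in-memory state; the count has to survive the relaunch for the guard to see repeated crashes.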
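Update Frequency
| Scenario | Update frequency | Notes |
|---|---|---|
| Early-stage product (rapid iteration) | Weekly to biweekly | Fast quality improvements |
| Stable product | Monthly to quarterly | Incremental improvements |
| New base model available | As needed | Major quality gains |
| Training data changes significantly | As needed | Domain shift |
Each update costs one fine-tuning run ($5-50) plus CDN distribution. Compared with the quality improvement, the cost is negligible.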
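Infrastructure Costs
| Users | Monthly downloads | Model size | CDN cost (Cloudflare R2) |
|---|---|---|---|
| 1,000 | ~200 (updates + new users) | 1.7 GB | ~$0.01/month |
| 10,000 | ~2,000 | 1.7 GB | ~$0.05/month |
| 100,000 | ~20,000 | 1.7 GB | ~$0.51/month |
Because Cloudflare R2 charges no egress fees, OTA model distribution is essentially free. Even with 100,000 users, the CDN cost stays under $1/month.
Fine-tuning and GGUF export are the steps where platforms such as Ertas streamline the workflow. Retrain on updated data, export a GGUF, upload it to the CDN, and update the manifest. Your users receive the improved model automatically.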