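OTA Model Updates: Keeping On-Device AI Up to Date

On-device AI models are not static. Your training data improves, your fine-tunes get better, and new base models are released. Updating a model should not require shipping a full app update through the App Store.

OTA (over-the-air) model updates let you push new GGUF files to users independently of the app binary. The app checks for updates, downloads the new model in the background, and then switches over seamlessly.

Architecture

Model Manifest

Host a JSON manifest on your CDN alongside the model files: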

{
  "current_version": "2.1.0",
  "models": {
    "1b": {
      "url": "https://cdn.example.com/models/v2.1.0/model-1b-q4.gguf",
      "size_bytes": 612000000,
      "sha256": "a1b2c3d4e5f6...",
      "min_app_version": "3.0.0",
      "release_notes": "Improved classification accuracy"
    },
    "3b": {
      "url": "https://cdn.example.com/models/v2.1.0/model-3b-q4.gguf",
      "size_bytes": 1740000000,
      "sha256": "f6e5d4c3b2a1...",
      "min_app_version": "3.0.0",
      "release_notes": "Better conversation quality"
    }
  },
  "rollback_version": "2.0.0",
  "rollback_url_1b": "https://cdn.example.com/models/v2.0.0/model-1b-q4.gguf",
  "rollback_url_3b": "https://cdn.example.com/models/v2.0.0/model-3b-q4.gguf"
}

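The manifest tells the app what the latest version is, where to download it, how to verify it, and what to fall back to if something goes wrong.

Update Check Flow

[App launch] -> [Fetch manifest from CDN]
  -> [Compare local version with manifest version]
  -> [If a newer version is available]:
      -> [Check for Wi-Fi + sufficient storage]
      -> [Download new model in the background]
      -> [Verify SHA256]
      -> [Swap models at the start of the next session]
  -> [If versions match]: [No action]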
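
The decision part of this flow can be sketched as a pure function, which keeps it easy to unit test. This is a minimal sketch, not the article's implementation: `compareVersions`, `DeviceState`, `ModelEntry`, `UpdateDecision`, and `decideUpdate` are hypothetical names, the 2x storage margin is an assumption, and the `min_app_version` gate is inferred from the manifest field of that name. Versions are assumed to be plain dotted numbers; any non-numeric segment is treated as 0.

```kotlin
/** Compares dotted numeric versions segment by segment, so "2.10.0" ranks above "2.9.0". */
fun compareVersions(a: String, b: String): Int {
    val left = a.split(".").map { it.toIntOrNull() ?: 0 }
    val right = b.split(".").map { it.toIntOrNull() ?: 0 }
    for (i in 0 until maxOf(left.size, right.size)) {
        val diff = left.getOrElse(i) { 0 }.compareTo(right.getOrElse(i) { 0 })
        if (diff != 0) return diff
    }
    return 0
}

// Hypothetical snapshot of what the app knows about itself and the device.
data class DeviceState(
    val appVersion: String,
    val modelVersion: String,
    val onUnmeteredNetwork: Boolean,
    val freeBytes: Long,
)

// Hypothetical subset of one manifest entry (size_bytes, min_app_version).
data class ModelEntry(val sizeBytes: Long, val minAppVersion: String)

enum class UpdateDecision { UP_TO_DATE, APP_TOO_OLD, WAIT_FOR_WIFI, NOT_ENOUGH_STORAGE, DOWNLOAD }

fun decideUpdate(manifestVersion: String, entry: ModelEntry, device: DeviceState): UpdateDecision = when {
    compareVersions(manifestVersion, device.modelVersion) <= 0 -> UpdateDecision.UP_TO_DATE
    compareVersions(device.appVersion, entry.minAppVersion) < 0 -> UpdateDecision.APP_TOO_OLD
    !device.onUnmeteredNetwork -> UpdateDecision.WAIT_FOR_WIFI
    // Assumed safety margin: require twice the download size in free space.
    device.freeBytes < entry.sizeBytes * 2 -> UpdateDecision.NOT_ENOUGH_STORAGE
    else -> UpdateDecision.DOWNLOAD
}

fun main() {
    val entry = ModelEntry(sizeBytes = 612_000_000, minAppVersion = "3.0.0")
    val device = DeviceState("3.2.0", "2.9.0", onUnmeteredNetwork = true, freeBytes = 4_000_000_000)
    println(decideUpdate("2.10.0", entry, device)) // prints DOWNLOAD
}
```

Comparing the version strings directly would rank "2.9.0" above "2.10.0", which is why the sketch compares segment by segment.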

Implementation

// iOS: check for a model update
import Foundation

class ModelUpdater {
    private let manifestURL = URL(string: "https://cdn.example.com/manifest.json")!

    func checkForUpdate() async -> ModelUpdate? {
        guard let data = try? await URLSession.shared.data(from: manifestURL).0,
              let manifest = try? JSONDecoder().decode(ModelManifest.self, from: data),
              // selectedTier ("1b" or "3b") is assumed to be defined elsewhere
              let model = manifest.models[selectedTier]
        else { return nil }

        let currentVersion = UserDefaults.standard.string(forKey: "model_version") ?? "0.0.0"

        // Numeric comparison, so that "2.10.0" ranks above "2.9.0"
        guard manifest.currentVersion.compare(currentVersion, options: .numeric) == .orderedDescending
        else { return nil }

        return ModelUpdate(
            version: manifest.currentVersion,
            url: model.url,
            size: model.sizeBytes,
            sha256: model.sha256
        )
    }
}

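// Android: check for an update at app launch
import android.content.Context
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.withContext

class ModelUpdater(private val context: Context) {
    // prefs, fetchManifest() and selectedTier are assumed to be defined elsewhere in this class
    suspend fun checkForUpdate(): ModelUpdate? = withContext(Dispatchers.IO) {
        val manifest = fetchManifest() ?: return@withContext null
        val currentVersion = prefs.getString("model_version", null) ?: "0.0.0"

        if (isNewer(manifest.currentVersion, currentVersion)) {
            val model = manifest.models[selectedTier] ?: return@withContext null
            ModelUpdate(
                version = manifest.currentVersion,
                url = model.url,
                sizeBytes = model.sizeBytes,
                sha256 = model.sha256
            )
        } else null
    }

    // Segment-wise numeric comparison, so that "2.10.0" ranks above "2.9.0"
    private fun isNewer(candidate: String, installed: String): Boolean {
        val a = candidate.split(".").map { it.toIntOrNull() ?: 0 }
        val b = installed.split(".").map { it.toIntOrNull() ?: 0 }
        for (i in 0 until maxOf(a.size, b.size)) {
            val diff = a.getOrElse(i) { 0 } - b.getOrElse(i) { 0 }
            if (diff != 0) return diff > 0
        }
        return false
    }
}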

Background Download

Model downloads should run in the background without blocking the user:

iOS: Background URLSession

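func downloadUpdate(_ update: ModelUpdate) {
    let config = URLSessionConfiguration.background(
        withIdentifier: "com.app.model-download"
    )
    config.allowsCellularAccess = false // do not download over cellular
    let session = URLSession(configuration: config, delegate: self, delegateQueue: nil)
    let task = session.downloadTask(with: update.url)
    task.resume()
}

// The delegate handles the completion event even if the app was suspended
func urlSession(_ session: URLSession, downloadTask: URLSessionDownloadTask,
                didFinishDownloadingTo location: URL) {
    let fm = FileManager.default
    let staging = modelDirectory.appendingPathComponent("model-new.gguf")
    let pending = modelDirectory.appendingPathComponent("model-pending.gguf")

    do {
        // The file at `location` is only valid until this method returns, so move it first
        try? fm.removeItem(at: staging)
        try fm.moveItem(at: location, to: staging)

        guard verifyHash(staging, expected: pendingUpdate.sha256) else {
            try? fm.removeItem(at: staging)
            return
        }

        // Stage under the name the swap step looks for; swap at the start of the next session
        try? fm.removeItem(at: pending)
        try fm.moveItem(at: staging, to: pending)
        UserDefaults.standard.set(pendingUpdate.version, forKey: "pending_model_version")
    } catch {
        try? fm.removeItem(at: staging)
    }
}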

Android: WorkManager

class ModelDownloadWorker(
    context: Context, params: WorkerParameters
) : CoroutineWorker(context, params) {

    override suspend fun doWork(): Result {
        val url = inputData.getString("url") ?: return Result.failure()
        val expectedHash = inputData.getString("hash") ?: return Result.failure()

        val tempFile = File(applicationContext.cacheDir, "model-new.gguf")

        // Download (downloadFile is an app-defined helper)
        downloadFile(url, tempFile) { progress ->
            setProgress(workDataOf("progress" to progress))
        }

        // Verify (sha256() is an app-defined extension; compare hex case-insensitively)
        if (!tempFile.sha256().equals(expectedHash, ignoreCase = true)) {
            tempFile.delete()
            return Result.failure()
        }

        // Stage for the swap; renameTo reports failure through its return value
        val destination = File(applicationContext.filesDir, "model-pending.gguf")
        if (!tempFile.renameTo(destination)) {
            tempFile.delete()
            return Result.failure()
        }

        return Result.success()
    }
}

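// Schedule the download
fun scheduleModelDownload(context: Context, url: String, hash: String) {
    val request = OneTimeWorkRequestBuilder<ModelDownloadWorker>()
        .setConstraints(
            Constraints.Builder()
                .setRequiredNetworkType(NetworkType.UNMETERED) // unmetered networks only (typically Wi-Fi)
                .setRequiresStorageNotLow(true)
                .build()
        )
        .setInputData(workDataOf("url" to url, "hash" to hash))
        .build()

    // Unique work with KEEP avoids queuing a second download while one is already pending
    WorkManager.getInstance(context)
        .enqueueUniqueWork("model-download", ExistingWorkPolicy.KEEP, request)
}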
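
The worker above calls `tempFile.sha256()`, which is not part of the Kotlin or Android standard library. One way such an extension might look is sketched below, using only `java.security.MessageDigest`. The names `sha256` and `matchesSha256` are assumptions for illustration. The file is streamed in chunks so that a model file of several hundred megabytes or more is never loaded into memory at once.

```kotlin
import java.io.File
import java.security.MessageDigest

/** Streams the file through SHA-256 in 64 KiB chunks and returns lowercase hex. */
fun File.sha256(): String {
    val digest = MessageDigest.getInstance("SHA-256")
    inputStream().use { input ->
        val buffer = ByteArray(64 * 1024)
        while (true) {
            val read = input.read(buffer)
            if (read < 0) break // end of stream
            digest.update(buffer, 0, read)
        }
    }
    return digest.digest().joinToString("") { "%02x".format(it) }
}

/** Case-insensitive comparison, since a manifest may publish upper- or lowercase hex. */
fun File.matchesSha256(expected: String): Boolean =
    sha256().equals(expected.trim(), ignoreCase = true)

fun main() {
    val file = File.createTempFile("model", ".gguf").apply { writeText("abc") }
    // Standard SHA-256 test vector for the input "abc"
    println(file.sha256()) // prints ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
    file.delete()
}
```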

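Model Swapping

Do not swap files while a model is loaded. Swap at a safe point instead:

Safe Swap Strategy

1. Download complete: the new model is saved as model-pending.gguf.
2. On the next app launch (or at the start of the next chat session):
   a. Unload the current model
   b. Rename model-current.gguf to model-previous.gguf
   c. Rename model-pending.gguf to model-current.gguf
   d. Load the new model
   e. Update the stored version number
3. If the new model fails to load: fall back to model-previous.gguf.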

func swapModelIfPending() throws {
    let fm = FileManager.default
    let pendingPath = modelDirectory.appendingPathComponent("model-pending.gguf")
    let currentPath = modelDirectory.appendingPathComponent("model-current.gguf")
    let previousPath = modelDirectory.appendingPathComponent("model-previous.gguf")

    guard fm.fileExists(atPath: pendingPath.path) else { return }

    // Version recorded when the download was verified
    let pendingVersion = UserDefaults.standard.string(forKey: "pending_model_version")

    // Unload the current model
    engine.unload()

    // Rotate files
    try? fm.removeItem(at: previousPath) // remove the old backup
    try? fm.moveItem(at: currentPath, to: previousPath) // back up current (absent on first install)
    try fm.moveItem(at: pendingPath, to: currentPath) // promote pending

    // Try to load the new model
    do {
        try engine.load(at: currentPath.path)
        // Success: update the version number and clear the pending marker
        UserDefaults.standard.set(pendingVersion, forKey: "model_version")
        UserDefaults.standard.removeObject(forKey: "pending_model_version")
    } catch {
        // Roll back
        try? fm.removeItem(at: currentPath)
        try? fm.moveItem(at: previousPath, to: currentPath)
        UserDefaults.standard.removeObject(forKey: "pending_model_version")
        try engine.load(at: currentPath.path)
    }
}
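
The same file rotation might look as follows on Android. This is a sketch under stated assumptions: `load` is a hypothetical callback standing in for the inference engine's load call and is expected to throw when it rejects a file, and the caller is assumed to have unloaded the current model before calling. Persisting the new version number after a successful swap is left to the caller.

```kotlin
import java.io.File
import java.nio.file.Files

/**
 * Promotes model-pending.gguf to model-current.gguf and keeps the old file as
 * model-previous.gguf. Returns true if the pending model is now live.
 */
fun swapModelIfPending(modelDir: File, load: (File) -> Unit): Boolean {
    val pending = File(modelDir, "model-pending.gguf")
    val current = File(modelDir, "model-current.gguf")
    val previous = File(modelDir, "model-previous.gguf")
    if (!pending.exists()) return false

    // Rotate: drop the old backup, back up current, promote pending.
    previous.delete()
    if (current.exists() && !current.renameTo(previous)) return false
    if (!pending.renameTo(current)) {
        previous.renameTo(current) // put the old model back
        return false
    }

    return try {
        load(current)
        true
    } catch (e: Exception) {
        // Roll back: discard the new file and restore the backup if there is one.
        current.delete()
        if (previous.exists()) {
            previous.renameTo(current)
            load(current) // propagates if even the old model fails to load
        }
        false
    }
}

fun main() {
    val dir = Files.createTempDirectory("models").toFile()
    File(dir, "model-current.gguf").writeText("old")
    File(dir, "model-pending.gguf").writeText("bad")

    // Stand-in loader that rejects the "bad" file.
    val swapped = swapModelIfPending(dir) { file ->
        if (file.readText() == "bad") throw IllegalStateException("corrupt model")
    }
    println(swapped)                                      // prints false
    println(File(dir, "model-current.gguf").readText())   // prints old
    dir.deleteRecursively()
}
```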

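Rollback Strategy

Always keep the previous model version:

Local rollback: Keep model-previous.gguf on the device. If the new model fails to load or performs poorly, fall back immediately.
Remote rollback: Include rollback URLs in the manifest. If you discover a quality problem, update the manifest to point to the previous version, and every app will "update" to the older, working model. This only works if the client also acts when the manifest version is lower than the installed one; a check that only looks for newer versions will ignore the change.
Automatic rollback: If the app detects inference failures or crashes after a model swap, it reverts to the previous version automatically.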
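
The automatic rollback rule can be sketched as a small probation tracker: after a swap, the app counts inference outcomes and signals a rollback once failures reach a threshold. Everything here is hypothetical, including the name `SwapHealthTracker` and the default thresholds of 3 failures and 5 successes. The sketch keeps its counters in memory; detecting crashes would additionally require persisting them (for example in SharedPreferences) so that they survive a process restart.

```kotlin
/** Tracks inference outcomes after a model swap and signals when to roll back. */
class SwapHealthTracker(
    private val maxFailures: Int = 3,      // assumed threshold
    private val successesToTrust: Int = 5, // assumed threshold
) {
    private var onProbation = false
    private var failures = 0
    private var successes = 0

    /** Call right after a new model has been promoted. */
    fun onModelSwapped() {
        onProbation = true
        failures = 0
        successes = 0
    }

    /** Records one inference outcome; returns true when the caller should roll back. */
    fun record(success: Boolean): Boolean {
        if (!onProbation) return false
        if (success) {
            successes++
            // Enough successful runs: the new model is considered healthy.
            if (successes >= successesToTrust) onProbation = false
            return false
        }
        failures++
        if (failures >= maxFailures) {
            onProbation = false
            return true
        }
        return false
    }
}

fun main() {
    val tracker = SwapHealthTracker()
    tracker.onModelSwapped()
    val outcomes = listOf(false, true, false, false).map { tracker.record(it) }
    println(outcomes) // prints [false, false, false, true]
}
```

When `record` returns true, the app would restore model-previous.gguf using the same file rotation as the swap, in reverse.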

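Update Frequency

Scenario	Update frequency	Notes
Early-stage product (rapid iteration)	Weekly to every two weeks	Fast quality improvements
Stable product	Monthly to quarterly	Incremental improvements
New base model available	As needed	Major quality gains
Training data changes significantly	As needed	Domain shift

Each update costs one fine-tuning run ($5-50) plus CDN distribution. Relative to the quality gains, the cost is negligible.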

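Infrastructure Cost

Users	Monthly downloads	Model size	CDN cost (Cloudflare R2)
1,000	~200 (updates + new users)	1.7 GB	~$0.01/month
10,000	~2,000	1.7 GB	~$0.05/month
100,000	~20,000	1.7 GB	~$0.51/month

With Cloudflare R2's zero-egress-fee pricing, OTA model distribution is essentially free. Even at 100,000 users, CDN costs stay under $1/month.

The fine-tuning and GGUF export steps are where platforms such as Ertas streamline the workflow. Retrain on updated data, export a GGUF, upload it to the CDN, and update the manifest. Your users then receive the improved model automatically.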
