OTA Model Updates: Keeping On-Device AI Up to Date

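On-device AI models are not static. Your training data improves, your fine-tunes get better, and new base models are released. Updating a model should not require shipping a full app update through the App Store.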

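Over-the-air (OTA) model updates let you push new GGUF files to users independently of the app binary. The app checks for updates, downloads the new model in the background, and then switches to it seamlessly.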

Architecture

Model Manifest

Host a JSON manifest on your CDN alongside the model files:

{
  "current_version": "2.1.0",
  "models": {
    "1b": {
      "url": "https://cdn.example.com/models/v2.1.0/model-1b-q4.gguf",
      "size_bytes": 612000000,
      "sha256": "a1b2c3d4e5f6...",
      "min_app_version": "3.0.0",
      "release_notes": "Improved classification accuracy"
    },
    "3b": {
      "url": "https://cdn.example.com/models/v2.1.0/model-3b-q4.gguf",
      "size_bytes": 1740000000,
      "sha256": "f6e5d4c3b2a1...",
      "min_app_version": "3.0.0",
      "release_notes": "Better conversation quality"
    }
  },
  "rollback_version": "2.0.0",
  "rollback_url_1b": "https://cdn.example.com/models/v2.0.0/model-1b-q4.gguf",
  "rollback_url_3b": "https://cdn.example.com/models/v2.0.0/model-3b-q4.gguf"
}

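The manifest tells the app what the latest version is, where to download it, how to verify it, and what to fall back to if something goes wrong.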
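The Swift updater in the Implementation section decodes the manifest into a ModelManifest type that it never defines. Here is a minimal sketch of Codable types matching the JSON above, assuming exactly the field names shown there. The Swift type and property names are illustrative. Explicit CodingKeys map the snake_case keys, so the sketch does not rely on a key-decoding strategy.

```swift
import Foundation

// One entry of the manifest's "models" object (for example the "1b" tier).
struct ModelEntry: Codable, Equatable {
    let url: URL              // JSONDecoder decodes a URL from a JSON string
    let sizeBytes: Int64
    let sha256: String
    let minAppVersion: String
    let releaseNotes: String

    enum CodingKeys: String, CodingKey {
        case url
        case sizeBytes = "size_bytes"
        case sha256
        case minAppVersion = "min_app_version"
        case releaseNotes = "release_notes"
    }
}

// Top-level manifest. Tier names such as "1b" and "3b" stay as dictionary keys.
struct ModelManifest: Codable {
    let currentVersion: String
    let models: [String: ModelEntry]
    // Optional, so older manifests without rollback fields still decode.
    let rollbackVersion: String?
    let rollbackURL1b: URL?
    let rollbackURL3b: URL?

    enum CodingKeys: String, CodingKey {
        case currentVersion = "current_version"
        case models
        case rollbackVersion = "rollback_version"
        case rollbackURL1b = "rollback_url_1b"
        case rollbackURL3b = "rollback_url_3b"
    }
}
```

With these types, decoding is a single `JSONDecoder().decode(ModelManifest.self, from: data)` call, as in the updater.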

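Update Check Flow

[App launch] -> [Fetch manifest from CDN]
  -> [Compare local version with manifest version]
  -> [If a newer version exists]:
      -> [Check for WiFi + sufficient storage]
      -> [Download new model in background]
      -> [Verify SHA256]
      -> [Swap models when the next session starts]
  -> [If versions match]: [No-op]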
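The flow above can be written as a pure decision function, which keeps it easy to unit-test. This sketch also checks min_app_version: the manifest provides that field, but the flow does not mention it. The enum, the function names, and the 500 MB free-space headroom are assumptions for illustration. The caller is expected to supply network state and free disk space from platform APIs.

```swift
import Foundation

// Outcome of one pass through the update-check flow.
enum UpdateAction: Equatable {
    case upToDate             // local model is already current
    case requiresAppUpdate    // manifest model needs a newer app (min_app_version)
    case waitForWiFi          // update available, but not on WiFi yet
    case insufficientStorage  // update available, but not enough free space
    case download             // all checks passed
}

// Numeric comparison, so "2.10.0" counts as newer than "2.9.0".
func isVersion(_ a: String, newerThan b: String) -> Bool {
    a.compare(b, options: .numeric) == .orderedDescending
}

// headroomBytes: extra free space left for the OS and app data (500 MB is an arbitrary assumption).
func decideUpdate(localModelVersion: String,
                  manifestVersion: String,
                  appVersion: String,
                  minAppVersion: String,
                  modelSizeBytes: Int64,
                  freeBytes: Int64,
                  onWiFi: Bool,
                  headroomBytes: Int64 = 500_000_000) -> UpdateAction {
    guard isVersion(manifestVersion, newerThan: localModelVersion) else { return .upToDate }
    if isVersion(minAppVersion, newerThan: appVersion) { return .requiresAppUpdate }
    guard onWiFi else { return .waitForWiFi }
    // The pending file is written while the current model still exists,
    // so the full new model must fit in the remaining free space.
    guard freeBytes >= modelSizeBytes + headroomBytes else { return .insufficientStorage }
    return .download
}
```

Returning a distinct reason instead of a plain Bool lets the app retry later, for example when the device joins WiFi.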

Implementation

// iOS: check for model updates
import Foundation

// ModelManifest (Codable, mapping the manifest's snake_case keys) and ModelUpdate
// are app-defined types
class ModelUpdater {
    private let manifestURL = URL(string: "https://cdn.example.com/manifest.json")!
    var selectedTier = "1b" // "1b" or "3b"; how the tier is chosen is app-specific

    func checkForUpdate() async -> ModelUpdate? {
        guard let data = try? await URLSession.shared.data(from: manifestURL).0,
              let manifest = try? JSONDecoder().decode(ModelManifest.self, from: data),
              let model = manifest.models[selectedTier]
        else { return nil }

        let currentVersion = UserDefaults.standard.string(forKey: "model_version") ?? "0.0.0"

        // .numeric compares digit runs as numbers; plain > would rank "2.10.0" below "2.9.0"
        if manifest.currentVersion.compare(currentVersion, options: .numeric) == .orderedDescending {
            return ModelUpdate(
                version: manifest.currentVersion,
                url: model.url,
                size: model.sizeBytes,
                hash: model.sha256
            )
        }
        return nil
    }
}

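// Android: check for updates at app launch
import android.content.Context
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.withContext

class ModelUpdater(private val context: Context) {
    private val prefs = context.getSharedPreferences("model_updates", Context.MODE_PRIVATE)
    var selectedTier = "1b" // "1b" or "3b"; how the tier is chosen is app-specific

    suspend fun checkForUpdate(): ModelUpdate? = withContext(Dispatchers.IO) {
        val manifest = fetchManifest() ?: return@withContext null // app-defined fetch + parse
        val model = manifest.models[selectedTier] ?: return@withContext null
        val currentVersion = prefs.getString("model_version", null) ?: "0.0.0"

        if (isNewer(manifest.currentVersion, currentVersion)) {
            ModelUpdate(
                version = manifest.currentVersion,
                url = model.url,
                sizeBytes = model.sizeBytes,
                sha256 = model.sha256
            )
        } else null
    }

    // Numeric comparison: String's > would rank "2.10.0" below "2.9.0"
    private fun isNewer(candidate: String, current: String): Boolean {
        val a = candidate.split(".").map { it.toIntOrNull() ?: 0 }
        val b = current.split(".").map { it.toIntOrNull() ?: 0 }
        for (i in 0 until maxOf(a.size, b.size)) {
            val diff = a.getOrElse(i) { 0 } - b.getOrElse(i) { 0 }
            if (diff != 0) return diff > 0
        }
        return false
    }
}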
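Both updaters read a selectedTier that the article never defines. One hypothetical way to pick it is by physical RAM. The function selectTier and its 6 GiB cut-off are assumptions, not recommendations from this article, so measure memory use on real devices before choosing a threshold.

```swift
import Foundation

// Picks a manifest tier from physical RAM. The 6 GiB cut-off is an assumption, not a
// measured figure: the 3b file alone is ~1.74 GB, and inference needs memory beyond
// the weights, on top of the rest of the app.
func selectTier(physicalMemoryBytes: UInt64,
                availableTiers: Set<String>,
                largeTierMinimumBytes: UInt64 = 6 * 1_073_741_824) -> String? {
    if physicalMemoryBytes >= largeTierMinimumBytes && availableTiers.contains("3b") {
        return "3b"
    }
    return availableTiers.contains("1b") ? "1b" : nil
}

// Example: choose a tier for the current device.
let tier = selectTier(physicalMemoryBytes: ProcessInfo.processInfo.physicalMemory,
                      availableTiers: ["1b", "3b"])
print(tier ?? "no compatible tier")
```

Persist the tier once it is chosen. Otherwise a device could switch tiers between launches and compare versions of two different model files.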

Background Download

Model downloads should run in the background without blocking the user:

iOS: Background URLSession

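// Assumes this lives in a class conforming to URLSessionDownloadDelegate, with
// `pendingUpdate`, `modelDirectory`, and an app-defined `verifyHash` (SHA-256 check)
func downloadUpdate(_ update: ModelUpdate) {
    let config = URLSessionConfiguration.background(
        withIdentifier: "com.app.model-download"
    )
    config.allowsCellularAccess = false // WiFi (non-cellular) only
    pendingUpdate = update // in memory only; lost if iOS relaunches the app
    let session = URLSession(configuration: config, delegate: self, delegateQueue: nil)
    let task = session.downloadTask(with: update.url)
    task.resume()
}

// The delegate handles completion even if the app was suspended.
// `location` is deleted when this method returns, so move the file before returning.
func urlSession(_ session: URLSession, downloadTask: URLSessionDownloadTask,
                didFinishDownloadingTo location: URL) {
    guard let update = pendingUpdate else { return }
    // Use the file name that swapModelIfPending() looks for
    let destination = modelDirectory.appendingPathComponent("model-pending.gguf")
    try? FileManager.default.removeItem(at: destination) // moveItem fails if the file exists
    do {
        try FileManager.default.moveItem(at: location, to: destination)
    } catch {
        return
    }

    if verifyHash(destination, expected: update.hash) {
        // Swap when the next session starts
        UserDefaults.standard.set(update.version, forKey: "pending_model_version")
    } else {
        try? FileManager.default.removeItem(at: destination)
    }
}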
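The delegate above keeps pendingUpdate only in memory. A background session can finish after iOS has terminated the app. In that case the system relaunches the app in the background to deliver the delegate callbacks, and the in-memory state is already gone. Below is a minimal sketch of persisting that state to a small JSON file. PendingUpdate and PendingUpdateStore are hypothetical names, not Apple APIs.

```swift
import Foundation

// What must survive a relaunch while a download is in flight.
struct PendingUpdate: Codable, Equatable {
    let version: String
    let sha256: String
    let sizeBytes: Int64
}

// Stores the pending update as a JSON file (hypothetical helper).
struct PendingUpdateStore {
    let fileURL: URL

    func save(_ update: PendingUpdate) throws {
        let data = try JSONEncoder().encode(update)
        try data.write(to: fileURL, options: .atomic) // avoid a half-written file
    }

    // Returns nil if nothing was saved or the file cannot be decoded.
    func load() -> PendingUpdate? {
        guard let data = try? Data(contentsOf: fileURL) else { return nil }
        return try? JSONDecoder().decode(PendingUpdate.self, from: data)
    }

    func clear() {
        try? FileManager.default.removeItem(at: fileURL)
    }
}
```

Call save(_:) in downloadUpdate(_:) before task.resume(). Call load() at the top of the delegate method, and call clear() once the downloaded file has been verified or discarded.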

Android: WorkManager

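import android.content.Context
import androidx.work.Constraints
import androidx.work.CoroutineWorker
import androidx.work.ExistingWorkPolicy
import androidx.work.NetworkType
import androidx.work.OneTimeWorkRequestBuilder
import androidx.work.WorkManager
import androidx.work.WorkerParameters
import androidx.work.workDataOf
import java.io.File
import java.io.IOException

class ModelDownloadWorker(
    context: Context, params: WorkerParameters
) : CoroutineWorker(context, params) {

    override suspend fun doWork(): Result {
        val url = inputData.getString("url") ?: return Result.failure()
        val expectedHash = inputData.getString("hash") ?: return Result.failure()

        // Download into filesDir: the system may clear cacheDir mid-download, and the
        // later rename then stays within one directory
        val tempFile = File(applicationContext.filesDir, "model-download.tmp")

        // Download. downloadFile is an app-defined helper; its progress callback must be
        // a suspend lambda, because setProgress is a suspend function
        try {
            downloadFile(url, tempFile) { progress ->
                setProgress(workDataOf("progress" to progress))
            }
        } catch (e: IOException) {
            tempFile.delete()
            return Result.retry() // transient network error: let WorkManager back off
        }

        // Verify (sha256() is an app-defined extension returning a hex digest)
        if (!tempFile.sha256().equals(expectedHash, ignoreCase = true)) {
            tempFile.delete()
            return Result.failure()
        }

        // Stage for the swap
        val destination = File(applicationContext.filesDir, "model-pending.gguf")
        destination.delete()
        if (!tempFile.renameTo(destination)) {
            tempFile.delete()
            return Result.failure()
        }

        return Result.success()
    }
}

// Schedule the download. Unique work prevents duplicate downloads when the
// update check runs on every launch
fun scheduleModelDownload(context: Context, url: String, hash: String) {
    val request = OneTimeWorkRequestBuilder<ModelDownloadWorker>()
        .setConstraints(
            Constraints.Builder()
                .setRequiredNetworkType(NetworkType.UNMETERED) // unmetered networks only (usually WiFi)
                .setRequiresStorageNotLow(true) // system "storage low" flag; does not guarantee room for the model
                .build()
        )
        .setInputData(workDataOf("url" to url, "hash" to hash))
        .build()

    WorkManager.getInstance(context)
        .enqueueUniqueWork("model-download", ExistingWorkPolicy.KEEP, request)
}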
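The worker calls setProgress from the download callback. For a 1.7 GB file, that callback may fire many thousands of times, once per network chunk. A simple mitigation is to report progress only when the whole percentage changes. The logic below is a Swift sketch (ProgressThrottle is a hypothetical name) that ports directly to Kotlin. It applies equally to the iOS download delegate's didWriteData progress callback.

```swift
// Emits a value only when the whole-percent figure changes, so a download
// produces at most 101 progress updates regardless of how many chunks arrive.
struct ProgressThrottle {
    let totalBytes: Int64
    private var lastPercent = -1

    init(totalBytes: Int64) {
        self.totalBytes = totalBytes
    }

    // Returns the new percentage if it changed, otherwise nil.
    mutating func update(bytesReceived: Int64) -> Int? {
        guard totalBytes > 0 else { return nil }
        let clamped = min(max(bytesReceived, 0), totalBytes)
        let percent = Int(clamped * 100 / totalBytes)
        guard percent != lastPercent else { return nil }
        lastPercent = percent
        return percent
    }
}
```

Only non-nil results would be forwarded to setProgress (or to a UI update on iOS).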

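Model Swapping

Never swap a model while it is loaded and in use. Swap at a safe point instead:

Safe Swap Strategy

1. Download completes: the new model is saved as model-pending.gguf.
2. On the next app launch (or when the next chat session starts):
   a. Unload the current model.
   b. Rename model-current.gguf to model-previous.gguf.
   c. Rename model-pending.gguf to model-current.gguf.
   d. Load the new model.
   e. Update the stored version number.
3. If the new model fails to load: fall back to model-previous.gguf.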

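// `modelDirectory` and `engine` (the app's inference wrapper) are assumed to exist
func swapModelIfPending() throws {
    let pendingPath = modelDirectory.appendingPathComponent("model-pending.gguf")
    let currentPath = modelDirectory.appendingPathComponent("model-current.gguf")
    let previousPath = modelDirectory.appendingPathComponent("model-previous.gguf")
    let defaults = UserDefaults.standard

    // Only swap a file whose hash was verified (the download step records its version)
    guard FileManager.default.fileExists(atPath: pendingPath.path),
          let pendingVersion = defaults.string(forKey: "pending_model_version")
    else { return }

    // Unload the current model
    engine.unload()

    // Rotate files
    try? FileManager.default.removeItem(at: previousPath) // remove the old backup
    try? FileManager.default.moveItem(at: currentPath, to: previousPath) // back up current
    try FileManager.default.moveItem(at: pendingPath, to: currentPath) // promote pending
    defaults.removeObject(forKey: "pending_model_version")

    // Try to load the new model
    do {
        try engine.load(at: currentPath.path)
        // Success: record the new version
        defaults.set(pendingVersion, forKey: "model_version")
    } catch {
        // Roll back: restore the previous file and reload it; model_version stays unchanged
        try? FileManager.default.removeItem(at: currentPath)
        try? FileManager.default.moveItem(at: previousPath, to: currentPath)
        try engine.load(at: currentPath.path)
    }
}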
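swapModelIfPending() assumes nothing is using the engine while it runs. If you swap when a chat session starts rather than at launch, one way to enforce that assumption is a gate that defers the swap until no session is active. SwapGate and its methods are hypothetical. The sketch assumes all calls arrive on a single thread (for example the main actor), and performSwap would wrap a call to swapModelIfPending().

```swift
// Defers a staged model swap until no chat session is using the engine.
// Assumes every call happens on the same thread, so no locking is done.
final class SwapGate {
    private var activeSessions = 0
    private var swapPending = false
    private let performSwap: () -> Void

    init(performSwap: @escaping () -> Void) {
        self.performSwap = performSwap
    }

    // Call once a verified model has been staged as model-pending.gguf.
    func requestSwap() {
        swapPending = true
        runIfIdle()
    }

    func sessionStarted() {
        activeSessions += 1
    }

    func sessionEnded() {
        precondition(activeSessions > 0, "sessionEnded() without sessionStarted()")
        activeSessions -= 1
        runIfIdle()
    }

    // Swap only when a swap is pending and nothing holds the engine.
    private func runIfIdle() {
        guard swapPending, activeSessions == 0 else { return }
        swapPending = false
        performSwap()
    }
}
```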

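Rollback Strategy

Always keep the previous model version:

Local rollback: Keep model-previous.gguf on the device. If the new model fails to load or its quality is poor, revert to it immediately.
Remote rollback: Include rollback URLs in the manifest. If you discover a quality problem with a model, update the manifest to point back to the previous version, and every app will "update" to the older, working model. For this to work, the update check must treat a manifest version lower than the installed one as a downgrade instruction; a check that fires only when the manifest version is strictly newer will ignore it.
Automatic rollback: If the app detects inference failures or crashes after a model swap, it reverts to the previous version automatically.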
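Automatic rollback needs a way to notice a crash after the crash has already happened. A common pattern, sketched here as an assumption rather than as this article's method, is a counter that is incremented before the model loads and reset after the first successful inference. If the app keeps dying in between, the counter climbs past a threshold on the next launch. IntegerStore, InMemoryStore, and ModelHealthGuard are hypothetical names. In an app, the store would be backed by UserDefaults or SharedPreferences so that the counter survives the crash.

```swift
// Persistent integer storage. In an app, back this with UserDefaults (or a file)
// so the counter survives a crash; the protocol only keeps the sketch testable.
protocol IntegerStore {
    func integer(forKey key: String) -> Int
    func set(_ value: Int, forKey key: String)
}

// In-memory stand-in for tests.
final class InMemoryStore: IntegerStore {
    private var values: [String: Int] = [:]
    func integer(forKey key: String) -> Int { values[key] ?? 0 }
    func set(_ value: Int, forKey key: String) { values[key] = value }
}

// Counts launches that loaded the model without confirming a healthy inference.
struct ModelHealthGuard {
    let store: IntegerStore
    let maxUnconfirmedLaunches: Int
    private let key = "unconfirmed_model_launches"

    init(store: IntegerStore, maxUnconfirmedLaunches: Int = 2) {
        self.store = store
        self.maxUnconfirmedLaunches = maxUnconfirmedLaunches
    }

    // Call before loading the model. Returns true when earlier launches kept
    // failing and the app should restore model-previous.gguf instead.
    func shouldRollBack() -> Bool {
        let attempts = store.integer(forKey: key) + 1
        store.set(attempts, forKey: key)
        return attempts > maxUnconfirmedLaunches
    }

    // Call after the first successful inference, and after rolling back.
    func markHealthy() {
        store.set(0, forKey: key)
    }
}
```

A launch in which the user never runs inference also leaves the counter raised. Keep the threshold above one, and reset the counter after rolling back.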

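Update Frequency

Scenario	Update frequency	Notes
Early-stage product (fast iteration)	Weekly to biweekly	Rapid quality improvements
Stable product	Monthly to quarterly	Incremental improvements
New base model available	As needed	Major quality gains
Significant training data changes	As needed	Domain shift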

Each update costs one fine-tuning run ($5-50) plus CDN distribution. Compared with the quality gains, that cost is negligible.

Infrastructure Cost

Users	Monthly downloads	Model size	CDN cost (Cloudflare R2)
1,000	~200 (updates + new users)	1.7 GB	~$0.01/month
10,000	~2,000	1.7 GB	~$0.05/month
100,000	~20,000	1.7 GB	~$0.51/month

With Cloudflare R2's zero-egress-fee pricing, OTA model distribution is essentially free. Even at 100,000 users, the CDN cost stays under $1 per month.
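The "essentially free" claim rests on egress pricing. The table's download counts imply a large monthly transfer: at 100,000 users, 20,000 downloads × 1.7 GB = 34,000 GB, or about 34 TB. The arithmetic below takes the per-GB egress price as a parameter. It uses 0 for R2's zero-egress pricing and a hypothetical $0.05/GB to show what the same volume would cost wherever egress is billed. That rate is illustrative, not a quote for any provider.

```swift
import Foundation

// Monthly transfer implied by the table: downloads × model size (GB).
func monthlyTransferGB(downloads: Int, modelSizeGB: Double) -> Double {
    Double(downloads) * modelSizeGB
}

// Egress charge for that transfer at a given per-GB price.
func egressCost(transferGB: Double, pricePerGB: Double) -> Double {
    transferGB * pricePerGB
}

let zeroEgress = 0.0           // Cloudflare R2: no egress fees
let hypotheticalEgress = 0.05  // hypothetical billed rate, for comparison only

for downloads in [200, 2_000, 20_000] {
    let gb = monthlyTransferGB(downloads: downloads, modelSizeGB: 1.7)
    let r2 = egressCost(transferGB: gb, pricePerGB: zeroEgress)
    let billed = egressCost(transferGB: gb, pricePerGB: hypotheticalEgress)
    print("\(downloads) downloads: \(String(format: "%.0f", gb)) GB; egress $\(String(format: "%.2f", r2)) on R2 vs $\(String(format: "%.2f", billed)) at $0.05/GB")
}
```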

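The fine-tuning and GGUF export steps are exactly where platforms like Ertas streamline the workflow: retrain on the updated data, export a GGUF, upload it to the CDN, and update the manifest. Your users get the improved model automatically.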
