    OTA Model Updates: Keeping Your On-Device AI Current

    How to push model updates to users without an app store release. Version checking, background downloads, rollback strategies, and the infrastructure for over-the-air model delivery.

    Ertas Team

    On-device AI models are not static. Your training data improves, your fine-tuning gets better, and new base models are released. Updating the model should not require a full app update through the App Store.

    Over-the-air (OTA) model updates let you push new GGUF files to users independently of the app binary. The app checks for updates, downloads the new model in the background, and swaps it in seamlessly.

    Architecture

    The Model Manifest

    Host a JSON manifest on your CDN alongside the model file:

    {
      "current_version": "2.1.0",
      "models": {
        "1b": {
          "url": "https://cdn.example.com/models/v2.1.0/model-1b-q4.gguf",
          "size_bytes": 612000000,
          "sha256": "a1b2c3d4e5f6...",
          "min_app_version": "3.0.0",
          "release_notes": "Improved classification accuracy"
        },
        "3b": {
          "url": "https://cdn.example.com/models/v2.1.0/model-3b-q4.gguf",
          "size_bytes": 1740000000,
          "sha256": "f6e5d4c3b2a1...",
          "min_app_version": "3.0.0",
          "release_notes": "Better conversation quality"
        }
      },
      "rollback_version": "2.0.0",
      "rollback_url_1b": "https://cdn.example.com/models/v2.0.0/model-1b-q4.gguf",
      "rollback_url_3b": "https://cdn.example.com/models/v2.0.0/model-3b-q4.gguf"
    }
    

    The manifest tells the app which version is latest, where to download it, how to verify it, and what to fall back to if something goes wrong.
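The Swift code later in this post decodes the manifest into a `ModelManifest` type that is never shown. A minimal sketch of what that type could look like follows. The type and property names are assumptions chosen to match the calls in that code, and the explicit `CodingKeys` map the snake_case JSON keys so that a plain `JSONDecoder()` works. The rollback URL keys are omitted here for brevity, and the decoder ignores keys it is not asked for.

```swift
import Foundation

// Hypothetical types mirroring the manifest JSON above.
struct ModelEntry: Codable {
    let url: URL
    let sizeBytes: Int64
    let sha256: String
    let minAppVersion: String
    let releaseNotes: String

    enum CodingKeys: String, CodingKey {
        case url, sha256
        case sizeBytes = "size_bytes"
        case minAppVersion = "min_app_version"
        case releaseNotes = "release_notes"
    }
}

struct ModelManifest: Codable {
    let currentVersion: String
    let models: [String: ModelEntry] // Keyed by tier: "1b", "3b"
    let rollbackVersion: String

    enum CodingKeys: String, CodingKey {
        case models
        case currentVersion = "current_version"
        case rollbackVersion = "rollback_version"
    }
}
```

Using `Int64` for `sizeBytes` keeps multi-gigabyte sizes safe regardless of platform word size.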

    Update Check Flow

    [App launch] -> [Fetch manifest from CDN]
      -> [Compare local version to manifest version]
      -> [If newer version available]:
          -> [Check WiFi + sufficient storage]
          -> [Download new model in background]
          -> [Verify SHA256]
          -> [Swap model on next session start]
      -> [If current version matches]: [No action]
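The decision part of this flow can be written as a pure function, which makes it easy to unit test. The sketch below is one way to do it, not the only one. `UpdateDecision`, `decideUpdate`, and the 500 MB `reserveBytes` headroom are all assumptions for illustration. It also checks the manifest's `min_app_version` field, which the flow above does not mention explicitly.

```swift
import Foundation

// True when `candidate` is a strictly newer dotted version than `installed`.
// The .numeric option compares digit runs as numbers, so "10.0.0" ranks above "9.0.0".
func isNewer(_ candidate: String, than installed: String) -> Bool {
    candidate.compare(installed, options: .numeric) == .orderedDescending
}

// Hypothetical outcome type; the case names are illustrative.
enum UpdateDecision: Equatable {
    case upToDate
    case appTooOld
    case waitForWiFi
    case insufficientStorage
    case download
}

func decideUpdate(installedModel: String, manifestVersion: String,
                  appVersion: String, minAppVersion: String,
                  onWiFi: Bool, freeBytes: Int64, modelSizeBytes: Int64,
                  reserveBytes: Int64 = 500_000_000) -> UpdateDecision {
    guard isNewer(manifestVersion, than: installedModel) else { return .upToDate }
    // The new model may depend on runtime features only a newer app binary ships
    if isNewer(minAppVersion, than: appVersion) { return .appTooOld }
    guard onWiFi else { return .waitForWiFi }
    // Old and new files coexist until the swap, so keep some headroom free
    guard freeBytes >= modelSizeBytes + reserveBytes else { return .insufficientStorage }
    return .download
}
```

The caller supplies `onWiFi` and `freeBytes` from whatever platform APIs it already uses, so the function itself stays free of side effects.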
    

    Implementation

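    // iOS: Check for model updates
    import Foundation

    class ModelUpdater {
        private let manifestURL = URL(string: "https://cdn.example.com/manifest.json")!
        private let selectedTier: String // Manifest key for this device, e.g. "1b" or "3b"

        init(selectedTier: String) {
            self.selectedTier = selectedTier
        }

        func checkForUpdate() async -> ModelUpdate? {
            // Treat a network error, a malformed manifest, or a missing tier as "no update"
            guard let data = try? await URLSession.shared.data(from: manifestURL).0,
                  let manifest = try? JSONDecoder().decode(ModelManifest.self, from: data),
                  let model = manifest.models[selectedTier]
            else { return nil }

            let currentVersion = UserDefaults.standard.string(forKey: "model_version") ?? "0.0.0"

            // Compare numerically: plain string comparison ranks "10.0.0" below "9.0.0"
            guard manifest.currentVersion.compare(currentVersion, options: .numeric) == .orderedDescending
            else { return nil }

            return ModelUpdate(
                version: manifest.currentVersion,
                url: model.url,
                size: model.sizeBytes,
                hash: model.sha256
            )
        }
    }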
    
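    // Android: Check for updates on app launch
    class ModelUpdater(
        private val context: Context,
        private val selectedTier: String // Manifest key for this device, e.g. "1b" or "3b"
    ) {
        // "model_prefs" is a placeholder file name
        private val prefs = context.getSharedPreferences("model_prefs", Context.MODE_PRIVATE)

        suspend fun checkForUpdate(): ModelUpdate? = withContext(Dispatchers.IO) {
            val manifest = fetchManifest() ?: return@withContext null
            val model = manifest.models[selectedTier] ?: return@withContext null
            val currentVersion = prefs.getString("model_version", null) ?: "0.0.0"

            // Compare numerically: plain string comparison ranks "10.0.0" below "9.0.0"
            if (compareVersions(manifest.currentVersion, currentVersion) > 0) {
                ModelUpdate(
                    version = manifest.currentVersion,
                    url = model.url,
                    sizeBytes = model.sizeBytes,
                    sha256 = model.sha256
                )
            } else null
        }

        // Compares dotted versions component by component; missing parts count as 0
        private fun compareVersions(a: String, b: String): Int {
            val left = a.split(".").map { it.toIntOrNull() ?: 0 }
            val right = b.split(".").map { it.toIntOrNull() ?: 0 }
            for (i in 0 until maxOf(left.size, right.size)) {
                val diff = left.getOrElse(i) { 0 }.compareTo(right.getOrElse(i) { 0 })
                if (diff != 0) return diff
            }
            return 0
        }
    }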
    

    Background Download

    Model downloads should happen in the background without blocking the user:

    iOS: Background URLSession

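    // Members of the class that acts as the URLSessionDownloadDelegate.
    // Create the background session once and reuse it for every download.
    private lazy var backgroundSession: URLSession = {
        let config = URLSessionConfiguration.background(
            withIdentifier: "com.app.model-download"
        )
        config.allowsCellularAccess = false // WiFi only
        return URLSession(configuration: config, delegate: self, delegateQueue: nil)
    }()

    func downloadUpdate(_ update: ModelUpdate) {
        // pendingUpdate is assumed to be an optional stored property; persist it
        // if the download may finish after the app has been relaunched
        pendingUpdate = update
        backgroundSession.downloadTask(with: update.url).resume()
    }

    // Delegate handles completion even if app is suspended
    func urlSession(_ session: URLSession, downloadTask: URLSessionDownloadTask,
                    didFinishDownloadingTo location: URL) {
        // The file at `location` is deleted when this method returns, so verify and move it now
        guard let update = pendingUpdate,
              verifyHash(location, expected: update.hash) else { return }

        // Stage under the name the swap step looks for
        let destination = modelDirectory.appendingPathComponent("model-pending.gguf")
        try? FileManager.default.removeItem(at: destination) // Clear any stale download
        do {
            try FileManager.default.moveItem(at: location, to: destination)
            // Swap will happen on next session start
            UserDefaults.standard.set(update.version, forKey: "pending_model_version")
        } catch {
            // Keep the current model; the next update check will retry
        }
    }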
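The delegate above calls `verifyHash`, which this post does not define. One possible implementation, assuming Apple's CryptoKit is available (iOS 13 and later), streams the file in chunks so that a multi-gigabyte model is never held in memory at once. The helper name `sha256Hex` and the 1 MB chunk size are assumptions for illustration.

```swift
import Foundation
import CryptoKit

// Streams the file through SHA-256 and returns the digest as lowercase hex.
func sha256Hex(ofFileAt url: URL) throws -> String {
    let handle = try FileHandle(forReadingFrom: url)
    defer { try? handle.close() }

    var hasher = SHA256()
    // Read 1 MB at a time; read(upToCount:) returns nil or empty data at end of file
    while let chunk = try handle.read(upToCount: 1_048_576), !chunk.isEmpty {
        hasher.update(data: chunk)
    }
    return hasher.finalize().map { String(format: "%02x", $0) }.joined()
}

// Returns false on any read error, so a file that cannot be read is never accepted.
func verifyHash(_ url: URL, expected: String) -> Bool {
    guard let actual = try? sha256Hex(ofFileAt: url) else { return false }
    return actual == expected.lowercased()
}
```

Lowercasing the expected value tolerates a manifest that publishes the hash in uppercase hex.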
    

    Android: WorkManager

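    import android.content.Context
    import androidx.work.CoroutineWorker
    import androidx.work.WorkerParameters
    import androidx.work.workDataOf
    import java.io.File
    import java.io.IOException

    class ModelDownloadWorker(
        context: Context, params: WorkerParameters
    ) : CoroutineWorker(context, params) {

        override suspend fun doWork(): Result {
            val url = inputData.getString("url") ?: return Result.failure()
            val expectedHash = inputData.getString("hash") ?: return Result.failure()

            val tempFile = File(applicationContext.cacheDir, "model-new.gguf")

            // Download; assuming downloadFile throws IOException on network errors,
            // treat those as transient and let WorkManager retry
            try {
                downloadFile(url, tempFile) { progress ->
                    setProgress(workDataOf("progress" to progress))
                }
            } catch (e: IOException) {
                tempFile.delete()
                return Result.retry()
            }

            // Verify
            if (!tempFile.sha256().equals(expectedHash, ignoreCase = true)) {
                tempFile.delete()
                return Result.failure()
            }

            // Stage for swap; renameTo reports failure through its return value
            val destination = File(applicationContext.filesDir, "model-pending.gguf")
            destination.delete() // Clear any stale download
            if (!tempFile.renameTo(destination)) {
                tempFile.delete()
                return Result.failure()
            }

            return Result.success()
        }
    }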
    
    // Schedule the download
    fun scheduleModelDownload(context: Context, url: String, hash: String) {
        val request = OneTimeWorkRequestBuilder<ModelDownloadWorker>()
            .setConstraints(
                Constraints.Builder()
                    .setRequiredNetworkType(NetworkType.UNMETERED) // WiFi only
                    .setRequiresStorageNotLow(true)
                    .build()
            )
            .setInputData(workDataOf("url" to url, "hash" to hash))
            .build()

        // Unique work keeps a second launch from queuing a duplicate download;
        // "model-download" is a placeholder work name
        WorkManager.getInstance(context)
            .enqueueUniqueWork("model-download", ExistingWorkPolicy.KEEP, request)
    }
    

    Model Swapping

    Do not swap the model while it is loaded. Swap at a safe point:

    Safe Swap Strategy

1. Download completes: New model saved as model-pending.gguf
2. On next app launch (or next chat session start):
   a. Unload current model
   b. Rename model-current.gguf to model-previous.gguf
   c. Rename model-pending.gguf to model-current.gguf
   d. Load new model
   e. Update stored version number
3. If new model fails to load: Revert to model-previous.gguf

    func swapModelIfPending() throws {
        let pendingPath = modelDirectory.appendingPathComponent("model-pending.gguf")
        let currentPath = modelDirectory.appendingPathComponent("model-current.gguf")
        let previousPath = modelDirectory.appendingPathComponent("model-previous.gguf")

        guard FileManager.default.fileExists(atPath: pendingPath.path) else { return }

        // Version recorded by the download step; cleared on exit whether or not the swap succeeds
        let defaults = UserDefaults.standard
        let pendingVersion = defaults.string(forKey: "pending_model_version")
        defer { defaults.removeObject(forKey: "pending_model_version") }

        // Unload current model
        engine.unload()

        // Rotate files
        try? FileManager.default.removeItem(at: previousPath) // Remove old backup
        try? FileManager.default.moveItem(at: currentPath, to: previousPath) // Backup current
        try FileManager.default.moveItem(at: pendingPath, to: currentPath) // Promote pending

        // Try loading new model
        do {
            try engine.load(at: currentPath.path)
            // Success: update version
            defaults.set(pendingVersion, forKey: "model_version")
        } catch {
            // Rollback: discard the new file and restore the backup
            try? FileManager.default.removeItem(at: currentPath)
            try? FileManager.default.moveItem(at: previousPath, to: currentPath)
            try engine.load(at: currentPath.path)
        }
    }
    

    Rollback Strategy

    Always keep the previous model version available:

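• Local rollback: Keep model-previous.gguf on device. If the new model fails to load or produces poor-quality output, revert immediately.
• Remote rollback: Include rollback URLs in the manifest. If you discover a model quality issue, update the manifest to point back to the previous version. Because the update check only acts on a version newer than the installed one, republish the previous weights under a higher version number (or have the app compare for inequality instead). All apps will then "update" to the older, working model.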
    • Automatic rollback: If the app detects inference failures or crashes after a model swap, automatically revert to the previous version.
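
Automatic rollback needs some record of how the new model has behaved since the swap. A minimal sketch, assuming failures are counted in `UserDefaults` so the count survives a crash: `SwapHealthTracker`, its key name, and the default threshold of three failures are all hypothetical choices, not part of any library.

```swift
import Foundation

// Hypothetical helper: counts failures since the last swap and signals when to revert.
struct SwapHealthTracker {
    private let defaults: UserDefaults
    private let failureKey = "model_failures_since_swap" // Hypothetical key name
    let maxFailures: Int

    init(defaults: UserDefaults = .standard, maxFailures: Int = 3) {
        self.defaults = defaults
        self.maxFailures = maxFailures
    }

    // Call after a successful swap to start a fresh observation window
    func resetAfterSwap() {
        defaults.set(0, forKey: failureKey)
    }

    // Call when inference throws, or at launch when the previous run is known to have crashed.
    // Returns true once the threshold is reached and the caller should roll back.
    func recordFailure() -> Bool {
        let count = defaults.integer(forKey: failureKey) + 1
        defaults.set(count, forKey: failureKey)
        return count >= maxFailures
    }
}
```

When `recordFailure()` returns true, the app would unload the model, restore model-previous.gguf as model-current.gguf, and reload, which is the same file rotation the swap code uses on a failed load.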

    Update Frequency

| Scenario | Update Frequency | Notes |
|---|---|---|
| Early product (iterating fast) | Weekly-biweekly | Rapid quality improvements |
| Stable product | Monthly-quarterly | Incremental improvements |
| New base model available | As needed | Major quality jumps |
| Training data significantly changes | As needed | Domain shifts |

    Each update is a fine-tuning run ($5-50) plus CDN distribution. The cost is minimal compared to the quality improvement.

    Infrastructure Costs

| Users | Downloads/Month | Model Size | CDN Cost (Cloudflare R2) |
|---|---|---|---|
| 1,000 | ~200 (updates + new users) | 1.7GB | ~$0.01/month |
| 10,000 | ~2,000 | 1.7GB | ~$0.05/month |
| 100,000 | ~20,000 | 1.7GB | ~$0.51/month |

    With Cloudflare R2's zero-egress pricing, OTA model delivery is essentially free. Even at 100K users, the CDN cost is under $1/month.

    The fine-tuning and GGUF export step is where platforms like Ertas streamline the workflow. Re-train on updated data, export GGUF, upload to CDN, update the manifest. Your users get the improved model automatically.

    Ship AI that runs on your users' devices.

    Early bird pricing starts at $14.50/mo — locked in for life. Plans for builders and agencies.
