OTA Model Updates: Keeping Your On-Device AI Current

On-device AI models are not static. Your training data improves, your fine-tuning gets better, and new base models are released. Updating the model should not require a full app update through the App Store.

Over-the-air (OTA) model updates let you push new GGUF files to users independently of the app binary. The app checks for updates, downloads the new model in the background, and swaps it in seamlessly.

Architecture

The Model Manifest

Host a JSON manifest on your CDN alongside the model file:

{
  "current_version": "2.1.0",
  "models": {
    "1b": {
      "url": "https://cdn.example.com/models/v2.1.0/model-1b-q4.gguf",
      "size_bytes": 612000000,
      "sha256": "a1b2c3d4e5f6...",
      "min_app_version": "3.0.0",
      "release_notes": "Improved classification accuracy"
    },
    "3b": {
      "url": "https://cdn.example.com/models/v2.1.0/model-3b-q4.gguf",
      "size_bytes": 1740000000,
      "sha256": "f6e5d4c3b2a1...",
      "min_app_version": "3.0.0",
      "release_notes": "Better conversation quality"
    }
  },
  "rollback_version": "2.0.0",
  "rollback_url_1b": "https://cdn.example.com/models/v2.0.0/model-1b-q4.gguf",
  "rollback_url_3b": "https://cdn.example.com/models/v2.0.0/model-3b-q4.gguf"
}

The manifest tells the app four things: the latest version, where to download it, how to verify it, and what to fall back to if something goes wrong. The min_app_version field lets older app builds skip a model they cannot load.
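
To make the manifest concrete on the client side, here is a minimal sketch of Swift types that decode it. The type names ModelEntry and ModelManifest and the helper decodeManifest are assumptions for illustration; only the JSON field names come from the manifest above. Explicit CodingKeys map the snake_case JSON keys, and keys not listed (such as the per-tier rollback URLs) are simply ignored by the decoder.

```swift
import Foundation

// Hypothetical types mirroring the manifest JSON shown above.
struct ModelEntry: Codable {
    let url: URL
    let sizeBytes: Int64
    let sha256: String
    let minAppVersion: String
    let releaseNotes: String

    enum CodingKeys: String, CodingKey {
        case url, sha256
        case sizeBytes = "size_bytes"
        case minAppVersion = "min_app_version"
        case releaseNotes = "release_notes"
    }
}

struct ModelManifest: Codable {
    let currentVersion: String
    let models: [String: ModelEntry]   // Keyed by tier, e.g. "1b" or "3b"
    let rollbackVersion: String?       // Optional so older manifests still decode

    enum CodingKeys: String, CodingKey {
        case models
        case currentVersion = "current_version"
        case rollbackVersion = "rollback_version"
    }
}

// Assumed helper: decodes raw manifest bytes fetched from the CDN.
func decodeManifest(_ data: Data) throws -> ModelManifest {
    try JSONDecoder().decode(ModelManifest.self, from: data)
}
```

Using Int64 for sizeBytes keeps multi-gigabyte sizes safe on every platform, and decoding url as URL means a malformed address fails at decode time rather than at download time.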

Update Check Flow

[App launch] -> [Fetch manifest from CDN]
  -> [Compare local version to manifest version]
  -> [If newer version available]:
      -> [Check WiFi + sufficient storage]
      -> [Download new model in background]
      -> [Verify SHA256]
      -> [Swap model on next session start]
  -> [If current version matches]: [No action]
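
Two steps in this flow are easy to get subtly wrong: the version comparison and the storage check. The sketch below shows one way to do both. The function names isNewer and hasRoom, and the default headroom factor of 2, are assumptions for illustration; the headroom accounts for the device briefly holding more than one model file while an update is staged.

```swift
import Foundation

/// Returns true when `candidate` is a higher dotted version than `installed`.
/// Missing or non-numeric components count as 0, so "2.1" equals "2.1.0".
func isNewer(_ candidate: String, than installed: String) -> Bool {
    let a = candidate.split(separator: ".").map { Int($0) ?? 0 }
    let b = installed.split(separator: ".").map { Int($0) ?? 0 }
    for i in 0..<max(a.count, b.count) {
        let x = i < a.count ? a[i] : 0
        let y = i < b.count ? b[i] : 0
        if x != y { return x > y }
    }
    return false
}

/// Checks free space on the volume that holds `directory`.
/// `headroom` is an assumed safety factor, not a value from the article.
func hasRoom(forBytes sizeBytes: Int64, in directory: URL, headroom: Double = 2.0) -> Bool {
    guard let attrs = try? FileManager.default.attributesOfFileSystem(forPath: directory.path),
          let free = (attrs[.systemFreeSize] as? NSNumber)?.int64Value
    else { return false }   // Unknown free space: be conservative and skip the download
    return Double(free) >= Double(sizeBytes) * headroom
}
```

Comparing component by component matters because a plain string comparison ranks "2.10.0" below "2.9.0".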

Implementation

// iOS: Check for model updates
import Foundation

class ModelUpdater {
    private let manifestURL = URL(string: "https://cdn.example.com/manifest.json")!
    private let selectedTier: String  // Manifest key, e.g. "1b" or "3b"

    init(selectedTier: String) { self.selectedTier = selectedTier }

    func checkForUpdate() async -> ModelUpdate? {
        guard let data = try? await URLSession.shared.data(from: manifestURL).0,
              let manifest = try? JSONDecoder().decode(ModelManifest.self, from: data),
              let model = manifest.models[selectedTier]  // Tier may be missing: no force unwrap
        else { return nil }

        let currentVersion = UserDefaults.standard.string(forKey: "model_version") ?? "0.0.0"

        // Compare numerically: as plain strings, "2.10.0" sorts below "2.9.0"
        guard manifest.currentVersion.compare(currentVersion, options: .numeric) == .orderedDescending
        else { return nil }

        return ModelUpdate(
            version: manifest.currentVersion,
            url: model.url,
            size: model.sizeBytes,
            sha256: model.sha256
        )
    }
}

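// Android: Check for updates on app launch
import android.content.Context
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.withContext

class ModelUpdater(private val context: Context) {
    suspend fun checkForUpdate(): ModelUpdate? = withContext(Dispatchers.IO) {
        val manifest = fetchManifest() ?: return@withContext null
        // getString is nullable even with a default, so fall back explicitly
        val currentVersion = prefs.getString("model_version", null) ?: "0.0.0"

        // Compare numerically: as plain strings, "2.10.0" sorts below "2.9.0"
        if (isNewer(manifest.currentVersion, currentVersion)) {
            val model = manifest.models[selectedTier] ?: return@withContext null
            ModelUpdate(
                version = manifest.currentVersion,
                url = model.url,
                sizeBytes = model.sizeBytes,
                sha256 = model.sha256
            )
        } else null
    }

    private fun isNewer(candidate: String, installed: String): Boolean {
        val a = candidate.split(".").map { it.toIntOrNull() ?: 0 }
        val b = installed.split(".").map { it.toIntOrNull() ?: 0 }
        for (i in 0 until maxOf(a.size, b.size)) {
            val diff = a.getOrElse(i) { 0 } - b.getOrElse(i) { 0 }
            if (diff != 0) return diff > 0
        }
        return false
    }
}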

Background Download

Model downloads should happen in the background without blocking the user:

iOS: Background URLSession

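func downloadUpdate(_ update: ModelUpdate) {
    // In production, create this session once and reuse it for the identifier
    let config = URLSessionConfiguration.background(
        withIdentifier: "com.app.model-download"
    )
    config.allowsCellularAccess = false  // No cellular, per the update check flow
    config.isDiscretionary = true        // Let the system pick a good time
    let session = URLSession(configuration: config, delegate: self, delegateQueue: nil)
    let task = session.downloadTask(with: update.url)
    task.resume()
}

// Delegate handles completion even if app is suspended
func urlSession(_ session: URLSession, downloadTask: URLSessionDownloadTask,
                didFinishDownloadingTo location: URL) {
    let staged = modelDirectory.appendingPathComponent("model-new.gguf")
    let pending = modelDirectory.appendingPathComponent("model-pending.gguf")

    do {
        // The file at `location` is removed when this method returns, so move it first
        try? FileManager.default.removeItem(at: staged)  // Clear any stale file
        try FileManager.default.moveItem(at: location, to: staged)

        guard verifyHash(staged, expected: pendingUpdate.sha256) else {
            try FileManager.default.removeItem(at: staged)
            return
        }

        // Stage under the name the swap step looks for; swap happens on next session start
        try? FileManager.default.removeItem(at: pending)
        try FileManager.default.moveItem(at: staged, to: pending)
        UserDefaults.standard.set(pendingUpdate.version, forKey: "pending_model_version")
    } catch {
        try? FileManager.default.removeItem(at: staged)
    }
}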
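
The delegate above calls verifyHash, which the article does not define. A minimal sketch, assuming CryptoKit is available (iOS 13 and later) and that the manifest stores the digest as a hex string, could look like this. The helper name sha256Hex and the 1 MiB chunk size are assumptions for illustration.

```swift
import Foundation
import CryptoKit

/// Streams the file through SHA-256 in 1 MiB chunks so a multi-gigabyte
/// model is never held in memory all at once.
func sha256Hex(of fileURL: URL) throws -> String {
    let handle = try FileHandle(forReadingFrom: fileURL)
    defer { try? handle.close() }

    var hasher = SHA256()
    while let chunk = try handle.read(upToCount: 1 << 20), !chunk.isEmpty {
        hasher.update(data: chunk)
    }
    // Render the digest bytes as lowercase hex
    return hasher.finalize().map { String(format: "%02x", $0) }.joined()
}

/// Returns true only when the file's digest matches the manifest value.
/// Any read error counts as a failed verification.
func verifyHash(_ fileURL: URL, expected: String) -> Bool {
    guard let actual = try? sha256Hex(of: fileURL) else { return false }
    return actual == expected.lowercased()
}
```

Hashing in chunks rather than loading the whole file with Data(contentsOf:) keeps peak memory flat, which matters when the download finishes while the app is running in the background with a tight memory budget.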

Android: WorkManager

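import android.content.Context
import androidx.work.CoroutineWorker
import androidx.work.WorkerParameters
import androidx.work.workDataOf
import java.io.File
import java.io.IOException

class ModelDownloadWorker(
    context: Context, params: WorkerParameters
) : CoroutineWorker(context, params) {

    override suspend fun doWork(): Result {
        val url = inputData.getString("url") ?: return Result.failure()
        val expectedHash = inputData.getString("hash") ?: return Result.failure()

        val tempFile = File(applicationContext.cacheDir, "model-new.gguf")

        // Download; treat I/O errors as transient so WorkManager retries with backoff
        try {
            downloadFile(url, tempFile) { progress ->
                setProgress(workDataOf("progress" to progress))
            }
        } catch (e: IOException) {
            tempFile.delete()
            return Result.retry()
        }

        // Verify (hex digests may differ in letter case)
        if (!tempFile.sha256().equals(expectedHash, ignoreCase = true)) {
            tempFile.delete()
            return Result.failure()
        }

        // Stage for swap; renameTo signals failure by returning false, not by throwing
        val destination = File(applicationContext.filesDir, "model-pending.gguf")
        if (!tempFile.renameTo(destination)) {
            tempFile.copyTo(destination, overwrite = true)
            tempFile.delete()
        }

        return Result.success()
    }
}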

// Schedule the download
fun scheduleModelDownload(url: String, hash: String) {
    val request = OneTimeWorkRequestBuilder<ModelDownloadWorker>()
        .setConstraints(
            Constraints.Builder()
                .setRequiredNetworkType(NetworkType.UNMETERED) // WiFi only
                .setRequiresStorageNotLow(true)
                .build()
        )
        .setInputData(workDataOf("url" to url, "hash" to hash))
        .build()

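    // Unique work: repeated update checks will not queue duplicate downloads
    WorkManager.getInstance(context)
        .enqueueUniqueWork("model-download", ExistingWorkPolicy.KEEP, request)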
}

Model Swapping

Do not swap the model while it is loaded. Swap at a safe point:

Safe Swap Strategy

1. Download completes: New model saved as model-pending.gguf
2. On next app launch (or next chat session start):
   a. Unload current model
   b. Rename model-current.gguf to model-previous.gguf
   c. Rename model-pending.gguf to model-current.gguf
   d. Load new model
   e. Update stored version number
3. If new model fails to load: Revert to model-previous.gguf

func swapModelIfPending() throws {
    let pendingPath = modelDirectory.appendingPathComponent("model-pending.gguf")
    let currentPath = modelDirectory.appendingPathComponent("model-current.gguf")
    let previousPath = modelDirectory.appendingPathComponent("model-previous.gguf")

    guard FileManager.default.fileExists(atPath: pendingPath.path) else { return }
    // Version recorded by the download step once the hash was verified
    let pendingVersion = UserDefaults.standard.string(forKey: "pending_model_version")

    // Unload current model
    engine.unload()

    // Rotate files
    try? FileManager.default.removeItem(at: previousPath) // Remove old backup
    try? FileManager.default.moveItem(at: currentPath, to: previousPath) // Backup current

    do {
        try FileManager.default.moveItem(at: pendingPath, to: currentPath) // Promote pending
        try engine.load(at: currentPath.path)
        // Success: update version and clear the pending marker
        UserDefaults.standard.set(pendingVersion, forKey: "model_version")
        UserDefaults.standard.removeObject(forKey: "pending_model_version")
    } catch {
        // Rollback: covers both a failed promote and a failed load
        try? FileManager.default.removeItem(at: currentPath)
        try? FileManager.default.moveItem(at: previousPath, to: currentPath)
        try engine.load(at: currentPath.path)
    }
}

Rollback Strategy

Always keep the previous model version available:

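Local rollback: Keep model-previous.gguf on device. If the new model fails to load or produces poor-quality output, revert immediately.
Remote rollback: Include rollback URLs in the manifest. If you discover a model quality issue, update the manifest to point back to the previous version. Because the update check only acts when the manifest version is newer than the installed one, either republish the older weights under a higher version number or have the client also honor rollback_version. Otherwise, devices already on the bad version will not "update" to the older, working model.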
Automatic rollback: If the app detects inference failures or crashes after a model swap, automatically revert to the previous version.
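
Automatic rollback needs some signal that the new model is misbehaving. One possible sketch, under the assumption that a simple count of consecutive failures is a good enough signal, is shown below. The type SwapHealthTracker, the key name, and the default threshold of 3 are all hypothetical and not part of any library.

```swift
import Foundation

/// Hypothetical helper: counts consecutive failures since the last model swap.
struct SwapHealthTracker {
    private let defaults: UserDefaults
    private let key = "failures_since_swap"   // Assumed key name
    let threshold: Int

    init(defaults: UserDefaults = .standard, threshold: Int = 3) {
        self.defaults = defaults
        self.threshold = threshold
    }

    /// Call right after a successful swap to start a fresh observation window.
    func reset() { defaults.set(0, forKey: key) }

    /// Call after a successful inference so isolated errors do not accumulate.
    func recordSuccess() { defaults.set(0, forKey: key) }

    /// Call when inference throws, or at launch if the previous run did not
    /// exit cleanly. Returns true once the count reaches the threshold.
    @discardableResult
    func recordFailure() -> Bool {
        let count = defaults.integer(forKey: key) + 1
        defaults.set(count, forKey: key)
        return count >= threshold
    }
}
```

When recordFailure() returns true, the app would unload the engine and move model-previous.gguf back to model-current.gguf. Persisting the counter in UserDefaults means it survives a crash, which is the case an in-memory counter would miss.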

Update Frequency

Scenario	Update Frequency	Notes
Early product (iterating fast)	Weekly-biweekly	Rapid quality improvements
Stable product	Monthly-quarterly	Incremental improvements
New base model available	As needed	Major quality jumps
Training data significantly changes	As needed	Domain shifts

Each update is a fine-tuning run ($5-50) plus CDN distribution. The cost is minimal compared to the quality improvement.

Infrastructure Costs

Users	Downloads/Month	Model Size	CDN Cost (Cloudflare R2)
1,000	~200 (updates + new users)	1.7GB	~$0.01/month
10,000	~2,000	1.7GB	~$0.05/month
100,000	~20,000	1.7GB	~$0.51/month

With Cloudflare R2's zero-egress pricing, OTA model delivery is essentially free. Even at 100K users, the CDN cost is under $1/month.

The fine-tuning and GGUF export step is where platforms like Ertas streamline the workflow. Re-train on updated data, export GGUF, upload to CDN, update the manifest. Your users get the improved model automatically.
