
OTA Model Updates: Keeping Your On-Device AI Current
How to push model updates to users without an app store release. Version checking, background downloads, rollback strategies, and the infrastructure for over-the-air model delivery.
On-device AI models are not static. Your training data improves, your fine-tuning gets better, and new base models are released. Updating the model should not require a full app update through the App Store.
Over-the-air (OTA) model updates let you push new GGUF files to users independently of the app binary. The app checks for updates, downloads the new model in the background, and swaps it in seamlessly.
Architecture
The Model Manifest
Host a JSON manifest on your CDN alongside the model file:
{
  "current_version": "2.1.0",
  "models": {
    "1b": {
      "url": "https://cdn.example.com/models/v2.1.0/model-1b-q4.gguf",
      "size_bytes": 612000000,
      "sha256": "a1b2c3d4e5f6...",
      "min_app_version": "3.0.0",
      "release_notes": "Improved classification accuracy"
    },
    "3b": {
      "url": "https://cdn.example.com/models/v2.1.0/model-3b-q4.gguf",
      "size_bytes": 1740000000,
      "sha256": "f6e5d4c3b2a1...",
      "min_app_version": "3.0.0",
      "release_notes": "Better conversation quality"
    }
  },
  "rollback_version": "2.0.0",
  "rollback_url_1b": "https://cdn.example.com/models/v2.0.0/model-1b-q4.gguf",
  "rollback_url_3b": "https://cdn.example.com/models/v2.0.0/model-3b-q4.gguf"
}
The manifest tells the app: what is the latest version, where to download it, how to verify it, and what to fall back to if something goes wrong.
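As a rough illustration, the manifest could be decoded on iOS with `Codable` types like the ones below. The article does not define `ModelManifest`, so the type and property names here are assumptions, and the sketch relies on `JSONDecoder`'s snake_case key conversion rather than custom coding keys. The per-tier rollback URLs are left out for brevity; unknown JSON keys are simply ignored by the decoder.

```swift
import Foundation

// Hypothetical types mirroring the manifest above; names are illustrative.
struct ModelEntry: Codable {
    let url: URL
    let sizeBytes: Int64
    let sha256: String
    let minAppVersion: String
    let releaseNotes: String
}

struct ModelManifest: Codable {
    let currentVersion: String
    let models: [String: ModelEntry]   // keyed by tier, e.g. "1b" or "3b"
    let rollbackVersion: String
}

func decodeManifest(_ data: Data) throws -> ModelManifest {
    let decoder = JSONDecoder()
    // Maps current_version -> currentVersion, size_bytes -> sizeBytes, and so on
    decoder.keyDecodingStrategy = .convertFromSnakeCase
    return try decoder.decode(ModelManifest.self, from: data)
}
```

Decoding throws if a required field is missing, which is a reasonable signal to skip the update check for that launch.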
Update Check Flow
[App launch] -> [Fetch manifest from CDN]
  -> [Compare local version to manifest version]
  -> [If newer version available]:
       -> [Check WiFi + sufficient storage]
       -> [Download new model in background]
       -> [Verify SHA256]
       -> [Swap model on next session start]
  -> [If current version matches]: [No action]
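The flow above can be sketched as a pure decision function, which keeps the logic easy to unit test. Everything here is hypothetical: the `UpdateDecision` type, the function names, and the 2x storage headroom are assumptions for illustration, not part of any SDK. Versions are compared component by component, because plain string comparison would rank "2.10.0" below "2.9.0".

```swift
// Hypothetical outcome type for the update check.
enum UpdateDecision: Equatable {
    case upToDate
    case deferred(reason: String)
    case download
}

// Splits "2.10.0" into [2, 10, 0]; non-numeric parts count as 0.
func versionComponents(_ version: String) -> [Int] {
    version.split(separator: ".").map { Int($0) ?? 0 }
}

// True when `candidate` is a strictly higher version than `installed`.
func isNewer(_ candidate: String, than installed: String) -> Bool {
    let a = versionComponents(candidate)
    let b = versionComponents(installed)
    for i in 0..<max(a.count, b.count) {
        let x = i < a.count ? a[i] : 0
        let y = i < b.count ? b[i] : 0
        if x != y { return x > y }
    }
    return false
}

func decideUpdate(localVersion: String, manifestVersion: String,
                  onWiFi: Bool, freeBytes: Int64, modelBytes: Int64) -> UpdateDecision {
    guard isNewer(manifestVersion, than: localVersion) else { return .upToDate }
    guard onWiFi else { return .deferred(reason: "waiting for WiFi") }
    // Assumption: keep 2x the model size free as a safety margin, because
    // several model files coexist on disk until the swap completes.
    guard freeBytes >= modelBytes * 2 else { return .deferred(reason: "low storage") }
    return .download
}
```

The caller would supply the network and storage readings from whatever platform APIs the app already uses, then act on the returned case.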
Implementation
// iOS: Check for model updates
import Foundation

class ModelUpdater {
    private let manifestURL = URL(string: "https://cdn.example.com/manifest.json")!

    func checkForUpdate() async -> ModelUpdate? {
        // ModelManifest is assumed to map the manifest's snake_case keys to these properties
        guard let data = try? await URLSession.shared.data(from: manifestURL).0,
              let manifest = try? JSONDecoder().decode(ModelManifest.self, from: data)
        else { return nil }

        let currentVersion = UserDefaults.standard.string(forKey: "model_version") ?? "0.0.0"
        // Compare numerically: plain string comparison would rank "2.10.0" below "2.9.0"
        let isNewer = manifest.currentVersion.compare(currentVersion, options: .numeric) == .orderedDescending
        // Avoid force-unwrapping: the manifest may not list the selected tier
        guard isNewer, let model = manifest.models[selectedTier] else { return nil }

        return ModelUpdate(
            version: manifest.currentVersion,
            url: model.url,
            size: model.sizeBytes,
            hash: model.sha256
        )
    }
}
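// Android: Check for updates on app launch
import android.content.Context
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.withContext

class ModelUpdater(private val context: Context) {
    suspend fun checkForUpdate(): ModelUpdate? = withContext(Dispatchers.IO) {
        val manifest = fetchManifest() ?: return@withContext null
        // getString returns String?, so fall back explicitly
        val currentVersion = prefs.getString("model_version", null) ?: "0.0.0"
        if (isNewer(manifest.currentVersion, currentVersion)) {
            // The manifest may not list the selected tier
            val model = manifest.models[selectedTier] ?: return@withContext null
            ModelUpdate(
                version = manifest.currentVersion,
                url = model.url,
                sizeBytes = model.sizeBytes,
                sha256 = model.sha256
            )
        } else null
    }
    // Compares dotted versions numerically; string comparison would rank "2.10.0" below "2.9.0"
    private fun isNewer(candidate: String, installed: String): Boolean {
        val a = candidate.split(".").map { it.toIntOrNull() ?: 0 }
        val b = installed.split(".").map { it.toIntOrNull() ?: 0 }
        for (i in 0 until maxOf(a.size, b.size)) {
            val diff = a.getOrElse(i) { 0 } - b.getOrElse(i) { 0 }
            if (diff != 0) return diff > 0
        }
        return false
    }
}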
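The manifest also carries a `min_app_version` field that the update checks above never consult. A minimal sketch of that gate follows, assuming dotted numeric version strings; the helper name `appSupportsModel` is hypothetical.

```swift
// Hypothetical helper: true when the installed app is new enough for the model.
func appSupportsModel(appVersion: String, minAppVersion: String) -> Bool {
    // Splits "3.10.0" into [3, 10, 0]; non-numeric parts count as 0.
    func parts(_ v: String) -> [Int] { v.split(separator: ".").map { Int($0) ?? 0 } }
    let app = parts(appVersion)
    let required = parts(minAppVersion)
    for i in 0..<max(app.count, required.count) {
        let have = i < app.count ? app[i] : 0
        let need = i < required.count ? required[i] : 0
        if have != need { return have > need }
    }
    return true // equal versions are compatible
}
```

On iOS the app version would typically come from `Bundle.main.infoDictionary?["CFBundleShortVersionString"] as? String`. Skipping the download when this check fails avoids shipping a model format that an older binary cannot load.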
Background Download
Model downloads should happen in the background without blocking the user:
iOS: Background URLSession
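func downloadUpdate(_ update: ModelUpdate) {
    // Create the background session once and reuse it; each identifier should map to one session
    let config = URLSessionConfiguration.background(
        withIdentifier: "com.app.model-download"
    )
    config.allowsCellularAccess = false // No cellular downloads, as in the update check flow
    config.isDiscretionary = true       // Let the system schedule the transfer
    let session = URLSession(configuration: config, delegate: self, delegateQueue: nil)
    let task = session.downloadTask(with: update.url)
    task.resume()
}

// Delegate handles completion even if app is suspended.
// If the app was relaunched for this event, pendingUpdate must be restored from persisted state.
func urlSession(_ session: URLSession, downloadTask: URLSessionDownloadTask,
                didFinishDownloadingTo location: URL) {
    let downloaded = modelDirectory.appendingPathComponent("model-new.gguf")
    // Name that swapModelIfPending() looks for; only verified files get it
    let pending = modelDirectory.appendingPathComponent("model-pending.gguf")
    do {
        try? FileManager.default.removeItem(at: downloaded) // Clear any stale download
        // The temporary file is deleted when this method returns, so move it first
        try FileManager.default.moveItem(at: location, to: downloaded)
        guard verifyHash(downloaded, expected: pendingUpdate.sha256) else {
            try? FileManager.default.removeItem(at: downloaded)
            return
        }
        try? FileManager.default.removeItem(at: pending)
        try FileManager.default.moveItem(at: downloaded, to: pending)
        // Swap will happen on next session start
        UserDefaults.standard.set(pendingUpdate.version, forKey: "pending_model_version")
    } catch {
        try? FileManager.default.removeItem(at: downloaded)
    }
}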
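The download delegate calls a `verifyHash` helper that the article does not show. One possible implementation is sketched below, assuming CryptoKit is available (iOS 13 and later) and that the manifest stores the digest as a hex string. It streams the file in chunks so a multi-gigabyte model is never held in memory at once.

```swift
import Foundation
import CryptoKit

// Hypothetical implementation of the verifyHash helper used by the delegate.
func verifyHash(_ fileURL: URL, expected: String) -> Bool {
    guard let handle = try? FileHandle(forReadingFrom: fileURL) else { return false }
    defer { try? handle.close() }

    var hasher = SHA256()
    do {
        // Read 1 MB at a time until the end of the file
        while let chunk = try handle.read(upToCount: 1_048_576), !chunk.isEmpty {
            hasher.update(data: chunk)
        }
    } catch {
        return false // A read error means the file cannot be trusted
    }

    // Render the digest as lowercase hex and compare case-insensitively
    let hex = hasher.finalize().map { String(format: "%02x", $0) }.joined()
    return hex == expected.lowercased()
}
```

Hashing a model of this size takes noticeable time, so it is best kept off the main thread; the background session delegate queue already satisfies that.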
Android: WorkManager
class ModelDownloadWorker(
    context: Context, params: WorkerParameters
) : CoroutineWorker(context, params) {
    override suspend fun doWork(): Result {
        val url = inputData.getString("url") ?: return Result.failure()
        val expectedHash = inputData.getString("hash") ?: return Result.failure()
        val tempFile = File(applicationContext.cacheDir, "model-new.gguf")
        // Download (downloadFile is an app-defined helper)
        downloadFile(url, tempFile) { progress ->
            setProgress(workDataOf("progress" to progress))
        }
        // Verify (sha256() is an app-defined extension); hex case may differ
        if (!tempFile.sha256().equals(expectedHash, ignoreCase = true)) {
            tempFile.delete()
            return Result.failure()
        }
        // Stage for swap; renameTo reports failure through its return value
        val destination = File(applicationContext.filesDir, "model-pending.gguf")
        if (!tempFile.renameTo(destination)) {
            tempFile.delete()
            return Result.failure()
        }
        return Result.success()
    }
}
// Schedule the download
fun scheduleModelDownload(url: String, hash: String) {
    val request = OneTimeWorkRequestBuilder<ModelDownloadWorker>()
        .setConstraints(
            Constraints.Builder()
                .setRequiredNetworkType(NetworkType.UNMETERED) // Unmetered only (typically WiFi)
                .setRequiresStorageNotLow(true)
                .build()
        )
        .setInputData(workDataOf("url" to url, "hash" to hash))
        .build()
    // Unique work with KEEP avoids queuing a second download on every app launch
    WorkManager.getInstance(context)
        .enqueueUniqueWork("model-download", ExistingWorkPolicy.KEEP, request)
}
Model Swapping
Do not swap the model while it is loaded. Swap at a safe point:
Safe Swap Strategy
- Download completes: New model saved as model-pending.gguf
- On next app launch (or next chat session start):
  a. Unload current model
  b. Rename model-current.gguf to model-previous.gguf
  c. Rename model-pending.gguf to model-current.gguf
  d. Load new model
  e. Update stored version number
- If new model fails to load: Revert to model-previous.gguf
func swapModelIfPending() throws {
    let pendingPath = modelDirectory.appendingPathComponent("model-pending.gguf")
    let currentPath = modelDirectory.appendingPathComponent("model-current.gguf")
    let previousPath = modelDirectory.appendingPathComponent("model-previous.gguf")
    guard FileManager.default.fileExists(atPath: pendingPath.path) else { return }

    // Version recorded by the download delegate once the file was verified
    let pendingVersion = UserDefaults.standard.string(forKey: "pending_model_version")
    // The pending update is consumed whether the swap succeeds or rolls back
    defer { UserDefaults.standard.removeObject(forKey: "pending_model_version") }

    // Unload current model
    engine.unload()

    // Rotate files
    try? FileManager.default.removeItem(at: previousPath) // Remove old backup
    try? FileManager.default.moveItem(at: currentPath, to: previousPath) // Backup current
    try FileManager.default.moveItem(at: pendingPath, to: currentPath) // Promote pending

    // Try loading new model
    do {
        try engine.load(at: currentPath.path)
        // Success: update version
        if let pendingVersion = pendingVersion {
            UserDefaults.standard.set(pendingVersion, forKey: "model_version")
        }
    } catch {
        // Rollback; consider recording the failed version so it is not downloaded again
        try? FileManager.default.removeItem(at: currentPath)
        try? FileManager.default.moveItem(at: previousPath, to: currentPath)
        try engine.load(at: currentPath.path)
    }
}
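The two renames in the rotation are not atomic as a pair. If the app is killed after the current model has been moved to model-previous.gguf but before the pending file is promoted, the next launch finds no model-current.gguf. A small startup repair step can cover that case; the function below is a hypothetical sketch that would run before `swapModelIfPending()`.

```swift
import Foundation

// Hypothetical startup check for an interrupted swap.
// Returns true when model-previous.gguf was restored as model-current.gguf.
@discardableResult
func repairInterruptedSwap(in directory: URL) throws -> Bool {
    let fm = FileManager.default
    let current = directory.appendingPathComponent("model-current.gguf")
    let previous = directory.appendingPathComponent("model-previous.gguf")

    // Only act when the current model is missing but a backup exists
    guard !fm.fileExists(atPath: current.path),
          fm.fileExists(atPath: previous.path) else { return false }

    try fm.moveItem(at: previous, to: current)
    return true
}
```

After the repair, any model-pending.gguf that is still on disk is picked up by the normal swap path, so the update is retried rather than lost.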
Rollback Strategy
Always keep the previous model version available:
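- Local rollback: Keep model-previous.gguf on device. If the new model fails to load or produces poor quality, revert immediately.
- Remote rollback: Include rollback URLs in the manifest. If you discover a model quality issue, update the manifest to point back to the previous version. Because the update check only installs versions newer than the local one, either republish the older weights under a higher version number (for example 2.1.1) or have the app install whenever its version differs from the manifest. All apps will then "update" to the older, working model.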
- Automatic rollback: If the app detects inference failures or crashes after a model swap, automatically revert to the previous version.
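Automatic rollback needs some definition of "failing". One simple approach is to count inference failures during a probation window after each swap, sketched below. The `SwapHealthMonitor` type and its default thresholds are assumptions chosen for illustration and should be tuned per app.

```swift
// Hypothetical tracker for inference outcomes after a model swap.
struct SwapHealthMonitor {
    let failureThreshold: Int   // failures that trigger a rollback
    let probationRuns: Int      // inferences watched after a swap
    private(set) var runs = 0
    private(set) var failures = 0

    init(failureThreshold: Int = 3, probationRuns: Int = 20) {
        self.failureThreshold = failureThreshold
        self.probationRuns = probationRuns
    }

    // True while the new model is still being watched
    var inProbation: Bool { runs < probationRuns }

    // Records one inference result; returns true when the app should roll back.
    mutating func record(success: Bool) -> Bool {
        guard inProbation else { return false }
        runs += 1
        if !success { failures += 1 }
        return failures >= failureThreshold
    }
}
```

When `record` returns true, the app would restore model-previous.gguf at the next safe point. To catch crashes as well as thrown errors, the counters would need to be persisted (for example in UserDefaults) and a run counted as failed if it never reported completion.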
Update Frequency
| Scenario | Update Frequency | Notes |
|---|---|---|
| Early product (iterating fast) | Weekly-biweekly | Rapid quality improvements |
| Stable product | Monthly-quarterly | Incremental improvements |
| New base model available | As needed | Major quality jumps |
| Training data significantly changes | As needed | Domain shifts |
Each update is a fine-tuning run ($5-50) plus CDN distribution. The cost is minimal compared to the quality improvement.
Infrastructure Costs
| Users | Downloads/Month | Model Size | CDN Cost (Cloudflare R2) |
|---|---|---|---|
| 1,000 | ~200 (updates + new users) | 1.7GB | ~$0.01/month |
| 10,000 | ~2,000 | 1.7GB | ~$0.05/month |
| 100,000 | ~20,000 | 1.7GB | ~$0.51/month |
With Cloudflare R2's zero-egress pricing, OTA model delivery is essentially free. Even at 100K users, the CDN cost is under $1/month.
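To see why egress pricing is the deciding factor, it helps to work out the monthly transfer volume behind each row of the table. The short calculation below uses the table's download counts and the 1.7GB model size in decimal units; it computes volume only and makes no assumption about any provider's rates.

```swift
// Monthly transfer in whole gigabytes, using decimal units (1 GB = 1,000 MB).
func monthlyTransferGB(downloads: Int, modelSizeMB: Int) -> Int {
    (downloads * modelSizeMB) / 1_000
}

for downloads in [200, 2_000, 20_000] {
    let gb = monthlyTransferGB(downloads: downloads, modelSizeMB: 1_700)
    print("\(downloads) downloads -> \(gb) GB")
}
// prints:
// 200 downloads -> 340 GB
// 2000 downloads -> 3400 GB
// 20000 downloads -> 34000 GB
```

At 100K users that is about 34 TB a month, which is the figure to price when comparing against a CDN that does charge for egress.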
The fine-tuning and GGUF export step is where platforms like Ertas streamline the workflow. Re-train on updated data, export GGUF, upload to CDN, update the manifest. Your users get the improved model automatically.
Ship AI that runs on your users' devices.
Early bird pricing starts at $14.50/mo — locked in for life. Plans for builders and agencies.