
Offline AI: Building Mobile Features That Work Without Internet
How to build AI features that work without an internet connection. On-device models, offline-first architecture patterns, and the use cases where offline AI is not optional.
Cloud AI has a hard dependency: the internet. When the connection drops, the feature breaks. For many mobile use cases, this is not just an inconvenience. It is a deal-breaker.
A travel translation app that fails without roaming data. A field service assistant that stops working at a construction site with spotty coverage. A health app that cannot process data in a remote clinic. A note-taking app that loses its AI features on a plane.
Offline AI is not a nice-to-have for these apps. It is a core requirement. And with on-device inference, it is entirely achievable.
Where Offline AI Is Essential
Travel and International
Users traveling internationally often keep cellular data off to avoid roaming charges. Airport WiFi is unreliable. Translation, navigation assistance, and travel planning need to work without a connection. A language app that only translates over WiFi is useless at the exact moment users need it most.
Field Work
Construction sites, agricultural fields, mining operations, remote inspections. These environments frequently have poor or no cellular coverage. Workers using mobile apps for documentation, measurement, quality inspection, or reporting need AI features that work on-site.
Healthcare
Patient data should not leave the device for privacy and regulatory reasons (HIPAA, GDPR). Clinical decision support, note transcription, medical coding. These features must work in clinic environments where WiFi may be restricted or unreliable.
Developing Markets
Over 3 billion people worldwide have intermittent internet access. Apps targeting these markets cannot assume always-on connectivity. AI features that work offline have a dramatically larger addressable market.
Everyday Interruptions
Subways, elevators, airplanes, basements, rural areas. Even in developed markets, connectivity gaps are constant. An AI feature that shows a loading spinner or an error in these moments trains users not to rely on it.
The Architecture: Offline-First with On-Device Inference
The key insight is architectural. Instead of building cloud-first with an offline fallback (which is fragile), build offline-first with optional cloud augmentation.
Model Delivery
The GGUF model file needs to be on the device before AI features work. Two approaches:
Bundle with app: Include the model in the app binary or as an app asset. The user downloads it with the app. Simplest approach for models under 200MB (within typical app store download limits). For larger models, use On-Demand Resources (part of iOS App Thinning) or Play Asset Delivery on Android.
Post-install download: App installs without the model. On first launch, detect the missing model and prompt the user to download it over WiFi. This keeps the initial app download small while still delivering the model before it is needed.
// iOS: Check for model and prompt download
func ensureModelAvailable() -> Bool {
    let modelPath = FileManager.default
        .urls(for: .documentDirectory, in: .userDomainMask)[0]
        .appendingPathComponent("model.gguf")
    if FileManager.default.fileExists(atPath: modelPath.path) {
        return true
    }
    // Show download UI
    showModelDownloadPrompt()
    return false
}
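A check like this runs before the inference context is created; when it returns false, the app surfaces the download flow instead of the AI entry point.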
Local Inference
Once the model is on-device, inference is entirely local. llama.cpp handles the computation using the device's CPU and GPU (Metal on iOS, Vulkan on Android). No network call required.
The inference pipeline:
- User provides input (text, voice transcription, etc.)
- Input is tokenized locally
- Model generates response token by token
- Tokens are decoded and displayed in the UI
Every step happens on the device. The internet is not involved.
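In code, the loop is small. Here is a minimal sketch, assuming a hypothetical LlamaContext Swift wrapper around llama.cpp; real bindings name these calls differently, but the shape of the loop is the same:
// LlamaContext is a hypothetical wrapper; substitute your llama.cpp binding
func generate(prompt: String, model: LlamaContext,
              maxTokens: Int = 256, onToken: (String) -> Void) {
    let promptTokens = model.tokenize(prompt)  // 1. tokenize locally
    model.evaluate(promptTokens)               // 2. prefill the prompt
    for _ in 0..<maxTokens {
        let token = model.sampleNextToken()    // 3. sample the next token
        if model.isEndOfSequence(token) { break }
        onToken(model.decode(token))           // 4. decode and stream to the UI
        model.evaluate([token])                // feed it back for the next step
    }
}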
Sync-When-Available
For features that benefit from cloud connectivity (analytics, model updates, user data sync), use a queue-and-sync pattern (sketched after this list):
- Queue analytics events locally (SQLite, Core Data, Room)
- When connectivity returns, sync the queue to your backend
- Check for model updates on app launch when connected
- Never block the user experience on network availability
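A minimal sketch of the queue side in Swift, using NWPathMonitor to detect reconnection. The in-memory array stands in for a persistent store (SQLite, Core Data), and uploadEvents is a hypothetical call to your backend:
import Network

final class EventQueue {
    private var pending: [String] = []   // stand-in for a persistent store
    private let monitor = NWPathMonitor()
    private let queue = DispatchQueue(label: "event-queue")

    init() {
        // Flush automatically whenever connectivity returns
        monitor.pathUpdateHandler = { [weak self] path in
            if path.status == .satisfied { self?.flush() }
        }
        monitor.start(queue: queue)
    }

    // Recording never touches the network, so it never blocks the UX
    func record(_ event: String) {
        queue.async { self.pending.append(event) }
    }

    private func flush() {
        guard !pending.isEmpty else { return }
        uploadEvents(pending) { [weak self] success in
            if success { self?.queue.async { self?.pending.removeAll() } }
        }
    }
}

// Hypothetical backend call; swap in your own API client
func uploadEvents(_ events: [String], completion: @escaping (Bool) -> Void) {
    completion(true)
}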
What Works Offline
| Feature | Offline Feasible? | Model Size Needed | Notes |
|---|---|---|---|
| Text classification | Yes | 1B (~600MB) | Fast, small model sufficient |
| Chat assistant | Yes | 3B (~1.7GB) | Quality generation needs 3B |
| Summarization | Yes | 3B (~1.7GB) | Good quality at 3B |
| Translation | Yes | 1-3B | Depends on language pair |
| Content drafting | Yes | 3B (~1.7GB) | Email replies, notes, messages |
| Autocomplete | Yes | 1B (~600MB) | Speed matters, 1B is fast |
| Sentiment analysis | Yes | 1B (~600MB) | Simple classification task |
| Named entity extraction | Yes | 1-3B | Structured output task |
| Web search | No | N/A | Requires live internet data |
| Real-time data retrieval | No | N/A | Requires server connection |
| Image generation | Partially | 1-2GB+ | Possible but memory-intensive |
The pattern: any task that processes the user's input and generates output from the model's trained knowledge works offline. Tasks that require external, real-time data do not.
Model Management for Offline-First
Version Checking
// Android: Check model version on app launch
suspend fun checkModelUpdate() {
    if (!isNetworkAvailable()) return
    val manifest = fetchModelManifest()  // JSON from your CDN
    val currentVersion = prefs.getString("model_version", "")
    if (manifest.version != currentVersion) {
        // Queue background download
        scheduleModelDownload(manifest.url, manifest.checksum)
    }
}
Storage Management
GGUF models are 600MB-1.7GB, so storage management matters (an iOS snippet follows the list):
- Check available space before download
- Show the model's size to the user before they download
- Allow users to delete and re-download the model
- On iOS, mark the model file so it is not backed up to iCloud (it can be re-downloaded)
- Handle low storage warnings by offering to remove the model
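The space check and the iCloud backup exclusion each come down to a few lines on iOS. A sketch:
import Foundation

// Check free space before downloading. The "important usage" capacity key
// includes purgeable space the system can reclaim for the download.
func hasSpace(for modelBytes: Int64) -> Bool {
    let home = URL(fileURLWithPath: NSHomeDirectory())
    let values = try? home.resourceValues(
        forKeys: [.volumeAvailableCapacityForImportantUsageKey])
    return (values?.volumeAvailableCapacityForImportantUsage ?? 0) > modelBytes
}

// Exclude the model file from iCloud backups; it can always be re-downloaded
func excludeFromBackup(_ fileURL: URL) throws {
    var url = fileURL
    var values = URLResourceValues()
    values.isExcludedFromBackup = true
    try url.setResourceValues(values)
}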
Integrity Verification
Always verify the model file after download:
import CryptoKit

// SHA-256 verification. Hash in chunks so a 1.7GB model is never
// loaded into memory all at once.
func verifyModel(at path: URL, expectedHash: String) -> Bool {
    guard let file = try? FileHandle(forReadingFrom: path) else { return false }
    defer { try? file.close() }
    var hasher = SHA256()
    while let chunk = try? file.read(upToCount: 1_048_576), !chunk.isEmpty {
        hasher.update(data: chunk)
    }
    let hashString = hasher.finalize().map { String(format: "%02x", $0) }.joined()
    return hashString == expectedHash
}
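The expected hash here is the checksum published in the same manifest used for version checking, so the device can confirm the download matches what your CDN is serving.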
UX Patterns for Offline AI
Model Download State
- Show a clear progress indicator during download
- Allow pausing and resuming (see the sketch after this list)
- Resume automatically on network reconnection
- Show the download size before starting
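All four behaviors fall out of URLSession's background download support, which keeps downloading while the app is suspended and retries when the network returns. A minimal sketch (error handling, and the checksum verification from earlier, are left out):
import Foundation

final class ModelDownloader: NSObject, URLSessionDownloadDelegate {
    // A background session survives app suspension and resumes
    // automatically when connectivity comes back
    private lazy var session = URLSession(
        configuration: .background(withIdentifier: "model-download"),
        delegate: self, delegateQueue: nil)
    private var task: URLSessionDownloadTask?
    private var resumeData: Data?

    func start(from url: URL) {
        task = session.downloadTask(with: url)
        task?.resume()
    }

    func pause() {
        // Keep the partial data so the download picks up where it left off
        task?.cancel(byProducingResumeData: { [weak self] data in
            self?.resumeData = data
        })
    }

    func resumeDownload() {
        guard let data = resumeData else { return }
        task = session.downloadTask(withResumeData: data)
        task?.resume()
    }

    // Drive the progress indicator from these byte counts
    func urlSession(_ session: URLSession, downloadTask: URLSessionDownloadTask,
                    didWriteData bytesWritten: Int64, totalBytesWritten: Int64,
                    totalBytesExpectedToWrite: Int64) {
        let progress = Double(totalBytesWritten) / Double(totalBytesExpectedToWrite)
        print("Downloaded \(Int(progress * 100))%")
    }

    func urlSession(_ session: URLSession, downloadTask: URLSessionDownloadTask,
                    didFinishDownloadingTo location: URL) {
        // Move the temp file into Documents as model.gguf, then verify its hash
    }
}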
Feature Availability
- When the model is not yet downloaded, show a clear message explaining why the AI feature is unavailable and how to enable it
- When the model is loaded and ready, show no special indicator. Offline AI should feel native.
- Never show "offline mode" badges or degraded UI. The on-device model IS the feature.
Graceful Fallback
For hybrid apps that use both on-device and cloud AI (a routing sketch follows this list):
- Default to on-device for all supported tasks
- If connected and the task exceeds on-device capability, optionally route to cloud
- Never fail a user request because the cloud is unavailable if on-device can handle it
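That policy reduces to a small routing function. A sketch, where AITask and its fitsOnDevice flag are hypothetical stand-ins for however your app describes its tasks:
// Hypothetical task descriptor; set fitsOnDevice per feature
struct AITask {
    let prompt: String
    let fitsOnDevice: Bool
}

enum Route { case onDevice, cloud }

func route(_ task: AITask, isOnline: Bool) -> Route {
    // On-device handles every supported task, connected or not
    if task.fitsOnDevice { return .onDevice }
    // Escalate only when the cloud is actually reachable; otherwise
    // make a best-effort local attempt rather than failing the request
    return isOnline ? .cloud : .onDevice
}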
Building the Offline-First Pipeline
The path to offline AI:
- Choose your model: 1B for simple tasks (classification, autocomplete), 3B for generation and chat
- Fine-tune on your domain: Use your app's data to train a model that excels at your specific tasks. Platforms like Ertas provide a visual fine-tuning pipeline that exports GGUF files ready for mobile deployment.
- Integrate llama.cpp: Add the inference library to your iOS or Android project
- Implement model delivery: Bundle or post-install download
- Build offline-first: Design every AI interaction to work without network, add cloud augmentation only where essential
The result is an AI feature that works everywhere your users go. Subway. Airplane. Construction site. Rural clinic. No loading spinners. No error messages. Just instant, private, reliable AI.
Ship AI that runs on your users' devices.
Early bird pricing starts at $14.50/mo — locked in for life. Plans for builders and agencies.