
    Offline AI: Building Mobile Features That Work Without Internet

    How to build AI features that work without an internet connection. On-device models, offline-first architecture patterns, and the use cases where offline AI is not optional.

    Ertas Team

    Cloud AI has a hard dependency: the internet. When the connection drops, the feature breaks. For many mobile use cases, this is not just an inconvenience. It is a deal-breaker.

    A travel translation app that fails without roaming data. A field service assistant that stops working at a construction site with spotty coverage. A health app that cannot process data in a remote clinic. A note-taking app that loses its AI features on a plane.

    Offline AI is not a nice-to-have for these apps. It is a core requirement. And with on-device inference, it is entirely achievable.

    Where Offline AI Is Essential

    Travel and International

    Users traveling internationally often avoid roaming data charges. Airport WiFi is unreliable. Translation, navigation assistance, and travel planning need to work without a connection. A language app that only translates when connected to WiFi is useless at the exact moment users need it most.

    Field Work

    Construction sites, agricultural fields, mining operations, remote inspections. These environments frequently have poor or no cellular coverage. Workers using mobile apps for documentation, measurement, quality inspection, or reporting need AI features that work on-site.

    Healthcare

    Patient data should not leave the device for privacy and regulatory reasons (HIPAA, GDPR). Clinical decision support, note transcription, medical coding: these features must work in clinic environments where WiFi may be restricted or unreliable.

    Developing Markets

    Over 3 billion people worldwide have intermittent internet access. Apps targeting these markets cannot assume always-on connectivity. AI features that work offline have a dramatically larger addressable market.

    Everyday Interruptions

    Subways, elevators, airplanes, basements, rural areas. Even in developed markets, connectivity gaps are constant. An AI feature that shows a loading spinner or error in these moments trains users to not rely on it.

    The Architecture: Offline-First with On-Device Inference

    The key insight is architectural. Instead of building cloud-first with an offline fallback (which is fragile), build offline-first with optional cloud augmentation.

    Model Delivery

    The GGUF model file needs to be on the device before AI features work. Two approaches:

    Bundle with app: Include the model in the app binary or as an app asset. The user downloads it with the app. Simplest approach for models under 200MB (fits within typical cellular download limits). For larger models, use iOS On-Demand Resources or Play Asset Delivery on Android.

    Post-install download: App installs without the model. On first launch, detect the missing model and prompt the user to download it over WiFi. This keeps the initial app download small while still delivering the model before it is needed.

    // iOS: Check for model and prompt download
    func ensureModelAvailable() -> Bool {
        let modelPath = FileManager.default
            .urls(for: .documentDirectory, in: .userDomainMask)[0]
            .appendingPathComponent("model.gguf")
    
        if FileManager.default.fileExists(atPath: modelPath.path) {
            return true
        }
    
        // Show download UI
        showModelDownloadPrompt()
        return false
    }
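
    For the post-install path, a minimal download sketch using URLSession. The model URL would come from your own CDN or manifest; `modelDestination` and the endpoint are illustrative assumptions, not a fixed API:

```swift
import Foundation

/// Destination for the model file in the app's Documents directory.
func modelDestination() -> URL {
    FileManager.default
        .urls(for: .documentDirectory, in: .userDomainMask)[0]
        .appendingPathComponent("model.gguf")
}

/// Downloads the GGUF model and moves it into place.
func downloadModel(from url: URL,
                   completion: @escaping (Result<URL, Error>) -> Void) {
    let task = URLSession.shared.downloadTask(with: url) { tempURL, _, error in
        if let error = error { return completion(.failure(error)) }
        guard let tempURL = tempURL else { return }
        do {
            let dest = modelDestination()
            // Clear any stale or partial copy from a previous attempt
            try? FileManager.default.removeItem(at: dest)
            try FileManager.default.moveItem(at: tempURL, to: dest)
            completion(.success(dest))
        } catch {
            completion(.failure(error))
        }
    }
    task.resume()
}
```

    In production you would use a background URLSessionConfiguration so the download survives app suspension, and verify the checksum before marking the model ready.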
    

    Local Inference

    Once the model is on-device, inference is entirely local. llama.cpp handles the computation using the device's CPU and GPU (Metal on iOS, Vulkan on Android). No network call required.

    The inference pipeline:

    1. User provides input (text, voice transcription, etc.)
    2. Input is tokenized locally
    3. Model generates response token by token
    4. Tokens are decoded and displayed in the UI

    Every step happens on the device. The internet is not involved.
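
    In code, the loop looks roughly like this. InferenceContext is a hypothetical protocol standing in for a llama.cpp binding; the real C API (llama_tokenize, llama_decode, and friends) has a different surface, but the shape of the pipeline is the same:

```swift
/// Hypothetical sketch of the calls a llama.cpp binding exposes.
/// Names are illustrative, not the actual C API.
protocol InferenceContext {
    func tokenize(_ text: String) -> [Int32]
    func eval(_ tokens: [Int32])
    func sampleNext() -> Int32?            // nil ends generation
    func isEndOfSequence(_ token: Int32) -> Bool
    func decode(_ token: Int32) -> String
}

func generate(context: InferenceContext, prompt: String,
              onToken: (String) -> Void) {
    let tokens = context.tokenize(prompt)  // steps 1-2: tokenize locally
    context.eval(tokens)                   // prefill the prompt
    while let t = context.sampleNext(), !context.isEndOfSequence(t) {
        onToken(context.decode(t))         // steps 3-4: stream to the UI
    }
}
```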

    Sync-When-Available

    For features that benefit from cloud connectivity (analytics, model updates, user data sync), use a queue-and-sync pattern:

    • Queue analytics events locally (SQLite, Core Data, Room)
    • When connectivity returns, sync the queue to your backend
    • Check for model updates on app launch when connected
    • Never block the user experience on network availability
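
    The pattern can be sketched as a small queue that only drains when a send succeeds. This in-memory version is a simplification; a real app would persist the queue (SQLite, Core Data) and trigger flush from a connectivity callback such as NWPathMonitor:

```swift
/// Minimal queue-and-sync sketch. Events accumulate locally and are
/// only cleared once a batch upload reports success.
final class EventQueue {
    private var pending: [String] = []

    func enqueue(_ event: String) {
        pending.append(event)   // never blocks on the network
    }

    /// `send` posts a batch to your backend and returns true on success.
    func flush(send: ([String]) -> Bool) {
        guard !pending.isEmpty else { return }
        if send(pending) {
            pending.removeAll()
        } // on failure, events stay queued for the next attempt
    }

    var count: Int { pending.count }
}
```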

    What Works Offline

    Feature                 | Offline Feasible? | Model Size Needed | Notes
    Text classification     | Yes               | 1B (~600MB)       | Fast, small model sufficient
    Chat assistant          | Yes               | 3B (~1.7GB)       | Quality generation needs 3B
    Summarization           | Yes               | 3B (~1.7GB)       | Good quality at 3B
    Translation             | Yes               | 1-3B              | Depends on language pair
    Content drafting        | Yes               | 3B (~1.7GB)       | Email replies, notes, messages
    Autocomplete            | Yes               | 1B (~600MB)       | Speed matters, 1B is fast
    Sentiment analysis      | Yes               | 1B (~600MB)       | Simple classification task
    Named entity extraction | Yes               | 1-3B              | Structured output task
    Web search              | No                | N/A               | Requires live internet data
    Real-time data retrieval| No                | N/A               | Requires server connection
    Image generation        | Partially         | 1-2GB+            | Possible but memory-intensive

    The pattern: any task that processes the user's input and generates output from the model's trained knowledge works offline. Tasks that require external, real-time data do not.

    Model Management for Offline-First

    Version Checking

    // Android: Check model version on app launch
    suspend fun checkModelUpdate() {
        if (!isNetworkAvailable()) return
    
        val manifest = fetchModelManifest() // JSON from your CDN
        val currentVersion = prefs.getString("model_version", "")
    
        if (manifest.version != currentVersion) {
            // Queue background download
            scheduleModelDownload(manifest.url, manifest.checksum)
        }
    }
    

    Storage Management

    GGUF models are 600MB-1.7GB. Storage management matters:

    • Check available space before download
    • Show the model's size to the user before they download
    • Allow users to delete and re-download the model
    • On iOS, mark the model file so it is not backed up to iCloud (it can be re-downloaded)
    • Handle low storage warnings by offering to remove the model
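
    The space check and the iCloud backup exclusion both use standard Foundation resource values on iOS:

```swift
import Foundation

/// Free bytes the system will allow for an important user-initiated
/// download, or nil if the value is unavailable.
func availableBytes(at url: URL) -> Int64? {
    let values = try? url.resourceValues(
        forKeys: [.volumeAvailableCapacityForImportantUsageKey])
    return values?.volumeAvailableCapacityForImportantUsage
}

/// Excludes the model file from iCloud backup; it can be re-downloaded.
func excludeFromBackup(_ fileURL: URL) throws {
    var url = fileURL
    var values = URLResourceValues()
    values.isExcludedFromBackup = true
    try url.setResourceValues(values)
}
```

    Call availableBytes before scheduling a download and compare against the model size from your manifest, leaving headroom for the rest of the app.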

    Integrity Verification

    Always verify the model file after download:

    // SHA256 verification
    import Foundation
    import CryptoKit
    
    func verifyModel(at path: URL, expectedHash: String) -> Bool {
        // Memory-map the file instead of copying up to 1.7GB into RAM
        guard let data = try? Data(contentsOf: path, options: .mappedIfSafe) else { return false }
        let hash = SHA256.hash(data: data)
        let hashString = hash.map { String(format: "%02x", $0) }.joined()
        return hashString == expectedHash
    }
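
    An alternative that bounds memory explicitly is CryptoKit's incremental API, hashing the file in fixed-size chunks:

```swift
import Foundation
import CryptoKit

/// Streams the file through SHA256 in 1MB chunks so a multi-gigabyte
/// model never has to fit in memory at once.
func streamingSHA256(of url: URL) throws -> String {
    let handle = try FileHandle(forReadingFrom: url)
    defer { try? handle.close() }
    var hasher = SHA256()
    while let chunk = try handle.read(upToCount: 1 << 20), !chunk.isEmpty {
        hasher.update(data: chunk)
    }
    return hasher.finalize().map { String(format: "%02x", $0) }.joined()
}
```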
    

    UX Patterns for Offline AI

    Model Download State

    • Show a clear progress indicator during download
    • Allow pausing and resuming
    • Resume automatically on network reconnection
    • Show the download size before starting

    Feature Availability

    • When the model is not yet downloaded, show a clear message explaining why the AI feature is unavailable and how to enable it
    • When the model is loaded and ready, show no special indicator. Offline AI should feel native.
    • Never show "offline mode" badges or degraded UI. The on-device model IS the feature.

    Graceful Fallback

    For hybrid apps that use both on-device and cloud AI:

    • Default to on-device for all supported tasks
    • If connected and the task exceeds on-device capability, optionally route to cloud
    • Never fail a user request because the cloud is unavailable if on-device can handle it
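
    The routing logic is a few lines. The three closures here are placeholders for your own capability check and inference paths, not a prescribed API:

```swift
/// Hybrid routing sketch: default to on-device, use the cloud only as
/// augmentation, and never hard-fail just because the network is gone.
func route(task: String,
           isConnected: Bool,
           onDeviceCanHandle: (String) -> Bool,
           runOnDevice: (String) -> String,
           runInCloud: (String) -> String) -> String {
    if onDeviceCanHandle(task) {
        return runOnDevice(task)   // works offline, no round trip
    }
    if isConnected {
        return runInCloud(task)    // optional augmentation
    }
    // Cloud unavailable and task exceeds on-device capability:
    // return a best-effort local result rather than an error.
    return runOnDevice(task)
}
```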

    Building the Offline-First Pipeline

    The path to offline AI:

    1. Choose your model: 1B for simple tasks (classification, autocomplete), 3B for generation and chat
    2. Fine-tune on your domain: Use your app's data to train a model that excels at your specific tasks. Platforms like Ertas provide a visual fine-tuning pipeline that exports GGUF files ready for mobile deployment.
    3. Integrate llama.cpp: Add the inference library to your iOS or Android project
    4. Implement model delivery: Bundle or post-install download
    5. Build offline-first: Design every AI interaction to work without network, add cloud augmentation only where essential

    The result is an AI feature that works everywhere your users go. Subway. Airplane. Construction site. Rural clinic. No loading spinners. No error messages. Just instant, private, reliable AI.

    Ship AI that runs on your users' devices.

    Early bird pricing starts at $14.50/mo — locked in for life. Plans for builders and agencies.
