
Shipping GGUF Models: App Store Bundling vs Post-Install Download
Two ways to get your GGUF model onto the user's device. Bundle it with the app for simplicity, or download it post-install for flexibility. Architecture, size limits, and best practices for both.
Your model is fine-tuned and exported to GGUF. Now you need to get it onto the user's device. There are two fundamental approaches: bundle it with the app binary, or download it after install.
Each has trade-offs around app size, user experience, update flexibility, and platform constraints.
Option 1: Bundle with the App
Include the GGUF file in the app package. The user downloads the model when they download the app.
iOS Bundling
Direct inclusion: Add the GGUF file to your Xcode project. It ships inside the IPA. Access via Bundle.main:
let modelPath = Bundle.main.path(forResource: "model", ofType: "gguf")!
On Demand Resources (ODR): Tag the model as an on-demand resource. iOS downloads it when first needed, not at app install time. The initial app download stays small.
let request = NSBundleResourceRequest(tags: ["ai-model"])
request.beginAccessingResources { error in
    guard error == nil else { return }
    let modelPath = Bundle.main.path(forResource: "model", ofType: "gguf")!
    loadModel(at: modelPath)
}
ODR files are managed by iOS and may be purged under storage pressure. Your app must handle re-downloading.
Android Bundling
APK assets: For models under 150MB, place the GGUF in assets/. Copy it to internal storage on first launch, because assets are packaged inside the APK and llama.cpp needs a plain file path it can open (and mmap).
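The first-launch copy step can look like this (a minimal sketch; the asset name and destination file are illustrative):

```kotlin
import android.content.Context
import java.io.File

// Copy the bundled GGUF out of the APK so llama.cpp can open it by path.
// Runs once; later launches find the file already in place.
fun ensureModelOnDisk(context: Context): File {
    val modelFile = File(context.filesDir, "model.gguf")
    if (!modelFile.exists()) {
        context.assets.open("model.gguf").use { input ->
            modelFile.outputStream().use { output ->
                input.copyTo(output)
            }
        }
    }
    return modelFile
}
```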
Play Asset Delivery: For larger models, use Google Play's asset delivery system:
// Install-time delivery (downloaded with app)
// build.gradle.kts
assetPacks += ":model_pack"
Play Asset Delivery supports three modes:
- Install-time: Downloaded with the app. Simplest but increases initial download.
- Fast-follow: Downloaded immediately after install, in background.
- On-demand: Downloaded when the app requests it.
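For the on-demand mode, the app requests the pack at runtime through the Play Core asset-pack API. A sketch, assuming the pack module is named "model_pack" as in the Gradle snippet above:

```kotlin
import android.content.Context
import com.google.android.play.core.assetpacks.AssetPackManagerFactory

// Returns the on-disk model path, or null if the pack still needs fetching.
fun requestModelPack(context: Context): String? {
    val assetPackManager = AssetPackManagerFactory.getInstance(context)
    val location = assetPackManager.getPackLocation("model_pack")
    if (location == null) {
        // Not downloaded yet: start the fetch. Track progress by
        // registering an AssetPackStateUpdateListener, then call again.
        assetPackManager.fetch(listOf("model_pack"))
        return null
    }
    return "${location.assetsPath()}/model.gguf"
}
```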
Size Limits
| Platform | Limit | Notes |
|---|---|---|
| iOS IPA | 4GB | Includes all resources |
| iOS OTA download | 200MB | Cellular download limit (user can override) |
| Android APK | 150MB | Without Play Asset Delivery |
| Android AAB | 150MB base + 2GB assets | With Play Asset Delivery |
| Play Asset Delivery pack | 512MB per pack | Multiple packs allowed |
A 1B GGUF Q4 model (~600MB) fits within iOS's 4GB limit but exceeds the 200MB cellular OTA threshold. On Android, it requires Play Asset Delivery.
A 3B GGUF Q4 model (~1.7GB) fits within both platforms' upper limits but will be a large download.
Pros and Cons of Bundling
Pros:
- Model is immediately available on first launch (no download wait)
- No CDN infrastructure needed
- No network connectivity required for first use
- Simpler architecture (no download/verification logic)
Cons:
- Increases app download size significantly
- Model updates require a full app update through the store
- App Store review for every model change
- Users may hesitate to download a 600MB-1.7GB app
- On iOS, the 200MB cellular limit means users may need WiFi to download
Option 2: Post-Install Download
The app installs without the model. On first launch (or when the user accesses the AI feature), the app downloads the model from your CDN.
Download Flow
[App installed] -> [User opens AI feature] -> [Model not found]
-> [Show download prompt: "Download AI model (1.7GB)?"]
-> [User taps Download] -> [Progress bar]
-> [Download complete] -> [Verify hash] -> [Model ready]
iOS Implementation
import CryptoKit
import Foundation

@MainActor
class ModelDownloader: ObservableObject {
    @Published var progress: Double = 0
    @Published var isDownloading = false
    @Published var isReady = false

    private let modelURL = URL(string: "https://cdn.example.com/model.gguf")!

    private var modelPath: URL {
        FileManager.default
            .urls(for: .documentDirectory, in: .userDomainMask)[0]
            .appendingPathComponent("model.gguf")
    }

    func checkModelAvailable() -> Bool {
        FileManager.default.fileExists(atPath: modelPath.path)
    }

    func downloadModel() async throws {
        isDownloading = true
        defer { isDownloading = false } // reset even if the download throws
        // ProgressDelegate is a small URLSessionTaskDelegate helper (not shown)
        // that reports downloaded bytes as a 0...1 fraction.
        let (tempURL, _) = try await URLSession.shared.download(
            from: modelURL,
            delegate: ProgressDelegate { progress in
                Task { @MainActor in self.progress = progress }
            }
        )
        try FileManager.default.moveItem(at: tempURL, to: modelPath)
        // Verify integrity before first use; delete the file if it fails
        guard verifyHash(modelPath, expected: expectedSHA256) else {
            try FileManager.default.removeItem(at: modelPath)
            throw ModelError.corruptedDownload
        }
        isReady = true
    }
}
Android Implementation
class ModelDownloader(private val context: Context) {
    private val modelFile = File(context.filesDir, "model.gguf")

    fun isModelAvailable(): Boolean = modelFile.exists()

    suspend fun downloadModel(
        onProgress: (Float) -> Unit
    ) = withContext(Dispatchers.IO) {
        val client = OkHttpClient()
        val request = Request.Builder().url(MODEL_CDN_URL).build()
        val response = client.newCall(request).execute()
        if (!response.isSuccessful) throw IOException("HTTP ${response.code}")
        val body = response.body ?: throw IOException("Empty response")
        val totalBytes = body.contentLength()
        var downloadedBytes = 0L
        modelFile.outputStream().use { output ->
            body.byteStream().use { input ->
                val buffer = ByteArray(8192)
                var read: Int
                while (input.read(buffer).also { read = it } != -1) {
                    output.write(buffer, 0, read)
                    downloadedBytes += read
                    onProgress(downloadedBytes.toFloat() / totalBytes)
                }
            }
        }
        // Verify integrity; sha256() is a small extension that hashes the file
        val hash = modelFile.sha256()
        if (hash != EXPECTED_SHA256) {
            modelFile.delete()
            throw IOException("Corrupted download")
        }
    }
}
CDN Setup
Host the GGUF file on a CDN for fast, reliable delivery:
- AWS CloudFront + S3: Standard setup. ~$0.085/GB transfer.
- Cloudflare R2: No egress fees for downloads. ~$0.015/GB storage only.
- Firebase Hosting: Simple for small projects. 10GB free, then $0.15/GB.
Cost example at 10,000 monthly downloads of a 1.7GB model:
- CloudFront: ~$1,445/month
- Cloudflare R2: ~$0.26/month (storage only, no egress)
- Firebase: ~$2,550/month
Cloudflare R2's zero-egress pricing makes it dramatically cheaper for model distribution.
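The arithmetic behind those figures is simple: downloads per month times model size times the per-GB transfer rate. A sketch using the list prices quoted above:

```kotlin
// Monthly egress cost ≈ downloads × model size (GB) × per-GB transfer rate.
fun egressCostUsd(downloads: Int, modelGb: Double, ratePerGb: Double): Double =
    downloads * modelGb * ratePerGb

fun main() {
    println(egressCostUsd(10_000, 1.7, 0.085)) // CloudFront: ≈ $1,445
    println(egressCostUsd(10_000, 1.7, 0.15))  // Firebase:   ≈ $2,550
    // R2 charges no egress, so only ~1.7GB of storage is billed.
}
```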
Resume Support
Large downloads will be interrupted. Support resume:
// iOS: resume an interrupted download.
// resumeData comes from cancel(byProducingResumeData:) or from the failed
// task's error userInfo (NSURLSessionDownloadTaskResumeData), saved to disk.
let resumeData = try? Data(contentsOf: resumeDataURL)
if let resumeData = resumeData {
    downloadTask = session.downloadTask(withResumeData: resumeData)
} else {
    downloadTask = session.downloadTask(with: modelURL)
}
Pros and Cons of Post-Install Download
Pros:
- Small initial app download (fast install, no store size hesitation)
- Model updates without app store review (push new model to CDN)
- Can offer multiple model sizes (1B for everyone, 3B as upgrade)
- Users only download if they use the AI feature
Cons:
- First-use delay (1-5 minutes for download)
- Requires network for first use
- CDN infrastructure and cost
- More complex code (download, verify, resume, storage management)
The Recommendation
| Scenario | Approach |
|---|---|
| 1B model, AI is core to the app | Bundle (600MB is acceptable) |
| 3B model, AI is core to the app | Fast-follow / On-demand delivery |
| AI is an optional feature | Post-install download |
| Model updates frequently (monthly) | Post-install download |
| Model is stable (quarterly updates) | Bundle or fast-follow |
| Target market has slow internet | Bundle |
For most apps: post-install download with a clear download prompt. This keeps the initial app download small, lets you update models independently, and only downloads for users who actually use the AI feature.
Integrity Verification
Always verify the model file after download. A corrupted GGUF file will cause crashes during inference:
import CryptoKit

func verifyHash(_ fileURL: URL, expected: String) -> Bool {
    // Memory-map the file rather than loading a multi-GB model into RAM
    guard let data = try? Data(contentsOf: fileURL, options: .mappedIfSafe) else {
        return false
    }
    let hash = SHA256.hash(data: data)
    let hashString = hash.map { String(format: "%02x", $0) }.joined()
    return hashString == expected
}
Storage Management
GGUF models are large. Respect the user's storage:
- Show model size before download
- Allow deleting and re-downloading the model
- On iOS, exclude the model from iCloud backup (it can be re-downloaded)
- Handle low-storage scenarios gracefully
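On the Android side, the free-space check and the delete path can be sketched as follows (the 500MB headroom is an illustrative choice, not a platform requirement):

```kotlin
import java.io.File

// Refuse the download when it would leave the device nearly full.
fun hasSpaceFor(modelBytes: Long, dir: File): Boolean {
    val headroomBytes = 500L * 1024 * 1024 // keep ~500MB free after download
    return dir.usableSpace > modelBytes + headroomBytes
}

// Let the user reclaim the space; the model can always be re-downloaded.
fun deleteModel(modelFile: File): Boolean =
    !modelFile.exists() || modelFile.delete()
```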
The model itself is what makes the difference. A well-fine-tuned GGUF from a platform like Ertas, delivered via either bundling or download, provides domain-specific AI that runs locally, instantly, and at zero per-use cost.
Ship AI that runs on your users' devices.
Early bird pricing starts at $14.50/mo — locked in for life. Plans for builders and agencies.