
Shipping GGUF Models: App Store Bundling vs Post-Install Download
Two ways to get your GGUF model onto the user's device. Bundle it with the app for simplicity, or download it post-install for flexibility. Architecture, size limits, and best practices for both.
Your model is fine-tuned and exported to GGUF. Now you need to get it onto the user's device. There are two fundamental approaches: bundle it with the app binary, or download it after install.
Each has trade-offs around app size, user experience, update flexibility, and platform constraints.
Option 1: Bundle with the App
Include the GGUF file in the app package. The user downloads the model when they download the app.
iOS Bundling
Direct inclusion: Add the GGUF file to your Xcode project. It ships inside the IPA. Access via Bundle.main:
let modelPath = Bundle.main.path(forResource: "model", ofType: "gguf")!
On Demand Resources (ODR): Tag the model as an on-demand resource. iOS downloads it when first needed, not at app install time. The initial app download stays small.
let request = NSBundleResourceRequest(tags: ["ai-model"])
request.beginAccessingResources { error in
    guard error == nil else { return }
    let modelPath = Bundle.main.path(forResource: "model", ofType: "gguf")!
    loadModel(at: modelPath)
}
ODR files are managed by iOS and may be purged under storage pressure. Your app must handle re-downloading.
Android Bundling
APK assets: For models under 150MB, place the GGUF in assets/. Copy it to internal storage on first launch, because assets are packaged inside the APK and llama.cpp needs a plain file path it can open (and mmap).
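The first-launch copy step can look like this (a minimal sketch; the asset name and destination file are illustrative):

```kotlin
import android.content.Context
import java.io.File

// Copy the bundled GGUF out of the APK so llama.cpp can open it by path.
// Runs once; later launches find the file already in place.
fun ensureModelOnDisk(context: Context): File {
    val modelFile = File(context.filesDir, "model.gguf")
    if (!modelFile.exists()) {
        context.assets.open("model.gguf").use { input ->
            modelFile.outputStream().use { output ->
                input.copyTo(output)
            }
        }
    }
    return modelFile
}
```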
Play Asset Delivery: For larger models, use Google Play's asset delivery system:
// Install-time delivery (downloaded with app)
// build.gradle.kts
assetPacks += ":model_pack"
Play Asset Delivery supports three modes:
- Install-time: Downloaded with the app. Simplest but increases initial download.
- Fast-follow: Downloaded immediately after install, in background.
- On-demand: Downloaded when the app requests it.
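For the on-demand mode, the app requests the pack at runtime through the Play Core asset-pack API. A sketch, assuming the pack module is named "model_pack" as in the Gradle snippet above:

```kotlin
import android.content.Context
import com.google.android.play.core.assetpacks.AssetPackManagerFactory

// Returns the on-disk model path, or null if the pack still needs fetching.
fun requestModelPack(context: Context): String? {
    val assetPackManager = AssetPackManagerFactory.getInstance(context)
    val location = assetPackManager.getPackLocation("model_pack")
    if (location == null) {
        // Not downloaded yet: start the fetch. Track progress by
        // registering an AssetPackStateUpdateListener, then call again.
        assetPackManager.fetch(listOf("model_pack"))
        return null
    }
    return "${location.assetsPath()}/model.gguf"
}
```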
Size Limits
| Platform | Limit | Notes |
|---|---|---|
| iOS IPA | 4GB | Includes all resources |
| iOS OTA download | 200MB | Cellular download limit (user can override) |
| Android APK | 150MB | Without Play Asset Delivery |
| Android AAB | 150MB base + 2GB assets | With Play Asset Delivery |
| Play Asset Delivery pack | 512MB per pack | Multiple packs allowed |
A 1B GGUF Q4 model (~600MB) fits within iOS's 4GB limit but exceeds the 200MB cellular OTA threshold. On Android, it requires Play Asset Delivery.
A 3B GGUF Q4 model (~1.7GB) fits within both platforms' upper limits but will be a large download.
Pros and Cons of Bundling
Pros:
- Model is immediately available on first launch (no download wait)
- No CDN infrastructure needed
- No network connectivity required for first use
- Simpler architecture (no download/verification logic)
Cons:
- Increases app download size significantly
- Model updates require a full app update through the store
- App Store review for every model change
- Users may hesitate to download a 600MB-1.7GB app
- On iOS, the 200MB cellular limit means users may need WiFi to download
Option 2: Post-Install Download
The app installs without the model. On first launch (or when the user accesses the AI feature), the app downloads the model from your CDN.
Download Flow
[App installed] -> [User opens AI feature] -> [Model not found]
-> [Show download prompt: "Download AI model (1.7GB)?"]
-> [User taps Download] -> [Progress bar]
-> [Download complete] -> [Verify hash] -> [Model ready]
iOS Implementation
import CryptoKit
import Foundation

@MainActor
class ModelDownloader: ObservableObject {
    @Published var progress: Double = 0
    @Published var isDownloading = false
    @Published var isReady = false

    private let modelURL = URL(string: "https://cdn.example.com/model.gguf")!

    private var modelPath: URL {
        FileManager.default
            .urls(for: .documentDirectory, in: .userDomainMask)[0]
            .appendingPathComponent("model.gguf")
    }

    func checkModelAvailable() -> Bool {
        FileManager.default.fileExists(atPath: modelPath.path)
    }

    func downloadModel() async throws {
        isDownloading = true
        defer { isDownloading = false } // reset even if the download throws
        // ProgressDelegate is a small URLSessionTaskDelegate helper (not shown)
        // that reports downloaded bytes as a 0...1 fraction.
        let (tempURL, _) = try await URLSession.shared.download(
            from: modelURL,
            delegate: ProgressDelegate { progress in
                Task { @MainActor in self.progress = progress }
            }
        )
        try FileManager.default.moveItem(at: tempURL, to: modelPath)
        // Verify integrity before first use; delete the file if it fails
        guard verifyHash(modelPath, expected: expectedSHA256) else {
            try FileManager.default.removeItem(at: modelPath)
            throw ModelError.corruptedDownload
        }
        isReady = true
    }
}
Android Implementation
class ModelDownloader(private val context: Context) {
    private val modelFile = File(context.filesDir, "model.gguf")

    fun isModelAvailable(): Boolean = modelFile.exists()

    suspend fun downloadModel(
        onProgress: (Float) -> Unit
    ) = withContext(Dispatchers.IO) {
        val client = OkHttpClient()
        val request = Request.Builder().url(MODEL_CDN_URL).build()
        val response = client.newCall(request).execute()
        if (!response.isSuccessful) throw IOException("HTTP ${response.code}")
        val body = response.body ?: throw IOException("Empty response")
        val totalBytes = body.contentLength()
        var downloadedBytes = 0L
        modelFile.outputStream().use { output ->
            body.byteStream().use { input ->
                val buffer = ByteArray(8192)
                var read: Int
                while (input.read(buffer).also { read = it } != -1) {
                    output.write(buffer, 0, read)
                    downloadedBytes += read
                    onProgress(downloadedBytes.toFloat() / totalBytes)
                }
            }
        }
        // Verify integrity; sha256() is a small extension that hashes the file
        val hash = modelFile.sha256()
        if (hash != EXPECTED_SHA256) {
            modelFile.delete()
            throw IOException("Corrupted download")
        }
    }
}
CDN Setup
Host the GGUF file on a CDN for fast, reliable delivery:
- AWS CloudFront + S3: Standard setup. ~$0.085/GB transfer.
- Cloudflare R2: No egress fees for downloads. ~$0.015/GB storage only.
- Firebase Hosting: Simple for small projects. 10GB free, then $0.15/GB.
Cost example at 10,000 monthly downloads of a 1.7GB model:
- CloudFront: ~$1,445/month
- Cloudflare R2: ~$0.26/month (storage only, no egress)
- Firebase: ~$2,550/month
Cloudflare R2's zero-egress pricing makes it dramatically cheaper for model distribution.
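The arithmetic behind those figures is simple: downloads per month times model size times the per-GB transfer rate. A sketch using the list prices quoted above:

```kotlin
// Monthly egress cost ≈ downloads × model size (GB) × per-GB transfer rate.
fun egressCostUsd(downloads: Int, modelGb: Double, ratePerGb: Double): Double =
    downloads * modelGb * ratePerGb

fun main() {
    println(egressCostUsd(10_000, 1.7, 0.085)) // CloudFront: ≈ $1,445
    println(egressCostUsd(10_000, 1.7, 0.15))  // Firebase:   ≈ $2,550
    // R2 charges no egress, so only ~1.7GB of storage is billed.
}
```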
Resume Support
Large downloads will be interrupted. Support resume:
// iOS: resume an interrupted download.
// resumeData comes from cancel(byProducingResumeData:) or from the failed
// task's error userInfo (NSURLSessionDownloadTaskResumeData), saved to disk.
let resumeData = try? Data(contentsOf: resumeDataURL)
if let resumeData = resumeData {
    downloadTask = session.downloadTask(withResumeData: resumeData)
} else {
    downloadTask = session.downloadTask(with: modelURL)
}
Pros and Cons of Post-Install Download
Pros:
- Small initial app download (fast install, no store size hesitation)
- Model updates without app store review (push new model to CDN)
- Can offer multiple model sizes (1B for everyone, 3B as upgrade)
- Users only download if they use the AI feature
Cons:
- First-use delay (1-5 minutes for download)
- Requires network for first use
- CDN infrastructure and cost
- More complex code (download, verify, resume, storage management)
The Recommendation
| Scenario | Approach |
|---|---|
| 1B model, AI is core to the app | Bundle (600MB is acceptable) |
| 3B model, AI is core to the app | Fast-follow / On-demand delivery |
| AI is an optional feature | Post-install download |
| Model updates frequently (monthly) | Post-install download |
| Model is stable (quarterly updates) | Bundle or fast-follow |
| Target market has slow internet | Bundle |
For most apps: post-install download with a clear download prompt. This keeps the initial app download small, lets you update models independently, and only downloads for users who actually use the AI feature.
Integrity Verification
Always verify the model file after download. A corrupted GGUF file will cause crashes during inference:
import CryptoKit

func verifyHash(_ fileURL: URL, expected: String) -> Bool {
    // Memory-map the file rather than loading a multi-GB model into RAM
    guard let data = try? Data(contentsOf: fileURL, options: .mappedIfSafe) else {
        return false
    }
    let hash = SHA256.hash(data: data)
    let hashString = hash.map { String(format: "%02x", $0) }.joined()
    return hashString == expected
}
Storage Management
GGUF models are large. Respect the user's storage:
- Show model size before download
- Allow deleting and re-downloading the model
- On iOS, exclude the model from iCloud backup (it can be re-downloaded)
- Handle low-storage scenarios gracefully
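On the Android side, the free-space check and the delete path can be sketched as follows (the 500MB headroom is an illustrative choice, not a platform requirement):

```kotlin
import java.io.File

// Refuse the download when it would leave the device nearly full.
fun hasSpaceFor(modelBytes: Long, dir: File): Boolean {
    val headroomBytes = 500L * 1024 * 1024 // keep ~500MB free after download
    return dir.usableSpace > modelBytes + headroomBytes
}

// Let the user reclaim the space; the model can always be re-downloaded.
fun deleteModel(modelFile: File): Boolean =
    !modelFile.exists() || modelFile.delete()
```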
The model itself is what makes the difference. A well-fine-tuned GGUF from a platform like Ertas, delivered via either bundling or download, provides domain-specific AI that runs locally, instantly, and at zero per-use cost.
Ship AI that runs on your users' devices.
Early bird pricing starts at $14.50/mo — locked in for life. Plans for builders and agencies.