    Shipping GGUF Models: App Store Bundling vs Post-Install Download
    GGUF · deployment · App Store · model delivery · mobile AI


    Two ways to get your GGUF model onto the user's device. Bundle it with the app for simplicity, or download post-install for flexibility. Architecture, size limits, and best practices for both.

    Ertas Team

    Your model is fine-tuned and exported to GGUF. Now you need to get it onto the user's device. There are two fundamental approaches: bundle it with the app binary, or download it after install.

    Each has trade-offs around app size, user experience, update flexibility, and platform constraints.

    Option 1: Bundle with the App

    Include the GGUF file in the app package. The user downloads the model when they download the app.

    iOS Bundling

    Direct inclusion: Add the GGUF file to your Xcode project. It ships inside the IPA. Access via Bundle.main:

    let modelPath = Bundle.main.path(forResource: "model", ofType: "gguf")!
    

    On Demand Resources (ODR): Tag the model as an on-demand resource. iOS downloads it when first needed, not at app install time. The initial app download stays small.

    let request = NSBundleResourceRequest(tags: ["ai-model"])
    request.beginAccessingResources { error in
        guard error == nil else { return }
        let modelPath = Bundle.main.path(forResource: "model", ofType: "gguf")!
        loadModel(at: modelPath)
    }
    

    ODR files are managed by iOS and may be purged under storage pressure. Your app must handle re-downloading.
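
    One way to handle this is to check whether the tagged resources are still resident before using them, and only trigger a download if they were purged. A sketch, assuming the "ai-model" tag from the snippet above and an app-defined loadModel(at:) function:

```swift
import Foundation

// Sketch: verify the ODR-tagged model is still on disk before using it;
// if iOS purged it under storage pressure, fall back to a fresh download.
// "ai-model" and loadModel(at:) match the earlier snippet.
func ensureModelAvailable() {
    let request = NSBundleResourceRequest(tags: ["ai-model"])
    request.conditionallyBeginAccessingResources { available in
        if available {
            loadModel(at: Bundle.main.path(forResource: "model", ofType: "gguf")!)
        } else {
            // Purged: re-download before loading
            request.beginAccessingResources { error in
                guard error == nil else { return }
                loadModel(at: Bundle.main.path(forResource: "model", ofType: "gguf")!)
            }
        }
    }
}
```

    Keep the NSBundleResourceRequest alive while the model is in use; releasing it tells iOS the resources are purgeable again.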

    Android Bundling

    APK assets: For models under 150MB, place the GGUF in assets/. Copy to internal storage on first launch for llama.cpp to access.
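
    The copy step can be sketched as follows, assuming the asset is named model.gguf:

```kotlin
import android.content.Context
import java.io.File

// Sketch: copy the bundled GGUF out of assets/ into app-private storage on
// first launch so llama.cpp can open it by plain file path.
// The asset name "model.gguf" is an assumption.
fun ensureModelOnDisk(context: Context): File {
    val target = File(context.filesDir, "model.gguf")
    if (!target.exists()) {
        context.assets.open("model.gguf").use { input ->
            target.outputStream().use { output -> input.copyTo(output) }
        }
    }
    return target
}
```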

    Play Asset Delivery: For larger models, use Google Play's asset delivery system:

    // Install-time delivery (downloaded with app)
    // build.gradle.kts
    assetPacks += ":model_pack"
    

    Play Asset Delivery supports three modes:

    • Install-time: Downloaded with the app. Simplest but increases initial download.
    • Fast-follow: Downloaded immediately after install, in background.
    • On-demand: Downloaded when the app requests it.
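
    For the on-demand mode, access goes through the Play Core AssetPackManager. A sketch (pack name matches the Gradle config above; listener wiring for progress is omitted):

```kotlin
import android.content.Context
import com.google.android.play.core.assetpacks.AssetPackManagerFactory

// Sketch: return the model path if the pack is already on device,
// otherwise start the download. Observe progress via registerListener().
fun requestModelPack(context: Context): String? {
    val manager = AssetPackManagerFactory.getInstance(context)
    val location = manager.getPackLocation("model_pack")
    if (location != null) {
        return location.assetsPath()?.let { "$it/model.gguf" }  // already downloaded
    }
    manager.fetch(listOf("model_pack"))  // kicks off the download
    return null
}
```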

    Size Limits

    | Platform | Limit | Notes |
    | --- | --- | --- |
    | iOS IPA | 4GB | Includes all resources |
    | iOS OTA download | 200MB | Cellular download limit (user can override) |
    | Android APK | 150MB | Without Play Asset Delivery |
    | Android AAB | 150MB base + 2GB assets | With Play Asset Delivery |
    | Play Asset Delivery pack | 512MB per pack | Multiple packs allowed |

    A 1B GGUF Q4 model (~600MB) fits within iOS's 4GB limit but exceeds the 200MB cellular OTA threshold. On Android, it requires Play Asset Delivery.

    A 3B GGUF Q4 model (~1.7GB) fits within both platforms' upper limits but will be a large download.

    Pros and Cons of Bundling

    Pros:

    • Model is immediately available on first launch (no download wait)
    • No CDN infrastructure needed
    • No network connectivity required for first use
    • Simpler architecture (no download/verification logic)

    Cons:

    • Increases app download size significantly
    • Model updates require a full app update through the store
    • App Store review for every model change
    • Users may hesitate to download a 600MB-1.7GB app
    • On iOS, the 200MB cellular limit means users may need WiFi to download

    Option 2: Post-Install Download

    The app installs without the model. On first launch (or when the user accesses the AI feature), the app downloads the model from your CDN.

    Download Flow

    [App installed] -> [User opens AI feature] -> [Model not found]
      -> [Show download prompt: "Download AI model (1.7GB)?"]
      -> [User taps Download] -> [Progress bar]
      -> [Download complete] -> [Verify hash] -> [Model ready]
    

    iOS Implementation

    import Foundation
    import Combine

    @MainActor
    class ModelDownloader: ObservableObject {
        @Published var progress: Double = 0
        @Published var isDownloading = false
        @Published var isReady = false

        private let modelURL = URL(string: "https://cdn.example.com/model.gguf")!
        private var modelPath: URL {
            FileManager.default
                .urls(for: .documentDirectory, in: .userDomainMask)[0]
                .appendingPathComponent("model.gguf")
        }

        func checkModelAvailable() -> Bool {
            FileManager.default.fileExists(atPath: modelPath.path)
        }

        func downloadModel() async throws {
            isDownloading = true
            defer { isDownloading = false }

            // ProgressDelegate is an app-defined URLSessionDownloadDelegate that
            // reports bytesWritten / totalBytesExpectedToWrite as a fraction
            let (tempURL, _) = try await URLSession.shared.download(
                from: modelURL,
                delegate: ProgressDelegate { progress in
                    Task { @MainActor in self.progress = progress }
                }
            )

            // moveItem throws if the destination exists, so clear any stale copy
            try? FileManager.default.removeItem(at: modelPath)
            try FileManager.default.moveItem(at: tempURL, to: modelPath)

            // Verify integrity before declaring the model usable
            guard verifyHash(modelPath, expected: expectedSHA256) else {
                try FileManager.default.removeItem(at: modelPath)
                throw ModelError.corruptedDownload
            }

            isReady = true
        }
    }
    

    Android Implementation

    import android.content.Context
    import java.io.File
    import java.io.IOException
    import kotlinx.coroutines.Dispatchers
    import kotlinx.coroutines.withContext
    import okhttp3.OkHttpClient
    import okhttp3.Request

    class ModelDownloader(private val context: Context) {
        private val modelFile = File(context.filesDir, "model.gguf")

        fun isModelAvailable(): Boolean = modelFile.exists()

        suspend fun downloadModel(
            onProgress: (Float) -> Unit
        ) = withContext(Dispatchers.IO) {
            val client = OkHttpClient()
            val request = Request.Builder().url(MODEL_CDN_URL).build()
            val response = client.newCall(request).execute()
            if (!response.isSuccessful) throw IOException("HTTP ${response.code}")

            val body = response.body ?: throw IOException("Empty response")
            val totalBytes = body.contentLength()
            var downloadedBytes = 0L

            modelFile.outputStream().use { output ->
                body.byteStream().use { input ->
                    val buffer = ByteArray(8192)
                    var read: Int
                    while (input.read(buffer).also { read = it } != -1) {
                        output.write(buffer, 0, read)
                        downloadedBytes += read
                        onProgress(downloadedBytes.toFloat() / totalBytes)
                    }
                }
            }

            // Verify integrity (sha256() is an app-defined extension returning a hex digest)
            val hash = modelFile.sha256()
            if (hash != EXPECTED_SHA256) {
                modelFile.delete()
                throw IOException("Corrupted download")
            }
        }
    }
    

    CDN Setup

    Host the GGUF file on a CDN for fast, reliable delivery:

    • AWS CloudFront + S3: Standard setup. ~$0.085/GB transfer.
    • Cloudflare R2: No egress fees for downloads. ~$0.015/GB storage only.
    • Firebase Hosting: Simple for small projects. 10GB free, then $0.15/GB.

    Cost example at 10,000 monthly downloads of a 1.7GB model:

    • CloudFront: ~$1,445/month
    • Cloudflare R2: ~$0.03/month (storage only, no egress fees)
    • Firebase: ~$2,550/month

    Cloudflare R2's zero-egress pricing makes it dramatically cheaper for model distribution.

    Resume Support

    Large downloads will be interrupted. Support resume:

    // iOS: Resume interrupted download
    let resumeData = try? Data(contentsOf: resumeDataURL)
    if let resumeData = resumeData {
        downloadTask = session.downloadTask(withResumeData: resumeData)
    } else {
        downloadTask = session.downloadTask(with: modelURL)
    }
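
    On cancellation (or when the app is about to be suspended), the task can hand back resume data to persist for the next launch. A sketch, using the same resumeDataURL as above:

```swift
// Sketch: persist resume data on cancellation so the next launch can
// pick up where the download left off (resumeDataURL as above)
downloadTask.cancel(byProducingResumeData: { resumeData in
    if let resumeData = resumeData {
        try? resumeData.write(to: resumeDataURL)
    }
})
```

    For downloads that should survive the app being suspended entirely, a background URLSessionConfiguration is the usual choice.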
    

    Pros and Cons of Post-Install Download

    Pros:

    • Small initial app download (fast install, no store size hesitation)
    • Model updates without app store review (push new model to CDN)
    • Can offer multiple model sizes (1B for everyone, 3B as upgrade)
    • Users only download if they use the AI feature

    Cons:

    • First-use delay (1-5 minutes for download)
    • Requires network for first use
    • CDN infrastructure and cost
    • More complex code (download, verify, resume, storage management)

    The Recommendation

    | Scenario | Approach |
    | --- | --- |
    | 1B model, AI is core to the app | Bundle (600MB is acceptable) |
    | 3B model, AI is core to the app | Fast-follow / on-demand delivery |
    | AI is an optional feature | Post-install download |
    | Model updates frequently (monthly) | Post-install download |
    | Model is stable (quarterly updates) | Bundle or fast-follow |
    | Target market has slow internet | Bundle |

    For most apps: post-install download with a clear download prompt. This keeps the initial app download small, lets you update models independently, and only downloads for users who actually use the AI feature.

    Integrity Verification

    Always verify the model file after download. A truncated or corrupted GGUF file can fail to load, or crash the app mid-inference:

    import CryptoKit

    func verifyHash(_ fileURL: URL, expected: String) -> Bool {
        // Hash in 1MB chunks rather than loading a multi-GB file into memory at once
        guard let handle = try? FileHandle(forReadingFrom: fileURL) else { return false }
        defer { try? handle.close() }
        var hasher = SHA256()
        while let chunk = try? handle.read(upToCount: 1 << 20), !chunk.isEmpty {
            hasher.update(data: chunk)
        }
        return hasher.finalize().map { String(format: "%02x", $0) }.joined() == expected.lowercased()
    }
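
    The expected digest is produced once, server-side, when publishing a new model build. One way, assuming a Unix shell with shasum available:

```shell
# Compute the reference SHA-256 at publish time; ship this value to the app
# (e.g. in a small version manifest next to the model on the CDN)
shasum -a 256 model.gguf | cut -d' ' -f1
```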
    

    Storage Management

    GGUF models are large. Respect the user's storage:

    • Show model size before download
    • Allow deleting and re-downloading the model
    • On iOS, exclude the model from iCloud backup (it can be re-downloaded)
    • Handle low-storage scenarios gracefully
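
    The iCloud exclusion is a one-line resource value set on the file URL; a sketch:

```swift
import Foundation

// Sketch: mark the downloaded model as excluded from iCloud backup,
// since it can always be re-fetched from the CDN
func excludeFromBackup(_ fileURL: URL) throws {
    var url = fileURL
    var values = URLResourceValues()
    values.isExcludedFromBackup = true
    try url.setResourceValues(values)
}
```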

    The model itself is what makes the difference. A well-fine-tuned GGUF from a platform like Ertas, delivered via either bundling or download, provides domain-specific AI that runs locally, instantly, and at zero per-use cost.

    Ship AI that runs on your users' devices.

    Early bird pricing starts at $14.50/mo — locked in for life. Plans for builders and agencies.
