Ship

    Get an Ertas-trained GGUF into an iOS, Android, desktop, or web app, including the model delivery, first-run UX, and performance habits that survive production.

    This section covers everything that happens after Export: how the GGUF reaches the user's device, how your app loads and queries it, and the performance habits that keep it responsive in production. The Ertas bundle is Ollama-ready out of the box, so a desktop user can run your model in three clicks. Mobile and web take more work, but the patterns are consistent across platforms.

    The shortest possible summary

    If you read nothing else in this section, read this:

    • Default to post-install download. Ertas GGUFs range from 0.5 to 9 GB, and bundling one in your app's install package wrecks install conversion on most stores. See Model delivery and UX.
    • Pick your integration path by app shape, not by model. Flutter via llamadart. React Native via llama.rn. Native iOS via llama.cpp's SwiftUI example or another maintained Swift wrapper. Native Android via llama.cpp's Android example and JNI. Desktop is easiest via Ollama (the Ertas bundle is Ollama-ready); for embedded inference, use node-llama-cpp (Electron) or llama-cpp-2 (Tauri). Browser via wllama (WebAssembly + WebGPU, GGUF-native) or WebLLM (WebGPU, conversion required).
    • Load once at startup, reuse the session, dispose on teardown. Loading a model costs 1 to 3 seconds and 0.5 to 9 GB of native RAM. Reusing the loaded model across calls is the biggest single performance win available.
    • Cache outputs aggressively in file storage (not SharedPreferences or UserDefaults). A ~10 ms cache hit beats a 1 to 3 second inference every time.
    • Verify before shipping. Run the smoke test on every release build. Templating mismatches and integrity failures are the two biggest sources of post-export surprises.
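    The load-once, reuse, dispose habit can be sketched as a small holder that memoizes the load promise. This is a minimal TypeScript sketch under stated assumptions: `LoadedModel`, `complete`, `release`, and `ModelSession` are hypothetical names standing in for whatever your binding (llama.rn, node-llama-cpp, and so on) actually exposes. Wrap the real calls in an adapter of this shape rather than expecting these methods to exist.

```typescript
// Hypothetical adapter shape: map your binding's real load/complete/free calls onto it.
interface LoadedModel {
  complete(prompt: string): Promise<string>;
  release(): Promise<void>;
}

type ModelLoader = (path: string) => Promise<LoadedModel>;

class ModelSession {
  private readonly path: string;
  private readonly loader: ModelLoader;
  private pending: Promise<LoadedModel> | null = null;

  constructor(path: string, loader: ModelLoader) {
    this.path = path;
    this.loader = loader;
  }

  // The first call starts the 1 to 3 second load; later and concurrent calls share
  // the same promise instead of loading a second copy into native RAM.
  get(): Promise<LoadedModel> {
    if (this.pending !== null) return this.pending;
    const attempt = this.loader(this.path);
    this.pending = attempt;
    attempt.catch(() => {
      // Forget a failed load so the next call can retry.
      if (this.pending === attempt) this.pending = null;
    });
    return attempt;
  }

  async complete(prompt: string): Promise<string> {
    const model = await this.get();
    return model.complete(prompt);
  }

  // Call from your app's teardown hook (window close, app termination) to free native memory.
  async dispose(): Promise<void> {
    const pending = this.pending;
    this.pending = null;
    if (pending === null) return;
    let model: LoadedModel;
    try {
      model = await pending;
    } catch {
      return; // the load failed, so there is nothing to release
    }
    await model.release();
  }
}
```

    Memoizing the promise rather than the loaded model means two screens that ask for the model at the same moment still trigger only one load.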
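    For the caching habit, a file-backed cache keyed by a hash of everything that affects the output is enough for most apps. The sketch below assumes a Node-style runtime (Electron, or a desktop sidecar process); on mobile the same layout maps onto the app's documents or caches directory through your platform's file API. `OutputCache` and its one-file-per-entry layout are illustrative, not part of any Ertas tooling. Only cache calls that are deterministic (for example temperature 0 or a fixed seed), otherwise a hit replays one sample out of many.

```typescript
import { createHash } from "node:crypto";
import { mkdirSync, readFileSync, renameSync, writeFileSync } from "node:fs";
import { join } from "node:path";

// One small file per cached output. The key hashes everything that changes the output,
// so bumping the model version or a sampling parameter naturally misses old entries.
class OutputCache {
  private readonly dir: string;

  constructor(dir: string) {
    this.dir = dir;
    mkdirSync(dir, { recursive: true });
  }

  // Pass params with a stable key order: JSON.stringify preserves insertion order.
  static key(modelId: string, prompt: string, params: Record<string, unknown>): string {
    return createHash("sha256")
      .update(JSON.stringify([modelId, prompt, params]))
      .digest("hex");
  }

  get(key: string): string | undefined {
    try {
      return readFileSync(join(this.dir, key + ".txt"), "utf8");
    } catch {
      return undefined; // treat a missing or unreadable entry as a miss
    }
  }

  set(key: string, value: string): void {
    const finalPath = join(this.dir, key + ".txt");
    const tmpPath = finalPath + ".tmp";
    writeFileSync(tmpPath, value, "utf8");
    // On POSIX filesystems the rename is atomic, so readers never see half a file.
    renameSync(tmpPath, finalPath);
  }

  // Hit: one small file read. Miss: run inference once, then store the result.
  async getOrCompute(key: string, compute: () => Promise<string>): Promise<string> {
    const hit = this.get(key);
    if (hit !== undefined) return hit;
    const value = await compute();
    this.set(key, value);
    return value;
  }
}
```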
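    Integrity failures are cheapest to catch right after the post-install download, before the file is ever handed to the loader. A minimal sketch, assuming you publish the expected SHA-256 of each GGUF alongside the release and download to a temporary path first. The `.sha256` sidecar file and the function names are hypothetical conventions, not something the Ertas bundle ships.

```typescript
import { createHash } from "node:crypto";
import { closeSync, existsSync, openSync, readFileSync, readSync, renameSync, writeFileSync } from "node:fs";

// Hash in 8 MiB chunks so a multi-GB GGUF never has to sit in memory at once.
function sha256OfFile(path: string, chunkSize: number = 8 * 1024 * 1024): string {
  const hash = createHash("sha256");
  const buf = Buffer.alloc(chunkSize);
  const fd = openSync(path, "r");
  try {
    let n = readSync(fd, buf, 0, chunkSize, null);
    while (n > 0) {
      hash.update(buf.subarray(0, n));
      n = readSync(fd, buf, 0, chunkSize, null);
    }
  } finally {
    closeSync(fd);
  }
  return hash.digest("hex");
}

// Verify the finished download, then move it into place. The hypothetical sidecar records
// the verified digest so later cold starts can skip re-hashing several GB.
function promoteVerifiedDownload(tmpPath: string, finalPath: string, expectedSha256: string): boolean {
  const expected = expectedSha256.trim().toLowerCase();
  if (sha256OfFile(tmpPath) !== expected) {
    return false; // corrupted or truncated: caller should discard tmpPath and retry
  }
  renameSync(tmpPath, finalPath);
  writeFileSync(finalPath + ".sha256", expected, "utf8");
  return true;
}

// Cheap startup check: the file exists and was verified against this exact digest.
function isInstalled(finalPath: string, expectedSha256: string): boolean {
  if (!existsSync(finalPath) || !existsSync(finalPath + ".sha256")) return false;
  return readFileSync(finalPath + ".sha256", "utf8").trim() === expectedSha256.trim().toLowerCase();
}
```

    The sidecar trades a little safety for startup time: it will not notice a file modified after verification, so re-hash in the background if that matters for your app.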

    Start with Model delivery and UX if you have not picked a delivery path yet, or jump to your target platform's page if you already know what you are shipping into.