
    AI in iOS Apps: CoreML, Cloud APIs, and On-Device LLMs Compared

    Three paths to AI in your iOS app. CoreML for Apple's ecosystem, cloud APIs for capability, and on-device LLMs via llama.cpp for cost and privacy. A practical comparison for Swift developers.

    Ertas Team

    As an iOS developer, you have three distinct paths to adding AI features to your app. Each uses different technology, has different cost characteristics, and is suited to different tasks. Choosing the wrong path wastes either money or time.

    This guide compares the three approaches from a Swift developer's perspective: what each can do, what it costs, and when to use it.

    Path 1: CoreML

    Apple's native machine learning framework. CoreML runs models directly on the device using Apple's Neural Engine, GPU, and CPU. It is deeply integrated into the Apple ecosystem and optimized for Apple silicon.

    What CoreML Can Do

    CoreML excels at vision and traditional NLP tasks that Apple has specifically optimized:

    • Image classification and object detection via Vision framework
    • Text classification and sentiment analysis via Natural Language framework
    • Sound classification via SoundAnalysis framework
    • Hand pose, body pose, and face detection
    • On-device translation (limited language pairs)
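
    Several of these take only a few lines. Here is a minimal sketch of on-device sentiment analysis with the Natural Language framework (the score comes back as a string between -1.0 and 1.0):

    import NaturalLanguage
    
    // On-device sentiment scoring: no network, no API key
    let text = "The new update is fantastic"
    let tagger = NLTagger(tagSchemes: [.sentimentScore])
    tagger.string = text
    let (sentiment, _) = tagger.tag(at: text.startIndex,
                                    unit: .paragraph,
                                    scheme: .sentimentScore)
    print(sentiment?.rawValue ?? "0")  // e.g. "0.8"; negative values mean negative sentiment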

    Apple provides pre-trained models through Create ML and the Apple Developer documentation. You can also convert models from PyTorch or TensorFlow using coremltools.
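
    The Create ML route is similarly compact. A sketch of training a text classifier on macOS, assuming a reviews.csv with text and label columns (the file and column names are placeholders):

    import CreateML
    import Foundation
    
    // Train a text classifier in a macOS playground or command-line tool,
    // then add the resulting .mlmodel to your iOS app target
    let data = try MLDataTable(contentsOf: URL(fileURLWithPath: "reviews.csv"))
    let classifier = try MLTextClassifier(trainingData: data,
                                          textColumn: "text",
                                          labelColumn: "label")
    try classifier.write(to: URL(fileURLWithPath: "ReviewClassifier.mlmodel"))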

    What CoreML Cannot Do

    CoreML does not support running large language models for text generation, chat, or complex reasoning. There is no native support for running a GPT-style model through CoreML in a way that produces conversational responses. Apple's on-device language features are limited to specific, narrow tasks.

    Integration Pattern

    import CoreML
    import Vision
    
    // Image classification example: wrap a Core ML model for use with Vision
    let model = try VNCoreMLModel(for: MobileNetV2(configuration: MLModelConfiguration()).model)
    let request = VNCoreMLRequest(model: model) { request, error in
        guard let results = request.results as? [VNClassificationObservation],
              let topResult = results.first else { return }
        print("\(topResult.identifier): \(topResult.confidence)")
    }
    let handler = VNImageRequestHandler(cgImage: image)
    try handler.perform([request])
    

    Cost

    Zero. CoreML inference runs locally on the device with no API calls and no per-request charges.

    Best For

    Vision tasks (photo categorization, barcode scanning, face detection), text classification, sound analysis. Tasks where Apple provides optimized models or where you can train a custom classifier with Create ML.

    Path 2: Cloud APIs

    Call an external API (OpenAI, Anthropic, Google) from your iOS app. The model runs on the provider's servers. Your app sends the request and receives the response.

    What Cloud APIs Can Do

    Everything. Frontier models like GPT-4o, Claude 3.5 Sonnet, and Gemini can handle complex reasoning, creative generation, multi-turn conversation, code generation, and tasks that require broad world knowledge.

    Integration Pattern

    func chat(_ message: String) async throws -> String {
        var request = URLRequest(url: URL(string: "https://api.openai.com/v1/chat/completions")!)
        request.httpMethod = "POST"
        request.setValue("Bearer \(apiKey)", forHTTPHeaderField: "Authorization")
        request.setValue("application/json", forHTTPHeaderField: "Content-Type")
        let body: [String: Any] = [
            "model": "gpt-4o-mini",
            "messages": [["role": "user", "content": message]]
        ]
        request.httpBody = try JSONSerialization.data(withJSONObject: body)
        let (data, _) = try await URLSession.shared.data(for: request)
        // Pull the assistant's reply out of the first choice
        guard let json = try JSONSerialization.jsonObject(with: data) as? [String: Any],
              let choices = json["choices"] as? [[String: Any]],
              let messageDict = choices.first?["message"] as? [String: Any],
              let content = messageDict["content"] as? String else {
            throw URLError(.cannotParseResponse)
        }
        return content
    }
    

    Cost

    Per-token pricing. GPT-4o-mini costs $0.15/$0.60 per million input/output tokens. At 10K MAU with 3 daily interactions, expect $300-$1,000+/month depending on your system prompt and conversation history.
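
    The back-of-envelope math behind that range, assuming roughly 2,000 input tokens per request (system prompt plus conversation history) and 200 output tokens:

    // Rough monthly cost estimate; every number here is an assumption
    let requests     = 10_000 * 3 * 30            // 10K MAU × 3 interactions/day × 30 days
    let inputTokens  = Double(requests) * 2_000   // system prompt + history + message
    let outputTokens = Double(requests) * 200
    let monthly = inputTokens / 1e6 * 0.15 + outputTokens / 1e6 * 0.60
    print(monthly)  // ~378.0, before retries or longer conversations; grows linearly with users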

    Best For

    Prototyping and validation. Tasks requiring frontier reasoning on novel inputs. Very low-volume features. Features needing access to current world knowledge.

    Drawbacks for iOS

    Network dependency (fails offline, in the subway, in airplane mode). Latency (500ms-3s per response). Privacy (user data is sent to third-party servers and must be disclosed in App Store privacy labels). Cost that scales with every user. And API keys: a key shipped in the app binary can be extracted, so production apps route requests through their own backend proxy.

    Path 3: On-Device LLMs via llama.cpp

    Run a full language model locally on the iPhone using llama.cpp. This gives you GPT-style capabilities (chat, generation, classification, summarization) entirely on-device.

    What On-Device LLMs Can Do

    Any text-in, text-out task that a small language model can handle: conversational AI, content drafting, classification, summarization, translation, structured data extraction, and function/tool calling. Fine-tuned on your domain data, a 3B model can reach roughly 94% accuracy on domain-specific tasks.

    How It Works on iOS

    llama.cpp is a C/C++ library that runs GGUF model files. On iOS, it uses Metal to run inference on the GPU of Apple silicon. The library exposes a C API that Swift can call directly, and community Swift wrappers build on top of it.

    // Conceptual pattern using llama.cpp Swift bindings
    // (type and method names vary by wrapper; treat this as pseudocode)
    let model = try LlamaModel(path: modelPath, params: .default)
    let context = try model.createContext(contextLength: 2048)
    
    // Streaming inference: append each token to the UI as it arrives
    for await token in context.generate(prompt: userMessage) {
        await MainActor.run { responseText += token }
    }
    

    Performance on Apple Silicon

    iPhone          Chip      RAM     1B Model (tok/s)   3B Model (tok/s)
    iPhone 12       A14       4GB     20-30              Not recommended
    iPhone 13       A15       4-6GB   30-40              12-18
    iPhone 14       A15/A16   6GB     30-40              15-22
    iPhone 15       A16/A17   6-8GB   35-50              20-30
    iPhone 16 Pro   A18 Pro   8GB     45-60              25-35

    Anything above 10 tokens per second is usable for chat. Above 20 feels responsive. Modern iPhones (A15 and later) comfortably run 1-3B models.
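
    One practical consequence of the table above: gate the model you ship on device RAM rather than sending one size to everyone. A sketch, with a rough 6GB cutoff as the assumption:

    import Foundation
    
    // Choose a model size at runtime from physical RAM (filenames are placeholders)
    let ramGB = Double(ProcessInfo.processInfo.physicalMemory) / 1_073_741_824
    let modelName = ramGB >= 6 ? "model-3b-q4_k_m.gguf" : "model-1b-q4_k_m.gguf"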

    Cost

    One-time fine-tuning cost ($5-50). Model distribution via CDN (~$0.08/GB). Then zero per-inference cost. Permanently.
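
    In practice, distribution means downloading the GGUF on first launch instead of bloating the app bundle. A minimal sketch, assuming iOS 16+ and a placeholder CDN URL:

    import Foundation
    
    // Download the model once, then reuse it from Application Support
    func ensureModel() async throws -> URL {
        let dest = URL.applicationSupportDirectory.appending(path: "model-q4_k_m.gguf")
        if FileManager.default.fileExists(atPath: dest.path()) { return dest }
        try FileManager.default.createDirectory(at: URL.applicationSupportDirectory,
                                                withIntermediateDirectories: true)
        let remote = URL(string: "https://cdn.example.com/model-q4_k_m.gguf")!  // placeholder
        let (tmp, _) = try await URLSession.shared.download(from: remote)
        try FileManager.default.moveItem(at: tmp, to: dest)
        return dest
    }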

    Best For

    High-volume AI features (chat, search, classification). Privacy-sensitive data. Offline-required features. Domain-specific tasks. Any app where AI costs need to stay flat as users grow.

    The Comparison

    Factor                   CoreML                Cloud API             On-Device LLM
    Text generation / chat   No                    Yes                   Yes
    Image classification     Yes (optimized)       Yes                   No (text only)
    Offline support          Yes                   No                    Yes
    Cost per inference       $0                    $0.0001-$0.01         $0
    Setup complexity         Low                   Low                   Medium
    Latency                  Instant               500ms-3,000ms         50-200ms first token
    Privacy                  On-device             Third-party servers   On-device
    Model flexibility        Apple models only     Any provider model    Any GGUF model
    Fine-tuning              Create ML (limited)   Some providers        Full LoRA/QLoRA

    The Practical Decision

    Use CoreML when you need image classification, object detection, text classification, or sound analysis. Apple's optimized models are hard to beat for these specific tasks on iOS hardware.

    Use a cloud API when you are prototyping, when task volume is very low, or when you genuinely need frontier-model reasoning that a 3B model cannot match.

    Use on-device LLMs when you need text generation, chat, summarization, translation, or any high-volume text task. The cost, latency, privacy, and offline advantages are significant. A fine-tuned model on your domain data will outperform generic cloud API prompting for your specific use case.

    Many apps combine approaches. CoreML for the camera features. On-device LLM for the chat assistant. Cloud API as a fallback for the occasional complex query. This hybrid approach gives you the best of each technology.
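
    In code, that hybrid split can be a single routing function (the names here are illustrative, not a real API):

    // Route each task to the cheapest backend that can handle it
    enum AITask { case imageLabeling, chat, complexQuery }
    enum Backend { case coreML, onDeviceLLM, cloudAPI }
    
    func route(_ task: AITask, online: Bool) -> Backend {
        switch task {
        case .imageLabeling: return .coreML       // Vision-optimized, free
        case .chat:          return .onDeviceLLM  // free, private, works offline
        case .complexQuery:  return online ? .cloudAPI : .onDeviceLLM
        }
    }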

    For the fine-tuning step, tools like Ertas provide a visual pipeline that takes you from training data to a GGUF file ready for iOS deployment. No ML expertise required. The model runs on-device via llama.cpp with Metal acceleration, giving you production-grade inference performance on any modern iPhone.

    Ship AI that runs on your users' devices.

    Early bird pricing starts at $14.50/mo — locked in for life. Plans for builders and agencies.
