
AI in iOS Apps: CoreML, Cloud APIs, and On-Device LLMs Compared
Three paths to AI in your iOS app. CoreML for Apple's ecosystem, cloud APIs for capability, and on-device LLMs via llama.cpp for cost and privacy. A practical comparison for Swift developers.
As an iOS developer, you have three distinct paths to adding AI features to your app. Each uses different technology, has different cost characteristics, and is suited to different tasks. Choosing the wrong path wastes either money or time.
This guide compares the three approaches from a Swift developer's perspective: what each can do, what it costs, and when to use it.
Path 1: CoreML
Apple's native machine learning framework. CoreML runs models directly on the device using Apple's Neural Engine, GPU, and CPU. It is deeply integrated into the Apple ecosystem and optimized for Apple silicon.
What CoreML Can Do
CoreML excels at vision and traditional NLP tasks that Apple has specifically optimized:
- Image classification and object detection via Vision framework
- Text classification and sentiment analysis via Natural Language framework
- Sound classification via SoundAnalysis framework
- Hand pose, body pose, and face detection
- On-device translation (limited language pairs)
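Several of these tasks need only a few lines of Swift. As one example, on-device sentiment scoring via the Natural Language framework (the `sentimentScore` tag scheme requires iOS 13+; the sample text is illustrative):

```swift
import NaturalLanguage

// On-device sentiment analysis; no network, no API key.
let text = "The new update is fantastic and much faster."
let tagger = NLTagger(tagSchemes: [.sentimentScore])
tagger.string = text

// The tag's raw value is a score string in the range -1.0...1.0,
// where positive values indicate positive sentiment.
let (sentiment, _) = tagger.tag(at: text.startIndex,
                                unit: .paragraph,
                                scheme: .sentimentScore)
let score = Double(sentiment?.rawValue ?? "0") ?? 0
print("Sentiment score: \(score)")
```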
Apple provides pre-trained models through Create ML and the Apple Developer documentation. You can also convert models from PyTorch or TensorFlow using coremltools.
What CoreML Cannot Do
CoreML does not support running large language models for text generation, chat, or complex reasoning. While coremltools can technically convert some transformer architectures, there is no practical, supported path to running a GPT-style model through CoreML that produces conversational responses. Apple's on-device language features are limited to specific, narrow tasks.
Integration Pattern
```swift
import CoreML
import Vision

// Image classification example: wrap a CoreML model for use with Vision
let model = try VNCoreMLModel(for: MobileNetV2().model)
let request = VNCoreMLRequest(model: model) { request, error in
    guard let results = request.results as? [VNClassificationObservation],
          let topResult = results.first else { return }
    print("\(topResult.identifier): \(topResult.confidence)")
}

// Run the request against a CGImage
let handler = VNImageRequestHandler(cgImage: image)
try handler.perform([request])
```
Cost
Zero. CoreML inference runs locally on the device with no API calls and no per-request charges.
Best For
Vision tasks (photo categorization, barcode scanning, face detection), text classification, sound analysis. Tasks where Apple provides optimized models or where you can train a custom classifier with Create ML.
Path 2: Cloud APIs
Call an external API (OpenAI, Anthropic, Google) from your iOS app. The model runs on the provider's servers. Your app sends the request and receives the response.
What Cloud APIs Can Do
Everything. Frontier models like GPT-4o, Claude 3.5 Sonnet, and Gemini can handle complex reasoning, creative generation, multi-turn conversation, code generation, and tasks that require broad world knowledge.
Integration Pattern
```swift
func chat(_ message: String) async throws -> String {
    var request = URLRequest(url: URL(string: "https://api.openai.com/v1/chat/completions")!)
    request.httpMethod = "POST"
    request.setValue("Bearer \(apiKey)", forHTTPHeaderField: "Authorization")
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")

    let body: [String: Any] = [
        "model": "gpt-4o-mini",
        "messages": [["role": "user", "content": message]]
    ]
    request.httpBody = try JSONSerialization.data(withJSONObject: body)

    let (data, _) = try await URLSession.shared.data(for: request)

    // Extract the assistant's reply from choices[0].message.content
    let json = try JSONSerialization.jsonObject(with: data) as? [String: Any]
    let choices = json?["choices"] as? [[String: Any]]
    let reply = choices?.first?["message"] as? [String: Any]
    return reply?["content"] as? String ?? ""
}
```
Cost
Per-token pricing. GPT-4o-mini costs $0.15/$0.60 per million input/output tokens. At 10K MAU with 3 daily interactions, expect $300-$1,000+/month depending on your system prompt and conversation history.
Best For
Prototyping and validation. Tasks requiring frontier reasoning on novel inputs. Very low volume features. Features needing access to current world knowledge.
Drawbacks for iOS
Network dependency (fails offline, subway, airplane mode). Latency (500ms-3s for each response). Privacy (user data sent to third-party servers, must be disclosed in App Store privacy labels). Cost scales with every user.
Path 3: On-Device LLMs via llama.cpp
Run a full language model locally on the iPhone using llama.cpp. This gives you GPT-style capabilities (chat, generation, classification, summarization) entirely on-device.
What On-Device LLMs Can Do
Any text-in, text-out task that a small language model can handle: conversational AI, content drafting, classification, summarization, translation, structured data extraction, and function/tool calling. Fine-tuned on your domain data, a 3B model can approach frontier-model accuracy on narrow, domain-specific tasks.
How It Works on iOS
llama.cpp is a C/C++ library that runs GGUF model files. On iOS, it uses Apple's Metal API for GPU acceleration on Apple silicon. The library provides Swift-compatible interfaces through its C API or community Swift wrappers.
```swift
// Conceptual pattern using llama.cpp Swift bindings
// (type and method names are illustrative, not a specific published API)
let model = try LlamaModel(path: modelPath, params: .default)
let context = try model.createContext(contextLength: 2048)

// Streaming inference: append tokens to the UI as they arrive
for await token in context.generate(prompt: userMessage) {
    await MainActor.run { responseText += token }
}
```
Performance on Apple Silicon
| iPhone | Chip | RAM | 1B Model (tok/s) | 3B Model (tok/s) |
|---|---|---|---|---|
| iPhone 12 | A14 | 4GB | 20-30 | Not recommended |
| iPhone 13 | A15 | 4-6GB | 30-40 | 12-18 |
| iPhone 14 | A15/A16 | 6GB | 30-40 | 15-22 |
| iPhone 15 | A16/A17 | 6-8GB | 35-50 | 20-30 |
| iPhone 16 Pro | A18 Pro | 8GB | 45-60 | 25-35 |
Anything above 10 tokens per second is usable for chat. Above 20 feels responsive. Modern iPhones (A15 and later) comfortably run 1-3B models.
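You can sanity-check these numbers in your own app by timing a fixed generation. This sketch reuses the illustrative binding names from the conceptual snippet above (not a real published API):

```swift
import Foundation

// Rough tokens-per-second measurement for one generation pass.
// `context.generate` is the hypothetical streaming API shown earlier.
let start = Date()
var tokenCount = 0
for await _ in context.generate(prompt: "Summarize this paragraph: ...") {
    tokenCount += 1
}
let elapsed = Date().timeIntervalSince(start)
print(String(format: "%.1f tok/s", Double(tokenCount) / elapsed))
```

Measure on your oldest supported device, not just the simulator: Metal acceleration behaves very differently on real hardware.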
Cost
One-time fine-tuning cost ($5-50). Model distribution via CDN (~$0.08/GB). Then zero per-inference cost. Permanently.
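The CDN line item can be held to one download per install by caching the model file locally. A minimal sketch using `URLSession` (iOS 15+; the CDN URL and filename are placeholders, not a real endpoint):

```swift
import Foundation

// Download a GGUF model once and cache it in Application Support.
func ensureModel() async throws -> URL {
    let support = try FileManager.default.url(for: .applicationSupportDirectory,
                                              in: .userDomainMask,
                                              appropriateFor: nil,
                                              create: true)
    let destination = support.appendingPathComponent("model-q4_k_m.gguf")

    // Already downloaded: zero marginal cost from here on.
    if FileManager.default.fileExists(atPath: destination.path) {
        return destination
    }

    // Placeholder CDN URL; substitute your own distribution endpoint.
    let remote = URL(string: "https://cdn.example.com/models/model-q4_k_m.gguf")!
    let (temp, _) = try await URLSession.shared.download(from: remote)
    try FileManager.default.moveItem(at: temp, to: destination)
    return destination
}
```

Multi-gigabyte downloads are best deferred to Wi-Fi and surfaced to the user with progress UI; a background `URLSession` configuration handles interruptions more gracefully than the shared session shown here.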
Best For
High-volume AI features (chat, search, classification). Privacy-sensitive data. Offline-required features. Domain-specific tasks. Any app where AI costs need to stay flat as users grow.
The Comparison
| Factor | CoreML | Cloud API | On-Device LLM |
|---|---|---|---|
| Text generation / chat | No | Yes | Yes |
| Image classification | Yes (optimized) | Yes | No (text only) |
| Offline support | Yes | No | Yes |
| Cost per inference | $0 | $0.0001-$0.01 | $0 |
| Setup complexity | Low | Low | Medium |
| Latency | Milliseconds (on-device) | 500ms-3,000ms | 50-200ms first token |
| Privacy | On-device | Third-party servers | On-device |
| Model flexibility | Apple or converted models | Any provider model | Any GGUF model |
| Fine-tuning | Create ML (limited) | Some providers | Full LoRA/QLoRA |
The Practical Decision
Use CoreML when you need image classification, object detection, text classification, or sound analysis. Apple's optimized models are hard to beat for these specific tasks on iOS hardware.
Use a cloud API when you are prototyping, when task volume is very low, or when you genuinely need frontier-model reasoning that a 3B model cannot match.
Use on-device LLMs when you need text generation, chat, summarization, translation, or any high-volume text task. The cost, latency, privacy, and offline advantages are significant. A fine-tuned model on your domain data will outperform generic cloud API prompting for your specific use case.
Many apps combine approaches. CoreML for the camera features. On-device LLM for the chat assistant. Cloud API as a fallback for the occasional complex query. This hybrid approach gives you the best of each technology.
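One way to wire up that hybrid is a small routing function that prefers the on-device model and falls back to the cloud only when it helps. The thresholds below are illustrative, not tuned values:

```swift
// Hypothetical backend router for a hybrid setup.
enum AIBackend {
    case onDevice  // llama.cpp model: free, private, offline-capable
    case cloud     // frontier API: stronger reasoning, needs network
}

func pickBackend(prompt: String, isOnline: Bool) -> AIBackend {
    // Heuristic: long or multi-document prompts may exceed what a
    // small on-device model handles well; send those to the cloud
    // when the network allows. Everything else stays local.
    let looksComplex = prompt.count > 2_000
    if looksComplex && isOnline { return .cloud }
    return .onDevice
}
```

Pair this with `NWPathMonitor` (Network framework) to keep `isOnline` current, and the app degrades gracefully: offline users simply always get the local model.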
For the fine-tuning step, tools like Ertas provide a visual pipeline that takes you from training data to a GGUF file ready for iOS deployment. No ML expertise required. The model runs on-device via llama.cpp with Metal acceleration, giving you production-grade inference performance on any modern iPhone.
Ship AI that runs on your users' devices.
Early bird pricing starts at $14.50/mo — locked in for life. Plans for builders and agencies.