
AI in Flutter Apps: Cloud APIs, TFLite, and On-Device LLMs
Three paths to AI in Flutter. Cloud APIs via the http package, TensorFlow Lite for classical ML tasks, and on-device LLMs via llama.cpp for text generation. A practical comparison for Dart developers.
Flutter developers building AI features have three distinct paths. Cloud APIs give you access to frontier models through HTTP calls. TensorFlow Lite handles classical ML tasks on-device. And llama.cpp brings full LLM text generation to the device via platform channels or dart:ffi.
Each serves a different purpose. This guide compares them from a Dart developer's perspective.
Path 1: Cloud APIs
Flutter's http or dio packages make cloud API integration straightforward. The pattern works identically on iOS, Android, web, and desktop.
Basic Integration
import 'dart:convert';
import 'package:http/http.dart' as http;

// Assumes `apiKey` is defined elsewhere; in production, route requests
// through your own backend rather than shipping the key in the app.
Future<String> generateResponse(String prompt) async {
  final response = await http.post(
    Uri.parse('https://api.openai.com/v1/chat/completions'),
    headers: {
      'Content-Type': 'application/json',
      'Authorization': 'Bearer $apiKey',
    },
    body: jsonEncode({
      'model': 'gpt-4o-mini',
      'messages': [
        {'role': 'user', 'content': prompt},
      ],
    }),
  );
  if (response.statusCode != 200) {
    throw Exception('API error ${response.statusCode}: ${response.body}');
  }
  final data = jsonDecode(response.body);
  return data['choices'][0]['message']['content'];
}
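In a widget, the call slots into the usual async UI pattern. A minimal sketch of a send-button handler; the _loading, _messages, _error, and _controller state fields are illustrative, not part of the API above:

// Hypothetical handler inside a StatefulWidget's State class.
Future<void> _onSendPressed() async {
  setState(() => _loading = true);
  try {
    final reply = await generateResponse(_controller.text);
    setState(() => _messages.add(reply));
  } catch (e) {
    // Surface network and API failures instead of failing silently.
    setState(() => _error = e.toString());
  } finally {
    setState(() => _loading = false);
  }
}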
Streaming with SSE
For chat interfaces that display tokens as they arrive:
import 'dart:convert';
import 'package:flutter_client_sse/flutter_client_sse.dart';

void streamResponse(String prompt, void Function(String) onToken) {
  SSEClient.subscribeToSSE(
    // POST-with-body support requires flutter_client_sse 2.x.
    method: SSERequestType.POST,
    url: 'https://api.openai.com/v1/chat/completions',
    header: {
      'Content-Type': 'application/json',
      'Authorization': 'Bearer $apiKey',
    },
    body: {
      'model': 'gpt-4o-mini',
      'messages': [
        {'role': 'user', 'content': prompt},
      ],
      'stream': true,
    },
  ).listen((event) {
    final data = event.data?.trim();
    // OpenAI terminates the stream with a literal [DONE] sentinel.
    if (data == null || data.isEmpty || data == '[DONE]') return;
    final parsed = jsonDecode(data);
    final token = parsed['choices'][0]['delta']['content'];
    if (token != null) onToken(token);
  });
}
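Wiring the stream into a chat UI is then a matter of accumulating tokens into state. A sketch, assuming a _currentReply field on the surrounding State class:

final buffer = StringBuffer();
streamResponse('Summarize this article for me', (token) {
  buffer.write(token);
  // Rebuild on every token so the reply appears to type itself out.
  setState(() => _currentReply = buffer.toString());
});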
When to Use Cloud APIs
Cloud APIs are the right choice for prototyping, feature validation, and very low-volume apps. They require no native code, work across all Flutter platforms, and give you access to frontier models.
The trade-offs are the usual ones for cloud inference: per-token costs that scale with usage, a hard network dependency, 500-3,000 ms of round-trip latency, and data leaving the device on every request.
Path 2: TensorFlow Lite
The tflite_flutter plugin provides TFLite support for Flutter. TFLite runs optimized ML models on-device for specific tasks.
What TFLite Does Well
- Image classification and object detection
- Text classification and sentiment analysis
- On-device translation (pre-built models)
- Pose estimation
- Audio classification
Integration Pattern
import 'package:tflite_flutter/tflite_flutter.dart';

class TextClassifier {
  // Output width of the model; 2 is an assumption (e.g. positive/negative).
  static const numClasses = 2;

  late Interpreter _interpreter;

  Future<void> loadModel() async {
    // Asset path as declared in pubspec.yaml.
    _interpreter = await Interpreter.fromAsset('model.tflite');
  }

  List<double> classify(List<int> tokenizedInput) {
    final output = List.filled(1 * numClasses, 0.0).reshape([1, numClasses]);
    _interpreter.run([tokenizedInput], output);
    return List<double>.from(output[0]);
  }
}
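Using it end to end looks like this; tokenize is a hypothetical helper that must reproduce the exact vocabulary, sequence length, and padding the model was trained with:

import 'dart:math';

Future<void> runSentiment() async {
  final classifier = TextClassifier();
  await classifier.loadModel();

  final scores = classifier.classify(tokenize('great app, works offline'));
  // Pick the highest-scoring class index.
  final predicted = scores.indexOf(scores.reduce(max));
  print('class $predicted (score: ${scores[predicted]})');
}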
What TFLite Cannot Do
TFLite does not support large language models for open-ended text generation. There is no TFLite equivalent of ChatGPT or Claude. You cannot use TFLite for conversational AI, content drafting, summarization, or any task that requires generating natural language responses.
For those tasks, you need either a cloud API or an on-device LLM.
Cost
Free. TFLite runs entirely on-device. The models are small (typically 1-50MB) and bundled with the app.
Path 3: On-Device LLMs via llama.cpp
Run a full language model on the user's device. llama.cpp handles inference. GGUF models provide the intelligence. Flutter communicates through platform channels (method channels or FFI).
Integration Approaches
Platform channels: Write a thin native wrapper in Swift (iOS) and Kotlin (Android) that calls llama.cpp, then communicate from Dart via MethodChannel.
// Dart side
import 'package:flutter/services.dart';

class OnDeviceLlm {
  static const _channel = MethodChannel('com.app/llm');
  static const _eventChannel = EventChannel('com.app/llm_stream');

  Future<void> loadModel(String path) async {
    await _channel.invokeMethod('loadModel', {'path': path});
  }

  Future<String> generate(String prompt) async {
    return await _channel.invokeMethod('generate', {'prompt': prompt});
  }

  Stream<String> generateStream(String prompt) {
    // The native side should buffer tokens or start emitting only once
    // the stream is listened to; events sent before a listener attaches
    // are dropped.
    _channel.invokeMethod('startGeneration', {'prompt': prompt});
    return _eventChannel.receiveBroadcastStream().map((e) => e as String);
  }
}
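From a widget, streaming into the UI mirrors the cloud SSE pattern. A sketch, assuming the native wrapper emits one decoded token per event and a _reply state field exists:

Future<void> runLocalGeneration(String modelPath) async {
  final llm = OnDeviceLlm();
  await llm.loadModel(modelPath);

  final buffer = StringBuffer();
  await for (final token in llm.generateStream('Draft a welcome message')) {
    buffer.write(token);
    setState(() => _reply = buffer.toString());
  }
}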
Dart FFI: Use dart:ffi to call llama.cpp's C API directly. This avoids the platform channel overhead but requires more setup:
import 'dart:ffi';
import 'package:ffi/ffi.dart'; // for Utf8 and the allocators

// Bind to the llama.cpp shared library.
final llamaLib = DynamicLibrary.open('libllama.so'); // Android
// Use DynamicLibrary.process() on iOS (statically linked).

// Simplified signature: llama.cpp's real loader also takes a params struct.
typedef LlamaInitNative = Pointer<Void> Function(Pointer<Utf8>);
typedef LlamaInit = Pointer<Void> Function(Pointer<Utf8>);

final llamaInit = llamaLib
    .lookupFunction<LlamaInitNative, LlamaInit>('llama_load_model');
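Calling the bound function means marshalling the Dart string to native UTF-8 and freeing it afterward. A minimal sketch using package:ffi's default malloc allocator:

Pointer<Void> loadModelNative(String path) {
  final pathPtr = path.toNativeUtf8(); // heap-allocated; must be freed
  try {
    return llamaInit(pathPtr);
  } finally {
    malloc.free(pathPtr);
  }
}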
Model Delivery in Flutter
Bundled: Place the GGUF file in the platform-specific asset directories. For Android, use asset delivery for large files. For iOS, add to the Xcode project.
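However it is bundled, llama.cpp ultimately needs a real filesystem path, so the asset has to be copied out of the bundle once on first launch. A sketch using Flutter assets; note that rootBundle.load reads the whole file into memory, which suits smaller models, while multi-GB files are better served by native asset APIs or a download:

import 'dart:io';
import 'package:flutter/services.dart';
import 'package:path_provider/path_provider.dart';

Future<String> materializeModel() async {
  final dir = await getApplicationDocumentsDirectory();
  final file = File('${dir.path}/model.gguf');
  if (!await file.exists()) {
    final data = await rootBundle.load('assets/model.gguf');
    await file.writeAsBytes(data.buffer.asUint8List(), flush: true);
  }
  return file.path;
}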
Downloaded: Use dio or http for background downloads with progress:
import 'package:dio/dio.dart';
import 'package:path_provider/path_provider.dart';

// `modelCdnUrl` points at wherever you host the GGUF file.
Future<void> downloadModel() async {
  final dir = await getApplicationDocumentsDirectory();
  final modelPath = '${dir.path}/model.gguf';
  await Dio().download(
    modelCdnUrl,
    modelPath,
    onReceiveProgress: (received, total) {
      // `total` is -1 when the server omits Content-Length.
      final progress = total > 0 ? received / total : null;
      // Update UI with download progress.
    },
  );
}
Performance
On-device inference speed matches native apps, since llama.cpp itself runs as native code rather than in the Dart VM; the platform channel or FFI overhead is negligible (under 1 ms per token).
| Device | 1B Model (tok/s) | 3B Model (tok/s) |
|---|---|---|
| iPhone 15 Pro (A17) | 35-45 | 18-25 |
| Galaxy S24 (SD 8 Gen 3) | 35-45 | 18-25 |
| Pixel 9 (Tensor G4) | 30-40 | 15-22 |
| Mid-range 2024+ | 18-25 | 8-12 |
The Comparison
| Factor | Cloud API | TFLite | On-Device LLM |
|---|---|---|---|
| Text generation | Yes | No | Yes |
| Image classification | Via API | Yes (optimized) | No |
| Offline support | No | Yes | Yes |
| Cost per inference | $0.0001-$0.01 | $0 | $0 |
| Flutter integration | Native Dart | Plugin | Platform channel/FFI |
| Custom models | Via API selection | Custom TFLite | Any GGUF model |
| Model size | N/A (server-side) | 1-50MB | 600MB-1.7GB |
Practical Decision Framework
Use cloud APIs when you are validating a feature, the user base is small, or you need frontier-model reasoning. The http package makes this trivial in Dart.
Use TFLite when you need image classification, object detection, text classification, or other classical ML tasks. Google's pre-built models cover many common use cases.
Use on-device LLMs when you need conversational AI, content generation, summarization, or any text-heavy AI feature. The zero per-inference cost, offline support, and privacy guarantees make this the right choice for production apps at scale.
The fine-tuning step is where you make the on-device model competitive with cloud APIs on your specific task. Platforms like Ertas handle the full workflow: upload training data, fine-tune with LoRA, export GGUF, deploy to any device. A fine-tuned 3B model typically outperforms prompted cloud models on domain-specific tasks.
Ship AI that runs on your users' devices.
Early bird pricing starts at $14.50/mo — locked in for life. Plans for builders and agencies.