
On-Device Text Classification for Mobile Apps
How to build fast, accurate text classification that runs on the user's phone. Sentiment analysis, content categorization, intent detection, and spam filtering without an API call.
Text classification is the most practical on-device AI feature. It is fast (under 100ms), accurate (90%+ with fine-tuning), works on the smallest models (1B), and runs on virtually any modern phone.
If your app needs to categorize content, detect intent, filter spam, analyze sentiment, or route messages, on-device classification is the most efficient approach.
Why Classification Is Ideal for On-Device
Classification has properties that make it perfect for mobile:
Short output: The model generates a single word or short phrase (the category label). This takes milliseconds, not seconds.
Small model sufficient: A fine-tuned 1B model handles classification with 90-94% accuracy. No need for larger, slower models.
High frequency: Classification often runs on every piece of content (every message, every note, every photo caption). At high frequency, cloud API costs add up fast. On-device, each classification is free.
Background-compatible: Classification can run in the background without user interaction. Auto-categorize expenses as they are entered. Auto-tag notes on save. Auto-detect spam on message receipt.
Classification Use Cases
Sentiment Analysis
Determine the emotional tone of user input. Useful for:
- Customer feedback apps (positive/negative/neutral)
- Social media monitoring
- Journal/mood tracking apps
- Support ticket priority routing
Content Categorization
Automatically assign categories to user content:
- Expense categorization (food, transport, entertainment, utilities)
- Note tagging (work, personal, ideas, reference)
- Email sorting (important, newsletter, social, transactional)
- Photo album organization (travel, food, people, nature)
Intent Detection
Understand what the user wants to do:
- Voice assistant routing (play music, set timer, send message, search)
- Chatbot intent classification (ask question, make complaint, request refund)
- Search query classification (navigation, information, transaction)
Content Filtering
Detect and filter unwanted content:
- Spam detection in messaging apps
- Inappropriate content flagging
- Off-topic message detection in community apps
Language Detection
Identify the language of input text for multilingual apps. Route to the appropriate model or translation pipeline.
Implementation
The Prompt Pattern
For LLM-based classification, the prompt is simple:
```
Classify the following text into one of these categories: [Food, Transport, Entertainment, Utilities, Shopping, Healthcare, Other]

Text: "Uber ride to airport"
Category:
```
The model generates a single word: "Transport"
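A minimal sketch of this pattern in app code. The prompt builder and label normalizer below are illustrative helpers, not a specific library API; the actual model call goes through whatever inference binding you use.

```kotlin
// Fixed label set, mirroring the prompt example above.
val expenseCategories = listOf(
    "Food", "Transport", "Entertainment", "Utilities",
    "Shopping", "Healthcare", "Other"
)

// Build the classification prompt from the label set.
fun buildClassificationPrompt(text: String, categories: List<String>): String =
    buildString {
        append("Classify the following text into one of these categories: ")
        append(categories.joinToString(", ", prefix = "[", postfix = "]"))
        append("\nText: \"").append(text).append("\"")
        append("\nCategory:")
    }

// Normalize the model's raw completion to a known label;
// anything unexpected falls back to "Other" instead of crashing.
fun parseLabel(raw: String, categories: List<String>): String {
    val cleaned = raw.trim().trimEnd('.')
    return categories.firstOrNull { it.equals(cleaned, ignoreCase = true) } ?: "Other"
}
```

Feed the prompt to your model, then pass the completion through `parseLabel` so whitespace or casing quirks never leak into app logic.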
Fine-Tuning for Classification
Create training examples in the chat format:
```json
{"messages": [{"role": "user", "content": "Classify: Uber ride to airport"}, {"role": "assistant", "content": "Transport"}]}
{"messages": [{"role": "user", "content": "Classify: Netflix monthly subscription"}, {"role": "assistant", "content": "Entertainment"}]}
{"messages": [{"role": "user", "content": "Classify: Grocery store visit"}, {"role": "assistant", "content": "Food"}]}
```
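If your labeled examples live in app code or a spreadsheet export, a small helper can emit this JSONL. A sketch with hand-rolled escaping — assumed sufficient for short labels; use a real JSON library for anything richer:

```kotlin
// Escape only the characters that matter for short strings:
// backslash, quote, newline.
fun escape(s: String) = s
    .replace("\\", "\\\\")
    .replace("\"", "\\\"")
    .replace("\n", "\\n")

// One chat-format training example per line, as in the JSONL above.
fun toTrainingLine(text: String, label: String): String {
    val user = escape("Classify: $text")
    val assistant = escape(label)
    return """{"messages": [{"role": "user", "content": "$user"}, {"role": "assistant", "content": "$assistant"}]}"""
}

// Join labeled (text, label) pairs into a JSONL file body.
fun toJsonl(examples: List<Pair<String, String>>): String =
    examples.joinToString("\n") { (text, label) -> toTrainingLine(text, label) }
```

Write the result of `toJsonl` straight to a `.jsonl` file and upload it to your fine-tuning pipeline.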
With 500-1,000 examples across all categories, a fine-tuned 1B model achieves 90-94% accuracy. This exceeds what a prompted GPT-4o achieves on the same task (typically 78-85%).
Structured Output
For reliable parsing, instruct the model to output JSON:
```json
{"messages": [{"role": "user", "content": "Classify this expense: Uber ride to airport\nOutput JSON with 'category' and 'confidence' fields."}, {"role": "assistant", "content": "{\"category\": \"Transport\", \"confidence\": \"high\"}"}]}
```
Fine-tuned models learn to produce consistent JSON. Parse the output directly in your app code.
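A minimal, dependency-free sketch of that parsing step. The `ClassificationResult` type and regex extraction are illustrative; on Android you would normally reach for `org.json` or `kotlinx.serialization` instead:

```kotlin
// Typed result for the two fields the fine-tuned model emits.
data class ClassificationResult(val category: String, val confidence: String)

// Pull "category" and "confidence" out of the model's JSON output.
// Returns null when the output isn't parseable, so callers can retry
// or fall back to a default category.
fun parseClassification(json: String): ClassificationResult? {
    fun field(name: String): String? =
        Regex("\"$name\"\\s*:\\s*\"([^\"]*)\"").find(json)?.groupValues?.get(1)
    val category = field("category") ?: return null
    return ClassificationResult(category, field("confidence") ?: "unknown")
}
```

The nullable return is the important design choice: even a well-trained model occasionally emits malformed output, and the app should degrade gracefully rather than throw.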
Speed Optimization
Classification generates very few tokens (1-10). You can optimize for maximum throughput:
- Set `n_predict` to a low value (10-20 tokens max)
- Use `stop` tokens to halt generation after the category label
- Use temperature 0 for deterministic output
- Batch multiple classifications if processing a list
With these optimizations, a 1B model classifies 5-15 items per second on a flagship phone.
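The optimizations above can be sketched as a settings object. The field names here are illustrative — map them onto whatever your llama.cpp binding calls them (the llama.cpp server API, for example, uses `n_predict`, `stop`, and `temperature`):

```kotlin
// Generation settings tuned for classification: a hard token cap,
// stop strings, and temperature 0 for deterministic output.
data class GenerationParams(
    val maxTokens: Int = 16,               // labels are 1-10 tokens; cap hard
    val stop: List<String> = listOf("\n"), // halt right after the label line
    val temperature: Float = 0f            // deterministic output
)

// Truncate a raw completion at the first stop string — mimicking what
// the inference runtime does internally, useful as a safety net.
fun applyStops(output: String, stops: List<String>): String {
    var result = output
    for (s in stops) {
        val i = result.indexOf(s)
        if (i >= 0) result = result.substring(0, i)
    }
    return result
}
```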
Performance Expectations
Accuracy by Training Data Size
| Training Examples | 1B Accuracy | 3B Accuracy |
|---|---|---|
| 100 | 78-82% | 82-86% |
| 250 | 84-88% | 87-91% |
| 500 | 88-92% | 91-94% |
| 1,000 | 90-94% | 93-96% |
| 2,000 | 92-95% | 94-97% |
Diminishing returns above 1,000 examples for most classification tasks. Start with 500 and add more only if accuracy is insufficient.
Speed by Device
| Device | 1B Classification Time | Classifications/Second |
|---|---|---|
| iPhone 16 Pro | 30-60ms | 15-30 |
| iPhone 14 | 50-80ms | 12-20 |
| Galaxy S24 | 40-70ms | 14-25 |
| Mid-range Android | 80-130ms | 8-12 |
Every classification completes in under 150ms on any modern phone. Users perceive this as instant.
Comparison to Cloud APIs
| Metric | Cloud API | On-Device 1B (Fine-Tuned) |
|---|---|---|
| Latency | 500-2,000ms | 30-130ms |
| Accuracy (domain) | 78-85% (prompted) | 90-94% (fine-tuned) |
| Cost per classification | $0.00003-0.0003 | $0 |
| Offline | No | Yes |
| Privacy | Data sent to third party | On-device |
On-device wins on every metric for classification: faster, more accurate (when fine-tuned), free, offline, and private.
Architecture Pattern
Background Classification Service
For apps that need to classify content automatically:
```kotlin
// Android: Background classification
class ClassificationService {
    private val model: LlamaModel = LlamaModel()

    fun classifyExpense(description: String): ExpenseCategory {
        val prompt = "Classify: $description"
        val result = model.generate(prompt, maxTokens = 10, temperature = 0f)
        return ExpenseCategory.fromString(result.trim())
    }

    fun classifyBatch(items: List<String>): List<ExpenseCategory> {
        return items.map { classifyExpense(it) }
    }
}
```
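The `ExpenseCategory.fromString` helper referenced above isn't shown; a minimal sketch, with a fallback so an unexpected model output can never crash the app:

```kotlin
// Category enum for the classification service. fromString matches
// case-insensitively and falls back to OTHER for any label the model
// emits that isn't in the set.
enum class ExpenseCategory {
    FOOD, TRANSPORT, ENTERTAINMENT, UTILITIES, SHOPPING, HEALTHCARE, OTHER;

    companion object {
        fun fromString(label: String): ExpenseCategory =
            values().firstOrNull { it.name.equals(label.trim(), ignoreCase = true) }
                ?: OTHER
    }
}
```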
Real-Time Classification
For features that classify as the user types (auto-suggest category while entering an expense):
```swift
// iOS: Real-time classification with debounce
class ClassificationViewModel: ObservableObject {
    @Published var suggestedCategory: String = ""
    private var classifyTask: Task<Void, Never>?

    func onTextChanged(_ text: String) {
        classifyTask?.cancel()
        classifyTask = Task {
            try? await Task.sleep(nanoseconds: 300_000_000) // 300ms debounce
            guard !Task.isCancelled else { return }
            let category = await classifier.classify(text)
            await MainActor.run { suggestedCategory = category }
        }
    }
}
```
Getting Started
- Define your categories. List the 5-20 labels your classifier should produce.
- Create training examples. 500 examples minimum, covering all categories with realistic inputs.
- Fine-tune a 1B model. Use a platform like Ertas: upload your examples, select a 1B base model, train with LoRA, export GGUF.
- Integrate llama.cpp. Add the inference library to your iOS or Android project.
- Test on real data. Run your evaluation set through the model. Target 90%+ accuracy.
- Deploy. Bundle the model or deliver via post-install download.
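For step 5, a small harness can compute accuracy over your labeled evaluation set. The `classify` parameter stands in for your model call:

```kotlin
// Run every (input text, expected label) pair through the classifier
// and return the fraction answered correctly.
fun accuracy(
    examples: List<Pair<String, String>>,
    classify: (String) -> String
): Double {
    if (examples.isEmpty()) return 0.0
    val correct = examples.count { (text, expected) ->
        classify(text).trim().equals(expected, ignoreCase = true)
    }
    return correct.toDouble() / examples.size
}
```

Gate your release on the target from step 5, e.g. `accuracy(evalSet, model::classify) >= 0.90`.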
Classification is the fastest path to production-quality on-device AI. The model is small, the task is simple, and the results are measurably better than cloud APIs for domain-specific categories.
Ship AI that runs on your users' devices.
Early bird pricing starts at $14.50/mo — locked in for life. Plans for builders and agencies.