
On-Device Text Classification for Mobile Apps
How to build fast, accurate text classification that runs on the user's phone. Sentiment analysis, content categorization, intent detection, and spam filtering without an API call.
Text classification is the most practical on-device AI feature. It is fast (under 100ms), accurate (90%+ with fine-tuning), works on the smallest models (1B), and runs on virtually any modern phone.
If your app needs to categorize content, detect intent, filter spam, analyze sentiment, or route messages, on-device classification is the most efficient approach.
Why Classification Is Ideal for On-Device
Classification has properties that make it perfect for mobile:
Short output: The model generates a single word or short phrase (the category label). This takes milliseconds, not seconds.
Small model sufficient: A fine-tuned 1B model handles classification with 90-94% accuracy. No need for larger, slower models.
High frequency: Classification often runs on every piece of content (every message, every note, every photo caption). At high frequency, cloud API costs add up fast. On-device, each classification is free.
Background-compatible: Classification can run in the background without user interaction. Auto-categorize expenses as they are entered. Auto-tag notes on save. Auto-detect spam on message receipt.
Classification Use Cases
Sentiment Analysis
Determine the emotional tone of user input. Useful for:
- Customer feedback apps (positive/negative/neutral)
- Social media monitoring
- Journal/mood tracking apps
- Support ticket priority routing
Content Categorization
Automatically assign categories to user content:
- Expense categorization (food, transport, entertainment, utilities)
- Note tagging (work, personal, ideas, reference)
- Email sorting (important, newsletter, social, transactional)
- Photo album organization (travel, food, people, nature)
Intent Detection
Understand what the user wants to do:
- Voice assistant routing (play music, set timer, send message, search)
- Chatbot intent classification (ask question, make complaint, request refund)
- Search query classification (navigation, information, transaction)
Content Filtering
Detect and filter unwanted content:
- Spam detection in messaging apps
- Inappropriate content flagging
- Off-topic message detection in community apps
Language Detection
Identify the language of input text for multilingual apps. Route to the appropriate model or translation pipeline.
Implementation
The Prompt Pattern
For LLM-based classification, the prompt is simple:
```
Classify the following text into one of these categories: [Food, Transport, Entertainment, Utilities, Shopping, Healthcare, Other]

Text: "Uber ride to airport"
Category:
```
The model generates a single word: "Transport"
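A minimal sketch of this pattern in app code. The prompt builder and label normalizer below are illustrative helpers, not a specific library API; the actual model call goes through whatever inference binding you use.

```kotlin
// Fixed label set, mirroring the prompt example above.
val expenseCategories = listOf(
    "Food", "Transport", "Entertainment", "Utilities",
    "Shopping", "Healthcare", "Other"
)

// Build the classification prompt from the label set.
fun buildClassificationPrompt(text: String, categories: List<String>): String =
    buildString {
        append("Classify the following text into one of these categories: ")
        append(categories.joinToString(", ", prefix = "[", postfix = "]"))
        append("\nText: \"").append(text).append("\"")
        append("\nCategory:")
    }

// Normalize the model's raw completion to a known label;
// anything unexpected falls back to "Other" instead of crashing.
fun parseLabel(raw: String, categories: List<String>): String {
    val cleaned = raw.trim().trimEnd('.')
    return categories.firstOrNull { it.equals(cleaned, ignoreCase = true) } ?: "Other"
}
```

Feed the prompt to your model, then pass the completion through `parseLabel` so whitespace or casing quirks never leak into app logic.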
Fine-Tuning for Classification
Create training examples in the chat format:
```json
{"messages": [{"role": "user", "content": "Classify: Uber ride to airport"}, {"role": "assistant", "content": "Transport"}]}
{"messages": [{"role": "user", "content": "Classify: Netflix monthly subscription"}, {"role": "assistant", "content": "Entertainment"}]}
{"messages": [{"role": "user", "content": "Classify: Grocery store visit"}, {"role": "assistant", "content": "Food"}]}
```
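If your labeled examples live in app code or a spreadsheet export, a small helper can emit this JSONL. A sketch with hand-rolled escaping — assumed sufficient for short labels; use a real JSON library for anything richer:

```kotlin
// Escape only the characters that matter for short strings:
// backslash, quote, newline.
fun escape(s: String) = s
    .replace("\\", "\\\\")
    .replace("\"", "\\\"")
    .replace("\n", "\\n")

// One chat-format training example per line, as in the JSONL above.
fun toTrainingLine(text: String, label: String): String {
    val user = escape("Classify: $text")
    val assistant = escape(label)
    return """{"messages": [{"role": "user", "content": "$user"}, {"role": "assistant", "content": "$assistant"}]}"""
}

// Join labeled (text, label) pairs into a JSONL file body.
fun toJsonl(examples: List<Pair<String, String>>): String =
    examples.joinToString("\n") { (text, label) -> toTrainingLine(text, label) }
```

Write the result of `toJsonl` straight to a `.jsonl` file and upload it to your fine-tuning pipeline.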
With 500-1,000 examples across all categories, a fine-tuned 1B model achieves 90-94% accuracy. This exceeds what a prompted GPT-4o achieves on the same task (typically 78-85%).
Structured Output
For reliable parsing, instruct the model to output JSON:
```json
{"messages": [{"role": "user", "content": "Classify this expense: Uber ride to airport\nOutput JSON with 'category' and 'confidence' fields."}, {"role": "assistant", "content": "{\"category\": \"Transport\", \"confidence\": \"high\"}"}]}
```
Fine-tuned models learn to produce consistent JSON. Parse the output directly in your app code.
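A minimal, dependency-free sketch of that parsing step. The `ClassificationResult` type and regex extraction are illustrative; on Android you would normally reach for `org.json` or `kotlinx.serialization` instead:

```kotlin
// Typed result for the two fields the fine-tuned model emits.
data class ClassificationResult(val category: String, val confidence: String)

// Pull "category" and "confidence" out of the model's JSON output.
// Returns null when the output isn't parseable, so callers can retry
// or fall back to a default category.
fun parseClassification(json: String): ClassificationResult? {
    fun field(name: String): String? =
        Regex("\"$name\"\\s*:\\s*\"([^\"]*)\"").find(json)?.groupValues?.get(1)
    val category = field("category") ?: return null
    return ClassificationResult(category, field("confidence") ?: "unknown")
}
```

The nullable return is the important design choice: even a well-trained model occasionally emits malformed output, and the app should degrade gracefully rather than throw.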
Speed Optimization
Classification generates very few tokens (1-10). You can optimize for maximum throughput:
- Set `n_predict` to a low value (10-20 tokens max)
- Use `stop` tokens to halt generation after the category label
- Use temperature 0 for deterministic output
- Batch multiple classifications if processing a list
With these optimizations, a 1B model classifies 5-15 items per second on a flagship phone.
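The optimizations above can be sketched as a settings object. The field names here are illustrative — map them onto whatever your llama.cpp binding calls them (the llama.cpp server API, for example, uses `n_predict`, `stop`, and `temperature`):

```kotlin
// Generation settings tuned for classification: a hard token cap,
// stop strings, and temperature 0 for deterministic output.
data class GenerationParams(
    val maxTokens: Int = 16,               // labels are 1-10 tokens; cap hard
    val stop: List<String> = listOf("\n"), // halt right after the label line
    val temperature: Float = 0f            // deterministic output
)

// Truncate a raw completion at the first stop string — mimicking what
// the inference runtime does internally, useful as a safety net.
fun applyStops(output: String, stops: List<String>): String {
    var result = output
    for (s in stops) {
        val i = result.indexOf(s)
        if (i >= 0) result = result.substring(0, i)
    }
    return result
}
```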
Performance Expectations
Accuracy by Training Data Size
| Training Examples | 1B Accuracy | 3B Accuracy |
|---|---|---|
| 100 | 78-82% | 82-86% |
| 250 | 84-88% | 87-91% |
| 500 | 88-92% | 91-94% |
| 1,000 | 90-94% | 93-96% |
| 2,000 | 92-95% | 94-97% |
Diminishing returns above 1,000 examples for most classification tasks. Start with 500 and add more only if accuracy is insufficient.
Speed by Device
| Device | 1B Classification Time | Classifications/Second |
|---|---|---|
| iPhone 16 Pro | 30-60ms | 15-30 |
| iPhone 14 | 50-80ms | 12-20 |
| Galaxy S24 | 40-70ms | 14-25 |
| Mid-range Android | 80-130ms | 8-12 |
Every classification completes in under 150ms on any modern phone. Users perceive this as instant.
Comparison to Cloud APIs
| Metric | Cloud API | On-Device 1B (Fine-Tuned) |
|---|---|---|
| Latency | 500-2,000ms | 30-130ms |
| Accuracy (domain) | 78-85% (prompted) | 90-94% (fine-tuned) |
| Cost per classification | $0.00003-0.0003 | $0 |
| Offline | No | Yes |
| Privacy | Data sent to third party | On-device |
On-device wins on every metric for classification: faster, more accurate (when fine-tuned), free, offline, and private.
Architecture Pattern
Background Classification Service
For apps that need to classify content automatically:
```kotlin
// Android: Background classification
class ClassificationService {
    private val model: LlamaModel = LlamaModel()

    fun classifyExpense(description: String): ExpenseCategory {
        val prompt = "Classify: $description"
        val result = model.generate(prompt, maxTokens = 10, temperature = 0f)
        return ExpenseCategory.fromString(result.trim())
    }

    fun classifyBatch(items: List<String>): List<ExpenseCategory> {
        return items.map { classifyExpense(it) }
    }
}
```
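The `ExpenseCategory.fromString` helper referenced above isn't shown; a minimal sketch, with a fallback so an unexpected model output can never crash the app:

```kotlin
// Category enum for the classification service. fromString matches
// case-insensitively and falls back to OTHER for any label the model
// emits that isn't in the set.
enum class ExpenseCategory {
    FOOD, TRANSPORT, ENTERTAINMENT, UTILITIES, SHOPPING, HEALTHCARE, OTHER;

    companion object {
        fun fromString(label: String): ExpenseCategory =
            values().firstOrNull { it.name.equals(label.trim(), ignoreCase = true) }
                ?: OTHER
    }
}
```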
Real-Time Classification
For features that classify as the user types (auto-suggest category while entering an expense):
```swift
// iOS: Real-time classification with debounce
class ClassificationViewModel: ObservableObject {
    @Published var suggestedCategory: String = ""
    private var classifyTask: Task<Void, Never>?

    func onTextChanged(_ text: String) {
        classifyTask?.cancel()
        classifyTask = Task {
            try? await Task.sleep(nanoseconds: 300_000_000) // 300ms debounce
            guard !Task.isCancelled else { return }
            let category = await classifier.classify(text)
            await MainActor.run { suggestedCategory = category }
        }
    }
}
```
Getting Started
- Define your categories. List the 5-20 labels your classifier should produce.
- Create training examples. 500 examples minimum, covering all categories with realistic inputs.
- Fine-tune a 1B model. Use a platform like Ertas: upload your examples, select a 1B base model, train with LoRA, export GGUF.
- Integrate llama.cpp. Add the inference library to your iOS or Android project.
- Test on real data. Run your evaluation set through the model. Target 90%+ accuracy.
- Deploy. Bundle the model or deliver via post-install download.
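For step 5, a small harness can compute accuracy over your labeled evaluation set. The `classify` parameter stands in for your model call:

```kotlin
// Run every (input text, expected label) pair through the classifier
// and return the fraction answered correctly.
fun accuracy(
    examples: List<Pair<String, String>>,
    classify: (String) -> String
): Double {
    if (examples.isEmpty()) return 0.0
    val correct = examples.count { (text, expected) ->
        classify(text).trim().equals(expected, ignoreCase = true)
    }
    return correct.toDouble() / examples.size
}
```

Gate your release on the target from step 5, e.g. `accuracy(evalSet, model::classify) >= 0.90`.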
Classification is the fastest path to production-quality on-device AI. The model is small, the task is simple, and the results are measurably better than cloud APIs for domain-specific categories.
Ship AI that runs on your users' devices.
Early bird pricing starts at $14.50/mo — locked in for life. Plans for builders and agencies.