
On-Device AI Model Size Guide: 1B vs 3B vs 7B for Mobile
How to choose the right model size for your mobile app. Capability breakdown, device requirements, quality benchmarks, and the fine-tuning factor that changes the math.
Choosing the right model size for your mobile app is the most consequential technical decision in on-device AI. Too small and the model cannot handle your task. Too large and it runs slowly, uses too much memory, or excludes too many devices.
The right answer depends on your task, your target devices, and whether you fine-tune.
The Size Spectrum
| Parameter Count | GGUF Q4 Size | RAM Needed | Device Requirement |
|---|---|---|---|
| 1B | ~600MB | ~800MB | 4GB+ RAM (any modern phone) |
| 3B | ~1.7GB | ~2.2GB | 6GB+ RAM (mid-range 2023+) |
| 7B | ~4GB | ~5GB | 8GB+ RAM (flagship only) |
These sizes assume Q4_K_M quantization, which offers the best balance of size reduction and quality retention. Less aggressive quantization (Q5, Q8) increases file size by 25-100% for marginal quality gains.
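The file sizes in the table follow from simple arithmetic: parameter count times average bits per weight, divided by eight. A minimal sketch, assuming Q4_K_M averages roughly 4.85 bits per weight (the exact figure varies slightly by model architecture):

```kotlin
// Rough GGUF file-size estimate: parameters × average bits per weight / 8.
// Q4_K_M mixes 4- and 6-bit blocks and averages ~4.85 bits/weight — treat
// this as an approximation, not an exact figure for any given model.
fun estimateGgufBytes(params: Long, bitsPerWeight: Double = 4.85): Long =
    (params * bitsPerWeight / 8).toLong()

fun main() {
    for (billions in listOf(1, 3, 7)) {
        val gb = estimateGgufBytes(billions * 1_000_000_000L) / 1e9
        println("${billions}B @ Q4_K_M ≈ %.1f GB".format(gb))
    }
    // Prints ≈ 0.6 GB, 1.8 GB, 4.2 GB — in line with the table above.
}
```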
What Each Size Can Do
1B Models
Strengths:
- Text classification (sentiment, category, intent)
- Autocomplete and text prediction
- Smart suggestions (reply suggestions, action suggestions)
- Named entity recognition
- Simple Q&A with short responses
- Keyword extraction and tagging
Limitations:
- Limited reasoning ability
- Short, sometimes repetitive generation
- Struggles with nuanced instructions
- Cannot maintain coherent long-form output
Best for: Features that transform input into a structured output. Classification, tagging, suggestions, and short-form generation.
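In practice, getting reliable results from a 1B model means constraining it tightly: a fixed label set, a short prompt, and validation of the output. A minimal classification sketch; the `generate` lambda is a placeholder for whatever inference call your runtime exposes (llama.cpp binding, MediaPipe, etc.), not a real API:

```kotlin
// Constrain a 1B model to a fixed label set so it classifies rather than
// free-generates. `generate` is a placeholder for your inference call.
val LABELS = listOf("positive", "negative", "neutral")

fun classifySentiment(text: String, generate: (String) -> String): String {
    val prompt = """
        Classify the sentiment of the following text.
        Respond with exactly one word: ${LABELS.joinToString(", ")}.

        Text: $text
        Sentiment:
    """.trimIndent()
    val raw = generate(prompt).trim().lowercase()
    // Small models occasionally append extra words; keep only a known label.
    return LABELS.firstOrNull { raw.startsWith(it) } ?: "neutral"
}
```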
3B Models
Strengths:
- Conversational chat with multi-turn coherence
- Summarization of articles and documents
- Content drafting (emails, messages, notes)
- Translation between common language pairs
- Complex instruction following
- Structured output generation (JSON, formatted text)
Limitations:
- Slower than 1B (roughly half the speed)
- Cannot match frontier model reasoning (GPT-4, Claude Sonnet)
- May struggle with highly technical or specialized content without fine-tuning
- Uses 2-3x more memory than 1B
Best for: Features that generate human-readable text. Chat, summarization, content creation, and complex classification.
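Multi-turn coherence depends on feeding the model its own conversation history in the chat template it was trained on. A minimal prompt-builder sketch, assuming a ChatML-style template; substitute whatever template your model family actually uses, since a mismatched template noticeably degrades instruction following:

```kotlin
// Build a multi-turn prompt in a ChatML-style format (an assumption —
// check the template your specific model was fine-tuned with).
data class Turn(val role: String, val content: String) // "user" or "assistant"

fun buildChatPrompt(system: String, history: List<Turn>): String = buildString {
    append("<|im_start|>system\n$system<|im_end|>\n")
    for (turn in history) {
        append("<|im_start|>${turn.role}\n${turn.content}<|im_end|>\n")
    }
    append("<|im_start|>assistant\n") // cue the model to respond
}
```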
7B Models
Strengths:
- Stronger reasoning and inference
- Better at ambiguous or open-ended tasks
- More robust instruction following
- Can handle longer, more coherent outputs
Limitations:
- Only runs on flagship devices with 8GB+ RAM
- Slow generation (6-12 tok/s even on flagship hardware)
- Excludes roughly 80% of the device market
- Memory pressure causes app instability
Best for: Almost nothing on mobile. The device coverage and performance trade-offs are severe. If you need 7B-level quality, fine-tune a 3B model on your domain data instead.
Quality Comparison
General Benchmarks (Base Models, Not Fine-Tuned)
| Task | 1B | 3B | 7B |
|---|---|---|---|
| Text classification accuracy | 78-85% | 85-90% | 88-93% |
| Summarization quality (human eval) | 5.5/10 | 7/10 | 8/10 |
| Instruction following rate | 70% | 85% | 90% |
| Conversation coherence (5 turns) | Poor | Good | Very Good |
| JSON output reliability | 60% | 82% | 90% |
After Fine-Tuning on Domain Data
| Task | 1B Fine-Tuned | 3B Fine-Tuned | Cloud API (Prompted) |
|---|---|---|---|
| Domain classification accuracy | 90-94% | 93-96% | 71-80% |
| Domain-specific Q&A | 82-88% | 88-94% | 75-82% |
| Structured output reliability | 85-90% | 92-96% | 80-88% |
The critical insight: a fine-tuned 1B model outperforms a prompted cloud API on domain-specific tasks. A fine-tuned 3B model significantly outperforms it. Fine-tuning closes the quality gap while keeping the model small enough for mobile.
The Fine-Tuning Factor
Fine-tuning changes the size selection math:
Without fine-tuning, you need a larger model to handle your task because the model relies on general knowledge and prompt instructions. You compensate for lack of domain knowledge with more parameters.
With fine-tuning, you bake domain knowledge into the model weights. The model does not need to figure out your domain from a prompt. It already knows it. This means a smaller fine-tuned model often matches or exceeds a larger general model on your specific task.
Practical implications:
- Need chat? Start with 3B fine-tuned. You may find it matches your cloud API quality on your domain.
- Need classification? Start with 1B fine-tuned. It will likely exceed your cloud API accuracy.
- Think you need 7B? Fine-tune 3B first. Test it. You will probably not need 7B.
Device Coverage by Model Size
| Model Size | iPhone Coverage | Android Coverage | Total Addressable |
|---|---|---|---|
| 1B | iPhone 12+ (95%+ active) | 4GB+ (85%+ active) | ~90% of smartphones |
| 3B | iPhone 14+ (70%+ active) | 6GB+ (60%+ active) | ~65% of smartphones |
| 7B | iPhone 15 Pro+ (15% active) | 8GB+ flagship (20% active) | ~18% of smartphones |
Choosing 1B over 3B grows your addressable market from roughly 65% to 90% of smartphones; choosing 3B over 7B more than triples it, from ~18% to ~65%.
Decision Framework
Step 1: Define Your Task
What will the model do in your app?
| Task Type | Minimum Size | Recommended Size |
|---|---|---|
| Classification / tagging | 1B | 1B fine-tuned |
| Autocomplete / suggestions | 1B | 1B fine-tuned |
| Short Q&A (1-2 sentences) | 1B | 1B fine-tuned |
| Chat (multi-turn) | 3B | 3B fine-tuned |
| Summarization | 3B | 3B fine-tuned |
| Content drafting | 3B | 3B fine-tuned |
| Translation | 1-3B | 3B fine-tuned |
| Complex reasoning | 3B+ | 3B fine-tuned (test first) |
Step 2: Know Your Audience
What devices do your users have? Check your analytics for device RAM distribution. If 80%+ of your users have 6GB+ RAM, 3B is safe. If you target developing markets or budget-conscious users, 1B is the safer choice.
Step 3: Fine-Tune and Test
Do not guess. Fine-tune both 1B and 3B on your domain data using a platform like Ertas. Test both against your quality benchmarks. Choose the smallest model that meets your quality bar.
The fine-tuning investment is small ($5-50 per training run) and the testing gives you empirical evidence instead of assumptions.
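The benchmark itself does not need to be elaborate. A minimal harness sketch: run each candidate over the same labeled test set and take the smallest model that clears your bar. The `(String) -> String` lambdas are placeholders for real inference calls:

```kotlin
// Compare candidate models on one labeled test set and return the name of
// the smallest model that meets the quality bar, or null if none does.
data class Example(val input: String, val expected: String)

fun accuracy(testSet: List<Example>, classify: (String) -> String): Double =
    testSet.count { classify(it.input) == it.expected }.toDouble() / testSet.size

fun smallestPassingModel(
    testSet: List<Example>,
    qualityBar: Double,
    candidates: List<Pair<String, (String) -> String>>, // ordered smallest first
): String? = candidates
    .firstOrNull { (_, model) -> accuracy(testSet, model) >= qualityBar }
    ?.first
```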
Step 4: Offer Both
The ideal architecture detects available RAM at runtime and loads the appropriate model (see the sketch after this list):
- 4-6GB devices: 1B fine-tuned
- 6GB+ devices: 3B fine-tuned
- Fallback: cloud API for devices below 4GB (or no AI feature)
This maximizes both quality and device coverage.
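On Android, total device RAM is available through ActivityManager; on iOS, ProcessInfo.processInfo.physicalMemory serves the same purpose. A minimal Kotlin sketch with placeholder model file names:

```kotlin
import android.app.ActivityManager
import android.content.Context

// Pick a model variant from total device RAM. File names are placeholders.
// Note: totalMem reports slightly less than nominal RAM (a "6GB" phone
// reports ~5.5GB), so set thresholds a bit below the marketing number.
fun selectModelAsset(context: Context): String? {
    val am = context.getSystemService(Context.ACTIVITY_SERVICE) as ActivityManager
    val info = ActivityManager.MemoryInfo().also { am.getMemoryInfo(it) }
    val totalGb = info.totalMem / (1024.0 * 1024.0 * 1024.0)
    return when {
        totalGb >= 5.5 -> "app-3b-q4_k_m.gguf" // fine-tuned 3B, 6GB+ devices
        totalGb >= 3.5 -> "app-1b-q4_k_m.gguf" // fine-tuned 1B, 4GB+ devices
        else -> null // fall back to a cloud API, or ship without the feature
    }
}
```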
Summary
| | 1B | 3B | 7B |
|---|---|---|---|
| File size (Q4) | ~600MB | ~1.7GB | ~4GB |
| Speed (flagship) | 35-50 tok/s | 18-30 tok/s | 6-12 tok/s |
| Device coverage | ~90% | ~65% | ~18% |
| Best use case | Classification, suggestions | Chat, generation | Rarely appropriate for mobile |
| Fine-tuned quality | Exceeds prompted cloud APIs | Significantly exceeds | Not needed if 3B is fine-tuned |
Start with the smallest model that handles your task. Fine-tune it. Test it. Only go larger if the quality is genuinely insufficient.
Ship AI that runs on your users' devices.
Early bird pricing starts at $14.50/mo — locked in for life. Plans for builders and agencies.