
On-Device AI Model Size Guide: 1B vs 3B vs 7B for Mobile
How to choose the right model size for your mobile app. Capability breakdown, device requirements, quality benchmarks, and the fine-tuning factor that changes the math.
Choosing the right model size for your mobile app is the most consequential technical decision in on-device AI. Too small and the model cannot handle your task. Too large and it runs slowly, uses too much memory, or excludes too many devices.
The right answer depends on your task, your target devices, and whether you fine-tune.
The Size Spectrum
| Parameter Count | GGUF Q4 Size | RAM Needed | Device Requirement |
|---|---|---|---|
| 1B | ~600MB | ~800MB | 4GB+ RAM (any modern phone) |
| 3B | ~1.7GB | ~2.2GB | 6GB+ RAM (mid-range 2023+) |
| 7B | ~4GB | ~5GB | 8GB+ RAM (flagship only) |
These sizes assume Q4_K_M quantization, which offers the best balance of size reduction and quality retention. Less aggressive quantization (Q5, Q8) increases file size by 25-100% for marginal quality gains.
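The file sizes in the table follow from simple arithmetic: parameter count times average bits per weight, divided by eight. A minimal sketch, assuming Q4_K_M averages roughly 4.85 bits per weight (the exact figure varies slightly by model architecture):

```kotlin
// Rough GGUF file-size estimate: parameters × average bits per weight / 8.
// Q4_K_M mixes 4- and 6-bit blocks and averages ~4.85 bits/weight — treat
// this as an approximation, not an exact figure for any given model.
fun estimateGgufBytes(params: Long, bitsPerWeight: Double = 4.85): Long =
    (params * bitsPerWeight / 8).toLong()

fun main() {
    for (billions in listOf(1, 3, 7)) {
        val gb = estimateGgufBytes(billions * 1_000_000_000L) / 1e9
        println("${billions}B @ Q4_K_M ≈ %.1f GB".format(gb))
    }
    // Prints ≈ 0.6 GB, 1.8 GB, 4.2 GB — in line with the table above.
}
```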
What Each Size Can Do
1B Models
Strengths:
- Text classification (sentiment, category, intent)
- Autocomplete and text prediction
- Smart suggestions (reply suggestions, action suggestions)
- Named entity recognition
- Simple Q&A with short responses
- Keyword extraction and tagging
Limitations:
- Limited reasoning ability
- Short, sometimes repetitive generation
- Struggles with nuanced instructions
- Cannot maintain coherent long-form output
Best for: Features that transform input into a structured output. Classification, tagging, suggestions, and short-form generation.
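In practice, getting reliable results from a 1B model means constraining it tightly: a fixed label set, a short prompt, and validation of the output. A minimal classification sketch; the `generate` lambda is a placeholder for whatever inference call your runtime exposes (llama.cpp binding, MediaPipe, etc.), not a real API:

```kotlin
// Constrain a 1B model to a fixed label set so it classifies rather than
// free-generates. `generate` is a placeholder for your inference call.
val LABELS = listOf("positive", "negative", "neutral")

fun classifySentiment(text: String, generate: (String) -> String): String {
    val prompt = """
        Classify the sentiment of the following text.
        Respond with exactly one word: ${LABELS.joinToString(", ")}.

        Text: $text
        Sentiment:
    """.trimIndent()
    val raw = generate(prompt).trim().lowercase()
    // Small models occasionally append extra words; keep only a known label.
    return LABELS.firstOrNull { raw.startsWith(it) } ?: "neutral"
}
```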
3B Models
Strengths:
- Conversational chat with multi-turn coherence
- Summarization of articles and documents
- Content drafting (emails, messages, notes)
- Translation between common language pairs
- Complex instruction following
- Structured output generation (JSON, formatted text)
Limitations:
- Slower than 1B (roughly half the speed)
- Cannot match frontier model reasoning (GPT-4, Claude Sonnet)
- May struggle with highly technical or specialized content without fine-tuning
- Uses 2-3x more memory than 1B
Best for: Features that generate human-readable text. Chat, summarization, content creation, and complex classification.
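Multi-turn coherence depends on feeding the model its own conversation history in the chat template it was trained on. A minimal prompt-builder sketch, assuming a ChatML-style template; substitute whatever template your model family actually uses, since a mismatched template noticeably degrades instruction following:

```kotlin
// Build a multi-turn prompt in a ChatML-style format (an assumption —
// check the template your specific model was fine-tuned with).
data class Turn(val role: String, val content: String) // "user" or "assistant"

fun buildChatPrompt(system: String, history: List<Turn>): String = buildString {
    append("<|im_start|>system\n$system<|im_end|>\n")
    for (turn in history) {
        append("<|im_start|>${turn.role}\n${turn.content}<|im_end|>\n")
    }
    append("<|im_start|>assistant\n") // cue the model to respond
}
```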
7B Models
Strengths:
- Stronger reasoning and inference
- Better at ambiguous or open-ended tasks
- More robust instruction following
- Can handle longer, more coherent outputs
Limitations:
- Only runs on flagship devices with 8GB+ RAM
- Slow generation (6-12 tok/s even on flagship hardware)
- Excludes roughly 80% of the device market
- Memory pressure causes app instability
Best for: Almost nothing on mobile. The device coverage and performance trade-offs are severe. If you need 7B-level quality, fine-tune a 3B model on your domain data instead.
Quality Comparison
General Benchmarks (Base Models, Not Fine-Tuned)
| Task | 1B | 3B | 7B |
|---|---|---|---|
| Text classification accuracy | 78-85% | 85-90% | 88-93% |
| Summarization quality (human eval) | 5.5/10 | 7/10 | 8/10 |
| Instruction following rate | 70% | 85% | 90% |
| Conversation coherence (5 turns) | Poor | Good | Very Good |
| JSON output reliability | 60% | 82% | 90% |
After Fine-Tuning on Domain Data
| Task | 1B Fine-Tuned | 3B Fine-Tuned | Cloud API (Prompted) |
|---|---|---|---|
| Domain classification accuracy | 90-94% | 93-96% | 71-80% |
| Domain-specific Q&A | 82-88% | 88-94% | 75-82% |
| Structured output reliability | 85-90% | 92-96% | 80-88% |
The critical insight: a fine-tuned 1B model outperforms a prompted cloud API on domain-specific tasks. A fine-tuned 3B model significantly outperforms it. Fine-tuning closes the quality gap while keeping the model small enough for mobile.
The Fine-Tuning Factor
Fine-tuning changes the size selection math:
Without fine-tuning, you need a larger model to handle your task because the model relies on general knowledge and prompt instructions. You compensate for lack of domain knowledge with more parameters.
With fine-tuning, you bake domain knowledge into the model weights. The model does not need to figure out your domain from a prompt. It already knows it. This means a smaller fine-tuned model often matches or exceeds a larger general model on your specific task.
Practical implications:
- Need chat? Start with 3B fine-tuned. You may find it matches your cloud API quality on your domain.
- Need classification? Start with 1B fine-tuned. It will likely exceed your cloud API accuracy.
- Think you need 7B? Fine-tune 3B first. Test it. You will probably not need 7B.
Device Coverage by Model Size
| Model Size | iPhone Coverage | Android Coverage | Total Addressable |
|---|---|---|---|
| 1B | iPhone 12+ (95%+ active) | 4GB+ (85%+ active) | ~90% of smartphones |
| 3B | iPhone 14+ (70%+ active) | 6GB+ (60%+ active) | ~65% of smartphones |
| 7B | iPhone 15 Pro+ (15% active) | 8GB+ flagship (20% active) | ~18% of smartphones |
Choosing 1B over 3B grows your addressable market from roughly 65% to 90% of smartphones; choosing 3B over 7B more than triples it, from ~18% to ~65%.
Decision Framework
Step 1: Define Your Task
What will the model do in your app?
| Task Type | Minimum Size | Recommended Size |
|---|---|---|
| Classification / tagging | 1B | 1B fine-tuned |
| Autocomplete / suggestions | 1B | 1B fine-tuned |
| Short Q&A (1-2 sentences) | 1B | 1B fine-tuned |
| Chat (multi-turn) | 3B | 3B fine-tuned |
| Summarization | 3B | 3B fine-tuned |
| Content drafting | 3B | 3B fine-tuned |
| Translation | 1-3B | 3B fine-tuned |
| Complex reasoning | 3B+ | 3B fine-tuned (test first) |
Step 2: Know Your Audience
What devices do your users have? Check your analytics for device RAM distribution. If 80%+ of your users have 6GB+ RAM, 3B is safe. If you target developing markets or budget-conscious users, 1B is the safer choice.
Step 3: Fine-Tune and Test
Do not guess. Fine-tune both 1B and 3B on your domain data using a platform like Ertas. Test both against your quality benchmarks. Choose the smallest model that meets your quality bar.
The fine-tuning investment is small ($5-50 per training run) and the testing gives you empirical evidence instead of assumptions.
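The benchmark itself does not need to be elaborate. A minimal harness sketch: run each candidate over the same labeled test set and take the smallest model that clears your bar. The `(String) -> String` lambdas are placeholders for real inference calls:

```kotlin
// Compare candidate models on one labeled test set and return the name of
// the smallest model that meets the quality bar, or null if none does.
data class Example(val input: String, val expected: String)

fun accuracy(testSet: List<Example>, classify: (String) -> String): Double =
    testSet.count { classify(it.input) == it.expected }.toDouble() / testSet.size

fun smallestPassingModel(
    testSet: List<Example>,
    qualityBar: Double,
    candidates: List<Pair<String, (String) -> String>>, // ordered smallest first
): String? = candidates
    .firstOrNull { (_, model) -> accuracy(testSet, model) >= qualityBar }
    ?.first
```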
Step 4: Offer Both
The ideal architecture detects available RAM at runtime and loads the appropriate model (see the sketch after this list):
- 4-6GB devices: 1B fine-tuned
- 6GB+ devices: 3B fine-tuned
- Fallback: cloud API for devices below 4GB (or no AI feature)
This maximizes both quality and device coverage.
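On Android, total device RAM is available through ActivityManager; on iOS, ProcessInfo.processInfo.physicalMemory serves the same purpose. A minimal Kotlin sketch with placeholder model file names:

```kotlin
import android.app.ActivityManager
import android.content.Context

// Pick a model variant from total device RAM. File names are placeholders.
// Note: totalMem reports slightly less than nominal RAM (a "6GB" phone
// reports ~5.5GB), so set thresholds a bit below the marketing number.
fun selectModelAsset(context: Context): String? {
    val am = context.getSystemService(Context.ACTIVITY_SERVICE) as ActivityManager
    val info = ActivityManager.MemoryInfo().also { am.getMemoryInfo(it) }
    val totalGb = info.totalMem / (1024.0 * 1024.0 * 1024.0)
    return when {
        totalGb >= 5.5 -> "app-3b-q4_k_m.gguf" // fine-tuned 3B, 6GB+ devices
        totalGb >= 3.5 -> "app-1b-q4_k_m.gguf" // fine-tuned 1B, 4GB+ devices
        else -> null // fall back to a cloud API, or ship without the feature
    }
}
```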
Summary
| | 1B | 3B | 7B |
|---|---|---|---|
| File size (Q4) | ~600MB | ~1.7GB | ~4GB |
| Speed (flagship) | 35-50 tok/s | 18-30 tok/s | 6-12 tok/s |
| Device coverage | ~90% | ~65% | ~18% |
| Best use case | Classification, suggestions | Chat, generation | Rarely appropriate for mobile |
| Fine-tuned quality | Exceeds prompted cloud APIs | Significantly exceeds | Not needed if 3B is fine-tuned |
Start with the smallest model that handles your task. Fine-tune it. Test it. Only go larger if the quality is genuinely insufficient.
Ship AI that runs on your users' devices.
Early bird pricing starts at $14.50/mo — locked in for life. Plans for builders and agencies.