    On-Device AI Model Size Guide: 1B vs 3B vs 7B for Mobile


    How to choose the right model size for your mobile app. Capability breakdown, device requirements, quality benchmarks, and the fine-tuning factor that changes the math.

    Ertas Team

    Choosing the right model size for your mobile app is the most consequential technical decision in on-device AI. Too small and the model cannot handle your task. Too large and it runs slowly, uses too much memory, or excludes too many devices.

    The right answer depends on your task, your target devices, and whether you fine-tune.

    The Size Spectrum

    | Parameter Count | GGUF Q4 Size | RAM Needed | Device Requirement |
    | --- | --- | --- | --- |
    | 1B | ~600MB | ~800MB | 4GB+ RAM (any modern phone) |
    | 3B | ~1.7GB | ~2.2GB | 6GB+ RAM (mid-range 2023+) |
    | 7B | ~4GB | ~5GB | 8GB+ RAM (flagship only) |

    These sizes assume Q4_K_M quantization, which provides the best balance of size reduction and quality retention. Higher-precision quantization (Q5, Q8) increases file size by 25-100% for only marginal quality improvement.
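The file sizes above follow from a simple back-of-envelope calculation: parameter count times effective bits per weight. The ~4.8 bits/weight figure for Q4_K_M below is an assumption for illustration (4-bit weights plus scales and metadata); real GGUF files vary by architecture.

```typescript
// Back-of-envelope GGUF file size in (decimal) GB.
// paramsBillions * 1e9 params * (bits/8) bytes/param / 1e9 bytes/GB
// simplifies to paramsBillions * bits / 8.
function estimateGgufSizeGb(
  paramsBillions: number,
  effectiveBitsPerWeight: number
): number {
  return (paramsBillions * effectiveBitsPerWeight) / 8;
}

console.log(estimateGgufSizeGb(1, 4.8)); // ≈ 0.6 GB
console.log(estimateGgufSizeGb(3, 4.8)); // ≈ 1.8 GB
console.log(estimateGgufSizeGb(7, 4.8)); // ≈ 4.2 GB
```

The same formula explains the Q5/Q8 note: bump the bits-per-weight figure and the file grows proportionally.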

    What Each Size Can Do

    1B Models

    Strengths:

    • Text classification (sentiment, category, intent)
    • Autocomplete and text prediction
    • Smart suggestions (reply suggestions, action suggestions)
    • Named entity recognition
    • Simple Q&A with short responses
    • Keyword extraction and tagging

    Limitations:

    • Limited reasoning ability
    • Short, sometimes repetitive generation
    • Struggles with nuanced instructions
    • Cannot maintain coherent long-form output

    Best for: Features that transform input into a structured output. Classification, tagging, suggestions, and short-form generation.

    3B Models

    Strengths:

    • Conversational chat with multi-turn coherence
    • Summarization of articles and documents
    • Content drafting (emails, messages, notes)
    • Translation between common language pairs
    • Complex instruction following
    • Structured output generation (JSON, formatted text)

    Limitations:

    • Slower than 1B (roughly half the speed)
    • Cannot match frontier model reasoning (GPT-4, Claude Sonnet)
    • May struggle with highly technical or specialized content without fine-tuning
    • Uses 2-3x more memory than 1B

    Best for: Features that generate human-readable text. Chat, summarization, content creation, and complex classification.

    7B Models

    Strengths:

    • Stronger reasoning and inference
    • Better at ambiguous or open-ended tasks
    • More robust instruction following
    • Can handle longer, more coherent outputs

    Limitations:

    • Only runs on flagship devices with 8GB+ RAM
    • Slow generation (5-12 tok/s on most devices)
    • Excludes 50-70% of the device market
    • Memory pressure causes app instability

    Best for: Rarely the right choice for mobile. The device coverage and performance trade-offs are severe. If you need 7B quality, fine-tune a 3B model on your domain data instead.

    Quality Comparison

    General Benchmarks (Base Models, Not Fine-Tuned)

    | Task | 1B | 3B | 7B |
    | --- | --- | --- | --- |
    | Text classification accuracy | 78-85% | 85-90% | 88-93% |
    | Summarization quality (human eval) | 5.5/10 | 7/10 | 8/10 |
    | Instruction following rate | 70% | 85% | 90% |
    | Conversation coherence (5 turns) | Poor | Good | Very Good |
    | JSON output reliability | 60% | 82% | 90% |

    After Fine-Tuning on Domain Data

    | Task | 1B Fine-Tuned | 3B Fine-Tuned | Cloud API (Prompted) |
    | --- | --- | --- | --- |
    | Domain classification accuracy | 90-94% | 93-96% | 71-80% |
    | Domain-specific Q&A | 82-88% | 88-94% | 75-82% |
    | Structured output reliability | 85-90% | 92-96% | 80-88% |

    The critical insight: a fine-tuned 1B model outperforms a prompted cloud API on domain-specific tasks. A fine-tuned 3B model significantly outperforms it. Fine-tuning closes the quality gap while keeping the model small enough for mobile.

    The Fine-Tuning Factor

    Fine-tuning changes the size selection math:

    Without fine-tuning, you need a larger model to handle your task because the model relies on general knowledge and prompt instructions. You compensate for lack of domain knowledge with more parameters.

    With fine-tuning, you bake domain knowledge into the model weights. The model does not need to figure out your domain from a prompt. It already knows it. This means a smaller fine-tuned model often matches or exceeds a larger general model on your specific task.

    Practical implications:

    • Need chat? Start with 3B fine-tuned. You may find it matches your cloud API quality on your domain.
    • Need classification? Start with 1B fine-tuned. It will likely exceed your cloud API accuracy.
    • Think you need 7B? Fine-tune 3B first. Test it. You will probably not need 7B.

    Device Coverage by Model Size

    | Model Size | iPhone Coverage | Android Coverage | Total Addressable |
    | --- | --- | --- | --- |
    | 1B | iPhone 12+ (95%+ active) | 4GB+ (85%+ active) | ~90% of smartphones |
    | 3B | iPhone 14+ (70%+ active) | 6GB+ (60%+ active) | ~65% of smartphones |
    | 7B | iPhone 15 Pro+ (15% active) | 8GB+ flagship (20% active) | ~18% of smartphones |

    Choosing 1B over 3B grows your addressable device market from ~65% to ~90% of smartphones. Choosing 3B over 7B roughly triples it.

    Decision Framework

    Step 1: Define Your Task

    What will the model do in your app?

    | Task Type | Minimum Size | Recommended Size |
    | --- | --- | --- |
    | Classification / tagging | 1B | 1B fine-tuned |
    | Autocomplete / suggestions | 1B | 1B fine-tuned |
    | Short Q&A (1-2 sentences) | 1B | 1B fine-tuned |
    | Chat (multi-turn) | 3B | 3B fine-tuned |
    | Summarization | 3B | 3B fine-tuned |
    | Content drafting | 3B | 3B fine-tuned |
    | Translation | 1-3B | 3B fine-tuned |
    | Complex reasoning | 3B+ | 3B fine-tuned (test first) |

    Step 2: Know Your Audience

    What devices do your users have? Check your analytics for device RAM distribution. If 80%+ of your users have 6GB+ RAM, 3B is safe. If you target developing markets or budget-conscious users, 1B is the safer choice.

    Step 3: Fine-Tune and Test

    Do not guess. Fine-tune both 1B and 3B on your domain data using a platform like Ertas. Test both against your quality benchmarks. Choose the smallest model that meets your quality bar.

    The fine-tuning investment is small ($5-50 per training run) and the testing gives you empirical evidence instead of assumptions.
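The "choose the smallest model that meets your quality bar" step can be sketched as a selection over your own eval results. The model names and accuracy numbers below are hypothetical placeholders, not benchmark data:

```typescript
interface Candidate {
  name: string;
  fileSizeGb: number;
  evalAccuracy: number; // accuracy on your own benchmark set
}

// Return the smallest candidate that clears the quality bar;
// undefined means nothing passed and the task or training data
// needs revisiting before going larger.
function smallestPassing(
  candidates: Candidate[],
  qualityBar: number
): Candidate | undefined {
  return candidates
    .filter((c) => c.evalAccuracy >= qualityBar)
    .sort((a, b) => a.fileSizeGb - b.fileSizeGb)[0];
}

// Hypothetical eval numbers for illustration.
const results: Candidate[] = [
  { name: "1B-finetuned", fileSizeGb: 0.6, evalAccuracy: 0.91 },
  { name: "3B-finetuned", fileSizeGb: 1.7, evalAccuracy: 0.95 },
];
console.log(smallestPassing(results, 0.9)?.name); // "1B-finetuned"
console.log(smallestPassing(results, 0.93)?.name); // "3B-finetuned"
```

If both models pass, the smaller one wins by default: it ships faster, runs faster, and covers more devices.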

    Step 4: Offer Both

    The ideal architecture detects available RAM at runtime and loads the appropriate model:

    • 4-6GB devices: 1B fine-tuned
    • 6GB and above: 3B fine-tuned
    • Below 4GB: fall back to a cloud API (or ship no AI feature)

    This maximizes both quality and device coverage.
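A minimal sketch of that runtime selection, assuming the thresholds above. On-device, the RAM figure would come from the platform (e.g. ActivityManager.MemoryInfo on Android, ProcessInfo.physicalMemory on iOS); here it is simply a parameter:

```typescript
type ModelTier = "1B" | "3B" | "cloud-fallback";

// Map total device RAM to the model tier described above.
function selectTier(totalRamGb: number): ModelTier {
  if (totalRamGb >= 6) return "3B"; // 6GB+: 3B fine-tuned
  if (totalRamGb >= 4) return "1B"; // 4-6GB: 1B fine-tuned
  return "cloud-fallback"; // below 4GB: cloud API or no AI feature
}

console.log(selectTier(8)); // "3B"
console.log(selectTier(4)); // "1B"
console.log(selectTier(3)); // "cloud-fallback"
```

Run the check once at startup and cache the result; total RAM does not change between launches.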

    Summary

    | | 1B | 3B | 7B |
    | --- | --- | --- | --- |
    | File size (Q4) | ~600MB | ~1.7GB | ~4GB |
    | Speed (flagship) | 35-50 tok/s | 18-30 tok/s | 5-12 tok/s |
    | Device coverage | ~90% | ~65% | ~18% |
    | Best use case | Classification, suggestions | Chat, generation | Rarely appropriate for mobile |
    | Fine-tuned quality | Exceeds prompted cloud APIs | Significantly exceeds | Not needed if 3B is fine-tuned |

    Start with the smallest model that handles your task. Fine-tune it. Test it. Only go larger if the quality is genuinely insufficient.

    Ship AI that runs on your users' devices.

    Early bird pricing starts at $14.50/mo — locked in for life. Plans for builders and agencies.

