    Google Gemini API for Mobile: Pricing, Limits, and When to Go On-Device

    Google's Gemini API offers aggressive pricing and native Android integration. Here's what the pricing actually looks like at scale, where the free tier ends, and when on-device models make more sense.

    Ertas Team

    Google's Gemini is the cheapest major AI API. Gemini 2.0 Flash costs $0.10 per million input tokens and $0.40 per million output tokens. That is 33% cheaper than GPT-4o-mini. There is also a free tier with generous limits.

    For Android developers, Google offers a native SDK that integrates directly with your Kotlin code. No REST wrangling required.

    This sounds ideal for mobile apps. Here is where the reality gets more nuanced.

    The Pricing Advantage

    Model                            Input (per 1M tokens)    Output (per 1M tokens)
    Gemini 2.0 Flash                 $0.10                    $0.40
    Gemini 2.0 Flash-Lite            $0.075                   $0.30
    Gemini 1.5 Pro                   $1.25                    $5.00
    GPT-4o-mini (comparison)         $0.15                    $0.60
    Claude 3.5 Haiku (comparison)    $0.80                    $4.00

    Gemini Flash is genuinely the cheapest option for per-token inference from a major provider. Flash-Lite is even cheaper if you can accept slightly reduced capability.
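To make those per-million rates concrete, here is a rough per-request calculation. The even 500/500 input-output token split is an illustrative assumption, not a benchmark:

```kotlin
// Per-request cost from the per-1M-token prices in the table above.
// The 500 input / 500 output token split is an illustrative assumption.
fun perRequestCost(inputPerM: Double, outputPerM: Double,
                   inputTokens: Int = 500, outputTokens: Int = 500): Double =
    inputTokens / 1e6 * inputPerM + outputTokens / 1e6 * outputPerM

fun main() {
    println(perRequestCost(0.10, 0.40)) // Gemini 2.0 Flash: ~$0.00025 per request
    println(perRequestCost(0.15, 0.60)) // GPT-4o-mini: ~$0.000375 per request
}
```

At these magnitudes, the absolute difference per request is tiny; it only matters once request volume is large, which is exactly the scaling question the rest of this post addresses.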

    The Free Tier

    Google offers a free tier for Gemini API through Google AI Studio:

    • Rate limit: 15 requests per minute
    • Daily limit: 1,500 requests per day
    • Token limit: 1 million tokens per minute
    • No credit card required

    This is generous for development and testing. It can even support a small production app with limited traffic: at 1,500 requests per day, you could serve roughly 50 daily active users making 30 requests each.
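That estimate is simple division against the daily quota; a quick sanity check, using the free-tier limits above and an assumed per-user request volume:

```kotlin
// How many users the free tier's daily request quota can cover,
// given an assumed per-user request volume (30/day is the article's figure).
fun freeTierCapacity(dailyRequestLimit: Int, requestsPerUserPerDay: Int): Int =
    dailyRequestLimit / requestsPerUserPerDay

fun main() {
    println(freeTierCapacity(1_500, 30)) // 50 users at 30 requests/day each
}
```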

    The catch: the free tier has no SLA, no guaranteed uptime, and Google can change the terms at any time. It is not a production foundation.

    The Native Android SDK

    Google provides the Google AI Client SDK for Android, which is the cleanest mobile integration of any AI provider:

    // build.gradle: implementation("com.google.ai.client.generativeai:generativeai:<version>")
    import com.google.ai.client.generativeai.GenerativeModel

    val model = GenerativeModel(
        modelName = "gemini-2.0-flash",
        apiKey = BuildConfig.GEMINI_API_KEY // keep keys out of shipped builds; proxy in production
    )

    // Simple generation -- generateContent is a suspend function,
    // so call it from a coroutine (e.g. viewModelScope.launch { ... })
    val response = model.generateContent("Summarize this article: $text")
    println(response.text)

    // Streaming -- also suspend; collect receives partial chunks as they arrive
    var responseText = ""
    model.generateContentStream("Draft a reply to: $email").collect { chunk ->
        responseText += chunk.text ?: ""
    }
    

    This is cleaner than raw REST calls to OpenAI. The SDK handles serialization, error handling, and streaming.

    For iOS, there is a Swift SDK available through Swift Package Manager that follows the same pattern.

    Cost at Scale

    Even at the cheapest per-token rate, linear scaling with users still adds up.

    Using a baseline of 3 interactions per user per day at 1,000 tokens each, priced at Gemini Flash rates ($0.10 input / $0.40 output per 1M tokens):

    MAU        Naive Monthly Cost    Real Cost (3x multiplier)
    1,000      $22.50                $67.50
    5,000      $112.50               $337.50
    10,000     $225.00               $675.00
    50,000     $1,125.00             $3,375.00
    100,000    $2,250.00             $6,750.00

    At 100K MAU, Gemini Flash costs $6,750/month with the real-cost multiplier. That is cheaper than GPT-4o-mini ($10,125) but still a material expense that grows with every user.
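The table above can be reproduced with a small cost model. One caveat: the even 500/500 input-output token split below is an assumption chosen so the numbers match the article's baseline, not a measured traffic profile:

```kotlin
// Sketch of the cost model behind the table above. The even 500/500
// input/output token split is an assumption, not a measured profile.
fun monthlyCost(
    mau: Int,
    interactionsPerDay: Int = 3,
    inputTokens: Int = 500,
    outputTokens: Int = 500,
    inputPricePerM: Double = 0.10,  // Gemini 2.0 Flash, $/1M input tokens
    outputPricePerM: Double = 0.40, // Gemini 2.0 Flash, $/1M output tokens
    multiplier: Double = 1.0        // set to 3.0 for the "real cost" column
): Double {
    val interactions = mau.toDouble() * interactionsPerDay * 30 // ~30 days/month
    val inputCost = interactions * inputTokens / 1e6 * inputPricePerM
    val outputCost = interactions * outputTokens / 1e6 * outputPricePerM
    return (inputCost + outputCost) * multiplier
}

fun main() {
    println(monthlyCost(1_000))                     // ~22.5 (naive)
    println(monthlyCost(100_000, multiplier = 3.0)) // ~6750.0 (real cost)
}
```

Swapping in other providers' rates (say, $0.15/$0.60 for GPT-4o-mini) is a one-line change, which makes this a convenient way to compare bills before committing.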

    Gemini Nano: Google's On-Device Option

    Google has its own on-device model: Gemini Nano. It runs directly on the phone via Android AICore. Zero API costs, zero network latency.

    The limitations are significant:

    Device restrictions: Only available on Pixel 8/9 series and Samsung Galaxy S24/S25 series. That is a fraction of the Android market.

    No custom models: You cannot fine-tune Gemini Nano. You cannot use your own models. You get Google's pre-configured capabilities.

    Limited tasks: Summarization, smart reply, and a few other pre-defined capabilities. No open-ended text generation with custom behavior.

    API restrictions: Access is through the AICore API, which is not the same as the Gemini Cloud API. Different integration, different capabilities.

    For developers who need on-device AI that works across all Android devices with custom model behavior, Gemini Nano is not the solution.

    Gemini API vs Gemini Nano vs GGUF On-Device

    Factor                Gemini API (Flash)     Gemini Nano              GGUF + llama.cpp
    Cost per inference    $0.0001-$0.001         $0                       $0
    Device coverage       All (with internet)    Pixel 8+, Galaxy S24+    Any 4GB+ device
    Custom models         No                     No                       Yes (any GGUF)
    Fine-tuning           No                     No                       Full LoRA/QLoRA
    Offline               No                     Yes                      Yes
    Tasks supported       All (cloud model)      Limited pre-defined      All text generation
    Model control         None                   None                     Full
    Domain accuracy       Good (prompted)        N/A                      Excellent (fine-tuned)

    When to Use Each

    Gemini API is the best cloud API choice for cost-sensitive mobile apps. If you need a cloud API for prototyping or low-volume features, Gemini Flash is the most economical option. The native Android SDK makes integration smoother than competing providers.

    Gemini Nano is useful if your app exclusively targets Pixel and Samsung flagships and you only need summarization or smart reply. For most developers, the device restrictions make it impractical as a primary AI strategy.

    GGUF + llama.cpp is the right choice when you need custom AI behavior across all devices, offline support, privacy, and zero per-inference cost. Fine-tune a model on your domain data using a platform like Ertas, export as GGUF, and run on any device with enough RAM.

    The Practical Path

    Start with Gemini Flash for the cheapest possible cloud AI validation. Use the free tier during development and early testing. Monitor your token usage and cost per user from day one.

    When you cross 5,000 MAU or when your monthly Gemini bill exceeds the one-time cost of fine-tuning, it is time to evaluate on-device. Your Gemini API logs contain the training data you need for fine-tuning. The migration path is straightforward: extract training data, fine-tune with LoRA, export GGUF, integrate llama.cpp, and A/B test against your Gemini baseline.
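The "extract training data" step can start as a very small script. A minimal sketch, assuming you log each call as a prompt/response pair -- the Interaction type and field names here are hypothetical, not part of any Google SDK:

```kotlin
// Hypothetical sketch: turn logged Gemini prompt/response pairs into
// JSONL training examples for LoRA fine-tuning. The Interaction class
// and field names are illustrative -- adapt to your own logging schema.
data class Interaction(val prompt: String, val response: String)

fun toJsonlLine(i: Interaction): String {
    fun esc(s: String) = s.replace("\\", "\\\\")
        .replace("\"", "\\\"")
        .replace("\n", "\\n")
    return """{"prompt": "${esc(i.prompt)}", "completion": "${esc(i.response)}"}"""
}

fun main() {
    val logs = listOf(Interaction("Summarize: battery tips", "Charge between 20-80%."))
    logs.forEach { println(toJsonlLine(it)) } // one JSON object per line
}
```

In practice you would also filter for quality (drop refusals, truncated responses, and user data you are not permitted to train on) before handing the file to a fine-tuning pipeline.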

    Ship AI that runs on your users' devices.

    Early bird pricing starts at $14.50/mo — locked in for life. Plans for builders and agencies.
