
Google Gemini API for Mobile: Pricing, Limits, and When to Go On-Device
Google's Gemini API offers aggressive pricing and native Android integration. Here's what the pricing actually looks like at scale, where the free tier ends, and when on-device models make more sense.
Google's Gemini is the cheapest major AI API. Gemini 2.0 Flash costs $0.10 per million input tokens and $0.40 per million output tokens. That is 33% cheaper than GPT-4o-mini. There is also a free tier with generous limits.
For Android developers, Google offers a native SDK that integrates directly with your Kotlin code. No REST wrangling required.
This sounds ideal for mobile apps. Here is where the reality gets more nuanced.
The Pricing Advantage
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| Gemini 2.0 Flash | $0.10 | $0.40 |
| Gemini 2.0 Flash-Lite | $0.075 | $0.30 |
| Gemini 1.5 Pro | $1.25 | $5.00 |
| GPT-4o-mini (comparison) | $0.15 | $0.60 |
| Claude 3.5 Haiku (comparison) | $0.80 | $4.00 |
Gemini Flash is genuinely the cheapest option for per-token inference from a major provider. Flash-Lite is even cheaper if you can accept slightly reduced capability.
The Free Tier
Google offers a free tier for Gemini API through Google AI Studio:
- Rate limit: 15 requests per minute
- Daily limit: 1,500 requests per day
- Token limit: 1 million tokens per minute
- No credit card required
This is generous for development and testing. It can even support a small production app with limited traffic. At 1,500 requests per day, you could serve roughly 50 daily active users at 30 requests per user per day.
The catch: the free tier has no SLA, no guaranteed uptime, and Google can change the terms at any time. It is not a production foundation.
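The headroom math above is simple enough to sketch. The 1,500 requests/day cap is the published free-tier figure; the per-user request rate is an assumption you should replace with your own measured numbers:

```kotlin
// Hypothetical helper: how many active users the free tier's daily
// request cap can support at a given per-user usage rate.
fun freeTierDailyUsers(requestsPerUserPerDay: Int, dailyCap: Int = 1_500): Int =
    dailyCap / requestsPerUserPerDay
```

At 30 requests per user per day this gives about 50 users. Note these are daily active users: the cap resets every day, so it constrains daily traffic, not monthly.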
The Native Android SDK
Google provides the Google AI Client SDK for Android, which is the cleanest mobile integration of any AI provider:
```kotlin
import com.google.ai.client.generativeai.GenerativeModel

val model = GenerativeModel(
    modelName = "gemini-2.0-flash",
    apiKey = BuildConfig.GEMINI_API_KEY
)

// Simple generation
val response = model.generateContent("Summarize this article: $text")
println(response.text)

// Streaming
var responseText = ""
model.generateContentStream("Draft a reply to: $email").collect { chunk ->
    responseText += chunk.text ?: ""
}
```
This is cleaner than raw REST calls to OpenAI. The SDK handles serialization, error handling, and streaming.
For iOS, there is a Swift SDK available through Swift Package Manager that follows the same pattern.
Cost at Scale
Even at the cheapest per-token rate, linear scaling with users still adds up.
Using the same baseline (3 interactions/day, 1,000 tokens each, Gemini Flash at $0.10/$0.40):
| MAU | Naive Monthly Cost | Real Cost (3x multiplier) |
|---|---|---|
| 1,000 | $22.50 | $67.50 |
| 5,000 | $112.50 | $337.50 |
| 10,000 | $225.00 | $675.00 |
| 50,000 | $1,125.00 | $3,375.00 |
| 100,000 | $2,250.00 | $6,750.00 |
At 100K MAU, Gemini Flash costs $6,750/month with the real-cost multiplier. That is cheaper than GPT-4o-mini ($10,125) but still a material expense that grows with every user.
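A minimal sketch of the arithmetic behind the table, assuming each 1,000-token interaction splits evenly into 500 input and 500 output tokens (the split is an assumption; the per-token rates are Gemini 2.0 Flash's published prices):

```kotlin
// Gemini 2.0 Flash published rates, USD per 1M tokens.
const val INPUT_PER_M = 0.10
const val OUTPUT_PER_M = 0.40

// Monthly cost for a given MAU count under the baseline assumptions:
// 3 interactions/day, 30 days/month, 500 input + 500 output tokens each.
// Set multiplier = 3.0 to reproduce the "real cost" column.
fun monthlyCost(
    mau: Int,
    interactionsPerDay: Int = 3,
    inputTokens: Int = 500,
    outputTokens: Int = 500,
    multiplier: Double = 1.0
): Double {
    val perInteraction = inputTokens * INPUT_PER_M / 1_000_000 +
        outputTokens * OUTPUT_PER_M / 1_000_000
    return mau * interactionsPerDay * 30 * perInteraction * multiplier
}
```

Plugging in 1,000 MAU reproduces the $22.50 naive figure, and 100,000 MAU with the 3x multiplier reproduces $6,750.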
Gemini Nano: Google's On-Device Option
Google has its own on-device model: Gemini Nano. It runs directly on the phone via Android AICore. Zero API costs, no network latency.
The limitations are significant:
Device restrictions: Only available on Pixel 8/9 series and Samsung Galaxy S24/S25 series. That is a fraction of the Android market.
No custom models: You cannot fine-tune Gemini Nano. You cannot use your own models. You get Google's pre-configured capabilities.
Limited tasks: Summarization, smart reply, and a few other pre-defined capabilities. No open-ended text generation with custom behavior.
API restrictions: Access is through the AICore API, which is not the same as the Gemini Cloud API. Different integration, different capabilities.
For developers who need on-device AI that works across all Android devices with custom model behavior, Gemini Nano is not the solution.
Gemini API vs Gemini Nano vs GGUF On-Device
| Factor | Gemini API (Flash) | Gemini Nano | GGUF + llama.cpp |
|---|---|---|---|
| Cost per inference | $0.0001-$0.001 | $0 | $0 |
| Device coverage | All (with internet) | Pixel 8+, Galaxy S24+ | Any 4GB+ device |
| Custom models | No | No | Yes (any GGUF) |
| Fine-tuning | No | No | Full LoRA/QLoRA |
| Offline | No | Yes | Yes |
| Tasks supported | All (cloud model) | Limited pre-defined | All text generation |
| Model control | None | None | Full |
| Domain accuracy | Good (prompted) | N/A | Excellent (fine-tuned) |
When to Use Each
Gemini API is the best cloud API choice for cost-sensitive mobile apps. If you need a cloud API for prototyping or low-volume features, Gemini Flash is the most economical option. The native Android SDK makes integration smoother than competing providers.
Gemini Nano is useful if your app exclusively targets Pixel and Samsung flagships and you only need summarization or smart reply. For most developers, the device restrictions make it impractical as a primary AI strategy.
GGUF + llama.cpp is the right choice when you need custom AI behavior across all devices, offline support, privacy, and zero per-inference cost. Fine-tune a model on your domain data using a platform like Ertas, export as GGUF, and run on any device with enough RAM.
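The guidance above can be condensed into a rough decision helper. This is illustrative, not an official API; the function and its parameters are made up to encode the trade-offs described in this section:

```kotlin
enum class AiBackend { GEMINI_API, GEMINI_NANO, GGUF_ON_DEVICE }

// Hypothetical chooser encoding the guidance above: custom models or
// broad offline support force GGUF; flagship-only apps with pre-defined
// tasks can use Nano; everything else defaults to the cloud API.
fun chooseBackend(
    needsOffline: Boolean,
    needsCustomModel: Boolean,
    flagshipOnly: Boolean,        // app targets only Pixel 8+/Galaxy S24+
    predefinedTasksOnly: Boolean  // summarization / smart reply is enough
): AiBackend = when {
    needsCustomModel || (needsOffline && !flagshipOnly) -> AiBackend.GGUF_ON_DEVICE
    flagshipOnly && predefinedTasksOnly -> AiBackend.GEMINI_NANO
    else -> AiBackend.GEMINI_API
}
```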
The Practical Path
Start with Gemini Flash for the cheapest possible cloud AI validation. Use the free tier during development and early testing. Monitor your token usage and cost per user from day one.
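One way to start tracking usage is a small in-app accumulator. `UsageTracker` is a hypothetical helper, and the characters-divided-by-four token estimate is a rough heuristic, not the API's real tokenizer; prefer the usage metadata on the response when it is available:

```kotlin
// Hypothetical usage tracker. Accumulates estimated token counts per
// session so you can watch cost per user from day one.
class UsageTracker(
    private val inputPerMillion: Double = 0.10,  // Gemini 2.0 Flash rates
    private val outputPerMillion: Double = 0.40
) {
    var inputTokens: Long = 0
        private set
    var outputTokens: Long = 0
        private set

    // Rough heuristic: ~4 characters per token for English text.
    fun estimateTokens(text: String): Long = (text.length / 4).toLong()

    fun record(prompt: String, response: String) {
        inputTokens += estimateTokens(prompt)
        outputTokens += estimateTokens(response)
    }

    fun costUsd(): Double =
        inputTokens * inputPerMillion / 1_000_000 +
            outputTokens * outputPerMillion / 1_000_000
}
```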
When you cross 5,000 MAU or when your monthly Gemini bill exceeds the one-time cost of fine-tuning, it is time to evaluate on-device. Your Gemini API logs contain the training data you need for fine-tuning. The migration path is straightforward: extract training data, fine-tune with LoRA, export GGUF, integrate llama.cpp, and A/B test against your Gemini baseline.
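The "extract training data" step can be sketched as turning logged prompt/response pairs into JSONL. `LogEntry` and the field names here are hypothetical; fine-tuning tools differ on the exact schema they expect, so adapt the keys to your platform:

```kotlin
// Hypothetical log record; your real logs will have their own schema.
data class LogEntry(val prompt: String, val response: String)

// Minimal JSON string escaping for the three characters that most
// often appear in logged text. A real exporter should use a JSON library.
fun String.jsonEscape(): String =
    replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n")

// One JSON object per line: the common JSONL fine-tuning format.
fun toJsonl(entries: List<LogEntry>): String =
    entries.joinToString("\n") { e ->
        """{"prompt": "${e.prompt.jsonEscape()}", "completion": "${e.response.jsonEscape()}"}"""
    }
```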
Ship AI that runs on your users' devices.
Early bird pricing starts at $14.50/mo — locked in for life. Plans for builders and agencies.