
Google Gemini API for Mobile: Pricing, Limits, and When to Go On-Device
Google's Gemini API offers aggressive pricing and native Android integration. Here's what the pricing actually looks like at scale, where the free tier ends, and when on-device models make more sense.
Google's Gemini is the cheapest major AI API. Gemini 2.0 Flash costs $0.10 per million input tokens and $0.40 per million output tokens. That is 33% cheaper than GPT-4o-mini. There is also a free tier with generous limits.
For Android developers, Google offers a native SDK that integrates directly with your Kotlin code. No REST wrangling required.
This sounds ideal for mobile apps. Here is where the reality gets more nuanced.
The Pricing Advantage
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| Gemini 2.0 Flash | $0.10 | $0.40 |
| Gemini 2.0 Flash-Lite | $0.075 | $0.30 |
| Gemini 1.5 Pro | $1.25 | $5.00 |
| GPT-4o-mini (comparison) | $0.15 | $0.60 |
| Claude 3.5 Haiku (comparison) | $0.80 | $4.00 |
Gemini Flash is genuinely the cheapest option for per-token inference from a major provider. Flash-Lite is even cheaper if you can accept slightly reduced capability.
The Free Tier
Google offers a free tier for Gemini API through Google AI Studio:
- Rate limit: 15 requests per minute
- Daily limit: 1,500 requests per day
- Token limit: 1 million tokens per minute
- No credit card required
This is generous for development and testing. It can even support a small production app with limited traffic. At 1,500 requests per day, you could serve roughly 50 daily active users at 30 requests per user per day.
The catch: the free tier has no SLA, no guaranteed uptime, and Google can change the terms at any time. It is not a production foundation.
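The headroom math above is simple enough to sketch. The 1,500 requests/day cap is the published free-tier figure; the per-user request rate is an assumption you should replace with your own measured numbers:

```kotlin
// Hypothetical helper: how many active users the free tier's daily
// request cap can support at a given per-user usage rate.
fun freeTierDailyUsers(requestsPerUserPerDay: Int, dailyCap: Int = 1_500): Int =
    dailyCap / requestsPerUserPerDay
```

At 30 requests per user per day this gives about 50 users. Note these are daily active users: the cap resets every day, so it constrains daily traffic, not monthly.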
The Native Android SDK
Google provides the Google AI Client SDK for Android, which is the cleanest mobile integration of any AI provider:
```kotlin
import com.google.ai.client.generativeai.GenerativeModel

val model = GenerativeModel(
    modelName = "gemini-2.0-flash",
    apiKey = BuildConfig.GEMINI_API_KEY
)

// Simple generation
val response = model.generateContent("Summarize this article: $text")
println(response.text)

// Streaming
var responseText = ""
model.generateContentStream("Draft a reply to: $email").collect { chunk ->
    responseText += chunk.text ?: ""
}
```
This is cleaner than raw REST calls to OpenAI. The SDK handles serialization, error handling, and streaming.
For iOS, there is a Swift SDK available through Swift Package Manager that follows the same pattern.
Cost at Scale
Even at the cheapest per-token rate, linear scaling with users still adds up.
Using the same baseline (3 interactions/day, 1,000 tokens each, Gemini Flash at $0.10/$0.40):
| MAU | Naive Monthly Cost | Real Cost (3x multiplier) |
|---|---|---|
| 1,000 | $22.50 | $67.50 |
| 5,000 | $112.50 | $337.50 |
| 10,000 | $225.00 | $675.00 |
| 50,000 | $1,125.00 | $3,375.00 |
| 100,000 | $2,250.00 | $6,750.00 |
At 100K MAU, Gemini Flash costs $6,750/month with the real-cost multiplier. That is cheaper than GPT-4o-mini ($10,125) but still a material expense that grows with every user.
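A minimal sketch of the arithmetic behind the table, assuming each 1,000-token interaction splits evenly into 500 input and 500 output tokens (the split is an assumption; the per-token rates are Gemini 2.0 Flash's published prices):

```kotlin
// Gemini 2.0 Flash published rates, USD per 1M tokens.
const val INPUT_PER_M = 0.10
const val OUTPUT_PER_M = 0.40

// Monthly cost for a given MAU count under the baseline assumptions:
// 3 interactions/day, 30 days/month, 500 input + 500 output tokens each.
// Set multiplier = 3.0 to reproduce the "real cost" column.
fun monthlyCost(
    mau: Int,
    interactionsPerDay: Int = 3,
    inputTokens: Int = 500,
    outputTokens: Int = 500,
    multiplier: Double = 1.0
): Double {
    val perInteraction = inputTokens * INPUT_PER_M / 1_000_000 +
        outputTokens * OUTPUT_PER_M / 1_000_000
    return mau * interactionsPerDay * 30 * perInteraction * multiplier
}
```

Plugging in 1,000 MAU reproduces the $22.50 naive figure, and 100,000 MAU with the 3x multiplier reproduces $6,750.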
Gemini Nano: Google's On-Device Option
Google has its own on-device model: Gemini Nano. It runs directly on the phone via Android AICore. Zero API costs, no network latency.
The limitations are significant:
Device restrictions: Only available on Pixel 8/9 series and Samsung Galaxy S24/S25 series. That is a fraction of the Android market.
No custom models: You cannot fine-tune Gemini Nano. You cannot use your own models. You get Google's pre-configured capabilities.
Limited tasks: Summarization, smart reply, and a few other pre-defined capabilities. No open-ended text generation with custom behavior.
API restrictions: Access is through the AICore API, which is not the same as the Gemini Cloud API. Different integration, different capabilities.
For developers who need on-device AI that works across all Android devices with custom model behavior, Gemini Nano is not the solution.
Gemini API vs Gemini Nano vs GGUF On-Device
| Factor | Gemini API (Flash) | Gemini Nano | GGUF + llama.cpp |
|---|---|---|---|
| Cost per inference | $0.0001-$0.001 | $0 | $0 |
| Device coverage | All (with internet) | Pixel 8+, Galaxy S24+ | Any 4GB+ device |
| Custom models | No | No | Yes (any GGUF) |
| Fine-tuning | No | No | Full LoRA/QLoRA |
| Offline | No | Yes | Yes |
| Tasks supported | All (cloud model) | Limited pre-defined | All text generation |
| Model control | None | None | Full |
| Domain accuracy | Good (prompted) | N/A | Excellent (fine-tuned) |
When to Use Each
Gemini API is the best cloud API choice for cost-sensitive mobile apps. If you need a cloud API for prototyping or low-volume features, Gemini Flash is the most economical option. The native Android SDK makes integration smoother than competing providers.
Gemini Nano is useful if your app exclusively targets Pixel and Samsung flagships and you only need summarization or smart reply. For most developers, the device restrictions make it impractical as a primary AI strategy.
GGUF + llama.cpp is the right choice when you need custom AI behavior across all devices, offline support, privacy, and zero per-inference cost. Fine-tune a model on your domain data using a platform like Ertas, export as GGUF, and run on any device with enough RAM.
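The guidance above can be condensed into a rough decision helper. This is illustrative, not an official API; the function and its parameters are made up to encode the trade-offs described in this section:

```kotlin
enum class AiBackend { GEMINI_API, GEMINI_NANO, GGUF_ON_DEVICE }

// Hypothetical chooser encoding the guidance above: custom models or
// broad offline support force GGUF; flagship-only apps with pre-defined
// tasks can use Nano; everything else defaults to the cloud API.
fun chooseBackend(
    needsOffline: Boolean,
    needsCustomModel: Boolean,
    flagshipOnly: Boolean,        // app targets only Pixel 8+/Galaxy S24+
    predefinedTasksOnly: Boolean  // summarization / smart reply is enough
): AiBackend = when {
    needsCustomModel || (needsOffline && !flagshipOnly) -> AiBackend.GGUF_ON_DEVICE
    flagshipOnly && predefinedTasksOnly -> AiBackend.GEMINI_NANO
    else -> AiBackend.GEMINI_API
}
```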
The Practical Path
Start with Gemini Flash for the cheapest possible cloud AI validation. Use the free tier during development and early testing. Monitor your token usage and cost per user from day one.
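One way to start tracking usage is a small in-app accumulator. `UsageTracker` is a hypothetical helper, and the characters-divided-by-four token estimate is a rough heuristic, not the API's real tokenizer; prefer the usage metadata on the response when it is available:

```kotlin
// Hypothetical usage tracker. Accumulates estimated token counts per
// session so you can watch cost per user from day one.
class UsageTracker(
    private val inputPerMillion: Double = 0.10,  // Gemini 2.0 Flash rates
    private val outputPerMillion: Double = 0.40
) {
    var inputTokens: Long = 0
        private set
    var outputTokens: Long = 0
        private set

    // Rough heuristic: ~4 characters per token for English text.
    fun estimateTokens(text: String): Long = (text.length / 4).toLong()

    fun record(prompt: String, response: String) {
        inputTokens += estimateTokens(prompt)
        outputTokens += estimateTokens(response)
    }

    fun costUsd(): Double =
        inputTokens * inputPerMillion / 1_000_000 +
            outputTokens * outputPerMillion / 1_000_000
}
```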
When you cross 5,000 MAU or when your monthly Gemini bill exceeds the one-time cost of fine-tuning, it is time to evaluate on-device. Your Gemini API logs contain the training data you need for fine-tuning. The migration path is straightforward: extract training data, fine-tune with LoRA, export GGUF, integrate llama.cpp, and A/B test against your Gemini baseline.
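The "extract training data" step can be sketched as turning logged prompt/response pairs into JSONL. `LogEntry` and the field names here are hypothetical; fine-tuning tools differ on the exact schema they expect, so adapt the keys to your platform:

```kotlin
// Hypothetical log record; your real logs will have their own schema.
data class LogEntry(val prompt: String, val response: String)

// Minimal JSON string escaping for the three characters that most
// often appear in logged text. A real exporter should use a JSON library.
fun String.jsonEscape(): String =
    replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n")

// One JSON object per line: the common JSONL fine-tuning format.
fun toJsonl(entries: List<LogEntry>): String =
    entries.joinToString("\n") { e ->
        """{"prompt": "${e.prompt.jsonEscape()}", "completion": "${e.response.jsonEscape()}"}"""
    }
```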
Ship AI that runs on your users' devices.
Early bird pricing starts at $14.50/mo — locked in for life. Plans for builders and agencies.