
AI API Pricing for Mobile: The Real Cost Per User
How to calculate the true cost of AI per mobile app user. Provider comparison, hidden multipliers, and the unit economics that determine whether your AI feature is sustainable.
Your AI feature costs $0.003 per request. That sounds like nothing. But each user makes 3-5 requests per day. You have 10,000 MAU. The monthly bill is $2,700-$4,500. If your app charges $4.99/month, AI just ate 5-9% of gross revenue.
The cost per user is the number that matters. Not cost per token, not cost per request. Cost per user per month determines whether your AI feature is sustainable at scale.
Calculating Cost Per User
The formula:
Cost per user per month = (tokens per request) × (requests per user per day) × 30 × (price per token)
Input and output tokens are priced differently, so in practice you run the formula twice (once at the input rate, once at the output rate) and sum the results.
But this formula only works if you account for all token sources. Most developers miss three of them.
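As a sketch, here is the formula in Python, split across the two rates. The example rates of $0.15/M input and $0.60/M output are assumptions, roughly in line with GPT-4o-mini's published pricing:

```python
def monthly_cost_per_user(
    input_tokens_per_request: int,
    output_tokens_per_request: int,
    requests_per_day: float,
    input_price_per_m: float,    # dollars per 1M input tokens
    output_price_per_m: float,   # dollars per 1M output tokens
    days_per_month: int = 30,
) -> float:
    """Run the cost formula once per token rate and sum the results."""
    requests = requests_per_day * days_per_month
    input_cost = requests * input_tokens_per_request / 1e6 * input_price_per_m
    output_cost = requests * output_tokens_per_request / 1e6 * output_price_per_m
    return input_cost + output_cost

# The naive "1,000 tokens per request" estimate at the assumed rates:
print(round(monthly_cost_per_user(1_000, 400, 3, 0.15, 0.60), 3))  # → 0.035
```

Plug in the realistic per-request totals from the next section and the same call shows how quickly the naive estimate falls apart.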
Token Sources Per Request
User input: The actual text the user sends. Typically 50-300 tokens for mobile (short messages, search queries, brief prompts).
System prompt: Sent with every request. Usually 800-1,500 tokens. This is your app's instructions to the model: persona, formatting rules, guardrails, context about the app.
Conversation history: For chat-style features, all prior messages are re-sent with each request. A 5-turn conversation means turn 5 includes all 4 previous exchanges, so the input token count climbs with every turn.
RAG context: If you inject retrieved documents or product knowledge, add 500-3,000 tokens per request.
Realistic Token Count Per Request
| Component | Tokens | Sent Every Request? |
|---|---|---|
| System prompt | 1,200 | Yes |
| User input | 200 | Yes |
| Conversation history (avg) | 1,500 | Yes (chat features) |
| RAG context | 1,000 | If applicable |
| Model output | 400 | Yes |
| Total (chat + RAG) | 4,300 | |
| Total (single-turn) | 1,800 | |
The naive estimate of "1,000 tokens per request" undercounts by 2-4x.
Provider Comparison: Cost Per User Per Month
Using realistic token counts, 3 requests per user per day, 30 days per month.
Single-Turn Features (no chat history)
1,800 total tokens per request (1,400 input + 400 output). The input-cost column below prices the full 1,800 at the input rate, a slight overcount that keeps the estimates conservative. 90 requests per user per month.
| Provider/Model | Input Cost | Output Cost | Total/User/Month |
|---|---|---|---|
| Gemini 2.0 Flash | $0.016 | $0.014 | $0.030 |
| GPT-4o-mini | $0.024 | $0.022 | $0.046 |
| GPT-4.1-mini | $0.065 | $0.058 | $0.123 |
| Claude 3.5 Haiku | $0.130 | $0.144 | $0.274 |
| GPT-4o | $0.405 | $0.360 | $0.765 |
| Claude 3.5 Sonnet | $0.486 | $0.540 | $1.026 |
Chat Features (with conversation history)
4,300 total tokens per request (3,900 input + 400 output). The input-cost column below prices the full total at the input rate, a slight overcount that keeps the estimates conservative. 90 requests per user per month.
| Provider/Model | Input Cost | Output Cost | Total/User/Month |
|---|---|---|---|
| Gemini 2.0 Flash | $0.039 | $0.014 | $0.053 |
| GPT-4o-mini | $0.058 | $0.022 | $0.080 |
| GPT-4.1-mini | $0.155 | $0.058 | $0.213 |
| Claude 3.5 Haiku | $0.310 | $0.144 | $0.454 |
| GPT-4o | $0.968 | $0.360 | $1.328 |
| Claude 3.5 Sonnet | $1.161 | $0.540 | $1.701 |
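The chat rows can be sanity-checked in a few lines. The $0.10/M input and $0.40/M output rates are the Gemini 2.0 Flash prices implied by the table; the other rows follow the same arithmetic:

```python
requests_per_month = 3 * 30                 # 3 requests/day x 30 days

input_tokens = 4_300 * requests_per_month   # 387,000 tokens
output_tokens = 400 * requests_per_month    # 36,000 tokens

input_cost = input_tokens / 1e6 * 0.10      # $0.0387
output_cost = output_tokens / 1e6 * 0.40    # $0.0144
print(round(input_cost + output_cost, 3))   # → 0.053
```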
What This Means at Scale
| MAU | Gemini Flash | GPT-4o-mini | Claude Haiku | GPT-4o |
|---|---|---|---|---|
| 1,000 | $53 | $80 | $454 | $1,328 |
| 10,000 | $530 | $800 | $4,540 | $13,280 |
| 50,000 | $2,650 | $4,000 | $22,700 | $66,400 |
| 100,000 | $5,300 | $8,000 | $45,400 | $132,800 |
The Sustainability Threshold
If your app charges $4.99/month per user, what percentage of revenue does AI consume?
| Model | Cost/User | % of $4.99 Revenue | Sustainable? |
|---|---|---|---|
| Gemini Flash (chat) | $0.053 | 1.1% | Yes |
| GPT-4o-mini (chat) | $0.080 | 1.6% | Yes |
| GPT-4.1-mini (chat) | $0.213 | 4.3% | Marginal |
| Claude Haiku (chat) | $0.454 | 9.1% | Risky |
| GPT-4o (chat) | $1.328 | 26.6% | No |
| Claude Sonnet (chat) | $1.701 | 34.1% | No |
At 1-2% of revenue, AI costs are sustainable. At 5-10%, they compete with other cost centers. Above 10%, they threaten margins.
But these numbers assume 3 requests per day per user. Power users who make 10-20 requests per day cost 3-7x more. If 10% of your users are power users, they can represent 30-50% of your AI spend.
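That spend concentration is easy to verify. A sketch assuming a hypothetical split of 10% power users at 15 requests/day against the 3/day baseline:

```python
power_share, power_rpd = 0.10, 15   # assumed power-user segment
casual_share, casual_rpd = 0.90, 3  # everyone else at the baseline

# Spend is proportional to requests, so the power users' share of
# total AI spend is their share of total requests.
power_requests = power_share * power_rpd
casual_requests = casual_share * casual_rpd
print(round(power_requests / (power_requests + casual_requests), 2))  # → 0.36
```

One user in ten ends up responsible for more than a third of the bill, squarely inside the 30-50% range above.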
Hidden Cost Multipliers
Retries
At scale, 2-5% of API calls fail (rate limits, timeouts, server errors). Each retry re-sends the full payload. Budget an extra 3-5% on total token spend.
Prompt Engineering Overhead
As you iterate on your system prompt, it tends to grow. What starts at 500 tokens ends up at 1,500. Every added instruction, guardrail, or example multiplies across every request, every user, every day.
Feature Expansion
One AI feature becomes three. Chat, summarization, and smart suggestions each have their own API calls. Total requests per user per day grow from 3 to 10+.
Free Tier / Freemium
If your app has a free tier with AI features, those users generate cost with zero revenue. A freemium model where 90% of users are free means your paying users must cover 10x their own AI costs.
The Break-Even: Cloud vs On-Device
On-device inference has a mostly fixed cost structure: a one-time fine-tuning run ($5-50) plus CDN distribution (~$0.08/GB per model download, which scales with installs rather than usage). Per-inference cost is $0.
The break-even is simple: once your cumulative cloud API spend exceeds the fine-tuning cost plus distribution, on-device is cheaper.
| Scenario | Monthly Cloud Cost | One-Time Fine-Tuning | Break-Even |
|---|---|---|---|
| 500 MAU, GPT-4o-mini | $40 | $10-30 | Month 1 |
| 1K MAU, Gemini Flash | $53 | $10-30 | Month 1 |
| 5K MAU, GPT-4o-mini | $400 | $10-30 | Month 1 |
At any non-trivial user count, the math favors on-device. The question is not "if" but "when" in your growth trajectory you make the switch.
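The same arithmetic as a sketch, this time also counting the per-download CDN cost the table leaves out. For a hypothetical 1 GB model shipped to 1,000 users, distribution adds about $80, so break-even slips from month 1 to roughly month 3; still trivially fast:

```python
def breakeven_months(
    monthly_cloud_cost: float,
    finetune_cost: float,        # one-time, $5-50 per the estimates above
    model_size_gb: float = 1.0,  # assumed model size
    downloads: int = 1_000,      # assumed install cohort
    cdn_per_gb: float = 0.08,    # CDN rate per GB downloaded
) -> float:
    """Months until cumulative cloud spend exceeds on-device fixed costs."""
    fixed_costs = finetune_cost + model_size_gb * downloads * cdn_per_gb
    return fixed_costs / monthly_cloud_cost

# 1K MAU on Gemini Flash chat pricing ($53/month), $30 fine-tune:
print(round(breakeven_months(53, 30), 2))  # → 2.08
```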
Platforms like Ertas make the switch practical: upload your training data (which you can extract from your existing API logs), fine-tune with LoRA on cloud GPUs, and export a GGUF model ready for mobile deployment. The pipeline takes hours, not weeks.
What To Track
From day one, track these numbers in your analytics:
- Cost per user per month (total AI spend / MAU)
- Cost per paying user (if freemium, only count paying users)
- Requests per user per day (identify power users)
- Tokens per request (watch for system prompt growth)
- AI cost as % of revenue per user
Set alerts. When cost per user crosses $0.10/month, start planning the on-device migration. When it crosses $0.50, execute.
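A minimal sketch of that tracking logic, using the thresholds above as assumed defaults (the function name and report shape are illustrative, not a real analytics API):

```python
def ai_cost_report(total_ai_spend: float, mau: int,
                   paying_users: int, revenue_per_payer: float) -> dict:
    """Summarize the unit-economics numbers worth alerting on."""
    cost_per_user = total_ai_spend / mau
    cost_per_payer = total_ai_spend / paying_users
    pct_of_revenue = cost_per_payer / revenue_per_payer * 100
    if cost_per_user >= 0.50:
        action = "execute on-device migration"
    elif cost_per_user >= 0.10:
        action = "plan on-device migration"
    else:
        action = "ok"
    return {
        "cost_per_user": round(cost_per_user, 3),
        "cost_per_paying_user": round(cost_per_payer, 3),
        "ai_pct_of_revenue": round(pct_of_revenue, 1),
        "action": action,
    }

# 10K MAU, 1K paying at $4.99/month, $800/month AI spend:
print(ai_cost_report(800, 10_000, 1_000, 4.99))
```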
Ship AI that runs on your users' devices.
Early bird pricing starts at $14.50/mo — locked in for life. Plans for builders and agencies.