
AI API Pricing for Mobile: The Real Cost Per User
How to calculate the true cost of AI per mobile app user. Provider comparison, hidden multipliers, and the unit economics that determine whether your AI feature is sustainable.
Your AI feature costs $0.003 per request. That sounds like nothing. But each user makes 3-5 requests per day. You have 10,000 MAU. The monthly bill is $2,700-$4,500. If your app charges $4.99/month, AI just ate 5-9% of gross revenue.
The cost per user is the number that matters. Not cost per token, not cost per request. Cost per user per month determines whether your AI feature is sustainable at scale.
Calculating Cost Per User
The formula:
Cost per user per month = (tokens per request) × (requests per user per day) × 30 × (price per token)
Input and output tokens are priced differently, so in practice you run the formula twice (once at the input rate, once at the output rate) and sum the results.
But this formula only works if you account for all token sources. Most developers miss three of them.
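As a sketch, here is the formula in Python, split across the two rates. The example rates of $0.15/M input and $0.60/M output are assumptions, roughly in line with GPT-4o-mini's published pricing:

```python
def monthly_cost_per_user(
    input_tokens_per_request: int,
    output_tokens_per_request: int,
    requests_per_day: float,
    input_price_per_m: float,    # dollars per 1M input tokens
    output_price_per_m: float,   # dollars per 1M output tokens
    days_per_month: int = 30,
) -> float:
    """Run the cost formula once per token rate and sum the results."""
    requests = requests_per_day * days_per_month
    input_cost = requests * input_tokens_per_request / 1e6 * input_price_per_m
    output_cost = requests * output_tokens_per_request / 1e6 * output_price_per_m
    return input_cost + output_cost

# The naive "1,000 tokens per request" estimate at the assumed rates:
print(round(monthly_cost_per_user(1_000, 400, 3, 0.15, 0.60), 3))  # → 0.035
```

Plug in the realistic per-request totals from the next section and the same call shows how quickly the naive estimate falls apart.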
Token Sources Per Request
User input: The actual text the user sends. Typically 50-300 tokens for mobile (short messages, search queries, brief prompts).
System prompt: Sent with every request. Usually 800-1,500 tokens. This is your app's instructions to the model: persona, formatting rules, guardrails, context about the app.
Conversation history: For chat-style features, all prior messages are re-sent with each request. A 5-turn conversation means turn 5 includes all 4 previous exchanges, so the input token count climbs with every turn.
RAG context: If you inject retrieved documents or product knowledge, add 500-3,000 tokens per request.
Realistic Token Count Per Request
| Component | Tokens | Sent Every Request? |
|---|---|---|
| System prompt | 1,200 | Yes |
| User input | 200 | Yes |
| Conversation history (avg) | 1,500 | Yes (chat features) |
| RAG context | 1,000 | If applicable |
| Model output | 400 | Yes |
| Total (chat + RAG) | 4,300 | |
| Total (single-turn) | 1,800 | |
The naive estimate of "1,000 tokens per request" undercounts by 2-4x.
Provider Comparison: Cost Per User Per Month
Using realistic token counts, 3 requests per user per day, 30 days per month.
Single-Turn Features (no chat history)
1,800 total tokens per request (1,400 input + 400 output). The input-cost column below prices the full 1,800 at the input rate, a slight overcount that keeps the estimates conservative. 90 requests per user per month.
| Provider/Model | Input Cost | Output Cost | Total/User/Month |
|---|---|---|---|
| Gemini 2.0 Flash | $0.016 | $0.014 | $0.030 |
| GPT-4o-mini | $0.024 | $0.022 | $0.046 |
| GPT-4.1-mini | $0.065 | $0.058 | $0.123 |
| Claude 3.5 Haiku | $0.130 | $0.144 | $0.274 |
| GPT-4o | $0.405 | $0.360 | $0.765 |
| Claude 3.5 Sonnet | $0.486 | $0.540 | $1.026 |
Chat Features (with conversation history)
4,300 total tokens per request (3,900 input + 400 output). The input-cost column below prices the full total at the input rate, a slight overcount that keeps the estimates conservative. 90 requests per user per month.
| Provider/Model | Input Cost | Output Cost | Total/User/Month |
|---|---|---|---|
| Gemini 2.0 Flash | $0.039 | $0.014 | $0.053 |
| GPT-4o-mini | $0.058 | $0.022 | $0.080 |
| GPT-4.1-mini | $0.155 | $0.058 | $0.213 |
| Claude 3.5 Haiku | $0.310 | $0.144 | $0.454 |
| GPT-4o | $0.968 | $0.360 | $1.328 |
| Claude 3.5 Sonnet | $1.161 | $0.540 | $1.701 |
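The chat rows can be sanity-checked in a few lines. The $0.10/M input and $0.40/M output rates are the Gemini 2.0 Flash prices implied by the table; the other rows follow the same arithmetic:

```python
requests_per_month = 3 * 30                 # 3 requests/day x 30 days

input_tokens = 4_300 * requests_per_month   # 387,000 tokens
output_tokens = 400 * requests_per_month    # 36,000 tokens

input_cost = input_tokens / 1e6 * 0.10      # $0.0387
output_cost = output_tokens / 1e6 * 0.40    # $0.0144
print(round(input_cost + output_cost, 3))   # → 0.053
```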
What This Means at Scale
| MAU | Gemini Flash | GPT-4o-mini | Claude Haiku | GPT-4o |
|---|---|---|---|---|
| 1,000 | $53 | $80 | $454 | $1,328 |
| 10,000 | $530 | $800 | $4,540 | $13,280 |
| 50,000 | $2,650 | $4,000 | $22,700 | $66,400 |
| 100,000 | $5,300 | $8,000 | $45,400 | $132,800 |
The Sustainability Threshold
If your app charges $4.99/month per user, what percentage of revenue does AI consume?
| Model | Cost/User | % of $4.99 Revenue | Sustainable? |
|---|---|---|---|
| Gemini Flash (chat) | $0.053 | 1.1% | Yes |
| GPT-4o-mini (chat) | $0.080 | 1.6% | Yes |
| GPT-4.1-mini (chat) | $0.213 | 4.3% | Marginal |
| Claude Haiku (chat) | $0.454 | 9.1% | Risky |
| GPT-4o (chat) | $1.328 | 26.6% | No |
| Claude Sonnet (chat) | $1.701 | 34.1% | No |
At 1-2% of revenue, AI costs are sustainable. At 5-10%, they compete with other cost centers. Above 10%, they threaten margins.
But these numbers assume 3 requests per day per user. Power users who make 10-20 requests per day cost 3-7x more. If 10% of your users are power users, they can represent 30-50% of your AI spend.
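That spend concentration is easy to verify. A sketch assuming a hypothetical split of 10% power users at 15 requests/day against the 3/day baseline:

```python
power_share, power_rpd = 0.10, 15   # assumed power-user segment
casual_share, casual_rpd = 0.90, 3  # everyone else at the baseline

# Spend is proportional to requests, so the power users' share of
# total AI spend is their share of total requests.
power_requests = power_share * power_rpd
casual_requests = casual_share * casual_rpd
print(round(power_requests / (power_requests + casual_requests), 2))  # → 0.36
```

One user in ten ends up responsible for more than a third of the bill, squarely inside the 30-50% range above.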
Hidden Cost Multipliers
Retries
At scale, 2-5% of API calls fail (rate limits, timeouts, server errors). Each retry re-sends the full payload. Budget an extra 3-5% on total token spend.
Prompt Engineering Overhead
As you iterate on your system prompt, it tends to grow. What starts at 500 tokens ends up at 1,500. Every added instruction, guardrail, or example multiplies across every request, every user, every day.
Feature Expansion
One AI feature becomes three. Chat, summarization, and smart suggestions each have their own API calls. Total requests per user per day grow from 3 to 10+.
Free Tier / Freemium
If your app has a free tier with AI features, those users generate cost with zero revenue. A freemium model where 90% of users are free means your paying users must cover 10x their own AI costs.
The Break-Even: Cloud vs On-Device
On-device inference has a mostly fixed cost structure: a one-time fine-tuning run ($5-50) plus CDN distribution (~$0.08/GB per model download, which scales with installs rather than usage). Per-inference cost is $0.
The break-even is simple: once your cumulative cloud API spend exceeds the fine-tuning cost plus distribution, on-device is cheaper.
| Scenario | Monthly Cloud Cost | One-Time Fine-Tuning | Break-Even |
|---|---|---|---|
| 500 MAU, GPT-4o-mini | $40 | $10-30 | Month 1 |
| 1K MAU, Gemini Flash | $53 | $10-30 | Month 1 |
| 5K MAU, GPT-4o-mini | $400 | $10-30 | Month 1 |
At any non-trivial user count, the math favors on-device. The question is not "if" but "when" in your growth trajectory you make the switch.
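The same arithmetic as a sketch, this time also counting the per-download CDN cost the table leaves out. For a hypothetical 1 GB model shipped to 1,000 users, distribution adds about $80, so break-even slips from month 1 to roughly month 3; still trivially fast:

```python
def breakeven_months(
    monthly_cloud_cost: float,
    finetune_cost: float,        # one-time, $5-50 per the estimates above
    model_size_gb: float = 1.0,  # assumed model size
    downloads: int = 1_000,      # assumed install cohort
    cdn_per_gb: float = 0.08,    # CDN rate per GB downloaded
) -> float:
    """Months until cumulative cloud spend exceeds on-device fixed costs."""
    fixed_costs = finetune_cost + model_size_gb * downloads * cdn_per_gb
    return fixed_costs / monthly_cloud_cost

# 1K MAU on Gemini Flash chat pricing ($53/month), $30 fine-tune:
print(round(breakeven_months(53, 30), 2))  # → 2.08
```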
Platforms like Ertas make the switch practical: upload your training data (which you can extract from your existing API logs), fine-tune with LoRA on cloud GPUs, and export a GGUF model ready for mobile deployment. The pipeline takes hours, not weeks.
What To Track
From day one, track these numbers in your analytics:
- Cost per user per month (total AI spend / MAU)
- Cost per paying user (if freemium, only count paying users)
- Requests per user per day (identify power users)
- Tokens per request (watch for system prompt growth)
- AI cost as % of revenue per user
Set alerts. When cost per user crosses $0.10/month, start planning the on-device migration. When it crosses $0.50, execute.
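A minimal sketch of that tracking logic, using the thresholds above as assumed defaults (the function name and report shape are illustrative, not a real analytics API):

```python
def ai_cost_report(total_ai_spend: float, mau: int,
                   paying_users: int, revenue_per_payer: float) -> dict:
    """Summarize the unit-economics numbers worth alerting on."""
    cost_per_user = total_ai_spend / mau
    cost_per_payer = total_ai_spend / paying_users
    pct_of_revenue = cost_per_payer / revenue_per_payer * 100
    if cost_per_user >= 0.50:
        action = "execute on-device migration"
    elif cost_per_user >= 0.10:
        action = "plan on-device migration"
    else:
        action = "ok"
    return {
        "cost_per_user": round(cost_per_user, 3),
        "cost_per_paying_user": round(cost_per_payer, 3),
        "ai_pct_of_revenue": round(pct_of_revenue, 1),
        "action": action,
    }

# 10K MAU, 1K paying at $4.99/month, $800/month AI spend:
print(ai_cost_report(800, 10_000, 1_000, 4.99))
```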
Ship AI that runs on your users' devices.
Early bird pricing starts at $14.50/mo — locked in for life. Plans for builders and agencies.