
On-Device AI Unit Economics: The Math That Makes Mobile AI Profitable
The complete unit economics breakdown for on-device AI vs cloud APIs. Fixed costs, variable costs, break-even analysis, and the financial model for scaling mobile AI features profitably.
Cloud AI costs are variable: every user and every request costs money. On-device AI costs are mostly fixed: fine-tune once, distribute once, and each additional inference costs nothing. The financial structures are fundamentally different, and the implications for mobile app businesses are significant.
This article breaks down the complete cost model for both approaches.
Cloud API Cost Structure
Variable Costs (Scale with Users)
| Cost Component | Per-User Monthly | At 10K MAU | At 100K MAU |
|---|---|---|---|
| API tokens (GPT-4o-mini) | $0.05-0.10 | $500-1,000 | $5,000-10,000 |
| API tokens (Gemini Flash) | $0.03-0.06 | $300-600 | $3,000-6,000 |
| Server infrastructure (proxy/queue) | $0.01-0.02 | $100-200 | $1,000-2,000 |
| Total variable (GPT-4o-mini + infrastructure) | $0.06-0.12 | $600-1,200 | $6,000-12,000 |
Fixed Costs (Do Not Scale)
| Cost Component | Monthly |
|---|---|
| Developer time (prompt engineering, maintenance) | $2,000-5,000 |
| Monitoring and logging | $50-200 |
| Total fixed | $2,050-5,200 |
Total Cloud AI Cost
At 10K MAU: $2,650-6,400/month
At 100K MAU: $8,050-17,200/month
The variable component dominates at scale. At 100K MAU, variable costs are roughly 70-75% of total AI spend ($6,000 of $8,050 at the low end, $12,000 of $17,200 at the high end).
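The totals above follow from a simple linear model: a fixed monthly overhead plus a per-user variable cost. Here is a minimal Python sketch using the GPT-4o-mini plus infrastructure figures from the tables. The function names are illustrative and not part of any library.

```python
def cloud_monthly_cost(mau: int, per_user: float, fixed: float) -> float:
    """Monthly cloud AI cost: fixed overhead plus per-user variable spend."""
    return fixed + per_user * mau


def variable_share(mau: int, per_user: float, fixed: float) -> float:
    """Fraction of the monthly total that scales with user count."""
    variable = per_user * mau
    return variable / (fixed + variable)


# (per-user variable cost, fixed monthly cost) at each end of the ranges above.
SCENARIOS = {"low": (0.06, 2_050), "high": (0.12, 5_200)}

for name, (per_user, fixed) in SCENARIOS.items():
    for mau in (10_000, 100_000):
        total = cloud_monthly_cost(mau, per_user, fixed)
        share = variable_share(mau, per_user, fixed)
        print(f"{name:>4} estimate, {mau:>7,} MAU: "
              f"${total:,.0f}/month ({share:.0%} variable)")
```

Swapping in your own per-user cost from a billing dashboard is the quickest way to see where your app sits on this curve.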
On-Device Cost Structure
One-Time Costs
| Cost Component | Amount | Frequency |
|---|---|---|
| Training data preparation | $500-2,000 (developer time) | Once, then incremental |
| Fine-tuning compute | $5-50 | Per training run |
| llama.cpp integration | $1,000-3,000 (developer time) | Once |
| Testing across devices | $500-1,500 (developer time) | Per model update |
| Total one-time | $2,005-6,550 | |
Recurring Fixed Costs
| Cost Component | Monthly |
|---|---|
| CDN for model distribution | $50-200 (at 100K downloads/month) |
| Model re-training (quarterly) | $5-50 per run = $2-17/month amortized |
| Developer maintenance | $500-1,000 |
| Total recurring | $552-1,217 |
Variable Costs
| Cost Component | Per-User Monthly |
|---|---|
| CDN bandwidth per new user | ~$0.08-0.15 (one-time model download) |
| Per-inference cost | $0.00 |
| Total variable | ~$0.00 (after initial download) |
Total On-Device Cost
At 10K MAU: $552-1,217/month + amortized one-time costs
At 100K MAU: $552-1,217/month + amortized one-time costs
The cost is nearly flat regardless of user count. The CDN cost increases slightly with new user downloads but is minimal compared to API token costs.
Break-Even Analysis
When does on-device become cheaper than cloud APIs?
vs GPT-4o-mini
| MAU | Cloud Monthly | On-Device Monthly | Savings |
|---|---|---|---|
| 500 | $2,680 | $1,052 | $1,628 (61%) |
| 1,000 | $2,750 | $1,052 | $1,698 (62%) |
| 5,000 | $3,150 | $1,052 | $2,098 (67%) |
| 10,000 | $3,650 | $1,102 | $2,548 (70%) |
| 50,000 | $7,550 | $1,152 | $6,398 (85%) |
| 100,000 | $12,550 | $1,217 | $11,333 (90%) |
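Break-even: Under 500 MAU. On a monthly run-rate basis, on-device is cheaper from essentially the first month, because its recurring costs ($552-1,217) sit below the fixed cloud costs alone ($2,050-5,200). The one-time setup cost ($2,005-6,550) is then recovered out of the monthly savings: at 500 MAU ($1,628/month saved), that takes roughly one to four months.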
vs Gemini Flash (Cheapest Cloud API)
| MAU | Cloud Monthly | On-Device Monthly | Savings |
|---|---|---|---|
| 1,000 | $2,380 | $1,052 | $1,328 (56%) |
| 10,000 | $2,950 | $1,102 | $1,848 (63%) |
| 100,000 | $8,250 | $1,217 | $7,033 (85%) |
Even against the cheapest cloud API, on-device saves money from day one at any non-trivial user count.
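The tables above compare monthly run-rate costs only. To fold in the one-time setup cost, divide it by the monthly savings. The sketch below does this for the 500 MAU row of the GPT-4o-mini table, using the one-time range from the On-Device Cost Structure section. The function name is hypothetical.

```python
import math


def months_to_recover(one_time: float, cloud_monthly: float,
                      on_device_monthly: float) -> float:
    """Months until cumulative monthly savings cover the one-time setup cost."""
    savings = cloud_monthly - on_device_monthly
    if savings <= 0:
        return math.inf  # on-device never pays back at this volume
    return one_time / savings


# 500 MAU row: $2,680 cloud vs $1,052 on-device per month.
for one_time in (2_005, 6_550):
    months = months_to_recover(one_time, cloud_monthly=2_680,
                               on_device_monthly=1_052)
    print(f"${one_time:,} setup cost: recovered in {months:.1f} months")
```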
The Scaling Advantage
The financial advantage of on-device compounds as you grow:
Cloud: Growing from 10K to 100K MAU adds $9,000-10,000/month in variable costs.
On-device: Growing from 10K to 100K MAU adds ~$65-115/month in CDN costs.
This is the core insight. Cloud AI margins compress as you scale. On-device AI margins improve as you scale. The infrastructure cost is distributed across more users, each contributing $0 in variable cost.
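One way to see the margin effect is to compute cost per user at different scales. This rough sketch assumes the recurring figures from the earlier tables and ignores one-time costs and the per-download CDN charge. Both function names are illustrative.

```python
def cloud_cost_per_user(mau: int, per_user: float, fixed: float) -> float:
    """Cloud cost per user: the variable rate is a floor that never goes away."""
    return per_user + fixed / mau


def on_device_cost_per_user(mau: int, recurring: float) -> float:
    """On-device cost per user: recurring cost spread over all users."""
    return recurring / mau


for mau in (1_000, 10_000, 100_000):
    cloud = cloud_cost_per_user(mau, per_user=0.06, fixed=2_050)
    device = on_device_cost_per_user(mau, recurring=552)
    print(f"{mau:>7,} MAU: cloud ${cloud:.3f}/user, on-device ${device:.3f}/user")
```

The cloud figure flattens out at the per-user API rate, while the on-device figure keeps shrinking toward zero as MAU grows.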
Impact on App Business Models
Subscription Apps ($4.99/month)
| Model | AI Cost/User | As % of Revenue | Gross Margin Impact |
|---|---|---|---|
| Cloud (GPT-4o-mini) | $0.08 | 1.6% | -1.6% per user |
| Cloud (Gemini Flash) | $0.05 | 1.0% | -1.0% per user |
| On-device | ~$0.01 | 0.2% | -0.2% per user |
On-device reduces AI's margin impact by 5-8x.
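The percentages in the table are just AI cost per user divided by revenue per user. A small sketch to reproduce them, with the helper name chosen only for illustration:

```python
def ai_cost_share(cost_per_user: float, revenue_per_user: float) -> float:
    """AI cost as a fraction of per-user revenue."""
    return cost_per_user / revenue_per_user


PRICE = 4.99  # monthly subscription price from the table above

for label, cost in [
    ("Cloud (GPT-4o-mini)", 0.08),
    ("Cloud (Gemini Flash)", 0.05),
    ("On-device", 0.01),
]:
    print(f"{label:<22} {ai_cost_share(cost, PRICE):.1%} of revenue")
```

The same function works for other price points: pass your own subscription price or average revenue per user as `revenue_per_user`.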
Freemium Apps
Freemium apps are where the difference is starkest. Free users generate cost with zero revenue.
With cloud AI: Every free user costs $0.05-0.10/month in API calls. If 90% of users are free, paying users must cover 10x their own AI costs.
With on-device AI: Free users cost essentially nothing. The model runs on their device. The only cost was the one-time model download (~$0.08-0.15 CDN bandwidth).
This changes the freemium math entirely. You can offer AI features to free users without worrying about cost-per-free-user destroying your margins.
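The freemium multiplier can be made concrete by spreading the AI cost of all users over the paying ones. This is a minimal sketch that assumes free and paid users use the AI feature equally. The function name is hypothetical, and the ~$0.01 on-device figure is the one used in the subscription table.

```python
def ai_cost_per_paying_user(cost_per_user: float, paid_fraction: float) -> float:
    """AI cost each paying user must cover when free users generate no revenue."""
    if not 0 < paid_fraction <= 1:
        raise ValueError("paid_fraction must be in (0, 1]")
    return cost_per_user / paid_fraction


PAID_FRACTION = 0.10  # 90% of users are free

for label, cost in [("Cloud, low", 0.05), ("Cloud, high", 0.10), ("On-device", 0.01)]:
    burden = ai_cost_per_paying_user(cost, PAID_FRACTION)
    print(f"{label:<12} ${burden:.2f} per paying user per month")
```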
Ad-Supported Apps
Average ad revenue per user: $0.50-2.00/month. Cloud AI at $0.05-0.10/user eats 2.5-20% of ad revenue. On-device AI at ~$0.01/user eats 0.5-2%. The difference can be the margin between a sustainable business and an unsustainable one.
The Investment Payback
Think of on-device AI as a capital investment. The upfront cost ($2,000-6,500 for the full pipeline) pays back quickly:
| Cloud Cost Displaced | Payback Period |
|---|---|
| $500/month | 4-13 months |
| $1,000/month | 2-7 months |
| $3,000/month | About 1-2 months |
| $10,000/month | Under 1 month |
At $3,000/month in cloud API costs (common at 30-50K MAU), the entire on-device investment pays for itself in roughly two months.
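The payback figures are the upfront cost divided by the monthly cloud spend displaced. Like the table, this sketch ignores on-device recurring costs, so treat the results as a lower bound on the real payback time. The function name is an assumption for illustration.

```python
def payback_months(upfront: float, displaced_monthly: float) -> float:
    """Months of displaced cloud spend needed to cover the upfront investment."""
    if displaced_monthly <= 0:
        raise ValueError("displaced_monthly must be positive")
    return upfront / displaced_monthly


UPFRONT_LOW, UPFRONT_HIGH = 2_000, 6_500  # full-pipeline range quoted above

for displaced in (500, 1_000, 3_000, 10_000):
    fast = payback_months(UPFRONT_LOW, displaced)
    slow = payback_months(UPFRONT_HIGH, displaced)
    print(f"${displaced:>6,}/month displaced: {fast:.1f}-{slow:.1f} months")
```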
Platforms like Ertas reduce the upfront investment by handling the fine-tuning infrastructure. You bring training data. Ertas provides the compute, training pipeline, and GGUF export. The one-time cost drops to the fine-tuning compute ($5-50) plus your time to prepare training data.
What to Model
Before committing to either approach, build a simple spreadsheet:
- Current cloud AI cost per user (from your billing dashboard)
- Projected user growth (monthly)
- Cloud cost curve (cost per user * projected MAU)
- On-device fixed cost (fine-tuning + integration + maintenance)
- Break-even month (when cumulative cloud costs exceed cumulative on-device costs)
For most mobile apps, the break-even is months, not years. The earlier you make the switch, the more you save over the lifetime of the product.
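The spreadsheet described above can also be sketched in a few lines of Python. All inputs in the example call are hypothetical placeholders, loosely based on the ranges in this article, and the function name is illustrative. Substitute your own billing and growth numbers.

```python
from typing import Optional


def break_even_month(start_mau: float, monthly_growth: float,
                     cloud_per_user: float, cloud_fixed: float,
                     on_device_one_time: float, on_device_monthly: float,
                     horizon: int = 60) -> Optional[int]:
    """First month (1-indexed) in which cumulative cloud cost exceeds
    cumulative on-device cost, or None if that never happens in `horizon`."""
    cloud_total = 0.0
    on_device_total = on_device_one_time  # setup cost is paid up front
    mau = float(start_mau)
    for month in range(1, horizon + 1):
        cloud_total += cloud_fixed + cloud_per_user * mau
        on_device_total += on_device_monthly
        if cloud_total > on_device_total:
            return month
        mau *= 1 + monthly_growth  # compound user growth
    return None


# Hypothetical inputs: 1,000 MAU growing 10% per month.
month = break_even_month(start_mau=1_000, monthly_growth=0.10,
                         cloud_per_user=0.10, cloud_fixed=2_650,
                         on_device_one_time=6_550, on_device_monthly=1_217)
print(f"Break-even month: {month}")
```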
Ship AI that runs on your users' devices.
Early bird pricing starts at $14.50/mo — locked in for life. Plans for builders and agencies.