Back to blog
    AI API Pricing for Mobile: The Real Cost Per User
    cost optimizationAPI pricingmobile AIunit economicssegment:mobile-builder

    AI API Pricing for Mobile: The Real Cost Per User

    How to calculate the true cost of AI per mobile app user. Provider comparison, hidden multipliers, and the unit economics that determine whether your AI feature is sustainable.

    EErtas Team·

    Your AI feature costs $0.003 per request. That sounds like nothing. But each user makes 3-5 requests per day. You have 10,000 MAU. The monthly bill is $2,700-$4,500. If your app charges $4.99/month, AI just ate 5-9% of gross revenue.

    The cost per user is the number that matters. Not cost per token, not cost per request. Cost per user per month determines whether your AI feature is sustainable at scale.

    Calculating Cost Per User

    The formula:

    Cost per user per month = (tokens per request) * (requests per user per day) * 30 * (price per token)

    But this formula only works if you account for all token sources. Most developers miss three of them.

    Token Sources Per Request

    User input: The actual text the user sends. Typically 50-300 tokens for mobile (short messages, search queries, brief prompts).

    System prompt: Sent with every request. Usually 800-1,500 tokens. This is your app's instructions to the model: persona, formatting rules, guardrails, context about the app.

    Conversation history: For chat-style features, all prior messages are re-sent with each request. A 5-turn conversation means turn 5 includes all 4 previous exchanges.

    RAG context: If you inject retrieved documents or product knowledge, add 500-3,000 tokens per request.

    Realistic Token Count Per Request

    ComponentTokensSent Every Request?
    System prompt1,200Yes
    User input200Yes
    Conversation history (avg)1,500Yes (chat features)
    RAG context1,000If applicable
    Model output400Yes
    Total (chat + RAG)4,300
    Total (single-turn)1,800

    The naive estimate of "1,000 tokens per request" undercounts by 2-4x.

    Provider Comparison: Cost Per User Per Month

    Using realistic token counts, 3 requests per user per day, 30 days per month.

    Single-Turn Features (no chat history)

    1,800 input + 400 output tokens per request. 90 requests per user per month.

    Provider/ModelInput CostOutput CostTotal/User/Month
    Gemini 2.0 Flash$0.016$0.014$0.030
    GPT-4o-mini$0.024$0.022$0.046
    GPT-4.1-mini$0.065$0.058$0.123
    Claude 3.5 Haiku$0.130$0.144$0.274
    GPT-4o$0.405$0.360$0.765
    Claude 3.5 Sonnet$0.486$0.540$1.026

    Chat Features (with conversation history)

    4,300 input + 400 output tokens per request. 90 requests per user per month.

    Provider/ModelInput CostOutput CostTotal/User/Month
    Gemini 2.0 Flash$0.039$0.014$0.053
    GPT-4o-mini$0.058$0.022$0.080
    GPT-4.1-mini$0.155$0.058$0.213
    Claude 3.5 Haiku$0.310$0.144$0.454
    GPT-4o$0.968$0.360$1.328
    Claude 3.5 Sonnet$1.161$0.540$1.701

    What This Means at Scale

    MAUGemini FlashGPT-4o-miniClaude HaikuGPT-4o
    1,000$53$80$454$1,328
    10,000$530$800$4,540$13,280
    50,000$2,650$4,000$22,700$66,400
    100,000$5,300$8,000$45,400$132,800

    The Sustainability Threshold

    If your app charges $4.99/month per user, what percentage of revenue does AI consume?

    ModelCost/User% of $4.99 RevenueSustainable?
    Gemini Flash (chat)$0.0531.1%Yes
    GPT-4o-mini (chat)$0.0801.6%Yes
    GPT-4.1-mini (chat)$0.2134.3%Marginal
    Claude Haiku (chat)$0.4549.1%Risky
    GPT-4o (chat)$1.32826.6%No
    Claude Sonnet (chat)$1.70134.1%No

    At 1-2% of revenue, AI costs are sustainable. At 5-10%, they compete with other cost centers. Above 10%, they threaten margins.

    But these numbers assume 3 requests per day per user. Power users who make 10-20 requests per day cost 3-7x more. If 10% of your users are power users, they can represent 30-50% of your AI spend.

    Hidden Cost Multipliers

    Retries

    At scale, 2-5% of API calls fail (rate limits, timeouts, server errors). Each retry re-sends the full payload. Budget an extra 3-5% on total token spend.

    Prompt Engineering Overhead

    As you iterate on your system prompt, it tends to grow. What starts at 500 tokens ends up at 1,500. Every added instruction, guardrail, or example multiplies across every request, every user, every day.

    Feature Expansion

    One AI feature becomes three. Chat, summarization, and smart suggestions each have their own API calls. Total requests per user per day grow from 3 to 10+.

    Free Tier / Freemium

    If your app has a free tier with AI features, those users generate cost with zero revenue. A freemium model where 90% of users are free means your paying users must cover 10x their own AI costs.

    The Break-Even: Cloud vs On-Device

    On-device inference has a fixed cost structure: one-time fine-tuning ($5-50) plus CDN distribution (~$0.08/GB per model download). Per-inference cost is $0.

    The break-even is simple: when your monthly cloud API bill exceeds the one-time cost of fine-tuning, on-device becomes cheaper.

    ScenarioMonthly Cloud CostOne-Time Fine-TuningBreak-Even
    500 MAU, GPT-4o-mini$40$10-30Month 1
    1K MAU, Gemini Flash$53$10-30Month 1
    5K MAU, GPT-4o-mini$400$10-30Month 1

    At any non-trivial user count, the math favors on-device. The question is not "if" but "when" in your growth trajectory you make the switch.

    Platforms like Ertas make the switch practical: upload your training data (which you can extract from your existing API logs), fine-tune with LoRA on cloud GPUs, and export a GGUF model ready for mobile deployment. The pipeline takes hours, not weeks.

    What To Track

    From day one, track these numbers in your analytics:

    1. Cost per user per month (total AI spend / MAU)
    2. Cost per paying user (if freemium, only count paying users)
    3. Requests per user per day (identify power users)
    4. Tokens per request (watch for system prompt growth)
    5. AI cost as % of revenue per user

    Set alerts. When cost per user crosses $0.10/month, start planning the on-device migration. When it crosses $0.50, execute.

    Ship AI that runs on your users' devices.

    Early bird pricing starts at $14.50/mo — locked in for life. Plans for builders and agencies.

    Ship AI that runs on your users' devices.

    Early bird pricing starts at $14.50/mo — locked in for life. Plans for builders and agencies.

    Keep reading