
OpenAI API for Mobile Apps: Quick Start and the Costs Nobody Mentions
A practical guide to integrating OpenAI's API into iOS and Android apps, with honest cost projections at 1K to 100K users that most tutorials skip.
Every tutorial on adding AI to a mobile app starts the same way: get an API key, make a POST request, display the response. Simple. What none of them mention is what happens to your bill when actual users start using the feature.
This guide gives you both halves. The quick-start integration you need to ship, and the cost math you need to survive scaling.
The Quick Start
Integrating OpenAI's API from a mobile app is straightforward. Here is the pattern for both platforms.
Swift (iOS)
import Foundation

// apiKey and systemPrompt are assumed to be defined elsewhere in your app.
func sendMessage(_ message: String) async throws -> String {
    var request = URLRequest(url: URL(string: "https://api.openai.com/v1/chat/completions")!)
    request.httpMethod = "POST"
    request.setValue("Bearer \(apiKey)", forHTTPHeaderField: "Authorization")
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")

    let body: [String: Any] = [
        "model": "gpt-4o-mini",
        "messages": [
            ["role": "system", "content": systemPrompt],
            ["role": "user", "content": message]
        ]
    ]
    request.httpBody = try JSONSerialization.data(withJSONObject: body)

    let (data, _) = try await URLSession.shared.data(for: request)
    let response = try JSONDecoder().decode(ChatResponse.self, from: data)
    return response.choices.first?.message.content ?? ""
}
Kotlin (Android)
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.withContext
import okhttp3.MediaType.Companion.toMediaType
import okhttp3.OkHttpClient
import okhttp3.Request
import okhttp3.RequestBody.Companion.toRequestBody
import org.json.JSONArray
import org.json.JSONObject

// apiKey and systemPrompt are assumed to be defined elsewhere in your app.
// In production, reuse a single OkHttpClient instead of creating one per call.
suspend fun sendMessage(message: String): String = withContext(Dispatchers.IO) {
    val client = OkHttpClient()
    val json = JSONObject().apply {
        put("model", "gpt-4o-mini")
        put("messages", JSONArray().apply {
            put(JSONObject().put("role", "system").put("content", systemPrompt))
            put(JSONObject().put("role", "user").put("content", message))
        })
    }
    val request = Request.Builder()
        .url("https://api.openai.com/v1/chat/completions")
        .post(json.toString().toRequestBody("application/json".toMediaType()))
        .addHeader("Authorization", "Bearer $apiKey")
        .build()
    // execute() is a blocking call, which is why this runs on Dispatchers.IO, never the main thread.
    client.newCall(request).execute().use { response ->
        JSONObject(response.body!!.string())
            .getJSONArray("choices")
            .getJSONObject(0)
            .getJSONObject("message")
            .getString("content")
    }
}
That is the easy part. Ship it and it works. Now here is the part tutorials skip.
The Pricing Landscape (Early 2026)
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Best For |
|---|---|---|---|
| GPT-4o | $2.50 | $10.00 | Complex reasoning |
| GPT-4.1-mini | $0.40 | $1.60 | Balanced quality/cost |
| GPT-4o-mini | $0.15 | $0.60 | Cost-sensitive apps |
Output tokens are 4x more expensive than input tokens for every model in this table. This matters because most cost estimates undercount the output side.
The Naive Estimate vs Reality
The naive cost calculation: (input tokens × input price) + (output tokens × output price), multiplied by the number of requests. This is wrong in practice, not because the arithmetic is off, but because the token counts people plug in ignore the hidden multipliers below.
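As a concrete sketch, here is that naive estimate at the GPT-4o-mini rates from the table above, for an assumed workload of 1,000 requests with 500 input and 500 output tokens each:

// Naive estimate: only the tokens you can see, nothing else.
let requests: Double = 1_000
let inputCost = requests * 500 / 1_000_000 * 0.15    // $0.075
let outputCost = requests * 500 / 1_000_000 * 0.60   // $0.30
let naiveTotal = inputCost + outputCost              // about $0.38, which looks harmless

Keep that number in mind; the multipliers below typically turn it into 3-5x as much.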
Hidden Multiplier 1: System Prompts
Your system prompt is sent, and billed, with every single API call. OpenAI's automatic prompt caching can discount a repeated prefix, but it only applies above a minimum prompt length, the cache expires after a few minutes of inactivity, and discounted tokens are still not free. A typical mobile app system prompt runs 500-1,500 tokens. Some reach 2,000+.
At 1,000 tokens per system prompt and 10,000 MAU making 3 requests per day, that is 900 million extra input tokens per month just for the system prompt. At GPT-4o-mini input rates, that alone costs $135/month before any caching discount.
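The same arithmetic in code, if you want to plug in your own prompt size and traffic (the values below are the assumptions from the example above, at the GPT-4o-mini input rate):

// System-prompt overhead alone, per month. Assumptions from the example above:
// a 1,000-token system prompt, 10,000 MAU, 3 requests per user per day.
let promptTokens: Double = 1_000
let requestsPerMonth: Double = 10_000 * 3 * 30            // 900,000 requests
let overheadTokens = promptTokens * requestsPerMonth      // 900,000,000 tokens
let overheadDollars = overheadTokens / 1_000_000 * 0.15   // $135 at GPT-4o-mini input rates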
Hidden Multiplier 2: Conversation History
Chat-based features include previous messages for context. By the third turn in a conversation, you are re-sending the first two turns plus the system prompt. By the fifth turn, you are re-sending everything.
A 5-turn conversation with 500 tokens per message sends: Turn 1 = 500 tokens, Turn 2 = 1,500, Turn 3 = 2,500, Turn 4 = 3,500, Turn 5 = 4,500. Total input: 12,500 tokens for what feels like 5 short messages.
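The growth is easy to compute for your own message sizes. A small sketch, assuming 500 tokens per message and the full history re-sent on every turn, matching the example above (system prompt excluded):

// Cumulative input tokens across a chat, assuming 500 tokens per message and the
// full history re-sent on every turn (system prompt excluded, as in the example above).
func cumulativeInputTokens(turns: Int, tokensPerMessage: Int = 500) -> Int {
    var total = 0
    for turn in 1...turns {
        let messagesInRequest = 2 * (turn - 1) + 1   // prior user/assistant pairs plus the new user message
        total += messagesInRequest * tokensPerMessage
    }
    return total
}

// cumulativeInputTokens(turns: 5) == 12_500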
Hidden Multiplier 3: Retries and Failures
At scale, 2-5% of API calls fail (rate limits, timeouts, server errors) and require retries. Each retry is a full re-send of the entire prompt including system prompt and history.
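A typical retry wrapper looks something like the sketch below (it assumes the sendMessage function from the quick start); the point to notice is that every attempt re-sends, and re-bills, the entire prompt.

// Retry with exponential backoff. Each attempt is a full, fully billed request.
func sendWithRetry(_ message: String, maxAttempts: Int = 3) async throws -> String {
    var attempt = 0
    while true {
        do {
            return try await sendMessage(message)   // system prompt + history re-sent here
        } catch {
            attempt += 1
            guard attempt < maxAttempts else { throw error }
            // Back off 1s, 2s, 4s, ... before trying again.
            let delaySeconds = UInt64(1 << (attempt - 1))
            try await Task.sleep(nanoseconds: delaySeconds * 1_000_000_000)
        }
    }
}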
Hidden Multiplier 4: The Real Total
When you combine system prompts, conversation history growth, and retry overhead, real-world costs are typically 3-5x the naive estimate.
Cost Tables at Scale
Using a mobile AI assistant pattern: 3 interactions per user per day, 1,000 tokens per interaction split evenly between input and output (the naive count), with the 3x multiplier for hidden costs applied.
GPT-4o-mini ($0.15 / $0.60 per 1M tokens)
| MAU | Naive Monthly Cost | Real Monthly Cost (3x) |
|---|---|---|
| 1,000 | $33.75 | $101 |
| 5,000 | $168.75 | $506 |
| 10,000 | $337.50 | $1,013 |
| 50,000 | $1,687.50 | $5,063 |
| 100,000 | $3,375.00 | $10,125 |
GPT-4o ($2.50 / $10.00 per 1M tokens)
| MAU | Naive Monthly Cost | Real Monthly Cost (3x) |
|---|---|---|
| 1,000 | $562.50 | $1,688 |
| 5,000 | $2,812.50 | $8,438 |
| 10,000 | $5,625.00 | $16,875 |
| 50,000 | $28,125.00 | $84,375 |
| 100,000 | $56,250.00 | $168,750 |
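If you want to sanity-check these rows, the arithmetic behind them is small enough to run in a playground. Assumptions as stated above: 3 requests per user per day, 30 days, 500 input and 500 output tokens per request, and the 3x hidden-cost multiplier.

// Reproduces the table rows above for any MAU and price point.
func monthlyCost(mau: Double, inputPrice: Double, outputPrice: Double) -> (naive: Double, real: Double) {
    let requests = mau * 3 * 30                                 // 3 requests per user per day, 30 days
    let inputCost  = requests * 500 / 1_000_000 * inputPrice    // 500 input tokens per request
    let outputCost = requests * 500 / 1_000_000 * outputPrice   // 500 output tokens per request
    let naive = inputCost + outputCost
    return (naive: naive, real: naive * 3)                      // 3x hidden-cost multiplier
}

// GPT-4o-mini at 10,000 MAU: (naive: 337.5, real: 1012.5)
let gpt4oMiniAt10K = monthlyCost(mau: 10_000, inputPrice: 0.15, outputPrice: 0.60)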
If your app charges $4.99/month and every one of those users pays, your revenue at 10K MAU is $49,900. GPT-4o eats $16,875 of that, 34% of revenue, just for AI inference. GPT-4o-mini is better at $1,013, but that is still 2% of revenue growing linearly with users.
When the Math Stops Working
The fundamental issue is the cost structure, not the price per token. Cloud AI is a variable cost that grows with every user. Every pricing optimization (switching models, reducing prompt length, caching) buys time but does not change the underlying curve.
The alternative is on-device inference. Fine-tune a small model on your domain data, export as GGUF, run it locally via llama.cpp. One-time cost of $5-50 for fine-tuning. Zero per-inference cost after that, regardless of MAU.
A fine-tuned 3B model achieves 94% accuracy on domain-specific tasks versus 71% for GPT-4 with prompt engineering. For most mobile app use cases (chat assistants, classification, content drafting), the on-device model is not just cheaper. It is better at the specific task.
Platforms like Ertas handle the fine-tuning pipeline in a visual interface: upload your data, train on cloud GPUs, export GGUF, ship in your app. The transition from cloud API to on-device does not require an ML team.
What to Do Now
Start with GPT-4o-mini. It is the best balance of cost and capability for validation. Build the feature, ship it, confirm your users engage with it.
Track your actual token usage from day one. Not the naive estimate. The real numbers including system prompts, conversation history, and retries. Build a dashboard that shows cost per user per month.
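The token counts are already in every response: the API returns a usage object alongside choices, so you can decode it and log a per-request cost without estimating anything. A minimal sketch at the GPT-4o-mini rates from the pricing table; add a usage field of this type to the ChatResponse model from the quick start and feed the result into whatever analytics you already use.

// The API reports exact token counts per request; decode them and log the cost.
struct Usage: Codable {
    let promptTokens: Int
    let completionTokens: Int

    enum CodingKeys: String, CodingKey {
        case promptTokens = "prompt_tokens"
        case completionTokens = "completion_tokens"
    }
}

// Per-request cost at GPT-4o-mini rates; swap in your model's prices from the table above.
func requestCost(_ usage: Usage) -> Double {
    Double(usage.promptTokens) / 1_000_000 * 0.15
        + Double(usage.completionTokens) / 1_000_000 * 0.60
}

Sum that per user per month and you have the dashboard number that matters.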
When that number exceeds $0.05 per user per month, start planning the migration to on-device. Your API logs already contain the training data you need.
Ship AI that runs on your users' devices.
Early bird pricing starts at $14.50/mo — locked in for life. Plans for builders and agencies.