
OpenAI API for Mobile Apps: Quick Start and the Costs Nobody Mentions
A practical guide to integrating OpenAI's API into iOS and Android apps, with honest cost projections at 1K to 100K users that most tutorials skip.
Every tutorial on adding AI to a mobile app starts the same way: get an API key, make a POST request, display the response. Simple. What none of them mention is what happens to your bill when actual users start using the feature.
This guide gives you both halves. The quick-start integration you need to ship, and the cost math you need to survive scaling.
The Quick Start
Integrating OpenAI's API from a mobile app is straightforward. Here is the pattern for both platforms.
Swift (iOS)
import Foundation

// apiKey and systemPrompt are assumed to be defined elsewhere in your app.
func sendMessage(_ message: String) async throws -> String {
    var request = URLRequest(url: URL(string: "https://api.openai.com/v1/chat/completions")!)
    request.httpMethod = "POST"
    request.setValue("Bearer \(apiKey)", forHTTPHeaderField: "Authorization")
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")

    let body: [String: Any] = [
        "model": "gpt-4o-mini",
        "messages": [
            ["role": "system", "content": systemPrompt],
            ["role": "user", "content": message]
        ]
    ]
    request.httpBody = try JSONSerialization.data(withJSONObject: body)

    let (data, _) = try await URLSession.shared.data(for: request)
    let response = try JSONDecoder().decode(ChatResponse.self, from: data)
    return response.choices.first?.message.content ?? ""
}
Kotlin (Android)
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.withContext
import okhttp3.MediaType.Companion.toMediaType
import okhttp3.OkHttpClient
import okhttp3.Request
import okhttp3.RequestBody.Companion.toRequestBody
import org.json.JSONArray
import org.json.JSONObject

// apiKey and systemPrompt are assumed to be defined elsewhere in your app.
// In production, reuse a single OkHttpClient instead of creating one per call.
suspend fun sendMessage(message: String): String = withContext(Dispatchers.IO) {
    val client = OkHttpClient()
    val json = JSONObject().apply {
        put("model", "gpt-4o-mini")
        put("messages", JSONArray().apply {
            put(JSONObject().put("role", "system").put("content", systemPrompt))
            put(JSONObject().put("role", "user").put("content", message))
        })
    }
    val request = Request.Builder()
        .url("https://api.openai.com/v1/chat/completions")
        .post(json.toString().toRequestBody("application/json".toMediaType()))
        .addHeader("Authorization", "Bearer $apiKey")
        .build()
    // execute() is a blocking call, which is why this runs on Dispatchers.IO, never the main thread.
    client.newCall(request).execute().use { response ->
        JSONObject(response.body!!.string())
            .getJSONArray("choices")
            .getJSONObject(0)
            .getJSONObject("message")
            .getString("content")
    }
}
That is the easy part. Ship it and it works. Now here is the part tutorials skip.
The Pricing Landscape (Early 2026)
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Best For |
|---|---|---|---|
| GPT-4o | $2.50 | $10.00 | Complex reasoning |
| GPT-4.1-mini | $0.40 | $1.60 | Balanced quality/cost |
| GPT-4o-mini | $0.15 | $0.60 | Cost-sensitive apps |
Output tokens are 4x more expensive than input tokens for every model in this table. This matters because most cost estimates undercount the output side.
The Naive Estimate vs Reality
The naive cost calculation: (input tokens × input price) + (output tokens × output price), multiplied by the number of requests. This is wrong in practice, not because the arithmetic is off, but because the token counts people plug in ignore the hidden multipliers below.
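As a concrete sketch, here is that naive estimate at the GPT-4o-mini rates from the table above, for an assumed workload of 1,000 requests with 500 input and 500 output tokens each:

// Naive estimate: only the tokens you can see, nothing else.
let requests: Double = 1_000
let inputCost = requests * 500 / 1_000_000 * 0.15    // $0.075
let outputCost = requests * 500 / 1_000_000 * 0.60   // $0.30
let naiveTotal = inputCost + outputCost              // about $0.38, which looks harmless

Keep that number in mind; the multipliers below typically turn it into 3-5x as much.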
Hidden Multiplier 1: System Prompts
Your system prompt is sent, and billed, with every single API call. OpenAI's automatic prompt caching can discount a repeated prefix, but it only applies above a minimum prompt length, the cache expires after a few minutes of inactivity, and discounted tokens are still not free. A typical mobile app system prompt runs 500-1,500 tokens. Some reach 2,000+.
At 1,000 tokens per system prompt and 10,000 MAU making 3 requests per day, that is 900 million extra input tokens per month just for the system prompt. At GPT-4o-mini input rates, that alone costs $135/month before any caching discount.
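The same arithmetic in code, if you want to plug in your own prompt size and traffic (the values below are the assumptions from the example above, at the GPT-4o-mini input rate):

// System-prompt overhead alone, per month. Assumptions from the example above:
// a 1,000-token system prompt, 10,000 MAU, 3 requests per user per day.
let promptTokens: Double = 1_000
let requestsPerMonth: Double = 10_000 * 3 * 30            // 900,000 requests
let overheadTokens = promptTokens * requestsPerMonth      // 900,000,000 tokens
let overheadDollars = overheadTokens / 1_000_000 * 0.15   // $135 at GPT-4o-mini input rates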
Hidden Multiplier 2: Conversation History
Chat-based features include previous messages for context. By the third turn in a conversation, you are re-sending the first two turns plus the system prompt. By the fifth turn, you are re-sending everything.
A 5-turn conversation with 500 tokens per message sends: Turn 1 = 500 tokens, Turn 2 = 1,500, Turn 3 = 2,500, Turn 4 = 3,500, Turn 5 = 4,500. Total input: 12,500 tokens for what feels like 5 short messages.
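The growth is easy to compute for your own message sizes. A small sketch, assuming 500 tokens per message and the full history re-sent on every turn, matching the example above (system prompt excluded):

// Cumulative input tokens across a chat, assuming 500 tokens per message and the
// full history re-sent on every turn (system prompt excluded, as in the example above).
func cumulativeInputTokens(turns: Int, tokensPerMessage: Int = 500) -> Int {
    var total = 0
    for turn in 1...turns {
        let messagesInRequest = 2 * (turn - 1) + 1   // prior user/assistant pairs plus the new user message
        total += messagesInRequest * tokensPerMessage
    }
    return total
}

// cumulativeInputTokens(turns: 5) == 12_500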
Hidden Multiplier 3: Retries and Failures
At scale, 2-5% of API calls fail (rate limits, timeouts, server errors) and require retries. Each retry is a full re-send of the entire prompt including system prompt and history.
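A typical retry wrapper looks something like the sketch below (it assumes the sendMessage function from the quick start); the point to notice is that every attempt re-sends, and re-bills, the entire prompt.

// Retry with exponential backoff. Each attempt is a full, fully billed request.
func sendWithRetry(_ message: String, maxAttempts: Int = 3) async throws -> String {
    var attempt = 0
    while true {
        do {
            return try await sendMessage(message)   // system prompt + history re-sent here
        } catch {
            attempt += 1
            guard attempt < maxAttempts else { throw error }
            // Back off 1s, 2s, 4s, ... before trying again.
            let delaySeconds = UInt64(1 << (attempt - 1))
            try await Task.sleep(nanoseconds: delaySeconds * 1_000_000_000)
        }
    }
}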
Hidden Multiplier 4: The Real Total
When you combine system prompts, conversation history growth, and retry overhead, real-world costs are typically 3-5x the naive estimate.
Cost Tables at Scale
Using a mobile AI assistant pattern: 3 interactions per user per day, 1,000 tokens per interaction split evenly between input and output (the naive count), with the 3x multiplier for hidden costs applied.
GPT-4o-mini ($0.15 / $0.60 per 1M tokens)
| MAU | Naive Monthly Cost | Real Monthly Cost (3x) |
|---|---|---|
| 1,000 | $33.75 | $101 |
| 5,000 | $168.75 | $506 |
| 10,000 | $337.50 | $1,013 |
| 50,000 | $1,687.50 | $5,063 |
| 100,000 | $3,375.00 | $10,125 |
GPT-4o ($2.50 / $10.00 per 1M tokens)
| MAU | Naive Monthly Cost | Real Monthly Cost (3x) |
|---|---|---|
| 1,000 | $562.50 | $1,688 |
| 5,000 | $2,812.50 | $8,438 |
| 10,000 | $5,625.00 | $16,875 |
| 50,000 | $28,125.00 | $84,375 |
| 100,000 | $56,250.00 | $168,750 |
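If you want to sanity-check these rows, the arithmetic behind them is small enough to run in a playground. Assumptions as stated above: 3 requests per user per day, 30 days, 500 input and 500 output tokens per request, and the 3x hidden-cost multiplier.

// Reproduces the table rows above for any MAU and price point.
func monthlyCost(mau: Double, inputPrice: Double, outputPrice: Double) -> (naive: Double, real: Double) {
    let requests = mau * 3 * 30                                 // 3 requests per user per day, 30 days
    let inputCost  = requests * 500 / 1_000_000 * inputPrice    // 500 input tokens per request
    let outputCost = requests * 500 / 1_000_000 * outputPrice   // 500 output tokens per request
    let naive = inputCost + outputCost
    return (naive: naive, real: naive * 3)                      // 3x hidden-cost multiplier
}

// GPT-4o-mini at 10,000 MAU: (naive: 337.5, real: 1012.5)
let gpt4oMiniAt10K = monthlyCost(mau: 10_000, inputPrice: 0.15, outputPrice: 0.60)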
If your app charges $4.99/month and every one of those users pays, your revenue at 10K MAU is $49,900. GPT-4o eats $16,875 of that, 34% of revenue, just for AI inference. GPT-4o-mini is better at $1,013, but that is still 2% of revenue growing linearly with users.
When the Math Stops Working
The fundamental issue is the cost structure, not the price per token. Cloud AI is a variable cost that grows with every user. Every pricing optimization (switching models, reducing prompt length, caching) buys time but does not change the underlying curve.
The alternative is on-device inference. Fine-tune a small model on your domain data, export as GGUF, run it locally via llama.cpp. One-time cost of $5-50 for fine-tuning. Zero per-inference cost after that, regardless of MAU.
A fine-tuned 3B model achieves 94% accuracy on domain-specific tasks versus 71% for GPT-4 with prompt engineering. For most mobile app use cases (chat assistants, classification, content drafting), the on-device model is not just cheaper. It is better at the specific task.
Platforms like Ertas handle the fine-tuning pipeline in a visual interface: upload your data, train on cloud GPUs, export GGUF, ship in your app. The transition from cloud API to on-device does not require an ML team.
What to Do Now
Start with GPT-4o-mini. It is the best balance of cost and capability for validation. Build the feature, ship it, confirm your users engage with it.
Track your actual token usage from day one. Not the naive estimate. The real numbers including system prompts, conversation history, and retries. Build a dashboard that shows cost per user per month.
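The token counts are already in every response: the API returns a usage object alongside choices, so you can decode it and log a per-request cost without estimating anything. A minimal sketch at the GPT-4o-mini rates from the pricing table; add a usage field of this type to the ChatResponse model from the quick start and feed the result into whatever analytics you already use.

// The API reports exact token counts per request; decode them and log the cost.
struct Usage: Codable {
    let promptTokens: Int
    let completionTokens: Int

    enum CodingKeys: String, CodingKey {
        case promptTokens = "prompt_tokens"
        case completionTokens = "completion_tokens"
    }
}

// Per-request cost at GPT-4o-mini rates; swap in your model's prices from the table above.
func requestCost(_ usage: Usage) -> Double {
    Double(usage.promptTokens) / 1_000_000 * 0.15
        + Double(usage.completionTokens) / 1_000_000 * 0.60
}

Sum that per user per month and you have the dashboard number that matters.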
When that number exceeds $0.05 per user per month, start planning the migration to on-device. Your API logs already contain the training data you need.
Ship AI that runs on your users' devices.
Early bird pricing starts at $14.50/mo — locked in for life. Plans for builders and agencies.