
AI in React Native: From Cloud APIs to On-Device Models
How to add AI features to React Native apps. Cloud API integration with fetch, on-device inference with llama.cpp bindings, and a practical migration path from one to the other.
React Native gives you one codebase for iOS and Android. Adding AI features should follow the same principle. But the path from "call an API" to "run inference on-device" is different in React Native than in native Swift or Kotlin. The JavaScript bridge, the native module system, and the cross-platform model delivery all require specific patterns.
This guide covers both approaches and the practical migration between them.
Cloud API Integration
The fastest way to add AI to a React Native app is calling a cloud API. React Native's fetch API works directly with OpenAI, Anthropic, Google Gemini, and other providers.
Basic Pattern
async function generateResponse(prompt: string): Promise<string> {
  const response = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${API_KEY}`,
    },
    body: JSON.stringify({
      model: "gpt-4o-mini",
      messages: [{ role: "user", content: prompt }],
    }),
  });
  const data = await response.json();
  return data.choices[0].message.content;
}
This works on both iOS and Android with zero platform-specific code. For streaming responses, use the EventSource pattern or a library like react-native-sse.
Streaming for Chat UIs
import EventSource from "react-native-sse";

function streamResponse(prompt: string, onToken: (token: string) => void) {
  const es = new EventSource("https://api.openai.com/v1/chat/completions", {
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${API_KEY}`,
    },
    method: "POST",
    body: JSON.stringify({
      model: "gpt-4o-mini",
      messages: [{ role: "user", content: prompt }],
      stream: true,
    }),
  });

  es.addEventListener("message", (event) => {
    if (event.data === "[DONE]") {
      es.close();
      return;
    }
    const parsed = JSON.parse(event.data);
    const token = parsed.choices[0]?.delta?.content;
    if (token) onToken(token);
  });
}
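Wiring this into a chat screen is a matter of appending tokens to component state. A minimal sketch, assuming a React function component with a responseText state value (the handleSend name is illustrative):

import { useState } from "react";

// Inside a function component: append each streamed token to local state
const [responseText, setResponseText] = useState("");

function handleSend(prompt: string) {
  setResponseText(""); // Clear the previous answer before streaming a new one
  streamResponse(prompt, (token) => {
    setResponseText((prev) => prev + token);
  });
}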
The Cloud API Ceiling
Cloud APIs work well for prototyping and low-volume apps. The problems are the same as in native apps, but slightly worse in React Native:
- Network dependency: React Native apps are often built for cross-platform reach, including markets with unreliable connectivity
- Latency: The JS bridge adds ~5-10ms on top of the 500-3,000ms network round trip
- Cost scaling: Every user, every request, every token costs money
- Privacy: User data crosses the network on every API call
On-Device AI in React Native
Running a model locally in React Native means using a native module that wraps llama.cpp. The JavaScript side sends prompts and receives tokens. The native side handles the actual inference on the device's CPU and GPU.
llama.rn
The llama.rn package provides React Native bindings for llama.cpp. It exposes a JavaScript API that loads GGUF models and runs inference natively on both iOS (Metal) and Android (CPU/Vulkan).
import { initLlama } from "llama.rn";

// Load the model
const context = await initLlama({
  model: modelPath, // Path to .gguf file on device
  n_ctx: 2048,
  n_threads: 4,
  n_gpu_layers: 32,
});

// Generate a response
const result = await context.completion({
  prompt: "Summarize this note: ...",
  n_predict: 256,
  temperature: 0.7,
});

console.log(result.text);
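A loaded context holds the model weights in native memory, so free it when the feature is no longer in use. llama.rn exposes a release method on the context for this; the snippet below assumes the current API shape:

// Free the native memory held by the model once you are done with the context
await context.release();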
Streaming Tokens
const result = await context.completion(
  {
    prompt: userPrompt,
    n_predict: 512,
  },
  (token) => {
    // Called for each generated token
    setResponseText((prev) => prev + token.token);
  }
);
This gives you the same token-by-token streaming experience as a cloud API, but with 50-200ms time to first token instead of 500-3,000ms.
Model Delivery
Getting the GGUF model file onto the device is the main engineering challenge in React Native.
Bundled with app: Include the model in the app's assets. For iOS, add it to the Xcode project. For Android, place it in the assets directory or use Android Asset Delivery for files over 150MB. React Native's asset system can reference the file path at runtime.
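One wrinkle: llama.rn needs a real filesystem path, and Android assets are not plain files. A minimal sketch using react-native-fs, assuming the model was added to the Xcode bundle and the Android assets directory (the resolveBundledModel helper is illustrative):

import { Platform } from "react-native";
import RNFS from "react-native-fs";

// Illustrative helper: returns a path that llama.rn can load on either platform
async function resolveBundledModel(fileName: string): Promise<string> {
  if (Platform.OS === "ios") {
    // iOS bundle resources are readable directly from the main bundle
    return `${RNFS.MainBundlePath}/${fileName}`;
  }
  // Android: copy the asset into app storage once, then reuse the copy
  const dest = `${RNFS.DocumentDirectoryPath}/${fileName}`;
  if (!(await RNFS.exists(dest))) {
    await RNFS.copyFileAssets(fileName, dest);
  }
  return dest;
}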
Post-install download: Download the model on first launch. Use react-native-blob-util or expo-file-system for background downloads with progress tracking:
import * as FileSystem from "expo-file-system";

const modelUri = FileSystem.documentDirectory + "model.gguf";

const download = FileSystem.createDownloadResumable(
  MODEL_CDN_URL,
  modelUri,
  {},
  (progress) => {
    const pct = progress.totalBytesWritten / progress.totalBytesExpectedToWrite;
    setDownloadProgress(pct);
  }
);

const result = await download.downloadAsync();
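To avoid pulling a multi-gigabyte file on every launch, check whether the model already exists before starting the download. A sketch using expo-file-system's getInfoAsync (the ensureModel name is illustrative):

// Illustrative guard: only download when the model is not already on disk
async function ensureModel(): Promise<string> {
  const info = await FileSystem.getInfoAsync(modelUri);
  if (info.exists) return modelUri;
  const result = await download.downloadAsync();
  return result?.uri ?? modelUri;
}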
Performance Expectations
On-device performance in React Native is nearly identical to that of a fully native app. The llama.cpp inference runs in native code, not through the JS bridge. The bridge is only used for sending prompts and receiving tokens.
| Device | 1B Model (tok/s) | 3B Model (tok/s) |
|---|---|---|
| iPhone 15 Pro (A17) | 35-45 | 18-25 |
| iPhone 14 (A15) | 25-35 | 12-18 |
| Galaxy S24 (SD 8 Gen 3) | 35-45 | 18-25 |
| Pixel 8 (Tensor G3) | 25-35 | 12-18 |
| Mid-range Android (SD 7 Gen 3) | 18-25 | 8-12 |
The JS bridge overhead for token delivery is negligible (under 1ms per token).
Architecture: Abstracting the AI Layer
The best React Native architecture abstracts the AI provider behind a common interface. This lets you swap between cloud and on-device without changing your UI code.
interface AiProvider {
  generate(prompt: string, onToken?: (token: string) => void): Promise<string>;
  isReady(): boolean;
}

class CloudProvider implements AiProvider {
  async generate(prompt: string, onToken?: (token: string) => void) {
    // Cloud API call
  }
  isReady() { return true; } // Always ready if online
}

class OnDeviceProvider implements AiProvider {
  private modelLoaded = false; // Set to true once initLlama resolves

  async generate(prompt: string, onToken?: (token: string) => void) {
    // llama.rn inference
  }
  isReady() { return this.modelLoaded; }
}
This pattern supports a gradual migration. Start with CloudProvider for validation. Add OnDeviceProvider when you are ready. A/B test. Eventually make on-device the default with cloud as a fallback.
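The runtime switch can be as small as a selection function that prefers the on-device provider and falls back to the cloud. A sketch (the getProvider name and the provider instances are illustrative, not part of llama.rn):

// Illustrative: prefer on-device inference, fall back to the cloud provider
function getProvider(onDevice: OnDeviceProvider, cloud: CloudProvider): AiProvider {
  return onDevice.isReady() ? onDevice : cloud;
}

// Assuming onDeviceProvider and cloudProvider were constructed elsewhere
const provider = getProvider(onDeviceProvider, cloudProvider);
const answer = await provider.generate("Summarize this note: ...");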
The Cross-Platform Advantage
React Native's cross-platform model works in your favor for on-device AI. You fine-tune one model, export one GGUF file, and deploy it to both iOS and Android. The inference library handles the platform differences (Metal on iOS, CPU/Vulkan on Android).
Compare this to cloud APIs where you pay per-token on every platform. On-device, you pay once for fine-tuning and model distribution. The cost does not multiply with platforms.
Migration Path
- Start with cloud API via fetch. Validate the feature, collect usage data.
- Add the abstraction layer (AiProvider interface) early. This costs nothing and pays off later.
- Collect training data from your API logs. Every cloud API call is a potential training example.
- Fine-tune a small model on your domain data. Platforms like Ertas provide a visual pipeline: upload data, train with LoRA, export GGUF.
- Integrate llama.rn and add the on-device provider behind the same interface.
- A/B test cloud vs on-device on real users.
- Ship on-device as default with cloud fallback for unsupported devices.
The end result: one codebase, two platforms, AI that works offline, zero per-inference cost.
Ship AI that runs on your users' devices.
Early bird pricing starts at $14.50/mo — locked in for life. Plans for builders and agencies.