    AI in React Native: From Cloud APIs to On-Device Models

    How to add AI features to React Native apps. Cloud API integration with fetch, on-device inference with llama.cpp bindings, and a practical migration path from one to the other.

    Ertas Team

    React Native gives you one codebase for iOS and Android. Adding AI features should follow the same principle. But the path from "call an API" to "run inference on-device" is different in React Native than in native Swift or Kotlin. The JavaScript bridge, the native module system, and the cross-platform model delivery all require specific patterns.

    This guide covers both approaches and the practical migration between them.

    Cloud API Integration

    The fastest way to add AI to a React Native app is calling a cloud API. React Native's fetch API works directly with OpenAI, Anthropic, Google Gemini, and other providers.

    Basic Pattern

    async function generateResponse(prompt: string): Promise<string> {
      const response = await fetch("https://api.openai.com/v1/chat/completions", {
        method: "POST",
        headers: {
          "Content-Type": "application/json",
          Authorization: `Bearer ${API_KEY}`,
        },
        body: JSON.stringify({
          model: "gpt-4o-mini",
          messages: [{ role: "user", content: prompt }],
        }),
      });
    
      if (!response.ok) throw new Error(`OpenAI API error: ${response.status}`);
      const data = await response.json();
      return data.choices[0].message.content;
    }
    

    This works on both iOS and Android with zero platform-specific code. For streaming responses, use the EventSource pattern or a library like react-native-sse.

    Streaming for Chat UIs

    import EventSource from "react-native-sse";
    
    function streamResponse(prompt: string, onToken: (token: string) => void) {
      const es = new EventSource("https://api.openai.com/v1/chat/completions", {
        headers: {
          "Content-Type": "application/json",
          Authorization: `Bearer ${API_KEY}`,
        },
        method: "POST",
        body: JSON.stringify({
          model: "gpt-4o-mini",
          messages: [{ role: "user", content: prompt }],
          stream: true,
        }),
      });
    
      es.addEventListener("message", (event) => {
        if (event.data === "[DONE]") return es.close();
        const parsed = JSON.parse(event.data);
        const token = parsed.choices[0]?.delta?.content;
        if (token) onToken(token);
      });
      es.addEventListener("error", () => es.close()); // Close on network errors
    }
    

    The Cloud API Ceiling

    Cloud APIs work well for prototyping and low-volume apps. The drawbacks are the same ones native apps face, but slightly worse in React Native:

    • Network dependency: React Native apps are often built for cross-platform reach, including markets with unreliable connectivity
    • Latency: The JS bridge adds ~5-10ms on top of the 500-3,000ms network round trip
    • Cost scaling: Every user, every request, every token costs money
    • Privacy: User data crosses the network on every API call

    On-Device AI in React Native

    Running a model locally in React Native means using a native module that wraps llama.cpp. The JavaScript side sends prompts and receives tokens. The native side handles the actual inference on the device's CPU and GPU.

    llama.rn

    The llama.rn package provides React Native bindings for llama.cpp. It exposes a JavaScript API that loads GGUF models and runs inference natively on both iOS (Metal) and Android (CPU/Vulkan).

    import { initLlama } from "llama.rn";
    
    // Load the model
    const context = await initLlama({
      model: modelPath, // Path to .gguf file on device
      n_ctx: 2048,
      n_threads: 4,
      n_gpu_layers: 32,
    });
    
    // Generate a response
    const result = await context.completion({
      prompt: "Summarize this note: ...",
      n_predict: 256,
      temperature: 0.7,
    });
    
    console.log(result.text);
    

    Streaming Tokens

    const result = await context.completion(
      {
        prompt: userPrompt,
        n_predict: 512,
      },
      (token) => {
        // Called for each generated token
        setResponseText((prev) => prev + token.token);
      }
    );
    

    This gives you the same token-by-token streaming experience as a cloud API, but with 50-200ms time to first token instead of 500-3,000ms.

    Model Delivery

    Getting the GGUF model file onto the device is the main engineering challenge in React Native.

    Bundled with app: Include the model in the app's assets. For iOS, add it to the Xcode project. For Android, place it in the assets directory or use Android Asset Delivery for files over 150MB. React Native's asset system can reference the file path at runtime.
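    If you bundle the model as a build-time asset, Metro also needs to know that .gguf is an asset extension. A minimal metro.config.js sketch, assuming the standard React Native CLI template (Expo projects apply the same resolver change via expo/metro-config):

    ```javascript
    // metro.config.js — a sketch assuming the React Native CLI template.
    const { getDefaultConfig, mergeConfig } = require("@react-native/metro-config");

    const defaultConfig = getDefaultConfig(__dirname);

    module.exports = mergeConfig(defaultConfig, {
      resolver: {
        // Register .gguf so Metro treats the model file as a bundleable asset.
        assetExts: [...defaultConfig.resolver.assetExts, "gguf"],
      },
    });
    ```

    At runtime, resolve the bundled asset to an absolute file path before passing it to initLlama; llama.rn needs a real path, not an asset reference.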

    Post-install download: Download the model on first launch. Use react-native-blob-util or expo-file-system for background downloads with progress tracking:

    import * as FileSystem from "expo-file-system";
    
    const modelUri = FileSystem.documentDirectory + "model.gguf";
    
    const download = FileSystem.createDownloadResumable(
      MODEL_CDN_URL,
      modelUri,
      {},
      (progress) => {
        const pct = progress.totalBytesWritten / progress.totalBytesExpectedToWrite;
        setDownloadProgress(pct);
      }
    );
    
    const result = await download.downloadAsync();
    

    Performance Expectations

    On-device performance in React Native is nearly identical to that of a fully native app. The llama.cpp inference runs in native code, not through the JS bridge; the bridge only carries prompts in and tokens out.

    Device                            1B Model (tok/s)   3B Model (tok/s)
    iPhone 15 Pro (A17)               35-45              18-25
    iPhone 14 (A15)                   25-35              12-18
    Galaxy S24 (SD 8 Gen 3)           35-45              18-25
    Pixel 8 (Tensor G3)               25-35              12-18
    Mid-range Android (SD 7 Gen 3)    18-25              8-12

    The JS bridge overhead for token delivery is negligible (under 1ms per token).
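    If you want to benchmark your own devices against the table above, throughput is simple to derive from a completion run. tokensPerSecond here is a hypothetical helper, not part of llama.rn:

    ```typescript
    // Hypothetical helper: derive tokens/second from a token count and
    // the wall-clock time the completion took.
    function tokensPerSecond(tokenCount: number, elapsedMs: number): number {
      if (elapsedMs <= 0) throw new Error("elapsedMs must be positive");
      return (tokenCount * 1000) / elapsedMs;
    }

    // e.g. 256 tokens generated in 8,000 ms → 32 tok/s
    const tps = tokensPerSecond(256, 8000);
    ```

    Wrap your context.completion call with Date.now() timestamps to collect these numbers from real devices in the field.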

    Architecture: Abstracting the AI Layer

    The best React Native architecture abstracts the AI provider behind a common interface. This lets you swap between cloud and on-device without changing your UI code.

    interface AiProvider {
      generate(prompt: string, onToken?: (token: string) => void): Promise<string>;
      isReady(): boolean;
    }
    
    class CloudProvider implements AiProvider {
      async generate(prompt: string, onToken?: (token: string) => void) {
        // Cloud API call
      }
      isReady() { return true; } // Always ready if online
    }
    
    class OnDeviceProvider implements AiProvider {
      private modelLoaded = false; // Set to true once initLlama resolves
    
      async generate(prompt: string, onToken?: (token: string) => void) {
        // llama.rn inference
      }
      isReady() { return this.modelLoaded; }
    }
    

    This pattern supports a gradual migration. Start with CloudProvider for validation. Add OnDeviceProvider when you are ready. A/B test. Eventually make on-device the default with cloud as a fallback.
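    The fallback decision itself can stay tiny. A sketch of the selection logic, assuming the AiProvider interface above; preferOnDevice is a hypothetical flag sourced from your feature config or A/B test:

    ```typescript
    interface AiProvider {
      generate(prompt: string, onToken?: (token: string) => void): Promise<string>;
      isReady(): boolean;
    }

    // Hypothetical selection logic: use on-device when the flag is on and
    // the model has finished loading; otherwise fall back to cloud.
    function selectProvider(
      onDevice: AiProvider,
      cloud: AiProvider,
      preferOnDevice: boolean
    ): AiProvider {
      if (preferOnDevice && onDevice.isReady()) return onDevice;
      return cloud;
    }
    ```

    Because the UI only ever sees an AiProvider, flipping the default from cloud to on-device is a one-line change in this function.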

    The Cross-Platform Advantage

    React Native's cross-platform model works in your favor for on-device AI. You fine-tune one model, export one GGUF file, and deploy it to both iOS and Android. The inference library handles the platform differences (Metal on iOS, CPU/Vulkan on Android).

    Compare this to cloud APIs where you pay per-token on every platform. On-device, you pay once for fine-tuning and model distribution. The cost does not multiply with platforms.
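    The comparison can be made concrete with a back-of-envelope model. Every number below is an illustrative placeholder, not real vendor pricing:

    ```typescript
    // Hypothetical cost model: recurring monthly cloud spend, to weigh
    // against a one-time fine-tuning and distribution cost.
    function monthlyCloudCostUsd(
      activeUsers: number,
      requestsPerUserPerMonth: number,
      tokensPerRequest: number,
      usdPerMillionTokens: number
    ): number {
      const tokens = activeUsers * requestsPerUserPerMonth * tokensPerRequest;
      return (tokens / 1_000_000) * usdPerMillionTokens;
    }

    // 10k users × 30 requests × 1,000 tokens at $0.60/M tokens ≈ $180/month,
    // recurring — while the fine-tune is paid once for both platforms.
    const monthly = monthlyCloudCostUsd(10_000, 30, 1_000, 0.6);
    ```

    The recurring cost scales with users and usage; the on-device cost does not, which is the whole argument in one function.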

    Migration Path

    1. Start with cloud API via fetch. Validate the feature, collect usage data.
    2. Add the abstraction layer (AiProvider interface) early. This costs nothing and pays off later.
    3. Collect training data from your API logs. Every cloud API call is a potential training example.
    4. Fine-tune a small model on your domain data. Platforms like Ertas provide a visual pipeline: upload data, train with LoRA, export GGUF.
    5. Integrate llama.rn and add the on-device provider behind the same interface.
    6. A/B test cloud vs on-device on real users.
    7. Ship on-device as default with cloud fallback for unsupported devices.
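    Step 3 above is mostly bookkeeping: each cloud call yields a prompt/response pair you can serialize into a training set. A sketch assuming the widely used JSONL chat-messages format (check your fine-tuning platform's expected schema):

    ```typescript
    type ChatMessage = { role: "user" | "assistant"; content: string };

    // Hypothetical logger: turn one cloud API exchange into a single
    // JSONL training line in the chat-messages format.
    function toTrainingLine(prompt: string, completion: string): string {
      const messages: ChatMessage[] = [
        { role: "user", content: prompt },
        { role: "assistant", content: completion },
      ];
      return JSON.stringify({ messages });
    }
    ```

    Append one line per exchange to a log file or analytics pipeline, filter for quality, and you have a domain dataset ready for LoRA fine-tuning by the time you reach step 4.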

    The end result: one codebase, two platforms, AI that works offline, zero per-inference cost.

    Ship AI that runs on your users' devices.

    Early bird pricing starts at $14.50/mo — locked in for life. Plans for builders and agencies.

