Connecting React Native to Ollama: A Local AI Testing Guide

I’m using Ollama and Google’s latest Gemma 3 model. My goal is simple: Understand the difference between the /api/generate and /api/chat endpoints and learn how to bridge my React Native app to a local AI server.

esting in the terminal with curl is the fastest way to debug an LLM.

Ollama is the engine that serves the model.
Gemma 3 is the brain we are using.

Chat vs. Generate: Which one do you need?

/api/chat (The Easy Way): You send an array of messages with roles (system, user, assistant). Ollama handles the formatting for you. It’s perfect for building a ChatGPT-style interface.
/api/generate (The Raw Way): You send a single prompt string. It returns a context array (a bunch of numbers). If you want the AI to remember the next question, you have to send that context array back with your next request.

Streaming: Fast vs. Complete

stream: true (Default): The AI sends the answer piece by piece (token by token). This is how you get that "typing" effect in apps.
stream: false: The AI thinks in silence and sends one giant JSON object once it’s finished. Great for structured data (JSON) but feels slower to the user.

In React Native, the fetch call is straightforward, but the Android Emulator quirk is a trap. If you use localhost:11434, the app will look inside the phone and find nothing. You must use 10.0.2.2:11434 to reach your computer.

// The "Brain" of our Simple Chat App
const response = await fetch('http://10.0.2.2:11434/api/chat', {
  method: 'POST',
  body: JSON.stringify({
    model: 'gemma3',
    messages: [{ role: 'user', content: 'Hello!' }],
    stream: false,
  }),
});

spent an hour wondering why my api/generate response was so long. I realized the context array at the bottom of the response is just the "compressed memory" of the conversation. It’s not human-readable, but the model needs it to keep the conversation going!

The "Local First" Approach: Testing AI with Ollama Before Going Mobile

Chat vs. Generate: Which one do you need?

Streaming: Fast vs. Complete

Comments

AI for Mobile Developers: Learning Local LLMs

Learning AI as a Mobile Developer — Why I'm Exploring Local LLMs for Mobile Apps

More from this blog

Beyond the Chatbox: Structured Data and the Art of Prompt Compression

Prompt Engineering 101: How to Give Your Mobile AI a Memory (and a Brain)

GGUF, Quantization, and Pruning: The Three Keys to "Shrinking" an AI Brain

Temperature, System Prompts, and Why AI Has No Memory: The "Personality" of LLMs

Command Palette

Chat vs. Generate: Which one do you need?

Streaming: Fast vs. Complete

Comments

AI for Mobile Developers: Learning Local LLMs

Learning AI as a Mobile Developer — Why I'm Exploring Local LLMs for Mobile Apps

More from this blog