Tokens & Context Windows Explained for Mobile Devs

Journey Context

After deciding to go "Local," I realized I couldn't just throw a 50-page PDF at a mobile LLM and expect it to work. I had to go back to the basics of how these models "read."

LLMs are basically giant calculators. They don't see the word "Apple"; they see a numeric token ID (say, 17180; the exact number depends on the tokenizer).

The Tokenizer is the translator.
The Context Window is the "workspace" (RAM) where the model keeps those numbers while it thinks.

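If you go over the limit, the model can't hold everything at once. Depending on the runtime, you either get an error or the oldest tokens are dropped to make room, so the beginning of your conversation is "forgotten" first. It behaves less like a stack overflow (which crashes) and more like a fixed-size ring buffer, where new data pushes out the oldest.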

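For mobile devs, the Context Window is a memory-management problem. The "KV Cache" (the model's short-term memory) grows with every token it holds, so if your React Native app allocates a huge context window (like 128k tokens), the cache alone can easily eat 2GB to 4GB of RAM, or more on larger models, just to remember the conversation. What costs memory is the context size you configure, not the maximum the model supports.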

Tokens — What the Model Actually Sees

Tokens are small pieces of text.

They are not always full words.

Example:
"React Native is great"

["React", " Native", " is", " great"]

Depending on the tokenizer, the same text may be split into even smaller pieces, especially rare words, names, and code.
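Before you load a real tokenizer, a rough estimate is often enough for budgeting. The sketch below uses the commonly cited rule of thumb of about 4 characters per token for English text. It is an approximation, not how any real tokenizer works: `estimateTokens` and `CHARS_PER_TOKEN` are names made up for this example.

```typescript
// Rough token estimate using the "~4 characters per token" rule of thumb
// for English text. Approximation only: real counts depend on the model's
// tokenizer, and code or non-English text usually tokenizes differently.
const CHARS_PER_TOKEN = 4;

function estimateTokens(text: string): number {
  if (text.length === 0) return 0;
  // Round up so a partial chunk still counts as a token.
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

console.log(estimateTokens("React Native is great")); // 21 chars → 6
```

Note that it says 6 for a sentence the split above shows as 4 tokens. That overshoot is fine for a safety margin, but use the model's own tokenizer when the numbers really matter.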

Tokenizer — The Translator

The tokenizer is what converts text into something the model understands.

Think of it like the lexer stage of a compiler, but for human language. It does two jobs:

Breaks text into tokens
Maps each token to a numeric ID

["React", " Native"] → [5231, 8812]
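To make those two jobs concrete, here is a toy tokenizer that does greedy longest-match against a tiny, made-up vocabulary. Real tokenizers (BPE and friends) learn vocabularies with tens of thousands of entries and are more sophisticated; the vocabulary, the IDs, and the `UNK_ID` fallback here are all hypothetical, chosen to match the example above.

```typescript
// Toy tokenizer: greedy longest-match against a tiny, hypothetical vocabulary.
// The IDs are illustrative only; real values come from the model's vocab.
const VOCAB: Record<string, number> = {
  "React": 5231,
  " Native": 8812,
  " is": 318,
  " great": 1049,
};
const UNK_ID = 0; // hypothetical "unknown" token ID

function tokenize(text: string): { tokens: string[]; ids: number[] } {
  const tokens: string[] = [];
  const ids: number[] = [];
  // Longest entries first, so find() returns the longest match.
  const pieces = Object.keys(VOCAB).sort((a, b) => b.length - a.length);
  let i = 0;
  while (i < text.length) {
    const match = pieces.find((p) => text.startsWith(p, i));
    if (match !== undefined) {
      // Job 1: split off a token. Job 2: map it to its numeric ID.
      tokens.push(match);
      ids.push(VOCAB[match]);
      i += match.length;
    } else {
      // Nothing in the vocabulary matches: emit one character as unknown.
      tokens.push(text[i]);
      ids.push(UNK_ID);
      i += 1;
    }
  }
  return { tokens, ids };
}

console.log(tokenize("React Native").ids); // → [5231, 8812]
```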

Context Window — The Model’s Workspace

The context window is the maximum number of tokens a model can process at once.

Think of it as the model's RAM while it is thinking.

It includes everything:

user input
chat history
system prompt
documents / code

Important Rule

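Input Tokens + Output Tokens ≤ Context Window

Every token the model generates also takes up space in the window, so a long prompt leaves less room for the answer.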
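Here is a minimal budget check built on that rule, covering the parts listed above (system prompt, chat history, documents, user input). It assumes the rough ~4 characters per token estimate and ignores the extra tokens a chat template adds around each message; `PromptParts`, `inputTokens`, and `maxOutputTokens` are hypothetical names for this sketch.

```typescript
// Token budget sketch: Input Tokens + Output Tokens <= Context Window.
// Uses the rough ~4 chars/token estimate (an assumption); swap in your
// model's real tokenizer for accurate numbers.
interface PromptParts {
  systemPrompt: string;
  chatHistory: string[];
  documents: string[];
  userInput: string;
}

const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

// Everything that goes into the window before the model writes a word.
function inputTokens(parts: PromptParts): number {
  const all = [parts.systemPrompt, ...parts.chatHistory, ...parts.documents, parts.userInput];
  return all.reduce((sum, t) => sum + estimateTokens(t), 0);
}

// Whatever is left over is the most the model can generate.
function maxOutputTokens(contextWindow: number, parts: PromptParts): number {
  return Math.max(0, contextWindow - inputTokens(parts));
}

const parts: PromptParts = {
  systemPrompt: "You are a helpful assistant.", // 28 chars → 7 tokens
  chatHistory: [],
  documents: ["x".repeat(8000)], // 8000 chars → 2000 tokens
  userInput: "Summarize this.", // 15 chars → 4 tokens
};

console.log(maxOutputTokens(4096, parts)); // → 2085
```

A 2000-token document already eats about half of a 4096-token window, which is exactly why a 50-page PDF doesn't fit.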

What Happens When You Exceed It?

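The model itself doesn't manage this. Depending on the runtime, the request either fails with an error, or older tokens are dropped (usually from the beginning) to make room. In practice, many apps trim or summarize old chat history themselves before sending it, so they control what gets lost.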
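Trimming history yourself can be as simple as a sliding window: keep the system prompt, then drop the oldest messages until the estimate fits. This is a sketch under assumptions: token counts use the rough ~4 chars/token heuristic, and the `Message` shape and `trimHistory` name are hypothetical, not any particular library's API.

```typescript
// Sliding-window trimming: keep system messages, drop the oldest chat
// messages until the estimated input fits the budget. Token counts use the
// rough ~4 chars/token heuristic (an assumption; use your real tokenizer).
interface Message {
  role: "system" | "user" | "assistant";
  content: string;
}

const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

const countTokens = (ms: Message[]): number =>
  ms.reduce((sum, m) => sum + estimateTokens(m.content), 0);

function trimHistory(messages: Message[], maxInputTokens: number): Message[] {
  const system = messages.filter((m) => m.role === "system");
  const rest = messages.filter((m) => m.role !== "system");
  // Drop from the front (oldest first), but always keep the latest message.
  while (rest.length > 1 && countTokens(system) + countTokens(rest) > maxInputTokens) {
    rest.shift();
  }
  return [...system, ...rest];
}
```

If the system prompt plus the latest message alone still exceed the budget, this returns them anyway; at that point you would need to shorten or summarize the message itself.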

KV Cache

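The KV (key-value) Cache is how the model remembers previous tokens during a conversation. For every token it has already processed, it stores the attention "key" and "value" vectors, so it doesn't have to recompute them each time it generates a new token.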

The bigger the context window:

more tokens stored
more memory required
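You can estimate the cost with simple arithmetic: each layer stores one key and one value vector per KV head for every token, so memory grows linearly with the number of tokens. The model shape below is illustrative (roughly the shape of a ~1B-parameter model with grouped-query attention), and the estimate ignores runtime overhead; check your model's config for real values. `ModelShape` and `kvCacheBytes` are hypothetical names.

```typescript
// KV cache estimate for a standard transformer:
//   bytes = 2 (K and V) * layers * kvHeads * headDim * bytesPerElement * tokens
interface ModelShape {
  layers: number;
  kvHeads: number;
  headDim: number;
  bytesPerElement: number; // 2 for fp16; quantized caches use less
}

function kvCacheBytes(shape: ModelShape, tokens: number): number {
  return 2 * shape.layers * shape.kvHeads * shape.headDim * shape.bytesPerElement * tokens;
}

const GiB = 1024 ** 3;

// Illustrative shape only; read the real numbers from your model's config.
const smallModel: ModelShape = { layers: 16, kvHeads: 8, headDim: 64, bytesPerElement: 2 };

console.log(kvCacheBytes(smallModel, 4096) / GiB); // → 0.125 (128 MiB)
console.log(kvCacheBytes(smallModel, 131072) / GiB); // → 4
```

Same model, same weights: configuring a 4k context instead of 128k shrinks the cache from 4 GiB to 128 MiB, which is often the difference between running on a phone and getting killed by the OS.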

Closing Thoughts

Tokens define how the model reads.

Context window defines how much it can remember.

If you're building AI features in mobile apps, this isn't optional knowledge. It directly affects:

performance
cost
scalability
user experience
