Tokens and Context Windows: Why My App Can't Remember Everything

Journey Context
After deciding to go "Local," I realized I couldn't just throw a 50-page PDF at a mobile LLM and expect it to work. I had to go back to the basics of how these models "read."
LLMs are basically giant calculators. They don't see the word "Apple"; they see a numeric ID such as 17180.
The Tokenizer is the translator.
The Context Window is the "workspace" (RAM) where the model keeps those numbers while it thinks.
If you go over the limit, the oldest tokens have to be dropped, so the model starts "forgetting" the beginning of your conversation, much like a fixed-size buffer that overwrites its oldest entries once it is full.
For mobile devs, the Context Window is a Memory Management issue. In a React Native app, if you use a model with a huge context window (like 128k tokens) and actually fill it, the "KV Cache" (the model's short-term memory) can easily eat up 2GB to 4GB of RAM just to remember the conversation, depending on the model's size and the precision of the cache.
Tokens — What the Model Actually Sees
Tokens are small pieces of text.
They are not always full words.
Example:
"React Native is great" → ["React", " Native", " is", " great"]
Depending on the tokenizer, the same text can be split into even smaller pieces.
Tokenizer — The Translator
The tokenizer is what converts text into something the model understands.
Think of it like a compiler step for human language.
Breaks text into tokens
Maps each token to a numeric ID
["React", " Native"] → [5231, 8812]
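To make the two steps concrete, here is a minimal sketch of a toy tokenizer. It is not how a production tokenizer works (real ones typically use learned merge rules such as BPE); it simply takes the longest matching vocabulary entry at each position. The vocabulary is hypothetical: the IDs for "React" and " Native" come from the example above, and the other two are made up.

```typescript
// Hypothetical toy vocabulary. Only the first two IDs appear in the article;
// the IDs for " is" and " great" are invented for illustration.
const VOCAB: Record<string, number> = {
  "React": 5231,
  " Native": 8812,
  " is": 318,
  " great": 1049,
};

interface Encoded {
  tokens: string[];
  ids: number[];
}

// Greedy longest-match: at each position, take the longest vocabulary
// entry that matches the remaining text, then map it to its ID.
function encode(text: string, vocab: Record<string, number> = VOCAB): Encoded {
  const entries = Object.keys(vocab).sort((a, b) => b.length - a.length);
  const tokens: string[] = [];
  const ids: number[] = [];
  let pos = 0;
  while (pos < text.length) {
    let match: string | null = null;
    for (let i = 0; i < entries.length; i++) {
      const entry = entries[i];
      if (text.slice(pos, pos + entry.length) === entry) {
        match = entry;
        break;
      }
    }
    if (match === null) {
      throw new Error(`No vocabulary entry matches the text at position ${pos}`);
    }
    tokens.push(match);
    ids.push(vocab[match]);
    pos += match.length;
  }
  return { tokens, ids };
}

const result = encode("React Native is great");
console.log(result.tokens); // ["React", " Native", " is", " great"]
console.log(result.ids);    // [5231, 8812, 318, 1049]
```

Notice that the leading space is part of the token (" Native", not "Native"), which is why the same word can map to different IDs depending on where it appears.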
Context Window — The Model’s Workspace
The context window is the maximum number of tokens a model can process at once.
Think of it as RAM for the model while it is thinking.
It includes everything:
user input
chat history
system prompt
documents / code
Important Rule
Input Tokens + Output Tokens ≤ Context Window
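In practice I treat this rule as a budget check before every request. The sketch below assumes the token counts for each part of the prompt are already known (for example, from the tokenizer); the `PromptParts` shape, the function names, and the numbers are all hypothetical.

```typescript
// Hypothetical token counts for each part of a request.
interface PromptParts {
  systemPrompt: number;
  chatHistory: number;
  documents: number;
  userInput: number;
}

// Everything sent to the model counts as input.
function inputTokens(parts: PromptParts): number {
  return parts.systemPrompt + parts.chatHistory + parts.documents + parts.userInput;
}

// Tokens left for the reply. 0 means the input already fills (or overflows) the window.
function remainingOutputBudget(parts: PromptParts, contextWindow: number): number {
  return Math.max(0, contextWindow - inputTokens(parts));
}

// True when the input plus the requested reply length fits in the window.
function fitsInWindow(parts: PromptParts, contextWindow: number, maxOutputTokens: number): boolean {
  return inputTokens(parts) + maxOutputTokens <= contextWindow;
}

const requestParts: PromptParts = {
  systemPrompt: 150,
  chatHistory: 1200,
  documents: 2400,
  userInput: 50,
};

console.log(inputTokens(requestParts));                 // 3800
console.log(remainingOutputBudget(requestParts, 4096)); // 296
console.log(fitsInWindow(requestParts, 4096, 512));     // false
```

With a 4096-token window, this request leaves only 296 tokens for the answer, so asking for a 512-token reply would not fit.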
What Happens When You Exceed It?
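The model can't take in more tokens than the window holds, so the runtime or your app has to either reject the request or drop older tokens (usually from the beginning).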
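One common way to handle this in the app is a sliding window over the chat history. The sketch below is one possible approach, not what any particular runtime does internally: it always keeps the system prompt and then keeps the newest messages that still fit. The `ChatMessage` shape and the precomputed `tokens` field are assumptions for illustration.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  text: string;
  tokens: number; // token count, assumed to be precomputed with the tokenizer
}

// Keep every system message, then keep the newest messages that fit the budget.
function trimToBudget(messages: ChatMessage[], maxInputTokens: number): ChatMessage[] {
  const system = messages.filter((m) => m.role === "system");
  const rest = messages.filter((m) => m.role !== "system");
  let used = system.reduce((sum, m) => sum + m.tokens, 0);
  const kept: ChatMessage[] = [];
  // Walk backwards so the most recent turns are considered first.
  for (let i = rest.length - 1; i >= 0; i--) {
    if (used + rest[i].tokens > maxInputTokens) {
      break; // everything older than this point is dropped
    }
    used += rest[i].tokens;
    kept.unshift(rest[i]);
  }
  return system.concat(kept);
}

const chatLog: ChatMessage[] = [
  { role: "system", text: "You are a helpful assistant.", tokens: 10 },
  { role: "user", text: "First question", tokens: 40 },
  { role: "assistant", text: "First answer", tokens: 60 },
  { role: "user", text: "Second question", tokens: 30 },
];

const trimmed = trimToBudget(chatLog, 100);
console.log(trimmed.map((m) => m.text));
// ["You are a helpful assistant.", "First answer", "Second question"]
```

The oldest user message is the one that gets dropped, which is exactly the "forgetting the beginning" behavior described above.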
KV Cache
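The KV Cache is how the model remembers previous tokens during a conversation: it stores the attention keys and values computed for every token already processed, so they don't have to be recomputed for each new token.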
The bigger the context window:
more tokens stored
more memory required
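A rough way to see how memory grows with the number of tokens is to estimate the cache size directly. The sketch below uses the usual back-of-the-envelope formula for a transformer (one key vector and one value vector per layer, per token). The `ModelShape` fields and the numbers in `smallModel` are hypothetical and do not describe any specific model; real runtimes add overhead on top of this.

```typescript
interface ModelShape {
  layers: number;        // number of transformer layers
  kvHeads: number;       // number of key/value attention heads
  headDim: number;       // size of each head's vector
  bytesPerValue: number; // 2 for a 16-bit cache, 1 for an 8-bit cache
}

// Each token stores one key vector and one value vector in every layer,
// which is where the leading factor of 2 comes from.
function kvCacheBytes(shape: ModelShape, tokens: number): number {
  const bytesPerToken =
    2 * shape.layers * shape.kvHeads * shape.headDim * shape.bytesPerValue;
  return bytesPerToken * tokens;
}

function toGiB(bytes: number): number {
  return bytes / (1024 * 1024 * 1024);
}

// Hypothetical small on-device model with a 16-bit cache.
const smallModel: ModelShape = { layers: 16, kvHeads: 8, headDim: 64, bytesPerValue: 2 };

console.log(toGiB(kvCacheBytes(smallModel, 4096)));   // 0.125
console.log(toGiB(kvCacheBytes(smallModel, 131072))); // 4
```

The cost is linear in the number of tokens actually held in the cache: the same hypothetical model needs about 128 MiB at 4k tokens and about 4 GiB at 128k, which is why I cap the context length on mobile instead of using the maximum the model supports.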
Closing Thoughts
Tokens define how the model reads.
The context window defines how much it can remember.
If you're building AI features in mobile apps, this isn’t optional knowledge. It directly affects:
performance
cost
scalability
user experience



