Tokens and Context Windows: Why My App Can't Remember Everything

I'm a Senior Software developer who loves solving real-world problems and building meaningful products 💡 I currently focus on crafting clean, user-friendly experiences using React Native ⚛️ I enjoy working on challenging projects and constantly learning new things — whether it’s exploring a new framework or diving deeper into existing ones. This space is where I share my journey, the issues I tackle, and the lessons I pick up along the way 🚀

Journey Context

After deciding to go "Local," I realized I couldn't just throw a 50-page PDF at a mobile LLM and expect it to work. I had to go back to the basics of how these models "read."

LLMs are basically giant calculators. They don't see the word "Apple"; they see a numeric ID such as 17180 (the exact number depends on the tokenizer).

  • The Tokenizer is the translator.

  • The Context Window is the "workspace" (RAM) where the model keeps those numbers while it thinks.

If you go over the limit, the oldest tokens have to be dropped, so the model "forgets" the beginning of your conversation, much like a fixed-size buffer that evicts its oldest entries once it is full.

For mobile devs, the context window is a memory management issue. In a React Native app, if you use a model with a huge context window (like 128k tokens), the "KV Cache" (the model's short-term memory) can eat up 2GB to 4GB of RAM once that window fills up, depending on the model's size and cache precision, just to remember the conversation.

Tokens — What the Model Actually Sees

Tokens are small pieces of text.

They are not always full words.

Example:
"React Native is great"

["React", " Native", " is", " great"]

Or the text may be split into even smaller pieces, depending on the tokenizer.

Tokenizer — The Translator

The tokenizer is what converts text into something the model understands.

Think of it like a compiler step for human language.

  • Breaks text into tokens

  • Maps each token to a numeric ID

["React", " Native"] → [5231, 8812] (illustrative IDs; real values depend on the tokenizer's vocabulary)
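To make the two steps concrete, here is a minimal sketch of a toy tokenizer in TypeScript. The vocabulary, the IDs, and the `encode` / `decode` helpers are all hypothetical and exist only for illustration. Real tokenizers (BPE, SentencePiece, and so on) learn vocabularies of tens of thousands of pieces and use more involved merge rules than the greedy longest-match shown here.

```typescript
// Toy vocabulary. These IDs are made up for illustration and do not
// correspond to any real tokenizer.
const vocab: Array<[string, number]> = [
  ["React", 5231],
  [" Native", 8812],
  [" is", 318],
  [" great", 1049],
];

// Longest pieces first, so the greedy match prefers the longest token.
const byLength = [...vocab].sort((a, b) => b[0].length - a[0].length);

// Step 1 + 2: break the text into tokens and map each token to its ID.
function encode(text: string): number[] {
  const ids: number[] = [];
  let pos = 0;
  while (pos < text.length) {
    const match = byLength.find(([piece]) => text.startsWith(piece, pos));
    if (match === undefined) {
      throw new Error(`No token in toy vocabulary at position ${pos}`);
    }
    ids.push(match[1]);
    pos += match[0].length;
  }
  return ids;
}

// Reverse direction: map IDs back to their text pieces and join them.
function decode(ids: number[]): string {
  return ids
    .map((id) => {
      const entry = vocab.find(([, tokenId]) => tokenId === id);
      if (entry === undefined) throw new Error(`Unknown token ID ${id}`);
      return entry[0];
    })
    .join("");
}

console.log(encode("React Native is great")); // [5231, 8812, 318, 1049]
```

Notice that the leading space belongs to the token (" Native", not "Native"), which is why joining the decoded pieces gives back the original sentence.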

Context Window — The Model’s Workspace

The context window is the maximum number of tokens a model can process at once.

Think of it as RAM for the model while it is thinking.

It includes everything:

  • user input

  • chat history

  • system prompt

  • documents / code

Important Rule

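Input Tokens + Output Tokens ≤ Context Window

The reply shares the same budget as the prompt, so a longer prompt leaves less room for the answer.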
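As a rough sketch of that rule, the helper below works out how many tokens are left for the model's reply. The `TokenBudget` shape and the `maxOutputTokens` name are assumptions made for this example, not part of any library, and the numbers are invented.

```typescript
// Hypothetical shape: token counts for everything that goes into the prompt.
interface TokenBudget {
  contextWindow: number; // total tokens the model can process at once
  systemPrompt: number;
  chatHistory: number;
  userInput: number;
}

// Whatever the input does not use is the most the model can generate.
function maxOutputTokens(budget: TokenBudget): number {
  const inputTokens =
    budget.systemPrompt + budget.chatHistory + budget.userInput;
  // Clamp at zero: an oversized prompt leaves no room for a reply.
  return Math.max(0, budget.contextWindow - inputTokens);
}

const remaining = maxOutputTokens({
  contextWindow: 4096,
  systemPrompt: 200,
  chatHistory: 1500,
  userInput: 300,
});
console.log(remaining); // 2096
```

In an app, a result of 0 (or close to it) is the signal to trim the history before sending the request.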

What Happens When You Exceed It?

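The model itself does not decide what to drop. The app or inference runtime has to: most either reject the request or truncate the oldest tokens (usually from the beginning), so the model no longer sees them.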
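One common way to stay under the limit is a sliding window over the chat history: keep the system prompt, then keep as many of the newest messages as fit. The sketch below assumes a hypothetical `Message` type and takes the token counter as a parameter, because the real count has to come from whichever tokenizer your model uses. The word-count counter in the usage example is only a stand-in.

```typescript
type Message = {
  role: "system" | "user" | "assistant";
  content: string;
};

// Keeps every system message, then the newest messages that still fit.
function trimHistory(
  messages: Message[],
  maxTokens: number,
  countTokens: (text: string) => number,
): Message[] {
  const system = messages.filter((m) => m.role === "system");
  const rest = messages.filter((m) => m.role !== "system");

  // Tokens left after reserving room for the system prompt.
  let budget =
    maxTokens - system.reduce((sum, m) => sum + countTokens(m.content), 0);

  const kept: Message[] = [];
  // Walk from newest to oldest and stop at the first message that does not fit.
  for (let i = rest.length - 1; i >= 0; i--) {
    const cost = countTokens(rest[i].content);
    if (cost > budget) break;
    kept.unshift(rest[i]);
    budget -= cost;
  }
  return [...system, ...kept];
}

// Stand-in counter for the example: one "token" per whitespace-separated word.
const countWords = (text: string): number =>
  text.split(/\s+/).filter((w) => w.length > 0).length;

const trimmed = trimHistory(
  [
    { role: "system", content: "be brief" },
    { role: "user", content: "one two three" },
    { role: "assistant", content: "four five" },
    { role: "user", content: "six seven eight nine" },
  ],
  9,
  countWords,
);
console.log(trimmed.map((m) => m.content));
// ["be brief", "four five", "six seven eight nine"]
```

Dropping whole messages rather than individual tokens keeps the remaining conversation readable for the model. Other strategies, such as summarizing the older turns, trade extra work for better recall.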

KV Cache

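The KV Cache is how the model remembers previous tokens during a conversation. It stores the attention keys and values already computed for each token, so they are not recomputed every time a new token is generated.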

The bigger the context window:

  • more tokens stored

  • more memory required
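A back-of-the-envelope estimate shows how quickly this grows. For a standard transformer, the cache holds one key and one value vector per token, per layer, per KV head. The sketch below uses that formula; the `KvCacheConfig` shape and the example numbers are assumptions for illustration (a hypothetical small model with a 16-bit cache), not measurements from a real device. Actual runtimes may use less through cache quantization, or more because of other buffers.

```typescript
// Hypothetical model configuration used only for this estimate.
interface KvCacheConfig {
  layers: number;        // number of transformer layers
  kvHeads: number;       // key/value heads per layer
  headDim: number;       // size of each head's vector
  bytesPerValue: number; // 2 for a 16-bit cache, 1 for an 8-bit cache
}

// 2 = one key vector + one value vector for every token.
function kvCacheBytes(cfg: KvCacheConfig, tokens: number): number {
  return (
    2 * cfg.layers * cfg.kvHeads * cfg.headDim * cfg.bytesPerValue * tokens
  );
}

const GIB = 1024 ** 3;
const exampleModel: KvCacheConfig = {
  layers: 16,
  kvHeads: 8,
  headDim: 64,
  bytesPerValue: 2,
};

console.log(kvCacheBytes(exampleModel, 4096) / GIB);       // 0.125
console.log(kvCacheBytes(exampleModel, 128 * 1024) / GIB); // 4
```

With these assumed numbers, each token costs 32 KiB of cache, so a full 128k-token window needs about 4 GiB while a 4k window needs about 128 MiB. On a phone, capping the context length is often the simplest memory optimization available.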

Closing Thoughts

Tokens define how the model reads.

The context window defines how much it can remember.

If you're building AI features in mobile apps, this isn’t optional knowledge. It directly affects:

  • performance

  • cost

  • scalability

  • user experience

AI for Mobile Developers: Learning Local LLMs

Part 2 of 7

AI for Mobile Developers: Learning Local LLMs is a public learning journey documenting how a React Native developer explores practical AI integration for mobile apps. This series focuses on understanding how Large Language Models work and how they can run directly on mobile devices using local inference. Instead of deep AI theory, the goal is to learn from a developer perspective — experimenting with tools, running models locally, and eventually integrating AI features inside mobile applications.


Govind Maheshwari's blog
