I’m sharing the first episode of The Executive Code podcast, a new project where my colleague Kirim and I explore the intersection of AI, engineering, and leadership. After posting about our second episode on RAPTOR technology earlier today, I also wanted to share our inaugural episode. We created this podcast with the team at Snapshot AI and Flatiron Software to bridge technical AI concepts with practical business applications.
Stepping into these deeply technical AI discussions feels like a stretch sometimes. I’m not an AI researcher by training, and there’s always that voice in the back of my head questioning whether I should be the one unpacking these concepts. But that tension is part of what drives this podcast forward: learning openly and sharing what we discover.
In our first episode, Kirim and I unpack the innovative “Memory Mosaics” paper from the International Conference on Learning Representations (ICLR) 2025, breaking down how this architecture represents a significant shift in how neural networks remember information.
What Are Memory Mosaics?
Imagine if your neural network had an actual notebook – somewhere it could jot down important details from everything it reads, and then flip back to those notes when making a decision. That’s the intuition behind Memory Mosaics.
Unlike traditional models like GPT that use opaque self-attention mechanisms (where we can’t easily see what information the model is focusing on), Memory Mosaics make memory explicit through:
- Associative memory units: Think of these as mini-notebooks that store key-value pairs
- Two distinct memory types: Dynamic in-context memory (short-term, session-based) and persistent memory (long-term, learned during training)
This approach mirrors how humans remember – we temporarily store immediate context while drawing on our permanent knowledge base.
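To make the notebook analogy concrete, here is a minimal NumPy sketch of an associative memory unit: a toy key-value store with similarity-weighted lookup. The class name, dimensions, and softmax-style retrieval are illustrative assumptions of mine, not the paper’s exact formulation.

```python
import numpy as np

class AssociativeMemory:
    """A toy key-value 'notebook': store (key, value) pairs, then
    retrieve a value by similarity-weighted lookup over stored keys."""

    def __init__(self, dim: int):
        self.dim = dim
        self.keys = np.empty((0, dim))    # one row per stored key
        self.values = np.empty((0, dim))  # matching row per stored value

    def write(self, key: np.ndarray, value: np.ndarray) -> None:
        self.keys = np.vstack([self.keys, key])
        self.values = np.vstack([self.values, value])

    def read(self, query: np.ndarray) -> np.ndarray:
        if len(self.keys) == 0:
            return np.zeros(self.dim)
        # Softmax over query-key similarities, then blend the stored values.
        scores = self.keys @ query
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        return weights @ self.values

# Usage: an in-context (short-term) unit would fill this store as the
# session unfolds; a persistent (long-term) unit would carry entries
# learned during training instead.
rng = np.random.default_rng(0)
memory = AssociativeMemory(dim=8)
for _ in range(5):
    memory.write(rng.normal(size=8), rng.normal(size=8))
print(memory.read(rng.normal(size=8)).shape)  # (8,)
```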
Why This Matters for Product and Engineering Leaders
The implications for AI applications are profound:
- Greater interpretability: We can actually peek inside the model’s “notebook” to see what information it deemed important
- Improved long-context handling: By separating memory types, the model can maintain focus on relevant information across much longer contexts
- Meta-learning capabilities: The model doesn’t just learn facts; it learns how to take better notes for itself – effectively “learning how to learn”
For product managers and engineering leaders, this architecture could enable more personalized AI assistants that maintain coherent conversations over time while explaining their reasoning process.
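To give a flavor of what “peeking inside the notebook” could look like in practice, here is a hypothetical inspection helper (the function name, weights, and example tokens are all invented for illustration): it ranks stored entries by how much a retrieval step would weight them for a given query, the kind of readout that entangled self-attention does not expose as directly.

```python
import numpy as np

def inspect_memory(keys: np.ndarray, query: np.ndarray, tokens: list[str], top_k: int = 3):
    """Hypothetical interpretability hook: show which stored entries
    the retrieval step weights most heavily for a given query."""
    scores = keys @ query
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    ranked = np.argsort(weights)[::-1][:top_k]
    for idx in ranked:
        print(f"{tokens[idx]:>12s}  weight={weights[idx]:.2f}")

# Toy example: keys associated with earlier tokens in the session.
rng = np.random.default_rng(1)
tokens = ["meeting", "Tuesday", "budget", "Q3", "forecast"]
keys = rng.normal(size=(len(tokens), 8))
inspect_memory(keys, query=keys[3] + 0.1 * rng.normal(size=8), tokens=tokens)
```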
The Technical Architecture, Simplified
During our conversation, Kirim expertly walked through how Memory Mosaics work:
- An embedding layer converts words into numerical vectors (just like standard language models)
- Multiple memory units work in parallel to store key-value pairs
- A “leaky average” mechanism helps the model focus more on recent information while gracefully forgetting older context
- An attention mechanism retrieves relevant memories by comparing the current context with stored keys
- Both short-term and long-term memories contribute to each prediction
The entire model is trained end-to-end, learning not just language patterns but also optimal memory strategies.
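For readers who want to see how these steps fit together, below is a compact NumPy sketch of a single memory unit’s forward pass: a leaky average builds recency-weighted keys, key-value pairs accumulate from the context, and an attention-style lookup retrieves them. The decay rate, dimensions, and projection matrices are placeholder assumptions, and the end-to-end training that makes the real architecture work is omitted here.

```python
import numpy as np

def leaky_average(features: np.ndarray, decay: float = 0.7) -> np.ndarray:
    """Exponential moving average over time: recent steps dominate,
    older context is gracefully forgotten."""
    out = np.zeros_like(features)
    running = np.zeros(features.shape[1])
    for t, x in enumerate(features):
        running = decay * running + (1.0 - decay) * x
        out[t] = running
    return out

def memory_unit_forward(embeddings: np.ndarray, W_k: np.ndarray, W_v: np.ndarray) -> np.ndarray:
    """One associative memory unit: build keys with a leaky average,
    store key-value pairs from the context, retrieve by attention."""
    keys = leaky_average(embeddings @ W_k)   # smoothed, recency-weighted keys
    values = embeddings @ W_v                # values to be recalled later
    T = len(embeddings)
    outputs = np.zeros_like(values)
    for t in range(1, T):
        # Compare the current key against everything stored so far...
        scores = keys[:t] @ keys[t]
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        # ...and blend the stored values accordingly.
        outputs[t] = weights @ values[:t]
    return outputs

# Toy run: 10 "tokens" with 16-dim embeddings, one memory unit;
# a full model would run many such units in parallel.
rng = np.random.default_rng(2)
emb = rng.normal(size=(10, 16))
W_k, W_v = rng.normal(size=(16, 16)), rng.normal(size=(16, 16))
print(memory_unit_forward(emb, W_k, W_v).shape)  # (10, 16)
```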
Community Reaction and Future Potential
What’s particularly exciting is how Memory Mosaics addresses two critical challenges simultaneously:
- Interpretability: Offering transparency into the model’s decision-making process
- Context length: Providing a more efficient approach to handling longer texts
Some researchers have also noted the potential for “predictive disentanglement” – where different memory units might specialize in distinct aspects of the data (like a team of experts, each handling their part of the job).
While questions remain about scalability and real-world performance, the approach points in a promising direction for next-generation AI systems.
Thanks To
This podcast wouldn’t exist without my co-host Kirim, whose technical expertise brings depth to our conversations. I owe special thanks to Ana Clara, whose candid feedback and guidance helped shape this project from concept to execution.
Join the Conversation
If you found this discussion valuable, I’d appreciate your thoughts in the comments on our YouTube video. Your comments help the YouTube algorithm understand who might benefit from this content, and more importantly, they help us improve future episodes.
Subscribe to stay updated on future episodes exploring the frontiers of technology and leadership. We welcome your suggestions for topics you’d like us to cover.
The Executive Code is filmed across New York, London, and wherever the conversation takes us.
