Store and Retrieve Is Not Intelligence
There are now at least a dozen products promising to give AI “memory.” Mem0, Zep, Letta, LangMem, MemoryMesh, MemCP, claude-mem — the list grows monthly. Each one sets out to solve the most universally frustrating thing about working with AI: your tools forget everything between sessions.
And they do solve part of it. They store things. They retrieve things. Some of them do it quite well.
But here’s what none of them do: get smarter.
Not one of these tools learns from how you work to give you better results over time. Not one reduces how much context you have to hand it as it works with you longer. Not one figures out what you’re actually asking for, routes the right context to the model, and improves its own accuracy from what happened last time.
They’re filing cabinets. Sophisticated ones — with vector search and semantic matching and elegant APIs — but filing cabinets nonetheless. You put things in. You pull things out. The cabinet itself never changes.
The AI memory market isn’t really a memory market. It’s a retrieval market. And retrieval is a solved problem.
What retrieval actually solves
To be clear: retrieval matters. Without persistent context, every AI session starts from zero. One developer captured it perfectly: “Every session started from scratch. Zero memory of what happened yesterday, last week, in a completely different project.”
The pain is real. Another user described it as “Groundhog Day, except I’m the one who has to repeat myself.” A novelist who’d spent months teaching ChatGPT about their work logged in one day to find it had forgotten everything: “Fred has no idea who Fred is. ‘I’m ChatGPT,’ it says.”
Retrieval tools address this. They persist context across sessions. They let you reference earlier conversations, store preferences, keep project state around. If you’ve been manually copy-pasting context between sessions — and if you use Claude Code, Codex, Gemini, or an AI assistant in VS Code, you have been — a retrieval tool will save you real time.
But retrieval tools share one fundamental limit: they don’t get better at their job. The hundredth time you use one is architecturally identical to the first. The system doesn’t learn which context matters for which situation. It doesn’t figure out your patterns. It doesn’t reduce how much you have to spell out, because it never actually understood your intent in the first place.
Store. Retrieve. Store. Retrieve. The tool doesn’t change. Only the data does.
The problem nobody else is solving
The real problem isn’t “my AI forgets.” The real problem is “my AI doesn’t learn.”
Those are different challenges. Forgetting is a storage problem. Not learning is an intelligence problem. And the entire AI memory market is building better storage while ignoring intelligence entirely.
Consider what “learning” would actually mean here:
Pattern recognition. After a week of working with you, the system should know that when you say “deploy,” you mean one specific workflow — yours. It shouldn’t retrieve your deployment docs and hope for the best. It should route straight to the right steps with your setup already loaded.
Effort calibration. When you ask a quick question, you want a quick answer — not a five-section report with an executive summary. When you kick off something genuinely hard, it should go deep without being asked. The system should learn the difference.
Context reduction. This is the counterintuitive one. A tool that actually learns should need less from you over time, not more. Once it’s picked up your preferences, your coding patterns, your architecture decisions, you shouldn’t have to re-explain them. The context you carry gets lighter as the tool gets smarter — not heavier as its store fills up.
Self-correction. When the system gets it wrong — treats a hard task as trivial, or over-answers a yes/no question — it should learn from that. Next time it should get it right without you stepping in.
None of this exists in a retrieval tool. Look under the hood of the major players and you find the same ceiling every time: retrieval with no feedback loop, no notion of intent, no learning signal.
Why retrieval is stuck
The architectural reason these tools can’t learn is that their logic is static by design. The store accumulates data, but the tool itself is middleware: a layer that sits between your request and the model. Your request comes in, the tool searches its store, staples on some relevant context, and hands the padded prompt to the model. The model answers. The tool maybe saves the exchange. Done.
At no point in that flow does anything change about how the tool works. It doesn’t adjust its strategy. It doesn’t refine what “relevant” means for you specifically. It doesn’t build a model of how you work. It runs the same retrieval, with the same parameters, every single time.
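The fixed flow described above can be sketched in a few lines of Python. This is a toy illustration, not any particular product's implementation: the `StaticRetriever` class, the word-overlap scoring, and the `top_k` parameter are all assumptions invented for the example (real tools typically rank with vector embeddings). The thing to notice is that nothing in `build_prompt` depends on what happened in earlier calls.

```python
from dataclasses import dataclass, field


def _words(text: str) -> set[str]:
    """Lowercased word set, used as a crude stand-in for an embedding."""
    return set(text.lower().split())


@dataclass
class StaticRetriever:
    """Hypothetical store-and-retrieve middleware (illustrative only)."""
    top_k: int = 2  # fixed retrieval parameter, never tuned by usage
    store: list[str] = field(default_factory=list)

    def save(self, note: str) -> None:
        # The only thing that ever changes is the data.
        self.store.append(note)

    def retrieve(self, request: str) -> list[str]:
        # Rank stored notes by word overlap with the request.
        query = _words(request)
        ranked = sorted(self.store, key=lambda n: len(query & _words(n)), reverse=True)
        return ranked[: self.top_k]

    def build_prompt(self, request: str) -> str:
        # Staple the retrieved context onto the request and hand it off.
        context = "\n".join(self.retrieve(request))
        return f"{context}\n\n{request}"


memory = StaticRetriever()
memory.save("deploy uses the staging script first")
memory.save("prefers tabs over spaces")
memory.save("deploy target is the eu cluster")

first = memory.build_prompt("how do I deploy")
hundredth = memory.build_prompt("how do I deploy")
```

Calling `build_prompt` a second time, or a hundredth, produces the same prompt for the same store. There is no code path through which an outcome could change `top_k`, the scoring, or what counts as relevant.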
This is RAG — retrieval-augmented generation — and RAG is useful. But RAG is a pattern, not intelligence. It’s the difference between a librarian who can find any book and a colleague who already knows what you need before you ask.
What directing the model looks like
Memory remembers. That’s the whole job of a filing cabinet, and it’s a real job. But the thing that actually makes your AI better isn’t a bigger cabinet. It’s a system that directs the model — that decides what the model should see for this request, right now, and gets out of the way.
That’s the line Anneal is built on. Before the model ever runs, the request gets read for what it actually is: what kind of task, how much effort it deserves, which of your tools and context it touches. Then — instead of dumping “everything relevant” into the window — it assembles just what this request needs. A coding task gets coding context. A research task gets research context. A quick question gets a light touch.
And it doesn’t stay fixed. When it reads a request wrong, that becomes a signal, and the next one lands closer. The accuracy climbs the more you use it, because it’s learning from how you work — not from the contents of your files.
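To make the classify, route, and learn loop concrete, here is a minimal sketch. It is an assumption for illustration only and says nothing about how Anneal or grāmatr are actually built: the `LearningRouter` class, the `CONTEXT_BY_TASK` table, and the per-word weights are all hypothetical, and a real system would use a far richer signal than keyword counts.

```python
from collections import defaultdict

# Hypothetical context bundles per task type (names are illustrative).
CONTEXT_BY_TASK = {
    "coding": ["repo conventions", "architecture notes"],
    "research": ["source list", "prior findings"],
    "quick": [],  # a quick question gets a light touch
}


class LearningRouter:
    """Toy classify-route-learn loop; an assumption, not Anneal's actual design."""

    def __init__(self) -> None:
        # weights[word][task] grows when `word` shows up in a request of type `task`.
        self.weights: dict[str, dict[str, float]] = defaultdict(lambda: defaultdict(float))

    def classify(self, request: str) -> str:
        scores = {task: 0.0 for task in CONTEXT_BY_TASK}
        for word in request.lower().split():
            for task, weight in self.weights.get(word, {}).items():
                scores[task] += weight
        best = max(scores, key=scores.get)
        # With no positive evidence yet, fall back to the cheapest route.
        return best if scores[best] > 0 else "quick"

    def route(self, request: str) -> list[str]:
        # Assemble only the context the predicted task type needs.
        return CONTEXT_BY_TASK[self.classify(request)]

    def feedback(self, request: str, actual_task: str) -> None:
        # A misread request becomes a learning signal for the next one.
        predicted = self.classify(request)
        for word in request.lower().split():
            self.weights[word][actual_task] += 1.0
            if predicted != actual_task:
                self.weights[word][predicted] -= 0.5


router = LearningRouter()
before = router.classify("refactor the auth module")   # no evidence yet
router.feedback("refactor the auth module", "coding")  # the misread is corrected
after = router.classify("refactor the billing module")
```

The contrast with a fixed retrieval pipeline is the `feedback` method: the same kind of request is handled differently after a correction, even though no new document was stored.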
That’s learning. Not retrieval. Learning. (Under the hood, Anneal runs on grāmatr, the intelligence layer that makes this possible — but you never have to think about that. You just get an AI workspace that keeps getting sharper.)
The category that should exist
Andrej Karpathy, former Tesla AI Director and OpenAI co-founder, put it plainly: “Context engineering is the delicate art and science of filling the context window with just the right information for the next step.” Tobi Lütke, CEO of Shopify, made the same case: “I really like the term ‘context engineering’ over prompt engineering. It describes the core skill better.”
That’s the discipline these two are pointing at. Not storage. Not retrieval. The engineering of what reaches the model, when, and why.
The AI memory market is building retrieval tools and calling them memory. What’s actually missing — what almost nobody is building — is systems that classify, route, learn, and improve. Where the pipeline that feeds the model gets smarter from every interaction, instead of just accumulating a bigger pile of stuff to sift through.
The distinction isn’t branding. It’s architecture. Memory tools store and retrieve. Directing the model means understanding the request, delivering exactly the right context, watching what happened, and improving. The store is a component. The intelligence is the product.
Why this matters now
The timing isn’t accidental. Stack Overflow’s 2025 Developer Survey found that the most common frustration with AI tools, cited by 66% of developers, is answers that are “almost right, but not quite.” Positive sentiment toward AI tools dropped from above 70% in 2023–2024 to 60% in 2025. IEEE Spectrum reported that “over the course of 2025, most of the core models reached a quality plateau, and more recently, seem to be in decline.”
The models aren’t leaping ahead the way they were. The next real gain in what you get out of AI isn’t coming from a bigger model — it’s coming from better context. From a system that understands what the model needs to know for the task in front of you, and hands it exactly that.
The retrieval market is building better filing cabinets for models that are hitting their ceiling. The harder, more valuable problem — making the models you already use dramatically more effective — starts somewhere else entirely.
It starts with the difference between remembering and directing.
Memory remembers. Anneal directs.