Writing 27 April 2026
The hybrid memory architecture (and why it took a public debate to name it)
The hybrid memory architecture Karpathy and OpenBrain are converging on — and the auditable retention layer the debate hasn't reached yet.
There’s a real debate happening in public about how AI memory should be built. It’s worth having. It’s also catching up to an architectural question we’ve been working through with the people who use Memoria for the better part of a year.
Two framings are doing most of the work right now.
Andrej Karpathy posted a sketch of an “LLM Wiki” — a folder of markdown pages where the model is the writer, integrating each new source into a living artefact. The repo idea was bookmarked tens of thousands of times in a single week. People recognised something they wanted.
Around the same time, Nate B. Jones — who built and writes about a system called OpenBrain — pushed back, charitably, in a video and a follow-up newsletter. His argument: a wiki the model maintains will quietly drift, and what you actually want is structured raw storage with a compile step on top. A hybrid.
Both of them are describing real failure modes. Memoria agrees with both. The hybrid framing is the right one, and the rest of this post is where we sit inside it.
Position 1: compile-time synthesis
Karpathy’s wiki is compile-time memory. New input arrives, the model reads it, updates ten or fifteen related concept pages, links new ideas to old ones, surfaces contradictions. The wiki is the artefact you read. The model has done the synthesis already, so the next query is fast and the prose is rich.
The appeal is real. Knowledge compounds. You can walk the artefact end-to-end. It reads like a thing a thoughtful colleague maintained.
The failure mode is also real, and Nate names it cleanly: a neglected wiki rots into active misinformation. Old syntheses keep their confident tone after the underlying facts have moved. Editorial choices made by the model on the way in get baked into the structure where you can’t see them. Multi-agent writes to the same page collide. Karpathy himself flags that the pattern starts to fray once you’re past a few thousand sources.
For one careful researcher with a hundred high-signal documents, this works beautifully. For an organisation, it’s how you get a wiki that nobody trusts six months later.
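Stripped to its essentials, the compile-time pattern is an ingest loop that folds each new source into the pages it touches. A minimal sketch, where `synthesize` and `compile_time_ingest` are hypothetical names standing in for the model call and the loop around it, not anyone's actual implementation:

```python
from pathlib import Path

def compile_time_ingest(source_text: str, wiki_dir: Path, synthesize) -> list[str]:
    """Fold one new source into the wiki at ingest time.

    `synthesize(page_body, source_text)` stands in for the model call:
    it returns the updated page body, or the body unchanged if the
    source is irrelevant to that page. Returns the pages it rewrote.
    """
    touched = []
    for page in sorted(wiki_dir.glob("*.md")):
        body = page.read_text()
        updated = synthesize(body, source_text)
        if updated != body:
            # The editorial choice is baked into the artefact here, on the way in.
            page.write_text(updated)
            touched.append(page.name)
    return touched
```

The attraction and the hazard share a line: `page.write_text(updated)`. Whatever call `synthesize` made is now structure, invisible to later readers.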
Position 2: query-time synthesis
OpenBrain is query-time memory. Information is filed into structured storage on the way in — cheap and lazy. When you ask a question, the model goes to the rows, pulls what’s relevant, and synthesises an answer fresh.
This is also genuinely good. Provenance is clean — every claim traces to a row with a timestamp. Multi-agent access is straightforward because reads don’t conflict. New input is cheap to ingest. Filtering — by date, project, source — happens before synthesis, which is where it belongs.
The failure mode here is subtler. The model re-derives the same understanding from the same rows every time you ask. It never gets smarter from having thought about it before; it just accumulates more raw data to dig through. For a single user with a focused question, that’s fine. For a team building on prior decisions, the lack of compounding becomes the bottleneck.
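In code, the query-time pattern is a filter followed by a fresh synthesis. A minimal sketch, with `Row` as an illustrative schema and `synthesize` again a hypothetical stand-in for the model call:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Row:
    text: str
    project: str
    filed: date  # provenance: every claim traces back to a timestamped row

def query_time_answer(rows, question, project, since, synthesize):
    """Filter first, then synthesise fresh. Nothing is precomputed, so
    the same understanding is re-derived from the same rows on every ask."""
    relevant = [r for r in rows if r.project == project and r.filed >= since]
    return synthesize(question, relevant)
```

Filtering by project and date happens before the model sees anything, which is the strength of this position; the absence of anything persisted between calls is its weakness.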
Position 3: the hybrid
Nate’s recommendation, paraphrased: keep structured raw storage as the source of truth. Run a compile agent on a schedule. Let it produce a synthesised view — pages, summaries, indexes — that you can read like a wiki. But the wiki is never edited directly. If the wiki is wrong, you fix the source data and regenerate.
The key word is “regenerable”. The compiled view is a derived artefact. The database is what’s real. That single discipline — compiled output is never authoritative — is what stops the hybrid from collapsing back into a wiki that drifts.
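That discipline can be stated as code: the compiled view is a pure function of the store, and the only supported fix path goes through the store. A sketch, with a toy `compile_view` standing in for the scheduled compile agent (names here are illustrative, not anyone's API):

```python
def compile_view(store: dict[str, str]) -> dict[str, str]:
    """Derive the wiki from the store. Toy stand-in for the compile agent:
    one page per entry, plus an index. Deterministic given the store."""
    pages = {f"{key}.md": f"# {key}\n\n{text}\n" for key, text in store.items()}
    pages["index.md"] = "# Index\n\n" + "\n".join(f"- {k}" for k in sorted(store))
    return pages

def fix_at_source(store: dict[str, str], key: str, corrected: str) -> dict[str, str]:
    """The only way to change the wiki: change the store, then regenerate.
    Direct edits to compiled pages are simply never made."""
    store[key] = corrected
    return compile_view(store)
```

Nothing in this sketch writes to a page in place. That absence is the invariant: the moment a hand-edit to the compiled view survives a regeneration cycle, you are back to a wiki that drifts.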
Two principles sit underneath both Karpathy’s framing and Nate’s, and they’re worth saying out loud:
- Own the artefact, not the tool. Memory belongs to the team it describes. Markdown over a SaaS UI; a database the team controls over vendor lock-in.
- AI as maintainer, not oracle. The model isn’t there to be asked and forgotten. It holds an ongoing job against the knowledge — curating, reconciling, surfacing what changed.
We agree with both of those without qualification.
Where Memoria sits
Memoria runs structured raw storage — relational plus vector — at the bottom of the stack. The query-time pipeline (DSPy ReAct against that store, with project, time, and source filtering applied before synthesis) is what teams use day to day. Multi-agent access goes through per-team gateways: Claude, ChatGPT, Cursor, scheduled jobs all touching the same store, no write conflicts.
The compile-time half is the Librarian — a cron-driven agent that reads across the store, synthesises entries into navigable pages, and keeps a curated view that holds up over time. It’s the Karpathy half of the hybrid, with the discipline that the compiled view is never the source of truth. If a page is wrong, the fix lives at the source.
Channel-agnostic ingestion sits underneath all of it — Slack, Microsoft Teams, WhatsApp, Telegram, Google Chat, email. This is the part of the hybrid both Karpathy and Nate assume rather than describe, because they’re talking about systems where the documents are already gathered. For an organisation, gathering is the work.
One layer sits above the hybrid that the public debate hasn’t reached yet: auditable retention. Memory that doesn’t curate becomes a digital landfill within eighteen months. Memory that curates without an audit trail is a different kind of liability — silent deletion is the failure mode regulated industries can’t accept. We’re building a forgetting layer that’s reversible by default, governed by customer-controlled retention policies, and logged immutably. More on that soon.
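To make that shape concrete, here is a minimal sketch of such a layer, with hypothetical names throughout: tombstones instead of destruction, restore as a first-class operation, and an append-only log that records all three actions. It illustrates the invariants named above, nothing more.

```python
from datetime import datetime, timezone

class RetentionLayer:
    """Sketch of reversible, logged forgetting. Entries are tombstoned,
    never destroyed; every action lands in an append-only audit log."""

    def __init__(self):
        self.entries = {}    # entry_id -> (text, tombstoned?)
        self.audit_log = []  # append-only; never edited or truncated

    def _log(self, action, entry_id):
        self.audit_log.append(
            (datetime.now(timezone.utc).isoformat(), action, entry_id))

    def file(self, entry_id, text):
        self.entries[entry_id] = (text, False)
        self._log("filed", entry_id)

    def forget(self, entry_id):
        text, _ = self.entries[entry_id]
        self.entries[entry_id] = (text, True)  # reversible by default
        self._log("forgot", entry_id)

    def restore(self, entry_id):
        text, _ = self.entries[entry_id]
        self.entries[entry_id] = (text, False)
        self._log("restored", entry_id)

    def visible(self):
        return {i: t for i, (t, dead) in self.entries.items() if not dead}
```

A retention policy then becomes a scheduled caller of `forget`, and "what was quietly dropped" stops being quiet: it is a row in the log, reversible until the policy window closes.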
Where this lands
The public debate is converging on a shape we’ve been building toward because the people we work with kept asking for it. Channel-agnostic ingestion, structured storage at the bottom, a compile loop on top, regenerable views, multiple agents reading the same store, audit trails the team owns. Not because we’d read a clever blog post — because chat history doesn’t survive the conversation it came from, and a memory that doesn’t compound isn’t worth the storage cost.
Hybrids are easy to draw and slow to discipline. Three things will separate the architectures that survive from the ones that drift back into wiki-shaped problems. Structure: how strict the source-of-truth invariant actually is when shipping pressure mounts. Synthesis: whether the compile step earns its cost or quietly becomes a generator of confident-sounding noise. Audit: whether teams can see what was filed, what was compiled from it, and what was quietly dropped along the way.
Karpathy and Nate are pulling on the right thread from different ends. The hybrid is the answer. It’s harder to keep honest than it looks.