29 April 2026

Du's agent memory taxonomy and the org-wide memory gap

Pengfei Du's new survey gives agent memory a proper academic frame. Memoria's architecture maps onto it cleanly — and the org-wide gap the taxonomy surfaces is the part most current tooling doesn't reach.

Pengfei Du’s survey paper landed in March: “Memory for Autonomous LLM Agents: Mechanisms, Evaluation, and Emerging Frontiers”. Three-dimensional taxonomy, five mechanism families, the open challenges named precisely. It’s the kind of paper a category needs once it’s old enough to argue with itself.

There’s a question worth pulling out of it. Of the open challenges Du names, which apply to a single developer’s agent and which apply to an organisation’s agentic surface? They aren’t the same problem. The shape of the answer is what Memoria has been building toward.

The taxonomy in plain language

Du formalises memory as a write–manage–read loop coupled with perception and action — the agent observes, files something, manages what’s already filed, reads back when it acts. Three axes describe any given system: temporal scope (how far back memory reaches), representational substrate (how it’s stored), and control policy (who decides what to write, keep, surface, or drop).
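The loop is small enough to hold in code. A minimal sketch, with illustrative names (Du's paper supplies the frame, not this API): `capacity` stands in for temporal scope, the list for the representational substrate, and the pruning rule for control policy.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryRecord:
    content: str
    timestamp: float


@dataclass
class MemorySystem:
    """Toy write-manage-read loop. Names are illustrative, not Du's."""
    store: list = field(default_factory=list)
    capacity: int = 100  # temporal scope: how far back memory reaches

    def write(self, observation: str, t: float) -> None:
        # the agent observes and files something
        self.store.append(MemoryRecord(observation, t))

    def manage(self) -> None:
        # control policy: keep only the newest `capacity` records
        self.store.sort(key=lambda r: r.timestamp)
        self.store = self.store[-self.capacity:]

    def read(self, query: str) -> list:
        # naive substring match standing in for real retrieval
        return [r for r in self.store if query in r.content]
```

Swap the pruning rule for a learned controller and you have Du's policy-learned management family; swap the substring match for a vector index and you have retrieval-augmented stores.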

Five mechanism families sit underneath. Context-resident compression keeps memory inside the prompt window. Retrieval-augmented stores file outside the window and pull on demand. Reflective self-improvement lets the agent rewrite its own prior conclusions. Hierarchical virtual context layers short-term against long-term. Policy-learned management is exactly what it sounds like — a learned controller deciding what the memory does.

Five open challenges close the survey: continual consolidation, causally grounded retrieval, trustworthy reflection, learned forgetting, and multimodal embodied memory. Each one is real. Each one shows up differently depending on whether the memory is one developer’s or a company’s.

Where Memoria sits in the taxonomy

Memoria’s architecture maps onto Du’s framework cleanly, and the map is worth drawing.

Representational substrate. Three substrates side by side, used for what each is best at. A relational store for filtered, structured retrieval where provenance and time matter. A vector store for semantic search across raw episodic memory. A graph layer (Neo4j, recently merged) for the relations between entities, decisions, and events that neither relational nor vector handles well. Du’s taxonomy treats representational substrate as a single axis; in production for an organisation it’s three.
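A toy illustration of why the three substrates answer different questions. The backends here are stand-ins (lists and dicts rather than a relational database, a vector index, or Neo4j), and the data is invented; only the three query shapes are the point.

```python
import math

# Stand-ins for the three substrates. Real backends would be a relational
# store, an ANN index, and a graph database; these show the query shapes.

ROWS = [  # relational: structured, filtered, provenance- and time-bearing
    {"id": 1, "author": "dana", "text": "chose postgres for billing", "ts": 3},
    {"id": 2, "author": "sam", "text": "billing migration done", "ts": 7},
]

VECS = {1: [1.0, 0.0], 2: [0.8, 0.6]}  # vector: semantic neighbourhoods

EDGES = {1: [2]}  # graph: relations between entities, decisions, events


def relational_query(author):
    """Who said it, and when: a filtered, structured lookup."""
    return [r for r in ROWS if r["author"] == author]


def vector_query(q, k=1):
    """What is this similar to: cosine ranking over embeddings."""
    def cos(a, b):
        return sum(x * y for x, y in zip(a, b)) / (math.hypot(*a) * math.hypot(*b))
    return sorted(VECS, key=lambda i: -cos(VECS[i], q))[:k]


def graph_query(node):
    """What does this connect to: relation traversal."""
    return EDGES.get(node, [])
```

Neither the relational filter nor the cosine ranking can answer the traversal question, which is the gap the graph layer fills.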

Control policy. A per-customer Gateway sits in front of the store. Every write, every read, every transition is a Gateway action. Role-based access, scoping, audit, channel adapter — they all live there. This is the part the taxonomy describes as control policy and that org-wide deployment makes load-bearing rather than incidental.
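A sketch of what "every write, every read is a Gateway action" implies, assuming nothing about Memoria's internals: a policy table, a role check, and an audit entry on every call, allowed or not. Role names and the policy shape are illustrative.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Gateway:
    """Hypothetical per-customer gateway: RBAC check plus audit on every op."""
    policy: dict  # role -> set of allowed actions (illustrative)
    audit: list = field(default_factory=list)

    def authorize(self, actor: str, role: str, action: str) -> bool:
        allowed = action in self.policy.get(role, set())
        # denied attempts are logged too: the audit trail is what makes
        # the access policy answerable
        self.audit.append({"actor": actor, "role": role, "action": action,
                           "allowed": allowed, "ts": time.time()})
        return allowed
```

The detail that makes this load-bearing at org scale is that refusals land in the audit log alongside grants; an access policy you can't interrogate after the fact is just a filter.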

Perception and action. The channel adapters — Slack, Microsoft Teams, WhatsApp, Telegram, Google Chat, email — are the perception layer. They’re how memory joins the conversations it’s part of. A single-developer system can assume one source; an organisation’s memory has to compose across them.

Reflective self-improvement and continual consolidation. The Librarian — a compile-time agent that reads across the store, synthesises related memories into navigable pages, surfaces contradictions — is the consolidation layer. It’s running in our household environment today; the page-writing path (MEM-122) is the gate before it ships to customer stacks. To be clear about status: built and partly deployed, not running everywhere yet.

Learned forgetting. This is the open challenge that lines up most directly with our recent design work. Memoria’s auditable retention layer is a six-state lifecycle — active → demoted → archived → tombstoned → purged, plus a separate compliance-purged path — backed by an immutable forgetting-event log (FEL). Every forgetting decision logs which memory, why, by what trigger, by which actor, when. Reversible until the very last step. The design is complete; the FEL build is queued.
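The lifecycle reads naturally as a small state machine. This sketch uses the state names from the post; the API, trigger strings, and actor fields are illustrative, not the actual FEL schema.

```python
FORWARD = {"active": "demoted", "demoted": "archived",
           "archived": "tombstoned", "tombstoned": "purged"}
# tombstoned -> purged is the only irreversible step; compliance-purged
# is reachable from any state (e.g. an erasure request).


class RetentionRecord:
    def __init__(self, memory_id, fel):
        self.memory_id = memory_id
        self.state = "active"
        self.fel = fel  # shared, append-only forgetting-event log

    def _log(self, new_state, trigger, actor):
        # every forgetting decision records which memory, why, by whom
        self.fel.append({"memory": self.memory_id, "from": self.state,
                         "to": new_state, "trigger": trigger, "actor": actor})
        self.state = new_state

    def demote_step(self, trigger, actor):
        self._log(FORWARD[self.state], trigger, actor)

    def restore(self, trigger, actor):
        if self.state in ("purged", "compliance-purged"):
            raise ValueError("purge is irreversible")
        self._log("active", trigger, actor)

    def compliance_purge(self, trigger, actor):
        self._log("compliance-purged", trigger, actor)
```

The property worth noticing: the log is a parameter, not an attribute the record owns, so records can't rewrite their own history, and a restore is itself a logged event rather than a deletion of prior ones.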

The mapping isn’t a victory lap. It’s a way to be specific about which problems Du names that the architecture is shaped to answer, and which it isn’t. Multimodal embodied memory, for one, isn’t the work we’re doing.

Personal vs organisational

Pith took the same paper through a developer lens. Their angle is right for their audience — a five-minute MCP install, local SQLite, contradiction detection, knowledge that compounds across a single developer’s sessions. We’re reading the same paper through an organisation lens. Distinct problems, distinct shapes.

Some of Du’s open challenges are roughly equivalent at both scales. Representational substrate doesn’t change much: vector and graph indexing don’t care whether one person or a hundred are reading. Latency budgets are the same order of magnitude. Trustworthy reflection — does the agent’s self-revision drift toward confident nonsense? — applies as much to a single developer as to a team.

Other challenges intensify at organisational scale, sometimes by an order of magnitude.

Write-path filtering across channels. A developer’s agent has one or two input streams. An organisation’s memory composes across Slack, Teams, WhatsApp, email, and a handful of operational systems. What’s worth keeping is a cross-channel decision, not a per-channel one.

Multi-agent contention. Inside one developer’s machine, one agent reads and writes. Inside an organisation, several agents — Claude, ChatGPT, Cursor, scheduled jobs — touch the same store concurrently. Who wins on contradiction? Who’s allowed to demote whose write? Du names contradiction handling as a production concern; at org scale it’s a governance concern as well.
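One hedged sketch of what a contention rule could look like: contradictory writes carry an actor class and a timestamp, and a precedence table plus a recency tie-break picks the surviving claim. The table and the tie-break are assumptions for illustration, not Memoria's actual policy.

```python
# Illustrative precedence: a human correction outranks a scheduled job,
# which outranks an assistant's inference.
PRECEDENCE = {"human": 3, "scheduled_job": 2, "assistant": 1}


def resolve(existing, incoming):
    """Pick the surviving claim when two writers contradict each other.

    Each claim is a dict with "actor_class" and "ts" keys. Higher
    precedence wins; within the same class, the newer claim wins.
    """
    a = PRECEDENCE[existing["actor_class"]]
    b = PRECEDENCE[incoming["actor_class"]]
    if a != b:
        return existing if a > b else incoming
    return max(existing, incoming, key=lambda c: c["ts"])
```

Whatever the actual rule, the governance point stands: the rule has to be explicit and the loser has to be demoted through the retention lifecycle, not silently overwritten.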

Audit and role-based access. A single developer’s memory doesn’t need RBAC. An organisation’s does, and it needs the audit trail that makes the access policy answerable. Most current agent-memory tooling doesn’t reach this layer because it doesn’t need to.

Privacy governance. The same taxonomy at organisational scale picks up regulatory weight. GDPR Article 17 erasure isn’t a feature; it’s a precondition. Retention policy is the customer’s, not the vendor’s, and the audit log has to make both halves of that visible.

These aren’t critiques of Du or of personal-agentic-memory tooling. They’re what changes when the agent surface is an organisation’s rather than a person’s, and they’re the part of the taxonomy where the empty space is largest.

What’s empty space

Cross-AI memory with role-based access, customer-owned, channel-agnostic. That description rules out most of the current field. Personal-agentic-memory tools (Mem0, Letta, Claude memory, Pith) are good at the developer or single-user case and don’t claim more. Vendor memory inside ChatGPT or Claude is scoped to one assistant and one account. Knowledge platforms are built for human search rather than agent consumption.

The space the taxonomy makes legible — multi-agent, multi-channel, governed, audit-trailed, customer-controlled — is where Memoria operates. Not as a critique of anyone, but because the people we work with kept asking for the parts that personal-agentic memory doesn’t try to cover.

Closing

Du’s paper is a useful frame for builders. The five mechanism families and three-axis taxonomy will reshape how teams talk about agent memory for a while; the open-challenges section gives the field a list of unresolved problems to attack rather than the marketing taxonomies it had before.

For Memoria, the paper anchors a position we’ve been arguing in slower forms for the better part of a year. Hybrid storage — vector, relational, graph. Channel-agnostic perception. A compile-time consolidation pass. Auditable retention as the answer to learned forgetting. A per-customer Gateway as control policy. The work in flight is what matters to keep honest about: Librarian on household only, page-writing gate (MEM-122) outstanding before it lands customer-side; auditable retention design complete, FEL build queued.

If the taxonomy makes anything cleaner, it’s the distinction between the personal-agentic-memory problem and the organisational one. Pith is solving the first well. The second is where we live.

We’re talking to a small group of teams.
