
Writing 9 May 2026

Pinecone is moving past RAG. The hybrid is what's next.

Pinecone shipped Nexus, a compile-then-serve substrate that looks a lot like the architecture Memoria's been building. Same shape; different posture on what the substrate is for. Where we agree, and where we diverge.

The company that made RAG mainstream just shipped what it calls the next layer.

Pinecone released Nexus on 4 May. Their launch post frames the work as moving reasoning upstream, from retrieval to knowledge compilation; they haven’t used the word obsolete, and they’re explicit that RAG remains the pattern they pioneered. But The New Stack picked it up the same week with the cleaner public framing: the company that made RAG mainstream is now betting against it. We’ve been arguing for a hybrid memory architecture, in slower forms, for the better part of a year, most recently in the Karpathy / Nate post. It’s a useful moment.

The shape Pinecone has shipped is recognisably the one Memoria has been building toward. Worth laying them side by side and noting where the work agrees, and where it diverges.

What Pinecone shipped

Three pieces.

Context Compiler. Raw enterprise data plus a task spec, transformed into “task-optimised knowledge artifacts” agents consume directly. Compile-time synthesis. The agent doesn’t sift raw documents at inference; the substrate has already done that work.

Composable Retriever. Serves the compiled artefacts at query time with typed fields, field-level citations with confidence tiers, deterministic predicates, and access-control enforcement. Output shaped to the agent’s request.

KnowQL. A declarative query language built for agents, not humans, with six primitives: intent, filter, provenance, output shape, confidence, and budget. A structured surface agents call against, rather than a freeform text channel that hopes for the best.
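To make the six primitives concrete, here is a minimal Python sketch of what a structured agent request carrying them might look like. It is not KnowQL: the class `AgentQuery`, the helper `shape_answer`, the three-tier confidence scale and every field name are hypothetical, chosen only to show why a typed request beats a freeform prompt. Filters are assumed to apply at retrieval time, and the token budget is recorded but not enforced here.

```python
from dataclasses import dataclass, field

# Hypothetical three-tier confidence scale, lowest to highest.
TIER_RANK = {"low": 0, "medium": 1, "high": 2}


@dataclass
class AgentQuery:
    """Illustrative structured request with one field per primitive (not KnowQL)."""
    intent: str                                       # what the agent is trying to do
    filters: dict = field(default_factory=dict)       # deterministic predicates for retrieval
    provenance: bool = True                           # attach a citation to every field
    output_shape: dict = field(default_factory=dict)  # field name -> expected Python type
    min_confidence: str = "medium"                    # lowest tier the agent will accept
    token_budget: int = 1000                          # declared cap; not enforced in this sketch

    def __post_init__(self):
        # Reject malformed requests up front instead of guessing what was meant.
        if not self.intent.strip():
            raise ValueError("intent must be non-empty")
        if self.min_confidence not in TIER_RANK:
            raise ValueError(f"unknown confidence tier: {self.min_confidence}")
        if self.token_budget <= 0:
            raise ValueError("token_budget must be positive")


def shape_answer(query, fields):
    """Project compiled fields onto the requested output shape.

    `fields` maps a field name to a (value, confidence tier, citation) tuple.
    """
    answer = {}
    for name, expected_type in query.output_shape.items():
        if name not in fields:
            continue  # missing fields are omitted, never invented
        value, tier, citation = fields[name]
        if TIER_RANK[tier] < TIER_RANK[query.min_confidence]:
            continue  # below the requested confidence tier
        if not isinstance(value, expected_type):
            raise TypeError(f"{name}: expected {expected_type.__name__}")
        answer[name] = {"value": value, "citation": citation} if query.provenance else value
    return answer
```

Asked for two fields at `medium` confidence, a caller gets back only what clears the bar, each value paired with its citation; a low-confidence field is dropped rather than passed along for the agent to second-guess.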

Pinecone publishes a headline figure of “up to 90% reduction in token consumption per task” against traditional RAG agents, plus 30× faster time-to-completion. Worth saying clearly: that’s Pinecone’s own benchmark, not customer-validated and not independently reproduced. The architectural argument stands on its own; the numbers will get tested by people running it in anger.

The same shape, drawn from a different end

The map is striking. Pinecone’s Context Compiler is what we call the Librarian: a compile-time agent that reads across the substrate and produces synthesised artefacts navigable by both humans and agents. Their Composable Retriever maps to what our query pipeline does at query time against a hybrid relational + vector + graph store, with project, time, and source filtering applied before synthesis. KnowQL is a different shape from our gateway actions, but the intent is the same: a structured agent-facing query surface.

Compile-time synthesis on top, query-time retrieval underneath, structured agent surface in front. That’s the hybrid. MSITE-13 walks through the why; this post is about what happens once the largest vector-database company on the market shows up at the same answer.
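Filtering before synthesis is the part of the query pipeline that’s easiest to show. The sketch below is a minimal Python illustration, not Memoria’s implementation: the `Memory` record, the `retrieve` function and its parameters are hypothetical, the store is an in-memory list standing in for the relational + vector + graph store, and graph traversal is left out. Deterministic project, time and source predicates narrow the candidates first; only the survivors are ranked by cosine similarity and handed on to synthesis.

```python
import math
from dataclasses import dataclass
from datetime import date


@dataclass
class Memory:
    id: str
    project: str
    source: str       # ingestion path, e.g. "slack" or "email"
    created: date
    embedding: list   # vector from whatever embedding model is in use
    text: str


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(store, query_vec, *, project, since, sources, k=3):
    # 1. Exact predicates first: project, time window, source.
    candidates = [
        m for m in store
        if m.project == project and m.created >= since and m.source in sources
    ]
    # 2. Only the survivors are ranked by vector similarity.
    candidates.sort(key=lambda m: cosine(m.embedding, query_vec), reverse=True)
    # 3. The top k would be passed to synthesis (not shown).
    return candidates[:k]
```

Running the cheap, exact predicates first means the similarity ranking never sees out-of-scope material, so a near-identical note from another project can’t crowd out the right one.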

Where Memoria diverges

Six distinctions are doing the load-bearing work, and each one matters for a different buyer.

Plural ingestion, with chat as the high-leverage path. Pinecone’s surface is data-source-shaped: pipelines pointed at databases, APIs, and document stores. Memoria’s substrate is reachable from several directions: channel bots in Slack, Microsoft Teams, WhatsApp, Telegram, Google Chat, and email; the gateway MCP for direct agent reads and writes; an LLM proxy layer that captures context as a side-effect of model calls; and a curator-facing admin surface in the enterprise build. Channels are the highest-leverage path because most of an organisation’s actual knowledge lives in chat rather than documents, and gathering it is the work. But they aren’t the only path, and they aren’t what Memoria is. The substrate is the substrate; the ingestion paths are how it gets fed.

LLM-agnostic, AI-optional. Memoria base is a queryable, auditable team knowledge store. It works without an LLM in the loop: humans walk and search the substrate directly through the vault and the admin surface, and the relational + vector + graph store is just a store. Agents are a first-class consumer when you want them, not a precondition for the substrate being useful. Nexus exists to feed an agent fleet; without one, there’s nothing to switch on. Memoria works either way, which matters for buyers who want the substrate now and the agents later, or never.

Customer-owned substrate. Pinecone Nexus is SaaS only. Memoria runs as a hosted service (MemoriaCloud) and, for enterprise deployments, in a customer’s own stack. The substrate is built so the team that owns the memory can also own where it sits. For regulated buyers (finance, health, defence, public sector), sovereignty isn’t a preference; it’s a procurement requirement. The substrate that holds a team’s accumulated thinking is not a thing you want sitting in a vendor’s tenancy you can’t audit, can’t pause, and can’t take with you.

Dual-readership. Building on the AI-optional substrate above, humans curate and query the same store agents use: the Obsidian vault for direct human reading, the gateway for agent reads, and the Librarian compile loop above both. Same underlying substrate, two equally first-class readers on top. Nexus is agent-first: humans access it through the agent. That’s a clean choice for an agent-fleet buyer, and a category split for a buyer whose team also wants to walk the artefact directly without an LLM in the way. More on the framing in the vendor-memory thread we’ve been pulling on.

Multi-vendor agent gateway. The same substrate is consumed by Claude, ChatGPT, Cursor, scheduled jobs, and custom agents, all reading the same store with no write conflicts. Pinecone Nexus is one substrate per agent fleet, in a market where buyers are intentionally not single-vendor on the model layer. If you’re paying for Claude and ChatGPT and Cursor because you want the right model for the right job, you don’t want three siloed memories of the same team to go with them.

Auditable retention. Memoria’s forgetting is a signed event with a reason code, not silent deletion. The Forgetting Event Log records which memory, why, by what trigger, by which actor, and when; reversible until the very last step. The Nexus announcement doesn’t address retention semantics. For a regulated buyer, the absence of an audit trail is the absence of an answer.

None of these are critiques of Nexus. They’re what the same architectural thesis looks like once you build it for an organisation rather than an agent fleet, and once you build it where the buyer owns the substrate rather than rents it.
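The Forgetting Event Log described under auditable retention can be sketched as an append-only list of signed events. This is a minimal Python illustration under stated assumptions, not Memoria’s implementation: the `ForgettingLog` class, the stage names (`tombstoned`, `restored`, `purged`) and the use of a shared HMAC-SHA256 key for signing are all hypothetical; a production log might well use asymmetric signatures instead. Each event records which memory, the reason code, the trigger, the actor and the time, and a memory can be restored at any point until the final purge.

```python
import hashlib
import hmac
import json
from dataclasses import asdict, dataclass, replace
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class ForgettingEvent:
    memory_id: str    # which memory
    reason_code: str  # why, as a machine-readable code
    trigger: str      # what initiated it (policy, user request, ...)
    actor: str        # who or what performed the step
    stage: str        # "tombstoned", "restored" or "purged" (hypothetical names)
    at: str           # when, as an ISO-8601 UTC timestamp
    signature: str = ""


class ForgettingLog:
    """Append-only log; every event is HMAC-signed so tampering is detectable."""

    def __init__(self, key: bytes) -> None:
        self._key = key
        self.events = []

    def _sign(self, event: ForgettingEvent) -> str:
        body = {k: v for k, v in asdict(event).items() if k != "signature"}
        payload = json.dumps(body, sort_keys=True).encode()
        return hmac.new(self._key, payload, hashlib.sha256).hexdigest()

    def verify(self, event: ForgettingEvent) -> bool:
        return hmac.compare_digest(event.signature, self._sign(event))

    def stage_of(self, memory_id: str) -> Optional[str]:
        stages = [e.stage for e in self.events if e.memory_id == memory_id]
        return stages[-1] if stages else None

    def _record(self, memory_id, reason_code, trigger, actor, stage):
        if self.stage_of(memory_id) == "purged":
            raise ValueError(f"{memory_id} was purged; no further steps are possible")
        event = ForgettingEvent(memory_id, reason_code, trigger, actor, stage,
                                datetime.now(timezone.utc).isoformat())
        event = replace(event, signature=self._sign(event))
        self.events.append(event)
        return event

    def forget(self, memory_id, reason_code, trigger, actor):
        # Soft step: hidden from reads, still recoverable.
        return self._record(memory_id, reason_code, trigger, actor, "tombstoned")

    def restore(self, memory_id, reason_code, trigger, actor):
        if self.stage_of(memory_id) != "tombstoned":
            raise ValueError(f"{memory_id} is not tombstoned")
        return self._record(memory_id, reason_code, trigger, actor, "restored")

    def purge(self, memory_id, reason_code, trigger, actor):
        # The last step: after this, restore is refused.
        if self.stage_of(memory_id) != "tombstoned":
            raise ValueError(f"{memory_id} must be tombstoned before purge")
        return self._record(memory_id, reason_code, trigger, actor, "purged")
```

Because the signature covers every field, editing an event after the fact (changing the actor, say) makes `verify` fail, which is the property an auditor actually needs.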

What this changes

Less than it looks, and more than it looks.

Less, because the shape isn’t new to us; the hybrid-memory post made the argument before Nexus shipped. Validation from a category leader is genuinely useful, but it doesn’t redirect work that was already built around the same answer.

More, because Nexus moves the conversation past the “should we be doing this?” question. The largest pure-play retrieval vendor in the category has answered. The interesting question now is the one underneath: who builds the substrate, who owns it, who sees inside it, and who can take it with them when the relationship ends.

Substrate-shape isn’t a moat. Substrate-ownership is.

We’re talking to a small group of teams.
