TencentDB Agent Memory's Layered Local Memory

MNMNOTE
Tags: github, ai-agents, agent-memory, local-first, sqlite, vector-search

Reference: TencentCloud/TencentDB-Agent-Memory — MIT · TypeScript

TencentDB Agent Memory is an open-source plugin that gives AI coding agents durable long-term memory without sending a single conversation to an external service. It runs a local SQLite store and distills raw chats into a four-layer pyramid plus a compact diagram of in-task state — saving tokens while keeping every fact traceable back to its source.

The project reached 5,565 stars within two months of its first commit.[1] That is fast for an infrastructure library, and the timing explains why: agents forget everything the moment a session ends, and long tasks quietly drown their own context window in tool logs. This repo is an opinionated answer to both problems at once.

What problem does agent memory actually solve?

Most "give your agent memory" projects do one of two things, and the README names both as failures. The first is brute-force history accumulation — keep every message, replay it next time. That overflows the context window and burns tokens. The second is irreversible lossy summarization — compress the history into a paragraph and throw the rest away. That saves tokens but destroys the evidence, so when recall is wrong you have nothing to drill into.

TencentDB Agent Memory rejects both. Its thesis is a single line from the documentation: "Reject Flat Storage, Embrace Layering and Symbolization."[2] Instead of one big vector pile, memory is built as a hierarchy in which both formation and recall are layered. The maintainers frame the goal plainly: "Memory is not about hoarding everything in the AI — it is about sparing humans from having to repeat themselves."[2]

The honest framing matters here. This is not a notes app or a personal knowledge base. It is plumbing that sits between a coding agent (it ships adapters for OpenClaw and Hermes) and a local database, doing capture, extraction, and recall automatically once installed.

How is the memory layered? (L0 to L3)

The long-term half of the system is a "semantic pyramid": raw conversations are distilled upward through four named layers, each cheaper and more abstract than the one below it, so recall reads the dense top layer first and pays for detail only when a question demands it. The layers, from bottom to top: L0 Conversation (raw dialogue), L1 Atom (atomic facts), L2 Scenario (scene blocks), and L3 Persona (a user profile).

The README is explicit about why this beats a flat store: "The Persona layer carries day-to-day preferences; the system drills down to Atoms only when details matter."[2]

What makes the design unusual is the storage split underneath. Lower layers (facts, logs, traces) live in a database for robust full-text retrieval. Upper layers (personas, scenes, canvases) are written as plain Markdown files. The documentation's summary is quotable: "Lower layers preserve evidence; upper layers preserve structure."[3] Because the persona and scene blocks are readable Markdown under ~/.openclaw/memory-tdai/, a wrong recall becomes "a deterministic walk along the chain Persona to Scenario to Atom to Conversation until the root cause surfaces"[4] rather than a stare at an opaque list of vector scores.

flowchart TD
    L0["L0 Conversation<br/>(raw dialogue)"] --> L1["L1 Atom<br/>(atomic facts)"]
    L1 --> L2["L2 Scenario<br/>(scene blocks .md)"]
    L2 --> L3["L3 Persona<br/>(persona.md)"]
    L0 -.->|"stored in"| DB[("SQLite + sqlite-vec<br/>+ FTS5")]
    L1 -.->|"stored in"| DB
    L2 -.->|"plain Markdown"| FILES[("~/.openclaw/<br/>memory-tdai/")]
    L3 -.->|"plain Markdown"| FILES
    L3 ==>|"drill down on miss"| L2
    L2 ==>|"drill down"| L1
    L1 ==>|"drill down"| L0

The L0-L3 pyramid: facts and raw dialogue persist in the local database for retrieval; scenes and personas are written as human-readable Markdown. Recall reads top-down for cheapness, then drills back down the same chain to ground-truth evidence when precision is needed.
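
To make the drill-down concrete, here is a minimal TypeScript sketch of the Persona to Scenario to Atom to Conversation walk. The record shape, the `sourceIds` field, and the `traceToSource` helper are illustrative assumptions, not the project's actual schema; the only point is that each layer keeps pointers to the layer below it, so any high-level claim can be followed back to raw dialogue.

```typescript
// Hypothetical record shape: every entry remembers which lower-layer
// entries it was distilled from. Not the project's real schema.
type Layer = "L3" | "L2" | "L1" | "L0";

interface MemoryRecord {
  id: string;
  layer: Layer;
  text: string;
  sourceIds: string[]; // ids in the layer directly below; empty for L0
}

// Walk from any record down to the raw L0 conversations that support it.
function traceToSource(
  store: Record<string, MemoryRecord>,
  startId: string
): MemoryRecord[] {
  const evidence: MemoryRecord[] = [];
  const seen: Record<string, boolean> = Object.create(null);
  const stack: string[] = [startId];
  while (stack.length > 0) {
    const id = stack.pop() as string;
    if (seen[id]) continue; // already visited
    seen[id] = true;
    if (!Object.prototype.hasOwnProperty.call(store, id)) continue; // dangling pointer
    const rec = store[id];
    if (!rec) continue;
    if (rec.layer === "L0") {
      evidence.push(rec); // reached ground truth
    } else {
      for (const child of rec.sourceIds) stack.push(child);
    }
  }
  return evidence;
}

// Toy data, invented for illustration only.
const store: Record<string, MemoryRecord> = {
  c1: { id: "c1", layer: "L0", text: "user: please use pnpm, not npm", sourceIds: [] },
  a1: { id: "a1", layer: "L1", text: "User prefers pnpm", sourceIds: ["c1"] },
  s1: { id: "s1", layer: "L2", text: "Project tooling setup", sourceIds: ["a1"] },
  p1: { id: "p1", layer: "L3", text: "Prefers fast, strict tooling", sourceIds: ["s1"] },
};

const roots = traceToSource(store, "p1");
console.log(roots.map((r) => r.id + ": " + r.text).join("\n"));
```

If a persona claim looks wrong, the same walk shows which conversation it came from, which is the debugging property the README is describing.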

How does the Mermaid canvas save tokens?

The short-term half solves a different problem: in a long task, the biggest token consumers are verbose intermediate logs — search results, code, error traces. TencentDB Agent Memory's answer is "symbolic memory": keep the symbols in context, keep the bytes on disk.

The mechanism has three moves. First, full tool logs are offloaded to external files (refs/*.md). Second, the task's state transitions are re-encoded as a high-density Mermaid graph, "precise enough for LLMs to parse, concise enough for humans to read,"[5] in which each node carries a node_id. Third, the agent reasons over the tiny graph, and "to verify a detail, it greps for the node_id and instantly retrieves the full raw text — cutting token cost while preserving full traceability."[5]

That last clause is the whole trick. Compression that you cannot reverse is a liability; compression with a node_id back-reference is just an index. This is the same instinct behind keeping notes in plain Markdown rather than a proprietary blob — the structure is in front of you, and the detail is one lookup away. We made a related argument in Markdown Notes as AI Memory.
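
The offload-and-index idea can be sketched in a few lines of TypeScript. Everything named here is an assumption made for illustration: the `TaskCanvas` class, the `n1`, `n2` id format, and the in-memory `refs` object that stands in for the refs/*.md files on disk. It is a sketch of the pattern, not the project's implementation.

```typescript
// Hypothetical sketch: short labels stay in the graph, raw logs go elsewhere.
interface CanvasNode {
  nodeId: string;
  label: string; // short summary kept in context
}

class TaskCanvas {
  private nodes: CanvasNode[] = [];
  private edges: Array<[string, string]> = [];
  // Stand-in for refs/*.md: nodeId -> full raw log.
  private refs: Record<string, string> = Object.create(null);

  // Record one step: keep the label in the graph, move the raw log out.
  addStep(label: string, rawLog: string, parentId?: string): string {
    const nodeId = "n" + (this.nodes.length + 1);
    this.nodes.push({ nodeId, label });
    this.refs[nodeId] = rawLog;
    if (parentId !== undefined) this.edges.push([parentId, nodeId]);
    return nodeId;
  }

  // The compact text that would stay in the agent's context window.
  // Labels are assumed not to contain double quotes.
  toMermaid(): string {
    const lines = ["flowchart TD"];
    for (const n of this.nodes) lines.push(`    ${n.nodeId}["${n.label}"]`);
    for (const e of this.edges) lines.push(`    ${e[0]} --> ${e[1]}`);
    return lines.join("\n");
  }

  // The "grep by node_id" step: fetch the full raw text only on demand.
  lookup(nodeId: string): string | undefined {
    return this.refs[nodeId];
  }
}

const canvas = new TaskCanvas();
const rawTrace =
  "TypeError: token is undefined\n    at verify (auth.ts:42)\n    at handler (login.ts:17)";
const n1 = canvas.addStep("run tests", "12 passed, 1 failed: login.spec.ts");
const n2 = canvas.addStep("inspect failure", rawTrace, n1);
const graph = canvas.toMermaid();
console.log(graph);
```

The graph carries two short labels and one edge; the stack trace never enters it, yet `canvas.lookup(n2)` returns it in full.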

Three design decisions worth stealing

Beyond the headline pyramid, three smaller choices in the codebase are the ones most worth borrowing for any system that juggles too much state: how it constrains an LLM, how it schedules expensive work, and how it keeps retrieval local and cheap by default. Each is a small, transplantable pattern.

1. Sandbox the agent by its workspace root, not by asking it nicely. The scene-extraction step is itself an LLM agent that reads and writes scene files. Rather than trusting a prompt to keep it in bounds, the code constrains the filesystem. From the source: "The LLM is sandboxed — workspaceDir is set to scene_blocks/ so it can ONLY operate on .md scene files. System files (checkpoint, scene_index, persona.md) are physically invisible to the LLM."[6] A capability boundary enforced by the filesystem beats one enforced by good intentions.
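
As a rough illustration of a workspace-root boundary, the sketch below refuses any requested path that would leave the sandbox directory or that is not a Markdown file. The `resolveInSandbox` function and the `/data/scene_blocks` root are hypothetical, and the sketch uses plain POSIX-style string handling to stay self-contained; a real implementation would more likely lean on the platform's path library and would also need to account for symlinks.

```typescript
// Hypothetical helper: map a requested relative path into the sandbox root,
// or return null if the request escapes the root or is not a .md file.
function resolveInSandbox(root: string, requested: string): string | null {
  if (requested.charAt(0) === "/") return null; // absolute paths are refused
  const kept: string[] = [];
  for (const part of requested.split("/")) {
    if (part === "" || part === ".") continue; // no-op segments
    if (part === "..") {
      if (kept.length === 0) return null; // would climb above the root
      kept.pop();
    } else {
      kept.push(part);
    }
  }
  if (kept.length === 0) return null; // the root itself is not a file
  const rel = kept.join("/");
  if (rel.slice(-3) !== ".md") return null; // only Markdown scene files
  const base = root.replace(/\/+$/, ""); // drop trailing slashes
  return base + "/" + rel;
}

const sandboxRoot = "/data/scene_blocks"; // illustrative location
console.log(resolveInSandbox(sandboxRoot, "auth.md"));
console.log(resolveInSandbox(sandboxRoot, "../persona.md"));
```

The design choice is that the check runs on every file operation the agent attempts, so files outside the root are unreachable regardless of what the prompt says.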

2. Match the scheduler to the cost of the job. The pipeline hub runs three layers on three different scheduling disciplines, each fitted to how expensive that layer is. L1 extraction uses a resettable debounce timer: each new turn resets the countdown, so cheap ingest stays responsive. L2 scene rollups use a "downward-only timer" whose "scheduled fire time can only be moved earlier, never later"[7]; this guarantees a maximum interval while still reacting to fresh L1 output. L3 persona generation, the most expensive pass, runs behind a global mutex with concurrency of one. One cadence per cost profile.
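
The three disciplines differ only in how they treat a new scheduling request, which a small TypeScript sketch makes visible. The class names, method names, and intervals below are assumptions for illustration; time is passed in explicitly as milliseconds so the logic can be read without real timers, whereas a real pipeline would wrap something like `setTimeout`.

```typescript
// L1-style resettable debounce: every new event pushes the fire time later.
class DebounceTimer {
  private fireAt: number | null = null;
  private delayMs: number;
  constructor(delayMs: number) {
    this.delayMs = delayMs;
  }
  touch(now: number): void {
    this.fireAt = now + this.delayMs; // restart the countdown
  }
  scheduledAt(): number | null {
    return this.fireAt;
  }
}

// L2-style downward-only timer: the fire time may move earlier, never later.
class DownwardOnlyTimer {
  private fireAt: number | null = null;
  private maxIntervalMs: number;
  constructor(maxIntervalMs: number) {
    this.maxIntervalMs = maxIntervalMs;
  }
  // Arm with the maximum interval; ignored if an earlier deadline exists.
  arm(now: number): void {
    this.request(now + this.maxIntervalMs);
  }
  // Ask to fire at `at`; honored only if earlier than the current schedule.
  request(at: number): void {
    if (this.fireAt === null || at < this.fireAt) this.fireAt = at;
  }
  scheduledAt(): number | null {
    return this.fireAt;
  }
}

// L3-style single slot: at most one expensive job in flight.
class SingleSlot {
  private busy = false;
  tryAcquire(): boolean {
    if (this.busy) return false;
    this.busy = true;
    return true;
  }
  release(): void {
    this.busy = false;
  }
}

const l1Timer = new DebounceTimer(5000); // illustrative delay
l1Timer.touch(0); // would fire at 5000
l1Timer.touch(3000); // reset: now fires at 8000

const l2Timer = new DownwardOnlyTimer(60000); // illustrative max interval
l2Timer.arm(0); // fires at 60000 at the latest
l2Timer.request(20000); // earlier: accepted
l2Timer.request(45000); // later: ignored
l2Timer.arm(30000); // 90000 is later: ignored

const personaSlot = new SingleSlot();
const firstRun = personaSlot.tryAcquire(); // gets the slot
const secondRun = personaSlot.tryAcquire(); // refused while the first runs
console.log(l1Timer.scheduledAt(), l2Timer.scheduledAt(), firstRun, secondRun);
```

The debounce can be postponed indefinitely by steady input, which is acceptable for cheap work; the downward-only timer cannot, which is what bounds the interval for the costlier rollup.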

3. Keep the cheap retrieval path local by default. Recall is hybrid: BM25 keyword search and vector similarity, merged with Reciprocal Rank Fusion. The implementation uses the canonical constant, RRF_K = 60, described in the code as the "Standard RRF constant from the original RRF paper"[8] (the Cormack, Clarke, and Buettcher SIGIR 2009 method).[9] Each item's fused score is the sum across lists of 1 / (k + rank + 1); the + 1 suggests that rank is zero-based here, which matches the paper's 1 / (k + r) for a one-based rank r. The default backend is SQLite plus sqlite-vec plus FTS5, with no embedding API call required to run.
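
The fusion formula above is short enough to write out. The sketch below is a generic Reciprocal Rank Fusion over lists of ids with k = 60 and a zero-based rank; the `rrfFuse` function, its signature, and the sample ids are assumptions for illustration and are not copied from the project's search-utils.ts.

```typescript
const RRF_K = 60; // the constant the article cites from the source

interface Fused {
  id: string;
  score: number;
}

// Fuse several ranked lists of ids. `rank` is zero-based, hence the + 1.
function rrfFuse(lists: string[][], k: number = RRF_K): Fused[] {
  const scores: Record<string, number> = Object.create(null);
  for (const list of lists) {
    list.forEach((id, rank) => {
      scores[id] = (scores[id] ?? 0) + 1 / (k + rank + 1);
    });
  }
  return Object.keys(scores)
    .map((id) => ({ id, score: scores[id] ?? 0 }))
    // Highest score first; ties broken by id for a stable order.
    .sort((a, b) => b.score - a.score || (a.id < b.id ? -1 : 1));
}

// Invented rankings: one from keyword search, one from vector search.
const keywordHits = ["a", "b", "c"];
const vectorHits = ["b", "d", "a"];
const fused = rrfFuse([keywordHits, vectorHits]);
console.log(fused.map((f) => f.id).join(" > "));
```

Here "b" wins with 1/62 + 1/61 because it ranks high in both lists, ahead of "a" with 1/61 + 1/63; items that appear in only one list trail behind. Because only ranks matter, BM25 scores and vector distances never need to be put on a common scale.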

Is it production-ready, and how is it secured?

The repo presents itself as production engineering rather than a demo: a decoupled TdaiCore + HostAdapter design, a ready-to-use local backend, and agent tools (tdai_memory_search, tdai_conversation_search). It is, however, young — npm version 0.3.6, created roughly two months before this writing, and pre-1.0.

The benchmark numbers are striking but should be read as the maintainers' own published self-measurements, not independent results. Integrated with OpenClaw, the README reports cutting token usage by up to 61.38% and improving pass rate by a relative 51.52% on the WideSearch benchmark, and raising PersonaMem accuracy from 48% to 76%.[1] Reported gains on other benchmarks are smaller: SWE-bench tokens drop by nearly a third (3474.1M to 2375.4M).[1] Treat "up to 61.38%" as a best case on one workload.

On security, be deliberate. The Hermes gateway listens on port 8420 and exposes capture, search, and recall HTTP routes. Authentication is opt-in: a Bearer-token API key (constant-time comparison, HTTP 401 on a miss) and a CORS allow-list, "both default to off."[10] It is a localhost sidecar out of the box. Set TDAI_GATEWAY_API_KEY and an allow-list before you bind it to anything but loopback.
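
For readers unfamiliar with the pattern, here is a minimal sketch of an opt-in Bearer check with a constant-time comparison. The function names and the header parsing are assumptions, not the gateway's code. The comparison is hand-rolled only to keep the example dependency-free; in Node.js the usual tool is `crypto.timingSafeEqual`.

```typescript
// Compare two strings without exiting early on the first mismatch,
// so response time does not reveal how many leading characters matched.
function constantTimeEqual(a: string, b: string): boolean {
  let diff = a.length ^ b.length; // a length mismatch already counts
  const n = Math.max(a.length, b.length);
  for (let i = 0; i < n; i++) {
    // charCodeAt returns NaN past the end; `| 0` turns that into 0.
    diff |= (a.charCodeAt(i) | 0) ^ (b.charCodeAt(i) | 0);
  }
  return diff === 0;
}

// Hypothetical gate: returns the HTTP status the request would receive.
function checkBearer(
  authHeader: string | undefined,
  apiKey: string | undefined
): number {
  if (!apiKey) return 200; // auth is opt-in: no key configured, no check
  const prefix = "Bearer ";
  if (!authHeader || authHeader.slice(0, prefix.length) !== prefix) return 401;
  const presented = authHeader.slice(prefix.length);
  return constantTimeEqual(presented, apiKey) ? 200 : 401;
}

console.log(checkBearer("Bearer s3cret", "s3cret")); // key matches
console.log(checkBearer("Bearer wrong", "s3cret")); // key mismatch
console.log(checkBearer(undefined, undefined)); // auth not configured
```

The third call is the one to notice: with no key configured, every request passes, which is why the default is only safe on loopback.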

What it teaches

The reusable lessons generalize well beyond agents. Layer your state instead of flattening it, so cheap top-level reads cover most queries and precision is opt-in. Keep every compression auditable with a deterministic path back to the raw evidence. Put symbols in context and bytes on disk. Enforce boundaries with the filesystem, not with prose. And treat local-first as the default, not a privacy mode bolted on later: the entire pyramid here lives on your own machine, with the upper layers in readable files, which is exactly the posture we argue for in "own your data" and in "why your notes are already AI-ready."

Verdict

TencentDB Agent Memory is worth studying even if you never install it. The layered pyramid, the Mermaid offload, and the filesystem-sandboxed extractor are clean, teachable patterns that any system juggling too much context can borrow. It is young and its headline numbers are self-reported, so pilot it before you depend on it — but as a working argument that memory should be layered, local, and auditable, it is one of the more thoughtful designs in the space.

Frequently Asked Questions

What is TencentDB Agent Memory? It is an open-source, MIT-licensed memory plugin for AI coding agents (OpenClaw and Hermes). Once installed it automatically captures conversations, extracts structured memories into a layered L0-to-L3 pyramid, and recalls relevant context before the next turn, running on a local SQLite store by default.[2]

Does it send my data to the cloud? Not by default. The standard backend is local SQLite plus sqlite-vec plus FTS5, and the project describes itself as delivering "fully local long-term memory for AI Agents ... with zero external API dependencies."[11] A Tencent Cloud VectorDB backend exists as an option, but the out-of-the-box configuration keeps memory on your own machine under ~/.openclaw/memory-tdai/.[4]

What is the L0 to L3 memory pyramid? A four-layer hierarchy: L0 Conversation (raw dialogue), L1 Atom (atomic facts), L2 Scenario (scene blocks), and L3 Persona (a user profile). Upper layers carry structure and direction; lower layers carry evidence and precision, with a drill-down path between them.[2]

How does the Mermaid canvas reduce token usage? Verbose tool logs are offloaded to external files; only a compact Mermaid graph of task state, tagged with node_ids, stays in the agent's context. The agent reasons over the small graph and greps a node_id to pull the full raw text only when it needs a detail.[5]

What storage and retrieval does it use? The default store is SQLite with sqlite-vec and FTS5. Recall is hybrid: BM25 keyword search and vector similarity merged via Reciprocal Rank Fusion with the standard constant k=60. You can select keyword, embedding, or hybrid strategy; hybrid is the recommended default.[8]

Is it production-ready, and what is the license? It is MIT-licensed and presents production engineering, but it is young (npm v0.3.6, pre-1.0). The reported benchmark gains are the maintainers' own self-measurements. Pilot it before depending on it, and enable gateway authentication before exposing the service beyond localhost.[1]

A memory worth keeping is one you can still open, read, and trace back to where it came from.



Footnotes

  1. TencentDB Agent Memory README, Highlights and benchmark table: token cut up to 61.38%, pass rate +51.52% relative, PersonaMem 48% to 76%, SWE-bench tokens 3474.1M to 2375.4M. https://github.com/TencentCloud/TencentDB-Agent-Memory

  2. TencentDB Agent Memory README, Overview and "Core Technology: Reject Flat Storage, Embrace Layering and Symbolization": L0 to L3 pyramid. https://github.com/TencentCloud/TencentDB-Agent-Memory

  3. TencentDB Agent Memory README, "Heterogeneous storage and progressive disclosure": "Lower layers preserve evidence; upper layers preserve structure." https://github.com/TencentCloud/TencentDB-Agent-Memory

  4. TencentDB Agent Memory README, "White-Box Debuggability": layered artifacts under ~/.openclaw/memory-tdai/. https://github.com/TencentCloud/TencentDB-Agent-Memory

  5. TencentDB Agent Memory README, "Symbolic Memory: Maximum Semantics in Minimum Symbols (Mermaid Canvas)": node_id tracing. https://github.com/TencentCloud/TencentDB-Agent-Memory

  6. Source comment, src/core/scene/scene-extractor.ts: "The LLM is sandboxed - workspaceDir is set to scene_blocks/". https://github.com/TencentCloud/TencentDB-Agent-Memory/blob/main/src/core/scene/scene-extractor.ts

  7. Source comment, src/utils/pipeline-manager.ts: L1 resettable timer, L2 downward-only timer, L3 global mutex. https://github.com/TencentCloud/TencentDB-Agent-Memory/blob/main/src/utils/pipeline-manager.ts

  8. Source, src/core/store/search-utils.ts: RRF_K = 60, "Standard RRF constant from the original RRF paper." https://github.com/TencentCloud/TencentDB-Agent-Memory/blob/main/src/core/store/search-utils.ts

  9. Cormack, Clarke, and Buettcher, "Reciprocal Rank Fusion Outperforms Condorcet and Individual Rank Learning Methods," SIGIR 2009, pp. 758-759. https://dl.acm.org/doi/10.1145/1571941.1572114

  10. TencentDB Agent Memory README, "Gateway Security (optional)": API key and CORS allow-list both default to off. https://github.com/TencentCloud/TencentDB-Agent-Memory

  11. TencentCloud/TencentDB-Agent-Memory repository description: "fully local long-term memory for AI Agents via a 4-tier progressive pipeline, with zero external API dependencies." https://github.com/TencentCloud/TencentDB-Agent-Memory