Agent Memory That Actually Sticks

A fully local engine that cuts token usage by 61% and boosts agent task pass rates by 51%, without calling a single external API.

AI agents have a memory problem. Not the kind you fix with a bigger model or a longer context window. The kind that gets worse the more you use them.

Every time an agent tackles a complex task, it generates a stream of tool calls, reasoning traces, and intermediate results. That stream accumulates in the context window. After a few hours of work, the window is full of noise. The agent slows down. Costs spike. And the next time you start a session, everything is gone anyway.

TencentDB Agent Memory is a local, open-source engine designed to solve this from the ground up.

**Two memories in one system**

The architecture splits memory into two distinct systems. Short-term memory handles the current session: it offloads heavy tool logs to disk files instead of keeping them in the context window, and it tracks active task state via a Mermaid canvas that the agent can query at will. The context window stays lean.
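
To make the short-term side concrete, here is a minimal sketch of the two ideas: write bulky tool output to a file and hand the agent only a pointer, and keep task state as a small Mermaid diagram. The function names, the 200-character threshold, and the log file naming are assumptions made for illustration; the article does not describe the engine's actual interface or canvas format.

```python
import hashlib
from pathlib import Path

# Hypothetical threshold: outputs longer than this go to disk.
MAX_INLINE_CHARS = 200


def offload_tool_output(tool_name: str, output: str, log_dir: Path) -> str:
    """Return the text that should enter the context window.

    Short outputs pass through unchanged. Long outputs are written to a
    file, and only a one-line pointer plus a short preview is returned.
    """
    if len(output) <= MAX_INLINE_CHARS:
        return output
    log_dir.mkdir(parents=True, exist_ok=True)
    digest = hashlib.sha256(output.encode("utf-8")).hexdigest()[:12]
    path = log_dir / f"{tool_name}-{digest}.log"
    path.write_text(output, encoding="utf-8")
    preview = output[:80].replace("\n", " ")
    return f"[{tool_name} output offloaded to {path} ({len(output)} chars)] {preview}..."


def read_offloaded(path: Path) -> str:
    """Fetch the full log back only when the agent actually needs it."""
    return path.read_text(encoding="utf-8")


def render_task_canvas(steps: list[tuple[str, str]]) -> str:
    """Render ordered (label, status) steps as a Mermaid flowchart."""
    lines = ["graph TD"]
    for i, (label, status) in enumerate(steps):
        lines.append(f'    s{i}["{label} ({status})"]')
    for i in range(len(steps) - 1):
        lines.append(f"    s{i} --> s{i + 1}")
    return "\n".join(lines)
```

The point of the design is that a thousand-line grep result costs the context window one line, while the full text stays one file read away.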

Long-term memory is where things get interesting. Instead of dumping everything into a flat vector database (the usual approach), TencentDB Agent Memory organizes knowledge into a four-tier pyramid.

**The pyramid**

At the base is L0: the raw conversation and tool logs. Everything the agent has done or said. This gets written to disk, not kept in the context window.

L1 is the Atom layer. The system automatically extracts clear, atomic facts from the L0 logs. Not summaries. Specific, discrete statements like "user prefers Python over JavaScript" or "the database schema has a users table with an id column."

L2 is the Scenario layer. Atom-level facts get clustered into project or topic blocks. Related facts about the same codebase, workflow, or context group together.

At the top is L3: the Persona layer. A macro profile of who you are, how you work, and what patterns repeat across all your sessions. The agent gets a stable model of you that persists and grows over time.
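
One way to picture the pyramid is as plain data types, with each tier built from the one below it. The sketch below is an illustration under loud assumptions: the class names, the `topic` label, and the "appears in at least two scenarios" promotion rule are all invented here. The article does not say how the engine extracts atoms, clusters them, or builds the persona, and real extraction would need a model rather than hand-written facts.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Atom:
    """L1: one discrete fact extracted from the L0 logs."""
    text: str
    topic: str        # hypothetical label used for grouping
    source_log: str   # pointer back to the L0 log file on disk


@dataclass
class Scenario:
    """L2: atoms that belong to one project or topic."""
    topic: str
    atoms: list[Atom] = field(default_factory=list)


@dataclass
class Persona:
    """L3: patterns that repeat across sessions."""
    traits: list[str] = field(default_factory=list)


def build_scenarios(atoms: list[Atom]) -> dict[str, Scenario]:
    """Group atoms by topic label (a stand-in for real clustering)."""
    grouped: dict[str, Scenario] = {}
    for atom in atoms:
        grouped.setdefault(atom.topic, Scenario(atom.topic)).atoms.append(atom)
    return grouped


def build_persona(scenarios: dict[str, Scenario], min_topics: int = 2) -> Persona:
    """Promote facts that recur in at least `min_topics` scenarios."""
    seen: dict[str, set[str]] = defaultdict(set)
    for scenario in scenarios.values():
        for atom in scenario.atoms:
            seen[atom.text].add(scenario.topic)
    traits = sorted(t for t, topics in seen.items() if len(topics) >= min_topics)
    return Persona(traits)
```

Note that every atom keeps a `source_log` pointer, so a fact near the top of the pyramid can still be traced back to the raw log it came from.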

**No cloud required**

The whole stack runs on SQLite and sqlite-vec. Nothing leaves your machine. Your personal profiles, your project knowledge, your workflow patterns. All local. All yours. No external API calls, no third-party model access, no data sent anywhere.
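
For a sense of what "everything in SQLite" can look like, here is a small sketch that stores facts and their vectors in a single database file using only Python's standard library. It is not the engine's schema: the table layout, the function names, and the toy hash-based embedding are assumptions, and the brute-force similarity loop stands in for the nearest-neighbor search that sqlite-vec would presumably handle inside the database.

```python
import math
import sqlite3
import struct
import zlib

DIM = 16  # toy dimension; real embedding models use far more


def toy_embed(text: str) -> list[float]:
    """Deterministic bag-of-words hash embedding, for illustration only."""
    vec = [0.0] * DIM
    for word in text.lower().split():
        vec[zlib.crc32(word.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the local store. Pass a file path to persist it."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS atoms ("
        "id INTEGER PRIMARY KEY, text TEXT NOT NULL, embedding BLOB NOT NULL)"
    )
    return conn


def add_atom(conn: sqlite3.Connection, text: str) -> int:
    """Store one fact with its vector packed as 32-bit floats."""
    blob = struct.pack(f"{DIM}f", *toy_embed(text))
    cur = conn.execute(
        "INSERT INTO atoms (text, embedding) VALUES (?, ?)", (text, blob)
    )
    conn.commit()
    return cur.lastrowid


def search(conn: sqlite3.Connection, query: str, k: int = 3) -> list[str]:
    """Return the k stored facts most similar to the query (cosine)."""
    q = toy_embed(query)
    scored = []
    for text, blob in conn.execute("SELECT text, embedding FROM atoms"):
        vec = struct.unpack(f"{DIM}f", blob)
        scored.append((sum(a * b for a, b in zip(q, vec)), text))
    scored.sort(reverse=True)
    return [text for _, text in scored[:k]]
```

Because the store is one ordinary file, backing it up or wiping it is a file copy or a file delete, with no service to talk to.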

**Readable by design**

Most agent memory systems are black boxes. Vector databases store embeddings, which are just long lists of floats. You cannot open them and read what the agent actually remembers about you.

TencentDB stores everything in structured Markdown. You can open the files and read them. You can edit them directly. You can see exactly what the agent knows about you and your work, which makes debugging and auditing trivial.
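
The practical upside of Markdown is that a memory file survives a round trip through a text editor. The sketch below assumes a made-up layout (one `##` heading per scenario, one bullet per fact) and hypothetical function names; the article does not specify the real file format.

```python
def render_scenario(topic: str, facts: list[str]) -> str:
    """Write one scenario as a heading followed by one bullet per fact."""
    lines = [f"## {topic}", ""]
    lines += [f"- {fact}" for fact in facts]
    return "\n".join(lines) + "\n"


def parse_scenarios(markdown: str) -> dict[str, list[str]]:
    """Read the same layout back, including any hand edits."""
    scenarios: dict[str, list[str]] = {}
    current = None
    for raw in markdown.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            current = line[3:].strip()
            scenarios.setdefault(current, [])
        elif line.startswith("- ") and current is not None:
            scenarios[current].append(line[2:].strip())
    return scenarios
```

If you add or delete a bullet by hand, `parse_scenarios` simply picks up the change on the next read, which is what makes auditing and correcting a memory file a text-editing job.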

**What this actually changes**

The reported benchmark numbers are easy to interpret: token usage drops 61% because the agent stops re-reading everything from scratch each session. Task pass rates rise 51% because the agent carries actual context about what worked before and what did not.

But the bigger shift is qualitative. An agent with persistent, readable memory stops being a disposable tool you reset every conversation. It becomes something more like a collaborator that learns from experience and actually retains what it learns.

The gap between "AI that assists" and "AI that remembers" turns out to be mostly a storage architecture problem. TencentDB just shipped a local solution to it.
