Tim Trailor
Essay

Memory that sleeps: a tiered memory architecture with daily consolidation

A two-tier retrieval system (semantic plus keyword), canonical topic files as curated truth, and a nightly consolidation pass that promotes session insights into the canonical tier. Why each piece exists and what fails without it.

The reason most agentic systems stay amnesic is that long-term memory is harder than it looks. Stashing every conversation in a vector database and calling it “memory” creates something that can be searched but that does not actually know anything. My own setup went through about four versions of this mistake before arriving at a tiered architecture that is now stable. The architecture has a sleep-like consolidation pass and treats some information as authoritative in a way that most retrieval systems do not. This post is the worked version of how it fits together and why.

The architecture, in one diagram

~/.claude/projects/-Users-timtrailor-code/memory/
├── MEMORY.md              (index, auto-loaded every session, ≤200 lines)
├── topics/*.md            (one file per topic, curated current truth)
└── sessions/
    └── compressed/
        └── YYYY/*.md      (one file per closed session, append-only)

Indices, derivable from the above:
├── ChromaDB               (semantic embeddings, ~79,000 chunks)
└── SQLite FTS5            (keyword search, ~79,000 rows)

Four layers, two indices. The layers are the source of truth. The indices are derived and regeneratable. Layers and indices share a strict precedence rule, which I will come to.

Why two indices

Most projects pick one. I built both because each fails where the other succeeds, and the failure modes are visible in daily use.

Semantic search (ChromaDB with a local embedding model, no external Application Programming Interface (API) required) is right for questions like “what did we try for belt tension in the printer?” where I do not know the exact words the original conversation used. It returns semantically similar chunks even if the query language is different from the source language. It is wrong for queries where the answer is a specific token.

Keyword search (SQLite FTS5) is right for questions like “where did 192.168.0.108 appear?” or “which session mentioned the SAVE_CONFIG bug on 5 March?”. It returns exact matches against literal terms. It is wrong for paraphrase queries, where semantically similar content uses different vocabulary.

In early testing I kept hitting the same pattern: semantic search would return ten topically relevant chunks, none of which was the specific chunk that answered the question, because the question turned on a specific Internet Protocol (IP) address or error code that the embedding model had smoothed into similar-looking tokens. Keyword search returned the specific chunk immediately. The reverse also happened: keyword search would miss genuinely relevant discussions that used different vocabulary from the query.
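
The exact-token case is easy to demonstrate with the FTS5 extension bundled into Python's sqlite3 module. A minimal sketch, with a hypothetical table and invented chunks standing in for the real corpus:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE chunks USING fts5(session, body)")
conn.executemany(
    "INSERT INTO chunks VALUES (?, ?)",
    [
        ("2025-03-05", "SAVE_CONFIG bug reproduced; printer at 192.168.0.108"),
        ("2025-03-06", "belt tension on the printer still drifting after re-seating"),
    ],
)

def search_exact(query: str, limit: int = 5) -> list[tuple[str, str]]:
    # Quote the query as an FTS5 phrase so punctuation-heavy tokens
    # (IP addresses, error codes) match as an exact token sequence
    # instead of being parsed as FTS5 query syntax.
    phrase = '"' + query.replace('"', '""') + '"'
    return conn.execute(
        "SELECT session, body FROM chunks WHERE chunks MATCH ? "
        "ORDER BY rank LIMIT ?",
        (phrase, limit),
    ).fetchall()

search_exact("192.168.0.108")  # hits only the session naming that address
```

The embedding model sees `192.168.0.108` as just another numeric token; FTS5 sees it as a literal phrase and returns the one chunk that contains it.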

Running both in parallel is cheap (both indices cover the same corpus, each is under 100 megabytes) and produces strictly better retrieval than either alone. The Model Context Protocol (MCP) server that exposes them to Claude offers two tools: search_memory for semantic, search_exact for keyword. The agent chooses which one based on the shape of the query. Both are useful; both get called.

The lesson I would offer another builder: do not assume one retrieval strategy is sufficient. Instrument what you have, log which queries fail, and check whether the failures are in the class the other strategy would catch.

Why topic files, not just the indices

The topic files are the distinction that makes the system actually know things, rather than just remembering them.

Each topic file lives at memory/topics/<name>.md and contains the current truth about that topic: printer configuration, memory system architecture, school governors project, the UPS and disaster-recovery setup, and so on. There are thirty-six of them as of this writing. They are hand-maintained. Not autogenerated. Not automatically refreshed from session logs. The point of the topic files is that they contain only what is still true, written as a coherent document, edited as understanding evolves.

This is distinct from the session logs and compressed summaries, which are append-only records of what was true or attempted at a specific point in time. A session summary from two months ago that says “we decided to restart the daemon every hour” may have been true at the time and may now be wrong. The topic file says what is true now.

The consequence is that when the system looks something up, the layer matters. A fact from a topic file is authoritative. A fact from a compressed session summary is routing information: it tells the system where to look next, not what to conclude. A raw session log serves as arbitration: it is read when the topic and the compressed summary disagree. The CLAUDE.md file at the top of the project encodes this precedence explicitly:

## Memory precedence

1. Topics. CURATED CURRENT TRUTH. Wins.
2. Compressed sessions. RECALL/ROUTING. Not authoritative on facts.
3. Raw JSONL. ARBITRATION when summary and topic disagree.
4. Chroma/FTS5. DERIVED, regenerate from source.
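
The rule is mechanical enough to state as code. A sketch with hypothetical layer names and lookup results, not the real server's internals:

```python
# Highest-precedence first, mirroring the CLAUDE.md list above.
PRECEDENCE = ["topics", "compressed_sessions", "raw_jsonl", "derived_indices"]

def resolve(hits: dict[str, str]) -> tuple[str, str]:
    """Return the answer from the highest-precedence layer that produced one."""
    for layer in PRECEDENCE:
        if layer in hits:
            return layer, hits[layer]
    raise LookupError("no layer answered the query")

# A stale summary and a curated topic disagree; the topic wins.
layer, fact = resolve({
    "compressed_sessions": "restart the daemon every hour",
    "topics": "hourly daemon restarts are no longer needed",
})
```

Lower layers are consulted only when every layer above them is silent, so a poisoned summary can never outrank a curated topic.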

The rule matters because without it, the system eventually poisons itself. A bad summary written in a moment of model confusion becomes indistinguishable from a good summary written after careful thought. With the precedence rule, the topic is the authority and the summaries are routing. Poisoning at the summary layer does not reach the authority layer.

MEMORY.md, the index

MEMORY.md is a single file, under 200 lines, auto-loaded into every session’s system prompt. It is the index of topics, not the content of any of them. It lists what topic files exist, what each one is for, and where to read further. It also contains a handful of the highest-priority preferences and constraints (the canonical list of printer IP addresses, for instance, and the statement that email sends are pre-authorised).

I keep it short because it costs context on every session start. Any detail that can live in a topic file does. The index’s job is to be a reliable table of contents for a collection of topic files, not to be a condensed version of them. A user who wants to know about the memory system reads the topic file; MEMORY.md tells them where.

This is the difference between the system having three hundred lines of context and three hundred thousand lines of searchable corpus. The three hundred lines are cheap and always present. The three hundred thousand lines are reachable on demand. The index is what makes the reaching-on-demand work.
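
The line budget is easy to enforce mechanically. A hypothetical pre-flight check, assuming nothing about the real tooling:

```python
from pathlib import Path

MAX_LINES = 200  # hard budget: every line here is a per-session context tax

def check_index(path: Path, max_lines: int = MAX_LINES) -> int:
    """Fail loudly if the always-loaded index file has outgrown its budget."""
    count = len(path.read_text(encoding="utf-8").splitlines())
    if count > max_lines:
        raise ValueError(f"{path.name}: {count} lines, budget is {max_lines}")
    return count
```

Run before committing any edit to the index; a hard failure is better than a budget that erodes one line at a time.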

/dream, the consolidation pass

The consolidation pass is what makes the system something more than a searchable archive.

/dream is a skill that runs on a schedule (currently every 24 hours, triggered by a session-start hook that checks for a .dream-pending flag file). When it runs, it reads recent session transcripts that have not yet been processed, extracts specific patterns (new facts that contradict existing topic content, new patterns that match repeated-error templates, new decisions with clear rationale), and either updates the relevant topic files or appends to lessons.md.
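
The scheduling mechanics fit in a few lines. Paths and names here are illustrative, not the real hook:

```python
import time
from pathlib import Path

DREAM_INTERVAL = 24 * 3600  # seconds between consolidation passes

def session_start_hook(flag: Path, last_run_file: Path) -> bool:
    """Set the pending flag if a full interval has elapsed since the last pass.

    The flag is an empty file; the consolidation skill consumes and
    deletes it when it runs.
    """
    last_run = float(last_run_file.read_text()) if last_run_file.exists() else 0.0
    if time.time() - last_run >= DREAM_INTERVAL:
        flag.touch()
        return True
    return False
```

Splitting the mechanism into a flag-setter (cheap, runs on every session start) and a flag-consumer (expensive, runs at most daily) keeps session startup fast.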

The key rule: /dream can promote insights from the session layer into the topic layer, but never the reverse. Topics are the authoritative tier; they receive content, they do not export content to the summary layer. This asymmetry is what prevents the poisoning failure mode. If /dream makes a bad promotion, the topic file is visibly wrong when I next look at it, and I fix it by hand. But a bad promotion never flows back into session summaries, so it cannot amplify.
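
The one-way promotion can be made concrete: the pass emits a reviewable diff against a topic file rather than writing it. A sketch, with a hypothetical topic and fact:

```python
import difflib

def propose_promotion(topic_path: str, topic_text: str, new_fact: str) -> str:
    """Produce a unified diff for human review; never write the topic file.

    Session summaries are never modified either, so a bad proposal
    cannot flow back into the routing layer and amplify.
    """
    old = topic_text.splitlines(keepends=True)
    new = old + [f"- {new_fact}\n"]
    return "".join(
        difflib.unified_diff(old, new, fromfile=topic_path,
                             tofile=topic_path + " (proposed)")
    )

diff = propose_promotion(
    "topics/printer.md",
    "# Printer\n- belt tension set by feel\n",
    "belt tension: 110 Hz pluck test, per session 2025-03-05",
)
```

The output is an ordinary unified diff, which makes the human review step a thirty-second skim rather than a file comparison.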

The things /dream is designed to catch are:

  • Repeated mistakes. If the same category of error appears in two separate sessions, the pattern gets added to lessons.md (the subject of its own post on this site) and becomes a pre-flight check for future sessions.
  • Policy drift. If I made a decision two months ago that a recent session contradicts without explicit justification, /dream flags the contradiction for me to resolve.
  • Topic staleness. If a topic file has not been touched in months but recent sessions have extended its subject area, /dream surfaces a candidate update.
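
The first of these reduces to counting distinct sessions per error category. A sketch, assuming categories have already been extracted from the transcripts:

```python
from collections import defaultdict

def repeated_mistakes(session_errors: dict[str, set[str]],
                      threshold: int = 2) -> list[str]:
    """Return error categories seen in at least `threshold` distinct sessions."""
    sessions_per_category: dict[str, set[str]] = defaultdict(set)
    for session, categories in session_errors.items():
        for category in categories:
            sessions_per_category[category].add(session)
    return sorted(c for c, s in sessions_per_category.items()
                  if len(s) >= threshold)

# Hypothetical extracted categories: only the one appearing in two
# separate sessions becomes a lessons.md candidate.
flagged = repeated_mistakes({
    "2025-03-01": {"forgot-to-reindex", "wrong-ip"},
    "2025-03-04": {"forgot-to-reindex"},
    "2025-03-05": {"stale-topic"},
})
```

Counting distinct sessions rather than raw occurrences matters: one bad session can repeat a mistake five times without that proving a pattern.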

The consolidation is not autonomous editing. It produces a diff and a summary; I apply the changes by hand after reading. This is deliberate. The reason is the same reason I do not let /dream promote into topics directly without review: the authority layer has to stay authoritative, which means humans read and approve changes to it.

Why this feels sleep-like

The biomimetic framing is not serious but it is useful. A nightly pass that consolidates the day’s new information into longer-term stores, promotes patterns into general rules, and prunes detail that is not load-bearing is structurally similar to what sleep does for biological memory. The analogy does not hold at the mechanism level (no actual neural replay, no transfer between specific brain regions) but it holds at the functional level: without consolidation, a memory system accumulates undifferentiated experience. With it, experience becomes structure.

The practical consequence is that the system gets better without me actively maintaining it. If I run /dream regularly, the topics stay up to date, the lessons file stays current, and the compressed-session layer stays useful for routing. If I do not, the system slowly degrades. A few weeks of skipping /dream produces a noticeable drop in retrieval quality when tested against real queries. A few months produce a system that is still technically searchable but functionally stale.

The corner cases

A few things the architecture does not do, and a few failure modes worth naming.

It does not try to deduplicate information across topics. If two topics say the same thing, both say it. I prefer some redundancy over the complexity of a single canonical store. When contradictions arise, I edit one of the topic files by hand.

It does not chain-of-thought its own conclusions. /dream does not try to infer new facts from combinations of existing ones. That class of inference is error-prone and produces false positives that poison the topic layer. /dream only promotes things that are stated explicitly in session transcripts.

It does not retain permanently. The compressed-session layer is append-only but can be garbage-collected if the raw sessions are purged. I have not done this yet; at the current rate, storage is not a binding constraint for several more years.

It has exactly one failure mode that I have hit and fixed twice. The topic files can become internally inconsistent if I edit them in a hurry. The defence against this is a review agent that periodically reads the topic files and flags contradictions against each other and against the compressed session history. The agent runs weekly and its output is a short report I skim on Mondays.

What I would tell a past me

Build the topic layer first. It is the piece that turns a retrieval system into something that actually knows things. Without it, you have search.

Keep the index file (MEMORY.md in my case) under 200 lines. This file is loaded on every session; anything you put in it is a per-session tax. Put pointers, not content.

Do not pick between semantic and keyword retrieval. Build both. The cost is low and the coverage is materially better.

Write the consolidation pass (/dream) as a hand-reviewed promotion step, not an autonomous editor. The authority layer has to stay authoritative. Machines propose; humans dispose.

And run the consolidation regularly. The system is only as good as the maintenance it has received. Skipping a week costs little; skipping a month costs real precision on retrieval; skipping a quarter produces a haunted archive instead of a memory system.


The memory MCP server code lives at ~/code/memory_server.py on the Mac Mini. It is roughly 500 lines of Python, uses FastMCP as the server framework, ChromaDB as the semantic store, and the built-in SQLite FTS5 extension for keyword search. The embedding model is all-MiniLM-L6-v2 running locally via ONNX; no external API is called. Full source will be published with the control plane repo.