Agent memory patterns: vector stores, knowledge graphs, and context engineering
Why agents forget, the four memory patterns that fix it, and a working retrieval loop you can run in plain Python before you buy a vector database.
An agent’s context window is working memory: fast, expensive, and wiped between sessions. Everything else — what it learned yesterday, what the user prefers, what happened in the last nine steps of a ten-step task — has to be engineered. That engineering has exactly four recurring patterns.
Step 1 — The four patterns
| Pattern | Answers | Typical store |
|---|---|---|
| Scratchpad | “Where was I in this task?” | Plain file / task state object |
| Retrieval memory | “What do I know that’s relevant right now?” | Vector store |
| Knowledge graph | “How do these entities relate?” | Graph / triples |
| Profile memory | “What’s durably true about this user/project?” | Small curated documents |
Most teams jump straight to a vector database. Wrong order: scratchpad
and profile memory are cheaper, more debuggable, and cover most real
failures. A NOTES.md the agent reads at session start is profile memory.
A todo list it updates mid-task is a scratchpad. Ship those first.
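A minimal sketch of those two cheap patterns, assuming a JSON file for task state and a plain-text notes file. The `Scratchpad` class, the `load_profile` helper, and the todo schema are hypothetical names chosen for illustration, not part of any library:

```python
import json
from pathlib import Path
from typing import Optional


def load_profile(notes_path: Path) -> str:
    """Profile memory: a small curated file read once at session start."""
    return notes_path.read_text(encoding="utf-8") if notes_path.exists() else ""


class Scratchpad:
    """Task state persisted to a JSON file so a restarted agent can resume."""

    def __init__(self, path: Path):
        self.path = path
        # Each todo is {"step": str, "done": bool}; this schema is an assumption.
        self.todos: list[dict] = []
        if path.exists():
            self.todos = json.loads(path.read_text(encoding="utf-8"))

    def add(self, step: str) -> None:
        self.todos.append({"step": step, "done": False})
        self._save()

    def complete(self, step: str) -> None:
        for todo in self.todos:
            if todo["step"] == step:
                todo["done"] = True
        self._save()

    def next_step(self) -> Optional[str]:
        """Answer "where was I?": the first step not yet marked done."""
        return next((t["step"] for t in self.todos if not t["done"]), None)

    def _save(self) -> None:
        self.path.write_text(json.dumps(self.todos, indent=2), encoding="utf-8")
```

Because both stores are ordinary files, you can open them in an editor to see exactly what the agent believes, which is the debuggability argument made above.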
Step 2 — Build retrieval memory you can actually inspect
The retrieval loop — store, score, recall, inject — is identical whether
the scorer is bag-of-words or a 3,072-dimension embedding. So build it with
a scorer you can debug by eye. Create memory.py:
import math
import re
from collections import Counter


def _tokens(text: str) -> Counter:
    # Lowercased bag of words: count each run of letters or digits.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity of two token-count vectors; 0.0 if either is empty.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * \
        math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class MemoryStore:
    """Retrieval memory with a transparent scorer.

    Swap _cosine-over-token-counts for embedding similarity in production;
    the interface, and every bug you'll meet, stays the same.
    """

    def __init__(self):
        # Each item keeps the original text next to its token counts.
        self._items: list[tuple[str, Counter]] = []

    def add(self, text: str) -> None:
        self._items.append((text, _tokens(text)))

    def search(self, query: str, k: int = 3) -> list[tuple[float, str]]:
        q = _tokens(query)
        scored = [(_cosine(q, vec), text) for text, vec in self._items]
        # Highest score first; ties fall back to comparing the text.
        return sorted(scored, reverse=True)[:k]


if __name__ == "__main__":
    memory = MemoryStore()
    memory.add("User prefers TypeScript over Python for new services.")
    memory.add("The staging database refresh runs Sundays at 02:00 UTC.")
    memory.add("Deploys require two approvals since the March incident.")
    for score, text in memory.search("should we pick Python or TypeScript for new services?"):
        print(f"{score:.2f} {text}")
Run python memory.py — the preference memory scores on top by a wide
margin (0.59 vs 0.00 for the unrelated entries on this toy corpus). Before the
model sees anything, you can print exactly why a memory was recalled.
That transparency is worth keeping until scale forces you off it.
Step 3 — Inject memories as context, not commands
How recalled text enters the prompt matters as much as retrieval quality. Injected memories should be labeled as background, with provenance:
Relevant notes from memory (may be stale, verify before acting):
- [2026-05-12] User prefers TypeScript over Python for new services.
Unlabeled injection is how agents end up obeying a note from three months ago over the user in front of them. Memories inform; the current conversation commands.
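One way to produce that labeled block, sketched under the assumption that each memory is stored with the date it was recorded. `format_memory_block` is a hypothetical helper, not an existing API:

```python
from datetime import date


def format_memory_block(memories: list[tuple[date, str]]) -> str:
    """Render recalled memories as labeled background, newest first."""
    if not memories:
        return ""  # inject nothing rather than a header with no notes
    lines = ["Relevant notes from memory (may be stale, verify before acting):"]
    for recorded, text in sorted(memories, reverse=True):
        # The date is the provenance: the model can see how old each note is.
        lines.append(f"- [{recorded.isoformat()}] {text}")
    return "\n".join(lines)
```

Keeping the header and dates in one function means every injection path gets the same framing, so no code path can slip a bare, unlabeled memory into the prompt.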
Step 4 — When to graduate each component
- To embeddings: when synonyms start missing (“DB” never matches
“database”). Use any embedding API or a local model; keep the
MemoryStore interface.
- To a real vector DB (pgvector first if you already run Postgres): when memories outgrow RAM or need sharing across processes.
- To a knowledge graph: only when your questions become relational — “which services depend on the thing that changed?” Retrieval memory can’t answer joins; graphs exist for that.
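To make the relational case concrete, here is a rough sketch of a triple store answering the dependency question with a breadth-first traversal. The `TripleStore` class, the `depends_on` predicate, and the service names are all invented for illustration:

```python
from collections import defaultdict, deque


class TripleStore:
    """Tiny in-memory graph of (subject, predicate, object) triples."""

    def __init__(self):
        # Indexed by (predicate, object) so "what points at X?" is one lookup.
        self._by_object: dict[tuple[str, str], set[str]] = defaultdict(set)

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self._by_object[(predicate, obj)].add(subject)

    def transitive_subjects(self, predicate: str, obj: str) -> set[str]:
        """Everything that reaches obj by following predicate edges (BFS)."""
        seen: set[str] = set()
        queue = deque([obj])
        while queue:
            current = queue.popleft()
            for subject in self._by_object.get((predicate, current), ()):
                if subject not in seen:  # also guards against cycles
                    seen.add(subject)
                    queue.append(subject)
        return seen


graph = TripleStore()
graph.add("checkout-api", "depends_on", "payments-db")
graph.add("storefront", "depends_on", "checkout-api")
graph.add("reporting", "depends_on", "warehouse")

# Which services are affected, directly or indirectly, if payments-db changes?
affected = graph.transitive_subjects("depends_on", "payments-db")
```

The multi-hop step is what similarity search cannot do: `storefront` never mentions `payments-db`, so no scorer over the text would surface it.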
Step 5 — Forgetting is a feature
Memory that only grows becomes noise that costs tokens. Production memory needs an eviction story: expire by age, deduplicate near-identical entries, and — most effective — have the agent periodically rewrite its memory files, compressing ten observations into one conclusion. Curation beats accumulation.
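The first two eviction rules can be sketched in a few lines. This assumes entries are `(timestamp, text)` pairs and uses Jaccard overlap of token sets as the near-duplicate test; the 90-day limit and 0.8 threshold are placeholder values, not recommendations:

```python
import re
from datetime import datetime, timedelta


def _token_set(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def _jaccard(a: set[str], b: set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def prune(
    entries: list[tuple[datetime, str]],
    now: datetime,
    max_age: timedelta = timedelta(days=90),
    dup_threshold: float = 0.8,
) -> list[tuple[datetime, str]]:
    """Drop entries older than max_age, then near-duplicates of newer ones.

    Returns the survivors, newest first.
    """
    fresh = [(ts, text) for ts, text in entries if now - ts <= max_age]
    kept: list[tuple[datetime, str]] = []
    kept_tokens: list[set[str]] = []
    for ts, text in sorted(fresh, reverse=True):  # newest wins a duplicate pair
        tokens = _token_set(text)
        if any(_jaccard(tokens, seen) >= dup_threshold for seen in kept_tokens):
            continue
        kept.append((ts, text))
        kept_tokens.append(tokens)
    return kept
```

The third rule, having the agent rewrite its own memory files, needs a model call and is left out here; the pairwise comparison above is quadratic, which is fine for a few thousand entries and not beyond.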
Troubleshooting
The agent keeps acting on stale memories
Add timestamps to every entry, label injected memories as potentially
stale (Step 3), and prefer recency-weighted scoring:
score * decay(age). Most “the agent is confused” reports are actually
“the memory is old.”
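A sketch of `score * decay(age)`, assuming exponential decay with a configurable half-life. The 30-day default and the `rerank_by_recency` helper are illustrative choices; other decay shapes work too:

```python
def decay(age_days: float, half_life_days: float = 30.0) -> float:
    """Exponential decay: 1.0 for a brand-new memory, 0.5 after one half-life."""
    return 0.5 ** (max(age_days, 0.0) / half_life_days)


def rerank_by_recency(
    scored: list[tuple[float, float, str]],
    half_life_days: float = 30.0,
) -> list[tuple[float, str]]:
    """scored holds (similarity, age_days, text); returns (weighted, text), best first."""
    weighted = [(sim * decay(age, half_life_days), text) for sim, age, text in scored]
    return sorted(weighted, reverse=True)
```

With these defaults a 90-day-old note keeps one eighth of its similarity score, so a slightly weaker but recent match outranks it.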
Retrieval returns plausible-looking but irrelevant memories
Top-k always returns something — even when nothing is relevant. Add a minimum-score threshold and inject nothing below it. An empty memory section beats a misleading one.
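A threshold filter along those lines, written against the `(score, text)` pairs that `MemoryStore.search` returns. The `filter_relevant` name and the 0.2 cutoff are assumptions; tune the cutoff against your own scorer, since bag-of-words and embedding similarities sit on different scales:

```python
def filter_relevant(
    results: list[tuple[float, str]],
    min_score: float = 0.2,
    k: int = 3,
) -> list[tuple[float, str]]:
    """Keep at most k results whose score is at least min_score."""
    kept = [(score, text) for score, text in results if score >= min_score]
    return sorted(kept, reverse=True)[:k]
```

On the toy corpus from Step 2 this keeps the 0.59 preference memory and drops both 0.00 entries, and a query that matches nothing yields an empty list.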