RAG vs. agent memory vs. fine-tuning: which knowledge problem do you actually have?
Three ways to make a model "know" things, routinely confused with each other. A diagnostic for picking the right one — and the order to try them in.
“The model needs to know our stuff” is three different engineering problems wearing one sentence. Teams that don’t split them end up fine-tuning what should have been retrieved, or building vector databases for what should have been a system prompt. The diagnostic:
The three problems
Reference knowledge — facts that exist outside any conversation: product docs, policies, schemas, the codebase. Changes on its own schedule. → Retrieval (RAG).
Experiential knowledge — what the system learns by operating: user preferences, past decisions, what worked last time. Accumulates through use, per user or per project. → Agent memory.
Behavioral knowledge — how to respond: tone, format discipline, domain-specific reasoning moves, tool-use style. → Fine-tuning — and prompting covers most of it first.
The comparison that matters
| RAG | Agent memory | Fine-tuning | |
|---|---|---|---|
| Update latency | Minutes (re-index) | Immediate (write) | Weeks (retrain + re-eval) |
| Provenance | Strong — cite the chunk | Good — cite the note | None — baked into weights |
| Forgetting/removal | Delete the document | Delete the entry | Retrain (practically: hard) |
| Per-user personalization | Awkward | Native | Impractical |
| Fixes wrong style | No | No | Yes |
| Fixes missing facts | Yes | Session-learned ones | Poorly and dangerously |
The provenance row deserves the emphasis: a RAG answer can say why it believes something. A fine-tuned answer just believes it. The moment someone asks “where did that come from?” — a user, a reviewer, a court — retrieval-based knowledge is defensible and weights-based knowledge is a shrug.
The order to try them
- System prompt — if the knowledge fits in a few thousand tokens and rarely changes, put it in the prompt and stop. Context windows have made this the right answer far more often than in 2023.
- RAG — when the corpus outgrows the prompt or changes independently.
- Memory — when you notice the system re-learning the same things every session.
- Fine-tuning — when prompting demonstrably can’t fix a behavior, and you have the eval suite to prove the tuned model didn’t regress elsewhere. It’s the last resort, not the power move.
Most production agent systems end with all of RAG + memory and no fine-tuning at all. That’s not a compromise; that’s the architecture.
Was this guide useful?
Thanks — noted. It shapes what gets written next.
newsletter
One practical agentic-AI guide in your inbox. No news, no hype.
Tutorials and decision frameworks as they ship. Unsubscribe anytime.