RAG vs. agent memory vs. fine-tuning: which knowledge problem do you actually have?

“The model needs to know our stuff” is three different engineering problems wearing one sentence. Teams that don’t split them end up fine-tuning what should have been retrieved, or building vector databases for what should have been a system prompt. The diagnostic:

The three problems

Reference knowledge — facts that exist outside any conversation: product docs, policies, schemas, the codebase. Changes on its own schedule. → Retrieval (RAG).

Experiential knowledge — what the system learns by operating: user preferences, past decisions, what worked last time. Accumulates through use, per user or per project. → Agent memory.

Behavioral knowledge — how to respond: tone, format discipline, domain-specific reasoning moves, tool-use style. → Fine-tuning — and prompting covers most of it first.

The comparison that matters

	RAG	Agent memory	Fine-tuning
Update latency	Minutes (re-index)	Immediate (write)	Weeks (retrain + re-eval)
Provenance	Strong — cite the chunk	Good — cite the note	None — baked into weights
Forgetting/removal	Delete the document	Delete the entry	Retrain (practically: hard)
Per-user personalization	Awkward	Native	Impractical
Fixes wrong style	No	No	Yes
Fixes missing facts	Yes	Session-learned ones	Poorly and dangerously

The provenance row deserves the emphasis: a RAG answer can say why it believes something. A fine-tuned answer just believes it. The moment someone asks “where did that come from?” — a user, a reviewer, a court — retrieval-based knowledge is defensible and weights-based knowledge is a shrug.

The order to try them

System prompt — if the knowledge fits in a few thousand tokens and rarely changes, put it in the prompt and stop. Context windows have made this the right answer far more often than in 2023.
RAG — when the corpus outgrows the prompt or changes independently.
Memory — when you notice the system re-learning the same things every session.
Fine-tuning — when prompting demonstrably can’t fix a behavior, and you have the eval suite to prove the tuned model didn’t regress elsewhere. It’s the last resort, not the power move.

Most production agent systems end with all of RAG + memory and no fine-tuning at all. That’s not a compromise; that’s the architecture.