hubagenticai

Compare & Decide

RAG vs. agent memory vs. fine-tuning: which knowledge problem do you actually have?

Three ways to make a model "know" things, routinely confused with each other. A diagnostic for picking the right one — and the order to try them in.

updated 2026-07-04

“The model needs to know our stuff” is three different engineering problems wearing one sentence. Teams that don’t split them end up fine-tuning what should have been retrieved, or building vector databases for what should have been a system prompt. The diagnostic:

The three problems

Reference knowledge — facts that exist outside any conversation: product docs, policies, schemas, the codebase. Changes on its own schedule. → Retrieval (RAG).

Experiential knowledge — what the system learns by operating: user preferences, past decisions, what worked last time. Accumulates through use, per user or per project. → Agent memory.

Behavioral knowledgehow to respond: tone, format discipline, domain-specific reasoning moves, tool-use style. → Fine-tuning — and prompting covers most of it first.

The comparison that matters

RAGAgent memoryFine-tuning
Update latencyMinutes (re-index)Immediate (write)Weeks (retrain + re-eval)
ProvenanceStrong — cite the chunkGood — cite the noteNone — baked into weights
Forgetting/removalDelete the documentDelete the entryRetrain (practically: hard)
Per-user personalizationAwkwardNativeImpractical
Fixes wrong styleNoNoYes
Fixes missing factsYesSession-learned onesPoorly and dangerously

The provenance row deserves the emphasis: a RAG answer can say why it believes something. A fine-tuned answer just believes it. The moment someone asks “where did that come from?” — a user, a reviewer, a court — retrieval-based knowledge is defensible and weights-based knowledge is a shrug.

The order to try them

  1. System prompt — if the knowledge fits in a few thousand tokens and rarely changes, put it in the prompt and stop. Context windows have made this the right answer far more often than in 2023.
  2. RAG — when the corpus outgrows the prompt or changes independently.
  3. Memory — when you notice the system re-learning the same things every session.
  4. Fine-tuning — when prompting demonstrably can’t fix a behavior, and you have the eval suite to prove the tuned model didn’t regress elsewhere. It’s the last resort, not the power move.

Most production agent systems end with all of RAG + memory and no fine-tuning at all. That’s not a compromise; that’s the architecture.

newsletter

One practical agentic-AI guide in your inbox. No news, no hype.

Tutorials and decision frameworks as they ship. Unsubscribe anytime.