The agentic AI tech stack, layer by layer: what you actually need to build agents
Seven layers make up every serious agentic system — models, serving, harness, protocols, knowledge, evals, and controls. What each layer does, the main options in each, and where the choices actually matter.
Ask five vendors “what do I need for agents?” and you’ll get five diagrams where their box is the biggest. Here’s the vendor-free version: seven layers, what each does, the main options, and — most usefully — how much each choice actually matters.
flowchart TB
U["Users & apps"] --> H
subgraph stack ["The agentic stack"]
H["Harness / orchestration"] --> P["Protocols: MCP · A2A · AG-UI"]
P --> T["Tools & knowledge (your APIs, data, memory)"]
H --> G["Gateway & serving"]
G --> M["Models (hosted & local)"]
end
H -. traces .-> O["Evals & observability"]
T -. enforced by .-> C["Security & governance controls"]
Layer 1 — Models (the engine)
The reasoning engine: Claude, GPT, and Gemini at the frontier; Llama, Mistral, Qwen, DeepSeek-class open-weights models where you need local control or lower cost. The strategic point: agents are the layer where model choice is most swappable — a good harness talks to any of them, and most mature systems route different task tiers to different models. Full decision framework: choosing an LLM for agents.
Layer 2 — Serving & gateway (the fuel line)
Hosted APIs need a gateway (LiteLLM, OpenRouter, or a cloud provider’s equivalent) for routing, budgets, keys, and fallback. Local needs a serving engine: llama.cpp for single-box and edge, vLLM or SGLang for GPU throughput, Ollama for developer convenience. Nearly every serving option speaks the OpenAI-compatible protocol now — which is what makes Layer 1 swappable.
Layer 3 — Harness / orchestration (the chassis)
The loop that turns a model into an agent. Options in ascending machinery: your own ~40-line loop, provider SDKs (Claude Agent SDK, OpenAI Agents SDK), and full frameworks (LangGraph, CrewAI, AutoGen-lineage) when you need durable state and graph orchestration. The build-vs-buy call gets its own verdict.
Layer 4 — Protocols (the connectors)
MCP to connect agents to tools, A2A for agent-to-agent delegation across ownership boundaries, AG-UI for streaming agent state into frontends. One rule: protocols are seams, so keep adapters thin and your system survives the ecosystem’s churn.
Layer 5 — Knowledge & memory (what it knows)
Retrieval over your corpus (pgvector if you run Postgres; Qdrant, Weaviate, Chroma otherwise), agent memory for what the system learns in use, and plain files for profile knowledge. The RAG vs. memory vs. fine-tuning diagnostic sorts out which knowledge problem you have.
Layer 6 — Evals & observability (whether it works)
Golden-task evals gating changes, OTel GenAI-convention traces, and a trace store (Langfuse, Arize Phoenix, W&B Weave, or your existing observability stack). This layer is chronically under-invested and is where reliability actually comes from.
Layer 7 — Security & governance (whether it’s safe)
Tool-layer allowlists and validation, approval tiers, injection defenses, and cost controls. Enforced in Layers 2–4 — never in the prompt.
Where the choices actually matter
| Layer | Switching cost later | Advice |
|---|---|---|
| Model | Low (with a gateway) | Don’t agonize; route and re-evaluate |
| Serving/gateway | Low–medium | Standardize early, one per org |
| Harness | High | The decision to make deliberately |
| Protocols | Low if adapters are thin | Adopt MCP now; others as needed |
| Knowledge | Medium | Start boring (pgvector, files) |
| Evals | n/a — pure asset | Start on day one, never stop |
| Controls | Painful to retrofit | Design in from the first tool |
The two layers you can’t buy: your tools (they encode your business) and your evals (they encode your definition of “works”). Everything else is increasingly commodity — which is exactly where your innovation budget shouldn’t go.
Was this guide useful?
Thanks — noted. It shapes what gets written next.
newsletter
One practical agentic-AI guide in your inbox. No news, no hype.
Tutorials and decision frameworks as they ship. Unsubscribe anytime.