The agentic AI tech stack, layer by layer: what you actually need to build agents

Ask five vendors “what do I need for agents?” and you’ll get five diagrams where their box is the biggest. Here’s the vendor-free version: seven layers, what each does, the main options, and — most usefully — how much each choice actually matters.

flowchart TB
    U["Users & apps"] --> H
    subgraph stack ["The agentic stack"]
        H["Harness / orchestration"] --> P["Protocols: MCP · A2A · AG-UI"]
        P --> T["Tools & knowledge (your APIs, data, memory)"]
        H --> G["Gateway & serving"]
        G --> M["Models (hosted & local)"]
    end
    H -. traces .-> O["Evals & observability"]
    T -. enforced by .-> C["Security & governance controls"]

Layer 1 — Models (the engine)

The reasoning engine: Claude, GPT, and Gemini at the frontier; Llama, Mistral, Qwen, DeepSeek-class open-weights models where you need local control or lower cost. The strategic point: agents are the layer where model choice is most swappable — a good harness talks to any of them, and most mature systems route different task tiers to different models. Full decision framework: choosing an LLM for agents.

Layer 2 — Serving & gateway (the fuel line)

Hosted APIs need a gateway (LiteLLM, OpenRouter, or a cloud provider’s equivalent) for routing, budgets, keys, and fallback. Local needs a serving engine: llama.cpp for single-box and edge, vLLM or SGLang for GPU throughput, Ollama for developer convenience. Nearly every serving option speaks the OpenAI-compatible protocol now — which is what makes Layer 1 swappable.

Layer 3 — Harness / orchestration (the chassis)

The loop that turns a model into an agent. Options in ascending machinery: your own ~40-line loop, provider SDKs (Claude Agent SDK, OpenAI Agents SDK), and full frameworks (LangGraph, CrewAI, AutoGen-lineage) when you need durable state and graph orchestration. The build-vs-buy call gets its own verdict.

Layer 4 — Protocols (the connectors)

MCP to connect agents to tools, A2A for agent-to-agent delegation across ownership boundaries, AG-UI for streaming agent state into frontends. One rule: protocols are seams, so keep adapters thin and your system survives the ecosystem’s churn.

Layer 5 — Knowledge & memory (what it knows)

Retrieval over your corpus (pgvector if you run Postgres; Qdrant, Weaviate, Chroma otherwise), agent memory for what the system learns in use, and plain files for profile knowledge. The RAG vs. memory vs. fine-tuning diagnostic sorts out which knowledge problem you have.

Layer 6 — Evals & observability (whether it works)

Golden-task evals gating changes, OTel GenAI-convention traces, and a trace store (Langfuse, Arize Phoenix, W&B Weave, or your existing observability stack). This layer is chronically under-invested and is where reliability actually comes from.

Layer 7 — Security & governance (whether it’s safe)

Tool-layer allowlists and validation, approval tiers, injection defenses, and cost controls. Enforced in Layers 2–4 — never in the prompt.

Where the choices actually matter

Layer	Switching cost later	Advice
Model	Low (with a gateway)	Don’t agonize; route and re-evaluate
Serving/gateway	Low–medium	Standardize early, one per org
Harness	High	The decision to make deliberately
Protocols	Low if adapters are thin	Adopt MCP now; others as needed
Knowledge	Medium	Start boring (pgvector, files)
Evals	n/a — pure asset	Start on day one, never stop
Controls	Painful to retrofit	Design in from the first tool

The two layers you can’t buy: your tools (they encode your business) and your evals (they encode your definition of “works”). Everything else is increasingly commodity — which is exactly where your innovation budget shouldn’t go.