Multi-agent orchestration explained: build an orchestrator and sub-agents from scratch
Build the agent harness pattern in ~80 lines of plain Python — an orchestrator that plans, delegates to specialist sub-agents, and synthesizes — with a mock model so you can run it instantly.
Multi-agent systems sound exotic until you build one and discover the whole trick: an “agent” is a loop, and an “orchestrator” is a loop that starts other loops. Frameworks hide this, which is exactly why people who start with frameworks stay confused. We’ll build the harness bare, watch it run deterministically, and only then talk about real models and frameworks.
Step 1 — Why split one agent into several?
One agent with twenty tools has three problems: the system prompt becomes a committee memo, every tool schema burns context on every call, and one bad step contaminates the whole trajectory. Splitting by role fixes all three — each sub-agent gets a short, focused prompt and only the tools its job needs. The orchestrator’s only tools are its sub-agents.
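To make the context cost concrete, here is a rough sketch. The tool names and schemas are invented for illustration, and serialized character count is only a crude proxy for tokens, not a real tokenizer.

```python
import json


def make_tool(name: str) -> dict:
    """Build a hypothetical tool schema; the fields are illustrative only."""
    return {
        "name": name,
        "description": f"Hypothetical tool: {name}",
        "input_schema": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
        },
    }


# Everything a single do-it-all agent would have to carry on every call.
ALL_TOOLS = {name: make_tool(name) for name in (
    "web_search", "fetch_url", "read_file", "write_file", "run_sql", "send_email",
)}

# Splitting by role: each sub-agent sees only the tools its job needs.
ROLE_TOOLS = {
    "researcher": ["web_search", "fetch_url"],
    "writer": ["read_file", "write_file"],
}


def schema_chars(tool_names) -> int:
    """Characters of tool schema sent with each call (crude proxy for tokens)."""
    return sum(len(json.dumps(ALL_TOOLS[name])) for name in tool_names)


if __name__ == "__main__":
    print("one big agent:", schema_chars(ALL_TOOLS), "chars per call")
    for role, names in ROLE_TOOLS.items():
        print(f"{role}:", schema_chars(names), "chars per call")
```

The absolute numbers mean little; the point is that each role pays only for its own slice of the schema on every call.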
Step 2 — Build a mock model first
This is the step everyone skips and regrets. A deterministic fake model lets
you test the harness — routing, parsing, aggregation — without paying for
tokens or debugging two things at once. Create harness.py:
import json


class MockModel:
    """Deterministic stand-in for an LLM API.

    Routes on the system prompt so the harness can be tested
    end-to-end with zero API calls. Swap for a real client later.
    """

    def complete(self, system: str, user: str) -> str:
        if "You are a planner" in system:
            return json.dumps({
                "subtasks": [
                    {"agent": "researcher", "task": "List MCP's three primitives."},
                    {"agent": "writer", "task": "Explain them in one sentence each."},
                ]
            })
        if "You are a researcher" in system:
            return "MCP exposes tools, resources, and prompts."
        if "You are a writer" in system:
            return f"Polished: {user}"
        return "OK"
Step 3 — The sub-agent: a role, a model, a loop
class Agent:
    def __init__(self, name: str, system_prompt: str, model):
        self.name = name
        self.system_prompt = system_prompt
        self.model = model

    def run(self, task: str) -> str:
        # Real agents loop over tool calls here; the shape is identical.
        return self.model.complete(self.system_prompt, task)
That’s genuinely it. A production sub-agent adds a tool-calling loop inside
run() — but the interface (task in, result out) is the entire contract
the orchestrator needs.
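For a feel of what that tool-calling loop might look like, here is a minimal sketch. It assumes an invented protocol in which the model replies with JSON, either {"tool": ..., "args": ...} or {"final": ...}; real provider APIs have their own tool-call formats. ScriptedModel, ToolAgent, and word_count are hypothetical names, not part of harness.py.

```python
import json
from typing import Callable


class ScriptedModel:
    """Hypothetical mock that replays canned replies in order."""

    def __init__(self, replies: list[str]):
        self._replies = iter(replies)

    def complete(self, system: str, user: str) -> str:
        return next(self._replies)


class ToolAgent:
    """Sub-agent whose run() loops until the model returns a final answer."""

    def __init__(self, name: str, system_prompt: str, model,
                 tools: dict[str, Callable[..., str]], max_steps: int = 5):
        self.name = name
        self.system_prompt = system_prompt
        self.model = model
        self.tools = tools
        self.max_steps = max_steps

    def run(self, task: str) -> str:
        transcript = task
        for _ in range(self.max_steps):
            reply = json.loads(self.model.complete(self.system_prompt, transcript))
            if "final" in reply:
                return reply["final"]
            tool = self.tools.get(reply["tool"])
            if tool is None:
                observation = f"error: unknown tool {reply['tool']!r}"
            else:
                observation = tool(**reply.get("args", {}))
            # Append the observation so the next model call can see it.
            transcript += f"\n\nTool {reply['tool']} returned:\n{observation}"
        return "[stopped: step limit reached]"


def demo() -> str:
    """Run one tool call followed by a final answer, fully scripted."""
    def word_count(text: str) -> str:
        return str(len(text.split()))

    model = ScriptedModel([
        json.dumps({"tool": "word_count", "args": {"text": "tools resources prompts"}}),
        json.dumps({"final": "The text has 3 words."}),
    ])
    agent = ToolAgent("counter", "You are a counter.", model, {"word_count": word_count})
    return agent.run("How many words are in 'tools resources prompts'?")
```

The contract is unchanged: run() still takes a task and returns a string, so an orchestrator could hold a ToolAgent anywhere it holds an Agent. The step limit is there so a confused model cannot loop forever.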
Step 4 — The orchestrator: plan, dispatch, synthesize
class Orchestrator:
    PLANNER_PROMPT = (
        "You are a planner. Decompose the user's goal into subtasks. "
        "Respond with JSON: {\"subtasks\": [{\"agent\": ..., \"task\": ...}]}. "
        "Available agents: researcher, writer."
    )

    def __init__(self, model, agents: dict[str, Agent]):
        self.model = model
        self.agents = agents

    def run(self, goal: str) -> str:
        # 1. Plan
        raw = self.model.complete(self.PLANNER_PROMPT, goal)
        plan = json.loads(raw)

        # 2. Dispatch: results flow forward so later agents see earlier work
        context = goal
        results = []
        for step in plan["subtasks"]:
            agent = self.agents.get(step["agent"])
            if agent is None:
                results.append(f"[skipped: unknown agent {step['agent']!r}]")
                continue
            output = agent.run(f"{step['task']}\n\nContext so far:\n{context}")
            results.append(f"### {agent.name}\n{output}")
            context = output

        # 3. Synthesize
        return "\n\n".join(results)


if __name__ == "__main__":
    model = MockModel()
    orchestrator = Orchestrator(model, {
        "researcher": Agent("researcher", "You are a researcher. Be factual.", model),
        "writer": Agent("writer", "You are a writer. Be clear.", model),
    })
    print(orchestrator.run("Explain what an MCP server exposes."))
Run it:
python harness.py
You’ll see the researcher’s finding flow into the writer’s input — the core data flow of every multi-agent system you’ll ever build, visible in 80 lines with no dependencies.
Step 5 — Swap in a real model
The harness doesn’t change; only MockModel does:
import anthropic


class ClaudeModel:
    def __init__(self):
        self.client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY

    def complete(self, system: str, user: str) -> str:
        response = self.client.messages.create(
            model="claude-sonnet-5",  # swap in whichever model ID your account offers
            max_tokens=1024,
            system=system,
            messages=[{"role": "user", "content": user}],
        )
        # The first content block carries the text of the reply.
        return response.content[0].text
Because the planner returns JSON, keep json.loads wrapped in a retry that
feeds the parse error back to the model — with real models, malformed JSON
is a when, not an if.
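One way that retry could look, as a sketch rather than a prescription: plan_with_retry and FlakyModel are hypothetical names, the wording of the correction prompt is an assumption, and the shape check on "subtasks" is an extra guard beyond plain parsing.

```python
import json


def plan_with_retry(model, system: str, goal: str, max_attempts: int = 3) -> dict:
    """Ask for a JSON plan; on failure, show the model its error and retry."""
    prompt = goal
    for _ in range(max_attempts):
        raw = model.complete(system, prompt)
        try:
            plan = json.loads(raw)
            if isinstance(plan, dict) and isinstance(plan.get("subtasks"), list):
                return plan
            error = 'expected a JSON object with a "subtasks" list'
        except json.JSONDecodeError as exc:
            error = str(exc)
        # Feed the error and the bad output back so the model can correct itself.
        prompt = (
            f"{goal}\n\nYour previous reply could not be used.\n"
            f"Error: {error}\nPrevious reply:\n{raw}\n"
            "Reply with valid JSON only."
        )
    raise ValueError(f"no valid plan after {max_attempts} attempts")


class FlakyModel:
    """Hypothetical mock: malformed JSON on the first call, a valid plan after."""

    def __init__(self):
        self.calls = 0

    def complete(self, system: str, user: str) -> str:
        self.calls += 1
        if self.calls == 1:
            return "Sure! Here is the plan: {subtasks: ...}"
        return json.dumps({"subtasks": [{"agent": "researcher", "task": "t"}]})
```

In the orchestrator, this would stand in for the bare json.loads(raw) call in the planning step. Because FlakyModel is deterministic, the retry path itself can be tested without a real model.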
The same swap works for other providers. The OpenAI and Gemini SDKs can be
wrapped in the same system + user → text shape, and any OpenAI-compatible
endpoint, including a locally served model, only needs a different base_url
on the client. The harness never knows the difference; that indifference is
worth preserving as you grow (see: how to choose the model).
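As a sketch of that swap for an OpenAI-style chat client: OpenAICompatModel and fake_client are hypothetical names, and the client is injected so the adapter can be exercised offline with a fake. The URL and model name in the usage comment are placeholders.

```python
from types import SimpleNamespace


class OpenAICompatModel:
    """Adapter exposing complete(system, user) over an OpenAI-style chat client."""

    def __init__(self, client, model_name: str):
        self.client = client
        self.model_name = model_name

    def complete(self, system: str, user: str) -> str:
        response = self.client.chat.completions.create(
            model=self.model_name,
            messages=[
                {"role": "system", "content": system},
                {"role": "user", "content": user},
            ],
        )
        # content may be None in some responses; normalize to a string.
        return response.choices[0].message.content or ""


def fake_client():
    """Offline stand-in that mimics only the attributes the adapter touches."""
    def create(model, messages):
        text = messages[1]["content"].upper()
        message = SimpleNamespace(content=text)
        return SimpleNamespace(choices=[SimpleNamespace(message=message)])

    return SimpleNamespace(chat=SimpleNamespace(completions=SimpleNamespace(create=create)))


# With the real SDK installed, the wiring might look like this
# (placeholder URL and model name):
#   from openai import OpenAI
#   client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")
#   model = OpenAICompatModel(client, "my-local-model")
```

The orchestrator and sub-agents take this object exactly as they take MockModel, since all they ever call is complete().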
Step 6 — What the frameworks add (and when you need them)
Now that you’ve seen the bare pattern, framework features map cleanly onto
it: parallel dispatch (run sub-agents concurrently), shared memory (a
smarter context than our last-output-wins), typed handoffs (schemas
instead of prose between agents), and retries/tracing around every
complete() call. Adopt a framework when you need three or more of those —
not before you understand what they’re wrapping.
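Parallel dispatch, for instance, needs nothing beyond the standard library when subtasks are independent of each other. A minimal sketch, with dispatch_parallel and EchoAgent as hypothetical names; it deliberately drops the forward-flowing context, so it does not suit chains like researcher then writer.

```python
from concurrent.futures import ThreadPoolExecutor


class EchoAgent:
    """Hypothetical stand-in for Agent: returns its name plus the task."""

    def __init__(self, name: str):
        self.name = name

    def run(self, task: str) -> str:
        return f"{self.name}: {task}"


def dispatch_parallel(agents: dict, steps: list[dict], max_workers: int = 4) -> list[str]:
    """Run independent subtasks concurrently; results come back in plan order."""
    def run_step(step: dict) -> str:
        agent = agents.get(step["agent"])
        if agent is None:
            return f"[skipped: unknown agent {step['agent']!r}]"
        return agent.run(step["task"])

    # Threads suit this workload because model calls are network-bound.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, not completion order.
        return list(pool.map(run_step, steps))
```

A framework adds scheduling, cancellation, and error policy on top, but the core move is this small.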
Troubleshooting
json.JSONDecodeError when using a real model as planner
Real models wrap JSON in prose or markdown fences. Extract the first
{...} block before parsing, and on failure, re-prompt with the error
message and the malformed output. Two retries fix >95% of cases.
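A sketch of the extraction step, using the standard library's JSONDecoder.raw_decode to parse from each opening brace in turn. extract_json_object is a hypothetical helper name, and the sample reply is made up.

```python
import json


def extract_json_object(text: str) -> dict:
    """Return the first parseable JSON object embedded in text."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one value starting at `start` and ignores the rest.
            value, _end = decoder.raw_decode(text, start)
            return value
        except json.JSONDecodeError:
            # Not valid JSON from this brace; try the next one.
            start = text.find("{", start + 1)
    raise ValueError("no JSON object found")


if __name__ == "__main__":
    fence = "`" * 3  # built at runtime to avoid a literal fence in this example
    reply = f'Here is the plan:\n{fence}json\n{{"subtasks": []}}\n{fence}\nHope that helps!'
    print(extract_json_object(reply))
```

Unlike a regular expression over braces, this handles nested objects and braces inside strings, because the JSON parser itself decides where the object ends.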
Sub-agents give great individual answers that don’t combine
Your context handoff is too lossy. Passing only the last output (as our minimal harness does) loses earlier results — accumulate a structured results list into each subsequent task prompt instead.
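A minimal sketch of that accumulation, assuming sub-agents expose run(task) as in the harness. build_task_prompt, dispatch_with_history, and RecordingAgent are hypothetical names, and the prompt layout is one choice among many.

```python
class RecordingAgent:
    """Hypothetical stand-in for Agent that remembers the last prompt it saw."""

    def __init__(self, name: str, reply: str):
        self.name = name
        self.reply = reply
        self.last_prompt = ""

    def run(self, task: str) -> str:
        self.last_prompt = task
        return self.reply


def build_task_prompt(task: str, goal: str, history: list[dict]) -> str:
    """Compose a prompt that carries every earlier result, not just the last."""
    lines = [task, "", f"Overall goal: {goal}"]
    if history:
        lines += ["", "Results so far:"]
        for i, entry in enumerate(history, start=1):
            lines.append(f"{i}. [{entry['agent']}] {entry['task']} -> {entry['output']}")
    return "\n".join(lines)


def dispatch_with_history(agents: dict, goal: str, subtasks: list[dict]) -> list[dict]:
    """Run subtasks in order, handing each agent the full structured history."""
    history: list[dict] = []
    for step in subtasks:
        agent = agents.get(step["agent"])
        if agent is None:
            continue  # unknown agent: skip, as the minimal harness does
        output = agent.run(build_task_prompt(step["task"], goal, history))
        history.append({"agent": step["agent"], "task": step["task"], "output": output})
    return history
```

The history grows with every step, so long plans will eventually need summarizing; that trade-off against cost is the subject of the next item.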
The system works but costs exploded
Every hop re-sends context. Keep sub-agent prompts short, pass summaries rather than transcripts between agents, and measure tokens per goal — the FinOps piece in the Enterprise section covers the metering setup.
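A rough way to start measuring, before any real metering exists, is to wrap the model. MeteredModel and EchoModel are hypothetical names, and the four-characters-per-token ratio is only a rule-of-thumb assumption; where a provider's response reports token usage, prefer those counts.

```python
class EchoModel:
    """Hypothetical stand-in model that returns the user text unchanged."""

    def complete(self, system: str, user: str) -> str:
        return user


class MeteredModel:
    """Wraps any model with complete(system, user) and tallies rough usage."""

    def __init__(self, inner):
        self.inner = inner
        self.calls = 0
        self.chars_in = 0
        self.chars_out = 0

    def complete(self, system: str, user: str) -> str:
        reply = self.inner.complete(system, user)
        self.calls += 1
        self.chars_in += len(system) + len(user)
        self.chars_out += len(reply)
        return reply

    def estimated_tokens(self, chars_per_token: float = 4.0) -> int:
        """Very rough estimate; the ratio is an assumption, not a tokenizer."""
        return round((self.chars_in + self.chars_out) / chars_per_token)
```

Create one MeteredModel per goal, pass it to the orchestrator and every sub-agent, and read calls and estimated_tokens() after run() returns to get a per-goal figure.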