Multi-agent orchestration explained: build an orchestrator and sub-agents from scratch

Multi-agent systems sound exotic until you build one and discover the whole trick: an “agent” is a loop, and an “orchestrator” is a loop that starts other loops. Frameworks hide this, which is exactly why people who start with frameworks stay confused. We’ll build the harness bare, watch it run deterministically, and only then talk about real models and frameworks.

Step 1 — Why split one agent into several?

One agent with twenty tools has three problems: the system prompt becomes a committee memo, every tool schema burns context on every call, and one bad step contaminates the whole trajectory. Splitting by role fixes all three — each sub-agent gets a short, focused prompt and only the tools its job needs. The orchestrator’s only tools are its sub-agents.

Step 2 — Build a mock model first

This is the step everyone skips and regrets. A deterministic fake model lets you test the harness — routing, parsing, aggregation — without paying for tokens or debugging two things at once. Create harness.py:

import json

class MockModel:
    """Deterministic stand-in for an LLM API.

    Routes on the system prompt so the harness can be tested
    end-to-end with zero API calls. Swap for a real client later.
    """

    def complete(self, system: str, user: str) -> str:
        if "You are a planner" in system:
            return json.dumps({
                "subtasks": [
                    {"agent": "researcher", "task": "List MCP's three primitives."},
                    {"agent": "writer", "task": "Explain them in one sentence each."},
                ]
            })
        if "You are a researcher" in system:
            return "MCP exposes tools, resources, and prompts."
        if "You are a writer" in system:
            return f"Polished: {user}"
        return "OK"
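Routing on the system prompt is one way to fake a model. A complementary sketch, assuming you also want to assert on exactly what the harness sent, is a scripted double that replays canned replies in order and records every call. The name ScriptedModel and its calls attribute are hypothetical, not part of harness.py:

```python
class ScriptedModel:
    """Hypothetical test double: replays canned replies and records each call."""

    def __init__(self, replies):
        self._replies = list(replies)
        self.calls = []  # (system, user) tuples, in call order

    def complete(self, system: str, user: str) -> str:
        self.calls.append((system, user))
        if not self._replies:
            raise AssertionError("ScriptedModel ran out of scripted replies")
        return self._replies.pop(0)


model = ScriptedModel(['{"subtasks": []}', "second reply"])
first = model.complete("You are a planner.", "some goal")
second = model.complete("You are a writer.", "some task")
print(model.calls[1])  # the exact prompt pair the second caller sent
```

Because it exposes the same complete(system, user) method, it can stand in wherever MockModel does, and the recorded calls let a test check routing and prompt assembly directly.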

Step 3 — The sub-agent: a role, a model, a loop

class Agent:
    def __init__(self, name: str, system_prompt: str, model):
        self.name = name
        self.system_prompt = system_prompt
        self.model = model

    def run(self, task: str) -> str:
        # Real agents loop over tool calls here; the shape is identical.
        return self.model.complete(self.system_prompt, task)

That’s genuinely it. A production sub-agent adds a tool-calling loop inside run() — but the interface (task in, result out) is the entire contract the orchestrator needs.

Step 4 — The orchestrator: plan, dispatch, synthesize

class Orchestrator:
    PLANNER_PROMPT = (
        "You are a planner. Decompose the user's goal into subtasks. "
        "Respond with JSON: {\"subtasks\": [{\"agent\": ..., \"task\": ...}]}. "
        "Available agents: researcher, writer."
    )

    def __init__(self, model, agents: dict[str, Agent]):
        self.model = model
        self.agents = agents

    def run(self, goal: str) -> str:
        # 1. Plan
        raw = self.model.complete(self.PLANNER_PROMPT, goal)
        plan = json.loads(raw)

        # 2. Dispatch: each agent's output becomes the next agent's context
        context = goal
        results = []
        for step in plan["subtasks"]:
            agent = self.agents.get(step["agent"])
            if agent is None:
                results.append(f"[skipped: unknown agent {step['agent']!r}]")
                continue
            output = agent.run(f"{step['task']}\n\nContext so far:\n{context}")
            results.append(f"### {agent.name}\n{output}")
            context = output

        # 3. Synthesize
        return "\n\n".join(results)


if __name__ == "__main__":
    model = MockModel()
    orchestrator = Orchestrator(model, {
        "researcher": Agent("researcher", "You are a researcher. Be factual.", model),
        "writer": Agent("writer", "You are a writer. Be clear.", model),
    })
    print(orchestrator.run("Explain what an MCP server exposes."))

Run it:

python harness.py

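The mock writer echoes its input, so its section of the output ends with the researcher's sentence under "Context so far:". That is the researcher's finding flowing into the writer's input: the core data flow of every multi-agent system you'll ever build, visible in under 80 lines with no dependencies.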

Step 5 — Swap in a real model

The harness doesn’t change; only MockModel does:

import anthropic

class ClaudeModel:
    def __init__(self):
        self.client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY

    def complete(self, system: str, user: str) -> str:
        response = self.client.messages.create(
            model="claude-sonnet-5",  # check this ID against Anthropic's current model list
            max_tokens=1024,
            system=system,
            messages=[{"role": "user", "content": user}],
        )
        return response.content[0].text

Because the planner returns JSON, wrap the json.loads call in Orchestrator.run in a retry that feeds the parse error back to the model. With real models, malformed JSON is a when, not an if.
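One way such a retry could look is sketched below. The helper name plan_with_retry, the wording of the corrective prompt, and the FlakyModel test double are all assumptions made for illustration:

```python
import json


def plan_with_retry(model, system: str, goal: str, max_attempts: int = 3) -> dict:
    """Ask the planner for JSON; on a parse failure, re-prompt with the error."""
    prompt = goal
    last_error = None
    for _ in range(max_attempts):
        raw = model.complete(system, prompt)
        try:
            return json.loads(raw)
        except json.JSONDecodeError as err:
            last_error = err
            # Show the model both its own output and the parser's complaint.
            prompt = (
                f"{goal}\n\nYour previous reply was not valid JSON.\n"
                f"Parser error: {err}\nPrevious reply:\n{raw}\n"
                "Reply again with only the JSON object."
            )
    raise ValueError(
        f"planner returned invalid JSON {max_attempts} times"
    ) from last_error


class FlakyModel:
    """Test double: malformed JSON on the first call, valid JSON afterwards."""

    def __init__(self):
        self.prompts = []

    def complete(self, system: str, user: str) -> str:
        self.prompts.append(user)
        if len(self.prompts) == 1:
            return "Sure! Here is the plan: {subtasks: []}"
        return '{"subtasks": []}'


flaky = FlakyModel()
plan = plan_with_retry(flaky, "You are a planner.", "Explain MCP.")
print(plan, len(flaky.prompts))
```

In the harness, this would replace the two "Plan" lines in Orchestrator.run. Bounding the attempts matters: an unbounded retry against a model that never complies is an unbounded bill.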

The same swap works for any provider: the OpenAI and Gemini SDKs follow the same system + user → text shape, and any OpenAI-compatible endpoint, including a locally served model, needs only a different base_url on the client. The harness never knows the difference, and that indifference is worth preserving as you grow (see how to choose the model).
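As a rough sketch of that swap, the adapter below targets the OpenAI Python SDK's chat completions call. The class name OpenAICompatModel, the injectable client parameter, and the model name and URL in the comments are assumptions; the fake client at the bottom exists only so the adapter can be exercised without network access:

```python
from types import SimpleNamespace


class OpenAICompatModel:
    """Sketch of a complete() adapter for OpenAI-style chat endpoints."""

    def __init__(self, model_name: str, base_url=None, client=None):
        if client is None:
            # Imported lazily so the class can be tested without the SDK installed.
            from openai import OpenAI
            # base_url=None uses the default API; a local server might be
            # something like "http://localhost:8000/v1" (placeholder URL).
            client = OpenAI(base_url=base_url)  # reads OPENAI_API_KEY
        self.client = client
        self.model_name = model_name

    def complete(self, system: str, user: str) -> str:
        response = self.client.chat.completions.create(
            model=self.model_name,
            messages=[
                {"role": "system", "content": system},
                {"role": "user", "content": user},
            ],
        )
        return response.choices[0].message.content


def _fake_create(model, messages):
    """Mimics just enough of the response shape for an offline check."""
    text = f"{messages[0]['content']} | {messages[1]['content']}"
    message = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(message=message)])


fake_client = SimpleNamespace(
    chat=SimpleNamespace(completions=SimpleNamespace(create=_fake_create))
)
reply = OpenAICompatModel("placeholder-model", client=fake_client).complete("sys", "hi")
print(reply)
```

Passing the client in, rather than constructing it inside complete(), keeps the adapter testable in the same way MockModel keeps the harness testable.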

Step 6 — What the frameworks add (and when you need them)

Now that you’ve seen the bare pattern, framework features map cleanly onto it: parallel dispatch (run sub-agents concurrently), shared memory (a smarter context than our last-output-wins), typed handoffs (schemas instead of prose between agents), and retries/tracing around every complete() call. Adopt a framework when you need three or more of those — not before you understand what they’re wrapping.
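Of those features, parallel dispatch is the easiest to prototype by hand. The sketch below assumes the subtasks are independent of one another (no step needs another step's output) and that agent calls are I/O-bound, so threads are adequate. The names dispatch_parallel and EchoAgent are hypothetical:

```python
from concurrent.futures import ThreadPoolExecutor


def dispatch_parallel(agents: dict, steps: list, max_workers: int = 4) -> list:
    """Run independent subtasks concurrently; results come back in plan order."""

    def run_step(step):
        agent = agents.get(step["agent"])
        if agent is None:
            return f"[skipped: unknown agent {step['agent']!r}]"
        return agent.run(step["task"])

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order even when steps finish out of order.
        return list(pool.map(run_step, steps))


class EchoAgent:
    """Stand-in agent so the sketch runs without a model."""

    def __init__(self, name):
        self.name = name

    def run(self, task):
        return f"{self.name}: {task}"


results = dispatch_parallel(
    {"researcher": EchoAgent("researcher"), "writer": EchoAgent("writer")},
    [
        {"agent": "researcher", "task": "A"},
        {"agent": "writer", "task": "B"},
        {"agent": "critic", "task": "C"},
    ],
)
print(results)
```

The sequential loop in Step 4 exists because the writer depends on the researcher. Parallelism only pays off once the planner can mark which subtasks are independent, which is part of what the frameworks handle for you.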

Troubleshooting

json.JSONDecodeError when using a real model as planner

Real models wrap JSON in prose or markdown fences. Extract the first {...} block before parsing, and on failure, re-prompt with the error message and the malformed output. Two retries fix >95% of cases.
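A minimal sketch of the extraction step, using the standard library's json.JSONDecoder.raw_decode to parse from each "{" in turn until one succeeds. The function name extract_first_json_object is an assumption:

```python
import json


def extract_first_json_object(text: str) -> dict:
    """Return the first parseable {...} object in text, ignoring surrounding prose."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and
            # tolerates trailing text after it.
            value, _end = decoder.raw_decode(text, start)
            if isinstance(value, dict):
                return value
        except json.JSONDecodeError:
            pass  # this brace did not open valid JSON; try the next one
        start = text.find("{", start + 1)
    raise ValueError("no JSON object found in model reply")


fence = "`" * 3  # built at runtime to avoid a literal fence inside this example
reply = (
    "Here is the {rough} plan:\n"
    f"{fence}json\n"
    '{"subtasks": [{"agent": "writer", "task": "Summarize."}]}\n'
    f"{fence}\nLet me know if you want changes!"
)
plan = extract_first_json_object(reply)
print(plan["subtasks"][0]["agent"])
```

Unlike a regular expression over braces, this handles nested objects and braces inside strings, because the real JSON parser decides where the object ends.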

Sub-agents give great individual answers that don’t combine

Your context handoff is too lossy. Passing only the last output (as our minimal harness does) loses earlier results — accumulate a structured results list into each subsequent task prompt instead.
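A sketch of that accumulation, assuming each result is stored as a small dict with agent and output keys. The helper name build_task_prompt and the prompt layout are illustrative choices, not part of harness.py:

```python
def build_task_prompt(task: str, goal: str, history: list) -> str:
    """Compose a sub-agent prompt that carries every earlier result, not just the last."""
    lines = [task, "", f"Overall goal: {goal}"]
    if history:
        lines.append("")
        lines.append("Results so far:")
        for entry in history:
            lines.append(f"- [{entry['agent']}] {entry['output']}")
    return "\n".join(lines)


history = [
    {"agent": "researcher", "output": "MCP exposes tools, resources, and prompts."},
    {"agent": "critic", "output": "Keep each sentence under 20 words."},
]
prompt = build_task_prompt(
    "Explain them in one sentence each.",
    "Explain what an MCP server exposes.",
    history,
)
print(prompt)
```

In the dispatch loop, this would mean appending to history after each agent.run() instead of overwriting context, and building each prompt from the full list.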

The system works but costs exploded

Every hop re-sends context. Keep sub-agent prompts short, pass summaries rather than transcripts between agents, and measure tokens per goal — the FinOps piece in the Enterprise section covers the metering setup.
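For a first look at tokens per goal, a wrapper around the model is enough. The sketch below estimates tokens from character counts at an assumed four characters per token, which is a crude heuristic and not a real tokenizer. Provider responses usually report exact usage, which is preferable when available. MeteredModel and FixedReplyModel are hypothetical names:

```python
class MeteredModel:
    """Wraps any object with complete() and tallies calls and approximate tokens."""

    def __init__(self, inner, chars_per_token: float = 4.0):
        self.inner = inner
        self.chars_per_token = chars_per_token  # rough heuristic, an assumption
        self.calls = 0
        self.input_chars = 0
        self.output_chars = 0

    def complete(self, system: str, user: str) -> str:
        reply = self.inner.complete(system, user)
        self.calls += 1
        self.input_chars += len(system) + len(user)
        self.output_chars += len(reply)
        return reply

    def approx_tokens(self) -> int:
        total_chars = self.input_chars + self.output_chars
        return round(total_chars / self.chars_per_token)


class FixedReplyModel:
    """Stand-in model that always returns eight characters."""

    def complete(self, system: str, user: str) -> str:
        return "x" * 8


metered = MeteredModel(FixedReplyModel())
metered.complete("abcd", "efgh")
print(metered.calls, metered.approx_tokens())
```

Wrap the model once, hand the wrapper to both the orchestrator and the agents, and read the counters after each orchestrator.run() to see what a single goal costs.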