Human-in-the-loop patterns that scale: approval design for agent actions

“Human in the loop” is the most-cited and least-designed control in agentic AI. Done naively it degenerates fast: reviewers approve 200 items a day, attention drops to zero, and you’ve built a rubber stamp with an audit trail — the worst of both worlds, because now the human is accountable for approvals they never truly examined.

The failure mode to design against

Approval fatigue is not a discipline problem; it’s a base-rate problem. If 98% of agent actions are fine, a reviewer sees a real problem only about twice per hundred approvals, too rarely for any human to stay vigilant. The scarce resource is reviewer attention, and every pattern below is a way of spending it where it changes outcomes.
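
To make the base rate concrete, here is a quick arithmetic sketch using the two figures already mentioned (200 approvals a day, 98% of actions fine). Both numbers are illustrative, not measurements, and the variable names are made up for this example.

```python
# Illustrative figures taken from the text, not from real data.
approvals_per_day = 200
problem_rate = 0.02  # i.e. 98% of agent actions are fine

# Expected number of approvals per day that actually needed a human.
problems_per_day = approvals_per_day * problem_rate

# Approvals where the reviewer's click changed nothing.
routine_per_day = approvals_per_day - problems_per_day

# Average number of approvals a reviewer sees between two real problems.
mean_gap = 1 / problem_rate

print(f"real problems per day: {problems_per_day:.0f}")
print(f"routine approvals per day: {routine_per_day:.0f}")
print(f"approvals between problems, on average: {mean_gap:.0f}")
```

Under these assumptions the reviewer meets roughly four real problems a day, each buried behind about fifty routine clicks.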

Five patterns, in escalating trust

1. Pre-approval (act-with-permission). The agent proposes; nothing happens until a human confirms. Right for irreversible, material, outward-facing actions, and only those, or fatigue eats the control.
2. Batch review. The agent queues similar low-stakes actions; a human reviews the batch with sampling and spot-checks. Ten approvals become one decision about a pattern.
3. Post-hoc audit (act-then-review). The agent acts; a sampled percentage gets human review within an SLA. Right for reversible actions with good rollback; the sample rate is your tuning knob.
4. Exception-only escalation. The agent acts autonomously inside defined bounds (amount limits, confidence thresholds, allowlisted targets) and escalates only boundary-crossers. The bounds, not the individual actions, are what humans review and periodically re-approve.
5. Kill-switch supervision. Full autonomy plus monitored trajectories and a tested pause mechanism. Legitimate only where the blast radius of any single action is genuinely small.
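
Exception-only escalation is the pattern that most directly turns into code. The sketch below is one possible shape, assuming the three kinds of bounds named above (amount limit, confidence threshold, target allowlist); `Bounds`, `Action`, `boundary_violations`, `decide`, and the refund example are all hypothetical names, not part of any particular framework.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Bounds:
    """Human-approved limits for one action class (illustrative fields)."""
    max_amount: float
    min_confidence: float
    allowed_targets: FrozenSet[str]


@dataclass(frozen=True)
class Action:
    """A proposed agent action (hypothetical shape)."""
    amount: float
    confidence: float
    target: str


def boundary_violations(action: Action, bounds: Bounds) -> List[str]:
    """Return why an action falls outside the bounds; empty means in bounds."""
    reasons = []
    if action.amount > bounds.max_amount:
        reasons.append(f"amount {action.amount} exceeds limit {bounds.max_amount}")
    if action.confidence < bounds.min_confidence:
        reasons.append(f"confidence {action.confidence} below {bounds.min_confidence}")
    if action.target not in bounds.allowed_targets:
        reasons.append(f"target {action.target!r} not on allowlist")
    return reasons


def decide(action: Action, bounds: Bounds) -> str:
    """Act autonomously inside the bounds; escalate anything that crosses them."""
    return "escalate" if boundary_violations(action, bounds) else "act"


# Example bounds for a made-up "refund" action class.
refund_bounds = Bounds(max_amount=50.0, min_confidence=0.9,
                       allowed_targets=frozenset({"billing-api"}))

print(decide(Action(amount=20.0, confidence=0.95, target="billing-api"), refund_bounds))
print(decide(Action(amount=500.0, confidence=0.95, target="billing-api"), refund_bounds))
```

Keeping the bounds in a single frozen object fits the idea that humans review and re-approve the bounds rather than individual actions: the object is the thing that gets signed off.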

Choosing: the materiality matrix

Classify every tool the agent holds on two axes — reversibility and blast radius — and assign patterns accordingly:

	Small blast radius	Large blast radius
Reversible	Pattern 4–5 (autonomy in bounds)	Pattern 3 (act, audit sample)
Irreversible	Pattern 2–3 (batch or audit)	Pattern 1 (pre-approval, always)

The matrix lives at the tool layer, not the prompt: your MCP server classifies each tool call and routes it to the right approval channel. The agent doesn’t get to argue.
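
A minimal sketch of that routing, written as plain Python rather than against any real MCP SDK. The tool names and their classifications are invented for illustration. Where the matrix gives a range of patterns, this sketch picks the more conservative one, and it sends unclassified tools to pre-approval; both choices are assumptions, not something the matrix itself dictates.

```python
from enum import Enum


class Pattern(Enum):
    PRE_APPROVAL = 1
    BATCH_REVIEW = 2
    POST_HOC_AUDIT = 3
    EXCEPTION_ONLY = 4
    KILL_SWITCH = 5


# Key: (reversible, large_blast_radius). Where the matrix offers a range,
# the lower-numbered (more conservative) pattern is chosen here.
MATRIX = {
    (True, False): Pattern.EXCEPTION_ONLY,   # matrix cell: patterns 4-5
    (True, True): Pattern.POST_HOC_AUDIT,    # matrix cell: pattern 3
    (False, False): Pattern.BATCH_REVIEW,    # matrix cell: patterns 2-3
    (False, True): Pattern.PRE_APPROVAL,     # matrix cell: pattern 1
}

# Hypothetical tools and classifications; a real deployment would maintain
# this table under change control, outside the agent's reach.
TOOL_CLASSES = {
    "search_docs": (True, False),
    "update_crm_record": (True, True),
    "send_internal_reminder": (False, False),
    "send_customer_email": (False, True),
}


def route(tool_name: str) -> Pattern:
    """Pick the approval pattern for a tool call from its classification."""
    classification = TOOL_CLASSES.get(tool_name)
    if classification is None:
        # Fail closed: an unclassified tool gets the strictest pattern.
        return Pattern.PRE_APPROVAL
    return MATRIX[classification]


for name in ("search_docs", "send_customer_email", "brand_new_tool"):
    print(name, "->", route(name).name)
```

Because the lookup depends only on the tool name, nothing in the agent's prompt or output can change which channel a call lands in.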

Making approvals reviewable, not clickable

A pre-approval request that shows the action alone (“send this email? Y/N”) invites rubber-stamping. A reviewable request shows: what the agent wants to do, why (its stated reasoning), what it looked at (trajectory summary — especially whether untrusted content was involved), and what happens if wrong (reversibility statement). Ten seconds of context turns an approval from a reflex into a decision.
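
One way to enforce this is to make the four elements required fields, so that a bare "Y/N" request cannot be constructed at all. The following is a sketch under that assumption; `ApprovalRequest`, its field names, and `render` are hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ApprovalRequest:
    action: str                       # what the agent wants to do
    reasoning: str                    # why, in the agent's own words
    trajectory_summary: str           # what it looked at
    untrusted_content_involved: bool  # e.g. web pages, inbound email
    if_wrong: str                     # reversibility statement

    def __post_init__(self):
        # Reject requests with blank text fields instead of showing them.
        for f in fields(self):
            value = getattr(self, f.name)
            if isinstance(value, str) and not value.strip():
                raise ValueError(f"approval request is missing {f.name}")


def render(req: ApprovalRequest) -> str:
    """Format the request so the reviewer sees context before the question."""
    untrusted = "YES" if req.untrusted_content_involved else "no"
    return "\n".join([
        f"Action: {req.action}",
        f"Why: {req.reasoning}",
        f"Looked at: {req.trajectory_summary}",
        f"Untrusted content: {untrusted}",
        f"If wrong: {req.if_wrong}",
    ])


request = ApprovalRequest(
    action="Send renewal quote to customer",
    reasoning="Contract expires in 14 days and no quote has been sent",
    trajectory_summary="CRM record, last three emails in the thread",
    untrusted_content_involved=True,
    if_wrong="Email cannot be recalled; a correction would have to follow",
)
print(render(request))
```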

The metrics that tune the loop

Override rate: the share of agent-proposed actions that human reviewers reject. Near zero for months? The pattern is too tight; graduate that action class one level toward autonomy. High? The agent isn’t ready for this action class at all.
Time-to-approve: how long reviewers take to decide. A creeping increase means fatigue or unclear requests.
Sampled-audit disagreement: for patterns 3–5, how often post-hoc review disagrees with what the agent did. This is your early-warning drift signal.

Reviewing these quarterly and moving action classes between patterns — with sign-off — is what “human oversight” means operationally, rather than as a checkbox.
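
The three metrics above are cheap to compute from an approval log. The sketch below assumes such a log can be reduced to simple sequences. The function names and the `low` and `high` thresholds are placeholders to be tuned per action class, not recommended values, and the output is a suggestion for the quarterly review rather than an automatic change.

```python
from statistics import median
from typing import Sequence


def rate(flags: Sequence[bool]) -> float:
    """Fraction of True values; used for both override and disagreement rates."""
    return sum(flags) / len(flags) if flags else 0.0


def approval_time_drift(seconds: Sequence[float]) -> float:
    """Median time-to-approve in the later half divided by the earlier half.

    Values well above 1.0 suggest approvals are slowing down.
    """
    half = len(seconds) // 2
    if half == 0:
        return 1.0
    earlier = median(seconds[:half])
    later = median(seconds[half:])
    return later / earlier if earlier > 0 else 1.0


def recommend(override_rate: float, low: float = 0.01, high: float = 0.20) -> str:
    """Map an override rate to a suggestion for the human review meeting."""
    if override_rate <= low:
        return "graduate"  # pattern may be too tight for this action class
    if override_rate >= high:
        return "demote"    # agent may not be ready for this action class
    return "hold"


# Made-up quarter of data for one action class.
rejected = [False] * 200                  # True where a reviewer rejected
times = [20.0, 22.0, 21.0, 40.0, 44.0, 42.0]  # seconds per approval, in order
audit_disagreed = [False] * 45 + [True] * 5   # sampled post-hoc audits

print("override rate:", rate(rejected), "->", recommend(rate(rejected)))
print("time-to-approve drift:", approval_time_drift(times))
print("audit disagreement:", rate(audit_disagreed))
```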