Human-in-the-loop patterns that scale: approval design for agent actions
Naive HITL either rubber-stamps everything or drowns reviewers. Five approval patterns, a materiality matrix for choosing, and the metrics that tell you when to loosen the loop.
“Human in the loop” is the most-cited and least-designed control in agentic AI. Done naively it degenerates fast: reviewers approve 200 items a day, attention drops to zero, and you’ve built a rubber stamp with an audit trail — the worst of both worlds, because now the human is accountable for approvals they never truly examined.
The failure mode to design against
Approval fatigue is not a discipline problem; it’s a base-rate problem. If 98% of agent actions are fine, a reviewer sees a real problem twice per hundred approvals — far below the vigilance threshold any human sustains. The scarce resource is reviewer attention, and every pattern below is a way of spending it where it changes outcomes.
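The base-rate arithmetic can be made concrete with a small sketch. It uses only the two figures already quoted in this guide (200 approvals a day, 98% of actions fine); the function names are illustrative, not part of any library.

```python
def expected_problems_per_day(approvals_per_day: int, fine_rate: float) -> float:
    """Expected number of genuinely problematic actions a reviewer sees per day."""
    return approvals_per_day * (1.0 - fine_rate)


def approvals_per_problem(fine_rate: float) -> float:
    """Average number of approvals a reviewer works through per real problem."""
    return 1.0 / (1.0 - fine_rate)


if __name__ == "__main__":
    # Figures taken from the text: 200 items a day, 98% of them fine.
    print(round(expected_problems_per_day(200, 0.98), 2))
    print(round(approvals_per_problem(0.98), 2))
```

At those rates a reviewer meets roughly four real problems a day, buried among about 196 routine approvals, which is why reducing what reaches the reviewer matters more than exhorting them to pay attention.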
Five patterns, in escalating trust
1. Pre-approval (act-with-permission). The agent proposes; nothing happens until a human confirms. Right for irreversible, material, outward-facing actions, and only those, or fatigue eats the control.
2. Batch review. The agent queues similar low-stakes actions; a human reviews the batch with sampling and spot-checks. Ten approvals become one decision about a pattern.
3. Post-hoc audit (act-then-review). The agent acts; a sampled percentage gets human review within an SLA. Right for reversible actions with good rollback; the sample rate is your tuning knob.
4. Exception-only escalation. The agent acts autonomously inside defined bounds (amount limits, confidence thresholds, allowlisted targets) and escalates only boundary-crossers. The bounds, not the individual actions, are what humans review and periodically re-approve.
5. Kill-switch supervision. Full autonomy plus monitored trajectories and a tested pause mechanism. Legitimate only where the blast radius of any single action is genuinely small.
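Two of these patterns reduce to small, testable predicates: the bounds check behind exception-only escalation and the sampling decision behind post-hoc audit. The following is a minimal sketch under stated assumptions: the `Bounds` fields mirror the three kinds of bound named above, but every name and value here is hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Bounds:
    """Human-approved limits for one action class (all fields are illustrative)."""
    max_amount: float
    min_confidence: float
    allowed_targets: frozenset


def needs_escalation(amount: float, confidence: float, target: str, bounds: Bounds) -> bool:
    """Exception-only escalation: escalate any action that crosses a bound."""
    return (
        amount > bounds.max_amount
        or confidence < bounds.min_confidence
        or target not in bounds.allowed_targets
    )


def selected_for_audit(sample_rate: float, rng: random.Random) -> bool:
    """Post-hoc audit: pick roughly `sample_rate` of completed actions for review."""
    return rng.random() < sample_rate


if __name__ == "__main__":
    bounds = Bounds(500.0, 0.9, frozenset({"billing@example.com"}))
    print(needs_escalation(120.0, 0.95, "billing@example.com", bounds))  # inside bounds
    print(needs_escalation(900.0, 0.95, "billing@example.com", bounds))  # over the amount limit
```

Keeping the bounds in a single frozen object gives reviewers one artifact to re-approve periodically, rather than a stream of individual actions.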
Choosing: the materiality matrix
Classify every tool the agent holds on two axes — reversibility and blast radius — and assign patterns accordingly:
| | Small blast radius | Large blast radius |
|---|---|---|
| Reversible | Pattern 4–5 (autonomy in bounds) | Pattern 3 (act, audit sample) |
| Irreversible | Pattern 2–3 (batch or audit) | Pattern 1 (pre-approval, always) |
The matrix lives at the tool layer, not the prompt: your MCP server classifies each tool call and routes it to the right approval channel. The agent doesn’t get to argue.
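One way to encode the matrix at the tool layer is a plain lookup keyed on the two axes. This is a sketch, not a real MCP server API: the tool names and their classifications are hypothetical, and sending unclassified tools to pre-approval is an assumption of this example rather than something the matrix itself specifies.

```python
from enum import Enum


class BlastRadius(Enum):
    SMALL = "small"
    LARGE = "large"


# (reversible, blast radius) -> permitted pattern numbers, copied from the matrix.
MATRIX = {
    (True, BlastRadius.SMALL): (4, 5),
    (True, BlastRadius.LARGE): (3,),
    (False, BlastRadius.SMALL): (2, 3),
    (False, BlastRadius.LARGE): (1,),
}

# Hypothetical tool classifications; a real deployment would maintain its own.
TOOL_CLASSES = {
    "add_crm_note": (True, BlastRadius.SMALL),
    "update_price_list": (True, BlastRadius.LARGE),
    "send_external_email": (False, BlastRadius.LARGE),
}


def allowed_patterns(tool_name: str) -> tuple:
    """Return the approval patterns permitted for a tool call."""
    classification = TOOL_CLASSES.get(tool_name)
    if classification is None:
        # Assumption: anything unclassified gets the strictest pattern.
        return (1,)
    return MATRIX[classification]


if __name__ == "__main__":
    for name in ("add_crm_note", "send_external_email", "unknown_tool"):
        print(name, allowed_patterns(name))
```

Because the lookup takes only the tool name, nothing the agent writes in its prompt or reasoning can change the route.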
Making approvals reviewable, not clickable
A pre-approval request that shows the action alone (“send this email? Y/N”) invites rubber-stamping. A reviewable request shows: what the agent wants to do, why (its stated reasoning), what it looked at (trajectory summary — especially whether untrusted content was involved), and what happens if wrong (reversibility statement). Ten seconds of context turns an approval from a reflex into a decision.
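The four elements of a reviewable request map naturally onto a small record type. The sketch below is illustrative only: the field names and the rendered layout are assumptions, not a prescribed schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ApprovalRequest:
    """Everything a reviewer needs in order to decide rather than click."""
    action: str                      # what the agent wants to do
    reasoning: str                   # why, in the agent's own stated terms
    trajectory_summary: str          # what it looked at
    touched_untrusted_content: bool  # whether untrusted content was involved
    if_wrong: str                    # reversibility statement

    def render(self) -> str:
        """Format the request as plain text for a review queue."""
        untrusted = "YES" if self.touched_untrusted_content else "no"
        return "\n".join([
            f"Action: {self.action}",
            f"Why: {self.reasoning}",
            f"Looked at: {self.trajectory_summary}",
            f"Untrusted content involved: {untrusted}",
            f"If wrong: {self.if_wrong}",
        ])


if __name__ == "__main__":
    request = ApprovalRequest(
        action="Send refund confirmation email to customer",
        reasoning="Ticket shows a duplicate charge",
        trajectory_summary="Read support ticket and billing record",
        touched_untrusted_content=True,
        if_wrong="Email cannot be recalled once sent",
    )
    print(request.render())
```

Making every field required means a request cannot be constructed without a reasoning or reversibility statement, so the bare "Y/N" prompt is ruled out by the type itself.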
The metrics that tune the loop
- Override rate: the share of agent-proposed actions that reviewers reject. Near zero for months? The pattern is too tight; graduate that action class one level. High? The agent isn’t ready for this action class at all.
- Time-to-approve: a creeping increase means fatigue or unclear requests.
- Sampled-audit disagreement: for patterns 3–5, how often post-hoc review disagrees with the agent. This is your early-warning drift signal.
Reviewing these quarterly and moving action classes between patterns — with sign-off — is what “human oversight” means operationally, rather than as a checkbox.
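A quarterly review of the override rate could start from something as simple as the sketch below. The two thresholds are placeholder assumptions standing in for "near zero" and "high"; each team would set its own, and the output is a recommendation for a human to sign off, not an automatic change.

```python
def override_rate(rejected: int, total: int) -> float:
    """Share of agent-proposed actions that reviewers rejected."""
    if total <= 0:
        raise ValueError("no reviewed actions in this period")
    return rejected / total


def recommendation(rate: float, loosen_below: float = 0.01, restrict_above: float = 0.20) -> str:
    """Suggest a direction for the action class; thresholds are illustrative."""
    if rate <= loosen_below:
        return "loosen"    # near zero: candidate to graduate one level
    if rate >= restrict_above:
        return "restrict"  # high: the agent is not ready for this action class
    return "hold"


if __name__ == "__main__":
    rate = override_rate(rejected=2, total=400)
    print(rate, recommendation(rate))
```

The same ratio works for sampled-audit disagreement: pass the number of audited actions where the reviewer disagreed and the number audited.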