MCP servers vs. direct CLI calls: when each wins

Every agent harness can shell out. So why wrap anything in an MCP server at all? Because the two approaches pay different costs in different places — and the break-even point is calculable, not a matter of taste.

The trade-off in one table

Dimension	Direct CLI	MCP server
Setup cost	Zero — the tool already exists	Write + maintain a server
Context cost	Model must know/discover flags	Schema advertised compactly, once
Error behavior	Parse stderr, guess exit codes	Structured errors
Reuse across agents	Re-prompt every harness	Register once per client
Access control	Whatever the shell allows	Choke point you own
Auditability	Shell history, if that	One log, structured

The token math (framework)

The comparison that matters is total tokens per successful task:

CLI path ≈ (discovery turns: --help output, man pages) + (attempt turns: malformed flags, stderr round-trips) + (output parsing overhead — CLI output is formatted for humans, so it’s verbose for models).
MCP path ≈ (tool schemas loaded into context every session) + (one structured call per use, high first-try success rate).

CLI costs are per-task and front-loaded in retries; MCP costs are per-session and front-loaded in schema overhead. So: infrequent, varied tasks favor CLI; repeated, structured tasks favor MCP. A tool called once a week doesn’t earn its schema slot in every session’s context. A tool called ten times per session pays for itself immediately.

Coming in an update: measured token counts for five representative tasks (git operations, database queries, file search, cloud CLI, ticket system) run both ways. The framework above is how to reason; the numbers will show where the break-even actually lands.

The decision rule

Choose direct CLI when: the task is exploratory or one-off; the tool’s interface is stable and famously well-known to models (git, grep); you’re prototyping and the server would be speculation.

Choose MCP when any of these is true:

The same operation happens more than a handful of times per session.
Multiple agents or teammates need the same access — write once, register everywhere.
The tool touches anything that needs auth, redaction, or an audit trail.
The raw CLI output is huge and you want the server to filter before it hits context.