MCP servers vs. direct CLI calls: when each wins
Agents can reach tools through an MCP server or by running CLI commands directly. A decision framework with the token economics of each approach.
Every agent harness can shell out. So why wrap anything in an MCP server at all? Because the two approaches pay different costs in different places — and the break-even point is calculable, not a matter of taste.
The trade-off in one table
| Dimension | Direct CLI | MCP server |
|---|---|---|
| Setup cost | Zero — the tool already exists | Write + maintain a server |
| Context cost | Model must know/discover flags | Schema advertised compactly, once |
| Error behavior | Parse stderr, guess exit codes | Structured errors |
| Reuse across agents | Re-prompt every harness | Register once per client |
| Access control | Whatever the shell allows | Choke point you own |
| Auditability | Shell history, if that | One log, structured |
The token math (framework)
The comparison that matters is total tokens per successful task:
- CLI path ≈ (discovery turns:
--helpoutput, man pages) + (attempt turns: malformed flags, stderr round-trips) + (output parsing overhead — CLI output is formatted for humans, so it’s verbose for models). - MCP path ≈ (tool schemas loaded into context every session) + (one structured call per use, high first-try success rate).
CLI costs are per-task and front-loaded in retries; MCP costs are per-session and front-loaded in schema overhead. So: infrequent, varied tasks favor CLI; repeated, structured tasks favor MCP. A tool called once a week doesn’t earn its schema slot in every session’s context. A tool called ten times per session pays for itself immediately.
Coming in an update: measured token counts for five representative tasks (git operations, database queries, file search, cloud CLI, ticket system) run both ways. The framework above is how to reason; the numbers will show where the break-even actually lands.
The decision rule
Choose direct CLI when: the task is exploratory or one-off; the tool’s interface is stable and famously well-known to models (git, grep); you’re prototyping and the server would be speculation.
Choose MCP when any of these is true:
- The same operation happens more than a handful of times per session.
- Multiple agents or teammates need the same access — write once, register everywhere.
- The tool touches anything that needs auth, redaction, or an audit trail.
- The raw CLI output is huge and you want the server to filter before it hits context.
Was this guide useful?
Thanks — noted. It shapes what gets written next.
newsletter
One practical agentic-AI guide in your inbox. No news, no hype.
Tutorials and decision frameworks as they ship. Unsubscribe anytime.