MCP 还没做完,又要 CLI 了 — The Token-Budget Debate Behind AI Agent Interfaces

wy


Based on a Xiaohongshu post from 爱吉尼亚羊 titled "MCP 还没做完,又要 CLI 了" ("MCP isn't finished yet, and now it's CLI"). The author opens with a line that sets the tone:

不做猜测理论分享,全部来自于一线实战。
("Not speculation, not theory — everything here comes from first-line practice.")

Their setup: they had just finished building out an MCP integration at their company, and now their team is scrapping the Console they were about to ship in favor of a CLI. The author paused their own project to figure out whether it was worth continuing. The rest of the post is their three questions and three answers, written as an engineer-to-engineer field report rather than a trend piece.

This post walks through the author's three questions, adds the real numbers behind them, integrates the useful comments the post attracted, and ties the whole thing back to the four-post arc on agent architecture.


0. The Author's Three Questions

爱吉尼亚羊 frames the whole post around three questions they want to answer from first-hand experience:

  1. 为什么突然都在做这个事情? — Why is everyone suddenly doing this (moving from Console/MCP to CLI)?
  2. 这个 CLI 与 MCP 有什么关系,有 MCP 为什么还要 CLI? — What is CLI's relationship to MCP; if we have MCP, why do we still need CLI?
  3. 对前端/客户端工程师有什么影响? — What is the impact on front-end / client engineers?

I'll walk through each with the author's own answer first, then add my take.


1. The Meme: "MCP Is Dead, Long Live CLI"

The phrase propagated in 2026 Q1 after three things happened almost simultaneously:

  1. Anthropic's own team published benchmarks showing that letting Claude write programmatic tool calls (i.e. bash scripts that call APIs directly) produced 98.7% token savings vs. loading the full MCP tool catalog.
  2. Vercel's CEO rauchg tweeted "CLIs are the de-facto MCPs for agents" — essentially saying that a well-designed command-line tool provides everything MCP provides, minus the protocol overhead.
  3. OpenClaw's founder (Kaili Gu, who shortly after joined OpenAI) shipped a popular agent runtime that was built entirely on Skills + CLI, no MCP. It became one of the top-starred agent frameworks of the quarter.

Chinese tech Twitter (and Xiaohongshu, and Zhihu) reacted with some version of: "Wait, we just finished standing up the MCP server, do we now need to rewrite everything as CLI?" That's where the "MCP 还没做完,又要 CLI 了" title comes from — half exhausted, half sardonic.

The question that gets lost in the meme is: what specifically is wrong with MCP, and for whom?


2. What MCP Actually Costs

Let's put real numbers on this, because the debate mostly runs on vibes.

Token consumption at a glance

| Extension mechanism | Tokens paid on every turn | Notes |
| --- | --- | --- |
| GitHub MCP Server (90+ tools) | ~55,000 | JSON schemas for every tool, loaded on session init |
| Playwright MCP Server | ~16,000 (≈8% of 200K) | Smaller than GitHub but still non-trivial |
| Claude Code Skill catalog (10 skills × ~300 chars) | ~750 | Just the name/description list; bodies lazy-loaded |
| Per-skill body (loaded on use) | 500–10,000 | Paid only when Skill(skill="...") is called |
| Claude Code built-in tools (Bash, Read, Edit, Grep, Glob, Write, WebFetch, WebSearch, TodoWrite, Task) | ~5,000 | Full JSON schemas, always present |
| Single CLI tool call (e.g., gh pr create) | 200–500 | The model already knows gh; no schema needed |

A 200K-token context window is a lot. But:

  • GitHub MCP alone eats 27% of it before you start work.
  • Add Playwright MCP, a file-system MCP, a database MCP, and you're past 40% of context taken by tool schemas, leaving less room for the actual task.
  • Every one of those tokens is charged on every turn (modulo prompt caching).
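The arithmetic behind those bullets is simple enough to write out. The GitHub and Playwright figures come from the table above; the file-system and database server sizes are assumed figures for illustration only:

```python
# Context-budget arithmetic for MCP schema overhead. GitHub/Playwright sizes
# are from the table above; filesystem/database sizes are assumed examples.
CONTEXT_WINDOW = 200_000  # tokens

schemas = {
    "github_mcp": 55_000,      # ~90 tools, full JSON schemas
    "playwright_mcp": 16_000,
    "filesystem_mcp": 6_000,   # assumed figure for illustration
    "database_mcp": 8_000,     # assumed figure for illustration
}

github_share = schemas["github_mcp"] / CONTEXT_WINDOW   # 0.275 -> ~27.5%
total_share = sum(schemas.values()) / CONTEXT_WINDOW    # 0.425 -> past 40%

print(f"GitHub MCP alone: {github_share:.1%} of context")
print(f"All four servers: {total_share:.1%} of context")
```

With prompt caching the marginal cost per turn drops, but the schemas still occupy the window; the space is gone either way.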

The "98.7% savings" claim, demystified

Anthropic's own internal benchmark (loosely paraphrased from their team posts): a task that consumed ~78 KB of MCP-mediated tool schemas and intermediate results could instead be accomplished by writing a single bash script that calls the GitHub REST API via curl, consuming ~1 KB for the whole task. The model already knows curl; it already knows the GitHub API; it doesn't need a schema to be told about either.

The 98.7% number is cherry-picked for the specific case where the task is well-served by a CLI. It doesn't generalize. But it illustrates the direction: for tasks inside the model's pre-training distribution, the schema is redundant overhead.
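To make the pattern concrete (this is a sketch, not Anthropic's actual benchmark script): instead of 90 loaded schemas, the agent writes a few lines against an API it already knows from pre-training. The endpoint shape below is the public GitHub v3 issues API; the function names and owner/repo values are placeholders.

```python
# Sketch of a "programmatic tool call": the model writes this directly,
# because the GitHub REST API is in its weights -- no schema is loaded
# into context. Function names here are invented for illustration.
import json
import urllib.request

def issues_url(owner: str, repo: str, state: str = "open") -> str:
    # Public GitHub v3 endpoint; owner/repo are placeholders.
    return f"https://api.github.com/repos/{owner}/{repo}/issues?state={state}"

def open_issue_titles(owner: str, repo: str) -> list[str]:
    # One HTTP call replaces an MCP round-trip through a tool schema.
    with urllib.request.urlopen(issues_url(owner, repo)) as resp:
        return [issue["title"] for issue in json.load(resp)]
```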


2b. 思考 1 — Why Is Everyone Suddenly Doing This?

Here is 爱吉尼亚羊's own answer to their first question, paraphrased:

"This thing is like a wind that just blew in. Even DingTalk's CEO said at their recent keynote: break DingTalk apart, rewrite the underlying code, CLI-ify dozens of product lines across the business, and rebuild with AI. Let me share my personal understanding:

* The main reason is OpenClaw (龙虾 / "Lobster"). OpenClaw gave everyone a glimpse of what AGI might look like — agents can be deployed locally, can use various CLI tools the way a human does. If this is the future trend, then every SaaS / ToB / ToC product has to ship its own CLI. Future SaaS is not only human-facing but also agent-facing. No popup buttons needed. CLI + Skill is enough to cover the need.

* At the same time, CLI's progressive disclosure aligns with the current mainstream direction — it saves context and tokens."

This is a strong claim and worth sitting with: SaaS is going to have to ship CLIs because its users will increasingly be agents, not humans. The UI (buttons, forms, popup dialogs) has zero surface area for an agent; the CLI has 100%. If that prediction holds, a huge slice of existing SaaS product surface becomes dead weight, and a CLI story becomes a product requirement rather than a developer convenience.

The DingTalk example the author cites is concrete: an enterprise communications platform publicly committing to CLI-ify its entire product line because "AI agents should be able to drive it." If that becomes a pattern — Salesforce, Feishu, Notion, Airtable, Lark all shipping product-cli binaries — the question stops being "MCP or CLI?" and becomes "does your product have a CLI at all?"

The OpenClaw reference is the same one we saw in the Claude Code design philosophy post: Kaili Gu's agent runtime showed that a locally-deployed agent with CLI access + Skills could do real work without an MCP server in sight. The author is watching the same demonstration and drawing the same conclusion.


2c. 思考 2 — What Is CLI's Relationship to MCP?

The author's own answer (paraphrased):

"Both CLI and MCP can provide capabilities. MCP's advantages are standardization and better security. CLI's advantages are simplicity and progressive disclosure. Currently we see Google and other large enterprises ship both CLI and MCP. From what I'm observing: if the agent is deployed on the client side, it uses CLI; on the server side, it uses MCP. For now, the two standards coexist — it's not a replacement story."

This is the most useful sentence in the entire Xiaohongshu post, and it's the sentence I wish more of the "MCP is dead" discourse led with:

Client-side agents → CLI. Server-side agents → MCP. The two coexist.

The logic is clean:

  • A client-side agent runs on the user's machine. It has access to a shell, an OS, installed binaries, environment variables, a filesystem. Every CLI the user has installed is effectively free capability. MCP overhead is pure waste here — you're paying token tax for schemas that re-encode capability the shell already has.
  • A server-side agent runs in a datacenter, usually stateless, usually multi-tenant. It needs a standardized, auth-aware, auditable way to access tools. There's no "shell" in a meaningful sense; there's a Kubernetes cluster and a bunch of RPC endpoints. MCP's formality earns its overhead here — it gives you the governance/auth/observability story you need for production.

So the correct framing isn't "MCP vs. CLI" — it's "where does the agent live?" That determines which interface is appropriate.

This also explains why Anthropic (with Claude Code running on your laptop) leans CLI-heavy, while enterprise platforms building server-side agents are still investing in MCP. They're not disagreeing; they're optimizing for different deployment topologies.


2d. 思考 3 — What Is the Impact on Front-End / Client Engineers?

The author's own answer (paraphrased):

"This impact is significant. At least from my own experience — since I started using Cursor, I haven't needed to ask the front-end team to do anything. The future trend is definitely not going to need a lot of pages, client-side interactions, etc. You can't purely do front-end anymore. In the face of AI, technology doesn't have a moat anymore. (在AI面前技术已经没有护城河.)

Finally: although technology and products iterate every day, the big direction is clearly toward AGI.

Previously, my company had an internal agent-build platform — probably other companies had them too. Whatever the requirement, you'd build an agent. After a while there were hundreds of agents, hard to maintain, scattered everywhere. At the time I thought this form had problems. Looking back now, those built agents were essentially Skills. Combined with OpenClaw's form, it actually solves the earlier problem. I just need a general-purpose agent + Skills.

Recently I've been thinking: AI is developing toward AGI, technology's moat is shrinking — where is our opportunity? (AI往AGI方向发展,技术的护城河降低了,我们的机会在哪?)"

That last question is the one that gives the post its urgency. This isn't a neutral analysis of protocols — it's an engineer looking at their own career and trying to figure out what still matters.

The implicit argument, reading between the lines:

  1. "Agent build platforms" are dead — they were an attempt to encode capabilities as bespoke agents. In hindsight, those were all Skills, and Skills are 100× lighter to author and maintain.
  2. Front-end as a pure career is endangered — Cursor (and its successors) compress the iteration cost of UI work enough that an engineer who can also write prompts does front-end + back-end + ops without needing a specialist.
  3. The "moat" question is real — if your skill was "I know React really well," AI compresses that into a prompt. If your skill was "I know how to design a clean CLI for an agent to drive," that's the new specialty.

The future, per this analysis, belongs to people who can design the interface between products and agents — the CLI shape, the Skill content, the tool boundaries. That's a different job from classical application engineering.


2e. What the Comments Added

The comment thread surfaced several useful points beyond the post body:

Comment from "C" (03-19, Beijing, 65 likes):

"CLI 比 MCP 稳太多了,调试成本低不少 🥲"
("CLI is way more stable than MCP, debugging cost is much lower.")

This is the unglamorous truth nobody writes blog posts about: MCP servers crash, hang, version-drift, have flaky auth, and are hard to introspect. A CLI that fails tells you in stderr. An MCP server that fails returns a JSON-RPC error code you have to look up.

Reply from "时间漫游" (Time Drifter, 03-20, 13 likes):

"这俩没本质区别吧,cli 后边也不得 http 调用吗?"
("Aren't these two essentially the same? Doesn't CLI also have to make HTTP calls under the hood?")

This is the correct skeptic's response, and it deserves a direct answer: yes, both CLI and MCP eventually make HTTP calls — but the cost distribution is totally different. CLI hides the HTTP call behind a compiled binary with its own auth state; the model never sees the HTTP schema. MCP exposes the HTTP-ish surface as JSON-RPC tool schemas in the system prompt, and the model pays token cost for all of it every turn.

Same destination; very different bill.

Author's self-reply to C (03-19, Anhui):

"没错,缺点也有就是 cli 多了后维护也麻烦,我现在装了一堆。"
("True, but the downside is that once you have lots of CLIs installed, maintenance gets messy. I have a whole pile installed now.")

This is the honest cost of the CLI-first approach: it pushes complexity from protocol to installation. Every tool you want an agent to use has to be installed, versioned, updated, and configured on every machine the agent runs on. The author admits they've already accumulated "a whole pile" of CLIs — which means they've partially re-created the dependency-management problem that protocols were supposed to solve.

Reply from "gin" (03-21, Sichuan):

"替换的动力是 mcp 太消耗 context window,极大消耗 token."
("The motivation for replacing [MCP] is that MCP consumes too much context window and enormous amounts of tokens.")

gin gets to the bottom line in one sentence. The entire debate — stripped of the "AGI", "moat", and "protocol war" framing — is about who pays the token tax. MCP puts it on the model every turn. CLI puts it on installation once.


2f. Deeper into the Comment Threads — The Skeptic Camp

The post's 91-comment thread kept going. Beyond the first four, several replies push back on the "CLI wins" narrative much harder, and those pushbacks deserve a direct answer. This section walks through the substantive ones.

The pipe-composition point — Joe Black (03-21, 山东, 4 likes)

After 锦鲤而已 asks the obvious skeptical question "为啥 cli 不耗 token 呢?" ("Why doesn't CLI consume tokens?"), Joe Black answers with the single most technically important comment in the thread:

"cli 可以管道组合调用,结果不用传回来占用上下文空间"
("CLI can be composed via pipes; the intermediate results don't have to be returned to occupy context space.")

This is the part most English-language "CLI vs MCP" takes miss. The structural token difference isn't just "CLI has no schema, MCP has schema." It's also this:

| Workflow | MCP | CLI |
| --- | --- | --- |
| "Filter logs, count errors, find the worst file" | 3 tool calls, each result returned to the model, each inflating context | grep ERROR *.log \| sort \| uniq -c \| sort -rn \| head -1 — one call; only the final line returns |

With MCP, every intermediate result is a round-trip through the model. The model has to see the whole output, reason about it, decide what to call next, pass parameters. The context balloons with every step.

With a shell pipeline, intermediate results stay inside the shell. The 10,000-line grep output never enters context; only the final top-1 count does. For any multi-step pipeline, this compounds: 5 pipe stages = 5× less context pressure.
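A toy accounting model makes the compounding visible. The per-stage output sizes below are invented, but the shape is the point: per-step tool calls put every intermediate result into context, while a pipe surfaces only the last one.

```python
# Toy model of Joe Black's point. Token counts per stage are made up;
# what matters is which results ever enter the model's context.
step_outputs = [10_000, 2_000, 400, 50, 1]  # tokens produced at each stage

# MCP-style: each intermediate result round-trips through the model.
mcp_context_cost = sum(step_outputs)

# Pipeline-style: intermediates stay in the shell; only the final line returns.
pipeline_context_cost = step_outputs[-1]

print(mcp_context_cost, pipeline_context_cost)  # 12451 vs 1
```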

时间漫游's counter — "MCP could do that too"

Same user replies:

"本质是内部函数互调吧,mcp 也可以抽象一个执行 pipeline 函数出来,只需要一次调用就可以"
("Essentially this is internal function-calling, right? MCP could also abstract an execute-pipeline function — only one call needed.")

Technically correct, practically rare. Yes, you could write an MCP server that exposes a single exec_pipeline(stages: List[...]) meta-tool. But:

  1. Nobody ships MCP servers this way. In practice MCP catalogs are flat lists of single-purpose tools, because that's the design pattern the spec encourages and the tutorials demonstrate.
  2. Even if you did, you've essentially re-invented bash inside JSON-RPC — with worse debuggability, worse streaming, and none of the ecosystem of existing Unix tools.
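For concreteness, here is roughly what that hypothetical exec_pipeline meta-tool reduces to once you strip away the JSON-RPC wrapper. The name and shape are invented from 时间漫游's comment; this is not a real MCP server.

```python
# Sketch of the hypothetical exec_pipeline meta-tool: run a list of stages
# as a Unix-style pipeline, returning only the final stage's output.
import subprocess

def exec_pipeline(stages: list[list[str]]) -> str:
    data = b""
    for stage in stages:
        # Feed each stage the previous stage's stdout, like a shell pipe.
        result = subprocess.run(stage, input=data, capture_output=True, check=True)
        data = result.stdout
    return data.decode()

# e.g. exec_pipeline([["echo", "hi"], ["tr", "a-z", "A-Z"]]) -> "HI\n"
```

Which is sh -c re-implemented one subprocess at a time: point (2) above, in twelve lines.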

The "MCP could do X" line is a recurring pattern in this thread. It's always technically true and almost always practically irrelevant, because the question isn't what MCP is capable of, it's what the MCP ecosystem has actually converged on shipping.

胖虎's question (03-20, 北京) — "How does CLI do progressive disclosure?"

This is the sharpest question in the thread, because "progressive disclosure" was the author's second main argument for CLI.

The replies unpack it well:

无名无字 (03-20, 上海, 20 likes):

"披露的是触发 cli 的提示词而不是 cli 本身"
("What's disclosed is the prompt that triggers the CLI, not the CLI itself.")

一颗奇异豆 (03-21, 上海):

"llm 早已内置了无数的 cli 技能,所以不用详细描述"
("LLMs already have countless CLI skills built in, so you don't need to describe them in detail.")

锦鲤而已 (03-21, 美国, 3 likes):

"都能渐进的话,cli 为啥比 mcp 好?"
("If both can do progressive disclosure, why is CLI better than MCP?")

This is a genuinely important clarification. Progressive disclosure doesn't happen at the CLI level — it happens at the Skill level. The Skill catalog advertises a short description ("use this for git operations"); the Skill body, loaded on demand, says "run git log --oneline | grep ...." The CLI itself is never "disclosed" because it doesn't need to be — the model already knows it from pre-training.

So the correct answer to 锦鲤而已's challenge is: CLI is better than MCP for progressive disclosure not because CLI discloses progressively, but because CLI requires no disclosure at all. Pre-training makes the schema free.

For custom tools that the model doesn't know — your company's internal deployer or loganalyzer CLI — progressive disclosure has to come from somewhere. In the Claude Code model, that "somewhere" is the Skill body. In the MCP model, it's the JSON Schema shipped every turn. Same information, different billing.
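That "different billing" can be put into a toy cost model. All token figures below are assumed for illustration, and "deploytool" is a hypothetical internal CLI, not a real product:

```python
# Toy billing model for a custom tool the model has never seen.
# MCP pays the schema every turn; Skills pay a one-line catalog entry
# every turn and the body only on turns that actually use the tool.
SCHEMA_TOKENS = 1_200   # assumed size of the tool's JSON schema
CATALOG_TOKENS = 30     # "deploytool: deploys services; use for releases"
BODY_TOKENS = 800       # assumed Skill body, lazy-loaded on use

def mcp_cost(turns: int) -> int:
    return SCHEMA_TOKENS * turns

def skill_cost(turns: int, uses: int) -> int:
    return CATALOG_TOKENS * turns + BODY_TOKENS * uses

print(mcp_cost(20), skill_cost(20, 2))  # 24000 vs 2200
```

Same information reaches the model either way; the Skill scheme just defers the bulk of it to the turns that need it.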

卡佩推's enterprise architecture sketch (03-21, 上海)

This comment is worth reading carefully because it describes a real implementation pattern:

"在企业内部上传到统一的 npm 仓库,skill 中自动维护版本,在 ai agent 所在的机器上直接用命令行调用,不需要 http,能省去每次交互时 mcp 的冗余,理论上是一种比 mcp 更优的设计吧(仅限企业内部,更通用的环境不了解,不评价)"

("Inside the company, upload [CLIs] to a unified npm repository. Versions are auto-maintained in the Skill. On the machine where the AI agent runs, invoke directly via command line — no HTTP needed. This saves MCP's per-interaction overhead. Theoretically, it's a better design than MCP (for enterprise-internal only; general-purpose environments, I'm not commenting).")

This is a concrete, shippable architecture:

  1. npm (or equivalent) as the distribution layer — solves the "pile of CLIs" problem the author admitted to. Versions pinned, updates automatic.
  2. Skill pins the CLI version — the Skill body references my-corp-deployer@^3.2; when the Skill is loaded, the agent uses the right binary.
  3. Direct CLI invocation — no HTTP, no JSON-RPC, no auth middleware on the critical path.
  4. Honest scope qualification — explicitly says "enterprise internal; I don't know about general environments."

That last point is important and often missing from both sides of this debate. A design that works for 10 internal teams at one company is not necessarily a design that works for 1,000 third-party tool authors in a public marketplace. The author of 卡佩推's comment is careful about this. Most of the "MCP is dead" crowd isn't.
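One small piece of that architecture, the version pin in step 2, could be enforced with a check on the agent host before invoking the binary. This is a sketch using simplified npm caret semantics (same major, at least the pinned minor.patch; real npm semver has extra rules for 0.x majors):

```python
# Simplified npm-style caret-range check for a Skill's pinned CLI version,
# e.g. a Skill body referencing my-corp-deployer@^3.2.0 (hypothetical name).
def satisfies(version: str, caret_range: str) -> bool:
    want = tuple(int(x) for x in caret_range.lstrip("^").split("."))
    have = tuple(int(x) for x in version.split("."))
    # Caret: stay within the same major, at or above the pinned version.
    return have[0] == want[0] and have >= want
```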

小红薯's reality check (03-21, 四川)

Replying to Joe Black:

"cli 模型没有专门训练过,成功率存疑,而且 programing tool call 也可以编排 tool 调用。最大的疑问题是,一说就是这样节约 token,可是实际有没有 case 能作证这样效果确实好呢 🥲"

("CLI models haven't been specifically trained [for this use case], so success rate is questionable. And programmatic tool-calling can also orchestrate tool invocations. The biggest question: everyone says CLI saves tokens, but is there actually a published case proving it genuinely works better? 🥲")

This is the most important skeptic comment in the entire thread. The entire CLI-over-MCP narrative rests on two empirical claims:

  1. CLI consumes fewer tokens.
  2. CLI achieves equivalent or better task performance.

Claim (1) is easy to verify — count the schema bytes. Claim (2) is much harder. Anthropic's "98.7% token savings" number is from their own internal benchmark, not peer-reviewed. The Zhihu device-operations post I cited in §6 claims "equivalent outcomes," but their methodology isn't rigorous.

The honest answer is: most "CLI wins" evidence is anecdotal or from people with a stake in the narrative. 小红薯 is right to push on this. The field needs proper benchmarks — paired task sets, multiple models, measured token cost and accuracy — before anyone can claim CLI beats MCP on the merits rather than on aesthetics.

My own prior: for tasks the model was pre-trained on (git, curl, sqlite3, docker), CLI probably wins on both axes. For tasks with novel tools (your company's proprietary inventory API), the picture is less clear — the model doesn't know your CLI either, so it has to be told somehow, and Skill-body vs. MCP-schema becomes a much closer call.

The "old wine in new bottles" camp

Three comments run the strongest skepticism:

maksim0930 (03-22, 浙江):

"Skill 也能做到串一堆 mcp,如果觉得 mcp 占 context window,完全可以用 scripts 代替,scripts 想怎么写怎么写,然后将最终的结果给到 agent 即可。那是不是可以理解这个 scripts 就是 cli?如果是,那这俩有什么区别?只不过就是大厂又没事了,硬凹出来一个新名词。🥲"

("Skill can chain a bunch of MCPs too. If you think MCP eats context, just replace it with scripts — write whatever you want, hand the final result to the agent. So isn't 'scripts' just CLI? If yes, what's the actual difference between these two? This is just big tech having nothing better to do and manufacturing a new term. 🥲")

maksim0930 (again, replying to 卡佩推):

"和 http 没关系。mcp 也能做到只用本地命令。个人观点,就是新提个概念,旧酒装新瓶?"
("Has nothing to do with HTTP. MCP can also work with purely local commands. Personal view: it's just a new concept, old wine in new bottles.")

离我沙发远点 (6天前, 北京):

"本质说是省 token,规范设计的 mcp 其实也能省,所以说到底是生态之争"
("At its core, this is about saving tokens. A well-designed MCP can also save tokens. So ultimately it's an ecosystem war.")

包青蛙 (03-21, 重庆):

"那一大堆 cli 和一大堆 mcp 有啥区别,都是只披露描述,可能 cli 功能聚合一些?"
("What's the actual difference between a pile of CLIs and a pile of MCPs? Both just disclose descriptions. Maybe CLIs aggregate functionality a bit more?")

This camp deserves to be taken seriously, because they're not wrong. Let me steelman their argument:

  1. Yes, you can build a Skill that just shells out to scripts. At that point "CLI" vs "ad-hoc script" is purely a naming convention.
  2. Yes, MCP servers can also run on stdio (local IO mode), no HTTP required. Emyoya's comment elsewhere in the thread makes this point directly: "mcp 本身就提供了本地 io 调用模式" ("MCP itself already provides a local-IO invocation mode").
  3. Yes, a carefully designed MCP server could be as token-efficient as a CLI. If you ship one meta-tool instead of 90 tool schemas, your context cost collapses.
  4. Yes, the real contest is ecosystem control, not raw technical capability. Whoever controls the "right" agent interface captures the distribution layer for tool authors.

So where does the substantive difference actually live? In my read:

| Dimension | MCP | CLI |
| --- | --- | --- |
| Authoring cost | Write a server (Python/TypeScript), run it, handle versioning | Write a script, put it in $PATH |
| Model familiarity | Zero (has to be told) | High (pre-training) |
| Default packaging | JSON-RPC over stdio or HTTP | Binary in $PATH |
| Ecosystem culture | Catalog of tools, formal schemas | Unix toolbox, --help, man pages |
| Capability ceiling | Equal in theory | Equal in theory |

The last row is where the skeptics are right: in theory, these are isomorphic. In practice, they represent different cultures and different economic structures. MCP invites you to write formal schemas, run servers, think about auth layers. CLI invites you to write scripts, expose them, move on.

离我沙发远点's "ecosystem war" framing is probably the most accurate single-sentence diagnosis of the whole debate. It's not about which protocol is better; it's about whether the agent ecosystem will consolidate around something that looks more like the npm/pypi model (protocol-mediated, central registry) or the bash/GNU coreutils model (distributed, OS-native, convention-over-standard).

MCP's irreplaceable scenarios

A separate mini-thread argues the other direction:

Leisniy (03-23, 上海):

"看具体场景,有些时候必须 MCP"
("Depends on the scenario. Sometimes MCP is mandatory.")

C (03-24, 北京):

"对 MCP 确实有不可替代的场景,比如需要实时读取外部数据源或者跨系统调度的时候。不过对大部分老板来说,先把简单的规则型流程自动化跑通再考虑 MCP,ROI 更高"

("Yes, MCP does have irreplaceable scenarios — for instance, when you need real-time reading of external data sources or cross-system orchestration. But for most bosses, automating the simple rule-based processes first and then considering MCP gives higher ROI.")

These are the right scenarios to reach for MCP:

  • Real-time data sources where you need a long-running stateful connection (database cursors, streaming APIs, subscription-based data feeds). A one-shot CLI call doesn't fit.
  • Cross-system orchestration with shared auth context across many backends. The governance layer needs to live somewhere, and a central MCP broker is a natural place.
  • Multi-tenant / user-facing agent platforms where you can't assume the agent runs on a machine you control.

C's "ROI" comment is also worth internalizing: even in companies where MCP would eventually be the right answer, it's rarely the right first answer. Automate the rule-based flows with Skills + CLI, then reach for MCP when you actually hit the wall that MCP solves.


2g. What This All Adds Up To

If I synthesize the 91-comment thread into three claims:

  1. The pipe-composition advantage is real and structural. (Joe Black) This isn't just aesthetic — the token difference is non-trivial for multi-step workflows and compounds with pipeline length.
  2. Progressive disclosure happens at the Skill layer, not the CLI layer. (无名无字, 一颗奇异豆) The CLI is never disclosed because pre-training makes it free; Skills disclose the procedural wrapper that tells the model when to reach for which CLI.
  3. The real battle is ecosystem, not protocol. (离我沙发远点, maksim0930) MCP and CLI have equal theoretical expressive power. What differs is the culture of tool authorship and the economics of distribution. Whoever captures the authoring pipeline captures the market.

And one honest question from 小红薯 that the field should answer before the meme cycle moves on: where are the benchmarks? Token-count comparisons are easy; end-to-end task performance is hard; and the aesthetic arguments on both sides have outrun the evidence.


3. Why CLI Wins for the Common Case

Four structural advantages:

(a) Pre-training distribution

Claude has seen approximately all of GitHub, all of Stack Overflow, all of man pages, all of --help output ever written. When you tell it "use git log to find the commit where this bug appeared," it just does it. No schema required. The model has the schema in its weights.

Contrast with an MCP tool that wraps the same git operation: the model needs the schema loaded into context every turn to remember that the tool exists, what its parameter names are, what it returns.

(b) Composability (Unix pipes)

CLI tools compose natively through pipes and substitution:

```shell
git log --oneline | grep 'bug' | head -5 | awk '{print $1}' | xargs -I {} git show {}
```

An MCP-tool equivalent requires either (a) one mega-tool that does all of this, (b) the agent to make five separate MCP tool calls with intermediate state, or (c) the agent to write a helper script that chains them — at which point you're just writing shell anyway.

Unix compositionality doesn't take up any extra context tokens because the model learned piping during pre-training.

(c) Transparency and debuggability

A CLI tool's behavior is: you can see the exact command the model executed. You can re-run it yourself. You can set -x it, strace it, tee its output.

An MCP tool call is a JSON-RPC message between two opaque processes. You need MCP-specific tracing, and when it fails you're debugging both the agent and the server.

(d) Mature authentication

CLI tools inherit decades of accumulated auth machinery: SSH keys, OAuth tokens, keyring integration, ~/.netrc, ~/.aws/credentials, GITHUB_TOKEN env var conventions. The model knows these conventions because it learned them.

MCP's auth story is newer, less standardized, and varies per server implementation. In 2026 you're still reading blog posts titled "How to properly auth your MCP server."


4. Why MCP Is Not Actually Dead

The "MCP is dead" meme is propagating mostly among solo developers and small teams — the group for whom CLI's advantages are cleanest. For larger organizations, the picture inverts.

The team scale threshold

A key insight from the Tencent News article on this debate: the MCP-vs-CLI argument is really an argument about organizational scale. As soon as you're building a system that:

  • Multiple teams depend on
  • Multiple tech stacks access
  • Needs audit trails
  • Needs fine-grained permissions
  • Needs centralized credentials management
  • Needs SLAs on tool availability

... MCP is actually better positioned than CLI, because:

  • MCP's HTTP-based deployment model lets you run the server centrally (behind load balancers, with monitoring)
  • MCP has a natural place to attach auth/authz/logging middleware
  • MCP's schema serves as a contract between agent versions and tool versions
  • MCP's tool namespacing (mcp__server__tool) prevents collisions when multiple teams add tools

The CLI story for the same requirements is: "just ship binaries and config files." That works at 1 developer, gets painful at 10, and is untenable at 100.

Enterprise middleware is where MCP probably settles

The most likely endgame, per most analyses:

| Context | Winning interface |
| --- | --- |
| Solo developer, prototype, disposable project | CLI + maybe Skills |
| Small team, production agent | CLI + Skills, with occasional MCP for hard-to-integrate services |
| Enterprise platform, multi-team, audited | MCP as the standardized boundary; CLI behind it |
| Agent marketplace / third-party integrations | MCP (a vendor-neutral protocol is the whole point) |

So MCP doesn't die — it descends a layer, from "direct agent tool" to "middleware for enterprise integration."


5. The Three-Way Comparison: MCP vs CLI vs Skills

This is what the Xiaohongshu post is really gesturing at. In 2025 the choice was MCP vs. nothing. In 2026 it's a three-way: MCP, CLI, or Skills — and the answer depends on what exactly you're trying to do.

| Dimension | MCP | CLI | Skills |
| --- | --- | --- | --- |
| What it is | Protocol for AI ↔ tool integration | Existing command-line programs the model already knows | Markdown file advertising a procedural capability |
| Primary audience | AI systems + tool servers | Developers (historically); now both | AI agents (via the Skill dispatcher tool) |
| Token cost at rest | Full schema (5K–55K per server) | Built-in tools only (~5K) | Tiny (~100 tokens per skill description) |
| Token cost when used | Same as at rest | 200–500 tokens per invocation | Skill body loaded lazily (500–10K) |
| Progressive disclosure | No (schemas always present) | N/A (no schema) | Yes (catalog short, body lazy) |
| Composability | Via agent orchestration | Via Unix pipes (native) | Via agent following markdown instructions |
| Auth / logging | Centralized possible | Per-tool (diverse) | Inherits from underlying tools |
| Right for | Enterprise integration, external APIs, multi-team | Most developer tasks | Procedural knowledge ("how to do X") |

The key insight: they're solving different problems

Notice from the table that these three mechanisms aren't really substitutes for each other — they operate on different layers:

  • MCP standardizes the connection to external systems.
  • CLI is the execution layer — what the model types into the terminal.
  • Skills encode procedural knowledge — the methodology for doing a task.

The apparent substitution ("use CLI instead of MCP") only works because for most developer tasks, the external system (GitHub, the filesystem, Docker) already has a CLI — so the connection layer is unnecessary, and the procedural knowledge can live in a Skill body that tells the model which CLI commands to use.

A cleaner way to think about it:

MCP = USB driver (protocol for connecting to a device)

CLI = the device itself (the thing you actually operate)

Skills = the user manual (the procedural knowledge for operating it correctly)

You don't need the driver if your device already exposes a standard interface the host speaks natively.


6. A Real Benchmark: Device Operations Scenario

One of the most interesting Chinese tech blog pieces (from Zhihu) ran a real comparison: a device operations scenario where an AI agent has to diagnose a server problem, identify anomalies, and recommend spare parts.

The author implemented the same workflow three ways:

  • CLI — shell scripts and standard tools
  • MCP — a custom MCP server wrapping diagnostic APIs
  • Skills — a markdown file describing the diagnostic methodology

Results

All three methods produced equivalent outcomes:

  • Same 6 anomalies detected
  • Same spare parts recommendations
  • Same time-to-diagnosis (within noise)

Differences were entirely in cost and developer experience:

| Approach | Token cost per session | Build time | Iteration cost |
| --- | --- | --- | --- |
| CLI (shell + standard tools) | Lowest | 0 (tools exist) | Low (edit scripts) |
| MCP (custom server) | Highest (schema overhead) | 1 week (server dev) | High (server redeploy) |
| Skills (markdown) | Low–medium | 1 day (write markdown) | Lowest (edit markdown) |

The blog's conclusion: the three converge on the same answer because the underlying reasoning happens in the LLM's head, not in the protocol. The protocol just changes the cost structure and the developer experience.
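
That cost structure is easy to sanity-check with back-of-envelope arithmetic. All constants below are assumptions chosen to be plausible, not measurements from the benchmark — the point is the shape of the totals, not the exact values:

```shell
# Illustrative token arithmetic only; every constant is an assumption.
SCHEMA=600   # assumed tokens per MCP tool schema, loaded every session
CATALOG=60   # assumed tokens per skill catalog entry, loaded every session
BODY=2000    # assumed tokens for one lazily loaded skill body
CALL=300     # assumed tokens per tool/CLI invocation round trip
N=20         # tools available
USED=2       # tools actually invoked this session

mcp=$(( N * SCHEMA + USED * CALL ))            # pay for every schema up front
cli=$(( USED * CALL ))                         # pay only for what runs
skills=$(( N * CATALOG + BODY + USED * CALL )) # short catalog + one lazy body
echo "mcp=$mcp cli=$cli skills=$skills"
# prints: mcp=12600 cli=600 skills=3800
```

Under these assumptions the ordering (CLI lowest, Skills low–medium, MCP highest) matches the table, and it stays stable under fairly large changes to the constants — because only MCP pays for every tool whether or not it is used.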

That aligns with what we saw in Claude Code's design philosophy — the protocol is a decision surface for the model, and minimizing that surface (via progressive disclosure) is usually the right call.


7. Connecting to the Trilogy

This debate is the natural closing frame for the four-post arc we've been writing:

| Post | Main idea | Connection |
| --- | --- | --- |
| Claude Code Skills mechanism | Skills = Function Calling + system prompt | The wire-level proof that Skills aren't magic |
| Design philosophy | Progressive disclosure, cognitive budget, protocol standardization | The "why" behind the wire format |
| QED multi-agent pipeline | CLI subprocesses + structured prose do the work; no framework | An independent validation of the "ship shell pipes, not frameworks" bet |
| AReaL async RL | "Just replace base_url" — standardize the protocol, not the framework | The training-layer version of the same principle |
| This post | MCP vs CLI is a debate about where you pay the token tax | The industry-level view of the same architectural question |

The thread that runs through all of them: the right abstraction layer for AI agents is the one that minimizes the decision surface the model sees while maximizing composability with the ecosystem that already exists.

  • Claude Code Skills got this right: lazy-loaded markdown bodies (minimize surface) + OpenAI tool-use protocol (compose with the ecosystem).
  • QED got this right: one dispatcher subprocess per vendor (minimize surface) + CLIs that already exist (compose with the ecosystem).
  • AReaL got this right: OpenAI-compatible proxy (minimize surface) + existing agent frameworks unchanged (compose with the ecosystem).
  • CLI-over-MCP gets this right: no custom schema per tool (minimize surface) + bash + Unix tools (compose with literally 40 years of ecosystem).

Where MCP still fits

MCP gets this right in enterprise contexts — where the decision surface isn't any single task's schema, but the longer-term governance surface of having 50 teams ship tools consistently. In that scope, "write docs for your custom schema" is smaller than "every team invents their own deployment story for their custom CLI tool."

The mistake wasn't MCP. The mistake was treating MCP as the universal answer when most actual agent work fits comfortably on top of existing CLIs.


8. My Take

If I'm building for myself or a small team in 2026:

  • Start with CLI. Most things I want to do (git, curl, sqlite3, docker, kubectl, the filesystem) have excellent CLIs that the model already knows.
  • Add Skills for procedural knowledge. When my task has specific methodology ("to deploy, first run tests, then bump version, then tag, then push"), that's a skill body. ~300 chars of description + 2000 chars of body > 20 MCP tools.
  • Reach for MCP only when:
    • The target system doesn't have a CLI (or the CLI is too weak)
    • I need centralized auth/logging across many agents
    • I'm building something multiple teams will use
    • The tool really benefits from a schema (e.g., complex structured outputs)

If I'm building enterprise infrastructure:

  • MCP is still the right answer for the integration boundary. Don't let the 2026 meme cycle convince you to rewrite everything as CLI — you'll regret it when you need audit trails and SOC2 compliance.
  • CLI + Skills are the right answer for everything behind that boundary. Let each team ship their internal tooling as CLIs + Skills; have the MCP server wrap the external contract.

The meta-lesson

Every 6–12 months the agent ecosystem produces a new "protocol/framework/paradigm" and the discourse flips from "this is the future" to "actually this was a mistake." MCP in late 2024 → CLI in early 2026 is just the latest cycle.

The robust strategy, for both frameworks and personal tech choices, is:

  1. Bet on what pre-dates LLMs — Unix pipes, HTTP, OpenAI-compatible APIs, markdown. These survived because they compose well; they'll outlive every agent framework.
  2. Minimize the "new surface" you're adding. Every protocol layer, every custom schema, every framework-specific DSL is a future migration cost.
  3. Watch what the largest labs are doing, not what they're saying. Anthropic saying "MCP is great" while Claude Code's system prompt reveals they use it sparingly internally is the signal.
  4. Remember that the winning interface is usually the one the LLM was trained on. In 2024 that was Python + bash. In 2026 it's still Python + bash. In 2028 it'll probably still be Python + bash.

MCP isn't dead. But it's not the center of gravity anymore, and it probably shouldn't have been sold as such. The Xiaohongshu title captures the mood: we finished building the thing we were told was essential, and now we're being told it's optional. The right response isn't to feel tricked — it's to notice the pattern, and for the next wave, wait to see what survives.


9. Back to the Author's Closing Question

爱吉尼亚羊's post ends with a question, not an answer:

AI 往 AGI 方向发展,技术的护城河降低了,我们的机会在哪?
("AI is developing toward AGI, technology's moat is shrinking — where is our opportunity?")

That question isn't rhetorical. It's genuinely open, and it's the one the whole Chinese engineering community is sitting with.

My honest attempt at an answer, consistent with everything we've seen across the five-post arc:

The moat has moved. It used to be "I know framework X really well." That's compressible by AI. It's no longer a moat.

The new moats, as far as I can see them:

  1. Interface design. The person who can specify a CLI, a Skill catalog, a tool boundary, a schema — the person who understands what surface agents should see and at what cost — is genuinely rare and genuinely valuable. This is part product, part systems, part prompt engineering, and it doesn't have an established job title yet.
  2. Evaluation taste. Telling whether an agent's output is right, fast. Watching logs, catching hallucinated citations, noticing when the model silently weakened the problem (as the QED post elaborated). This is a craft skill that's hard to automate because it requires strong priors about what correct looks like.
  3. Deployment & ops at the agent boundary. Running agent workloads reliably — token accounting, prompt-cache management, rate-limit handling, multi-model failover, resume-ability, audit trails. This is CS fundamentals applied to a new problem domain, and the demand is only going up.
  4. Domain expertise that agents still need to be told about. The narrower and more proprietary the domain, the more valuable the person who can write the Skill body. A general agent doesn't know your firm's compliance rules. A good Skill author does.
  5. Taste for what to build at all. Agents make building 10× faster, so the question "what is worth building?" gets exponentially more leverage. The person who picks well spends their speed advantage on meaningful work; everyone else just ships more garbage faster.

If you're a front-end engineer reading the author's post in panic: you weren't paid for JSX, you were paid for "making the human–product interface work." That skill has a direct translation: make the agent–product interface work. The artifacts change (CLI instead of React, Skill body instead of UI copy, audit-trail UX instead of user-facing UX), but the core competency — taste about what a user needs and how to give it to them — is the same. The user just went from carbon to silicon.


References and Further Reading

  • Title: MCP 还没做完,又要 CLI 了 — The Token-Budget Debate Behind AI Agent Interfaces
  • Author: wy
  • Created at: 2026-04-22 19:30:00
  • Updated at: 2026-04-22 16:05:03
  • Link: https://yue-ruby-w.site/2026/04/22/MCP-vs-CLI-The-Interface-Debate/
  • License: This work is licensed under CC BY-NC-SA 4.0.