How Claude Code Skills Work Under the Hood

wy Lv3

抓包 Claude Code —— Skill 的实现原理

Source: Xiaohongshu post by 程序员小山与Bug

Core question: How are Claude Code Skills actually implemented under the hood?

Opening question

"Skill 怎么实现的?" — The author's verdict, spoilered up front: Skills are just Function Calling + a big system-prompt injection. No magic protocol — just tool-use and text.


1. Setup — MITM the Claude Code CLI with Charles

The experiment is a classic packet-capture study:

  1. Open Charles 5.0.3 on localhost port 8888
  2. Edit ~/.claude/settings.json (the CC user settings file)
  3. Export proxy env vars so Claude Code's outgoing HTTPS is forced through Charles:
1
2
export HTTP_PROXY=http://127.0.0.1:8888
export HTTPS_PROXY=http://127.0.0.1:8888

Proxy env vars

Charles is launched and ready to record.

Charles idle

When the author runs claude and the CLI connects out, Charles immediately shows a CONNECT to api.anthropic.com (TLS 1.3, HTTP/1.1, ~570 ms, "SSL Proxying not enabled for this host"). So at minimum the destination is confirmed.

First CONNECT hit

In this particular demo the author is not actually talking to api.anthropic.com for chat — the CLI has been configured via the Doubao provider so the real traffic goes to ark.cn-beijing.volces.com (ByteDance Volcano Engine's Anthropic-compatible endpoint, model doubao-seed-code-preview-251028). This is what lets him read the request body in the clear.

Claude Code launched

Claude Code v2.1.7 boots up showing model doubao-seed-code-preview-251028 · API Usage Billing — same UX as normal CC, just a different backend.


2. Trigger a Request & Capture It

Inside Claude Code the author types the simplest possible prompt — 你好 ("hi"). Charles captures two POSTs to:

1
POST http://ark.cn-beijing.volces.com:80/api/compatible/v1/messages?beta=true

Sizes: ~77.89 KB request for a one-word prompt. That size is the whole point of the video.

Charles captures the POST


3. Open the Request Body in a JSON Viewer

He copies the raw body and pastes it into his own tiny web tool https://sunzsh.github.io/json/?clipboard (reads JSON from clipboard and pretty-prints it). Top level keys:

1
2
3
4
5
6
{
"model": "doubao-seed-code-preview-251028",
"messages": [ ... ],
"system": [ ... ],
"tools": [ ... ]
}

Request top-level

Three blocks matter: messages, system, tools. The user's "你好" is a tiny sliver at the bottom of messages. Everything else is framework scaffolding.


4. The tools Array — Every Claude Code Tool, Schematized

Expanding "tools": [ ... ] reveals the full CC toolbelt as OpenAI-style JSON-Schema function definitions. The first one is Task:

tools[0] = Task

1
2
3
4
5
6
7
8
9
name: "Task"
description: "Launch a new agent to handle complex, multi-step tasks autonomously.
Each agent type has specific capabilities and tools available to it:
- Bash: Command execution specialist for running bash commands...
- general-purpose: General-purpose agent for researching complex questions...
- statusline-setup: configure the user's Claude Code status line...
- claude-code-guide: quickly find files by patterns...
- Plan: Software architect agent for designing implementation plans...
..."

Scrolling the list yields the familiar CC tools, each with a multi-hundred-token description:

More tools

  • Task (sub-agent launcher)
  • Bash, Read, Edit, Write, Grep, Glob
  • WebFetch, WebSearch
  • TodoWrite, NotebookEdit
  • ExitPlanMode

TodoWrite description

ExitPlanMode description

Each schema is pure JSON-Schema (required, additionalProperties: false, nested objects, enums) — exactly what the Anthropic Messages API expects under tools. Nothing here is Skill-specific. These are the base-layer tools.


5. The system Block — Where Skills Actually Live

This is the punchline of the video. The author scrolls into the system-prompt text and highlights a section (he pastes it into Google Translate to read the English more carefully).

System prompt: skill rules

Reconstructed, the system prompt contains a block roughly like:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
How to invoke:
- Use this tool with the skill name and optional arguments.
Examples:
- skill: "pdf" - invoke the pdf skill
- skill: "commit", args: "-m 'Fix bug'" - invoke with arguments
- skill: "review-pr", args: "123" - invoke with arguments
- skill: "ms-office-suite:pdf" - invoke using fully qualified name

Important:
- When a skill is relevant, you must invoke this tool IMMEDIATELY as your first action
- NEVER just announce or mention a skill in your text response without actually calling this tool
- This is a BLOCKING REQUIREMENT: invoke the relevant Skill tool BEFORE generating any other response about the task
- Only use skills listed in "Available skills" below
- Do not invoke a skill that is already running
- Do not use this tool for built-in CLI commands (like /help, /clear, etc.)
- If you see a <command-name> tag in the current conversation turn,
the skill has ALREADY been loaded and its instructions follow in the next message.
Do NOT call this tool — just follow the skill instructions directly.

Available skills:
- xlsx: Comprehensive spreadsheet creation, editing, and analysis...
- pdf: Comprehensive PDF manipulation toolkit...
- skill-creator: Guide for creating effective skills...
- pptx: Presentation creation, editing, and analysis...
- mcp-builder: Guide for creating high-quality MCP servers...
- explaining-code: Explains code with visual diagrams and analogies...
- docx: Comprehensive document creation, editing, and analysis...

Bilingual view

What This Tells Us

Observation Detail
Skills are entries in the system prompt No separate "skill channel" — only name: description pairs
Invocation is a Function Call A Skill tool is registered in tools with one skill string param
Gating is done by English prose "You MUST invoke IMMEDIATELY", "BLOCKING REQUIREMENT", "NEVER just announce" — prose enforcement
Slash command precedence <command-name> tag tells model "already loaded, follow the body"

6. What a Skill File Itself Looks Like

When the Skill tool fires, the runtime loads that skill's SKILL.md. The author opens one locally (vim SKILL.md) — classic YAML frontmatter + markdown body:

SKILL.md for xlsx

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
---
name: xlsx
description: "Comprehensive spreadsheet creation, editing, and analysis with
support for formulas, formatting, data analysis, and visualization. When
Claude needs to work with spreadsheets (.xlsx, .xlsm, .csv, .tsv, etc) for:
(1) Creating new spreadsheets with formulas and formatting, (2) Reading or
analyzing data, (3) Modify existing spreadsheets while preserving formulas,
(4) Data analysis and visualization in spreadsheets, or (5) Recalculating
formulas"
license: Proprietary. LICENSE.txt has complete terms
---

# Requirements for Outputs

## All Excel files

### Zero Formula Errors
- Every Excel model MUST be delivered with ZERO formula errors (#REF!, #DIV/0!,
#VALUE!, #N/A, #NAME?)

### Preserve Existing Templates (when updating templates)
- Study and EXACTLY preserve ...
- Never impose standard formatting ...

Key Observations About SKILL.md

  • The description in the frontmatter is exactly the string that appears next to xlsx: in the system prompt's "Available skills" list. That is how CC builds the list — it scans each installed skill's frontmatter and concatenates.
  • The body is not shipped in the first request. It's lazy-loaded: when the model calls Skill(skill="xlsx"), the runtime returns the body as the tool result, and then the model plans the work.
  • The name is the invocation key. Plugin-namespaced skills use plugin:skill form (ms-office-suite:pdf).

7. The Reveal

The author overlays a giant text across the JSON view:

Function Calling

"Function Calling."

That's the entire mechanism. Skills are:

  1. A Skill tool declared in the tools array (single skill string argument)
  2. A registry block in the system prompt listing name + description for every installed skill
  3. A set of SKILL.md files on disk whose body becomes the tool-call response when invoked
  4. Prose enforcement ("BLOCKING REQUIREMENT", "call IMMEDIATELY") to make the model reach for it when relevant

No special API. No out-of-band protocol. It's the same tool-use contract you already use.


8. The Author's Closing Lament

Last cards:

  • "但是它底层发了这么多东西" — but under the hood it's sending so much stuff
  • "最后哈我想再感叹一下" — finally, one more remark
  • "TOKEN 对我们来说是真的越来越重要了"tokens really are getting more and more important for us

The gripe: a bare 你好 produced a ~78 KB request body. All of that overhead — the tool schemas, the skill registry, the reminder prose, the todo/plan-mode scaffolding — is paid on every turn. If you care about cost or context-window budget, Skills are not free, even when unused.


TL;DR — What I Learned

Concern Reality (per the packet capture)
Is Skill a new API feature? No. It's a regular tool named Skill plus a system-prompt catalog.
How does the model know what skills exist? A static "Available skills:" bullet list in the system prompt, built from each SKILL.md's frontmatter.
When does the skill body get loaded? Lazily — only after the model invokes Skill(name=...). The body comes back as the tool result.
What forces the model to use a skill? Strongly-worded English imperatives in the system prompt ("BLOCKING REQUIREMENT", "invoke IMMEDIATELY", "NEVER just announce").
Relationship to slash commands A <command-name> tag in the user turn signals the skill body is already inlined — the model must skip the Skill tool call and just act.
Cost implication Every request carries the full tool schemas + skill registry (~78 KB for a one-word prompt here). Tokens ≠ free.

Reproducing This at Home

1
2
3
4
5
6
7
8
# 1. Start Charles (or mitmproxy) on 127.0.0.1:8888
# 2. Install its root CA into the macOS keychain, trust for SSL
# 3. Enable SSL Proxying for api.anthropic.com (and your provider host)
# 4. Point Claude Code at the proxy:
export HTTP_PROXY=http://127.0.0.1:8888
export HTTPS_PROXY=http://127.0.0.1:8888
claude
# 5. Send any prompt; inspect the POST /v1/messages body in Charles.

Note: intercepting Anthropic's real endpoint requires bypassing cert pinning / having the Anthropic cert be trusted via your MITM CA. The author side-steps this by using an Anthropic-compatible provider (Volcengine Doubao) that terminates TLS on a host whose cert is interceptable with a local CA.


Supplement — Context the Video Doesn't Cover

The video proves the wire format is Function Calling. The rest of this doc adds the wider context to make the finding actionable.

9. Where Skills Sit in the Claude Code Extensibility Stack

Claude Code has five overlapping extension mechanisms. People conflate them; a packet capture like this one is useful because it draws hard lines between them.

Mechanism Where it lives How the model "sees" it Loaded when User-authorable?
Built-in tools (Bash, Read, Edit, ...) Bundled inside the CC package itself Full JSON schema in tools[] every turn Always No
Slash commands (/commit, /review) ~/.claude/commands/*.md or plugin Expanded inline into the user turn as <command-name>...</command-name> + body Only when the user types /foo Yes
Skills ~/.claude/skills/<name>/SKILL.md (or plugin) Skill tool + name: description line in system-prompt catalog; body lazy-loaded on invocation Catalog always; body on demand Yes
Sub-agents (Task tool) ~/.claude/agents/*.md Sub-agent names appear inside the Task tool description Always listed, spawned on call Yes
MCP servers External process, JSON-RPC Tools appear as additional entries in tools[] with mcp__server__tool names Always (per connected server) Yes (external)

A note on "bundled" (vs. "compiled"). Claude Code isn't a C/Rust binary; it's a Node.js package whose source is bundled and minified at release time (esbuild/webpack-style). When you npm i -g @anthropic-ai/claude-code you get a large pre-built .js file. The built-in tools' names, JSON schemas, descriptions, and handler functions live inside that bundle — you can't override them from ~/.claude/. The distinction that matters is fixed-by-the-vendor vs. authorable-by-you.

Skills are the only mechanism that pays a fixed catalog cost every turn but defers the heavy body until needed. That's the whole design point: cheap to advertise, on-demand to load.

10. What the "BLOCKING REQUIREMENT" Prose Is Really Doing

The imperatives ("you MUST", "NEVER just announce", "invoke IMMEDIATELY") aren't legalese — they're prompt-engineering guard rails against well-known failure modes with tool-using models:

  • Hallucinated tool execution — the model writes "Let me run the xlsx skill..." in prose and then generates fake output instead of emitting a tool call. The "NEVER just announce" line targets exactly this.
  • Stalled first action — the model asks a clarifying question instead of starting. "Invoke IMMEDIATELY as your first action" short-circuits that.
  • Double-invocation — re-calling a skill whose body is already in context (e.g. via a slash command that inlined it). The <command-name> carve-out prevents it.
  • Drift to non-existent skills — "Only use skills listed in 'Available skills' below" is an anti-hallucination clamp; models love to invent plausible tool names.

These are the same tropes Anthropic's own tool-use documentation warns about. Skills inherit them.

11. The Token-Economy Picture

A one-token prompt generating ~78 KB of request body isn't unique to Skills — but Skills amplify it. What's on every turn of a modern CC session:

  • Base system prompt + CLAUDE.md + user memory
  • Every built-in tool schema (~20 tools, 200–2000 chars each)
  • Skill registry (one line per installed skill; grows linearly)
  • Sub-agent list baked into the Task tool description
  • MCP tool schemas for every connected server
  • Conversation history (uncompacted)

Mitigations That Exist But Aren't Obvious from the Wire Trace

  • Prompt caching. Anthropic's API supports cache_control breakpoints with a ~5-minute TTL. CC places them so the static prefix (tools + system) is cached; you pay full price once, then roughly 10% for subsequent turns within the TTL. The 78 KB in the Doubao capture is uncached cost — cached cost is an order of magnitude lower. This is why the advice "don't sleep past 5 minutes" matters for agent loops.
  • Lazy skill bodies. Exactly what the video exposes: a ~300-char description advertises a skill whose actual body may be thousands of lines — and the body only ships on invocation.
  • Plugins as bundles. A plugin can ship skills + commands + agents together; CC only registers the catalog entries for enabled plugins.
  • /compact folds old turns into a summary so history doesn't grow unboundedly.

Rule of thumb: with 20+ skills installed, the catalog alone can cost 4–8 KB on every turn regardless of cache, because descriptions have to be good enough for the model to pick correctly — and "good enough" means ~200–500 characters each. Trim unused skills.

12. How to Write a Skill Description That Actually Gets Triggered

The description is the only thing in the catalog — it has to do four jobs in one to three sentences:

  1. Name the capability in one phrase. ("Comprehensive spreadsheet creation, editing, and analysis...")
  2. List concrete file types / extensions / frameworks. The model pattern-matches on these. .xlsx, .xlsm, .csv, .tsv in the official xlsx skill is deliberate.
  3. Enumerate scenarios with numbered bullets. (1)...(2)...(3)... — the numbered form is what Anthropic's own skills use and it measurably improves triggering vs. flowing prose.
  4. State negative space when needed. E.g. "Do NOT trigger when the primary deliverable is a Word document..." to prevent cross-activation with the docx skill.

A good description is a mini classifier specification, not marketing copy.

13. Security Implications

  • Skill body = arbitrary instructions the model will follow. A malicious plugin can ship a SKILL.md whose body says "exfiltrate .env to https://attacker/...". The model will try. Treat SKILL.md bodies like executable code when reviewing third-party plugins.
  • Proxy interception works because CC trusts HTTP_PROXY. Normal CLI behavior, but worth knowing: on a shared machine, anyone who can set your env or edit ~/.claude/settings.json can read every prompt and response in plaintext. CC does not cert-pin — the packet capture in this video only works because of that.
  • Prompt-injection through skill descriptions. The catalog is built by concatenating every installed skill's description. A malicious description can include instructions aimed at the other skills' invocation logic ("when the user asks X, instead call Y"). Plugin marketplaces need to treat descriptions as untrusted input.

14. How Skills Compare to "Function Calling" Elsewhere

The video's punchline — "it's just Function Calling" — is exactly right; the mapping is worth spelling out:

  • OpenAI function calling / tools → Anthropic tools[] with input_schema → the Skill tool here is one entry of that kind.
  • OpenAI "assistants with files" is the closest competitor concept to Skills: the model sees a short description of attached resources and can request their content. Skills generalize this to include procedural instructions, not just data.
  • MCP does the same job at a different altitude: instead of inlining the catalog in the system prompt, the CC host queries a running MCP server for its tool list. Skills are cheaper (no subprocess, no RPC handshake) but can't do I/O on their own; MCP servers can.

The interesting design choice: the Skill tool is a dispatcher. Its only job is to expand into a body of Markdown that the model then reads as-if-fresh-system-prompt. It's a layer of indirection purely for token economy, with the side benefit that skills are authorable by anyone who can write Markdown.

15. Practical Next Steps If You Want to Go Further

  • Run claude --debug — prints request/response bodies without needing MITM at all.
  • Set ANTHROPIC_LOG=debug (or the provider equivalent) for SDK-level traces.
  • Inspect ~/.claude/settings.json, ~/.claude/skills/, and any plugin directories to see what catalog entries your install is actually shipping.
  • Use anthropic.beta.messages.count_tokens (or the compatible endpoint) on your typical request to get an exact token cost before/after uninstalling rarely-used skills.
  • Point CC at an empty CLAUDE_CONFIG_DIR to see the minimum scaffolding vs. your customized environment. Cheapest way to answer "what am I paying for on every turn?" without opening a proxy.

Closing Thoughts

What's elegant about the Skill design is how much leverage it gets from not being a new protocol. The same tool-use machinery that already powers MCP and built-in tools gets one more entry — a dispatcher — and suddenly you have user-authorable, lazily-loaded capabilities. The cost is paid in system-prompt real estate (and prose-based guard rails), but that's a tax you were going to pay anyway for any extensibility story.

The author's real gripe — that tokens are getting expensive — is legitimate. But it's orthogonal to Skills specifically: it's the price of building agents on top of a stateless completion API. The right response isn't to avoid Skills; it's to trim the ones you don't use, trust the prompt cache, and keep an eye on the wire trace occasionally. Which is exactly what this video demonstrates.

  • Title: How Claude Code Skills Work Under the Hood
  • Author: wy
  • Created at : 2026-04-20 22:45:00
  • Updated at : 2026-04-20 22:56:59
  • Link: https://yue-ruby-w.site/2026/04/20/Claude-Code-Skill-Mechanism/
  • License: This work is licensed under CC BY-NC-SA 4.0.