@allenwu06/tokenstack
v0.1.0
Per-layer Claude Code token attribution — by MCP server, subagent, skill, and tool family. Beyond per-file.
tokenstack
Other Claude Code token trackers report cost per file or per tool. tokenstack reports cost per layer: which MCP server, which subagent, which skill, which tool family is actually eating your context budget.
══════════════════════════════════════════════════════════════════════════════
tokenstack · session cdf67cff-…
model: claude-opus-4-7
──────────────────────────────────────────────────────────────────────────────
session-level totals:
input: 87.0k
output: 3188.4k
cache_create: 12131.2k
cache_read: 374880.8k
──────────────────────────────────────────────────────────────────────────────
per-layer "result-tokens" (cost of tool returns flowing IN):
Subagents (104.2k tokens, 103 calls) ← what main thread SEES
general-purpose 101.0k (92 calls)
Trend Researcher 3.2k (11 calls)
File I/O (36.5k tokens, 150 calls)
Read 19.2k Bash 14.8k Edit 2.0k Write 575
Web 9.3k tokens, 17 calls
Skills 51 tokens, 4 calls
▼ subagent INTERNAL work (drill-into 108 .output files)
these tokens are the subagents' OWN work — NOT included
in the "Subagents" return-tokens line above:
input: 328.4k
output: 1679.9k ← what main thread DOESN'T see
cache_create: 22762.2k
cache_read: 404821.2k ← 4000× the visible return-blob!
internal tool-result tokens: 1474.0k
internal tool calls: 3850
top 5 most-expensive subagent runs (by cache_read):
a7e5784f195a9e651 cache_read= 40.1M output= 47.7k tools=192
a816f4b671e96f18f cache_read= 32.6M output= 86.5k tools=151
…
══════════════════════════════════════════════════════════════════════════════
Install
As a Claude Code plugin (recommended)
# In Claude Code
/plugin marketplace add anthropics/claude-plugins-community # once
/plugin install tokenstack@claude-community # when accepted
Until accepted into the community marketplace, install from this repo:
/plugin marketplace add https://github.com/allenwu-blip/tokenstack
/plugin install tokenstack
Or load locally for development:
claude --plugin-dir /path/to/tokenstack
Then in any session, invoke:
/tokenstack:budget
You'll also get a per-turn one-line JSON summary written to ~/.claude/tokenstack/<session-id>.log automatically.
As a standalone CLI
npm install -g @allenwu06/tokenstack
tokenstack # auto-detect latest session
tokenstack ~/.claude/projects/.../<session-id>.jsonl
tokenstack --json # machine-readable
Or zero-install:
npx -y @allenwu06/tokenstack
(Note: the npm package is scoped to @allenwu06 because the bare name tokenstack was already taken on npm. The bin command after install is still tokenstack.)
What it measures (honestly)
- Session-level totals are exact: they're summed from the usage field of every assistant message in the transcript.
- Per-layer "result-tokens" are an approximation: chars / 4 of each tool_result content block, attributed to the layer of the tool that produced it (MCP server / subagent / skill / file I/O / web / other). Real BPE tokenization varies, so treat layer numbers as a relative-comparison signal, not a billing number.
- Subagents are reported on TWO axes: (a) the result blob flowing back from a Task/Agent call (shown in the "Subagents" line, attributed by subagent_type), and (b) the subagent's own internal work (shown in the "subagent INTERNAL work" section, aggregated across all .output files in /private/tmp/claude-*/.../tasks/). The internal work is typically 1000–4000× larger than the return blob; that's the gap other trackers miss.
- MCP servers are pulled from the mcp__<server>__<tool> naming convention.
- Skills are attributed via the Skill tool's tool_result size. Note that this captures the skill's return cost, not the skill's expanded-prompt content (which is delivered via system reminders, not tool_results). This is a known v1 gap.
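The chars / 4 estimate and the name-based layer lookup described above can be sketched in a few lines of TypeScript. This is an illustration under stated assumptions, not tokenstack's actual source: the names approxTokens and layerOf, the layer labels, the choice to round up, and the grouping of WebFetch and WebSearch under "web" are all hypothetical.

```typescript
// Hypothetical layer labels; the README lists the categories, not their identifiers.
type Layer = "mcp" | "subagent" | "skill" | "file_io" | "web" | "other";

interface Attribution {
  layer: Layer;
  name: string; // MCP server, subagent_type, or tool name
}

// Assumed tool families, inferred from the sample report above.
const FILE_IO_TOOLS = new Set(["Read", "Bash", "Edit", "Write"]);
const WEB_TOOLS = new Set(["WebFetch", "WebSearch"]);

// chars / 4 heuristic from the README; rounding up is an assumption.
function approxTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Map a tool name (plus an optional subagent_type) to a layer.
function layerOf(toolName: string, subagentType?: string): Attribution {
  if (toolName.startsWith("mcp__")) {
    // mcp__<server>__<tool>: the second "__"-separated field is the server.
    const [, server = "unknown"] = toolName.split("__");
    return { layer: "mcp", name: server };
  }
  if (toolName === "Task" || toolName === "Agent") {
    return { layer: "subagent", name: subagentType ?? "unknown" };
  }
  if (toolName === "Skill") return { layer: "skill", name: toolName };
  if (FILE_IO_TOOLS.has(toolName)) return { layer: "file_io", name: toolName };
  if (WEB_TOOLS.has(toolName)) return { layer: "web", name: toolName };
  return { layer: "other", name: toolName };
}
```

With these definitions, layerOf("mcp__github__create_issue") (a made-up tool name) lands in the mcp layer under the server name github, and a 10-character result is estimated at 3 tokens.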
How it works
- The Stop hook fires at the end of every turn.
- The hook receives the session's transcript_path and parses the JSONL.
- For each tool_use in an assistant message, we record its ID and layer.
- For each tool_result in a user message, we attribute its content size to the matching tool's layer.
- A summary is appended to ~/.claude/tokenstack/<session-id>.log (one JSON object per turn).
- /tokenstack:budget re-parses on demand and prints the full table.
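The tool_use / tool_result matching in the steps above can be sketched as a single pass over the transcript text. This is a minimal illustration, not the plugin's code: the entry shape (a type field plus message.content blocks carrying id, name, and tool_use_id), the simplified layerOf helper, and the attribute function name are assumptions made for the example.

```typescript
interface Block {
  type?: string;
  id?: string;          // present on tool_use blocks
  name?: string;        // tool name on tool_use blocks
  tool_use_id?: string; // back-reference on tool_result blocks
  content?: unknown;
}

interface Entry {
  type?: string; // assumed: "assistant" or "user"
  message?: { content?: unknown };
}

type Totals = Record<string, { tokens: number; calls: number }>;

// Simplified, hypothetical layer lookup (MCP server vs. everything else).
function layerOf(name: string): string {
  if (name.startsWith("mcp__")) return "mcp:" + (name.split("__")[1] ?? "unknown");
  if (name === "Task" || name === "Agent") return "subagent";
  if (name === "Skill") return "skill";
  return ["Read", "Bash", "Edit", "Write"].includes(name) ? "file_io" : "other";
}

// chars / 4 estimate; non-string content is serialized first (an assumption).
function sizeOf(content: unknown): number {
  const text = typeof content === "string" ? content : JSON.stringify(content ?? "");
  return Math.ceil(text.length / 4);
}

function attribute(jsonl: string): Totals {
  const layerById = new Map<string, string>();
  const totals: Totals = {};
  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue;
    let entry: Entry;
    try {
      entry = JSON.parse(line) as Entry;
    } catch {
      continue; // skip malformed lines rather than abort the whole report
    }
    const content = entry.message?.content;
    if (!Array.isArray(content)) continue;
    for (const block of content as Block[]) {
      if (entry.type === "assistant" && block.type === "tool_use" && block.id && block.name) {
        // Remember which layer issued this call.
        layerById.set(block.id, layerOf(block.name));
      } else if (entry.type === "user" && block.type === "tool_result" && block.tool_use_id) {
        // Charge the result size to the layer of the matching call.
        const layer = layerById.get(block.tool_use_id) ?? "other";
        const slot = totals[layer] ?? { tokens: 0, calls: 0 };
        slot.tokens += sizeOf(block.content);
        slot.calls += 1;
        totals[layer] = slot;
      }
    }
  }
  return totals;
}
```

Serializing the returned object with JSON.stringify gives one line per turn, which is the kind of record the log step describes; the real log's field names are not documented here, so treat this shape as a placeholder.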
No network. No telemetry. The plugin writes only to ~/.claude/tokenstack/ on your machine.
Roadmap
- v0.1: main-thread attribution + Stop hook + /budget slash command. ✅
- v0.2 (current): drill-into-subagent-internals. Parses each subagent's own .output JSONL and surfaces aggregate internal-work cost + top-N most-expensive subagent runs. ✅
- v0.3: HTML report export + per-subagent-type drill-down (map .output files to subagent_type via timestamp/content matching).
- v0.4: predictive pre-tool warnings — PreToolUse hook that blocks expensive Reads before they happen.
- v1.0 (paid tier): team / multi-user rollup, per-org cost attribution dashboard hosted at tokenstack.dev.
The OSS plugin is free under MIT, forever. The paid tier — when it exists — is a hosted multi-user dashboard, not a feature gate on the local tool.
Why drill-into matters
Main-thread attribution sees only the return-blob of a subagent call (typically a few thousand tokens). The subagent itself may have run hundreds of internal tool calls, read megabytes of context, and burned tens of millions of cache_read tokens on its own. In real sessions we've measured subagent internal work running 1000–4000× larger than the return-blob that the main thread sees. tokenstack v0.2 is the only Claude Code token tracker that surfaces this gap.
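As a worked check on that ratio, the figures from the sample report near the top of this README can be divided directly. The snippet below is only arithmetic on those two printed numbers; the amplification helper name is made up for the example.

```typescript
// Figures copied from the sample report, in thousands of tokens.
const returnTokensK = 104.2;         // "Subagents" result-tokens the main thread sees
const internalCacheReadK = 404821.2; // cache_read inside the subagents' own runs

// How many times larger the hidden internal work is than the visible return blob.
function amplification(internalK: number, visibleK: number): number {
  return internalK / visibleK;
}

const ratio = amplification(internalCacheReadK, returnTokensK);
console.log(`${Math.round(ratio)}x`); // prints "3885x"
```

That is the "4000×" annotation in the sample output, rounded up.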
License
MIT. See LICENSE.
Feedback
Open an issue at https://github.com/allenwu-blip/tokenstack/issues. Real misattributions and missing layers are the most useful thing you can report.
