@allenwu06/tokenstack

v0.1.0

Published

21 days ago

Per-layer Claude Code token attribution — by MCP server, subagent, skill, and tool family. Beyond per-file.

allenwu06

claude-code claude-code-plugin tokens cost-attribution mcp subagent observability

tokenstack

Other Claude Code token trackers report cost per file or per tool. tokenstack reports cost per layer: which MCP server, which subagent, which skill, which tool family is actually eating your context budget.

══════════════════════════════════════════════════════════════════════════════
  tokenstack  ·  session cdf67cff-…
  model: claude-opus-4-7
──────────────────────────────────────────────────────────────────────────────
  session-level totals:
    input:              87.0k
    output:           3188.4k
    cache_create:    12131.2k
    cache_read:      374880.8k
──────────────────────────────────────────────────────────────────────────────
  per-layer "result-tokens" (cost of tool returns flowing IN):

  Subagents  (104.2k tokens, 103 calls)        ← what main thread SEES
    general-purpose          101.0k  (92 calls)
    Trend Researcher           3.2k  (11 calls)
  File I/O   (36.5k tokens, 150 calls)
    Read       19.2k  Bash 14.8k  Edit 2.0k  Write 575
  Web         9.3k tokens, 17 calls
  Skills        51 tokens, 4 calls

  ▼ subagent INTERNAL work (drill-into 108 .output files)
     these tokens are the subagents' OWN work — NOT included
     in the "Subagents" return-tokens line above:
     input:             328.4k
     output:           1679.9k       ← what main thread DOESN'T see
     cache_create:    22762.2k
     cache_read:      404821.2k      ← 4000× the visible return-blob!
     internal tool-result tokens:  1474.0k
     internal tool calls:             3850
     top 5 most-expensive subagent runs (by cache_read):
       a7e5784f195a9e651     cache_read= 40.1M  output= 47.7k  tools=192
       a816f4b671e96f18f     cache_read= 32.6M  output= 86.5k  tools=151
       …
══════════════════════════════════════════════════════════════════════════════

Install

As a Claude Code plugin (recommended)

# In Claude Code
/plugin marketplace add anthropics/claude-plugins-community   # once
/plugin install tokenstack@claude-community                   # when accepted

Until accepted into the community marketplace, install from this repo:

/plugin marketplace add https://github.com/allenwu-blip/tokenstack
/plugin install tokenstack

Or load locally for development:

claude --plugin-dir /path/to/tokenstack

Then in any session, invoke:

/tokenstack:budget

You'll also get a per-turn one-line JSON summary written to ~/.claude/tokenstack/<session-id>.log automatically.

As a standalone CLI

npm install -g @allenwu06/tokenstack
tokenstack                                    # auto-detect latest session
tokenstack ~/.claude/projects/.../<session-id>.jsonl
tokenstack --json                             # machine-readable

Or zero-install:

npx -y @allenwu06/tokenstack

(Note: the npm package is scoped to @allenwu06 because the bare name tokenstack was already taken on npm. The bin command after install is still tokenstack.)

What it measures (honestly)

Session-level totals are exact — they're summed from the usage field of every assistant message in the transcript.
Per-layer "result-tokens" are an approximation: the character count of each tool_result content block divided by 4, attributed to the layer of the tool that produced it (MCP server / subagent / skill / file I/O / web / other). Real BPE tokenization varies with content, so treat layer numbers as a relative-comparison signal, not a billing number.
Subagents are reported on TWO axes: (a) the result blob flowing back from a Task/Agent call (shown in the "Subagents" line, attributed by subagent_type), and (b) the subagent's own internal work (shown in the "subagent INTERNAL work" section, aggregated across all .output files in /private/tmp/claude-*/.../tasks/). The internal work is typically 1000–4000× larger than the return blob — that's the gap other trackers miss.
MCP servers are pulled from the mcp__<server>__<tool> naming convention.
Skills are attributed via the Skill tool's tool_result size — note this captures the skill's return cost, not the skill's expanded-prompt content (which is delivered via system reminders, not tool_results). This is a known v1 gap.
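
The attribution rules above can be sketched roughly as follows. This is an illustration, not tokenstack's actual source: `Layer`, `classifyLayer`, and `estimateTokens` are hypothetical names, the `WebFetch`/`WebSearch` tool names and the round-up are assumptions, and the File I/O set is taken from the sample report. Only the mcp__<server>__<tool> convention, the Task/Agent and Skill tools, and the chars / 4 rule come from the description above.

```typescript
// Hypothetical layer model; names are illustrative, not tokenstack's API.
type Layer =
  | { kind: "mcp"; server: string }
  | { kind: "subagent"; subagentType: string }
  | { kind: "skill" }
  | { kind: "file_io" }
  | { kind: "web" }
  | { kind: "other" };

// File I/O tools as they appear in the sample report.
const FILE_IO_TOOLS = new Set(["Read", "Bash", "Edit", "Write"]);
// Assumption: which tool names count as "Web" is a guess.
const WEB_TOOLS = new Set(["WebFetch", "WebSearch"]);

function classifyLayer(
  toolName: string,
  input: { subagent_type?: string } = {},
): Layer {
  // MCP tools follow the mcp__<server>__<tool> naming convention.
  const mcp = /^mcp__(.+?)__(.+)$/.exec(toolName);
  if (mcp) return { kind: "mcp", server: mcp[1] ?? "unknown" };
  // Subagent calls are attributed by their subagent_type input.
  if (toolName === "Task" || toolName === "Agent") {
    return { kind: "subagent", subagentType: input.subagent_type ?? "unknown" };
  }
  if (toolName === "Skill") return { kind: "skill" };
  if (FILE_IO_TOOLS.has(toolName)) return { kind: "file_io" };
  if (WEB_TOOLS.has(toolName)) return { kind: "web" };
  return { kind: "other" };
}

// chars / 4 heuristic; rounding up is an assumption made for this sketch.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}
```

Because the estimate depends only on character count, two layers with equal estimates can still differ in real token cost, which is why the numbers are best read as relative.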

How it works

The Stop hook fires at the end of every turn.
The hook receives the session's transcript_path and parses the JSONL.
For each tool_use in an assistant message, we record its ID and layer.
For each tool_result in a user message, we attribute its content size to the matching tool's layer.
A summary is appended to ~/.claude/tokenstack/<session-id>.log (one JSON object per turn).
/tokenstack:budget re-parses on demand and prints the full table.
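
The matching pass in the steps above could look roughly like this. It is a minimal sketch under stated assumptions, not the plugin's real code: the entry and block shapes (`type`, `message.content`, `id`, `tool_use_id`) are assumptions about the transcript JSONL, `attribute` and `layerOf` are hypothetical names, and calls are counted per tool_result here for simplicity.

```typescript
// Assumed shapes of transcript entries; not a documented schema.
interface ContentBlock {
  type: string;
  id?: string; // present on tool_use blocks
  name?: string; // present on tool_use blocks
  tool_use_id?: string; // present on tool_result blocks
  content?: string | { type: string; text?: string }[];
}
interface TranscriptEntry {
  type?: string; // "assistant" or "user"
  message?: { content?: string | ContentBlock[] };
}
interface LayerTotal {
  tokens: number;
  calls: number;
}

// Parse one JSONL line, returning null for blank or malformed lines.
function parseLine(line: string): TranscriptEntry | null {
  if (line.trim() === "") return null;
  try {
    return JSON.parse(line) as TranscriptEntry;
  } catch {
    return null;
  }
}

// Character count of a tool_result's content (string or text parts).
function resultChars(content: ContentBlock["content"]): number {
  if (typeof content === "string") return content.length;
  if (!Array.isArray(content)) return 0;
  return content.reduce((n, part) => n + (part.text?.length ?? 0), 0);
}

function attribute(
  jsonl: string,
  layerOf: (toolName: string) => string,
): Map<string, LayerTotal> {
  const layerById = new Map<string, string>(); // tool_use id -> layer
  const totals = new Map<string, LayerTotal>();
  for (const line of jsonl.split("\n")) {
    const entry = parseLine(line);
    const blocks = entry?.message?.content;
    if (!entry || !Array.isArray(blocks)) continue;
    for (const block of blocks) {
      if (entry.type === "assistant" && block.type === "tool_use" && block.id && block.name) {
        layerById.set(block.id, layerOf(block.name));
      } else if (entry.type === "user" && block.type === "tool_result" && block.tool_use_id) {
        // Results with no recorded tool_use fall back to "other".
        const layer = layerById.get(block.tool_use_id) ?? "other";
        const row = totals.get(layer) ?? { tokens: 0, calls: 0 };
        row.tokens += Math.ceil(resultChars(block.content) / 4); // chars / 4
        row.calls += 1;
        totals.set(layer, row);
      }
    }
  }
  return totals;
}

// Usage with a two-line, hand-written transcript.
const sample = [
  JSON.stringify({ type: "assistant", message: { content: [{ type: "tool_use", id: "t1", name: "Read" }] } }),
  JSON.stringify({ type: "user", message: { content: [{ type: "tool_result", tool_use_id: "t1", content: "x".repeat(40) }] } }),
].join("\n");
const totals = attribute(sample, (name) => (name === "Read" ? "file_io" : "other"));
console.log(totals.get("file_io")); // { tokens: 10, calls: 1 }
```

Keying the lookup on the tool_use ID rather than on message order keeps the attribution correct when several tool calls are issued in one assistant message.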

No network. No telemetry. The plugin writes only to ~/.claude/tokenstack/ on your machine.

Roadmap

v0.1: main-thread attribution + Stop hook + /tokenstack:budget slash command. ✅
v0.2 (current): drill-into-subagent-internals — parses each subagent's own .output JSONL and surfaces aggregate internal-work cost + top-N most-expensive subagent runs. ✅
v0.3: HTML report export + per-subagent-type drill-down (map .output files to subagent_type via timestamp/content matching).
v0.4: predictive pre-tool warnings — PreToolUse hook that blocks expensive Reads before they happen.
v1.0 (paid tier): team / multi-user rollup, per-org cost attribution dashboard hosted at tokenstack.dev.

The OSS plugin is free under MIT, forever. The paid tier — when it exists — is a hosted multi-user dashboard, not a feature gate on the local tool.

Why drill-into matters

Main-thread attribution sees only the return-blob of a subagent call (typically a few thousand tokens). The subagent itself may have run hundreds of internal tool calls, read megabytes of context, and burned tens of millions of cache_read tokens on its own. In real sessions we've measured subagent internal work running 1000–4000× larger than the return-blob that the main thread sees. tokenstack v0.2 is the only Claude Code token tracker that surfaces this gap.
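
As a worked example, the multiplier can be recomputed from the sample report near the top of this README. The helper name `timesLarger` is hypothetical; the figures are copied from that report, in thousands of tokens. Note that the cache_read figure comes from usage totals while the return-blob figure is a chars / 4 estimate, so the first ratio compares two different kinds of measurement.

```typescript
// How many times larger `internal` is than `visible`.
function timesLarger(internal: number, visible: number): number {
  if (visible <= 0) throw new RangeError("visible must be positive");
  return internal / visible;
}

// Figures from the sample report, in thousands of tokens.
const returnBlobK = 104.2; // "Subagents" line: what the main thread sees
const internalCacheReadK = 404821.2; // subagent internal cache_read
const internalToolResultK = 1474.0; // subagent internal tool-result tokens

// Internal cache_read against the visible return blob.
console.log(Math.round(timesLarger(internalCacheReadK, returnBlobK))); // 3885

// Like-for-like: internal tool-result tokens against the return blob.
console.log(timesLarger(internalToolResultK, returnBlobK).toFixed(1)); // 14.1
```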

License

MIT. See LICENSE.

Feedback

Open an issue at https://github.com/allenwu-blip/tokenstack/issues. Real misattributions and missing layers are the most useful thing you can report.
