@cocaxcode/token-optimizer-mcp
v0.4.6
Orchestration + observability + coach layer for Claude Code token optimization. Measures tool usage, enforces budgets, advises on complementary tools (serena, RTK), and proactively surfaces savings tips.
Quick Overview
An MCP server that sits between Claude Code and your tools, measuring every interaction, enforcing token budgets, and actively coaching you on features you may not be using — like /opusplan, /compact, plan mode, or model switching.
This is not a replacement for serena (symbolic file reads) or RTK (Bash output filtering). It orchestrates with them: detects whether they are installed, measures how much they save, suggests installing them when they would help, and reports everything with honest labels — splitting Medido (measured) from Estimado (estimated) so you always know what is real.
Three hooks record everything silently in a per-project SQLite database. Thirteen MCP tools let the AI (and you) query stats, set budgets, search sessions, prune unused MCPs, and get proactive coaching tips. Nine CLI subcommands let you manage everything from the terminal. All data stays on your machine — nothing is synced, nothing is tracked, nothing leaves your disk.
Works with Claude Code, Claude Desktop, Cursor, Windsurf, VS Code, Codex CLI, Gemini CLI, and any MCP-compatible client.
Just Talk to It
You don't need to memorize tool names. Just say what you need.
Know your costs
"How many tokens did I spend today?"
-> Breakdown by tool, by source, Sonnet-Opus cost range
"Show me the report for this week"
-> Per-source estimation_method labels + Medido vs Estimado split
"What optimizations am I missing?"
-> Probes serena, RTK, MCP pruning, prompt caching — shows install commands
Control your budget
"Set a budget of 50k tokens for this session in warn mode"
-> Alerts when Bash approaches the limit
"Switch to block mode at 100k tokens"
-> Bash commands blocked when budget exceeded
"What's my budget status?"
-> Spent / remaining / percent / mode
Get coached
"Any tips for me?"
-> 18 tips: opusplan, /compact, plan mode, serena, RTK, skills migration...
-> Active rules detect: too many searches, huge file reads, Opus on simple tasks
"Explain the use-opusplan tip"
-> Full detail: what, when, how to invoke, estimated savings, source of the claim
"I'm at 80% context — what should I do?"
-> Coach fires: /compact now, save state with mem_save, use Agent Explore
Manage MCP pruning
"Which MCPs am I not using?"
-> Generates allowlist from 14-day history: used vs inactive servers
"Apply the allowlist"
-> Writes to settings.local.json with backup — not settings.json
"Undo that"
-> Byte-identical rollback from timestamped backup
"What was the impact?"
-> Before/after average tokens per event since the last snapshot
Recover after /compact
(Claude Code auto-compacts the context)
-> SessionStart:compact hook injects: recent files, recent commands, budget status
-> You pick up where you left off
Installation
Claude Code (recommended)
Step 1 — Register the MCP server:
claude mcp add --scope user token-optimizer -- npx -y @cocaxcode/token-optimizer-mcp@latest --mcp
Step 2 — Install globally (required for hooks):
npm install -g @cocaxcode/token-optimizer-mcp
Why? The 3 hooks (PreToolUse, PostToolUse, SessionStart) run via npx @cocaxcode/token-optimizer-mcp --hook <name>. Without a global install, npx can't find the binary and the hooks fail silently — no RTK bridge, no analytics, no compact recovery. The MCP server itself works fine with npx -y, but hooks need the package in PATH.
Step 3 — Set up hooks:
npx @cocaxcode/token-optimizer-mcp install
Step 4 — Restart Claude Code (hooks are loaded at session start).
Step 5 — Verify:
npx @cocaxcode/token-optimizer-mcp doctor
Per-project analytics data is stored in {project}/.token-optimizer/ and auto-added to .gitignore.
Claude Desktop
Add to your config file (~/Library/Application Support/Claude/claude_desktop_config.json on macOS, %APPDATA%\Claude\claude_desktop_config.json on Windows):
{
  "mcpServers": {
    "token-optimizer": {
      "command": "npx",
      "args": ["-y", "@cocaxcode/token-optimizer-mcp", "--mcp"]
    }
  }
}
Cursor / Windsurf
Add to .cursor/mcp.json or .windsurf/mcp.json:
{
  "mcpServers": {
    "token-optimizer": {
      "command": "npx",
      "args": ["-y", "@cocaxcode/token-optimizer-mcp", "--mcp"]
    }
  }
}
VS Code
Add to .vscode/mcp.json:
{
  "servers": {
    "token-optimizer": {
      "type": "stdio",
      "command": "npx",
      "args": ["-y", "@cocaxcode/token-optimizer-mcp", "--mcp"]
    }
  }
}
Codex CLI
codex mcp add token-optimizer -- npx -y @cocaxcode/token-optimizer-mcp --mcp
Gemini CLI
Add to ~/.gemini/settings.json:
{
  "mcpServers": {
    "token-optimizer": {
      "command": "npx",
      "args": ["-y", "@cocaxcode/token-optimizer-mcp", "--mcp"]
    }
  }
}
Uninstall
npx @cocaxcode/token-optimizer-mcp uninstall
npx @cocaxcode/token-optimizer-mcp uninstall --purge --confirm # also delete stored data
Coach Layer
The coach combines a static knowledge base of 18 tips with a dynamic detector of 11 rules that fire based on your recent activity.
18 tips
| Category | Tips |
|---|---|
| Model/mode | use-opusplan, use-plan-mode, use-fast-mode, default-to-sonnet, use-haiku-for-simple |
| Context | use-compact-long-session, use-clear-rename-resume, use-sessionstart-compact-hook, use-memory-save |
| Tools | use-agent-explore, use-todowrite-long-task, use-skill-trigger, install-serena, install-rtk |
| Config | use-mcp-prune, migrate-claudemd-to-skills, use-settings-local, use-prompt-caching |
Every tip includes: description, exact invocation command, when it applies, honest savings estimate, and the source of that claim.
11 detection rules
| Rule | Fires when | Suggests |
|---|---|---|
| detect-context-threshold | Context > 50/75/90% | /compact |
| detect-long-reasoning-no-code | 10+ events without edits | plan mode, opusplan |
| detect-repeated-searches | 3+ Grep/Glob in 20 events | Agent Explore |
| detect-huge-file-reads | Read > 50k tokens | install serena |
| detect-many-bash-commands | 11+ Bash in 100 events | install RTK |
| detect-opus-for-simple-task | Opus active, no edits | switch to Sonnet/Haiku |
| detect-clear-opportunity | Topic pivot detected | /rename + /clear + /resume |
| detect-post-milestone-opportunity | 5+ edits + tests + context > 40% | /compact at natural breakpoint |
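
Two of the rules in the table can be sketched as pure functions. This is a minimal illustration, not the package's actual detector: the ToolEvent shape and both function names are hypothetical, and only the thresholds (3+ Grep/Glob in the last 20 events, context above 50/75/90%) come from the table.

```typescript
// Hypothetical event shape; the real analytics schema is not shown in this README.
interface ToolEvent {
  tool: string; // e.g. "Grep", "Glob", "Read", "Bash"
}

// detect-repeated-searches: 3+ Grep/Glob calls within the last 20 events.
function detectRepeatedSearches(events: ToolEvent[], window = 20, minHits = 3): boolean {
  const recent = events.slice(-window);
  const searches = recent.filter((e) => e.tool === "Grep" || e.tool === "Glob");
  return searches.length >= minHits;
}

type ContextLevel = "ok" | "info" | "warn" | "critical";

// detect-context-threshold: classify a context usage ratio (0..1)
// against the 50/75/90% thresholds.
function contextLevel(
  used: number,
  thresholds = { info: 0.5, warn: 0.75, critical: 0.9 },
): ContextLevel {
  if (used > thresholds.critical) return "critical";
  if (used > thresholds.warn) return "warn";
  if (used > thresholds.info) return "info";
  return "ok";
}
```

Keeping each rule a pure function of recent events makes the thresholds easy to test and tune in isolation.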
Context meter (3-source fallback)
- Transcript JSONL — parses ~/.claude/projects/{key}/{session}.jsonl for real API tokens → measured_exact
- xray HTTP — GET /sessions/{id}/tokens with 300ms timeout → measured_exact
- Cumulative DB — SUM(tokens_estimated) + 15k baseline → estimated_cumulative
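
The fallback order above can be sketched as a chain of sources tried in priority order. This is an illustrative sketch under assumptions: ContextReading, ContextSource, measureContext and cumulativeEstimate are hypothetical names, and the real transcript parser and xray client are replaced by plain callbacks.

```typescript
// Result of one context measurement, tagged with how it was obtained.
interface ContextReading {
  tokens: number;
  method: "measured_exact" | "estimated_cumulative";
}

// A source returns a reading, or null when it is unavailable
// (missing transcript, xray timeout, and so on).
type ContextSource = () => Promise<ContextReading | null>;

// Try each source in priority order and return the first reading available.
async function measureContext(sources: ContextSource[]): Promise<ContextReading | null> {
  for (const source of sources) {
    try {
      const reading = await source();
      if (reading !== null) return reading;
    } catch {
      // A failing source is treated as unavailable; fall through to the next one.
    }
  }
  return null;
}

// Last-resort estimate: sum of per-event estimates plus a fixed 15k baseline.
function cumulativeEstimate(eventTokens: number[], baseline = 15_000): ContextReading {
  const sum = eventTokens.reduce((acc, t) => acc + t, 0);
  return { tokens: sum + baseline, method: "estimated_cumulative" };
}
```

Because every reading carries its method, a caller can always tell whether the number it displays was measured or estimated.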
Measurement Honesty
Every event in the analytics DB carries an estimation_method tag from day one. No mixing real numbers with guesses.
| Method | Meaning | Sources |
|---|---|---|
| measured_exact | Counted directly | builtin, own, mcp, xray |
| estimated_rtk_db | Read from RTK tracking.db | RTK |
| estimated_rtk_marker | Parsed from [rtk: filtered N tokens] | RTK |
| estimated_serena_shadow | fs.stat delta vs output size | serena (opt-in) |
| estimated_cumulative | SUM from DB + baseline | coach context meter |
| reference_measured | Public data, verifiable source | reference table |
| unknown | No authoritative source | prompt caching |
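
A minimal sketch of how tagged events could be grouped into the measured and estimated totals. The TaggedEvent shape and function names are hypothetical, and the bucketing rule is an assumption: measured_exact counts as Medido, any estimated_* tag counts as Estimado, and everything else (reference_measured, unknown) is kept apart.

```typescript
// Hypothetical row shape: one analytics event with its estimation_method tag.
interface TaggedEvent {
  tokens: number;
  estimationMethod: string; // e.g. "measured_exact", "estimated_rtk_db"
}

interface Split {
  medido: number;   // measured
  estimado: number; // estimated
  other: number;    // reference_measured, unknown, or any other tag
}

// Assumed bucketing rule: exact measurements vs. tags starting with "estimated_".
function splitByMethod(events: TaggedEvent[]): Split {
  const split: Split = { medido: 0, estimado: 0, other: 0 };
  for (const e of events) {
    if (e.estimationMethod === "measured_exact") split.medido += e.tokens;
    else if (e.estimationMethod.startsWith("estimated_")) split.estimado += e.tokens;
    else split.other += e.tokens;
  }
  return split;
}

// Insert thousands separators without relying on locale data.
function withCommas(n: number): string {
  return String(n).replace(/\B(?=(\d{3})+(?!\d))/g, ",");
}

function formatSummary(s: Split): string {
  return `Resumen: Medido: ${withCommas(s.medido)} tokens · Estimado: ${withCommas(s.estimado)} tokens`;
}
```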
Reports always split:
Resumen: Medido: 45,230 tokens · Estimado: 12,100 tokens
Reference data (public, verified 2026-04-11)
| Feature | Savings | Source |
|---|---|---|
| Model switching (opusplan / default-to-sonnet) | 60-80% cost | mindstudio.ai, verdent.ai |
| Progressive disclosure skills | ~15k tokens/session | claudefast.com |
| Prompt caching read hit | 10x cheaper | Anthropic docs |
| Claude Code Tool Search | ~85% schema reduction | observed (77k to 8.7k) |
| MCP pruning on top of Tool Search | ~5-12% per turn | internal estimate |
Complementary Tools: serena + RTK
token-optimizer-mcp does not install anything automatically. doctor detects and suggests — the user decides. These two tools work alongside token-optimizer to reduce token consumption at different levels.
serena-mcp — symbolic file reads
Instead of reading entire files (500+ lines), serena uses LSP to read only the symbols (classes, functions, methods) you actually need. Saves 20-30% on large file reads.
Step 1 — Install serena as MCP server:
# Claude Code
claude mcp add --scope user serena -- serena start-mcp-server --context=claude-code --project-from-cwd
# Or manually in ~/.claude.json → mcpServers:
{
  "serena": {
    "type": "stdio",
    "command": "serena",
    "args": ["start-mcp-server", "--context=claude-code", "--project-from-cwd"]
  }
}
Requires serena installed: pip install serena or pipx install serena or uvx --from git+https://github.com/oraios/serena serena start-mcp-server
Step 2 — Register your project (required per-project):
Serena needs to know which projects to index. The first time you use serena in a project:
> "Activate this project in serena"
→ Claude calls mcp__serena__activate_project with the project path
→ Serena indexes the codebase via LSP
Or create .serena/project.yml in the project root for auto-detection:
# .serena/project.yml — minimal config
name: my-project
Without a registered project, serena returns "No active project" and cannot do symbolic reads.
Step 3 — Verify with token-optimizer:
npx @cocaxcode/token-optimizer-mcp doctor
Expected output when fully configured:
[serena] ✓ conf=0.40 signals: claude-json-registered, project-registered-for-cwd
If you see ✓ but no project-registered-for-cwd, serena is installed but the current project is not registered.
How token-optimizer integrates: the optimization_status tool and doctor CLI detect serena presence across 5 signals (global settings, ~/.claude.json, project settings, local settings, project registration). The coach detect-huge-file-reads rule fires when a Read exceeds 50k tokens and suggests using serena instead.
Security note: serena includes execute_shell_command among its tools. Review the configuration before enabling.
RTK — Bash output filtering
RTK is a Rust CLI that filters and compresses command output before it reaches Claude Code. Instead of 500 lines of build output, RTK returns only errors, failures, or a compact summary. Saves 15-25% on build/test cycles.
Step 1 — Install RTK binary:
# macOS
brew install standard-input/tap/rtk
# Windows — download signed binary from GitHub releases:
# https://github.com/standard-input/rtk/releases
# Place rtk.exe somewhere in your PATH (e.g., C:\tools\rtk\)
# Verify
rtk --version
Step 2 — token-optimizer bridge (automatic via global install):
The PreToolUse hook acts as an RTK bridge — but it requires npm install -g @cocaxcode/token-optimizer-mcp (see Installation):
- Claude wants to run git status
- The hook calls rtk rewrite "git status"
- RTK returns rtk git status (exit 0 = auto-allow)
- The hook sets updatedInput so Claude runs the RTK-wrapped version
- Output is filtered before it enters the context window
This happens automatically for every Bash command — no manual rtk invocation needed.
Important: After installing, restart Claude Code. Hooks are loaded at session start — if the package wasn't installed when the session started, hooks won't fire until the next session.
RTK exit codes (all handled by the bridge):
- 0 — rewrite + auto-allow (e.g., npm run build → rtk npm run build)
- 1 — no RTK equivalent → passthrough (command runs as-is)
- 2 — deny rule → passthrough
- 3 — rewrite + allow (e.g., git status → rtk git status, find → rtk find)
Both exit 0 and 3 set permissionDecision: "allow" so Claude Code applies the rewrite. Without this field, Claude Code ignores updatedInput.
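
The exit-code handling above can be sketched as a small mapping function. This is a simplified illustration, not the bridge's real code: HookDecision and decideFromRtk are hypothetical names, the updatedInput shape with a command field is an assumption, and treating an empty rewrite as passthrough is an extra safety assumption not stated in the list.

```typescript
// Simplified hook decision; only updatedInput and permissionDecision come from the text above.
type HookDecision =
  | { kind: "passthrough" }
  | { kind: "rewrite"; updatedInput: { command: string }; permissionDecision: "allow" };

// Map the result of `rtk rewrite "<command>"` to a decision.
function decideFromRtk(exitCode: number, stdout: string): HookDecision {
  const rewritten = stdout.trim();
  // Exit 0 and 3 both mean "rewrite"; both must carry permissionDecision "allow",
  // otherwise the rewritten command would be ignored.
  if ((exitCode === 0 || exitCode === 3) && rewritten.length > 0) {
    return { kind: "rewrite", updatedInput: { command: rewritten }, permissionDecision: "allow" };
  }
  // 1 = no RTK equivalent, 2 = deny rule, anything else = unexpected: run the command as-is.
  return { kind: "passthrough" };
}
```

Defaulting every unrecognized exit code to passthrough means a misbehaving RTK binary can never block a command on its own.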
Step 3 — Verify with token-optimizer:
npx @cocaxcode/token-optimizer-mcp doctor
Expected output when fully configured:
[rtk] ✓ conf=0.40 signals: rtk-binary-in-path, token-optimizer-bridge-active
If you see rtk-binary-in-path but no token-optimizer-bridge-active, RTK is installed but the token-optimizer hooks are not — run npx @cocaxcode/token-optimizer-mcp install to set them up.
What RTK can wrap (partial list): ls, tree, git, gh, test, err, json, diff, grep, docker, kubectl, pnpm, dotnet, psql, aws, and more. Run rtk --help for the full list.
Security note: RTK publishes GPG-signed releases. Compiled in Rust. Open source at github.com/standard-input/rtk.
Tool Reference
Budget (3)
| Tool | Description |
|---|---|
| budget_set | Create/update a token budget (scope: session or project, mode: warn or block) |
| budget_check | Current spent / remaining / percent / mode |
| budget_report | Consumption grouped by tool and source for a period |
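
As an illustration of the spent / remaining / percent / mode fields that budget_check reports, here is a minimal sketch of the arithmetic. The BudgetStatus shape, the budgetStatus function and the exceeded flag are hypothetical; the actual tool output may differ.

```typescript
type BudgetMode = "warn" | "block";

interface BudgetStatus {
  spent: number;
  remaining: number;
  percent: number; // share of the limit used, rounded to one decimal
  mode: BudgetMode;
  exceeded: boolean;
}

// Derive the status fields from a token limit and the tokens spent so far.
function budgetStatus(limit: number, spent: number, mode: BudgetMode): BudgetStatus {
  if (limit <= 0) throw new RangeError("limit must be positive");
  const remaining = Math.max(0, limit - spent); // never report negative remaining
  const percent = Math.round((spent / limit) * 1000) / 10;
  return { spent, remaining, percent, mode, exceeded: spent >= limit };
}
```

For example, a 50k budget with 40k spent yields 10k remaining at 80%, which is the kind of state where warn mode would start alerting.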
Session (1)
| Tool | Description |
|---|---|
| session_search | FTS5 full-text search (BM25) over session events |
Orchestration (7)
| Tool | Description |
|---|---|
| mcp_usage_stats | Token usage by tool and source |
| mcp_cost_report | Cost estimate with Sonnet-Opus range |
| optimization_status | Probe results for serena, RTK, MCP pruning, prompt caching |
| mcp_prune_suggest | Generate allowlist from history (read-only) |
| mcp_prune_apply | Apply allowlist to settings.local.json (confirm: true required) |
| mcp_prune_rollback | Restore from timestamped backup (confirm: true required) |
| mcp_prune_clear | Remove allowlist entirely (confirm: true required) |
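
A read-only allowlist suggestion can be sketched as a pure function over the event history. The sketch below is an assumption-laden illustration, not mcp_prune_suggest itself: McpEvent, serverOf and suggestAllowlist are hypothetical names, the 14-day window is the one mentioned earlier in this README, and the server name is parsed from tool names of the form mcp__<server>__<tool>, as in mcp__serena__activate_project.

```typescript
interface McpEvent {
  tool: string;      // e.g. "mcp__serena__activate_project" or "Bash"
  timestamp: number; // epoch milliseconds
}

interface AllowlistSuggestion {
  used: string[];     // configured servers seen inside the window
  inactive: string[]; // configured servers with no calls inside the window
}

// Extract the server name from an MCP tool name; non-MCP tools give null.
function serverOf(tool: string): string | null {
  const match = /^mcp__(.+?)__/.exec(tool);
  return match?.[1] ?? null;
}

// Split configured servers into used vs inactive based on recent history.
function suggestAllowlist(
  configured: string[],
  events: McpEvent[],
  now: number,
  windowDays = 14,
): AllowlistSuggestion {
  const cutoff = now - windowDays * 24 * 60 * 60 * 1000;
  const seen = new Set<string>();
  for (const e of events) {
    if (e.timestamp < cutoff) continue; // too old to count
    const server = serverOf(e.tool);
    if (server !== null) seen.add(server);
  }
  return {
    used: configured.filter((s) => seen.has(s)),
    inactive: configured.filter((s) => !seen.has(s)),
  };
}
```

Nothing is written to disk here, which mirrors the split between the read-only suggest step and the confirm-gated apply step.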
Coach (1)
| Tool | Description |
|---|---|
| coach_tips | Active hits + full knowledge base + context measurement + reference data |
TOON (2)
| Tool | Description |
|---|---|
| toon_encode | Encode to compact JSON (token-efficient, round-trip lossless) |
| toon_decode | Decode compact JSON back to formatted object |
CLI Reference
| Command | Description |
|---|---|
| install | Register MCP server + 3 hooks in ~/.claude/settings.json |
| uninstall | Remove entries; --purge --confirm deletes stored data |
| doctor | Run all probes + schema measurer + advisor (always exits 0) |
| status | Install detection, DB path, events today, tokens by source, budget |
| report | Per-source breakdown with Medido/Estimado split + reference table |
| budget | set, get, clear from terminal |
| prune-mcp | --generate-from-history, --apply, --rollback, --clear, --impact |
| coach | status, list, explain <tip_id>, reset |
| config | get [key], set <key> <value> with dotted paths |
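
Dotted paths such as coach.enabled or coach.context_thresholds map naturally onto nested JSON objects. The following is a minimal sketch of that lookup and update, assuming nothing about the CLI's real implementation; Json, getPath and setPath are hypothetical names.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Read a value by dotted path; returns undefined when any segment is missing.
function getPath(root: Json, path: string): Json | undefined {
  let node: Json | undefined = root;
  for (const key of path.split(".")) {
    if (node === null || typeof node !== "object" || Array.isArray(node)) return undefined;
    node = node[key];
  }
  return node;
}

// Write a value by dotted path, creating intermediate objects as needed.
function setPath(root: { [key: string]: Json }, path: string, value: Json): void {
  const keys = path.split(".");
  const last = keys.pop();
  if (last === undefined || last === "") throw new Error("empty path");
  let node = root;
  for (const key of keys) {
    const next = node[key];
    if (next === null || typeof next !== "object" || Array.isArray(next)) {
      // Missing or non-object segment: replace it with a fresh object.
      const created: { [key: string]: Json } = {};
      node[key] = created;
      node = created;
    } else {
      node = next;
    }
  }
  node[last] = value;
}
```

With this shape, config set coach.enabled false corresponds to setPath(config, "coach.enabled", false), and config get coach.context_thresholds to getPath(config, "coach.context_thresholds").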
Hooks
| Hook | Matcher | What it does |
|---|---|---|
| PreToolUse | Bash | Checks budget (passthrough / warn), then RTK rewrite. Sets updatedInput + permissionDecision: "allow" when RTK rewrites (exit 0/3). Budget warn always wins over RTK. |
| PostToolUse | * | Async analytics to SQLite. Fire-and-forget to xray. Target p95: 10ms. |
| SessionStart | compact | Injects markdown: recent files, commands, budget. Token-capped at 2000. |
Storage
~/.token-optimizer/ # Global
├── config.json # User preferences (coach, shadow, RTK)
{project}/.token-optimizer/ # Per-project (auto-gitignored)
└── analytics.db # SQLite WAL — 8 tables + FTS5
{project}/.claude/
└── settings.local.json # MCP allowlist (prune-mcp, gitignored by default)
What is NOT stored
No credentials, no secrets, no full source code. Content is truncated to 4KB, input summaries to 512B. No data leaves your machine — xray integration is opt-in and local.
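
The 4KB and 512B caps can be illustrated with a byte-aware truncation helper. This is a sketch under assumptions: the function names and constants are hypothetical, and the README does not say whether the real limits are applied to bytes or characters; the version below counts UTF-8 bytes and never cuts a character in half.

```typescript
// UTF-8 size in bytes of one Unicode code point.
function utf8Size(codePoint: number): number {
  if (codePoint <= 0x7f) return 1;
  if (codePoint <= 0x7ff) return 2;
  if (codePoint <= 0xffff) return 3;
  return 4;
}

// Truncate text so its UTF-8 encoding fits in maxBytes,
// cutting only on code point boundaries.
function truncateUtf8(text: string, maxBytes: number): string {
  let bytes = 0;
  let out = "";
  for (const ch of text) { // iterates by code point, not by UTF-16 unit
    const size = utf8Size(ch.codePointAt(0) ?? 0);
    if (bytes + size > maxBytes) break;
    bytes += size;
    out += ch;
  }
  return out;
}

const CONTENT_LIMIT = 4 * 1024; // 4KB cap for event content
const SUMMARY_LIMIT = 512;      // 512B cap for input summaries
```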
Configuration
~/.token-optimizer/config.json:
{
  "shadow_measurement": { "serena": false },
  "rtk_integration": { "rtk_db_path": null },
  "coach": {
    "enabled": true,
    "auto_surface": true,
    "posttooluse_throttle": 20,
    "sessionstart_tips_max": 3,
    "context_thresholds": { "info": 0.5, "warn": 0.75, "critical": 0.9 },
    "dedupe_window_seconds": 60,
    "stale_tip_days": 90
  }
}
npx @cocaxcode/token-optimizer-mcp config set coach.enabled false
npx @cocaxcode/token-optimizer-mcp config get coach.context_thresholds
Architecture
src/
├── index.ts # Entry — --mcp | --hook X | subcommand
├── server.ts # createServer() — registers 13 tools
├── cli/ # 9 subcommands + dispatcher
├── hooks/ # pretooluse, posttooluse, sessionstart
├── tools/ # budget, session, orchestration, coach, toon
├── services/ # analytics-logger, budget-manager, session-retriever,
│ # stats, rtk-reader, serena-shadow, xray-client
├── orchestration/ # detector (probes), schema-measurer, advisor
├── coach/ # knowledge-base (18), rules (11), detector,
│ # context-meter, reference-data, surface (dedupe)
├── lib/ # types, paths, storage, token-estimator, response
└── db/ # schema (DDL), connection (WAL), queries
Stack: TypeScript 5 strict ESM · @modelcontextprotocol/sdk ^1.27 · better-sqlite3 ^11 (WAL + FTS5) · Zod 3.25 · Vitest 3.2+ · tsup · Node >=20
253 tests across 29 suites. All tools tested via InMemoryTransport.
License
MIT
