ultracost
v0.4.5
Per-stage model routing for Claude Code dynamic workflows (ultracode). Quality-first policy, CLAUDE.md rule injection, and a workflow-script guard that catches subagent stages that would silently inherit Opus.
Stop a single ultracode fan-out from running dozens of subagents on Opus by accident.
When ultracode is on, the session is pinned to Opus @ xhigh and a single dynamic
workflow fans out to dozens of subagents that inherit that session model unless every
stage is pinned. ultracost makes per-stage routing explicit, policy-driven, and
verifiable — quality-first, so coding and reasoning stay on Opus while pre-planned
mechanical work drops to Sonnet. No telemetry. No network on the hot path. MIT.
Built for ultracode (Opus @ xhigh dynamic workflows), the only place this fan-out happens. ultracost routes by tier (opus/sonnet), not a pinned version, so it tracks whichever Opus your session runs.
Security & trust. ultracost has zero runtime and dev dependencies, so there is no
supply chain to compromise — Snyk Open Source and npm audit report 0 vulnerabilities.
Releases publish to npm with OIDC Trusted Publishing and signed provenance, every
GitHub Action is pinned to a commit SHA, and CodeQL + OpenSSF Scorecard run in CI.
The installer touches only its own files and is fully reversible. See SECURITY.md.
Install
Claude Code plugin (recommended):
/plugin marketplace add danielkremen818/ultracost
/plugin install ultracost@ultracost
npm CLI (CI / scripting):
npx ultracost init
Via ClaudePluginHub (one command: adds the marketplace and installs the plugin):
npx claudepluginhub danielkremen818/ultracost --plugin ultracost
First command, in Claude Code: /ultracost:check ./path/to/workflow.js flags any
agent() stage that would silently inherit Opus. Every verb is also a slash command, so
plugin users need nothing on PATH.
The problem
When ultracode is on, Claude Code runs the session on Opus @ xhigh (the only model that supports xhigh) and auto-orchestrates dynamic workflows that can fan out to dozens of subagents, up to 1,000. Two defaults compound:
- Subagents inherit the session model. No per-stage override → every stage runs on the session's Opus.
- The built-in workflow guidance tells Claude to omit the per-agent model. So inheritance wins.
The documented result: one prompt spawned 46 Opus subagents and ~3M tokens with no warning, even though a grep sweep or a per-file verifier does not need Opus. (Why ultracode makes this worse: see Documentation.)
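To make the inheritance rule concrete, here is a minimal sketch that models it, assuming a stage runs on its `options.model` when one is set and on the session model otherwise (including an explicit `'inherit'`). `isInherited` and `resolveStageModel` are hypothetical helpers, not Claude Code or ultracost APIs, and the 46-stage fan-out is illustrative.

```javascript
// Hypothetical model of subagent inheritance; not Claude Code's actual logic.
// A stage is "inherited" when it has no options object, no model, or model: 'inherit'.
function isInherited(options) {
  return !options || options.model === undefined || options.model === "inherit";
}

// The model a stage actually runs on: its own pin, or else the session model.
function resolveStageModel(options, sessionModel) {
  return isInherited(options) ? sessionModel : options.model;
}

// Illustrative fan-out: 46 per-file verifiers with no pin, plus one pinned planner.
const session = "opus";
const stages = [
  ...Array.from({ length: 46 }, (_, i) => ({ prompt: `verify file ${i}`, options: undefined })),
  { prompt: "plan the audit", options: { model: "opus", effort: "xhigh" } },
];
const onOpus = stages.filter((s) => resolveStageModel(s.options, session) === "opus").length;
const inherited = stages.filter((s) => isInherited(s.options)).length;
console.log(`${onOpus}/${stages.length} stages on opus, ${inherited} by inheritance`);
```

Every stage ends up on opus here, but only one of them chose it; the other 46 got it by default.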
The evidence: nobody pins a stage
This is the default behavior, not user error. In a scan of ~22 real ultracode workflow scripts, almost none set model: on any stage; even Anthropic's own bundled deep-research workflow pins zero. Reproduce it on your own history in one command:
npx ultracost audit ~/.claude/projects
What ultracost does
- A quality-first policy. Coding and reasoning stay on Opus @ xhigh; pre-planned mechanical work and search/collection drop to Sonnet; Haiku is never used. You own it in one JSON file.
- Always-on routing guidance. A SessionStart hook injects the policy as context at the start of every session (and after compaction), so it is present when Claude authors a workflow without relying on the model to open a skill.
- The Workflow Guard. A static analyzer that flags any agent() stage missing a model: pin. Run it by hand, via /ultracost:check, or in CI. No other tool does this.
- A pre-flight cost gate. A default-on PreToolUse hook estimates every workflow launch and pauses (or denies) it before a single subagent runs.
- A closed loop. It reads its own runs back from local transcripts to reconcile, calibrate, and tally savings, offline.
Architecture
One shared core in src/, two delivery surfaces (a Claude Code plugin and an npm CLI), a runtime verification layer (the guard + cost gate), and a closed loop — all compiled from the same policy.json.
The plan lives in data (policy.json), not prose buried in a prompt. The guard is the layer the model can't talk its way out of. Full picture: docs/architecture.md.
The Workflow Guard
ultracost check ./wf.js # or /ultracost:check in Claude Code
| Code | Meaning | Severity |
|------|---------|----------|
| UC001 | agent(x) with no options object | error |
| UC002 | options object present, no model | error |
| UC003 | model resolves to a banned model (e.g. haiku) | error |
| UC004 | model: 'inherit' while allowInherit is false | error |
| UC005 | model/options is a dynamic expression — can't verify | warning |
| UC006 | the pin mismatches the work the prompt describes | warning |
| UC007 | effort exceeds the model's cap (e.g. sonnet @ xhigh) | warning |
| UC008 | an alwaysOpus role (orchestrator, …) pins a cheaper tier | warning |
The scanner runs on a hand-rolled, zero-dependency JS tokenizer — robust to template literals, spreads, optional-call agent?.(), and dynamic values; an agent( inside a prompt or comment is prose, never a call. Fan-out detection covers .map/.flatMap/forEach/for…of/Promise.all/Array.from/pipeline. --json for CI, --fix to auto-pin the unambiguous cases (UC001/UC002), --quiet for problems only. Only UC001–UC004 fail the build.
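The error/warning split in the table can be sketched as a small rule function. This is a simplified illustration, not ultracost's scanner: it assumes each `agent()` call has already been parsed into an options value that is `undefined` (UC001), a plain object, or a hypothetical `DYNAMIC` marker for an expression; it treats a banned model as any model string containing a `neverUse` entry; and it skips tokenizing, fan-out detection, and UC006–UC008. `classifyStage` and `failsBuild` are hypothetical names.

```javascript
// Simplified rule table for UC001–UC005 only.
const DYNAMIC = Symbol("dynamic"); // hypothetical marker for a non-literal value

function classifyStage(options, policy) {
  if (options === undefined) return "UC001"; // agent(x) with no options object
  if (options === DYNAMIC || options.model === DYNAMIC) return "UC005"; // can't verify
  if (options.model === undefined) return "UC002"; // options present, no model
  if (options.model === "inherit") return policy.allowInherit ? null : "UC004";
  if (policy.neverUse.some((banned) => options.model.includes(banned))) return "UC003";
  return null; // pinned to an allowed model
}

// Only UC001–UC004 are errors that fail the build; UC005 is a warning.
const ERRORS = new Set(["UC001", "UC002", "UC003", "UC004"]);
function failsBuild(codes) {
  return codes.some((c) => ERRORS.has(c));
}

const policy = { neverUse: ["haiku"], allowInherit: false };
const codes = [
  undefined,
  { effort: "low" },
  { model: "claude-haiku" },
  { model: "inherit" },
  { model: DYNAMIC },
  { model: "sonnet", effort: "low" },
].map((o) => classifyStage(o, policy));
console.log(codes, failsBuild(codes));
```

Checking the dynamic case before the missing-model case matters: an expression might well resolve to a valid pin, so it earns a warning rather than a build failure.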
Cost estimate, effort, and the pre-flight gate
ultracost estimate ./wf.js # agents, model mix, tiered vs all-opus
ultracost simulate ./wf.js # all-opus vs your tiered pins vs all-sonnet
- Official-sourced pricing. Prices live in policy.json with a _source URL and an _asOf date; ultracost pricing refresh re-fetches Anthropic's official page. The estimate itself runs offline.
- Dynamic effort. Each stage gets the lowest effort that fits (low → xhigh), bounded by model (sonnet up to high, opus up to xhigh), and effort feeds the estimate.
- Pre-flight gate (on by default, hard in every mode). A deterministic PreToolUse hook on the Workflow tool runs the guard + estimate and, when stages are unpinned, leads with ⚠ N/M stage(s) NOT pinned → will inherit Opus. It asks (showing the estimate) in default/acceptEdits/auto and auto-denies an unpinned or over-budget launch in bypassPermissions/dontAsk. ULTRACOST_GATE=strict|ask|off overrides it.
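One way to enforce the per-model effort bound from the Dynamic effort bullet is to order the effort levels and clamp to a cap, reporting when a pin asked for more (the UC007 case). This is a sketch under assumptions: `EFFORT_LEVELS`, `EFFORT_CAP`, and `clampEffort` are hypothetical, and the `medium` level is an assumption, since this README only names `low`, `high`, and `xhigh`.

```javascript
// Assumed effort scale, lowest to highest ("medium" is an assumption).
const EFFORT_LEVELS = ["low", "medium", "high", "xhigh"];
// Per-model caps as stated above: sonnet up to high, opus up to xhigh.
const EFFORT_CAP = { sonnet: "high", opus: "xhigh" };

// Returns the effort usable on `model`, plus whether the request exceeded
// the cap (the condition UC007 warns about).
function clampEffort(model, requested) {
  const cap = EFFORT_CAP[model];
  if (cap === undefined) throw new Error(`unknown model tier: ${model}`);
  const want = EFFORT_LEVELS.indexOf(requested);
  if (want === -1) throw new Error(`unknown effort: ${requested}`);
  const max = EFFORT_LEVELS.indexOf(cap);
  return want > max
    ? { effort: cap, exceedsCap: true }
    : { effort: requested, exceedsCap: false };
}

console.log(clampEffort("sonnet", "xhigh")); // over the sonnet cap, so clamped to "high"
console.log(clampEffort("opus", "xhigh"));   // within the opus cap
```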
Estimates are relative (tiered vs all-opus), not a bill; fan-outs are ranges. Full detail and the gate's #52343 limitation: docs/ESTIMATES.md.
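Because the estimate is relative, the core arithmetic only needs per-stage token counts and a price per tier: sum each stage at its pinned tier, sum again as if every stage were opus, and compare. In the sketch below the prices are placeholder units chosen for illustration (sonnet at one fifth of opus), not Anthropic's pricing, which ultracost reads from `policy.json`; the stage token counts are also made up, and `relativeEstimate` is a hypothetical name.

```javascript
// Placeholder per-million-token prices in arbitrary units; NOT real pricing.
const PRICE = {
  opus: { input: 10, output: 50 },
  sonnet: { input: 2, output: 10 },
};

function stageCost(stage, tier) {
  const p = PRICE[tier];
  return (stage.inputTokens * p.input + stage.outputTokens * p.output) / 1e6;
}

// Compare the tiered plan (each stage on its pinned tier) to all-opus.
function relativeEstimate(stages) {
  const tiered = stages.reduce((sum, s) => sum + stageCost(s, s.tier), 0);
  const allOpus = stages.reduce((sum, s) => sum + stageCost(s, "opus"), 0);
  return { tiered, allOpus, savedFraction: 1 - tiered / allOpus };
}

const stages = [
  { name: "plan", tier: "opus", inputTokens: 40_000, outputTokens: 8_000 },
  { name: "grep", tier: "sonnet", inputTokens: 200_000, outputTokens: 10_000 },
  { name: "verify", tier: "sonnet", inputTokens: 300_000, outputTokens: 20_000 },
];
console.log(relativeEstimate(stages));
```

With these placeholder numbers the tiered plan costs 2.1 units against 7.3 for all-opus. A real estimate would also have to treat fan-out counts as ranges rather than the fixed stage list used here.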
The closed loop: measure, reconcile, calibrate
ultracost reads its own results back — parsing your local transcripts (offline, no telemetry) and attributing tokens per workflow stage via the subagents/workflows/wf_*/agent-*.jsonl files Claude Code writes. No other router does this.
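Attributing tokens per stage comes down to summing usage records across each stage's `agent-*.jsonl` file. The sketch below assumes each line is a JSON object and that assistant entries carry an Anthropic-API-style `message.usage` object (`input_tokens`, `output_tokens`, `cache_creation_input_tokens`, `cache_read_input_tokens`); that layout is an assumption about Claude Code's transcripts, not a documented contract. `sumUsage` is hypothetical and takes the file's text as a string so the example stays self-contained.

```javascript
// Sum token usage from the text of one agent-*.jsonl transcript.
// ASSUMPTION: usage sits at entry.message.usage with Anthropic API field names.
function sumUsage(jsonlText) {
  const total = { input: 0, output: 0, cacheWrite: 0, cacheRead: 0 };
  for (const line of jsonlText.split("\n")) {
    if (!line.trim()) continue;
    let entry;
    try {
      entry = JSON.parse(line);
    } catch {
      continue; // skip a truncated or partial line instead of failing the file
    }
    const u = entry && entry.message && entry.message.usage;
    if (!u) continue; // user turns and other entries carry no usage
    total.input += u.input_tokens || 0;
    total.output += u.output_tokens || 0;
    total.cacheWrite += u.cache_creation_input_tokens || 0;
    total.cacheRead += u.cache_read_input_tokens || 0;
  }
  return total;
}

const sample = [
  JSON.stringify({ type: "user", message: { role: "user", content: "grep callers" } }),
  JSON.stringify({ type: "assistant", message: { usage: { input_tokens: 1200, output_tokens: 300, cache_read_input_tokens: 5000 } } }),
  '{"type":"assistant","message":{"usage":{"input_tok', // truncated last write
  JSON.stringify({ type: "assistant", message: { usage: { input_tokens: 800, output_tokens: 150 } } }),
].join("\n");
console.log(sumUsage(sample));
```

Reading the files from disk and mapping each `wf_*` directory and `agent-*.jsonl` file to a workflow stage are left out. Skipping unparsable lines keeps a transcript that is still being written from breaking the tally.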
ultracost usage # real cost: main loop vs subagents vs workflow stages
ultracost reconcile --last # estimate vs ACTUAL, per stage, for your latest run
ultracost calibrate # learn a token prior from your runs (the estimate uses it)
ultracost ledger # cumulative $ saved vs all-opus, persisted
- Self-calibrating. calibrate learns real per-stage token sizes (outlier-filtered); estimate, explain, simulate, and the gate use it automatically, so estimates get closer to your reality with every run.
- Automatic on Stop. A Stop hook runs this loop (reconcile + calibrate + ledger) when the session ends, so you never have to remember the commands. It no-ops unless a new workflow finished; disable it with ULTRACOST_AUTORUN=off.
- Budget guard. Set budget.perRun / budget.perDay and the gate denies a launch whose estimate exceeds the cap, before it runs.
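The gate's behavior, combining the pre-flight rules above with the budget caps, can be summarized as a pure decision function. This is a hypothetical sketch, not ultracost's hook: `gateDecision` and its argument shape are invented for illustration, `override` stands in for `ULTRACOST_GATE`, the gate is assumed to always ask in interactive modes, `strict` is assumed to deny any flagged launch in every mode, and the per-day cap is assumed to compare today's recorded spend plus the new estimate.

```javascript
// Permission modes in which the gate can pause and ask the user.
const INTERACTIVE = new Set(["default", "acceptEdits", "auto"]);

// True when the estimate breaks the per-run cap, or today's spend plus the
// estimate breaks the per-day cap.
function overBudget(estimate, budget, spentToday) {
  if (budget.perRun !== undefined && estimate > budget.perRun) return true;
  if (budget.perDay !== undefined && spentToday + estimate > budget.perDay) return true;
  return false;
}

// Hypothetical summary of the pre-flight gate; returns "allow", "ask", or "deny".
function gateDecision({ mode, unpinned, estimate, budget = {}, spentToday = 0, override }) {
  if (override === "off") return "allow";
  if (override === "ask") return "ask";
  const flagged = unpinned > 0 || overBudget(estimate, budget, spentToday);
  if (override === "strict" && flagged) return "deny"; // assumed strict semantics
  if (INTERACTIVE.has(mode)) return "ask"; // show the estimate, let the user decide
  return flagged ? "deny" : "allow"; // bypassPermissions / dontAsk: no one to ask
}

console.log(gateDecision({ mode: "default", unpinned: 2, estimate: 3 }));
console.log(gateDecision({ mode: "bypassPermissions", unpinned: 2, estimate: 3 }));
console.log(gateDecision({ mode: "dontAsk", unpinned: 0, estimate: 3, budget: { perDay: 10 }, spentToday: 8 }));
```

Keeping the decision a pure function of its inputs is what makes a gate like this deterministic and easy to test apart from the hook plumbing.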
Live HUD statusline
Turn ultracost into your Claude Code statusline — a live, themed HUD that shows session cost, running subagents, context usage, and an animated pixel logo, refreshed ~once a second.
It sets itself up automatically. On the first session after you install the plugin, a
SessionStart hook sets ultracost as your statusLine — no command to run. It does this
once (so if you later remove it, it won't fight you), backs up any statusline you
already had (restored on uninstall), and can be turned off with ULTRACOST_HUD=off.
ultracost init does the same on the CLI path. Drive it directly with:
/ultracost:hud # or: ultracost hud (reads the statusline JSON on stdin)
It reads Claude Code's statusline payload on stdin and renders offline: no telemetry, no
network. Don't want it? It's removed cleanly by ultracost uninstall / /plugin uninstall.
How routing is decided
| Tier | Model | Use for |
|------|-------|---------|
| opus | claude-opus-4-8 @ xhigh | writing/refactoring/debugging, design & architecture, security/perf, tests needing judgment, planning, synthesis |
| sonnet | claude-sonnet-4-6 @ high | applying a decided edit across files, search/grep, running tests, git ops, docs, gathering context |
Decision rule: if the stage must decide how to change code → opus. If the how is already planned and it just executes → sonnet. When in doubt → opus. Never haiku. This is opinionated and quality-first; edit the policy for a cost-first split.
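The decision rule reads naturally as a three-way function whose fallback is opus. This is a sketch under assumptions: `pickTier` and its `decidesHow`/`planned` flags are hypothetical, deciding those flags from a real prompt is the hard part and is not shown, and `TIERS` simply mirrors the table above.

```javascript
// Hypothetical encoding of the decision rule.
// decidesHow: the stage must decide how to change code.
// planned:    the "how" is already planned; the stage only executes it.
function pickTier({ decidesHow, planned }) {
  if (decidesHow === true) return "opus";
  if (planned === true && decidesHow === false) return "sonnet";
  return "opus"; // when in doubt: opus
}

// Mirrors the routing table; haiku is never a possible result of pickTier.
const TIERS = {
  opus: { model: "opus", effort: "xhigh" },
  sonnet: { model: "sonnet", effort: "high" },
};

console.log(TIERS[pickTier({ decidesHow: true })]);
console.log(TIERS[pickTier({ decidesHow: false, planned: true })]);
console.log(TIERS[pickTier({ planned: true })]); // decidesHow unknown: in doubt, opus
```

Requiring an explicit `decidesHow: false` before dropping to sonnet is what makes the rule quality-first: missing information never lowers the tier.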
Commands
Every verb is a plugin slash command (/ultracost:<verb>, runs the bundled CLI via ${CLAUDE_PLUGIN_ROOT} — nothing on PATH) and an npm CLI command.
| Command | What it does |
|---------|--------------|
| check [path] | Flag agent() stages that don't pin a model (or pin the wrong tier); --fix the safe ones. |
| estimate <script> | Agent count, model mix, tiered cost vs all-opus. |
| explain <script> | Per-stage rationale: model, effort, reads-like tier, est cost, check flags. |
| simulate <script> | Cost under all-opus vs tiered vs all-sonnet. |
| diff <a> <b> | Cost delta between two versions (--ci → PR-comment table). |
| audit [dir] | Pin stats across your real workflow scripts. |
| hud | Live cost HUD statusline (set as your statusLine on install; restores the prior one on uninstall). |
| usage [dir] | Real token cost from local transcripts. |
| reconcile [--last\|<id>] | Estimate vs actual per stage for a finished run. |
| calibrate | Tune the estimator from your real token usage. |
| ledger | Cumulative savings vs all-opus across recorded runs. |
| pricing [refresh] | Show pricing, or refresh from Anthropic's official page. |
| status · doctor · init · uninstall | Delivery/policy state, diagnostics, install, reverse it. |
init, pricing refresh, doctor, and uninstall are CLI-only. The plugin bundles the SessionStart hook, the PreToolUse gate, the routing skill, and the slash commands; nothing in your config is mutated.
Usage examples
Common workflows, end to end. Inside Claude Code, use the slash commands (the plugin
path — nothing on PATH); the ultracost <verb> CLI equivalents are for shells and CI.
1. Check a workflow script before you launch it
/ultracost:check ./deep-audit.workflow.js
check · 1 file(s)
✗ UC002 stage "scan repo" has options but no model (line 12)
✗ UC001 agent("summarize") pins no model (line 27)
2 error(s) — these stages would inherit the session's Opus @ xhigh
/ultracost:check proposes the correct per-stage pins and offers to apply them for you.
CLI equivalent: ultracost check ./deep-audit.workflow.js --fix.
2. Estimate cost and compare tiers before launching
/ultracost:estimate ./deep-audit.workflow.js # tiered vs all-opus
/ultracost:simulate ./deep-audit.workflow.js # all-opus vs tiered vs all-sonnet
/ultracost:explain ./deep-audit.workflow.js # per-stage rationale + flags
3. Audit every workflow script you've already run
/ultracost:audit
Prints the share of agent() stages that pin no model (and would silently inherit Opus).
CLI: ultracost audit ~/.claude/projects.
4. Pin a stage correctly when authoring a workflow
// search/collection → cheaper tier, low effort
agent('grep the repo for callers', { model: 'sonnet', effort: 'low' });
// design/refactor decision → opus
agent('redesign the auth module', { model: 'opus', effort: 'xhigh' });
5. Reconcile an estimate against what a run actually cost
/ultracost:reconcile # estimate vs ACTUAL per stage, for your latest run
/ultracost:ledger # cumulative $ saved vs all-opus
6. Gate every launch in CI (no plugin; CLI only)
- run: npx ultracost check . --json
Uninstall
/plugin uninstall ultracost@ultracost # plugin (removes everything it added)
/plugin marketplace remove ultracost
ultracost uninstall # npm CLI (reverses init; an invalid settings.json is reported, never overwritten)
Customizing the policy
Edit ~/.claude/ultracost/policy.json, then re-run ultracost init to recompile the rules:
{
"neverUse": ["haiku"],
"allowInherit": false,
"default": "opus",
"tiers": {
"opus": { "model": "opus", "effort": "xhigh" },
"sonnet": { "model": "sonnet", "effort": "high" }
},
"alwaysOpus": ["orchestrator", "planner", "final-synthesis"]
}
Full reference: docs/policy.md.
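As an illustration of how the policy fields interact, here is a sketch that turns a stage's role and requested tier into `agent()` options. It is not ultracost's recompile step: `pinFor` is hypothetical, role names are assumed to match `alwaysOpus` entries exactly, an unknown tier name is assumed to fall back to `default`, and a tier whose model contains a `neverUse` entry throws.

```javascript
// The example policy from this README, inlined so the sketch is self-contained.
const policy = JSON.parse(`{
  "neverUse": ["haiku"],
  "allowInherit": false,
  "default": "opus",
  "tiers": {
    "opus": { "model": "opus", "effort": "xhigh" },
    "sonnet": { "model": "sonnet", "effort": "high" }
  },
  "alwaysOpus": ["orchestrator", "planner", "final-synthesis"]
}`);

// Hypothetical helper: resolve a (role, requested tier) pair into agent() options.
function pinFor(policy, role, requestedTier) {
  let tierName = policy.default;
  if (policy.alwaysOpus.includes(role)) {
    tierName = "opus"; // alwaysOpus roles ignore the requested tier
  } else if (Object.prototype.hasOwnProperty.call(policy.tiers, requestedTier)) {
    tierName = requestedTier;
  } // any other tier name falls back to policy.default (an assumption)
  const pin = policy.tiers[tierName];
  if (policy.neverUse.some((banned) => pin.model.includes(banned))) {
    throw new Error(`policy bans model ${pin.model}`);
  }
  return { ...pin }; // copy, so callers can't mutate the policy
}

console.log(pinFor(policy, "planner", "sonnet"));   // alwaysOpus role: opus @ xhigh
console.log(pinFor(policy, "collector", "sonnet")); // sonnet @ high
console.log(pinFor(policy, "collector", "haiku"));  // not a tier: default (opus)
```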
Use in CI
- run: npx ultracost check . --json
Fails the build on any UC001–UC004 error, such as a committed workflow script with a stage that would inherit the session model.
How it compares
ultracost is intentionally narrow. General-purpose routers (claude-router, claude-smart-router, claude-model-changer, model-matchmaker) score every prompt and route the main loop. Linters like claudelint validate a file-based agent's model:. ultracost targets the dynamic-workflow / ultracode path and is, as far as we can tell, the only tool that statically detects an unpinned inline agent()/pipeline() stage, flags a pin that mismatches the prompt, and reconciles its own estimate against real per-stage token usage. Cost tools (ccusage, tokencast, tokentoll) informed the transcript-parsing and calibration approaches (reimplemented clean-room). Credits: NOTICE.
Documentation
- Showcase: a live ultracode run (policy injection → guard → cost gate → confirm), end to end, unprompted
- Architecture · Policy reference · Cost & estimates
- Why ultracode needs this · Testing guide · Publishing & recognition
Versioning & releases
Semantic versioning; see CHANGELOG.md. Tagged releases (vX.Y.Z) publish to npm and GitHub Releases via CI.
Configured for GitHub
danielkremen818/ultracost. Forking? Update the handle in the install commands, badges, package.json, CHANGELOG.md, and .claude-plugin/plugin.json; see docs/PUBLISHING.md.
License
MIT © Daniel Kremen. Clean-room implementation; prior art credited in NOTICE.
