ultracost

v0.4.5

Published

6 days ago

Per-stage model routing for Claude Code dynamic workflows (ultracode). Quality-first policy, CLAUDE.md rule injection, and a workflow-script guard that catches subagent stages that would silently inherit Opus.

0High
0Medium
0Low

danielkremen818

claude claude-code ultracode dynamic-workflows subagents model-routing token-optimization cost-optimization opus sonnet

Stop a single ultracode fan-out from running dozens of subagents on Opus by accident.

When ultracode is on, the session is pinned to Opus @ xhigh and a single dynamic workflow fans out to dozens of subagents that inherit that session model unless every stage is pinned. ultracost makes per-stage routing explicit, policy-driven, and verifiable — quality-first, so coding and reasoning stay on Opus while pre-planned mechanical work drops to Sonnet. No telemetry. No network on the hot path. MIT.

Built for ultracode (Opus @ xhigh dynamic workflows) — the only place this fan-out happens. ultracost routes by tier (opus/sonnet), not a pinned version, so it tracks whatever Opus your session runs.

Security & trust. ultracost has zero runtime and dev dependencies, so there is no supply chain to compromise — Snyk Open Source and npm audit report 0 vulnerabilities. Releases publish to npm with OIDC Trusted Publishing and signed provenance, every GitHub Action is pinned to a commit SHA, and CodeQL + OpenSSF Scorecard run in CI. The installer touches only its own files and is fully reversible. See SECURITY.md.

Install

Claude Code plugin (recommended):

/plugin marketplace add danielkremen818/ultracost
/plugin install ultracost@ultracost

npm CLI (CI / scripting):

npx ultracost init

Via ClaudePluginHub (one command — adds the marketplace and installs the plugin):

npx claudepluginhub danielkremen818/ultracost --plugin ultracost

First command, in Claude Code: /ultracost:check ./path/to/workflow.js — flag any agent() stage that would silently inherit Opus. Every verb is also a slash command, so plugin users need nothing on PATH.

The problem

When ultracode is on, Claude Code runs the session on Opus @ xhigh (the only model that supports xhigh) and auto-orchestrates dynamic workflows that fan out to dozens — up to 1,000 — subagents. Two defaults compound:

Subagents inherit the session model. No per-stage override → every stage runs on the session's Opus.
The built-in workflow guidance tells Claude to omit the per-agent model. So inheritance wins.

The documented result: one prompt spawning 46 Opus subagents and ~3M tokens with no warning. A grep sweep and a per-file verifier do not need Opus. (Why ultracode makes this worse →)

The evidence: nobody pins a stage

This is the default behavior, not user error. In a scan of ~22 real ultracode workflow scripts, almost none pinned model: on any stage — even Anthropic's own bundled deep-research workflow pins zero. Reproduce it on your own history in one command:

npx ultracost audit ~/.claude/projects

What ultracost does

A quality-first policy. Coding and reasoning stay on Opus @ xhigh; pre-planned mechanical work and search/collection drop to Sonnet; Haiku is never used. You own it in one JSON file.
Always-on routing guidance. A SessionStart hook injects the policy as context at the start of every session (and after compaction) — present when Claude authors a workflow, no reliance on the model opening a skill.
The Workflow Guard. A static analyzer that flags any agent() stage missing a model: pin. Run it by hand, via /ultracost:check, or in CI. No other tool does this.
A pre-flight cost gate. A default-on PreToolUse hook estimates every workflow launch and pauses (or denies) it before a single subagent runs.
A closed loop. It reads its own runs back from local transcripts to reconcile, calibrate, and tally savings — offline.

Architecture

One shared core in src/, two delivery surfaces (a Claude Code plugin and an npm CLI), a runtime verification layer (the guard + cost gate), and a closed loop — all compiled from the same policy.json.

The plan lives in data (policy.json), not prose buried in a prompt. The guard is the layer the model can't talk its way out of. Full picture: docs/architecture.md.

The Workflow Guard

ultracost check ./wf.js      # or /ultracost:check in Claude Code

| Code | Meaning | Severity | |------|---------|----------| | UC001 | agent(x) with no options object | error | | UC002 | options object present, no model | error | | UC003 | model resolves to a banned model (e.g. haiku) | error | | UC004 | model: 'inherit' while allowInherit is false | error | | UC005 | model/options is a dynamic expression — can't verify | warning | | UC006 | the pin mismatches the work the prompt describes | warning | | UC007 | effort exceeds the model's cap (e.g. sonnet @ xhigh) | warning | | UC008 | an alwaysOpus role (orchestrator, …) pins a cheaper tier | warning |

The scanner runs on a hand-rolled, zero-dependency JS tokenizer — robust to template literals, spreads, optional-call agent?.(), and dynamic values; an agent( inside a prompt or comment is prose, never a call. Fan-out detection covers .map/.flatMap/forEach/for…of/Promise.all/Array.from/pipeline. --json for CI, --fix to auto-pin the unambiguous cases (UC001/UC002), --quiet for problems only. Only UC001–UC004 fail the build.

Cost estimate, effort, and the pre-flight gate

ultracost estimate ./wf.js   # agents, model mix, tiered vs all-opus
ultracost simulate ./wf.js   # all-opus vs your tiered pins vs all-sonnet

Official-sourced pricing. Prices live in policy.json with a _source URL and _asOf date; ultracost pricing refresh re-fetches Anthropic's official page. The estimate itself runs offline.
Dynamic effort. Each stage gets the lowest effort that fits (low→xhigh), bounded by model (sonnet up to high, opus up to xhigh), and effort feeds the estimate.
Pre-flight gate (on by default, hard in every mode). A deterministic PreToolUse hook on the Workflow tool runs the guard + estimate and leads with ⚠ N/M stage(s) NOT pinned → will inherit Opus when stages are unpinned. It asks (with the estimate) in default/acceptEdits/auto and auto-denies an unpinned or over-budget launch in bypassPermissions/dontAsk. ULTRACOST_GATE=strict|ask|off overrides it.

Estimates are relative (tiered vs all-opus), not a bill; fan-outs are ranges. Full detail and the gate's #52343 limitation: docs/ESTIMATES.md.

The closed loop: measure, reconcile, calibrate

ultracost reads its own results back — parsing your local transcripts (offline, no telemetry) and attributing tokens per workflow stage via the subagents/workflows/wf_*/agent-*.jsonl files Claude Code writes. No other router does this.

ultracost usage          # real cost: main loop vs subagents vs workflow stages
ultracost reconcile --last   # estimate vs ACTUAL, per stage, for your latest run
ultracost calibrate      # learn a token prior from your runs (the estimate uses it)
ultracost ledger         # cumulative $ saved vs all-opus, persisted

Self-calibrating. calibrate learns real per-stage token sizes (outlier-filtered); estimate, explain, simulate, and the gate use it automatically — closer to your reality every run.
Automatic on Stop. A Stop hook runs this loop (reconcile + calibrate + ledger) when the session ends, so you never have to remember the commands. It no-ops unless a new workflow finished; disable with ULTRACOST_AUTORUN=off.
Budget guard. Set budget.perRun / budget.perDay and the gate denies a launch whose estimate blows the cap, before it runs.

Live HUD statusline

Turn ultracost into your Claude Code statusline — a live, themed HUD that shows session cost, running subagents, context usage, and an animated pixel logo, refreshed ~once a second.

It sets itself up automatically. On the first session after you install the plugin, a SessionStart hook sets ultracost as your statusLine — no command to run. It does this once (so if you later remove it, it won't fight you), backs up any statusline you already had (restored on uninstall), and can be turned off with ULTRACOST_HUD=off. ultracost init does the same on the CLI path. Drive it directly with:

/ultracost:hud            # or: ultracost hud   (reads the statusline JSON on stdin)

It reads Claude Code's statusline payload on stdin and renders offline — no telemetry, no network. Don't want it? It's removed cleanly by ultracost uninstall / /plugin uninstall.

How routing is decided

| Tier | Model | Use for | |------|-------|---------| | opus | claude-opus-4-8 @ xhigh | writing/refactoring/debugging, design & architecture, security/perf, tests needing judgment, planning, synthesis | | sonnet | claude-sonnet-4-6 @ high | applying a decided edit across files, search/grep, running tests, git ops, docs, gathering context |

Decision rule: if the stage must decide how to change code → opus. If the how is already planned and it just executes → sonnet. When in doubt → opus. Never haiku. This is opinionated and quality-first; edit the policy for a cost-first split.

Commands

Every verb is a plugin slash command (/ultracost:<verb>, runs the bundled CLI via ${CLAUDE_PLUGIN_ROOT} — nothing on PATH) and an npm CLI command.

| Command | What it does | |---------|--------------| | check [path] | Flag agent() stages that don't pin a model (or pin the wrong tier); --fix the safe ones. | | estimate <script> | Agent count, model mix, tiered cost vs all-opus. | | explain <script> | Per-stage rationale: model, effort, reads-like tier, est cost, check flags. | | simulate <script> | Cost under all-opus vs tiered vs all-sonnet. | | diff <a> <b> | Cost delta between two versions (--ci → PR-comment table). | | audit [dir] | Pin stats across your real workflow scripts. | | hud | Live cost HUD statusline (set as your statusLine on install; restores the prior one on uninstall). | | usage [dir] | Real token cost from local transcripts. | | reconcile [--last\|<id>] | Estimate vs actual per stage for a finished run. | | calibrate | Tune the estimator from your real token usage. | | ledger | Cumulative savings vs all-opus across recorded runs. | | pricing [refresh] | Show pricing, or refresh from Anthropic's official page. | | status · doctor · init · uninstall | Delivery/policy state, diagnostics, install, reverse it. |

init, pricing refresh, doctor, and uninstall are CLI-only. The plugin bundles the SessionStart hook, the PreToolUse gate, the routing skill, and the slash commands; nothing in your config is mutated.

Usage examples

Common workflows, end to end. Inside Claude Code, use the slash commands (the plugin path — nothing on PATH); the ultracost <verb> CLI equivalents are for shells and CI.

1. Check a workflow script before you launch it

/ultracost:check ./deep-audit.workflow.js

check · 1 file(s)
✗ UC002  stage "scan repo" has options but no model  (line 12)
✗ UC001  agent("summarize") pins no model            (line 27)
2 error(s) — these stages would inherit the session's Opus @ xhigh

/ultracost:check proposes the correct per-stage pins and offers to apply them for you. CLI equivalent: ultracost check ./deep-audit.workflow.js --fix.

2. Estimate cost and compare tiers before launching

/ultracost:estimate ./deep-audit.workflow.js   # tiered vs all-opus
/ultracost:simulate ./deep-audit.workflow.js   # all-opus vs tiered vs all-sonnet
/ultracost:explain  ./deep-audit.workflow.js   # per-stage rationale + flags

3. Audit every workflow script you've already run

/ultracost:audit

Prints the share of agent() stages that pin no model (and would silently inherit Opus). CLI: ultracost audit ~/.claude/projects.

4. Pin a stage correctly when authoring a workflow

// search/collection → cheap tier, low effort
agent("grep the repo for callers", { model: 'sonnet', effort: 'low' });

// design/refactor decision → opus
agent("redesign the auth module", { model: 'opus', effort: 'xhigh' });

5. Reconcile an estimate against what a run actually cost

/ultracost:reconcile   # estimate vs ACTUAL per stage, for your latest run
/ultracost:ledger      # cumulative $ saved vs all-opus

6. Gate every launch in CI (no plugin — CLI only)

- run: npx ultracost check . --json

Uninstall

/plugin uninstall ultracost@ultracost     # plugin (removes everything it added)
/plugin marketplace remove ultracost

ultracost uninstall                        # npm CLI (reverses init; invalid settings.json is reported, never overwritten)

Customizing the policy

Edit ~/.claude/ultracost/policy.json, then re-run ultracost init to recompile the rules:

{
  "neverUse": ["haiku"],
  "allowInherit": false,
  "default": "opus",
  "tiers": {
    "opus": { "model": "opus", "effort": "xhigh" },
    "sonnet": { "model": "sonnet", "effort": "high" }
  },
  "alwaysOpus": ["orchestrator", "planner", "final-synthesis"]
}

Full reference: docs/policy.md.

Use in CI

- run: npx ultracost check . --json

Fails the build if any committed workflow script has a stage that would inherit the session model.

How it compares

ultracost is intentionally narrow. General-purpose routers (claude-router, claude-smart-router, claude-model-changer, model-matchmaker) score every prompt and route the main loop. Linters like claudelint validate a file-based agent's model:. ultracost targets the dynamic-workflow / ultracode path and is, as far as we can tell, the only tool that statically detects an unpinned inline agent()/pipeline() stage, flags a pin that mismatches the prompt, and reconciles its own estimate against real per-stage token usage. Cost tools (ccusage, tokencast, tokentoll) informed the transcript-parsing and calibration approaches (reimplemented clean-room). Credits: NOTICE.

Documentation

Showcase — a live ultracode run — policy injection → guard → cost gate → confirm, end to end, unprompted
Architecture · Policy reference · Cost & estimates
Why ultracode needs this · Testing guide · Publishing & recognition

Versioning & releases

Semantic versioning; see CHANGELOG.md. Tagged releases (vX.Y.Z) publish to npm and GitHub Releases via CI.

Configured for GitHub danielkremen818/ultracost. Forking? Update the handle in the install commands, badges, package.json, CHANGELOG.md, and .claude-plugin/plugin.json — see docs/PUBLISHING.md.