claude-subagent-budget
v0.1.1
Pre-flight cost & quota estimator for Claude Code subagent spawns. Estimate Anthropic Max plan / ChatGPT Plus quota consumption, duration, and risk before launching multi-agent workflows.
Before launching a multi-agent workflow (e.g. Writer → Codex fact-check → QA-Guard × N parallel articles), claude-subagent-budget estimates how much of your Claude Max plan / ChatGPT Plus quota the spawn will consume — plus duration and risk warnings — so you can avoid quota exhaustion or context-limit crashes mid-flight.
Why
- Claude Pro ($20), Claude Max ($100 / $200), and ChatGPT Plus are subscription-based. There is no built-in "how much of my 5h quota will this spawn use?" feedback.
- Multi-agent spawns can quietly explode the context window or burn through ChatGPT quota in minutes.
- This tool is dependency-free Node.js and prints a quota-percentage view that matches how subscription users actually think about cost.
Features
- Anthropic Max plan quota usage per model (Opus / Sonnet / Haiku) — primary display
- ChatGPT Plus quota usage for Codex CLI / GPT-5.x sessions — secondary display
- USD / JPY reference for API-billed mode (OSS users on pay-per-use)
- Wall-clock duration estimate using per-agent runtime and parallelism wave-splitting
- Risk evaluation (warn/block) at configurable thresholds
- Exit codes 0/1/2 for CI/script integration
- JSON output for tooling integration
- Cross-platform (Windows / macOS / Linux), Node.js >= 18, zero dependencies
Installation
# Run with npx (no install)
npx claude-subagent-budget < plan.json
# Or install globally
npm install -g claude-subagent-budget
claude-subagent-budget < plan.json
Usage
CLI
echo '{
"plan": [
{"agent": "writer", "task": "article generation", "model": "opus", "expected_chars": 5500},
{"agent": "codex-rescue", "task": "FC --fresh", "model": "gpt-5.5", "expected_chars": 5500},
{"agent": "qa-guard", "task": "QA review", "model": "sonnet", "expected_chars": 5500}
],
"parallelism": 12
}' | claude-subagent-budget
Output (default: pretty)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
🪙 Subagent Budget Estimate
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
plan: 3 agents × 12 articles parallel
Tokens:
in: ~ 180,096 tokens
out: ~ 153,000 tokens
total: ~ 333,096 tokens
📊 Claude Max ($100) 5h quota usage:
opus : 2.9% ( 159,024 / 5,500,000 tokens) → 97.1% remaining
sonnet : 0.3% ( 78,036 / 27,500,000 tokens) → 99.7% remaining
📊 ChatGPT Plus ($20) 5h quota usage:
codex : 19.3% ( 96,036 / 500,000 tokens) → 80.7% remaining
(API-billed reference: $8.78 / ¥1,317 — no real charge for subscription users)
Duration:
median: 84 min (parallelism=12, 3 wave(s))
p90: 147 min
⚠️ Warnings:
[BLOCK] - codex-rescue × 12 parallel (=12 sessions) exceeds ChatGPT Plus quota
❌ Block: true (reason: codex-parallel-hard-block)
JSON output
echo '<plan>' | claude-subagent-budget --json
{
"plan_summary": { "agents": 3, "parallelism": 12 },
"tokens": { "input_tokens": 180096, "output_tokens": 153000, "total_tokens": 333096 },
"anthropic_quota": {
"plan": "max_100",
"tier_name": "Claude Max ($100)",
"by_model": {
"opus": { "used": 159024, "quota": 5500000, "pct": 2.9 },
"sonnet": { "used": 78036, "quota": 27500000, "pct": 0.3 },
"haiku": { "used": 0, "quota": 55000000, "pct": 0 }
}
},
"chatgpt_quota": {
"plan": "plus",
"tier_name": "ChatGPT Plus ($20)",
"used": 96036, "quota": 500000, "pct": 19.3
},
"cost_reference": { "usd": 8.78, "jpy": 1317, "note": "..." },
"duration": { "median_min": 84, "p90_min": 147, "per_article_min": 28, "waves": 3 },
"warnings": ["..."],
"block": true,
"block_reason": "codex-parallel-hard-block",
"exit_code": 2
}
Input format
| Field | Type | Description |
|---|---|---|
| plan[].agent | string | Agent identifier (e.g. writer, qa-guard, codex-rescue, analyst-scout) |
| plan[].task | string | Task description (length affects input token estimate) |
| plan[].model | string | Model identifier (see Supported models) |
| plan[].expected_chars | number | Expected output length in characters (drives output token estimate) |
| parallelism | number | Number of articles processed in parallel (default: 1) |
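The field table above can be turned into a small pre-check before a plan is piped to the CLI. The sketch below is a hypothetical helper, not part of the package: the name validatePlan and the exact rules (non-empty strings, non-negative expected_chars, integer parallelism) are assumptions layered on top of the documented types.

```javascript
// Hypothetical helper (not shipped with claude-subagent-budget): checks a plan
// object against the documented input fields and returns a list of problems.
function validatePlan(input) {
  if (!input || !Array.isArray(input.plan) || input.plan.length === 0) {
    return ['plan must be a non-empty array'];
  }
  const errors = [];
  input.plan.forEach((entry, i) => {
    if (entry === null || typeof entry !== 'object') {
      errors.push(`plan[${i}] must be an object`);
      return;
    }
    for (const field of ['agent', 'task', 'model']) {
      if (typeof entry[field] !== 'string' || entry[field] === '') {
        errors.push(`plan[${i}].${field} must be a non-empty string`);
      }
    }
    if (!Number.isFinite(entry.expected_chars) || entry.expected_chars < 0) {
      errors.push(`plan[${i}].expected_chars must be a non-negative number`);
    }
  });
  // parallelism is optional; the table documents 1 as the default.
  // Requiring an integer here is an assumption of this sketch.
  const parallelism = input.parallelism === undefined ? 1 : input.parallelism;
  if (!Number.isInteger(parallelism) || parallelism < 1) {
    errors.push('parallelism must be a positive integer');
  }
  return errors;
}
```

An empty array means the plan matches the documented shape; it says nothing about whether the CLI accepts the model names.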
Flags
| Flag | Description |
|---|---|
| --json | Output JSON instead of pretty text |
| --auto | Same as --json (intended for tool integration) |
| --block-on-quota | Promote 80%+ quota warnings to block (exit 2) |
| --block-on-context | Promote 800K+ context warnings to block (exit 2) |
| -h, --help | Show help |
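For tool integration, the --json report can be post-processed in a few lines. The following is a minimal sketch that assumes the report has the shape shown in the JSON output sample; quotaUsage and overThreshold are hypothetical names, and the 80% default mirrors the warning threshold mentioned for --block-on-quota.

```javascript
// Sketch: flatten the per-quota percentages of a --json report, highest first.
// Field names follow the sample JSON output in this README.
function quotaUsage(report) {
  const rows = [];
  const byModel = (report.anthropic_quota && report.anthropic_quota.by_model) || {};
  for (const [model, q] of Object.entries(byModel)) {
    rows.push({ name: model, pct: q.pct });
  }
  if (report.chatgpt_quota) {
    rows.push({ name: 'chatgpt', pct: report.chatgpt_quota.pct });
  }
  return rows.sort((a, b) => b.pct - a.pct);
}

// Hypothetical filter: quotas at or above a percentage threshold.
function overThreshold(report, threshold = 80) {
  return quotaUsage(report).filter((row) => row.pct >= threshold);
}

// Trimmed copy of the sample report from the "JSON output" section.
const sampleReport = {
  anthropic_quota: {
    by_model: {
      opus: { used: 159024, quota: 5500000, pct: 2.9 },
      sonnet: { used: 78036, quota: 27500000, pct: 0.3 },
      haiku: { used: 0, quota: 55000000, pct: 0 },
    },
  },
  chatgpt_quota: { used: 96036, quota: 500000, pct: 19.3 },
};

console.log(quotaUsage(sampleReport)[0]); // the quota with the highest usage
```

In a real script the report would come from JSON.parse on the CLI's stdout. Capture stdout even when the exit code is 1 or 2, since the report is what explains the warning or block.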
Exit codes
| Code | Meaning |
|---|---|
| 0 | OK — no warnings |
| 1 | WARN — warnings present, execution allowed |
| 2 | BLOCK — quota exhaustion / context overflow / hard parallelism cap reached |
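A wrapper script can branch on these codes before spawning any agents. This is a sketch under stated assumptions: classifyExit and preflight are hypothetical names, the CLI is assumed to be reachable through npx, and any status other than 0, 1, or 2 (including a failed spawn) is treated as an error.

```javascript
const { spawnSync } = require('node:child_process');

// Maps the documented exit codes to a label. Anything outside 0/1/2
// (crash, signal, command not found) is treated as 'error' in this sketch.
function classifyExit(code) {
  switch (code) {
    case 0: return 'ok';
    case 1: return 'warn';
    case 2: return 'block';
    default: return 'error';
  }
}

// Hypothetical wrapper: feeds a plan (JSON string) to the CLI on stdin and
// returns the verdict plus the raw --json report text.
function preflight(planJson) {
  const result = spawnSync('npx', ['claude-subagent-budget', '--json'], {
    input: planJson,
    encoding: 'utf8',
    shell: process.platform === 'win32', // npx is a .cmd shim on Windows
  });
  return { verdict: classifyExit(result.status), stdout: result.stdout };
}
```

A caller would typically proceed on 'ok', log and proceed on 'warn', and abort on 'block' or 'error'.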
Supported models
| Model | Provider | Billing |
|---|---|---|
| opus, sonnet, haiku | Anthropic Claude | Subscription quota (Max plan) or pay-per-use API |
| gpt-5.5, gpt-5.4 | ChatGPT Plus (Codex CLI) | 5h subscription quota |
| gpt-4o, gpt-4o-mini | OpenAI API | Pay-per-use |
Unknown models fall back to Sonnet-equivalent quota tracking.
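The table and the fallback rule can be read as a lookup from model name to quota bucket. The sketch below illustrates that reading only; quotaBucket and usedPct are hypothetical, the package's internal representation may differ, and rounding to one decimal place is an assumption.

```javascript
// Model groups copied from the "Supported models" table.
const ANTHROPIC_MODELS = ['opus', 'sonnet', 'haiku'];
const CHATGPT_PLUS_MODELS = ['gpt-5.5', 'gpt-5.4'];
const OPENAI_API_MODELS = ['gpt-4o', 'gpt-4o-mini'];

// Hypothetical lookup: which quota bucket does a model count against?
function quotaBucket(model) {
  if (ANTHROPIC_MODELS.includes(model)) return { provider: 'anthropic', bucket: model };
  if (CHATGPT_PLUS_MODELS.includes(model)) return { provider: 'chatgpt', bucket: 'plus' };
  if (OPENAI_API_MODELS.includes(model)) return { provider: 'openai-api', bucket: null }; // pay-per-use
  // Documented fallback: unknown models are tracked as Sonnet-equivalent.
  return { provider: 'anthropic', bucket: 'sonnet' };
}

// Share of a 5h quota, as a percentage with one decimal place (assumed rounding).
function usedPct(usedTokens, quotaTokens) {
  return Math.round((usedTokens / quotaTokens) * 1000) / 10;
}

console.log(quotaBucket('my-custom-model')); // falls back to the sonnet bucket
console.log(usedPct(159024, 5500000));       // opus figures from the sample output
```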
Configuration
Switching plan tier
Edit config/model-pricing.json:
{
"user_plan": "max_200", // "pro" | "max_100" | "max_200"
...
}
Customising quota figures
Anthropic does not publish exact numbers, so the bundled defaults are approximations. Tune them in config/model-pricing.json:
"max_100": {
"tier_name": "Claude Max ($100)",
"5h_token_quota": {
"opus": 5500000,
"sonnet": 27500000,
"haiku": 55000000
}
}
Adding a custom model
{
"models": {
"my-custom-model": {
"input_per_1m": 5,
"output_per_1m": 20,
"currency": "USD"
}
}
}
Library usage
const { estimatePlan } = require('claude-subagent-budget/lib/token-estimator');
const { calcPlanCost } = require('claude-subagent-budget/lib/cost-calculator');
const { predictDuration } = require('claude-subagent-budget/lib/duration-predictor');
const { evaluateRisk } = require('claude-subagent-budget/lib/risk-evaluator');
const plan = [
{ agent: 'writer', task: 'article', model: 'opus', expected_chars: 5500 },
{ agent: 'codex-rescue', task: 'fc', model: 'gpt-5.5', expected_chars: 5500 },
];
const parallelism = 6;
const tokens = estimatePlan(plan, parallelism);
const cost = calcPlanCost(plan, tokens.per_agent, parallelism);
const duration = predictDuration(plan, parallelism);
const risk = evaluateRisk({ tokens: tokens.totals, cost, parallelism, plan }, {
blockOnQuota: true,
blockOnContext: true,
});
if (risk.block) {
console.error(`BLOCKED: ${risk.block_reason}`);
process.exit(2);
}
How it works
Plan JSON
│
▼
┌──────────────────────────────┐
│ token-estimator │ Japanese chars × 1.5 + agent-specific output factors
├──────────────────────────────┤
│ cost-calculator │ Anthropic quota / ChatGPT quota / USD reference
├──────────────────────────────┤
│ duration-predictor │ per-agent runtime × ceil(parallelism / 5) waves
├──────────────────────────────┤
│ risk-evaluator │ warn at 80% / block at 95% per quota
└──────────────┬───────────────┘
▼
Pretty / JSON output
Limitations
- Token estimates are heuristic. They assume Japanese-text input; tune CHARS_PER_TOKEN in lib/token-estimator.js for English-heavy workloads.
- Anthropic Max plan quota figures are approximations; calibrate from observed usage.
- Per-agent runtime values are based on observed Claude Code behavior with default plans/skills. Override via the agentRuntimes argument to predictDuration().
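The wave-splitting rule from the How it works diagram (per-agent runtime × ceil(parallelism / 5) waves) can be checked by hand against the sample output. The sketch below is an illustration, not the code in lib/duration-predictor.js: the names are hypothetical, and the wave size of 5 is taken from the diagram.

```javascript
// Wave size taken from the "ceil(parallelism / 5)" note in the diagram.
const WAVE_SIZE = 5;

// Number of sequential waves needed to process `parallelism` articles.
function waveCount(parallelism) {
  return Math.ceil(parallelism / WAVE_SIZE);
}

// Median wall-clock estimate: per-article minutes times the number of waves.
function medianMinutes(perArticleMin, parallelism) {
  return perArticleMin * waveCount(parallelism);
}

// Sample report values: per_article_min = 28, parallelism = 12.
console.log(waveCount(12));         // 3
console.log(medianMinutes(28, 12)); // 84
```

These reproduce the waves and median_min fields of the sample report. The p90 figure depends on runtime values that are not documented here, so it is left out.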
Releasing
Publishing is fully automated via GitHub Actions on tag push.
One-time setup (maintainer)
- Create an npm Automation Token at https://www.npmjs.com/settings//tokens (scope: Automation).
- Add it to the repo as a secret named NPM_TOKEN (Settings → Secrets and variables → Actions → New repository secret).
Each release
# Bump version in package.json
npm version patch # or minor / major
# Push the commit + the tag created by npm version
git push origin main
git push origin v0.1.1
The Publish to npm workflow runs automatically on v* tag push:
- Runs smoke tests (test/run.js)
- Verifies tag matches package.json version
- Publishes to npm with provenance (--provenance)
You can also dry-run the publish from the GitHub Actions UI (Run workflow → enable "dry_run").
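The tag-versus-version check can be reproduced locally before pushing. How the workflow performs it is not documented here, so the sketch below is only one plausible form: tagMatchesVersion is a hypothetical function that accepts either a bare tag or a full Git ref and compares it with the version string from package.json.

```javascript
// Hypothetical check: does a pushed tag such as "v0.1.1" (or the full ref
// "refs/tags/v0.1.1") correspond to the "version" field in package.json?
function tagMatchesVersion(tagOrRef, version) {
  const tag = tagOrRef.replace(/^refs\/tags\//, '');
  return tag === `v${version}`;
}

console.log(tagMatchesVersion('v0.1.1', '0.1.1'));           // true
console.log(tagMatchesVersion('refs/tags/v0.1.2', '0.1.1')); // false
```

Locally, the version argument could come from require('./package.json').version.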
CI
The Test workflow runs the smoke tests on every push to main and every PR, across Node 18/20/22 on Ubuntu/macOS/Windows.
License
MIT — see LICENSE.
Contributing
Pull requests welcome. The core surface is small (~1,000 lines, 5 files). Useful extensions:
- More model-pricing presets (e.g. provider-specific tiers)
- Real-time quota API integration (when Anthropic/OpenAI expose it)
- Historical run log → automatic recalibration of runtime/token defaults
