claude-subagent-budget

v0.1.1

Published

2 months ago

Pre-flight cost & quota estimator for Claude Code subagent spawns. Estimate Anthropic Max plan / ChatGPT Plus quota consumption, duration, and risk before launching multi-agent workflows.

Vulnerabilities: 0 High, 0 Medium, 0 Low

yuziri

claude claude-code anthropic max-plan subagent budget cost-estimation quota pre-flight multi-agent

claude-subagent-budget

Pre-flight cost & quota estimator for Claude Code subagent spawns.

Before launching a multi-agent workflow (e.g. Writer → Codex fact-check → QA-Guard × N parallel articles), claude-subagent-budget estimates how much of your Claude Max plan / ChatGPT Plus quota the spawn will consume — plus duration and risk warnings — so you can avoid quota exhaustion or context-limit crashes mid-flight.

Why

Claude Pro ($20), Claude Max ($100 / $200) and ChatGPT Plus are subscription-based. None of them gives built-in feedback on how much of the 5h quota a spawn will use.
Multi-agent spawns can quietly overflow the context window or burn through ChatGPT quota in minutes.
This tool is dependency-free Node.js and prints a quota-percentage view that matches how subscription users actually think about cost.

Features

Anthropic Max plan quota usage per model (Opus / Sonnet / Haiku) — primary display
ChatGPT Plus quota usage for Codex CLI / GPT-5.x sessions — secondary display
USD / JPY reference for API-billed mode (OSS users on pay-per-use)
Wall-clock duration estimate using per-agent runtime and parallelism wave-splitting
Risk evaluation (warn / block) at configurable thresholds
Exit codes 0/1/2 for CI/script integration
JSON output for tooling integration
Cross-platform (Windows / macOS / Linux), Node.js >= 18, zero dependencies

Installation

# Run with npx (no install)
npx claude-subagent-budget < plan.json

# Or install globally
npm install -g claude-subagent-budget
claude-subagent-budget < plan.json

Usage

CLI

echo '{
  "plan": [
    {"agent": "writer",       "task": "article generation", "model": "opus",    "expected_chars": 5500},
    {"agent": "codex-rescue", "task": "FC --fresh",         "model": "gpt-5.5", "expected_chars": 5500},
    {"agent": "qa-guard",     "task": "QA review",          "model": "sonnet",  "expected_chars": 5500}
  ],
  "parallelism": 12
}' | claude-subagent-budget

Output (default: pretty)

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
🪙 Subagent Budget Estimate
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
plan: 3 agents × 12 articles parallel

Tokens:
  in:  ~ 180,096 tokens
  out: ~ 153,000 tokens
  total: ~ 333,096 tokens

📊 Claude Max ($100) 5h quota usage:
  opus   :   2.9%  (   159,024 /  5,500,000 tokens)  → 97.1% remaining
  sonnet :   0.3%  (    78,036 / 27,500,000 tokens)  → 99.7% remaining

📊 ChatGPT Plus ($20) 5h quota usage:
  codex  :  19.3%  (    96,036 /    500,000 tokens)  → 80.7% remaining

(API-billed reference: $8.78 / ¥1,317 — no real charge for subscription users)

Duration:
  median: 84 min  (parallelism=12, 3 wave(s))
  p90:    147 min

⚠️ Warnings:
  [BLOCK] - codex-rescue × 12 parallel (=12 sessions) exceeds ChatGPT Plus quota

❌ Block: true  (reason: codex-parallel-hard-block)

JSON output

echo '<plan>' | claude-subagent-budget --json

{
  "plan_summary": { "agents": 3, "parallelism": 12 },
  "tokens": { "input_tokens": 180096, "output_tokens": 153000, "total_tokens": 333096 },
  "anthropic_quota": {
    "plan": "max_100",
    "tier_name": "Claude Max ($100)",
    "by_model": {
      "opus":   { "used": 159024, "quota": 5500000,  "pct": 2.9 },
      "sonnet": { "used":  78036, "quota": 27500000, "pct": 0.3 },
      "haiku":  { "used":      0, "quota": 55000000, "pct": 0 }
    }
  },
  "chatgpt_quota": {
    "plan": "plus",
    "tier_name": "ChatGPT Plus ($20)",
    "used": 96036, "quota": 500000, "pct": 19.3
  },
  "cost_reference": { "usd": 8.78, "jpy": 1317, "note": "..." },
  "duration": { "median_min": 84, "p90_min": 147, "per_article_min": 28, "waves": 3 },
  "warnings": ["..."],
  "block": true,
  "block_reason": "codex-parallel-hard-block",
  "exit_code": 2
}
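
For tooling integration, the JSON report can be condensed into a few log lines. The following is a minimal sketch, not part of the package: summarizeReport is a hypothetical helper, and it relies only on the field names visible in the sample report above.

```javascript
// Hypothetical helper (not shipped with the package): one line per quota
// bucket that was actually used, plus the block reason if there is one.
// Field names are taken from the sample --json report.
function summarizeReport(report) {
  const lines = [];
  const anthropic = report.anthropic_quota;
  for (const [model, q] of Object.entries(anthropic.by_model)) {
    if (q.used > 0) lines.push(`${model}: ${q.pct}% of ${anthropic.tier_name}`);
  }
  const chatgpt = report.chatgpt_quota;
  if (chatgpt && chatgpt.used > 0) {
    lines.push(`codex: ${chatgpt.pct}% of ${chatgpt.tier_name}`);
  }
  if (report.block) lines.push(`BLOCKED: ${report.block_reason}`);
  return lines;
}
```

In a script, the report would typically come from JSON.parse on the CLI's stdout when it is run with --json.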

Input format

| Field | Type | Description |
|---|---|---|
| plan[].agent | string | Agent identifier (e.g. writer, qa-guard, codex-rescue, analyst-scout) |
| plan[].task | string | Task description (length affects input token estimate) |
| plan[].model | string | Model identifier (see Supported models) |
| plan[].expected_chars | number | Expected output length in characters (drives output token estimate) |
| parallelism | number | Number of articles processed in parallel (default: 1) |
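
A plan can be checked against the fields above before it is piped to the CLI. This is a sketch under assumptions: validatePlan is a hypothetical helper, and it may be stricter than the CLI itself, whose handling of missing fields is not documented here.

```javascript
// Hypothetical pre-check mirroring the input-format table.
// Returns a list of error strings; an empty list means the plan looks well-formed.
function validatePlan(input) {
  if (!input || !Array.isArray(input.plan) || input.plan.length === 0) {
    return ['plan must be a non-empty array'];
  }
  const errors = [];
  input.plan.forEach((step, i) => {
    if (step === null || typeof step !== 'object') {
      errors.push(`plan[${i}] must be an object`);
      return;
    }
    for (const key of ['agent', 'task', 'model']) {
      if (typeof step[key] !== 'string' || step[key] === '') {
        errors.push(`plan[${i}].${key} must be a non-empty string`);
      }
    }
    if (!Number.isFinite(step.expected_chars) || step.expected_chars < 0) {
      errors.push(`plan[${i}].expected_chars must be a non-negative number`);
    }
  });
  // parallelism is optional and defaults to 1, as stated in the table.
  const parallelism = input.parallelism === undefined ? 1 : input.parallelism;
  if (!Number.isInteger(parallelism) || parallelism < 1) {
    errors.push('parallelism must be a positive integer');
  }
  return errors;
}
```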

Flags

| Flag | Description |
|---|---|
| --json | Output JSON instead of pretty text |
| --auto | Same as --json (intended for tool integration) |
| --block-on-quota | Promote 80%+ quota warnings to block (exit 2) |
| --block-on-context | Promote 800K+ context warnings to block (exit 2) |
| -h, --help | Show help |
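
The effect of --block-on-quota can be pictured as a small classification step. The sketch below is illustrative only: classifyQuota is a hypothetical function, the 80% and 95% thresholds come from the "How it works" diagram, and treating the thresholds as inclusive is an assumption.

```javascript
// Thresholds as documented: warn at 80% of a quota, block at 95%.
const WARN_PCT = 80;
const BLOCK_PCT = 95;

// Hypothetical classifier. With blockOnQuota set, a warning-level
// percentage is promoted to a block, matching the flag's description.
function classifyQuota(pct, { blockOnQuota = false } = {}) {
  if (pct >= BLOCK_PCT) return { level: 'block', exitCode: 2 };
  if (pct >= WARN_PCT) {
    return blockOnQuota
      ? { level: 'block', exitCode: 2 }
      : { level: 'warn', exitCode: 1 };
  }
  return { level: 'ok', exitCode: 0 };
}
```

For example, a plan that would use 85% of a quota yields a warning by default and a block when blockOnQuota is true.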

Exit codes

| Code | Meaning |
|---|---|
| 0 | OK: no warnings |
| 1 | WARN: warnings present, execution allowed |
| 2 | BLOCK: quota exhaustion / context overflow / hard parallelism cap reached |
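
A script can branch on these exit codes before spawning any agents. The following is a rough sketch, assuming the CLI is reachable through npx as in the Installation section; decide and preflight are hypothetical names, not part of the package.

```javascript
const { spawnSync } = require('node:child_process');

// Map the documented exit codes to a decision for the calling script.
function decide(exitCode) {
  switch (exitCode) {
    case 0: return 'proceed';
    case 1: return 'proceed-with-warnings';
    case 2: return 'abort';
    default: return 'error'; // null or any other code: treat as a tool failure
  }
}

// Pipe a plan object to the estimator and return the decision plus raw JSON.
function preflight(plan) {
  const result = spawnSync('npx', ['claude-subagent-budget', '--json'], {
    input: JSON.stringify(plan),
    encoding: 'utf8',
    shell: process.platform === 'win32', // npx is a .cmd shim on Windows
  });
  return { decision: decide(result.status), stdout: result.stdout };
}
```

Keeping decide separate from the process call makes the mapping easy to test without the CLI installed.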

Supported models

| Model | Provider | Billing |
|---|---|---|
| opus, sonnet, haiku | Anthropic Claude | Subscription quota (Max plan) or pay-per-use API |
| gpt-5.5, gpt-5.4 | ChatGPT Plus (Codex CLI) | 5h subscription quota |
| gpt-4o, gpt-4o-mini | OpenAI API | Pay-per-use |

Unknown models fall back to Sonnet-equivalent quota tracking.
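
The fallback rule can be sketched as a lookup with a default. The bucket names below are invented for illustration and resolveBucket is a hypothetical function; only the model identifiers and the Sonnet fallback come from this README.

```javascript
// Model identifiers from the Supported models table; bucket labels are made up.
const BUCKETS = {
  opus: 'anthropic:opus',
  sonnet: 'anthropic:sonnet',
  haiku: 'anthropic:haiku',
  'gpt-5.5': 'chatgpt:codex',
  'gpt-5.4': 'chatgpt:codex',
  'gpt-4o': 'openai-api',
  'gpt-4o-mini': 'openai-api',
};

// Unknown models are tracked against the Sonnet-equivalent quota.
// Object.hasOwn avoids matching inherited keys such as "toString".
function resolveBucket(model) {
  return Object.hasOwn(BUCKETS, model) ? BUCKETS[model] : 'anthropic:sonnet';
}
```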

Configuration

Switching plan tier

Edit config/model-pricing.json:

{
  "user_plan": "max_200",
  ...
}

Valid values for user_plan are "pro", "max_100" and "max_200". JSON does not allow comments, so keep the file free of them.

Customising quota figures

Anthropic does not publish exact numbers, so the bundled defaults are approximations. Tune them in config/model-pricing.json:

"max_100": {
  "tier_name": "Claude Max ($100)",
  "5h_token_quota": {
    "opus": 5500000,
    "sonnet": 27500000,
    "haiku": 55000000
  }
}
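
To see how a tuned quota figure changes the reported percentage, the arithmetic can be reproduced in a few lines. This is a sketch only: quotaUsage is a hypothetical function, and rounding to one decimal place with Math.round is an assumption that may differ from what the CLI does.

```javascript
// Share of a 5h token quota consumed, rounded to one decimal place.
function quotaUsage(usedTokens, quotaTokens) {
  if (!(quotaTokens > 0)) throw new RangeError('quota must be positive');
  const pct = Math.round((usedTokens / quotaTokens) * 1000) / 10;
  const remainingPct = Math.round((100 - pct) * 10) / 10;
  return { pct, remainingPct };
}

// Same usage against the bundled Opus default and against a lower, calibrated figure.
const bundled = quotaUsage(159024, 5500000);    // pct: 2.9
const calibrated = quotaUsage(159024, 4000000); // pct: 4
```

Lowering the quota estimate from 5,500,000 to 4,000,000 tokens moves the same spawn from 2.9% to 4% of the window, which is why calibrating against observed usage matters.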

Adding a custom model

{
  "models": {
    "my-custom-model": {
      "input_per_1m": 5,
      "output_per_1m": 20,
      "currency": "USD"
    }
  }
}
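
Assuming input_per_1m and output_per_1m are prices per one million tokens in the stated currency, which the field names suggest but this README does not spell out, the pay-per-use reference cost works out as below. apiCost is a hypothetical function, not the package's cost-calculator.

```javascript
// Pay-per-use cost in the model's currency from per-million-token prices.
function apiCost(model, inputTokens, outputTokens) {
  return (inputTokens / 1e6) * model.input_per_1m
       + (outputTokens / 1e6) * model.output_per_1m;
}

// The custom model entry from the example above.
const myCustomModel = { input_per_1m: 5, output_per_1m: 20, currency: 'USD' };
const cost = apiCost(myCustomModel, 1000000, 500000); // 5 + 10 = 15 USD
```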

Library usage

const { estimatePlan } = require('claude-subagent-budget/lib/token-estimator');
const { calcPlanCost } = require('claude-subagent-budget/lib/cost-calculator');
const { predictDuration } = require('claude-subagent-budget/lib/duration-predictor');
const { evaluateRisk } = require('claude-subagent-budget/lib/risk-evaluator');

const plan = [
  { agent: 'writer',       task: 'article', model: 'opus',    expected_chars: 5500 },
  { agent: 'codex-rescue', task: 'fc',      model: 'gpt-5.5', expected_chars: 5500 },
];
const parallelism = 6;

const tokens = estimatePlan(plan, parallelism);
const cost = calcPlanCost(plan, tokens.per_agent, parallelism);
const duration = predictDuration(plan, parallelism);
const risk = evaluateRisk({ tokens: tokens.totals, cost, parallelism, plan }, {
  blockOnQuota: true,
  blockOnContext: true,
});

if (risk.block) {
  console.error(`BLOCKED: ${risk.block_reason}`);
  process.exit(2);
}

How it works

Plan JSON
   │
   ▼
┌──────────────────────────────┐
│  token-estimator             │  Japanese chars × 1.5 + agent-specific output factors
├──────────────────────────────┤
│  cost-calculator             │  Anthropic quota / ChatGPT quota / USD reference
├──────────────────────────────┤
│  duration-predictor          │  per-agent runtime × ceil(parallelism / 5) waves
├──────────────────────────────┤
│  risk-evaluator              │  warn at 80% / block at 95% per quota
└──────────────┬───────────────┘
               ▼
        Pretty / JSON output
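
The wave-splitting step in the diagram can be checked by hand. The sketch below uses hypothetical function names and only the formula shown above; it reproduces the median from the sample output (per_article_min 28, parallelism 12) but does not attempt the p90 figure, whose derivation is not documented here.

```javascript
// From the diagram: at most 5 articles run per wave.
const WAVE_SIZE = 5;

function waves(parallelism) {
  return Math.ceil(parallelism / WAVE_SIZE);
}

// Median wall-clock estimate: one article's pipeline time, once per wave.
function medianMinutes(perArticleMin, parallelism) {
  return perArticleMin * waves(parallelism);
}

const sampleWaves = waves(12);              // 3
const sampleMedian = medianMinutes(28, 12); // 84
```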

Limitations

Token estimates are heuristic. They assume Japanese-text input; tune CHARS_PER_TOKEN in lib/token-estimator.js for English-heavy workloads.
Anthropic Max plan quota figures are approximations; calibrate from observed usage.
Per-agent runtime values are based on observed Claude Code behavior with default plans/skills. Override them via the agentRuntimes argument to predictDuration().
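
The sensitivity to language can be illustrated with a character-based heuristic. This is a sketch, not the package's estimator: estimateTokens and its tokensPerChar parameter are hypothetical, the 1.5 factor is the Japanese figure from the "How it works" diagram, and 0.25 (roughly four characters per token) is a common rule of thumb for English rather than a measured value. How CHARS_PER_TOKEN is defined inside lib/token-estimator.js is not shown in this README.

```javascript
// Rough token count from a character count and a tokens-per-character ratio.
function estimateTokens(chars, tokensPerChar) {
  return Math.ceil(chars * tokensPerChar);
}

const japanese = estimateTokens(5500, 1.5);  // 8250
const english = estimateTokens(5500, 0.25);  // 1375
```

The same 5,500-character output differs by a factor of six between the two ratios, so an English-heavy workload estimated with the Japanese default would be substantially overestimated.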

Releasing

Publishing is fully automated via GitHub Actions on tag push.

One-time setup (maintainer)

Create an npm Automation Token at https://www.npmjs.com/settings//tokens (scope: Automation).
Add it to the repo as a secret named NPM_TOKEN (Settings → Secrets and variables → Actions → New repository secret).

Each release

# Bump version in package.json
npm version patch     # or minor / major

# Push the commit + the tag created by npm version
# (v0.1.1 is an example; use the tag that npm version printed)
git push origin main
git push origin v0.1.1

The Publish to npm workflow runs automatically on v* tag push:

Runs smoke tests (test/run.js)
Verifies tag matches package.json version
Publishes to npm with provenance (--provenance)

You can also dry-run the publish from the GitHub Actions UI (Run workflow → enable "dry_run").
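
The tag check in the workflow steps above amounts to a string comparison. The workflow's actual script is not reproduced in this README, so the following is only a sketch with a hypothetical function name; it assumes tags are the package.json version prefixed with "v", which is what npm version creates by default.

```javascript
// True when a pushed tag such as "v0.1.1" names the same version as package.json.
function tagMatchesVersion(tag, pkgVersion) {
  return tag === `v${pkgVersion}`;
}
```

In a GitHub Actions job, the tag name is available as the GITHUB_REF_NAME environment variable and the version can be read from package.json.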

CI

The Test workflow runs the smoke tests on every push to main and every PR, across Node 18/20/22 on Ubuntu/macOS/Windows.

License

MIT — see LICENSE.

Contributing

Pull requests welcome. The core surface is small (~1,000 lines, 5 files). Useful extensions:

More model-pricing presets (e.g. provider-specific tiers)
Real-time quota API integration (when Anthropic/OpenAI expose it)
Historical run log → automatic recalibration of runtime/token defaults
