@dstyll/cc-clip
v0.1.0
Published
Route coding-agent prompts to the right Claude model tier (haiku/sonnet/opus) based on classified complexity.
dstyll CC Clip
Route coding-agent prompts to the right Claude model tier — haiku / sonnet / opus — based on the prompt's classified complexity. Easy work runs cheap, hard work escalates up, and you don't change how you prompt.
It works as a Claude Code UserPromptSubmit hook: every prompt is classified
locally (no model tokens, tens of milliseconds), and a directive is injected
telling the agent to delegate the task to a model-pinned subagent and relay the
result. Classification is pure TypeScript — no Python, no torch, no native
modules at runtime.
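As a rough sketch of the flow above, the hook's core can be pictured as a pure function from the stdin payload to a directive string. The `prompt` field is assumed to follow Claude Code's UserPromptSubmit payload; the `Classifier` type, `buildDirective`, `handlePayload`, and the directive wording are hypothetical and are not cc-clip's actual internals.

```typescript
// Hypothetical sketch: all names and the directive wording are illustrative assumptions.
type Tier = "easy" | "medium" | "hard";
type Classifier = (prompt: string) => Tier;

// Build the text that would be injected into the agent's context for a tier.
function buildDirective(tier: Tier): string {
  return `Delegate this task to the "${tier}" subagent and relay its result.`;
}

// Turn the raw stdin payload into a directive.
// Malformed or empty input falls back to the medium (Sonnet) tier.
function handlePayload(raw: string, classify: Classifier): string {
  let prompt = "";
  try {
    const payload: unknown = JSON.parse(raw);
    if (typeof payload === "object" && payload !== null && "prompt" in payload) {
      const value = (payload as { prompt: unknown }).prompt;
      if (typeof value === "string") prompt = value;
    }
  } catch {
    return buildDirective("medium");
  }
  return buildDirective(prompt ? classify(prompt) : "medium");
}

// Toy classifier standing in for the real one.
const toyClassify: Classifier = (p) => (p.includes("refactor") ? "hard" : "easy");
console.log(handlePayload(JSON.stringify({ prompt: "refactor the auth module" }), toyClassify));
```

Keeping the classifier as an injected function makes the payload handling easy to test without any model artifacts present.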
Install
npm i -g @dstyll/cc-clip
(You already have Node, since Claude Code runs on it.)
Set up a repo
cd your-project
cc-clip init
This is idempotent and:
- registers the UserPromptSubmit hook in .claude/settings.json
- writes easy / medium / hard subagents to .claude/agents/ (pinned to haiku / sonnet / opus)
- adds a managed routing rule to CLAUDE.md
Remove it any time with cc-clip init --remove.
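To illustrate what "idempotent" means for the hook registration, here is a minimal sketch that adds or removes a command entry in a settings object. The nested `hooks.UserPromptSubmit` shape is an assumption about the Claude Code settings format, and `registerHook` / `unregisterHook` are hypothetical helpers, not functions exported by cc-clip.

```typescript
// Hypothetical sketch of an idempotent settings update; not cc-clip's actual code.
interface HookCommand { type: "command"; command: string }
interface HookGroup { matcher?: string; hooks: HookCommand[] }
interface Settings { hooks?: Record<string, HookGroup[]>; [key: string]: unknown }

const HOOK_COMMAND = "cc-clip hook";

// Add the hook only if no existing group already runs the same command.
function registerHook(settings: Settings): Settings {
  const groups = settings.hooks?.UserPromptSubmit ?? [];
  const present = groups.some((g) => g.hooks.some((h) => h.command === HOOK_COMMAND));
  if (present) return settings;
  const entry: HookGroup = { hooks: [{ type: "command", command: HOOK_COMMAND }] };
  return {
    ...settings,
    hooks: { ...settings.hooks, UserPromptSubmit: [...groups, entry] },
  };
}

// Remove the command and drop any group left without hooks.
function unregisterHook(settings: Settings): Settings {
  const groups = (settings.hooks?.UserPromptSubmit ?? [])
    .map((g) => ({ ...g, hooks: g.hooks.filter((h) => h.command !== HOOK_COMMAND) }))
    .filter((g) => g.hooks.length > 0);
  return { ...settings, hooks: { ...settings.hooks, UserPromptSubmit: groups } };
}

const once = registerHook({});
const twice = registerHook(once);
console.log(JSON.stringify(once) === JSON.stringify(twice)); // true
```

Running the registration twice leaves the settings unchanged, which is the property that makes re-running an init step safe.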
Recommended: set a cheap orchestrator
In Claude Code, set your model with /model to Sonnet or Haiku.
cc-clip routes the heavy work to a subagent and has the orchestrator
relay the result, so a cheap orchestrator is what turns routing into real
savings — hard tasks still escalate up to Opus via the hard subagent. If you
run Opus as the orchestrator, savings only appear when post-processing stays
minimal (which the relay directive enforces).
How classification works
Three stages, all in-process:
- Heuristics: length, code fences, file references, and verb signals (refactor / migrate / architect → hard; rename / typo / format → easy). Obvious prompts short-circuit here instantly.
- Static embedding + logistic regression: a model2vec-style quantized word table (mean-pooled) feeds a pretrained softmax head producing per-tier probabilities. No neural inference, no network.
- Confidence fallback: if the top probability is below the threshold, the prompt routes to Sonnet (the safe all-rounder).
On a fresh install before the embedding artifacts are present, it falls back to heuristics + the safe tier — so it always works, just more conservatively.
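The three stages can be sketched as follows. This is a simplified illustration, not the shipped classifier: only the verb signals are modelled, the logits are passed in rather than computed from embeddings, and the confidence of 1 for heuristic hits is an arbitrary placeholder. The names `heuristicTier`, `softmax`, and `classify` are hypothetical.

```typescript
// Hypothetical sketch of the three stages; verb lists, logits, and confidences are illustrative.
type Tier = "easy" | "medium" | "hard";
type Stage = "heuristic" | "model" | "fallback";
interface Result { tier: Tier; confidence: number; stage: Stage }

const TIERS: Tier[] = ["easy", "medium", "hard"];
const HARD_VERBS = new Set(["refactor", "migrate", "architect"]);
const EASY_VERBS = new Set(["rename", "typo", "format"]);

// Stage 1: cheap verb signals; returns null when the prompt is not obvious.
function heuristicTier(prompt: string): Tier | null {
  const words = prompt.toLowerCase().split(/\W+/);
  if (words.some((w) => HARD_VERBS.has(w))) return "hard";
  if (words.some((w) => EASY_VERBS.has(w))) return "easy";
  return null;
}

// Numerically stable softmax over per-tier logits.
function softmax(logits: number[]): number[] {
  const max = Math.max(...logits);
  const exps = logits.map((x) => Math.exp(x - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Stages 2 and 3: take the most probable tier, or fall back to medium below the threshold.
function classify(prompt: string, logits: number[], threshold = 0.42): Result {
  const obvious = heuristicTier(prompt);
  if (obvious !== null) return { tier: obvious, confidence: 1, stage: "heuristic" };
  const probs = softmax(logits);
  let best: { tier: Tier; p: number } = { tier: "medium", p: 0 };
  TIERS.forEach((tier, i) => {
    const p = probs[i] ?? 0;
    if (p > best.p) best = { tier, p };
  });
  if (best.p < threshold) return { tier: "medium", confidence: best.p, stage: "fallback" };
  return { tier: best.tier, confidence: best.p, stage: "model" };
}

console.log(classify("explain this function", [0, 0, 0]).stage); // "fallback"
```

With equal logits every tier gets probability 1/3, which is below the 0.42 threshold used in the Config example, so the prompt lands on the medium tier.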
Context-aware follow-ups
Conversational follow-ups ("now do the same", "make that thread-safe") are
ambiguous on their own. When contextAware is on (default), the hook reads the
session transcript and, only for messages a detector flags as ambiguous
follow-ups, consults the last couple of turns to resolve the reference. The
prior context is blended under a capped weight that cannot override a confident
read of the current message, so it lifts follow-up accuracy without making
everything look hard. With no transcript (e.g. cc-clip classify) or the feature
off, behaviour is identical to single-message routing.
On follow-ups where prior session context is available, context-aware routing raises accuracy by about 30 points (0.62 → 0.92, held-out evaluation) with no regression on standalone prompts.
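One way to picture a capped blend that cannot override a confident read is to shrink the context weight as the current message's top probability grows. The formula below, `w = cap * (1 - top)`, the cap of 0.3, and the tie-breaking toward the harder tier are all assumptions made for illustration; the README does not specify the actual weighting.

```typescript
// Hypothetical sketch of capped context blending; the weighting rule is an assumption.
type Tier = "easy" | "medium" | "hard";
type Probs = [number, number, number]; // easy, medium, hard

// Blend prior-turn probabilities into the current ones.
// The weight never exceeds `cap` and approaches 0 as the current read gets confident.
function blendWithContext(current: Probs, prior: Probs, cap = 0.3): Probs {
  const top = Math.max(current[0], current[1], current[2]);
  const w = cap * (1 - top);
  return [
    (1 - w) * current[0] + w * prior[0],
    (1 - w) * current[1] + w * prior[1],
    (1 - w) * current[2] + w * prior[2],
  ];
}

// Pick the most probable tier; ties go to the harder tier (an assumed safe default).
function topTier(p: Probs): Tier {
  if (p[2] >= p[0] && p[2] >= p[1]) return "hard";
  return p[1] >= p[0] ? "medium" : "easy";
}

// A confident "easy" read is not moved by a hard prior context.
console.log(topTier(blendWithContext([0.9, 0.05, 0.05], [0, 0, 1]))); // "easy"
// An ambiguous read is nudged toward the prior context.
console.log(topTier(blendWithContext([0.4, 0.35, 0.25], [0, 0.1, 0.9]))); // "hard"
```

In the second call the weight is 0.3 × 0.6 = 0.18, giving roughly [0.328, 0.305, 0.367], so the hard tier wins only because the current message was ambiguous to begin with.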
Commands
| Command | What it does |
|---------|--------------|
| cc-clip classify [prompt] | Classify a prompt (arg or stdin) → JSON {tier,confidence,scores,reasons} |
| cc-clip hook | The UserPromptSubmit entry point (reads the payload on stdin) |
| cc-clip init [--remove] | Scaffold / remove the hook, subagents, and CLAUDE.md rule |
| cc-clip stats | Tier distribution + estimated cumulative savings |
| cc-clip savings [--since 7d] | Projected-savings total + extrapolated rate (estimate) |
| cc-clip train --data labels.jsonl | Refine the LR head locally from labeled prompts |
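If you consume the output of cc-clip classify from a script, a small validator helps catch shape changes early. The key names below come from the table above, but the field types (string tier, numeric confidence, a map of numeric scores, a list of string reasons) are assumptions, `parseClassifyResult` is a hypothetical helper, and the sample JSON is made up rather than captured from a real run.

```typescript
// Hypothetical sketch: field types are assumed; only the key names come from the command table.
interface ClassifyResult {
  tier: string;
  confidence: number;
  scores: Record<string, number>;
  reasons: string[];
}

// Parse and validate a JSON string, throwing on any unexpected shape.
function parseClassifyResult(json: string): ClassifyResult {
  const value: unknown = JSON.parse(json);
  if (typeof value !== "object" || value === null) throw new Error("expected a JSON object");
  const { tier, confidence, scores, reasons } = value as {
    tier?: unknown; confidence?: unknown; scores?: unknown; reasons?: unknown;
  };
  if (typeof tier !== "string") throw new Error("tier must be a string");
  if (typeof confidence !== "number") throw new Error("confidence must be a number");
  if (typeof scores !== "object" || scores === null || Array.isArray(scores)) {
    throw new Error("scores must be an object");
  }
  const scoreMap = scores as Record<string, unknown>;
  for (const key of Object.keys(scoreMap)) {
    if (typeof scoreMap[key] !== "number") throw new Error(`score ${key} must be a number`);
  }
  if (!Array.isArray(reasons) || !reasons.every((r) => typeof r === "string")) {
    throw new Error("reasons must be an array of strings");
  }
  return { tier, confidence, scores: scoreMap as Record<string, number>, reasons: reasons as string[] };
}

// Made-up sample for illustration only; not output from a real run.
const sample =
  '{"tier":"hard","confidence":0.81,"scores":{"easy":0.05,"medium":0.14,"hard":0.81},"reasons":["verb:refactor"]}';
console.log(parseClassifyResult(sample).tier); // "hard"
```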
Projected savings
Every routing decision is logged to a local JSONL file (a prompt hash, never
the raw text). cc-clip stats and cc-clip savings report how much routing is
estimated to save versus running every prompt on your baseline tier.
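Logging a hash instead of the raw prompt could look like the sketch below. SHA-256, the field names, and the `toLogLine` helper are assumptions for illustration; the README only states that a prompt hash is stored and the raw text is not.

```typescript
// Hypothetical sketch: hash algorithm and field names are assumptions.
import { createHash } from "node:crypto";

interface LogEntry {
  promptHash: string; // hex digest, never the raw prompt
  tier: string;
  timestamp: string; // ISO 8601
}

// Produce one JSONL line for a routing decision.
function toLogLine(prompt: string, tier: string, now: Date = new Date()): string {
  const promptHash = createHash("sha256").update(prompt, "utf8").digest("hex");
  const entry: LogEntry = { promptHash, tier, timestamp: now.toISOString() };
  return JSON.stringify(entry);
}

const line = toLogLine("rename foo to bar", "easy", new Date(0));
console.log(line.includes("rename")); // false
```

Each call yields a single line that can be appended to the log file, so the file stays valid JSONL even if a write is interrupted between entries.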
These are estimates, not billed amounts: at classify time the real response
token count is unknown, so savings are computed as
baseline_cost − chosen_cost using a configurable per-tier price table and
estimated token counts. Override the prices as pricing changes (see Config).
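As a worked example of baseline_cost − chosen_cost, the sketch below uses the per-token prices from the example under Config. The 1,000 input tokens and the choice to use the same token estimates for both baseline and chosen tier are assumptions, and `cost` / `estimatedSavings` are hypothetical helper names.

```typescript
// Hypothetical sketch: token counts and helper names are illustrative assumptions.
type Model = "haiku" | "sonnet" | "opus";
interface Price { input: number; output: number } // USD per token

// Prices copied from the example configuration.
const PRICES: Record<Model, Price> = {
  haiku: { input: 0.0000008, output: 0.000004 },
  sonnet: { input: 0.000003, output: 0.000015 },
  opus: { input: 0.000015, output: 0.000075 },
};

// Estimated cost of one prompt on a given model.
function cost(model: Model, inputTokens: number, outputTokens: number): number {
  const p = PRICES[model];
  return inputTokens * p.input + outputTokens * p.output;
}

// Savings estimate: what the baseline would have cost minus what the chosen tier costs.
function estimatedSavings(
  baseline: Model,
  chosen: Model,
  inputTokens: number,
  outputTokens: number,
): number {
  return cost(baseline, inputTokens, outputTokens) - cost(chosen, inputTokens, outputTokens);
}

// 1,000 input tokens (assumed) and 400 output tokens (the example's easy-tier estimate).
console.log(estimatedSavings("opus", "haiku", 1000, 400).toFixed(4)); // "0.0426"
```

Here Opus would cost 0.015 + 0.030 = 0.045 USD and Haiku 0.0008 + 0.0016 = 0.0024 USD, so the estimated saving for this one easy prompt is 0.0426 USD.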
On the evaluation dataset, routing cuts spend ~20% vs an Opus-only baseline (risk-adjusted, held-out evaluation). With an Opus orchestrator or large per-prompt context, savings are smaller — set a cheap orchestrator (Sonnet or Haiku) to get the full benefit.
Config
Resolved from built-in defaults → ~/.config/cc-clip/config.toml →
a project-level ./.cc-clip.toml. Example:
enabled = true
confidenceThreshold = 0.42
baselineModel = "opus"
# context-aware routing for ambiguous follow-ups
contextAware = true
priorTurns = 2
maxContextTokens = 400
[tierModel]
easy = "haiku"
medium = "sonnet"
hard = "opus"
[postProcessing]
easy = "relay"
medium = "relay"
hard = "light-verify"
[expectedOutputTokens]
easy = 400
medium = 1200
hard = 3000
# USD per token (override as list prices change)
[prices.haiku]
input = 0.0000008
output = 0.000004
[prices.sonnet]
input = 0.000003
output = 0.000015
[prices.opus]
input = 0.000015
output = 0.000075
Accuracy
On a held-out split the classifier (embedding+LR) reaches ~74% accuracy on the
combined dataset of standalone prompts and context-paired follow-ups.
Errors skew to the safe direction by design — hard tasks are never silently
handed to the cheapest model; uncertain prompts fall back to Sonnet. Accuracy
improves as the dataset grows — add labeled rows and re-run the pipeline
(scripts/); no code changes needed. See docs/performance.md
for full benchmark results.
Development
npm install
npm run typecheck
npm test
npm run build
The bundled model artifacts under artifacts/ are produced by the offline
pipeline in scripts/ (Python — never run by end users, only when retraining).
See scripts/README.md.
License
MIT
