anthropic-router
v0.1.0
Published
Drop-in Anthropic SDK wrapper that automatically routes requests to the cheapest model that can handle them.
Maintainers
Readme
anthropic-router
Drop-in Anthropic SDK wrapper that automatically routes messages.create() calls to the cheapest model that can reliably handle them.
npm install anthropic-routerThe problem
You default to Sonnet on every request — including the ones Haiku handles just as well. The pricing difference:
| Model | Price / 1K tokens | |---|---| | Haiku | $0.0002 | | Sonnet | $0.003 | | Opus | $0.015 |
That's a 15x spread between Haiku and Sonnet. Most apps have 40–70% of requests that are Haiku-eligible.
Projected savings
| Monthly Anthropic spend | Haiku-eligible requests | Spend with router | Monthly savings | |---|---|---|---| | $50 | 50% | $27 | $23 | | $100 | 50% | $54 | $46 | | $200 | 60% | $75 | $125 | | $500 | 60% | $187 | $313 |
Usage
// Before
import Anthropic from "@anthropic-ai/sdk";
const client = new Anthropic();
// After — zero other changes needed
import { AnthropicRouter } from "anthropic-router";
const client = new AnthropicRouter();
// Same interface
const response = await client.messages.create({
model: "auto", // or omit — defaults to auto-routing
max_tokens: 1024,
messages: [{ role: "user", content: "What is the capital of France?" }],
});
console.log(response.routing);
// {
// requested: "auto",
// selected: "claude-haiku-4-5-20251001",
// confidence: 0.94,
// signals: ["short_prompt", "factual_keywords"],
// retried: false,
// latencyMs: 0
// }Override: explicit model always wins
// Pin a specific call — bypasses routing entirely
const response = await client.messages.create({
model: "claude-sonnet-4-6", // explicit model: no routing
max_tokens: 2048,
messages: [{ role: "user", content: "Design a scalable auth system." }],
});Streaming
messages.stream() works as a drop-in: the router classifies the request and injects the selected model before opening the stream. No buffering, no retry logic on streams (that requires a full response).
const stream = client.messages.stream({
model: "auto",
max_tokens: 1024,
messages: [{ role: "user", content: "Tell me a short story." }],
});
for await (const event of stream) {
// same as SDK stream events
}Routing logic
Every request is classified by 11 heuristic signals before the API call:
| Signal | Direction | |---|---| | Short prompt (< 200 tokens) | → Haiku | | Factual keywords (what is, who is, translate, summarize...) | → Haiku | | Long prompt (> 800 tokens) | → Sonnet | | Complex keywords (analyze, debug, architecture, reason through...) | → Sonnet | | Multi-step instructions | → Sonnet | | Simple tools (≤ 2 tools, flat schema) | → neutral | | Complex tools (3+ tools or nested schemas) | → Sonnet | | Long system prompt (> 500 chars) | → Sonnet | | Deep conversation (> 5 turns) | → Sonnet |
Conservative default: routes UP (Sonnet) when confidence < 0.85 or fewer than 2 independent signals support downrouting. When in doubt, quality over cost.
Auto-retry: if a Haiku response looks truncated (stop_reason: max_tokens on a substantive prompt) or matches a refusal pattern, retries once on Sonnet automatically. routing.retried: true tells you when this happened.
Rate limit fallback: if Haiku returns 429, automatically retries on Sonnet with routing.fallback: "rate_limit".
Options
const client = new AnthropicRouter({
confidenceThreshold: 0.85, // default; raise to be more conservative
telemetry: false, // opt-in only; sends signals, never prompt text
onMisroute: (event) => {
// called when you manually override the selected model
// event: { signals, selectedModel, developerOverrideModel, confidence }
console.log("Router would have used:", event.selectedModel);
},
});Pass an existing Anthropic client as the first argument:
import Anthropic from "@anthropic-ai/sdk";
const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });
const client = new AnthropicRouter(anthropic, { telemetry: true });Routing metadata
Every messages.create() response includes a routing field:
type RoutingMetadata = {
requested: string; // "auto" or the model you passed
selected: string; // model that was actually called
selectedTier: "haiku" | "sonnet" | "opus";
confidence: number; // 0–1
signals: string[]; // which signals fired
override: string | null; // set when you pinned an explicit model
retried: boolean;
retryReason: string | null;
fallback: string | null; // e.g. "rate_limit"
latencyMs: number; // classification time (not total request latency)
};Accuracy
The classifier is tested against a labeled ground truth dataset of real API call patterns. The CI gate requires > 90% accuracy on every PR. Current: 100% on 30 examples.
Add your own real API calls to tests/ground-truth.ts to calibrate the classifier against your specific usage patterns before deploying.
What's out of scope in v0
- Streaming routing with retry:
messages.stream()classifies and routes but does not retry on low-confidence responses. Retry logic requires buffering the full response — v0.2. - Multi-provider routing (OpenAI, Gemini): planned post-v1.
License
MIT
