anthropic-router

v0.1.0

Published

a month ago

Drop-in Anthropic SDK wrapper that automatically routes requests to the cheapest model that can handle them.

0High
0Medium
0Low

rikitrader

anthropic claude routing haiku sonnet cost-optimization llm

anthropic-router

Drop-in Anthropic SDK wrapper that automatically routes messages.create() calls to the cheapest model that can reliably handle them.

npm install anthropic-router

The problem

You default to Sonnet on every request — including the ones Haiku handles just as well. The pricing difference:

| Model | Price / 1K tokens | |---|---| | Haiku | $0.0002 | | Sonnet | $0.003 | | Opus | $0.015 |

That's a 15x spread between Haiku and Sonnet. Most apps have 40–70% of requests that are Haiku-eligible.

Projected savings

| Monthly Anthropic spend | Haiku-eligible requests | Spend with router | Monthly savings | |---|---|---|---| | $50 | 50% | $27 | $23 | | $100 | 50% | $54 | $46 | | $200 | 60% | $75 | $125 | | $500 | 60% | $187 | $313 |

Usage

// Before
import Anthropic from "@anthropic-ai/sdk";
const client = new Anthropic();

// After — zero other changes needed
import { AnthropicRouter } from "anthropic-router";
const client = new AnthropicRouter();

// Same interface
const response = await client.messages.create({
  model: "auto",  // or omit — defaults to auto-routing
  max_tokens: 1024,
  messages: [{ role: "user", content: "What is the capital of France?" }],
});

console.log(response.routing);
// {
//   requested: "auto",
//   selected: "claude-haiku-4-5-20251001",
//   confidence: 0.94,
//   signals: ["short_prompt", "factual_keywords"],
//   retried: false,
//   latencyMs: 0
// }

Override: explicit model always wins

// Pin a specific call — bypasses routing entirely
const response = await client.messages.create({
  model: "claude-sonnet-4-6",  // explicit model: no routing
  max_tokens: 2048,
  messages: [{ role: "user", content: "Design a scalable auth system." }],
});

Streaming

messages.stream() works as a drop-in: the router classifies the request and injects the selected model before opening the stream. No buffering, no retry logic on streams (that requires a full response).

const stream = client.messages.stream({
  model: "auto",
  max_tokens: 1024,
  messages: [{ role: "user", content: "Tell me a short story." }],
});

for await (const event of stream) {
  // same as SDK stream events
}

Routing logic

Every request is classified by 11 heuristic signals before the API call:

| Signal | Direction | |---|---| | Short prompt (< 200 tokens) | → Haiku | | Factual keywords (what is, who is, translate, summarize...) | → Haiku | | Long prompt (> 800 tokens) | → Sonnet | | Complex keywords (analyze, debug, architecture, reason through...) | → Sonnet | | Multi-step instructions | → Sonnet | | Simple tools (≤ 2 tools, flat schema) | → neutral | | Complex tools (3+ tools or nested schemas) | → Sonnet | | Long system prompt (> 500 chars) | → Sonnet | | Deep conversation (> 5 turns) | → Sonnet |

Conservative default: routes UP (Sonnet) when confidence < 0.85 or fewer than 2 independent signals support downrouting. When in doubt, quality over cost.

Auto-retry: if a Haiku response looks truncated (stop_reason: max_tokens on a substantive prompt) or matches a refusal pattern, retries once on Sonnet automatically. routing.retried: true tells you when this happened.

Rate limit fallback: if Haiku returns 429, automatically retries on Sonnet with routing.fallback: "rate_limit".

Options

const client = new AnthropicRouter({
  confidenceThreshold: 0.85,  // default; raise to be more conservative
  telemetry: false,            // opt-in only; sends signals, never prompt text
  onMisroute: (event) => {
    // called when you manually override the selected model
    // event: { signals, selectedModel, developerOverrideModel, confidence }
    console.log("Router would have used:", event.selectedModel);
  },
});

Pass an existing Anthropic client as the first argument:

import Anthropic from "@anthropic-ai/sdk";
const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });
const client = new AnthropicRouter(anthropic, { telemetry: true });

Routing metadata

Every messages.create() response includes a routing field:

type RoutingMetadata = {
  requested: string;       // "auto" or the model you passed
  selected: string;        // model that was actually called
  selectedTier: "haiku" | "sonnet" | "opus";
  confidence: number;      // 0–1
  signals: string[];       // which signals fired
  override: string | null; // set when you pinned an explicit model
  retried: boolean;
  retryReason: string | null;
  fallback: string | null; // e.g. "rate_limit"
  latencyMs: number;       // classification time (not total request latency)
};

Accuracy

The classifier is tested against a labeled ground truth dataset of real API call patterns. The CI gate requires > 90% accuracy on every PR. Current: 100% on 30 examples.

Add your own real API calls to tests/ground-truth.ts to calibrate the classifier against your specific usage patterns before deploying.

What's out of scope in v0

Streaming routing with retry: messages.stream() classifies and routes but does not retry on low-confidence responses. Retry logic requires buffering the full response — v0.2.
Multi-provider routing (OpenAI, Gemini): planned post-v1.

License

MIT

Published

Vulnerabilities

Links

Maintainers

Keywords

Readme

anthropic-router

The problem

Projected savings

Usage

Override: explicit model always wins

Streaming

Routing logic

Options

Routing metadata

Accuracy

What's out of scope in v0

License