llm-cost-router

v1.0.0

Published

a month ago

Automatically route LLM requests to the right model based on complexity, cost, and speed. Stop overpaying for simple prompts.

llm-router

Automatically route LLM requests to the right model based on complexity, cost, and speed. Stop paying Opus prices for tasks that Haiku can handle.

npm install llm-router

The Problem

// You're doing this:
const result = await anthropic.complete(prompt, { model: "claude-opus-4-6" })

// For EVERY request. Including:
// "Is this email valid?"           → should be Haiku ($0.00025/1K)
// "Translate this to French"       → should be Haiku
// "Summarize this article"         → should be Sonnet
// "Debug this race condition"      → fine, use Opus

Typical savings: 60–80% on LLM API costs with zero quality loss.

Quick Start

import { createRouter } from "llm-router"

const router = createRouter({
  routes: [
    { model: "claude-haiku-4-5-20251001", cost: "low",    for: ["classify", "validate"] },
    { model: "claude-sonnet-4-6",         cost: "medium", for: ["summarize", "rewrite"] },
    { model: "claude-opus-4-6",           cost: "high",   for: ["debug", "architect"] },
  ],
  fallback: "claude-sonnet-4-6",
})

// Routing decision — no API call needed
const decision = router.route("Is this email format valid?")
// { model: "claude-haiku-4-5-20251001", tier: "low", reason: "...", confidence: 0.8 }

// Full completion — attach your own provider
router.useProvider(myAnthropicProvider)
const result = await router.complete("Debug this race condition across 3 files...")
// result.model → "claude-opus-4-6" (routed automatically)
// result.usage.estimatedCost → 0.0032

Routing Strategies

`rules` (default) — Fast, zero extra API call

Analyzes prompt length, keywords, structure. Decision in under 1ms.

createRouter({ ...config, strategy: "rules" })

`smart` — Intent-based matching

Matches against a curated intent map. More accurate for ambiguous prompts.

createRouter({ ...config, strategy: "smart" })

Budget Control

const router = createRouter({
  ...config,
  budget: {
    maxCostPerRequest: 0.05,   // $0.05 hard limit per call
  },
  onBudgetExceeded: "fallback",  // "throw" | "skip" | "fallback"
})

| Behaviour | What happens | |---|---| | "throw" | Throws an error before the API call | | "skip" | Returns empty response, no API call | | "fallback" | Downgrades to cheapest available model |

Cost Estimation

const cost = router.estimateCost("classify this as spam or not")
// → 0.000018  (estimated USD before making the call)

Force a Model

const result = await router.complete(prompt, {
  forceModel: "claude-opus-4-6",   // bypass routing
})

Observe Routing Decisions

createRouter({
  ...config,
  onRoute: (prompt, model, reason) => {
    console.log(`[router] ${model} — ${reason}`)
  },
})

Supported Models (built-in cost table)

| Model | Tier | Cost/1K tokens | |---|---|---| | claude-haiku-4-5-20251001 | low | $0.00025 | | claude-sonnet-4-6 | medium | $0.003 | | claude-opus-4-6 | high | $0.015 | | gpt-4o-mini | low | $0.00015 | | gpt-4o | medium | $0.005 | | gemini-1.5-flash | low | $0.000075 |

Any model string works — unknown models fall back to a default cost estimate.

TypeScript

Fully typed. All config, decisions, and responses are typed end to end.

import type { RouterConfig, RoutingDecision, LLMResponse } from "llm-router"

License

MIT

Published

Vulnerabilities

Links

Maintainers

Keywords

Readme