llm-switchboard
v1.0.4
Blazing-fast, zero-cost local LLM router. Classify and route prompts to specialized AI models (OpenAI, Claude, Gemini, Llama) with <1ms latency using heuristic rules.
🌟 Key Features
- 💸 Zero-Cost Routing: Runs 100% locally. No expensive LLM-based classification calls.
- ⚡ Ultra-Low Latency: Heuristic-based classification adds less than 1ms to your stack.
- 🧠 Tiered Intelligence: Automatically maps prompts to SIMPLE, MEDIUM, COMPLEX, or REASONING tiers.
- 🤖 Agentic Detection: Specialized logic to identify multi-step, tool-heavy tasks.
- 🌍 Multilingual Support: Native intent detection for 10+ major languages.
- 🛠️ Developer First: Type-safe, customizable, and works with Bun, Node.js, and Deno.
🚀 Why llm-switchboard?
In high-volume AI applications, using high-end models (like GPT-4o or Claude 3.5 Sonnet) for every request is a waste of both time and money. Traditional routers use another LLM call to classify the prompt, which adds latency and cost.
llm-switchboard solves this by using a high-performance heuristic engine that scores prompts across 14 weighted dimensions instantly.
📦 Installation
# Using Bun (Recommended)
bun install llm-switchboard
# Using NPM
npm install llm-switchboard
# Using Yarn
yarn add llm-switchboard

🚦 Smart Tiering System
llm-switchboard classifies every prompt into one of four tiers, allowing you to map specific models to specific task complexities.
| Tier | Task Type | Ideal For | Default Model |
| :--- | :--- | :--- | :--- |
| 🟢 SIMPLE | Utility | Greetings, yes/no, simple data extraction. | moonshot/kimi-k2.5 |
| 🟡 MEDIUM | Creative | Summarization, standard chat, basic coding. | xai/grok-code-fast-1 |
| 🔴 COMPLEX | Technical | Systems design, deep analysis, large context. | google/gemini-3.1-pro-preview |
| 🧠 REASONING| Logic | Math, proofs, complex debugging, multi-step logic. | xai/grok-4-1-fast-reasoning |
📖 Usage
⚙️ Global Configuration
Set your model preferences once at application startup.
import { configureRouter, getProductionModel } from "llm-switchboard";
// Configure your routing table
configureRouter({
tiers: {
SIMPLE: { primary: "meta-llama/llama-3-8b-instruct" },
MEDIUM: { primary: "anthropic/claude-3-haiku" }
},
agenticTiers: {
// Models highly optimized for multi-step tool use
COMPLEX: { primary: "anthropic/claude-3-5-sonnet-20241022" },
REASONING: { primary: "openai/o3-mini" }
},
overrides: {
agenticMode: true
}
});
// Get the best model for a prompt
const model = getProductionModel("What is the weather like in Tokyo?");
console.log(model); // => "meta-llama/llama-3-8b-instruct"

Configuration Parameters:
- tiers: The standard routing table mapping task complexity (SIMPLE, MEDIUM, COMPLEX, REASONING) to specific models. Each tier requires a primary model.
- agenticTiers: An alternative routing table. When agenticMode is true (or when the router automatically detects a multi-step agentic prompt), requests are routed to the models defined here instead. This lets you keep standard workloads cheap while reserving premium tool-calling models for agentic tasks.
- overrides.agenticMode: A boolean. When set to true, it forces the router to ALWAYS prefer models from the agenticTiers config, ignoring the standard tiers.
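The precedence between the two tables can be sketched as follows. This is a simplified illustration of the documented behavior, not the library's internal code; selectTable and the promptIsAgentic flag are hypothetical names introduced here.

```typescript
// Illustrative sketch of routing-table precedence (hypothetical helper).
type TierName = "SIMPLE" | "MEDIUM" | "COMPLEX" | "REASONING";
type TierTable = Partial<Record<TierName, { primary: string }>>;

interface RouterConfig {
  tiers: TierTable;
  agenticTiers?: TierTable;
  overrides?: { agenticMode?: boolean };
}

// Prefer agenticTiers when agenticMode is forced on (or the prompt was
// detected as agentic); fall back to the standard table otherwise.
function selectTable(config: RouterConfig, promptIsAgentic: boolean): TierTable {
  const agentic = config.overrides?.agenticMode || promptIsAgentic;
  if (agentic && config.agenticTiers) return config.agenticTiers;
  return config.tiers;
}

const config: RouterConfig = {
  tiers: { SIMPLE: { primary: "meta-llama/llama-3-8b-instruct" } },
  agenticTiers: { COMPLEX: { primary: "anthropic/claude-3-5-sonnet-20241022" } },
  overrides: { agenticMode: true },
};

console.log(selectTable(config, false).COMPLEX?.primary);
// => "anthropic/claude-3-5-sonnet-20241022" (agenticMode forces the agentic table)
```

Note that when agenticMode is on but no agenticTiers table is configured, a router sketched this way still falls back to the standard tiers rather than failing.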
🎯 Per-Request Overrides
Override global settings for specific, high-priority, or sensitive prompts without affecting the rest of your app.
const prompt = "Analyze this highly confidential dataset.";
const model = getProductionModel(prompt, {
customTiers: {
COMPLEX: {
primary: "local-mixtral-8x7b"
}
},
customAgenticTiers: {
// Override the global agentic tier for this request
REASONING: { primary: "deepseek/deepseek-r1" }
},
agenticMode: false // Explicitly bypass agentic routing for this single prompt
});

Per-Request Parameters:
- customTiers: Deep-merges with the global tiers mapping for this specific request.
- customAgenticTiers: Deep-merges with the global agenticTiers mapping.
- agenticMode: (boolean) Enable or disable agent-optimized model selection strictly for this prompt.
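The merge semantics can be illustrated with a small sketch: per-request entries win tier-by-tier, while tiers absent from the override keep their global value. The mergeTiers helper below is hypothetical, shown only to clarify the behavior.

```typescript
// Hypothetical sketch of per-request tier merging (not the library's code).
type TierName = "SIMPLE" | "MEDIUM" | "COMPLEX" | "REASONING";
type TierTable = Partial<Record<TierName, { primary: string }>>;

// Request-level entries replace global entries for the same tier;
// untouched tiers fall through from the global table.
function mergeTiers(global: TierTable, override: TierTable = {}): TierTable {
  return { ...global, ...override };
}

const globalTiers: TierTable = {
  SIMPLE: { primary: "meta-llama/llama-3-8b-instruct" },
  COMPLEX: { primary: "anthropic/claude-3-5-sonnet-20241022" },
};

const merged = mergeTiers(globalTiers, {
  COMPLEX: { primary: "local-mixtral-8x7b" }, // request-level override
});

console.log(merged.SIMPLE?.primary);  // => "meta-llama/llama-3-8b-instruct" (global preserved)
console.log(merged.COMPLEX?.primary); // => "local-mixtral-8x7b" (overridden for this request)
```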
📊 How it Works
The classification engine analyzes prompts across multiple dimensions including:
- Token Density: Estimating semantic weight vs. length.
- Syntactic Markers: Detecting code chunks, mathematical notation, and imperative verbs.
- Instruction Depth: Identifying complex formatting demands (JSON, Tables, CSV).
- Agentic Signatures: Multi-step planning patterns and tool-use intent.
- Domain Context: Scanning for technical terminology and high-entropy keywords.
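As a rough illustration of how a weighted heuristic might combine such signals, here is a toy classifier. The regexes, weights, and thresholds below are invented for demonstration; the package's actual 14-dimension engine is more sophisticated.

```typescript
// Toy weighted heuristic classifier (illustrative only; invented weights).
type Tier = "SIMPLE" | "MEDIUM" | "COMPLEX" | "REASONING";

function classify(prompt: string): Tier {
  // Reasoning intent short-circuits to the REASONING tier.
  const reasoning = /\b(prove|derive|step[- ]by[- ]step)\b/i.test(prompt);
  if (reasoning) return "REASONING";

  let score = 0;
  if (prompt.length > 400) score += 2;                                  // token density proxy
  if (/\bfunction\b|=>|\bclass\b/.test(prompt)) score += 2;             // syntactic markers (code)
  if (/\b(JSON|table|CSV)\b/i.test(prompt)) score += 1;                 // instruction depth
  if (/\b(architecture|distributed|algorithm)\b/i.test(prompt)) score += 2; // domain context

  if (score >= 3) return "COMPLEX";
  if (score >= 1) return "MEDIUM";
  return "SIMPLE";
}

console.log(classify("Hi there!"));                        // => "SIMPLE"
console.log(classify("Prove this theorem step by step.")); // => "REASONING"
console.log(classify("Summarize this as a JSON table."));  // => "MEDIUM"
```

Because every dimension is a cheap string test, a scorer of this shape runs in well under a millisecond, which is what makes LLM-free routing possible.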
🧪 Development & Testing
We include a comprehensive test suite to help you benchmark classification accuracy.
bun run test

📄 License
MIT © Uo1428
