switchboard-llm
v0.1.1
Intelligent multi-model LLM router. Routes prompts to the optimal provider (Claude, GPT-4o, Gemini, DeepSeek, Groq, Mistral, Together AI, and more) based on task type, cost, and quality.
Intelligent multi-model LLM router. Sends each prompt to the right model — automatically.
Most apps call one LLM for everything. That's like using a bulldozer to plant seeds.
Switchboard classifies each incoming prompt by task type and routes it to the model that wins on that category — not just in marketing, but in benchmarks and real cost. A bug fix goes to Codestral (beats GPT-4o on HumanEval, costs 1/10 as much). A 200-page document goes to Gemini (1M token context). A latency-critical autocomplete goes to Groq (300–800 tok/s on custom LPU silicon). A high-stakes legal summary runs on 4 frontier models simultaneously and takes the majority vote.
This is not a translation layer. This is not a marketplace. It's a router.
Quick Start
```bash
npm install switchboard-llm
```

```ts
import { route } from 'switchboard-llm';

// Auto-classify from prompt content
const result = await route({ prompt: 'Fix the off-by-one error in this sort function' });
// → routes to Codestral automatically

console.log(result.content);   // the answer
console.log(result.cost);      // USD, e.g. 0.000041
console.log(result.latencyMs); // wall-clock ms
```

Routing Table
| Task Type | Primary | Fallback | Rationale |
|-----------|---------|----------|-----------|
| code | Codestral | DeepSeek | Top HumanEval scores at 1/10 GPT-4o cost |
| reasoning | Claude | GPT-4o | Best multi-step analysis and instruction following |
| creative | GPT-4o | Claude | Best copy, voice, and storytelling |
| multimodal | Gemini 1.5 Pro | GPT-4o | 1M token context, native multimodal |
| fast | Groq (Llama 3.3 70B) | GPT-4o mini | 300–800 tok/s on custom LPU hardware |
| research | Gemini 1.5 Pro | Claude | Long-context synthesis and document analysis |
| security | Claude | GPT-4o | Best safety-aware reasoning and threat modeling |
| rag | Cohere Command R+ | Claude | Purpose-built for retrieval-augmented generation |
| search | Perplexity Sonar | GPT-4o | Live web retrieval on every call |
| consensus | Claude + GPT-4o + Gemini + DeepSeek | — | Parallel run, majority vote |
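The table above is, in effect, a lookup from task type to a primary/fallback pair. A minimal sketch of that structure, with illustrative names rather than the library's actual internals:

```ts
// Hypothetical sketch of the routing table as data; not switchboard-llm's real code.
type TaskType = 'code' | 'reasoning' | 'creative' | 'fast';

interface Route {
  primary: string;
  fallback: string;
}

const routingTable: Record<TaskType, Route> = {
  code:      { primary: 'codestral', fallback: 'deepseek' },
  reasoning: { primary: 'claude',    fallback: 'gpt4o' },
  creative:  { primary: 'gpt4o',     fallback: 'claude' },
  fast:      { primary: 'groq',      fallback: 'gpt4o-mini' },
};

// Pick the primary unless it is currently marked unhealthy, then fall back.
function pickProvider(type: TaskType, unhealthy: Set<string> = new Set()): string {
  const route = routingTable[type];
  return unhealthy.has(route.primary) ? route.fallback : route.primary;
}
```

The `unhealthy` set stands in for whatever health signal drives fallback; the self-healing section below describes the real trigger (a rolling success rate).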
Features
Task-aware routing. The classifier reads your prompt and picks the right model before the LLM call. No config required. You can also pass type explicitly if you know what you need.
Groq speed routing. Groq runs Llama 3.3 70B on custom LPU hardware at 300–800 tokens per second — 10–25x faster than standard GPU inference. When latency matters (autocomplete, voice, streaming), type: 'fast' routes there by default.
Self-healing fallback. Switchboard tracks rolling success rates per provider. When a provider's rate drops below 70%, it automatically promotes the fallback for that task type. No manual intervention, no 500s leaking to users.
Swarm mode. Dispatch N independent tasks to N providers in parallel and collect all results simultaneously. Useful for pipelines where multiple LLM calls would otherwise run serially.
Consensus mode. Run 4 frontier models (Claude, GPT-4o, Gemini 1.5 Pro, DeepSeek) in parallel and take the majority vote. Built for high-stakes decisions where a single model hallucinating is unacceptable.
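The vote itself can be pictured as a normalize-and-count over the parallel responses. A hypothetical sketch; the library's actual answer-comparison logic is not shown here:

```ts
// Toy majority-vote sketch, illustrative only.
// Responses are normalized (trimmed, lowercased) so trivially different phrasings can match.
function majorityVote(responses: string[]): string {
  const counts = new Map<string, { count: number; original: string }>();
  for (const r of responses) {
    const key = r.trim().toLowerCase();
    const entry = counts.get(key) ?? { count: 0, original: r };
    entry.count += 1;
    counts.set(key, entry);
  }
  // Return the answer whose normalized form appeared most often.
  let winner = responses[0];
  let best = 0;
  for (const { count, original } of counts.values()) {
    if (count > best) {
      best = count;
      winner = original;
    }
  }
  return winner;
}
```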
OpenAI drop-in proxy. Start the proxy server and change one line in your existing code. Switchboard intercepts the request, routes it intelligently, and returns an OpenAI-compatible response. Zero other code changes.
MCP-native. Runs as a Model Context Protocol tool server, so it works inside Claude Code and any other MCP-compatible agent.
Fully typed. Ships with TypeScript definitions for every public interface. Zero `any`.
Installation
```bash
npm install switchboard-llm
```

Environment Variables
Set the API keys for the providers you want active. Unused providers are simply skipped.
```bash
# Required for defaults — set at least these two:
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...

# Optional — unlock additional routing targets:
GEMINI_API_KEY=AIza...
DEEPSEEK_API_KEY=sk-...
GROQ_API_KEY=gsk_...
MISTRAL_API_KEY=...   # enables Codestral (code routing) + Mistral Large
TOGETHER_API_KEY=...
PERPLEXITY_API_KEY=pplx-...
XAI_API_KEY=xai-...
COHERE_API_KEY=...
```

Usage
Auto-detect task type from prompt
```ts
import { route } from 'switchboard-llm';

const result = await route({
  prompt: 'Refactor this Python function to use a generator instead of a list',
});

console.log(result.provider); // "codestral-latest"
console.log(result.content);
console.log(`Cost: $${result.cost.toFixed(6)}`);
console.log(`Latency: ${result.latencyMs}ms`);
```

Explicit task type
```ts
const result = await route({
  prompt: 'Write a cold email for a SaaS product targeting HR teams',
  type: 'creative',
});
// → routes to GPT-4o
```

Force a specific provider
```ts
const result = await route({
  prompt: 'Summarize these meeting notes',
  provider: 'gemini', // bypass routing, always use this provider
});
```

With a system prompt
```ts
const result = await route({
  prompt: userMessage,
  type: 'reasoning',
  system: 'You are a senior software architect. Be concise and opinionated.',
  maxTokens: 2048,
});
```

Swarm mode — parallel dispatch
Dispatch multiple independent tasks simultaneously. Each is independently classified and routed.
```ts
import { swarm } from 'switchboard-llm';

const results = await swarm([
  { id: 'summary', prompt: 'Summarize this 80-page contract', type: 'research' },
  { id: 'rewrite', prompt: 'Rewrite this headline for LinkedIn', type: 'creative' },
  { id: 'fix', prompt: 'Fix the SQL injection in line 42', type: 'code' },
  { id: 'check', prompt: 'Flag any GDPR compliance issues', type: 'security' },
]);

for (const { task, result, error } of results) {
  if (error) console.error(`${task.id} failed:`, error);
  else console.log(`${task.id} → ${result.provider} ($${result.cost.toFixed(6)})`);
}
```

Consensus mode — majority vote
Run 4 frontier models in parallel. The response that appears most often across models wins.
```ts
import { route } from 'switchboard-llm';
import type { ConsensusResult } from 'switchboard-llm';

const result = await route({
  prompt: 'Is this contract clause enforceable under Illinois law?',
  type: 'consensus',
}) as ConsensusResult;

console.log(result.winner.content); // majority-vote answer
console.log(result.all.length);     // number of models that responded
console.log(result.failed);         // any that errored
```

OpenAI drop-in proxy
Start the proxy server:
```bash
npx switchboard-llm proxy --port 4141
```

Change one line in your existing code:
```ts
// Before:
const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

// After (zero other changes):
const client = new OpenAI({
  apiKey: 'switchboard',
  baseURL: 'http://localhost:4141/v1',
});

// Switchboard intercepts the request, routes intelligently, returns OpenAI-format response.
// Works with LangChain, LlamaIndex, Vercel AI SDK, anything that speaks OpenAI.
const response = await client.chat.completions.create({
  model: 'auto', // Switchboard picks the model; or pass a task type here
  messages: [{ role: 'user', content: 'Fix the off-by-one error in this sort function' }],
});
```

MCP server
Add to your MCP config (e.g., mcp.json for Claude Code):
```json
{
  "mcpServers": {
    "switchboard": {
      "command": "npx",
      "args": ["switchboard-llm", "mcp"]
    }
  }
}
```

Then use the `route_prompt` tool inside Claude Code or any MCP-compatible client:

```ts
route_prompt({ prompt: "...", type: "code" })
```

Create a custom client
For multi-tenant apps or when you need to override provider settings at runtime:
```ts
import { createClient } from 'switchboard-llm';

const client = createClient({
  providers: {
    groq: { maxTokens: 4096, costPer1kInput: 0.00059, costPer1kOutput: 0.00079 },
  },
});

const result = await client.route({ prompt: 'Translate this to Spanish', type: 'fast' });
```

Cost Comparison
The numbers below use real published API prices and a synthetic mixed workload of 1,000 requests distributed across task types.
Workload assumption: 30% code, 20% reasoning, 15% creative, 10% fast, 10% research, 5% security, 5% rag, 5% search. Average 500 input tokens / 400 output tokens per request.
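As a sanity check on the baseline row below, GPT-4o's published prices (roughly $0.0025 per 1K input tokens and $0.01 per 1K output tokens at the time of writing) applied to the assumed 500-in / 400-out request work out as:

```ts
// Back-of-envelope check of the "Always GPT-4o" baseline.
// Prices assumed: $0.0025 / 1K input tokens, $0.01 / 1K output tokens.
const inputTokens = 500;
const outputTokens = 400;
const costPerRequest =
  (inputTokens / 1000) * 0.0025 +  // input cost
  (outputTokens / 1000) * 0.01;    // output cost
console.log(costPerRequest); // ≈ $0.00525 per request, so ≈ $5.25 per 1,000 requests
```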
| Routing Strategy | Avg cost / request | 1K requests | vs. GPT-4o baseline |
|------------------|--------------------|-------------|---------------------|
| Always GPT-4o | ~$0.00525 | ~$5.25 | baseline |
| Switchboard auto-route | ~$0.00112 | ~$1.12 | 79% cheaper |
| Always GPT-4o mini | ~$0.00027 | ~$0.27 | cheaper, but quality drops significantly on complex tasks |
Cost breakdown by task type with smart routing:
| Task | Model Used | Cost / 1K tokens (avg) | vs. GPT-4o |
|------|------------|------------------------|------------|
| code | Codestral | $0.00040 | 92% cheaper |
| fast | Groq Llama 3.3 | $0.00069 | 87% cheaper |
| research | Gemini 1.5 Pro | $0.00313 | 40% cheaper |
| reasoning | Claude Sonnet | $0.00900 | similar |
| consensus | 4 models parallel | $0.02100 | 4x (4 calls, worth it for critical decisions) |
Cost is tracked per-call and available on every result:
```ts
const result = await route({ prompt: '...', type: 'code' });
console.log(`$${result.cost.toFixed(6)}`); // e.g. $0.000041
```

Providers
| ID | Model | Provider | Input / 1K tokens | Strength |
|----|-------|----------|--------------------|----------|
| claude | claude-sonnet-4-6 | Anthropic | $0.003 | Reasoning, safety, instruction following |
| gpt4o | gpt-4o | OpenAI | $0.0025 | Creative, broad competence |
| gpt4o-mini | gpt-4o-mini | OpenAI | $0.00015 | Low-cost fallback |
| gemini | gemini-1.5-pro | Google | $0.00125 | 1M context, multimodal, research |
| gemini-flash | gemini-1.5-flash | Google | $0.000075 | Fast + cheap multimodal |
| deepseek | deepseek-chat | DeepSeek | $0.00027 | Strong code/reasoning at low cost |
| groq | llama-3.3-70b-versatile | Groq | $0.00059 | 300–800 tok/s LPU hardware speed |
| codestral | codestral-latest | Mistral | $0.00020 | Best-in-class code model |
| together | Llama 3.3 70B Turbo | Together AI | $0.00088 | 50+ open models, flexible |
| perplexity | sonar-pro | Perplexity | $0.003 | Live web search on every call |
| xai | grok-2-latest | xAI | $0.002 | Real-time X/Twitter data |
| cohere | command-r-plus | Cohere | $0.0025 | RAG-optimized, grounded generation |
Configuration
Routing rules live in src/config/routing.yaml and are fully overridable. Copy the file and set SWITCHBOARD_CONFIG to point to your version.
```yaml
# routing.yaml — customize routing logic per task type
routing:
  code:
    primary: codestral
    fallback: deepseek

  # Override creative to use xAI Grok for real-time trend awareness:
  creative:
    primary: xai
    fallback: gpt4o

  # Add your own task type:
  internal-docs:
    primary: claude
    fallback: gpt4o
    description: "Internal documentation — prefers verbose, structured output"

# Classifier thresholds (token count heuristics for auto-routing):
classifier:
  simpleThreshold: 50    # prompts under 50 tokens → 'fast' route
  complexThreshold: 500  # prompts over 500 tokens → 'reasoning' route
  defaultRoute: fast
```

```bash
SWITCHBOARD_CONFIG=/path/to/my-routing.yaml node app.js
```

How Auto-Classification Works
The classifier uses a keyword-priority system with token-count heuristics as a secondary signal:

- Keyword scan — checks for strong signals (`def`, `function`, ` ```python `, `SQL`, `bug`, etc.) to detect code; `imagine`, `story`, `email` for creative; etc.
- Length heuristic — very short prompts (< 50 tokens) route to `fast`; very long prompts (> 500 tokens) route to `reasoning` when no stronger signal is found.
- Default — unclassified prompts fall back to `classifier.defaultRoute` (default: `fast`).
You can always override with `type: 'reasoning'` or any other task type.
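A toy version of that priority order (keywords first, then length, then the default) might look like the following. This is an illustration only; the shipped classifier and its full keyword lists are not reproduced here:

```ts
// Toy classifier sketch: keyword signals outrank length heuristics,
// which outrank the default route. Illustrative, not switchboard-llm's real code.
type TaskType = 'code' | 'creative' | 'fast' | 'reasoning';

const CODE_SIGNALS = ['def ', 'function', '```python', 'sql', 'bug'];
const CREATIVE_SIGNALS = ['imagine', 'story', 'email'];

function classify(prompt: string): TaskType {
  const p = prompt.toLowerCase();
  // 1. Keyword scan: strong signals win outright.
  if (CODE_SIGNALS.some((k) => p.includes(k))) return 'code';
  if (CREATIVE_SIGNALS.some((k) => p.includes(k))) return 'creative';
  // 2. Length heuristic: rough token count via whitespace split.
  const tokens = p.split(/\s+/).length;
  if (tokens > 500) return 'reasoning';
  if (tokens < 50) return 'fast';
  // 3. Default route.
  return 'fast';
}
```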
Self-Healing Fallback
Switchboard tracks a rolling window of outcomes per (providerId, taskType) pair. When a provider's success rate for a task type falls below 70%, subsequent requests for that task type automatically route to the fallback provider.
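A minimal sketch of that promotion rule over a fixed-size rolling window. The window size and function names here are assumptions for illustration, not the library's actual implementation:

```ts
// Sketch of a rolling success-rate tracker; illustrative only.
// Assumes a window of the last 20 outcomes per (providerId, taskType) pair.
const WINDOW = 20;
const THRESHOLD = 0.7;

const outcomes = new Map<string, boolean[]>(); // key: `${providerId}/${taskType}`

function record(providerId: string, taskType: string, ok: boolean): void {
  const key = `${providerId}/${taskType}`;
  const window = outcomes.get(key) ?? [];
  window.push(ok);
  if (window.length > WINDOW) window.shift(); // keep only the most recent outcomes
  outcomes.set(key, window);
}

// True when the provider should be demoted and its fallback promoted.
function shouldFallback(providerId: string, taskType: string): boolean {
  const window = outcomes.get(`${providerId}/${taskType}`) ?? [];
  if (window.length === 0) return false; // no data yet, trust the primary
  const successRate = window.filter(Boolean).length / window.length;
  return successRate < THRESHOLD;
}
```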
The tracker is in-memory per process. Stats are available at runtime:
```ts
import { tracker } from 'switchboard-llm';

const stats = tracker.getStats();
// [{ providerId, taskType, calls, successRate, avgLatencyMs, avgCost, totalCost }, ...]

for (const s of stats) {
  console.log(`${s.providerId}/${s.taskType}: ${(s.successRate * 100).toFixed(1)}% success, $${s.totalCost.toFixed(4)} total`);
}
```

CLI
```bash
# Start the OpenAI-compatible proxy
npx switchboard-llm proxy --port 4141

# Start the MCP tool server
npx switchboard-llm mcp

# Route a single prompt (stdout)
npx switchboard-llm route --prompt "Fix the null check on line 42" --type code
```

Contributing
Pull requests are welcome. For large changes, open an issue first.
```bash
git clone https://github.com/your-org/switchboard-llm
cd switchboard-llm
npm install
npm run build
npm test
```

The project follows standard TypeScript conventions. All public APIs require typed interfaces. Tests use Vitest.
Adding a provider:
- Add a config entry to `src/config/routing.yaml`
- If the provider is OpenAI-compatible, set `adapter: openai-compat` — no code needed
- If it needs a custom adapter, add a file in `src/providers/` implementing the `BaseProvider` interface
- Add it to `src/providers/registry.ts`
- Write a test
License
MIT — see LICENSE.
If this is useful, a star on GitHub helps other developers find it.
