
switchboard-llm

npm version · License: MIT · TypeScript · Node ≥ 18

Intelligent multi-model LLM router. Sends each prompt to the right model — automatically.


Most apps call one LLM for everything. That's like using a bulldozer to plant seeds.

Switchboard classifies each incoming prompt by task type and routes it to the model that wins on that category — not just in marketing, but in benchmarks and real cost. A bug fix goes to Codestral (beats GPT-4o on HumanEval, costs 1/10 as much). A 200-page document goes to Gemini (1M token context). A latency-critical autocomplete goes to Groq (300–800 tok/s on custom LPU silicon). A high-stakes legal summary runs on 4 frontier models simultaneously and takes the majority vote.

This is not a translation layer. This is not a marketplace. It's a router.


Quick Start

npm install switchboard-llm

import { route } from 'switchboard-llm';

// Auto-classify from prompt content
const result = await route({ prompt: 'Fix the off-by-one error in this sort function' });
// → routes to Codestral automatically

console.log(result.content);       // the answer
console.log(result.cost);          // USD, e.g. 0.000041
console.log(result.latencyMs);     // wall-clock ms

Routing Table

| Task Type | Primary | Fallback | Rationale |
|-----------|---------|----------|-----------|
| code | Codestral | DeepSeek | Top HumanEval scores at 1/10 GPT-4o cost |
| reasoning | Claude | GPT-4o | Best multi-step analysis and instruction following |
| creative | GPT-4o | Claude | Best copy, voice, and storytelling |
| multimodal | Gemini 1.5 Pro | GPT-4o | 1M token context, native multimodal |
| fast | Groq (Llama 3.3 70B) | GPT-4o mini | 300–800 tok/s on custom LPU hardware |
| research | Gemini 1.5 Pro | Claude | Long-context synthesis and document analysis |
| security | Claude | GPT-4o | Best safety-aware reasoning and threat modeling |
| rag | Cohere Command R+ | Claude | Purpose-built for retrieval-augmented generation |
| search | Perplexity Sonar | GPT-4o | Live web retrieval on every call |
| consensus | Claude + GPT-4o + Gemini + DeepSeek | — | Parallel run, majority vote |


Features

Task-aware routing. The classifier reads your prompt and picks the right model before the LLM call. No config required. You can also pass type explicitly if you know what you need.

Groq speed routing. Groq runs Llama 3.3 70B on custom LPU hardware at 300–800 tokens per second — 10–25x faster than standard GPU inference. When latency matters (autocomplete, voice, streaming), type: 'fast' routes there by default.

Self-healing fallback. Switchboard tracks rolling success rates per provider. When a provider's rate drops below 70%, it automatically promotes the fallback for that task type. No manual intervention, no 500s leaking to users.

Swarm mode. Dispatch N independent tasks to N providers in parallel and collect all results simultaneously. Useful for pipelines where multiple LLM calls would otherwise run serially.

Consensus mode. Run 4 frontier models (Claude, GPT-4o, Gemini 1.5 Pro, DeepSeek) in parallel and take the majority vote. Built for high-stakes decisions where a single model hallucinating is unacceptable.

OpenAI drop-in proxy. Start the proxy server and change one line in your existing code. Switchboard intercepts the request, routes it intelligently, and returns an OpenAI-compatible response. Zero other code changes.

MCP-native. Runs as a Model Context Protocol tool server, so it works inside Claude Code and any other MCP-compatible agent.

Fully typed. Ships with TypeScript definitions for every public interface. Zero `any`.
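
For orientation, the request and result shapes implied by the examples in this README look roughly like this (a sketch inferred from usage, not the shipped declaration file):

// Sketch only: field names are inferred from the usage examples below,
// not from the package's actual .d.ts.
type TaskType =
  | 'code' | 'reasoning' | 'creative' | 'multimodal' | 'fast'
  | 'research' | 'security' | 'rag' | 'search' | 'consensus';

interface RouteRequest {
  prompt: string;
  type?: TaskType;      // optional: auto-classified when omitted
  provider?: string;    // bypass routing entirely
  system?: string;
  maxTokens?: number;
}

interface RouteResult {
  provider: string;     // model that handled the call, e.g. "codestral-latest"
  content: string;
  cost: number;         // USD
  latencyMs: number;
}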


Installation

npm install switchboard-llm

Environment Variables

Set the API keys for the providers you want active. Unused providers are simply skipped.

# Required for defaults — set at least these two:
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...

# Optional — unlock additional routing targets:
GEMINI_API_KEY=AIza...
DEEPSEEK_API_KEY=sk-...
GROQ_API_KEY=gsk_...
MISTRAL_API_KEY=...           # enables Codestral (code routing) + Mistral Large
TOGETHER_API_KEY=...
PERPLEXITY_API_KEY=pplx-...
XAI_API_KEY=xai-...
COHERE_API_KEY=...

Usage

Auto-detect task type from prompt

import { route } from 'switchboard-llm';

const result = await route({
  prompt: 'Refactor this Python function to use a generator instead of a list',
});

console.log(result.provider);    // "codestral-latest"
console.log(result.content);
console.log(`Cost: $${result.cost.toFixed(6)}`);
console.log(`Latency: ${result.latencyMs}ms`);

Explicit task type

const result = await route({
  prompt: 'Write a cold email for a SaaS product targeting HR teams',
  type: 'creative',
});
// → routes to GPT-4o

Force a specific provider

const result = await route({
  prompt: 'Summarize these meeting notes',
  provider: 'gemini',   // bypass routing, always use this provider
});

With a system prompt

const result = await route({
  prompt: userMessage,
  type: 'reasoning',
  system: 'You are a senior software architect. Be concise and opinionated.',
  maxTokens: 2048,
});

Swarm mode — parallel dispatch

Dispatch multiple independent tasks simultaneously. Each is independently classified and routed.

import { swarm } from 'switchboard-llm';

const results = await swarm([
  { id: 'summary',  prompt: 'Summarize this 80-page contract',    type: 'research'  },
  { id: 'rewrite',  prompt: 'Rewrite this headline for LinkedIn', type: 'creative'  },
  { id: 'fix',      prompt: 'Fix the SQL injection in line 42',    type: 'code'      },
  { id: 'check',    prompt: 'Flag any GDPR compliance issues',     type: 'security'  },
]);

for (const { task, result, error } of results) {
  if (error) console.error(`${task.id} failed:`, error);
  else console.log(`${task.id} → ${result.provider} ($${result.cost.toFixed(6)})`);
}

Consensus mode — majority vote

Run 4 frontier models in parallel. The response that appears most often across models wins.

import { route } from 'switchboard-llm';
import type { ConsensusResult } from 'switchboard-llm';

const result = await route({
  prompt: 'Is this contract clause enforceable under Illinois law?',
  type: 'consensus',
}) as ConsensusResult;

console.log(result.winner.content);     // majority-vote answer
console.log(result.all.length);         // number of models that responded
console.log(result.failed);             // any that errored
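
The README does not spell out how answers are compared, so here is one plausible reading of "appears most often": group responses by a normalized form and pick the largest group. The helper below is hypothetical, not the package's actual vote logic.

// Hypothetical majority-vote sketch; the real comparison may be fuzzier.
// Assumes at least one model responded.
function majorityVote(answers: string[]): string {
  const counts = new Map<string, { canonical: string; n: number }>();
  for (const a of answers) {
    const key = a.trim().toLowerCase();   // naive normalization
    const entry = counts.get(key) ?? { canonical: a, n: 0 };
    entry.n += 1;
    counts.set(key, entry);
  }
  // Return the answer with the most supporters.
  return [...counts.values()].sort((x, y) => y.n - x.n)[0].canonical;
}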

OpenAI drop-in proxy

Start the proxy server:

npx switchboard-llm proxy --port 4141

Change one line in your existing code:

// Before:
const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

// After (zero other changes):
const client = new OpenAI({
  apiKey: 'switchboard',
  baseURL: 'http://localhost:4141/v1',
});

// Switchboard intercepts the request, routes intelligently, returns OpenAI-format response.
// Works with LangChain, LlamaIndex, Vercel AI SDK, anything that speaks OpenAI.
const response = await client.chat.completions.create({
  model: 'auto',   // Switchboard picks the model; or pass a task type here
  messages: [{ role: 'user', content: 'Fix the off-by-one error in this sort function' }],
});

MCP server

Add to your MCP config (e.g., mcp.json for Claude Code):

{
  "mcpServers": {
    "switchboard": {
      "command": "npx",
      "args": ["switchboard-llm", "mcp"]
    }
  }
}

Then use the route_prompt tool inside Claude Code or any MCP-compatible client:

route_prompt({ prompt: "...", type: "code" })

Create a custom client

For multi-tenant apps or when you need to override provider settings at runtime:

import { createClient } from 'switchboard-llm';

const client = createClient({
  providers: {
    groq: { maxTokens: 4096, costPer1kInput: 0.00059, costPer1kOutput: 0.00079 },
  },
});

const result = await client.route({ prompt: 'Translate this to Spanish', type: 'fast' });

Cost Comparison

The numbers below use real published API prices and a synthetic mixed workload of 1,000 requests distributed across task types.

Workload assumption: 30% code, 20% reasoning, 15% creative, 10% fast, 10% research, 5% security, 5% rag, 5% search. Average 500 input tokens / 400 output tokens per request.

| Routing Strategy | Avg cost / request | 1K requests | vs. GPT-4o baseline |
|------------------|--------------------|-------------|----------------------|
| Always GPT-4o | ~$0.00525 | ~$5.25 | baseline |
| Switchboard auto-route | ~$0.00112 | ~$1.12 | 79% cheaper |
| Always GPT-4o mini | ~$0.00027 | ~$0.27 | cheaper, but quality drops significantly on complex tasks |
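
The baseline row is easy to verify: with GPT-4o at $0.0025 per 1K input tokens (matching the provider table below) and an assumed $0.010 per 1K output tokens, the per-request arithmetic works out as follows.

// Sanity check for the "Always GPT-4o" row. The $0.010/1K output price is an
// assumption based on public pricing; it is not stated elsewhere in this README.
const inputTokens = 500;
const outputTokens = 400;
const costPerRequest = (inputTokens / 1000) * 0.0025 + (outputTokens / 1000) * 0.010;
console.log(costPerRequest);           // 0.00525 → ~$5.25 per 1,000 requests
console.log(1 - 0.00112 / 0.00525);    // ≈ 0.79 → the "79% cheaper" figure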

Cost breakdown by task type with smart routing:

| Task | Model Used | Cost / 1K tokens (avg) | vs. GPT-4o |
|------|------------|------------------------|------------|
| code | Codestral | $0.00040 | 92% cheaper |
| fast | Groq Llama 3.3 | $0.00069 | 87% cheaper |
| research | Gemini 1.5 Pro | $0.00313 | 40% cheaper |
| reasoning | Claude Sonnet | $0.00900 | similar |
| consensus | 4 models parallel | $0.02100 | 4x (4 calls, worth it for critical decisions) |

Cost is tracked per-call and available on every result:

const result = await route({ prompt: '...', type: 'code' });
console.log(`$${result.cost.toFixed(6)}`);   // e.g. $0.000041

Providers

| ID | Model | Provider | Input / 1K tokens | Strength |
|----|-------|----------|--------------------|----------|
| claude | claude-sonnet-4-6 | Anthropic | $0.003 | Reasoning, safety, instruction following |
| gpt4o | gpt-4o | OpenAI | $0.0025 | Creative, broad competence |
| gpt4o-mini | gpt-4o-mini | OpenAI | $0.00015 | Low-cost fallback |
| gemini | gemini-1.5-pro | Google | $0.00125 | 1M context, multimodal, research |
| gemini-flash | gemini-1.5-flash | Google | $0.000075 | Fast + cheap multimodal |
| deepseek | deepseek-chat | DeepSeek | $0.00027 | Strong code/reasoning at low cost |
| groq | llama-3.3-70b-versatile | Groq | $0.00059 | 300–800 tok/s LPU hardware speed |
| codestral | codestral-latest | Mistral | $0.00020 | Best-in-class code model |
| together | Llama 3.3 70B Turbo | Together AI | $0.00088 | 50+ open models, flexible |
| perplexity | sonar-pro | Perplexity | $0.003 | Live web search on every call |
| xai | grok-2-latest | xAI | $0.002 | Real-time X/Twitter data |
| cohere | command-r-plus | Cohere | $0.0025 | RAG-optimized, grounded generation |


Configuration

Routing rules live in src/config/routing.yaml and are fully overridable. Copy the file and set SWITCHBOARD_CONFIG to point to your version.

# routing.yaml — customize routing logic per task type
routing:
  code:
    primary: codestral
    fallback: deepseek

  # Override creative to use xAI Grok for real-time trend awareness:
  creative:
    primary: xai
    fallback: gpt4o

  # Add your own task type:
  internal-docs:
    primary: claude
    fallback: gpt4o
    description: "Internal documentation — prefers verbose, structured output"

# Classifier thresholds (token count heuristics for auto-routing):
classifier:
  simpleThreshold: 50     # prompts under 50 tokens → 'fast' route
  complexThreshold: 500   # prompts over 500 tokens → 'reasoning' route
  defaultRoute: fast

SWITCHBOARD_CONFIG=/path/to/my-routing.yaml node app.js

How Auto-Classification Works

The classifier uses a keyword-priority system with token-count heuristics as a secondary signal:

  1. Keyword scan — strong signals like `def`, `function`, a ```python fence, `SQL`, or `bug` mark a prompt as code; `imagine`, `story`, or `email` mark it as creative; and so on.
  2. Length heuristic — very short prompts (< 50 tokens) route to fast; very long prompts (> 500 tokens) route to reasoning when no stronger signal is found.
  3. Default — unclassified prompts fall back to classifier.defaultRoute (default: fast).

You can always override with type: 'reasoning' or any other task type.
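
Put together, the decision order sketched by those rules might look like this (illustrative only; the signal lists and helper are assumptions, not the package's internals):

// Illustrative sketch of the classifier's priority order; names are hypothetical.
const CODE_SIGNALS = ['def ', 'function', '```python', 'SQL', 'bug'];
const CREATIVE_SIGNALS = ['imagine', 'story', 'email'];

function classify(prompt: string, tokenCount: number): string {
  // 1. Keyword scan wins over everything else.
  if (CODE_SIGNALS.some((s) => prompt.includes(s))) return 'code';
  if (CREATIVE_SIGNALS.some((s) => prompt.toLowerCase().includes(s))) return 'creative';
  // 2. Length heuristic when no keyword matched.
  if (tokenCount < 50) return 'fast';
  if (tokenCount > 500) return 'reasoning';
  // 3. Fall back to classifier.defaultRoute from routing.yaml.
  return 'fast';
}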


Self-Healing Fallback

Switchboard tracks a rolling window of outcomes per (providerId, taskType) pair. When a provider's success rate for a task type falls below 70%, subsequent requests for that task type automatically route to the fallback provider.
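
In code terms, the promotion decision described above reduces to something like the sketch below. The 70% threshold comes from this README; the window size and bookkeeping are assumptions.

// Sketch of the self-healing decision; only the 70% threshold is documented.
const WINDOW = 50;                               // assumed rolling-window size
const outcomes = new Map<string, boolean[]>();   // key: `${providerId}/${taskType}`

function record(providerId: string, taskType: string, ok: boolean) {
  const key = `${providerId}/${taskType}`;
  const window = outcomes.get(key) ?? [];
  window.push(ok);
  if (window.length > WINDOW) window.shift();    // keep only recent outcomes
  outcomes.set(key, window);
}

function pick(primary: string, fallback: string, taskType: string): string {
  const window = outcomes.get(`${primary}/${taskType}`) ?? [];
  const successRate = window.length
    ? window.filter(Boolean).length / window.length
    : 1;                                         // no data yet: trust the primary
  return successRate < 0.7 ? fallback : primary; // 70% threshold per the README
}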

The tracker is in-memory per process. Stats are available at runtime:

import { tracker } from 'switchboard-llm';

const stats = tracker.getStats();
// [{ providerId, taskType, calls, successRate, avgLatencyMs, avgCost, totalCost }, ...]

for (const s of stats) {
  console.log(`${s.providerId}/${s.taskType}: ${(s.successRate * 100).toFixed(1)}% success, $${s.totalCost.toFixed(4)} total`);
}

CLI

# Start the OpenAI-compatible proxy
npx switchboard-llm proxy --port 4141

# Start the MCP tool server
npx switchboard-llm mcp

# Route a single prompt (stdout)
npx switchboard-llm route --prompt "Fix the null check on line 42" --type code

Contributing

Pull requests are welcome. For large changes, open an issue first.

git clone https://github.com/your-org/switchboard-llm
cd switchboard-llm
npm install
npm run build
npm test

The project follows standard TypeScript conventions. All public APIs require typed interfaces. Tests use Vitest.

Adding a provider:

  1. Add a config entry to src/config/routing.yaml
  2. If the provider is OpenAI-compatible, set adapter: openai-compat — no code needed
  3. If it needs a custom adapter, add a file in src/providers/ implementing the BaseProvider interface (a rough skeleton is sketched after this list)
  4. Add it to src/providers/registry.ts
  5. Write a test
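
As a rough starting point for step 3, an adapter presumably needs to turn a prompt into a completion plus the metadata every result carries. The interface shape below is guessed from this README's examples, not copied from src/providers/.

// Hypothetical skeleton; check src/providers/ for the real BaseProvider contract.
interface CompletionRequest {
  prompt: string;
  system?: string;
  maxTokens?: number;
}

interface CompletionResponse {
  content: string;
  cost: number;        // USD, computed from the provider's token counts
  latencyMs: number;
}

class MyProvider {
  readonly id = 'my-provider';

  async complete(req: CompletionRequest): Promise<CompletionResponse> {
    const started = Date.now();
    // ...call the provider's HTTP API here and compute cost from its usage data...
    const content = '...';
    return { content, cost: 0, latencyMs: Date.now() - started };
  }
}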

License

MIT — see LICENSE.


If this is useful, a star on GitHub helps other developers find it.