@illuma-ai/llm-router
v2.0.4
Superfast semantic routing layer for LLMs and agents. Uses embedding similarity to classify queries into configurable route categories. The library provides the routing engine; consumers define their own routes, utterances, and model mappings.
Installation
npm install @illuma-ai/llm-router
Quick Start
import { SemanticRouter, createEncoder } from '@illuma-ai/llm-router';
const encoder = createEncoder(); // defaults to Bedrock Titan; set LLM_ROUTER_ENCODER for others
const router = new SemanticRouter({
encoder,
routes: [
{
name: 'billing',
utterances: ['payment issue', 'invoice question', 'subscription cost'],
scoreThreshold: 0.3,
},
{
name: 'technical',
utterances: ['bug report', 'API error', 'integration help'],
scoreThreshold: 0.3,
},
],
topK: 5,
aggregation: 'mean',
});
await router.initialize(); // embeds all utterances (batch via encoder)
const result = await router.route('my payment failed');
console.log(result.name); // 'billing'
console.log(result.similarityScore); // 0.52
How It Works
Query → Encoder → Embedding → Cosine Similarity (vs utterance index) → Best Route + Score
- Initialize — all utterances are embedded and stored in an in-memory cosine similarity index.
- Route — the user's query is embedded (1 API call), compared against the index, and the best-matching route above its scoreThreshold is returned.
- No match — if no route clears its threshold, { name: null, similarityScore: 0 } is returned. The consumer decides the fallback.
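The pipeline above can be sketched in plain TypeScript. This is a toy illustration, not the library's internals: the `embed` stub and its 2-d vectors stand in for a real encoder, and `routeQuery` mirrors the documented threshold-and-fallback behaviour with 'mean' aggregation over each route's utterances.

```typescript
// Toy sketch of the routing pipeline: embed -> cosine similarity -> threshold.
type Route = { name: string; utterances: string[]; scoreThreshold: number };
type RouteChoice = { name: string | null; similarityScore: number };

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Hypothetical stand-in for a real encoder: maps known text to a 2-d vector.
const fakeEmbeddings: Record<string, number[]> = {
  'payment issue': [1, 0.1],
  'bug report': [0.1, 1],
  'my payment failed': [0.9, 0.2],
};
const embed = (text: string) => fakeEmbeddings[text] ?? [0, 0];

// Best route above its own threshold wins; otherwise { name: null, ... }.
function routeQuery(query: string, routes: Route[]): RouteChoice {
  const q = embed(query);
  let best: RouteChoice = { name: null, similarityScore: 0 };
  for (const route of routes) {
    // 'mean' aggregation over the route's utterance similarities.
    const scores = route.utterances.map(u => cosine(q, embed(u)));
    const mean = scores.reduce((s, x) => s + x, 0) / scores.length;
    if (mean >= route.scoreThreshold && mean > best.similarityScore) {
      best = { name: route.name, similarityScore: mean };
    }
  }
  return best;
}

const routes: Route[] = [
  { name: 'billing', utterances: ['payment issue'], scoreThreshold: 0.3 },
  { name: 'technical', utterances: ['bug report'], scoreThreshold: 0.3 },
];
console.log(routeQuery('my payment failed', routes).name); // 'billing'
```

The real router does the same comparison against a pre-built index of all utterance embeddings, so only the query itself costs an API call at routing time.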
Supported Encoders
Configure via LLM_ROUTER_ENCODER environment variable or pass programmatically.
| Encoder | Env Value | Batching | Notes |
|---------|-----------|----------|-------|
| AWS Bedrock Titan Embed v2 | bedrock (default) | 1 text/call, 5 concurrent | Symmetric encoding |
| Cohere Embed v4 (via Bedrock) | cohere-bedrock | 96 texts/call | Asymmetric encoding (recommended) |
| OpenAI Embeddings | openai | Native batch | Uses native fetch, zero extra deps |
| Custom | — | — | Extend BaseEncoder |
Why Cohere is Recommended
- Faster initialization: 96 texts per API call vs Titan's one text at a time
- Asymmetric encoding: separate search_query and search_document modes for better retrieval accuracy
- Higher score range: 0.40–0.75 (vs Titan's 0.08–0.60) gives clearer route separation
Custom Encoder
Extend BaseEncoder to plug in any embedding provider:
import { BaseEncoder, SemanticRouter } from '@illuma-ai/llm-router';
class MyEncoder extends BaseEncoder {
readonly name = 'my-encoder';
readonly type = 'custom';
readonly scoreThreshold = 0.3;
async encode(docs: string[]): Promise<number[][]> {
return Promise.all(docs.map(t => myEmbeddingService.encode(t)));
}
}
const router = new SemanticRouter({
encoder: new MyEncoder(),
routes: myRoutes,
});
await router.initialize();
Warm-Up
Pre-initialize the router at application startup to eliminate cold-start latency on the first user query:
import { SemanticRouter, createEncoder } from '@illuma-ai/llm-router';
// At server startup
const router = new SemanticRouter({ encoder: createEncoder(), routes: myRoutes });
await router.initialize();
await router.route('warm-up'); // optional: force a throwaway query to fully warm caches
The router is a singleton in your application — initialize once, route for all users.
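A common way to enforce the initialize-once rule is a module-level lazy singleton: the first caller triggers initialization, concurrent callers await the same in-flight promise, and later callers reuse the ready instance. This is a generic sketch — `lazySingleton`, `getRouter`, and the `Initializable` shape are illustrative, not part of the library:

```typescript
// Generic lazy-singleton helper: build once, initialize once, share everywhere.
interface Initializable {
  initialize(): Promise<void>;
}

function lazySingleton<T extends Initializable>(build: () => T): () => Promise<T> {
  let instance: T | null = null;
  let ready: Promise<T> | null = null;
  return () => {
    if (!ready) {
      instance = build();
      // All callers share this promise, so initialize() runs exactly once.
      ready = instance.initialize().then(() => instance as T);
    }
    return ready;
  };
}

// Hypothetical usage with a router-like object.
let initCount = 0;
const getRouter = lazySingleton(() => ({
  initialize: async () => { initCount++; },
  route: async (_q: string) => ({ name: 'billing', similarityScore: 0.5 }),
}));

(async () => {
  const [a, b] = await Promise.all([getRouter(), getRouter()]);
  console.log(a === b, initCount); // true 1
})();
```

In a real application, `build` would construct the `SemanticRouter` and `getRouter()` would be called from every request handler.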
Environment Variables
| Variable | Default | Description |
|----------|---------|-------------|
| LLM_ROUTER_ENCODER | bedrock | Encoder type: bedrock, cohere-bedrock, openai |
| LLM_ROUTER_EMBEDDING_DIMENSIONS | 1024 | Embedding dimensions (256, 512, 1024, or 1536 for Cohere) |
| BEDROCK_AWS_ACCESS_KEY_ID | — | AWS access key (falls back to AWS_ACCESS_KEY_ID) |
| BEDROCK_AWS_SECRET_ACCESS_KEY | — | AWS secret key |
| BEDROCK_AWS_DEFAULT_REGION | us-east-1 | AWS region |
| OPENAI_API_KEY | — | OpenAI API key (when using OpenAI encoder) |
Pre-built Route Definitions
The library ships with 3 pre-built model-tier routes (MODEL_TIER_ROUTES) as a reference. These are general-purpose and may not fit your use case — define your own routes for best results.
| Tier | Semantic Space |
|------|---------------|
| moderate | Greetings, Q&A, standard code, writing |
| complex | Deep analysis, debugging, architecture |
| expert | Research, academic, strategic planning |
Important: Route definitions belong in the consumer, not the library. The library provides the routing engine; you provide the utterances and model mapping that make sense for your domain.
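In practice that means the consumer owns a small mapping from route name to a concrete model, with an explicit fallback for the `{ name: null }` case. A minimal sketch — the model IDs and the `pickModel` helper are hypothetical, not library presets:

```typescript
type RouteChoice = { name: string | null; similarityScore: number };

// Hypothetical consumer-side mapping from route name to model ID.
const MODEL_BY_ROUTE: Record<string, string> = {
  moderate: 'my-fast-model',
  complex: 'my-strong-model',
  expert: 'my-frontier-model',
};
const FALLBACK_MODEL = 'my-fast-model';

function pickModel(choice: RouteChoice): string {
  // No route cleared its threshold -> router returned { name: null, ... }.
  if (choice.name === null) return FALLBACK_MODEL;
  return MODEL_BY_ROUTE[choice.name] ?? FALLBACK_MODEL;
}

console.log(pickModel({ name: 'complex', similarityScore: 0.6 })); // 'my-strong-model'
console.log(pickModel({ name: null, similarityScore: 0 }));        // 'my-fast-model'
```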
API Reference
Core Classes
| Class | Description |
|-------|-------------|
| SemanticRouter | Main router — encoder + routes, handles init and routing |
| BedrockTitanEncoder | AWS Bedrock Titan Embed v2 encoder |
| CohereBedrockEncoder | Cohere Embed v4 via AWS Bedrock |
| OpenAIEncoder | OpenAI embeddings encoder |
| BaseEncoder | Abstract base class for custom encoders |
| LocalIndex | In-memory cosine similarity index |
SemanticRouter Methods
| Method | Description |
|--------|-------------|
| initialize() | Embed all utterances and build the index. Call once at startup. |
| route(query, options?) | Route a query — returns RouteChoice ({ name, similarityScore }) |
| routeWithScores(query, options?) | Route with full breakdown — returns { choice, allScores } |
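`routeWithScores` is useful when the null fallback should consider near-misses rather than giving up immediately. A sketch of that consumer-side logic, assuming only the documented `{ choice, allScores }` shape — the 0.05 grace margin and the `routeWithGrace` name are illustrative choices, not library behaviour:

```typescript
type RouteChoice = { name: string | null; similarityScore: number };
type ScoredRoute = { name: string; score: number };

// If no route cleared its threshold, accept the best near-miss that landed
// within a small margin below it, instead of returning { name: null }.
function routeWithGrace(
  result: { choice: RouteChoice; allScores: ScoredRoute[] },
  threshold: number,
  margin = 0.05,
): RouteChoice {
  if (result.choice.name !== null) return result.choice;
  const best = [...result.allScores].sort((a, b) => b.score - a.score)[0];
  if (best && best.score >= threshold - margin) {
    return { name: best.name, similarityScore: best.score };
  }
  return result.choice;
}

console.log(routeWithGrace(
  { choice: { name: null, similarityScore: 0 },
    allScores: [{ name: 'billing', score: 0.27 }, { name: 'technical', score: 0.1 }] },
  0.3,
)); // { name: 'billing', similarityScore: 0.27 }
```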
Factory Functions
| Function | Description |
|----------|-------------|
| createEncoder(config?) | Create an encoder from config or env vars |
| registerEncoder(name, factory) | Register a custom encoder type |
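The `registerEncoder`/`createEncoder` pair suggests a simple name-to-factory registry. A generic sketch of that pattern — the `EncoderFactory` type, the `Sketch`-suffixed names, and the registry internals are illustrative, not the library's implementation:

```typescript
// Minimal name -> factory registry behind a registerEncoder/createEncoder pair.
type Encoder = { name: string; encode(docs: string[]): Promise<number[][]> };
type EncoderFactory = () => Encoder;

const registry = new Map<string, EncoderFactory>();

function registerEncoderSketch(name: string, factory: EncoderFactory): void {
  registry.set(name, factory);
}

function createEncoderSketch(name: string): Encoder {
  const factory = registry.get(name);
  if (!factory) throw new Error(`Unknown encoder: ${name}`);
  return factory(); // construct lazily, only when requested
}

// Hypothetical custom encoder registered under its own name.
registerEncoderSketch('my-encoder', () => ({
  name: 'my-encoder',
  encode: async docs => docs.map(() => [0, 0, 0]),
}));

console.log(createEncoderSketch('my-encoder').name); // 'my-encoder'
```

Keeping construction behind factories means an encoder (and its SDK client) is only instantiated when its name is actually selected.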
Legacy High-Level API
These functions are retained for backwards compatibility; new code should use SemanticRouter directly:
| Function | Description |
|----------|-------------|
| routeToModel(query, config?) | Route query to a concrete model ID (uses built-in presets) |
| createModelTierRouter(config?) | Create/get singleton router with built-in tier routes |
| warmUp(config?) | Pre-initialize the built-in singleton router |
Types
type ModelTier = 'moderate' | 'complex' | 'expert';
interface Route {
name: string;
utterances: string[];
scoreThreshold: number;
description?: string;
}
interface RouteChoice {
name: string | null;
similarityScore: number;
}
interface ScoredRoute {
name: string;
score: number;
}
Testing
npm test # Unit tests
npm run test:integration # Integration tests (requires AWS credentials)
npm run test:e2e # End-to-end accuracy tests
npm run test:coverage # Coverage report
Building
npm run build # Build CJS + ESM bundles to dist/
npm run type-check # TypeScript type checking
Issues & Support
Report bugs and request features at github.com/illuma-ai/llm-router-issues.
For licensing inquiries or modification permissions, contact: [email protected]
License
Elastic License 2.0 (ELv2) — Copyright (c) 2024-2026 Illuma AI.
You are free to use, copy, and distribute this software. You may not provide it as a hosted/managed service. See LICENSE for full terms.
