@illuma-ai/llm-router
v2.0.4
Superfast semantic routing layer for LLMs and agents. Uses embedding similarity to classify queries into configurable route categories. The library provides the routing engine; consumers define their own routes, utterances, and model mappings.
Installation
npm install @illuma-ai/llm-router
Quick Start
import { SemanticRouter, createEncoder } from '@illuma-ai/llm-router';
const encoder = createEncoder(); // defaults to Bedrock Titan; set LLM_ROUTER_ENCODER for others
const router = new SemanticRouter({
encoder,
routes: [
{
name: 'billing',
utterances: ['payment issue', 'invoice question', 'subscription cost'],
scoreThreshold: 0.3,
},
{
name: 'technical',
utterances: ['bug report', 'API error', 'integration help'],
scoreThreshold: 0.3,
},
],
topK: 5,
aggregation: 'mean',
});
await router.initialize(); // embeds all utterances (batch via encoder)
const result = await router.route('my payment failed');
console.log(result.name); // 'billing'
console.log(result.similarityScore); // 0.52
How It Works
Query → Encoder → Embedding → Cosine Similarity (vs utterance index) → Best Route + Score
- Initialize — all utterances are embedded and stored in an in-memory cosine similarity index.
- Route — the user's query is embedded (1 API call), compared against the index, and the best-matching route above its scoreThreshold is returned.
- No match — if no route clears its threshold, { name: null, similarityScore: 0 } is returned. The consumer decides the fallback.
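The pipeline above can be sketched in plain TypeScript. This is a toy illustration, not the library's internals: the `embed` stub and its 2-d vectors stand in for a real encoder, and `routeQuery` mirrors the documented threshold-and-fallback behaviour with 'mean' aggregation over each route's utterances.

```typescript
// Toy sketch of the routing pipeline: embed -> cosine similarity -> threshold.
type Route = { name: string; utterances: string[]; scoreThreshold: number };
type RouteChoice = { name: string | null; similarityScore: number };

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Hypothetical stand-in for a real encoder: maps known text to a 2-d vector.
const fakeEmbeddings: Record<string, number[]> = {
  'payment issue': [1, 0.1],
  'bug report': [0.1, 1],
  'my payment failed': [0.9, 0.2],
};
const embed = (text: string) => fakeEmbeddings[text] ?? [0, 0];

// Best route above its own threshold wins; otherwise { name: null, ... }.
function routeQuery(query: string, routes: Route[]): RouteChoice {
  const q = embed(query);
  let best: RouteChoice = { name: null, similarityScore: 0 };
  for (const route of routes) {
    // 'mean' aggregation over the route's utterance similarities.
    const scores = route.utterances.map(u => cosine(q, embed(u)));
    const mean = scores.reduce((s, x) => s + x, 0) / scores.length;
    if (mean >= route.scoreThreshold && mean > best.similarityScore) {
      best = { name: route.name, similarityScore: mean };
    }
  }
  return best;
}

const routes: Route[] = [
  { name: 'billing', utterances: ['payment issue'], scoreThreshold: 0.3 },
  { name: 'technical', utterances: ['bug report'], scoreThreshold: 0.3 },
];
console.log(routeQuery('my payment failed', routes).name); // 'billing'
```

The real router does the same comparison against a pre-built index of all utterance embeddings, so only the query itself costs an API call at routing time.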
Supported Encoders
Configure via LLM_ROUTER_ENCODER environment variable or pass programmatically.
| Encoder | Env Value | Batching | Notes |
|---------|-----------|----------|-------|
| AWS Bedrock Titan Embed v2 | bedrock (default) | 1 text/call, 5 concurrent | Symmetric encoding |
| Cohere Embed v4 (via Bedrock) | cohere-bedrock | 96 texts/call | Asymmetric encoding (recommended) |
| OpenAI Embeddings | openai | Native batch | Uses native fetch, zero extra deps |
| Custom | — | — | Extend BaseEncoder |
Why Cohere is Recommended
- Faster initialization: 96 texts per API call vs Titan's one text at a time
- Asymmetric encoding: separate search_query and search_document modes for better retrieval accuracy
- Higher score range: 0.40–0.75 (vs Titan's 0.08–0.60) gives clearer route separation
Custom Encoder
Extend BaseEncoder to plug in any embedding provider:
import { BaseEncoder, SemanticRouter } from '@illuma-ai/llm-router';
class MyEncoder extends BaseEncoder {
readonly name = 'my-encoder';
readonly type = 'custom';
readonly scoreThreshold = 0.3;
async encode(docs: string[]): Promise<number[][]> {
return Promise.all(docs.map(t => myEmbeddingService.encode(t)));
}
}
const router = new SemanticRouter({
encoder: new MyEncoder(),
routes: myRoutes,
});
await router.initialize();
Warm-Up
Pre-initialize the router at application startup to eliminate cold-start latency on the first user query:
import { SemanticRouter, createEncoder } from '@illuma-ai/llm-router';
// At server startup
const router = new SemanticRouter({ encoder: createEncoder(), routes: myRoutes });
await router.initialize();
await router.route('warm-up'); // optional: force a throwaway query to fully warm caches
The router is a singleton in your application — initialize once, route for all users.
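A common way to enforce the initialize-once rule is a module-level lazy singleton: the first caller triggers initialization, concurrent callers await the same in-flight promise, and later callers reuse the ready instance. This is a generic sketch — `lazySingleton`, `getRouter`, and the `Initializable` shape are illustrative, not part of the library:

```typescript
// Generic lazy-singleton helper: build once, initialize once, share everywhere.
interface Initializable {
  initialize(): Promise<void>;
}

function lazySingleton<T extends Initializable>(build: () => T): () => Promise<T> {
  let instance: T | null = null;
  let ready: Promise<T> | null = null;
  return () => {
    if (!ready) {
      instance = build();
      // All callers share this promise, so initialize() runs exactly once.
      ready = instance.initialize().then(() => instance as T);
    }
    return ready;
  };
}

// Hypothetical usage with a router-like object.
let initCount = 0;
const getRouter = lazySingleton(() => ({
  initialize: async () => { initCount++; },
  route: async (_q: string) => ({ name: 'billing', similarityScore: 0.5 }),
}));

(async () => {
  const [a, b] = await Promise.all([getRouter(), getRouter()]);
  console.log(a === b, initCount); // true 1
})();
```

In a real application, `build` would construct the `SemanticRouter` and `getRouter()` would be called from every request handler.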
Environment Variables
| Variable | Default | Description |
|----------|---------|-------------|
| LLM_ROUTER_ENCODER | bedrock | Encoder type: bedrock, cohere-bedrock, openai |
| LLM_ROUTER_EMBEDDING_DIMENSIONS | 1024 | Embedding dimensions (256, 512, 1024, or 1536 for Cohere) |
| BEDROCK_AWS_ACCESS_KEY_ID | — | AWS access key (falls back to AWS_ACCESS_KEY_ID) |
| BEDROCK_AWS_SECRET_ACCESS_KEY | — | AWS secret key |
| BEDROCK_AWS_DEFAULT_REGION | us-east-1 | AWS region |
| OPENAI_API_KEY | — | OpenAI API key (when using OpenAI encoder) |
Pre-built Route Definitions
The library ships with 3 pre-built model-tier routes (MODEL_TIER_ROUTES) as a reference. These are general-purpose and may not fit your use case — define your own routes for best results.
| Tier | Semantic Space |
|------|---------------|
| moderate | Greetings, Q&A, standard code, writing |
| complex | Deep analysis, debugging, architecture |
| expert | Research, academic, strategic planning |
Important: Route definitions belong in the consumer, not the library. The library provides the routing engine; you provide the utterances and model mapping that make sense for your domain.
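In practice that means the consumer owns a small mapping from route name to a concrete model, with an explicit fallback for the `{ name: null }` case. A minimal sketch — the model IDs and the `pickModel` helper are hypothetical, not library presets:

```typescript
type RouteChoice = { name: string | null; similarityScore: number };

// Hypothetical consumer-side mapping from route name to model ID.
const MODEL_BY_ROUTE: Record<string, string> = {
  moderate: 'my-fast-model',
  complex: 'my-strong-model',
  expert: 'my-frontier-model',
};
const FALLBACK_MODEL = 'my-fast-model';

function pickModel(choice: RouteChoice): string {
  // No route cleared its threshold -> router returned { name: null, ... }.
  if (choice.name === null) return FALLBACK_MODEL;
  return MODEL_BY_ROUTE[choice.name] ?? FALLBACK_MODEL;
}

console.log(pickModel({ name: 'complex', similarityScore: 0.6 })); // 'my-strong-model'
console.log(pickModel({ name: null, similarityScore: 0 }));        // 'my-fast-model'
```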
API Reference
Core Classes
| Class | Description |
|-------|-------------|
| SemanticRouter | Main router — encoder + routes, handles init and routing |
| BedrockTitanEncoder | AWS Bedrock Titan Embed v2 encoder |
| CohereBedrockEncoder | Cohere Embed v4 via AWS Bedrock |
| OpenAIEncoder | OpenAI embeddings encoder |
| BaseEncoder | Abstract base class for custom encoders |
| LocalIndex | In-memory cosine similarity index |
SemanticRouter Methods
| Method | Description |
|--------|-------------|
| initialize() | Embed all utterances and build the index. Call once at startup. |
| route(query, options?) | Route a query — returns RouteChoice ({ name, similarityScore }) |
| routeWithScores(query, options?) | Route with full breakdown — returns { choice, allScores } |
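`routeWithScores` is useful when the null fallback should consider near-misses rather than giving up immediately. A sketch of that consumer-side logic, assuming only the documented `{ choice, allScores }` shape — the 0.05 grace margin and the `routeWithGrace` name are illustrative choices, not library behaviour:

```typescript
type RouteChoice = { name: string | null; similarityScore: number };
type ScoredRoute = { name: string; score: number };

// If no route cleared its threshold, accept the best near-miss that landed
// within a small margin below it, instead of returning { name: null }.
function routeWithGrace(
  result: { choice: RouteChoice; allScores: ScoredRoute[] },
  threshold: number,
  margin = 0.05,
): RouteChoice {
  if (result.choice.name !== null) return result.choice;
  const best = [...result.allScores].sort((a, b) => b.score - a.score)[0];
  if (best && best.score >= threshold - margin) {
    return { name: best.name, similarityScore: best.score };
  }
  return result.choice;
}

console.log(routeWithGrace(
  { choice: { name: null, similarityScore: 0 },
    allScores: [{ name: 'billing', score: 0.27 }, { name: 'technical', score: 0.1 }] },
  0.3,
)); // { name: 'billing', similarityScore: 0.27 }
```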
Factory Functions
| Function | Description |
|----------|-------------|
| createEncoder(config?) | Create an encoder from config or env vars |
| registerEncoder(name, factory) | Register a custom encoder type |
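The `registerEncoder`/`createEncoder` pair suggests a simple name-to-factory registry. A generic sketch of that pattern — the `EncoderFactory` type, the `Sketch`-suffixed names, and the registry internals are illustrative, not the library's implementation:

```typescript
// Minimal name -> factory registry behind a registerEncoder/createEncoder pair.
type Encoder = { name: string; encode(docs: string[]): Promise<number[][]> };
type EncoderFactory = () => Encoder;

const registry = new Map<string, EncoderFactory>();

function registerEncoderSketch(name: string, factory: EncoderFactory): void {
  registry.set(name, factory);
}

function createEncoderSketch(name: string): Encoder {
  const factory = registry.get(name);
  if (!factory) throw new Error(`Unknown encoder: ${name}`);
  return factory(); // construct lazily, only when requested
}

// Hypothetical custom encoder registered under its own name.
registerEncoderSketch('my-encoder', () => ({
  name: 'my-encoder',
  encode: async docs => docs.map(() => [0, 0, 0]),
}));

console.log(createEncoderSketch('my-encoder').name); // 'my-encoder'
```

Keeping construction behind factories means an encoder (and its SDK client) is only instantiated when its name is actually selected.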
Legacy High-Level API
These functions are retained for backwards compatibility; new code should use SemanticRouter directly:
| Function | Description |
|----------|-------------|
| routeToModel(query, config?) | Route query to a concrete model ID (uses built-in presets) |
| createModelTierRouter(config?) | Create/get singleton router with built-in tier routes |
| warmUp(config?) | Pre-initialize the built-in singleton router |
Types
type ModelTier = 'moderate' | 'complex' | 'expert';
interface Route {
name: string;
utterances: string[];
scoreThreshold: number;
description?: string;
}
interface RouteChoice {
name: string | null;
similarityScore: number;
}
interface ScoredRoute {
name: string;
score: number;
}
Testing
npm test # Unit tests
npm run test:integration # Integration tests (requires AWS credentials)
npm run test:e2e # End-to-end accuracy tests
npm run test:coverage # Coverage report
Building
npm run build # Build CJS + ESM bundles to dist/
npm run type-check # TypeScript type checking
Issues & Support
Report bugs and request features at github.com/illuma-ai/llm-router-issues.
For licensing inquiries or modification permissions, contact: [email protected]
License
Elastic License 2.0 (ELv2) — Copyright (c) 2024-2026 Illuma AI.
You are free to use, copy, and distribute this software. You may not provide it as a hosted/managed service. See LICENSE for full terms.
