@orchestrated-ai/semantic-router

v1.0.6

Published

8 months ago

Hybrid semantic router with BM25 + dense embedding search


startlabtech

semantic router bm25 embeddings nlp

Semantic Router Package

Technical documentation for the core semantic-router library

📦 Installation

npm install semantic-router
# or
pnpm add semantic-router

🚀 Quick Start

1. Define Your Routes

Create a routes.json file with your intent definitions:

[
  {
    "id": "greeting",
    "name": "Greeting",
    "description": "Handle greetings and hello messages",
    "keywords": ["hello", "hi", "greetings"],
    "examples": ["hello", "hi there", "hey", "good morning", "greetings"],
    "metadata": {
      "category": "social",
      "priority": 1,
      "responseType": "friendly"
    }
  },
  {
    "id": "help_request",
    "name": "Help Request",
    "description": "User needs help or assistance",
    "examples": ["help", "I need help", "can you assist me", "support please"],
    "metadata": {
      "category": "support",
      "priority": 0,
      "urgent": true
    }
  }
]
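
Before training, it can help to sanity-check the routes file. The sketch below is not part of the library: the `RouteDefinition` interface and `validateRoutes` helper are hypothetical names that mirror the fields shown in the JSON above, and the two checks (unique IDs, at least one example) are assumptions about what a reasonable file looks like.

```typescript
// Hypothetical shape mirroring the routes.json fields shown above.
interface RouteDefinition {
  id: string;
  name: string;
  description: string;
  keywords?: string[];
  examples: string[];
  metadata?: Record<string, unknown>;
}

// Returns human-readable problems; an empty list means no issue was found.
function validateRoutes(routes: RouteDefinition[]): string[] {
  const problems: string[] = [];
  const seen = new Set<string>();
  for (const route of routes) {
    if (seen.has(route.id)) {
      problems.push(`duplicate id: ${route.id}`);
    }
    seen.add(route.id);
    if (route.examples.length === 0) {
      problems.push(`route ${route.id} has no examples`);
    }
  }
  return problems;
}

const sampleRoutes: RouteDefinition[] = [
  {
    id: "greeting",
    name: "Greeting",
    description: "Handle greetings and hello messages",
    keywords: ["hello", "hi", "greetings"],
    examples: ["hello", "hi there", "hey"],
  },
  {
    id: "help_request",
    name: "Help Request",
    description: "User needs help or assistance",
    examples: ["help", "I need help"],
  },
];

console.log(validateRoutes(sampleRoutes));
```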

2. Train the Model

semantic-router fit --routes routes.json --out .semantic-router --embedder base

This generates:

BM25 search index for keyword matching
Dense embeddings for semantic similarity
Strongly typed TypeScript definitions
Metadata and configuration

3. Use in Your Application

import { routeQuery } from "semantic-router/inference";

// Automatic type safety thanks to generated types!
const result = await routeQuery("hello there");

if (result.winningRoute) {
  console.log(`Route: ${result.winningRoute.name}`);
  console.log(`ID: ${result.winningRoute.id}`); // Typed as union of your route IDs
  console.log(`Score: ${result.ranking[0]?.hybridScore}`);
}
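
Since `winningRoute` can be `null`, callers usually need an explicit fallback. The following is a minimal sketch of dispatching on the route ID; `MinimalResult`, `handlers`, and `dispatch` are hypothetical names, and the local result type is only a stand-in for the library's own.

```typescript
// Local stand-in for the routing result; the real type comes from the library.
type RouteId = "greeting" | "help_request";

interface MinimalResult {
  winningRoute: { id: RouteId; name: string } | null;
}

// Hypothetical handlers keyed by route ID.
const handlers: Record<RouteId, (query: string) => string> = {
  greeting: () => "Hello! How can I help?",
  help_request: (query) => `Opening a support ticket for: ${query}`,
};

function dispatch(query: string, result: MinimalResult): string {
  // No winning route was selected: fall back explicitly instead of guessing.
  if (result.winningRoute === null) {
    return "Sorry, I did not understand that.";
  }
  return handlers[result.winningRoute.id](query);
}
```

Keying the handler table by `RouteId` means that adding a route without a handler becomes a compile-time error.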

📚 API Reference

Training API

`fitRoutes(routes, options)`

Trains a semantic router model from route definitions.

import { fitRoutes } from "semantic-router";
import { HashEmbedder } from "semantic-router/embedders";

const routes = [
  /* your routes */
];
const embedder = new HashEmbedder();

const artifacts = await fitRoutes(routes, {
  embedder,
  outDir: "./.semantic-router",
  split: {
    seed: 42,
    proportions: { train: 0.8, val: 0.1, test: 0.1 },
  },
  autoTune: {
    enabled: true,
    target: "balanced", // or "accuracy" | "speed" | "recall"
    maxConfigs: 20,
  },
});

Options:

embedder: Embedding model (HashEmbedder, OpenAIEmbedder)
outDir: Output directory for artifacts
split: Train/validation/test split configuration
leakage: Controls which route fields (name, description, keywords, examples) enter the training corpus
bm25: BM25 parameters (k1, b, k)
autoTune: Automatic hyperparameter optimization
onProgress: Progress callback function

Inference API

`routeQuery(query, options?)`

Routes a query to the best matching route.

import { routeQuery } from "semantic-router/inference";

const result = await routeQuery("I need assistance", {
  sqlitePath: "./.semantic-router/meta.db", // Auto-detected if not provided
  topK: 20,
  alpha: 0.5,
  decisionThreshold: 0.3,
  optimizationTarget: "balanced",
});

Options:

embedder: Override embedder (auto-loaded from training)
sqlitePath: Database path (defaults to .semantic-router/meta.db)
topK: Number of candidates to consider (auto-tuned if available)
alpha: Relative weight of BM25 versus dense embedding scores in the hybrid score (auto-tuned if available)
decisionThreshold: Minimum score for route selection (auto-tuned if available)
optimizationTarget: Which auto-tune result to use
tokenizer: Custom tokenization function

Returns:

interface RoutingResult {
  winningRoute: Route | null;
  ranking: RankingRow[];
  topKUsed: number;
  explanation: string;
}
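
To make `alpha` and `decisionThreshold` concrete, here is one plausible way a hybrid score could be formed. This is a sketch under stated assumptions, not the library's confirmed formula: it assumes a linear blend in which `alpha` weights the BM25 side, and that both input scores are already normalized to a comparable range. The failure-analysis sample in this document (0.8 and 0.7 combining to 0.75) is at least consistent with such a blend at `alpha = 0.5`. `ScoredCandidate`, `hybridScore`, and `pickWinner` are hypothetical names.

```typescript
interface ScoredCandidate {
  routeId: string;
  bm25Score: number; // assumed normalized
  denseScore: number; // assumed normalized
}

// Assumption: alpha weights BM25 and (1 - alpha) weights the dense score.
function hybridScore(candidate: ScoredCandidate, alpha: number): number {
  return alpha * candidate.bm25Score + (1 - alpha) * candidate.denseScore;
}

// Returns the best route ID, or null when nothing reaches the threshold.
function pickWinner(
  candidates: ScoredCandidate[],
  alpha: number,
  decisionThreshold: number,
): string | null {
  let best: { routeId: string; score: number } | null = null;
  for (const candidate of candidates) {
    const score = hybridScore(candidate, alpha);
    if (best === null || score > best.score) {
      best = { routeId: candidate.routeId, score };
    }
  }
  return best !== null && best.score >= decisionThreshold ? best.routeId : null;
}
```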

`typedRouteQuery<T>(query, options?)`

Type-safe version that works with generated route types:

import { typedRouteQuery } from "semantic-router/inference";
import type { GeneratedRoute } from "semantic-router/inference";

const result = await typedRouteQuery<GeneratedRoute>("hello");
// result.winningRoute is now strongly typed!

Embedders

HashEmbedder (Deterministic Baseline)

import { HashEmbedder } from "semantic-router/embedders";

const embedder = new HashEmbedder(256); // 256-dimensional
const embeddings = await embedder.embed(["hello", "world"]);

Deterministic: Same input always produces same output
Fast: No network calls or GPU required
Baseline: Good for testing and development
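
For intuition about why a hash-based embedder is deterministic and needs no network, here is a small feature-hashing sketch. It is not the library's `HashEmbedder` implementation; the hash function (an FNV-1a-style hash over UTF-16 code units), the tokenization, and the signed-bucket scheme are all assumptions chosen for illustration.

```typescript
// FNV-1a-style 32-bit hash over UTF-16 code units (deterministic, no dependencies).
function fnv1a(text: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0;
}

// Feature hashing: every token adds +1 or -1 to exactly one bucket.
function hashEmbed(text: string, dimensions = 256): number[] {
  const vector = new Array<number>(dimensions).fill(0);
  const tokens = text.toLowerCase().split(/\W+/).filter((t) => t.length > 0);
  for (const token of tokens) {
    const h = fnv1a(token);
    const bucket = h % dimensions;
    const sign = h >>> 31 === 0 ? 1 : -1; // top bit decides the sign
    vector[bucket] = (vector[bucket] ?? 0) + sign;
  }
  // L2-normalize so cosine similarity reduces to a dot product.
  const norm = Math.sqrt(vector.reduce((sum, x) => sum + x * x, 0));
  return norm === 0 ? vector : vector.map((x) => x / norm);
}
```

Vectors like these capture token overlap rather than meaning, which is why a hash embedder suits tests and baselines better than production semantic matching.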

OpenAIEmbedder

import { OpenAIEmbedder } from "semantic-router/embedders";

const embedder = new OpenAIEmbedder("text-embedding-3-small");
const embeddings = await embedder.embed(["hello", "world"]);

High Quality: State-of-the-art semantic understanding
Requires API Key: Set OPENAI_API_KEY environment variable
Network Dependent: Requires internet connection

CLI Usage

Fit Command

semantic-router fit [options]

Options:
  --routes <file>          Routes JSON file (required)
  --out <dir>              Output directory (default: ".semantic-router")
  --embedder <type>        Embedder: "base" or "openai" (default: "base")
  --seed <seed>            Random seed (default: "42")
  --train <p>              Train split proportion (default: "0.8")
  --val <p>                Validation split proportion (default: "0.1")
  --test <p>               Test split proportion (default: "0.1")
  --bm25-k1 <k1>           BM25 k1 parameter
  --bm25-b <b>             BM25 b parameter
  --bm25-k <k>             BM25 k parameter
  --auto-tune              Enable auto-tuning
  --tune-target <target>   Optimization target (default: "balanced")
  --tune-max-configs <n>   Max configurations to test (default: 20)
  --tune-timeout <ms>      Max tuning time in milliseconds

🔧 Advanced Configuration

BM25 Parameters

Control sparse retrieval behavior:

const bm25Config = {
  k1: 1.2, // Term frequency saturation (0.8-2.0)
  b: 0.75, // Document length normalization (0.55-0.75)
  k: 1.0, // IDF management
  fldWeights: {
    name: 3.0, // Route name gets 3x weight
    description: 1.0, // Description gets normal weight
    keywords: 2.0, // Keywords get 2x weight
  },
};
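
The roles of `k1` and `b` are easiest to see in the standard Okapi BM25 term score. The sketch below uses the textbook formula with the common `ln(1 + ...)` IDF variant; it is an illustration, not the library's scoring code, and it deliberately leaves out the `k` and `fldWeights` settings because this document does not specify how they are applied. `Bm25Params` and `bm25TermScore` are hypothetical names.

```typescript
interface Bm25Params {
  k1: number; // term frequency saturation
  b: number; // document length normalization
}

// Score contribution of one query term for one document.
function bm25TermScore(
  termFrequency: number,
  docLength: number,
  avgDocLength: number,
  docCount: number,
  docsWithTerm: number,
  { k1, b }: Bm25Params,
): number {
  // Rare terms get a larger inverse document frequency.
  const idf = Math.log(1 + (docCount - docsWithTerm + 0.5) / (docsWithTerm + 0.5));
  // b = 0 ignores document length; b = 1 normalizes fully.
  const lengthNorm = 1 - b + b * (docLength / avgDocLength);
  // Higher k1 lets repeated terms keep adding score for longer.
  return (idf * (termFrequency * (k1 + 1))) / (termFrequency + k1 * lengthNorm);
}

const params: Bm25Params = { k1: 1.2, b: 0.75 };
const once = bm25TermScore(1, 10, 10, 10, 1, params);
const fiveTimes = bm25TermScore(5, 10, 10, 10, 1, params);
console.log({ once, fiveTimes }); // fiveTimes is larger, but by far less than 5x
```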

Auto-Tuning

Automatically find optimal hyperparameters:

const autoTuneConfig = {
  enabled: true,
  target: "balanced", // Optimize for balanced performance
  maxConfigs: 50, // Test up to 50 configurations
  timeoutMs: 600000, // 10 minute timeout
};

Optimization Targets:

accuracy: Maximize accuracy@1 (precision)
recall: Maximize hit rate@K (coverage)
speed: Minimize latency
balanced: Balance all metrics
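
The two quality metrics named above can be computed from ranked results as follows. This is a minimal sketch of the usual definitions of accuracy@1 and hit rate@K; the `EvalCase` shape and function names are hypothetical, and how the library combines metrics for the `balanced` target is not specified in this document.

```typescript
interface EvalCase {
  expectedRoute: string;
  rankedRouteIds: string[]; // best match first
}

// Fraction of cases whose top-ranked route is the expected one.
function accuracyAt1(cases: EvalCase[]): number {
  if (cases.length === 0) return 0;
  const hits = cases.filter((c) => c.rankedRouteIds[0] === c.expectedRoute).length;
  return hits / cases.length;
}

// Fraction of cases whose expected route appears anywhere in the top K.
function hitRateAtK(cases: EvalCase[], k: number): number {
  if (cases.length === 0) return 0;
  const hits = cases.filter((c) =>
    c.rankedRouteIds.slice(0, k).includes(c.expectedRoute),
  ).length;
  return hits / cases.length;
}

const evalCases: EvalCase[] = [
  { expectedRoute: "greeting", rankedRouteIds: ["greeting", "help_request"] },
  { expectedRoute: "help_request", rankedRouteIds: ["greeting", "help_request"] },
];

console.log(accuracyAt1(evalCases)); // 0.5
console.log(hitRateAtK(evalCases, 2)); // 1
```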

Data Splits

Control training data splits:

const splitConfig = {
  seed: 42,
  proportions: {
    train: 0.7, // 70% for training
    val: 0.2, // 20% for validation/tuning
    test: 0.1, // 10% for final evaluation
  },
  minExamplesPerRoute: 3, // Minimum examples per route
  enforceAllSplitsPerRoute: true, // Ensure each route appears in all splits
};
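
A seed makes the split reproducible: the same examples and seed always land in the same partitions. The sketch below shows one way to do this for a single route's examples, using a small seeded generator (mulberry32) and a Fisher-Yates shuffle. It is an assumption-laden illustration, not the library's splitting code; `splitExamples` and its rounding behaviour are hypothetical.

```typescript
// mulberry32: a small deterministic PRNG seeded with a 32-bit integer.
function mulberry32(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

interface ExampleSplit {
  train: string[];
  val: string[];
  test: string[];
}

function splitExamples(
  examples: string[],
  seed: number,
  proportions: { train: number; val: number },
): ExampleSplit {
  const rand = mulberry32(seed);
  const shuffled = [...examples];
  // Fisher-Yates shuffle driven by the seeded generator.
  for (let i = shuffled.length - 1; i > 0; i--) {
    const j = Math.floor(rand() * (i + 1));
    const tmp = shuffled[i] as string;
    shuffled[i] = shuffled[j] as string;
    shuffled[j] = tmp;
  }
  const trainEnd = Math.round(shuffled.length * proportions.train);
  const valEnd = trainEnd + Math.round(shuffled.length * proportions.val);
  return {
    train: shuffled.slice(0, trainEnd),
    val: shuffled.slice(trainEnd, valEnd),
    test: shuffled.slice(valEnd), // whatever remains after train and val
  };
}
```

With very few examples per route, rounding can leave a partition empty, which is presumably what options such as `minExamplesPerRoute` and `enforceAllSplitsPerRoute` guard against.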

Leakage Control

Control what metadata is included during training:

const leakageConfig = {
  includeName: true, // Include route names in training corpus
  includeDescription: true, // Include descriptions
  includeKeywords: true, // Include keywords
  includeExamplesFrom: "trainOnly", // Only use training examples
};
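
One way to picture these flags is as switches on the text that gets indexed for each route. The sketch below assembles that text from a route and its training examples. It is a guess at the behaviour for illustration only: `buildIndexText` is a hypothetical helper, and it models `"trainOnly"` simply by accepting only the training examples as input.

```typescript
interface LeakageConfig {
  includeName: boolean;
  includeDescription: boolean;
  includeKeywords: boolean;
}

interface RouteFields {
  name: string;
  description: string;
  keywords?: string[];
}

// Builds the text indexed for one route; held-out examples are never passed in.
function buildIndexText(
  route: RouteFields,
  trainExamples: string[],
  leakage: LeakageConfig,
): string {
  const parts: string[] = [];
  if (leakage.includeName) parts.push(route.name);
  if (leakage.includeDescription) parts.push(route.description);
  if (leakage.includeKeywords) parts.push(...(route.keywords ?? []));
  parts.push(...trainExamples);
  return parts.join("\n");
}
```

Keeping validation and test examples out of the indexed text matters because evaluating on text the index has already seen would overstate accuracy.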

📊 Generated Types

After training, strongly typed definitions are automatically generated:

// Generated in .semantic-router/types/generated-routes.d.ts
declare module 'semantic-router/inference' {
  export interface GreetingRoute extends Route {
    readonly id: 'greeting';
    readonly name: 'Greeting';
    readonly metadata: {
      readonly category: 'social';
      readonly priority: 1;
      readonly responseType: 'friendly';
    };
  }

  export type GeneratedRoute = GreetingRoute | HelpRequestRoute | ...;
  export type RouteId = 'greeting' | 'help_request' | ...;
  export type RouteName = 'Greeting' | 'Help Request' | ...;
}

This enables:

Autocomplete in IDEs
Compile-time type checking
Type narrowing based on route.id
Metadata type safety
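
The narrowing mentioned above works because `id` is a literal type on every generated interface, so the union is discriminated. The sketch below redeclares the types locally to stay self-contained; the `HelpRequestRoute` shape is an assumption inferred from the `routes.json` example, since the generated output for it is not shown, and `describeRoute` is a hypothetical function.

```typescript
// Local stand-ins for the generated types.
interface GreetingRoute {
  readonly id: "greeting";
  readonly name: "Greeting";
  readonly metadata: {
    readonly category: "social";
    readonly priority: 1;
    readonly responseType: "friendly";
  };
}

// Assumed shape, inferred from the help_request entry in routes.json.
interface HelpRequestRoute {
  readonly id: "help_request";
  readonly name: "Help Request";
  readonly metadata: {
    readonly category: "support";
    readonly priority: 0;
    readonly urgent: true;
  };
}

type GeneratedRoute = GreetingRoute | HelpRequestRoute;

function describeRoute(route: GeneratedRoute): string {
  switch (route.id) {
    case "greeting":
      // Narrowed to GreetingRoute: responseType exists only here.
      return `reply in a ${route.metadata.responseType} tone`;
    case "help_request":
      // Narrowed to HelpRequestRoute: urgent exists only here.
      return route.metadata.urgent ? "escalate immediately" : "queue for support";
  }
}
```

Accessing `route.metadata.urgent` outside the `help_request` branch would fail to compile, which is the metadata type safety the list refers to.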

🧪 Testing

# Run all tests
pnpm test

# Run specific test files
pnpm test tests/integration.test.ts

# Watch mode
pnpm test --watch

Integration Testing

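import { fitRoutes } from "semantic-router";
import { routeQuery } from "semantic-router/inference";
import { HashEmbedder } from "semantic-router/embedders";

// Route definitions, for example the contents of routes.json
const routes = [
  /* your routes */
];
// The deterministic embedder keeps test runs reproducible
const embedder = new HashEmbedder();

// Train a model into a throwaway directory
await fitRoutes(routes, { embedder, outDir: "test-output" });

// Run inference against the freshly trained artifacts
const result = await routeQuery("test query", {
  sqlitePath: "test-output/meta.db",
});

// `expect` is provided by your test runner
expect(result.winningRoute?.id).toBe("expected_route");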

🛠️ Development

Building

pnpm build

File Structure

packages/semantic-router/
├── src/
│   ├── cli/              # CLI commands and components
│   ├── inference/        # Runtime inference engine
│   └── shared/           # Shared utilities and types
├── tests/               # Test files
├── dist/               # Built package
└── package.json

🔍 Debugging

Enable Debug Logging

// Set environment variable
process.env.DEBUG = "semantic-router:*";

Inspect Training Results

import { RouteDB } from "semantic-router";

const db = new RouteDB(".semantic-router/meta.db");
const metadata = await db.getMeta("embedder");
const tuningResults = await db.getBestTuningResult("balanced");

Analyze Failures

Training generates detailed failure analysis:

{
  "query": "help me please",
  "expectedRoute": "help_request",
  "predictedRoute": "greeting",
  "hybridScore": 0.75,
  "ranking": [
    {
      "routeId": "greeting",
      "bm25Score": 0.8,
      "denseScore": 0.7,
      "hybridScore": 0.75
    }
  ]
}
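
When there are many failure records, counting which route pairs get confused most often is a quick way to decide where to add examples. The sketch below assumes the records have the fields shown in the sample above; `FailureRecord` and `countConfusions` are hypothetical names, and how the records are loaded is left out because this document does not say where they are written.

```typescript
interface FailureRecord {
  query: string;
  expectedRoute: string;
  predictedRoute: string;
  hybridScore: number;
}

// Counts failures per "expected -> predicted" pair.
function countConfusions(failures: FailureRecord[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const failure of failures) {
    const key = `${failure.expectedRoute} -> ${failure.predictedRoute}`;
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return counts;
}

const failures: FailureRecord[] = [
  { query: "help me please", expectedRoute: "help_request", predictedRoute: "greeting", hybridScore: 0.75 },
  { query: "hi, can you help", expectedRoute: "help_request", predictedRoute: "greeting", hybridScore: 0.6 },
];

console.log(countConfusions(failures).get("help_request -> greeting")); // 2
```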

📈 Performance Tips

Optimization

Route Count: Keep under 100 routes for best performance
Example Quality: 5-10 diverse examples per route
Keywords: Include important domain terms
Auto-tuning: Let the system optimize parameters
Caching: Embeddings are cached automatically

Production Deployment

// Pre-load model at startup
import { routeQuery } from "semantic-router/inference";

// Warm up the model
await routeQuery("warmup query");

// Now ready for production traffic

🐛 Troubleshooting

Common Issues

No routes found:

Check that .semantic-router/meta.db exists
Verify routes were trained with semantic-router fit

Type errors:

Ensure you ran semantic-router fit to generate types
Check that generated types are in your TypeScript path

Low accuracy:

Add more diverse examples per route
Enable auto-tuning with --auto-tune
Adjust decision threshold

Performance issues:

Reduce topK parameter
Use HashEmbedder for faster inference
Check that embedding caching is active (embeddings are normally cached automatically)