@kb-labs/llm-router
v1.2.0
Published
Adaptive LLM router with tier-based model selection and fallback support.
Maintainers
Readme
@kb-labs/llm-router
Adaptive LLM Router with tier-based model selection for KB Labs Platform.
Overview
LLM Router provides an abstraction layer that isolates plugins from LLM providers and models. Plugins specify what they need (tier + capabilities), and the platform decides how to fulfill it.
Key Principles:
- Plugin isolation - Plugins don't know about providers/models
- User-defined tiers - Users decide what "small/medium/large" means for them
- Adaptive resolution - Platform adapts to available models
- Simplify by default - Minimal config works out of the box
Installation
pnpm add @kb-labs/llm-routerQuick Start
Plugin Usage (via SDK)
import { useLLM } from '@kb-labs/sdk';
// Simple - uses configured default tier
const llm = useLLM();
// Request specific tier
const llm = useLLM({ tier: 'small' }); // Simple tasks
const llm = useLLM({ tier: 'large' }); // Complex tasks
// Request capabilities
const llm = useLLM({ tier: 'medium', capabilities: ['coding'] });
// Use LLM
if (llm) {
const result = await llm.complete('Generate commit message');
console.log(result.content);
}Configuration
Minimal config in kb.config.json:
{
"platform": {
"adapters": {
"llm": "@kb-labs/adapters-openai"
},
"adapterOptions": {
"llm": {
"tier": "medium",
"defaultModel": "gpt-4o"
}
}
}
}Centralized Cache/Stream Defaults
Platform can manage cache/stream defaults centrally via adapterOptions.llm.executionDefaults.
Plugins then use plain useLLM() and get consistent behavior by default.
{
"platform": {
"adapterOptions": {
"llm": {
"defaultTier": "medium",
"executionDefaults": {
"cache": {
"mode": "prefer",
"scope": "segments",
"ttlSec": 3600
},
"stream": {
"mode": "prefer",
"fallbackToComplete": true
}
}
}
}
}
}Plugin-level override remains available as an escape hatch:
const llm = useLLM({
tier: 'medium',
execution: {
cache: { mode: 'require', key: 'mind-rag:v2' }
}
});Merge priority:
- platform
executionDefaults - plugin
useLLM({ execution }) - per-call
llm.complete(..., { execution })
How To Verify Cache Is Working
Analytics events include cache outcome and billing breakdown:
llm.cache.hitllm.cache.missllm.cache.bypass
Completion/tool events also include:
cacheReadTokenscacheWriteTokensbillablePromptTokensestimatedUncachedCostestimatedCostestimatedCacheSavingsUsd
Practical signal that cache works:
llm.cache.hitappears regularly;cacheReadTokens > 0;billablePromptTokens < promptTokens;estimatedCacheSavingsUsd > 0.
Tier System
Tiers are User-Defined Slots
small / medium / large are NOT tied to specific models. They are abstract slots that users fill with whatever models they want.
| Tier | Plugin Intent | User Decides |
|------|---------------|--------------|
| small | "This task is simple" | "What model for simple stuff" |
| medium | "Standard task" | "My workhorse model" |
| large | "Complex task, need max quality" | "When I really need quality" |
Example Configurations
# Budget-conscious: everything on mini
small: gpt-4o-mini
medium: gpt-4o-mini
large: gpt-4o-mini
# Standard gradient
small: gpt-4o-mini
medium: gpt-4o
large: o1
# Anthropic-first
small: claude-3-haiku
medium: claude-3.5-sonnet
large: claude-opus-4
# Local-first with cloud fallback
small: ollama/llama-3-8b
medium: ollama/llama-3-70b
large: claude-opus-4Adaptive Resolution
Escalation (Silent)
If plugin requests lower tier than configured, platform escalates silently:
Plugin requests: small
Configured: medium
Result: medium (no warning)Degradation (With Warning)
If plugin requests higher tier than configured, platform degrades with warning:
Plugin requests: large
Configured: medium
Result: medium (⚠️ warning logged)Resolution Table
| Request | Configured | Result | Note | |---------|------------|--------|------| | small | small | small | Exact match | | small | medium | medium | Escalate ✅ | | small | large | large | Escalate ✅ | | medium | small | small | Degrade ⚠️ | | medium | medium | medium | Exact match | | medium | large | large | Escalate ✅ | | large | small | small | Degrade ⚠️ | | large | medium | medium | Degrade ⚠️ | | large | large | large | Exact match |
Capabilities
Capabilities describe task-specific requirements:
| Capability | Description | Typical Models |
|------------|-------------|----------------|
| fast | Lowest latency | gpt-4o-mini, haiku, flash |
| reasoning | Complex reasoning | o1, claude-opus |
| coding | Code-optimized | claude-sonnet, gpt-4o |
| vision | Image support | gpt-4o, claude-sonnet, gemini |
// Request with capabilities
const llm = useLLM({ tier: 'medium', capabilities: ['coding'] });
const llm = useLLM({ capabilities: ['vision'] });API Reference
Types
// Tier (user-defined quality slot)
type LLMTier = 'small' | 'medium' | 'large';
// Capability (task-specific requirements)
type LLMCapability = 'reasoning' | 'coding' | 'vision' | 'fast';
// Options for useLLM()
interface UseLLMOptions {
tier?: LLMTier;
capabilities?: LLMCapability[];
}Functions
// Get LLM with tier selection
function useLLM(options?: UseLLMOptions): ILLM | undefined;
// Check if LLM is available
function isLLMAvailable(): boolean;
// Get configured tier
function getLLMTier(): LLMTier | undefined;ILLMRouter Interface
interface ILLMRouter {
getConfiguredTier(): LLMTier;
resolve(options?: UseLLMOptions): LLMResolution;
hasCapability(capability: LLMCapability): boolean;
getCapabilities(): LLMCapability[];
}Architecture
┌─────────────────────────────────────────────────────────────┐
│ PLUGIN LAYER │
│ ┌───────────────────────────────────────────────────────┐ │
│ │ Plugins see ONLY: │ │
│ │ • tier: 'small' | 'medium' | 'large' │ │
│ │ • capabilities: 'reasoning' | 'coding' | ... │ │
│ │ │ │
│ │ Plugins DON'T KNOW: │ │
│ │ ✗ Providers (openai, anthropic, google) │ │
│ │ ✗ Models (gpt-4o, claude-3-opus) │ │
│ │ ✗ API keys, endpoints, pricing │ │
│ └───────────────────────────────────────────────────────┘ │
│ │ │
│ useLLM({ tier, capabilities }) │
│ │ │
├────────────────────────────┼────────────────────────────────┤
│ PLATFORM LAYER │
│ ▼ │
│ ┌───────────────────────────────────────────────────────┐ │
│ │ LLM Router │ │
│ │ • Adaptive tier resolution │ │
│ │ • Capability matching │ │
│ │ • Transparent ILLM delegation │ │
│ └───────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌───────────────────────────────────────────────────────┐ │
│ │ ILLM Adapter │ │
│ │ (OpenAI, Anthropic, Google, Ollama, etc.) │ │
│ └───────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘Related
License
MIT
