@kb-labs/llm-router

v2.94.0

Published

5 days ago

Adaptive LLM router with tier-based model selection and fallback support.

0High
0Medium
0Low

k.baranov

adaptive kb-labs llm router tier

@kb-labs/llm-router

Adaptive LLM Router with tier-based model selection for KB Labs Platform.

Overview

LLM Router provides an abstraction layer that isolates plugins from LLM providers and models. Plugins specify what they need (tier + capabilities), and the platform decides how to fulfill it.

Key Principles:

Plugin isolation - Plugins don't know about providers/models
User-defined tiers - Users decide what "small/medium/large" means for them
Adaptive resolution - Platform adapts to available models
Simplify by default - Minimal config works out of the box

Installation

pnpm add @kb-labs/llm-router

Quick Start

Plugin Usage (via SDK)

import { useLLM } from '@kb-labs/sdk';

// Simple - uses configured default tier
const llm = useLLM();

// Request specific tier
const llm = useLLM({ tier: 'small' });   // Simple tasks
const llm = useLLM({ tier: 'large' });   // Complex tasks

// Request capabilities
const llm = useLLM({ tier: 'medium', capabilities: ['coding'] });

// Use LLM
if (llm) {
  const result = await llm.complete('Generate commit message');
  console.log(result.content);
}

Configuration

Minimal config in kb.config.json:

{
  "platform": {
    "adapters": {
      "llm": "@kb-labs/adapters-openai"
    },
    "adapterOptions": {
      "llm": {
        "tier": "medium",
        "defaultModel": "gpt-4o"
      }
    }
  }
}

Centralized Cache/Stream Defaults

Platform can manage cache/stream defaults centrally via adapterOptions.llm.executionDefaults. Plugins then use plain useLLM() and get consistent behavior by default.

{
  "platform": {
    "adapterOptions": {
      "llm": {
        "defaultTier": "medium",
        "executionDefaults": {
          "cache": {
            "mode": "prefer",
            "scope": "segments",
            "ttlSec": 3600
          },
          "stream": {
            "mode": "prefer",
            "fallbackToComplete": true
          }
        }
      }
    }
  }
}

Plugin-level override remains available as an escape hatch:

const llm = useLLM({
  tier: 'medium',
  execution: {
    cache: { mode: 'require', key: 'mind-rag:v2' }
  }
});

Merge priority:

platform executionDefaults
plugin useLLM({ execution })
per-call llm.complete(..., { execution })

How To Verify Cache Is Working

Analytics events include cache outcome and billing breakdown:

llm.cache.hit
llm.cache.miss
llm.cache.bypass

Completion/tool events also include:

cacheReadTokens
cacheWriteTokens
billablePromptTokens
estimatedUncachedCost
estimatedCost
estimatedCacheSavingsUsd

Practical signal that cache works:

llm.cache.hit appears regularly;
cacheReadTokens > 0;
billablePromptTokens < promptTokens;
estimatedCacheSavingsUsd > 0.

Tier System

Tiers are User-Defined Slots

small / medium / large are NOT tied to specific models. They are abstract slots that users fill with whatever models they want.

| Tier | Plugin Intent | User Decides | |------|---------------|--------------| | small | "This task is simple" | "What model for simple stuff" | | medium | "Standard task" | "My workhorse model" | | large | "Complex task, need max quality" | "When I really need quality" |

Example Configurations

# Budget-conscious: everything on mini
small: gpt-4o-mini
medium: gpt-4o-mini
large: gpt-4o-mini

# Standard gradient
small: gpt-4o-mini
medium: gpt-4o
large: o1

# Anthropic-first
small: claude-3-haiku
medium: claude-3.5-sonnet
large: claude-opus-4

# Local-first with cloud fallback
small: ollama/llama-3-8b
medium: ollama/llama-3-70b
large: claude-opus-4

Adaptive Resolution

Escalation (Silent)

If plugin requests lower tier than configured, platform escalates silently:

Plugin requests: small
Configured:      medium
Result:          medium (no warning)

Degradation (With Warning)

If plugin requests higher tier than configured, platform degrades with warning:

Plugin requests: large
Configured:      medium
Result:          medium (⚠️ warning logged)

Resolution Table

| Request | Configured | Result | Note | |---------|------------|--------|------| | small | small | small | Exact match | | small | medium | medium | Escalate ✅ | | small | large | large | Escalate ✅ | | medium | small | small | Degrade ⚠️ | | medium | medium | medium | Exact match | | medium | large | large | Escalate ✅ | | large | small | small | Degrade ⚠️ | | large | medium | medium | Degrade ⚠️ | | large | large | large | Exact match |

Capabilities

Capabilities describe task-specific requirements:

| Capability | Description | Typical Models | |------------|-------------|----------------| | fast | Lowest latency | gpt-4o-mini, haiku, flash | | reasoning | Complex reasoning | o1, claude-opus | | coding | Code-optimized | claude-sonnet, gpt-4o | | vision | Image support | gpt-4o, claude-sonnet, gemini |

// Request with capabilities
const llm = useLLM({ tier: 'medium', capabilities: ['coding'] });
const llm = useLLM({ capabilities: ['vision'] });

API Reference

Types

// Tier (user-defined quality slot)
type LLMTier = 'small' | 'medium' | 'large';

// Capability (task-specific requirements)
type LLMCapability = 'reasoning' | 'coding' | 'vision' | 'fast';

// Options for useLLM()
interface UseLLMOptions {
  tier?: LLMTier;
  capabilities?: LLMCapability[];
}

Functions

// Get LLM with tier selection
function useLLM(options?: UseLLMOptions): ILLM | undefined;

// Check if LLM is available
function isLLMAvailable(): boolean;

// Get configured tier
function getLLMTier(): LLMTier | undefined;

ILLMRouter Interface

interface ILLMRouter {
  getConfiguredTier(): LLMTier;
  resolve(options?: UseLLMOptions): LLMResolution;
  hasCapability(capability: LLMCapability): boolean;
  getCapabilities(): LLMCapability[];
}

Architecture

┌─────────────────────────────────────────────────────────────┐
│                      PLUGIN LAYER                           │
│  ┌───────────────────────────────────────────────────────┐ │
│  │  Plugins see ONLY:                                     │ │
│  │  • tier: 'small' | 'medium' | 'large'                  │ │
│  │  • capabilities: 'reasoning' | 'coding' | ...          │ │
│  │                                                        │ │
│  │  Plugins DON'T KNOW:                                   │ │
│  │  ✗ Providers (openai, anthropic, google)               │ │
│  │  ✗ Models (gpt-4o, claude-3-opus)                      │ │
│  │  ✗ API keys, endpoints, pricing                        │ │
│  └───────────────────────────────────────────────────────┘ │
│                            │                                │
│                   useLLM({ tier, capabilities })            │
│                            │                                │
├────────────────────────────┼────────────────────────────────┤
│                     PLATFORM LAYER                          │
│                            ▼                                │
│  ┌───────────────────────────────────────────────────────┐ │
│  │                     LLM Router                         │ │
│  │  • Adaptive tier resolution                            │ │
│  │  • Capability matching                                 │ │
│  │  • Transparent ILLM delegation                         │ │
│  └───────────────────────────────────────────────────────┘ │
│                            │                                │
│                            ▼                                │
│  ┌───────────────────────────────────────────────────────┐ │
│  │                   ILLM Adapter                         │ │
│  │  (OpenAI, Anthropic, Google, Ollama, etc.)             │ │
│  └───────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘

License

MIT

Published

Vulnerabilities

Links

Maintainers

Keywords

Readme

@kb-labs/llm-router

Overview

Installation

Quick Start

Plugin Usage (via SDK)

Configuration

Centralized Cache/Stream Defaults

How To Verify Cache Is Working

Tier System

Tiers are User-Defined Slots

Example Configurations

Adaptive Resolution

Escalation (Silent)

Degradation (With Warning)

Resolution Table

Capabilities

API Reference

Types

Functions

ILLMRouter Interface

Architecture

Related

License