@yadimon/prio-llm-router

v0.7.4

Published

24 days ago

Priority-based LLM routing across multiple providers to reduce cost with free-first and fallback chains.

0High
0Medium
0Low

llm router fallback gemini google openrouter groq mistral cohere perplexity xai togetherai openai anthropic deepseek typescript

@yadimon/prio-llm-router

@yadimon/prio-llm-router is a TypeScript library for routing text generation requests through a priority-ordered chain of LLM targets.

It is built for the common "free models first, paid models later" setup:

providers are configured once with names and API keys
models are configured once with names, provider references, priorities, and metadata
each request can use either an explicit chain or the implicit global priority order
failures automatically fall through to the next configured target

The package keeps the routing logic intentionally small and predictable while reusing the Vercel AI SDK provider ecosystem for the actual provider calls.

Features

Priority-based fallback across multiple providers and models
Separate provider config and model target config
Optional source builders for source-centric setup and strict free policies
Non-streaming text generation and optional streaming
Optional debug mode that mirrors attempt hooks to the console
Per-request and router-level attempt timeouts for clean fallback
AI SDK providerOptions passthrough for provider-specific controls
Built-in support for google, openrouter, groq, mistral, cohere, perplexity, xai, togetherai, openai, anthropic, deepseek, vercel, and generic openai-compatible
Strict TypeScript types
Hook points for attempt-level logging and telemetry
Ready for npm publishing and GitHub CI
Structured to support future provider key pools without changing the model-chain API

Documentation

Installation

npm install @yadimon/prio-llm-router

When To Use It

This package is a good fit when:

you want to try multiple providers in a deterministic order
you want free models first and paid models later
you want one stable application-facing API while provider choices evolve
you want fallback behavior to live in one place instead of being spread across app code

It is not trying to be a universal orchestration framework. The goal is a narrow, reliable router for text calls.

Quick Start

import { createLlmRouter } from '@yadimon/prio-llm-router';

const router = createLlmRouter({
  providers: [
    {
      name: 'openrouter-main',
      prefix: 'or',
      type: 'openrouter',
      auth: {
        mode: 'single',
        apiKey: process.env.OPENROUTER_API_KEY!,
      },
      appName: 'prio-llm-router-demo',
      appUrl: 'https://example.com',
    },
    {
      name: 'groq-main',
      type: 'groq',
      auth: {
        mode: 'single',
        apiKey: process.env.GROQ_API_KEY!,
      },
    },
    {
      name: 'openai-main',
      type: 'openai',
      auth: {
        mode: 'single',
        apiKey: process.env.OPENAI_API_KEY!,
      },
    },
  ],
  models: [
    {
      name: 'trinity-free',
      provider: 'openrouter-main',
      model: 'arcee-ai/trinity-large:free',
      priority: 10,
      tier: 'free',
    },
    {
      name: 'groq-oss',
      provider: 'groq-main',
      model: 'openai/gpt-oss-20b',
      priority: 20,
      tier: 'free',
    },
    {
      name: 'gpt-4.1-paid',
      provider: 'openai-main',
      model: 'gpt-4.1-mini',
      priority: 100,
      tier: 'paid',
    },
  ],
  debug: true,
  hooks: {
    onAttemptFailure(attempt) {
      console.warn('LLM attempt failed:', attempt);
    },
  },
});

const result = await router.generateText({
  prompt: 'Summarize the advantages of priority-based model routing in 3 bullets.',
  attemptTimeoutMs: 12000,
});

console.log(result.text);
console.log(result.target);
console.log(result.attempts);
console.log(result.usage);

With debug: true, the router writes attempt:start, attempt:success, and attempt:failure events to the console while still calling your custom hooks.

When the selected provider returns usage data through the AI SDK, the router exposes it on result.usage. The normalized shape includes fields such as inputTokens, outputTokens, totalTokens, reasoningTokens, and cachedInputTokens.

Basic Mental Model

There are two separate layers:

providers: named credentials and transport settings
models: named routing targets that point to a provider and a concrete model id

Your app sends requests to the router using model target names, not raw provider config.

If you prefer shorter model references, providers may also expose a prefix such as or, and model targets may then omit provider and use model: 'or:google/gemma-4-31b-it:free' instead.

There is also an additive builder layer for source-centric setup:

createLlmConnection(...)
createLlmSource(...)
createOpenRouterConnection(...)
createOpenRouterFreeSource(...)
createOpenAICompatibleConnection(...)

This is the preferred path when you want to mark a source as strict free.

Strict Free Sources

Strict free mode is intentionally narrow.

It exists only where the package can prevent paid usage from the request shape alone. Today that means:

only openrouter
only explicit model ids that end in :free

Example:

import {
  createOpenRouterConnection,
  createOpenRouterFreeSource,
  createLlmRouter,
} from '@yadimon/prio-llm-router';

const openRouter = createOpenRouterConnection({
  name: 'openrouter-main',
  auth: {
    mode: 'single',
    apiKey: process.env.OPENROUTER_API_KEY!,
  },
  appName: 'prio-llm-router-demo',
  appUrl: 'https://example.com',
});

const router = createLlmRouter({
  sources: [
    createOpenRouterFreeSource(openRouter, {
      name: 'kimi-free',
      model: 'moonshotai/kimi-k2:free',
      priority: 10,
    }),
  ],
});

The package rejects strict free sources for providers whose free status depends on account plan or billing setup, such as google, groq, mistral, or cohere.

Explicit Request Chains

If you want per-request routing, pass a chain of configured model target names:

const result = await router.generateText({
  prompt: 'Write a terse release note.',
  chain: ['trinity-free', 'groq-oss', 'gpt-4.1-paid'],
});

The chain values are usually target names from the models config.

If a chain entry does not match an exact configured target name, the router also checks for a provider-prefix model ref such as or:google/gemma-4-31b-it:free. Exact target-name matches always win before prefix fallback is attempted.

If chain is not provided, the router uses:

defaultChain from setup if present
otherwise all enabled model targets sorted by ascending priority

Provider Options

providerOptions are passed through to Vercel AI SDK generateText and streamText calls for provider-specific controls:

const result = await router.generateText({
  prompt: 'Answer briefly.',
  chain: ['google-flash'],
  providerOptions: {
    google: {
      thinkingConfig: {
        thinkingBudget: 0,
      },
    },
  },
});

For Gemini 2.5 Flash, thinkingBudget: 0 disables thinking. These options are provider-specific, so check the matching AI SDK provider documentation for the accepted shape.

Messages Instead of Prompt

const result = await router.generateText({
  system: 'Be concise.',
  messages: [
    { role: 'user', content: [{ type: 'text', text: 'Explain fallback routing.' }] },
  ],
});

Streaming With First-Chunk Fallback

For chat-style UX you can use streamText.

The router behavior is intentionally strict:

before the first text chunk arrives, it may fall back to the next target
once the first text chunk has been emitted, the model is locked in
if the selected stream later fails, the error is surfaced and no further fallback happens

const stream = await router.streamText({
  prompt: 'Explain this system in short sentences.',
  chain: ['trinity-free', 'groq-oss', 'gpt-4.1-paid'],
  firstChunkTimeoutMs: 2500,
});

for await (const chunk of stream.textStream) {
  process.stdout.write(chunk);
}

const final = await stream.final;
console.log(final.target.name);

Use firstChunkTimeoutMs when you want "switch if nothing starts quickly enough" behavior. If you omit it, the router waits indefinitely for the first chunk of the current target.

You can also use attemptTimeoutMs as the shared timeout for normal requests and streaming first-chunk fallback.

This makes the behavior safe for chat UIs:

no silent model switch after the answer has already started
no mixed output from multiple models in one response
deterministic fallback only during the "nothing has started yet" phase

Configuration Model

Providers

Providers are named credentials plus provider type:

{
  name: 'groq-main',
  type: 'groq',
  auth: {
    mode: 'single',
    apiKey: process.env.GROQ_API_KEY!,
  },
}

Today the auth mode is single. The type layout is intentionally future-friendly so provider key pools or key-priority strategies can be added later without changing how models reference providers.

Common provider-level fields:

name
prefix
type
auth
enabled
baseURL
headers

Models

Models are named routing targets:

{
  name: 'trinity-free',
  provider: 'openrouter-main',
  model: 'arcee-ai/trinity-large:free',
  priority: 10,
  tier: 'free',
}

Or, when the referenced provider config declares prefix: 'or':

{
  name: 'gemma-free',
  model: 'or:google/gemma-4-31b-it:free',
  priority: 10,
  tier: 'free',
}

The router either:

uses request.chain if provided
uses defaultChain from setup if provided
otherwise sorts enabled targets by ascending priority

Common model-level fields:

name
provider
model
enabled
priority
tier
metadata

provider is required for the standard object form. If model uses a configured provider prefix like or:..., the router resolves the provider from that prefix instead.

Attempt Timeouts

Use attemptTimeoutMs on a request when a single model attempt should fail and fall through after a fixed time:

const result = await router.generateText({
  prompt: 'Write a short answer.',
  attemptTimeoutMs: 8000,
});

Or set a router-level default:

const router = createLlmRouter({
  defaultAttemptTimeoutMs: 12000,
  providers,
  models,
});

Timeouts become normal failed attempts with error.name === 'AttemptTimeoutError', so they appear in attempts and fire onAttemptFailure(...) like other execution failures.

Debug Mode And Hooks

Use debug: true when you want the router to mirror attempt hooks to the console during development.

const router = createLlmRouter({
  debug: true,
  providers,
  models,
});

That debug mode is intentionally small:

console.log('[prio-llm-router] attempt:start', attempt)
console.log('[prio-llm-router] attempt:success', attempt)
console.error('[prio-llm-router] attempt:failure', attempt)

If you also pass hooks, both stay active. Debug mode does not replace custom telemetry.

Supported Providers

google
openrouter
groq
mistral
cohere
perplexity
xai
togetherai
openai
anthropic
deepseek
vercel
openai-compatible

These built-in types focus on API-key-based providers that map cleanly to the Vercel AI SDK. Use vercel for Vercel AI Gateway and openai-compatible for generic OpenAI-style gateways and proxies.

Use vercel when you want an explicit Vercel AI Gateway transport in router config:

{
  name: 'vercel-main',
  type: 'vercel',
  auth: {
    mode: 'single',
    apiKey: process.env.AI_GATEWAY_API_KEY!,
  },
}

Use openai-compatible when you have an OpenAI-style endpoint that is not covered by a first-party adapter:

{
  name: 'my-proxy',
  type: 'openai-compatible',
  baseURL: 'https://my-proxy.example.com/v1',
  providerLabel: 'my-proxy',
  auth: {
    mode: 'single',
    apiKey: process.env.MY_PROXY_API_KEY!,
  },
}

openai-compatible is also the one built-in provider type that may use an empty API key for local or internal backends. When the key is empty, the router allows the config and creates the adapter without an Authorization header.

If you prefer typed helpers over raw provider objects, use:

import {
  createOpenAICompatibleConnection,
  createOpenRouterConnection,
  createOpenRouterFreeSource,
} from '@yadimon/prio-llm-router';

This also covers local OpenAI-compatible runtimes such as LM Studio, Ollama, or other local gateways.

Example for LM Studio running locally on http://127.0.0.1:1234/v1:

Before using this setup, make sure LM Studio's local server is running with the OpenAI-compatible API enabled.

import {
  createLlmRouter,
  createOpenAICompatibleConnection,
} from '@yadimon/prio-llm-router';

const router = createLlmRouter({
  providers: [
    createOpenAICompatibleConnection({
      name: 'lm-studio-local',
      baseURL: 'http://127.0.0.1:1234/v1',
      providerLabel: 'lm-studio',
      auth: {
        mode: 'single',
        apiKey: '',
      },
    }).provider,
  ],
  models: [
    {
      name: 'local-qwen',
      provider: 'lm-studio-local',
      model: 'qwen2.5-7b-instruct',
      priority: 10,
    },
  ],
});

const result = await router.generateText({
  prompt: 'Describe this local LM Studio setup in one sentence.',
});

console.log(result.text);

Notes:

for LM Studio, enable the OpenAI-compatible local API before using this config
the local server still needs to expose an OpenAI-compatible HTTP API
the package allows an empty apiKey for openai-compatible, so local runtimes can use '' when they do not require auth
the model value must match the local model name exposed by your runtime

For a focused local-setup guide, see Local Providers.

Error Model

If every target fails, the router throws AllModelsFailedError.

That error includes:

attempts: all failed attempts in execution order
cause: the last underlying error

This makes it straightforward to log or surface detailed fallback history.

For streaming requests:

fallback is allowed only before the first emitted text chunk
after the stream starts, later errors are surfaced directly
stream.final resolves to the final aggregated result when the stream completes successfully

Public API

Main exports:

createLlmRouter
PrioLlmRouter
createDefaultTextGenerationExecutor
AttemptTimeoutError
createOpenRouterConnection
createOpenRouterFreeSource
createOpenAICompatibleConnection
AllModelsFailedError
RouterConfigurationError

Main methods:

router.generateText(...)
router.streamText(...)
router.listProviders()
router.listModels()

Development

npm install
npm run check

For a local packed-artifact smoke test against real provider keys from scripts/e2e/.env, run:

npm run test:e2e:real

Repository layout:

Notes

The routing logic is deliberately separate from provider execution logic.
OpenRouter request headers HTTP-Referer and X-Title can be set via appUrl and appName.
Examples in this repository import from ../src/index.js for local development. In external projects, import from @yadimon/prio-llm-router.

Published

Vulnerabilities

Links

Maintainers

Keywords

Readme

@yadimon/prio-llm-router

Features

Documentation

Installation

When To Use It

Quick Start

Basic Mental Model

Strict Free Sources

Explicit Request Chains

Provider Options

Messages Instead of Prompt

Streaming With First-Chunk Fallback

Configuration Model

Providers

Models

Attempt Timeouts

Debug Mode And Hooks

Supported Providers

Error Model

Public API

Development

Notes