@yadimon/prio-llm-router
v0.7.3
Priority-based LLM routing across multiple providers to reduce cost with free-first and fallback chains.
@yadimon/prio-llm-router is a TypeScript library for routing text generation requests through a priority-ordered chain of LLM targets.
It is built for the common "free models first, paid models later" setup:
- providers are configured once with names and API keys
- models are configured once with names, provider references, priorities, and metadata
- each request can use either an explicit chain or the implicit global priority order
- failures automatically fall through to the next configured target
The package keeps the routing logic intentionally small and predictable while reusing the Vercel AI SDK provider ecosystem for the actual provider calls.
Features
- Priority-based fallback across multiple providers and models
- Separate provider config and model target config
- Optional source builders for source-centric setup and strict free policies
- Non-streaming text generation and optional streaming
- Optional debug mode that mirrors attempt hooks to the console
- Per-request and router-level attempt timeouts for clean fallback
- AI SDK providerOptions passthrough for provider-specific controls
- Built-in support for google, openrouter, groq, mistral, cohere, perplexity, xai, togetherai, openai, anthropic, deepseek, vercel, and generic openai-compatible
- Strict TypeScript types
- Hook points for attempt-level logging and telemetry
- Ready for npm publishing and GitHub CI
- Structured to support future provider key pools without changing the model-chain API
Documentation
- Configuration Guide
- Streaming Semantics
- Architecture Notes
- Current Free Possibilities
- Local Providers
- Manual Real-Provider E2E
- Examples
- Contributor Agent Notes
Installation
npm install @yadimon/prio-llm-router
When To Use It
This package is a good fit when:
- you want to try multiple providers in a deterministic order
- you want free models first and paid models later
- you want one stable application-facing API while provider choices evolve
- you want fallback behavior to live in one place instead of being spread across app code
It is not trying to be a universal orchestration framework. The goal is a narrow, reliable router for text calls.
Quick Start
import { createLlmRouter } from '@yadimon/prio-llm-router';
const router = createLlmRouter({
providers: [
{
name: 'openrouter-main',
prefix: 'or',
type: 'openrouter',
auth: {
mode: 'single',
apiKey: process.env.OPENROUTER_API_KEY!,
},
appName: 'prio-llm-router-demo',
appUrl: 'https://example.com',
},
{
name: 'groq-main',
type: 'groq',
auth: {
mode: 'single',
apiKey: process.env.GROQ_API_KEY!,
},
},
{
name: 'openai-main',
type: 'openai',
auth: {
mode: 'single',
apiKey: process.env.OPENAI_API_KEY!,
},
},
],
models: [
{
name: 'trinity-free',
provider: 'openrouter-main',
model: 'arcee-ai/trinity-large:free',
priority: 10,
tier: 'free',
},
{
name: 'groq-oss',
provider: 'groq-main',
model: 'openai/gpt-oss-20b',
priority: 20,
tier: 'free',
},
{
name: 'gpt-4.1-paid',
provider: 'openai-main',
model: 'gpt-4.1-mini',
priority: 100,
tier: 'paid',
},
],
debug: true,
hooks: {
onAttemptFailure(attempt) {
console.warn('LLM attempt failed:', attempt);
},
},
});
const result = await router.generateText({
prompt: 'Summarize the advantages of priority-based model routing in 3 bullets.',
attemptTimeoutMs: 12000,
});
console.log(result.text);
console.log(result.target);
console.log(result.attempts);
console.log(result.usage);
With debug: true, the router writes attempt:start, attempt:success, and attempt:failure events to the console while still calling your custom hooks.
When the selected provider returns usage data through the AI SDK, the router exposes it on result.usage. The normalized shape includes fields such as inputTokens, outputTokens, totalTokens, reasoningTokens, and cachedInputTokens.
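Because not every provider reports every usage field, it is safest to treat each one as optional when logging. A minimal sketch, assuming a hypothetical UsageLike type that mirrors the field names listed above (the package's own exported type name and optionality may differ) and a hypothetical describeUsage helper:

```typescript
// Hypothetical shape mirroring the documented usage fields; the package's
// own type may differ in name and optionality.
interface UsageLike {
  inputTokens?: number;
  outputTokens?: number;
  totalTokens?: number;
  reasoningTokens?: number;
  cachedInputTokens?: number;
}

// Formats usage for logs, printing "n/a" for fields the provider omitted.
function describeUsage(usage: UsageLike | undefined): string {
  if (!usage) return 'usage: n/a';
  const show = (value: number | undefined): string =>
    value === undefined ? 'n/a' : String(value);
  // Assumption: when no total is reported, derive it as input + output.
  const total =
    usage.totalTokens ??
    (usage.inputTokens !== undefined && usage.outputTokens !== undefined
      ? usage.inputTokens + usage.outputTokens
      : undefined);
  return `in=${show(usage.inputTokens)} out=${show(usage.outputTokens)} total=${show(total)}`;
}
```

For example, describeUsage(result.usage) can be logged next to result.target to see which target consumed how many tokens.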
Basic Mental Model
There are two separate layers:
- providers: named credentials and transport settings
- models: named routing targets that point to a provider and a concrete model id
Your app sends requests to the router using model target names, not raw provider config.
If you prefer shorter model references, providers may also declare a prefix such as 'or'. Model targets can then omit provider and use model: 'or:google/gemma-4-31b-it:free' instead.
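A prefixed reference such as 'or:google/gemma-4-31b-it:free' contains more than one colon, so a prefix can only be split off at the first colon, and only when that segment is a configured prefix. A minimal sketch of that parsing step, using a hypothetical parseModelRef helper (not part of the package's API) and an assumed map from prefixes to provider names:

```typescript
// Hypothetical helper, not exported by @yadimon/prio-llm-router.
interface ResolvedModelRef {
  provider: string; // provider config name, e.g. 'openrouter-main'
  modelId: string; // model id sent to that provider
}

// prefixToProvider maps a configured prefix ('or') to a provider name.
function parseModelRef(
  ref: string,
  prefixToProvider: ReadonlyMap<string, string>,
): ResolvedModelRef | undefined {
  const colon = ref.indexOf(':');
  if (colon <= 0) return undefined; // no prefix segment at all
  const provider = prefixToProvider.get(ref.slice(0, colon));
  // Unknown segment: the colon belongs to the model id (e.g. ':free').
  if (provider === undefined) return undefined;
  return { provider, modelId: ref.slice(colon + 1) };
}
```

Checking against the known prefixes is what keeps a plain id like 'arcee-ai/trinity-large:free' from being misread as a prefixed reference.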
There is also an additive builder layer for source-centric setup:
- createLlmConnection(...)
- createLlmSource(...)
- createOpenRouterConnection(...)
- createOpenRouterFreeSource(...)
- createOpenAICompatibleConnection(...)
This is the preferred path when you want to mark a source as strict free.
Strict Free Sources
Strict free mode is intentionally narrow.
It exists only where the package can prevent paid usage from the request shape alone. Today that means:
- only openrouter
- only explicit model ids that end in :free
Example:
import {
createOpenRouterConnection,
createOpenRouterFreeSource,
createLlmRouter,
} from '@yadimon/prio-llm-router';
const openRouter = createOpenRouterConnection({
name: 'openrouter-main',
auth: {
mode: 'single',
apiKey: process.env.OPENROUTER_API_KEY!,
},
appName: 'prio-llm-router-demo',
appUrl: 'https://example.com',
});
const router = createLlmRouter({
sources: [
createOpenRouterFreeSource(openRouter, {
name: 'kimi-free',
model: 'moonshotai/kimi-k2:free',
priority: 10,
}),
],
});
The package rejects strict free sources for providers whose free status depends on account plan or billing setup, such as google, groq, mistral, or cohere.
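The two strict-free rules can be expressed as a small validation check. A sketch under stated assumptions: assertStrictFreeSource is a hypothetical helper that throws a plain Error, while the package's real checks and error class may differ:

```typescript
// Hypothetical validation sketch; not the package's actual implementation.
function assertStrictFreeSource(providerType: string, modelId: string): void {
  // Only OpenRouter encodes "free" in the model id itself; for other providers
  // the free status depends on account plan or billing, which a request cannot prove.
  if (providerType !== 'openrouter') {
    throw new Error(`strict free is not supported for provider type "${providerType}"`);
  }
  // Require an explicit ':free' model id so a paid variant cannot slip through.
  if (!modelId.endsWith(':free')) {
    throw new Error(`strict free requires an explicit ':free' model id, got "${modelId}"`);
  }
}
```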
Explicit Request Chains
If you want per-request routing, pass a chain of configured model target names:
const result = await router.generateText({
prompt: 'Write a terse release note.',
chain: ['trinity-free', 'groq-oss', 'gpt-4.1-paid'],
});
The chain values are usually target names from the models config.
If a chain entry does not match an exact configured target name, the router also checks for a provider-prefix model ref such as or:google/gemma-4-31b-it:free. Exact target-name matches always win before prefix fallback is attempted.
If chain is not provided, the router uses:
- defaultChain from setup if present
- otherwise all enabled model targets sorted by ascending priority
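The precedence above can be sketched as a single function. This is a minimal illustration with a hypothetical TargetLike shape and resolveChain helper, not the router's internals; it resolves chain entries by exact target name only and omits the provider-prefix fallback:

```typescript
// Hypothetical minimal target shape for this sketch.
interface TargetLike {
  name: string;
  priority: number;
  enabled?: boolean;
}

// Request chain first, then defaultChain, then all enabled targets by priority.
function resolveChain<T extends TargetLike>(
  targets: readonly T[],
  requestChain?: readonly string[],
  defaultChain?: readonly string[],
): T[] {
  const names = requestChain ?? defaultChain;
  if (names !== undefined) {
    return names.map((name) => {
      const target = targets.find((t) => t.name === name); // exact name match
      if (target === undefined) throw new Error(`unknown model target "${name}"`);
      return target;
    });
  }
  // filter() returns a new array, so sort() does not reorder the caller's config.
  return targets
    .filter((t) => t.enabled !== false)
    .sort((a, b) => a.priority - b.priority);
}
```

With the Quick Start targets, the implicit order comes out as trinity-free (10), groq-oss (20), gpt-4.1-paid (100).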
Provider Options
providerOptions are passed through to Vercel AI SDK generateText and streamText calls for provider-specific controls:
const result = await router.generateText({
prompt: 'Answer briefly.',
chain: ['google-flash'],
providerOptions: {
google: {
thinkingConfig: {
thinkingBudget: 0,
},
},
},
});
For Gemini 2.5 Flash, thinkingBudget: 0 disables thinking. These options are provider-specific, so check the matching AI SDK provider documentation for the accepted shape.
Messages Instead of Prompt
const result = await router.generateText({
system: 'Be concise.',
messages: [
{ role: 'user', content: [{ type: 'text', text: 'Explain fallback routing.' }] },
],
});
Streaming With First-Chunk Fallback
For chat-style UX you can use streamText.
The router behavior is intentionally strict:
- before the first text chunk arrives, it may fall back to the next target
- once the first text chunk has been emitted, the model is locked in
- if the selected stream later fails, the error is surfaced and no further fallback happens
const stream = await router.streamText({
prompt: 'Explain this system in short sentences.',
chain: ['trinity-free', 'groq-oss', 'gpt-4.1-paid'],
firstChunkTimeoutMs: 2500,
});
for await (const chunk of stream.textStream) {
process.stdout.write(chunk);
}
const final = await stream.final;
console.log(final.target.name);
Use firstChunkTimeoutMs when you want "switch if nothing starts quickly enough" behavior. If you omit it, the router waits indefinitely for the first chunk of the current target.
You can also use attemptTimeoutMs as the shared timeout for normal requests and streaming first-chunk fallback.
This makes the behavior safe for chat UIs:
- no silent model switch after the answer has already started
- no mixed output from multiple models in one response
- deterministic fallback only during the "nothing has started yet" phase
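The lock-in rule can be sketched with plain async iterators. This is a minimal illustration, not the router's implementation: TextStreamFactory and streamWithFirstChunkFallback are hypothetical names, first-chunk timeouts are omitted for brevity, and treating a stream that ends without any text as a failure is an assumption of the sketch:

```typescript
// Each factory opens one target's text stream (hypothetical shape).
type TextStreamFactory = () => AsyncIterable<string>;

// Falls back only until some target emits its first chunk, then locks in.
async function* streamWithFirstChunkFallback(
  factories: readonly TextStreamFactory[],
): AsyncGenerator<string, void, undefined> {
  let failures = 0;
  for (const open of factories) {
    let iterator: AsyncIterator<string>;
    let first: IteratorResult<string>;
    try {
      iterator = open()[Symbol.asyncIterator]();
      first = await iterator.next();
    } catch {
      failures += 1; // nothing emitted yet, so trying the next target is safe
      continue;
    }
    if (first.done) {
      failures += 1; // assumption: a stream that ends without text counts as failed
      continue;
    }
    // Locked in: errors from here on propagate to the caller, with no fallback.
    yield first.value;
    while (true) {
      const next = await iterator.next();
      if (next.done) return;
      yield next.value;
    }
  }
  throw new Error(`all ${failures} stream targets failed before the first chunk`);
}
```

Because the generator never switches iterators after its first yield, a consumer can never receive mixed output from two models.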
Configuration Model
Providers
Providers are named credentials plus provider type:
{
name: 'groq-main',
type: 'groq',
auth: {
mode: 'single',
apiKey: process.env.GROQ_API_KEY!,
},
}
Currently the only auth mode is single. The type layout is intentionally future-friendly, so provider key pools or key-priority strategies can be added later without changing how models reference providers.
Common provider-level fields:
- name
- prefix
- type
- auth
- enabled
- baseURL
- headers
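A "future-friendly" auth layout like the one described above is typically a discriminated union on mode, so new strategies become new variants while models keep referencing providers by name. A sketch under that assumption: the 'pool' variant, ProviderConfigSketch, and pickApiKey are hypothetical and not part of the package today:

```typescript
// 'single' is the only documented mode; 'pool' is a hypothetical future variant.
type ProviderAuth =
  | { mode: 'single'; apiKey: string }
  | { mode: 'pool'; apiKeys: readonly string[] };

interface ProviderConfigSketch {
  name: string;
  type: string;
  auth: ProviderAuth;
  prefix?: string;
  enabled?: boolean;
  baseURL?: string;
  headers?: Record<string, string>;
}

// Models only reference `name`, so adding auth variants never touches them.
function pickApiKey(provider: ProviderConfigSketch, attempt: number): string {
  switch (provider.auth.mode) {
    case 'single':
      return provider.auth.apiKey;
    case 'pool': {
      const keys = provider.auth.apiKeys;
      if (keys.length === 0) throw new Error(`provider "${provider.name}" has an empty key pool`);
      return keys[attempt % keys.length]; // simple round-robin, for illustration only
    }
  }
}
```

The exhaustive switch means the compiler flags every place that must handle a new mode once one is added.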
Models
Models are named routing targets:
{
name: 'trinity-free',
provider: 'openrouter-main',
model: 'arcee-ai/trinity-large:free',
priority: 10,
tier: 'free',
}
Or, when the referenced provider config declares prefix: 'or':
{
name: 'gemma-free',
model: 'or:google/gemma-4-31b-it:free',
priority: 10,
tier: 'free',
}
When resolving the chain for a request, the router:
- uses request.chain if provided
- otherwise uses defaultChain from setup if provided
- otherwise sorts enabled targets by ascending priority
Common model-level fields:
- name
- provider
- model
- enabled
- priority
- tier
- metadata
provider is required for the standard object form. If model uses a configured provider prefix like or:..., the router resolves the provider from that prefix instead.
Attempt Timeouts
Use attemptTimeoutMs on a request when a single model attempt should fail and fall through after a fixed time:
const result = await router.generateText({
prompt: 'Write a short answer.',
attemptTimeoutMs: 8000,
});
Or set a router-level default:
const router = createLlmRouter({
defaultAttemptTimeoutMs: 12000,
providers,
models,
});
Timeouts become normal failed attempts with error.name === 'AttemptTimeoutError', so they appear in attempts and fire onAttemptFailure(...) like other execution failures.
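A per-attempt timeout is commonly built by racing the attempt against a timer. A minimal sketch, assuming hypothetical names (SketchAttemptTimeoutError, effectiveTimeout, withAttemptTimeout) rather than the package's exported AttemptTimeoutError or its internals; it only reproduces the documented error.name so callers can match on it:

```typescript
// Hypothetical local class; the package exports its own AttemptTimeoutError.
class SketchAttemptTimeoutError extends Error {
  readonly timeoutMs: number;
  constructor(timeoutMs: number) {
    super(`attempt timed out after ${timeoutMs} ms`);
    this.name = 'AttemptTimeoutError'; // same name the router reports
    this.timeoutMs = timeoutMs;
  }
}

// Request-level attemptTimeoutMs wins over the router-level default.
function effectiveTimeout(requestMs?: number, routerDefaultMs?: number): number | undefined {
  return requestMs ?? routerDefaultMs;
}

// Races one attempt against a timer. The losing request is not cancelled here;
// a real implementation would also abort it, for example via an AbortSignal.
function withAttemptTimeout<T>(attempt: Promise<T>, timeoutMs: number | undefined): Promise<T> {
  if (timeoutMs === undefined) return attempt; // no timeout: wait indefinitely
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new SketchAttemptTimeoutError(timeoutMs)), timeoutMs);
  });
  // Clear the timer either way so a finished attempt leaves no pending work.
  return Promise.race([attempt, timeout]).finally(() => clearTimeout(timer));
}
```

Turning a timeout into an ordinary rejection is what lets the fallback loop treat it like any other failed attempt.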
Debug Mode And Hooks
Use debug: true when you want the router to mirror attempt hooks to the console during development.
const router = createLlmRouter({
debug: true,
providers,
models,
});
That debug mode is intentionally small:
- console.log('[prio-llm-router] attempt:start', attempt)
- console.log('[prio-llm-router] attempt:success', attempt)
- console.error('[prio-llm-router] attempt:failure', attempt)
If you also pass hooks, both stay active. Debug mode does not replace custom telemetry.
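Keeping both active amounts to wrapping the user's hooks so each event is logged first and then forwarded. A sketch under stated assumptions: onAttemptFailure appears in the Quick Start, but the onAttemptStart and onAttemptSuccess names, the AttemptEvent shape, and the mirrorHooks helper are hypothetical:

```typescript
// Hypothetical attempt record; the router's real attempt objects carry more detail.
interface AttemptEvent {
  target: string;
  error?: unknown;
}

// onAttemptFailure is documented; the other two hook names are assumed.
interface AttemptHooksSketch {
  onAttemptStart?(attempt: AttemptEvent): void;
  onAttemptSuccess?(attempt: AttemptEvent): void;
  onAttemptFailure?(attempt: AttemptEvent): void;
}

// Wraps user hooks so debug logging and custom telemetry both run.
function mirrorHooks(hooks: AttemptHooksSketch, debug: boolean): Required<AttemptHooksSketch> {
  const tag = '[prio-llm-router]';
  return {
    onAttemptStart(attempt) {
      if (debug) console.log(`${tag} attempt:start`, attempt);
      hooks.onAttemptStart?.(attempt);
    },
    onAttemptSuccess(attempt) {
      if (debug) console.log(`${tag} attempt:success`, attempt);
      hooks.onAttemptSuccess?.(attempt);
    },
    onAttemptFailure(attempt) {
      if (debug) console.error(`${tag} attempt:failure`, attempt);
      hooks.onAttemptFailure?.(attempt);
    },
  };
}
```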
Supported Providers
- google
- openrouter
- groq
- mistral
- cohere
- perplexity
- xai
- togetherai
- openai
- anthropic
- deepseek
- vercel
- openai-compatible
These built-in types focus on API-key-based providers that map cleanly to the Vercel AI SDK. Use vercel for Vercel AI Gateway and openai-compatible for generic OpenAI-style gateways and proxies.
Use vercel when you want an explicit Vercel AI Gateway transport in router config:
{
name: 'vercel-main',
type: 'vercel',
auth: {
mode: 'single',
apiKey: process.env.AI_GATEWAY_API_KEY!,
},
}
Use openai-compatible when you have an OpenAI-style endpoint that is not covered by a first-party adapter:
{
name: 'my-proxy',
type: 'openai-compatible',
baseURL: 'https://my-proxy.example.com/v1',
providerLabel: 'my-proxy',
auth: {
mode: 'single',
apiKey: process.env.MY_PROXY_API_KEY!,
},
}
openai-compatible is also the only built-in provider type that accepts an empty API key, for local or internal backends. When the key is empty, the router accepts the config and creates the adapter without an Authorization header.
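The empty-key rule means "omit the header" rather than "send an empty bearer token". A small illustration using a hypothetical buildRequestHeaders helper; it shows the documented behavior and is not how the package or the AI SDK adapter builds headers internally:

```typescript
// Hypothetical helper illustrating the documented empty-key behavior.
function buildRequestHeaders(
  apiKey: string,
  extraHeaders: Record<string, string> = {},
): Record<string, string> {
  const headers: Record<string, string> = { ...extraHeaders };
  // An empty key (common for local runtimes) means no Authorization header,
  // rather than a malformed "Bearer " value.
  if (apiKey !== '') {
    headers.Authorization = `Bearer ${apiKey}`;
  }
  return headers;
}
```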
If you prefer typed helpers over raw provider objects, use:
import {
createOpenAICompatibleConnection,
createOpenRouterConnection,
createOpenRouterFreeSource,
} from '@yadimon/prio-llm-router';
The openai-compatible type also covers local OpenAI-compatible runtimes such as LM Studio, Ollama, or other local gateways.
Example for LM Studio running locally on http://127.0.0.1:1234/v1:
Before using this setup, make sure LM Studio's local server is running with the OpenAI-compatible API enabled.
import {
createLlmRouter,
createOpenAICompatibleConnection,
} from '@yadimon/prio-llm-router';
const router = createLlmRouter({
providers: [
createOpenAICompatibleConnection({
name: 'lm-studio-local',
baseURL: 'http://127.0.0.1:1234/v1',
providerLabel: 'lm-studio',
auth: {
mode: 'single',
apiKey: '',
},
}).provider,
],
models: [
{
name: 'local-qwen',
provider: 'lm-studio-local',
model: 'qwen2.5-7b-instruct',
priority: 10,
},
],
});
const result = await router.generateText({
prompt: 'Describe this local LM Studio setup in one sentence.',
});
console.log(result.text);
Notes:
- for LM Studio, enable the OpenAI-compatible local API before using this config
- the local server still needs to expose an OpenAI-compatible HTTP API
- the package allows an empty apiKey for openai-compatible, so local runtimes can use '' when they do not require auth
- the model value must match the local model name exposed by your runtime
For a focused local-setup guide, see Local Providers.
Error Model
If every target fails, the router throws AllModelsFailedError.
That error includes:
- attempts: all failed attempts in execution order
- cause: the last underlying error
This makes it straightforward to log or surface detailed fallback history.
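The error model follows from a plain ordered fallback loop. A minimal sketch with hypothetical names (SketchAllModelsFailedError, generateWithFallback, TargetCall); the package's real AllModelsFailedError exposes the last error as cause, which this sketch stores as lastError to avoid clashing with the built-in Error.cause typing:

```typescript
interface AttemptRecord {
  target: string;
  error: unknown;
}

// Hypothetical stand-in for the package's AllModelsFailedError.
class SketchAllModelsFailedError extends Error {
  readonly attempts: readonly AttemptRecord[];
  readonly lastError: unknown; // the package exposes this as `cause`
  constructor(attempts: readonly AttemptRecord[], lastError: unknown) {
    super(`all ${attempts.length} model targets failed`);
    this.name = 'AllModelsFailedError';
    this.attempts = attempts;
    this.lastError = lastError;
  }
}

type TargetCall = { name: string; run: () => Promise<string> };

// Tries targets in order; returns the first success or throws with every failure.
async function generateWithFallback(
  chain: readonly TargetCall[],
): Promise<{ text: string; target: string; failures: AttemptRecord[] }> {
  const failures: AttemptRecord[] = [];
  for (const target of chain) {
    try {
      const text = await target.run();
      return { text, target: target.name, failures };
    } catch (error) {
      failures.push({ target: target.name, error }); // preserves execution order
    }
  }
  throw new SketchAllModelsFailedError(failures, failures[failures.length - 1]?.error);
}
```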
For streaming requests:
- fallback is allowed only before the first emitted text chunk
- after the stream starts, later errors are surfaced directly
- stream.final resolves to the final aggregated result when the stream completes successfully
Public API
Main exports:
- createLlmRouter
- PrioLlmRouter
- createDefaultTextGenerationExecutor
- AttemptTimeoutError
- createOpenRouterConnection
- createOpenRouterFreeSource
- createOpenAICompatibleConnection
- AllModelsFailedError
- RouterConfigurationError
Main methods:
- router.generateText(...)
- router.streamText(...)
- router.listProviders()
- router.listModels()
Development
npm install
npm run check
For a local packed-artifact smoke test against real provider keys from scripts/e2e/.env, run:
npm run test:e2e:real
Repository layout:
Notes
- The routing logic is deliberately separate from provider execution logic.
- OpenRouter request headers HTTP-Referer and X-Title can be set via appUrl and appName.
- Examples in this repository import from ../src/index.js for local development. In external projects, import from @yadimon/prio-llm-router.
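The appUrl/appName note describes a direct field-to-header mapping. A small illustration using a hypothetical openRouterAttributionHeaders helper (not the package's code) that only sets each header when its value is provided:

```typescript
// Hypothetical helper illustrating the documented mapping; not the package's code.
interface OpenRouterAppInfo {
  appUrl?: string;
  appName?: string;
}

function openRouterAttributionHeaders(info: OpenRouterAppInfo): Record<string, string> {
  const headers: Record<string, string> = {};
  if (info.appUrl) headers['HTTP-Referer'] = info.appUrl; // identifies the calling app's URL
  if (info.appName) headers['X-Title'] = info.appName; // human-readable app name
  return headers;
}
```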
