ai-failover
v0.3.0
Zero-dependency TypeScript library that unifies 6 free LLM APIs with automatic failover, SSE streaming, vision (image analysis), and optional React hooks. Works in Node/Bun backends, browsers (React/Next.js), and React Native/Expo — directly or through your own backend proxy so API keys never ship to the client.
Table of Contents
- Supported Providers
- Platform Support
- Installation
- Quick Start
- Integration Guide — step-by-step recipes for Node, Next.js, React Native/Expo, and SPAs
- Environment Variables
- API Reference
- Types Reference
- Vision / Image Analysis
- Streaming
- Failover Strategies
- Events
- Error Handling
- React Hooks
- Known Limitations
- Using in Another Project (local link)
- Testing Locally
Supported Providers
| Provider | Free Tier | Vision | Models |
| -------------- | ----------------- | ---------------------- | ----------------------------------------- |
| Groq | 30 RPM, 14.4K RPD | Yes (Llama 3.2 Vision) | Llama 3.3 70B, Llama 3.1 8B, Qwen QWQ 32B |
| Gemini | 15 RPM, 1.5K RPD | Yes | Gemini 2.0 Flash, Flash Lite |
| Cerebras | 30 RPM, 1K RPD | No | Llama 3.3 70B, Llama 3.1 8B, Qwen 3 32B |
| OpenRouter | 20 RPM, 200 RPD | Yes (Gemini Flash) | Llama 3.3 70B, Qwen 3 32B, Gemini Flash |
| Mistral | 30 RPM | Yes (Pixtral 12B) | Mistral Small, Pixtral 12B |
| Cohere | 20 RPM, 1K RPD | No | Command R, Command R+ |
Platform Support
The same API (ai.chat(), ai.stream(), useChat) works everywhere — only the transport changes:
| Platform | Transport | API keys live | Streaming |
|----------|-----------|---------------|-----------|
| Node / Bun backend | DirectTransport (default) | Server env vars | Full SSE |
| Browser (React/Next.js) | HttpTransport → your backend | Server only | Full SSE |
| React Native / Expo | HttpTransport → your backend | Server only | SSE, or batched fallback* |
| Prototypes / personal scripts | DirectTransport with explicit keys | Wherever you put them | Full SSE |
* React Native's built-in fetch does not expose response bodies as streams. The library detects this and transparently falls back to delivering the full response through the same callbacks. For true incremental streaming use expo/fetch (SDK 52+) as the fetch option of httpTransport.
Security note: putting provider API keys in a browser bundle or mobile app exposes them to anyone. For anything you ship, use proxy mode — see the Integration Guide.
Installation
bun add ai-failover
# or
npm install ai-failover
Or install directly from GitHub:
npm install github:ManuelFeregrino/ai-failover
# or
bun add github:ManuelFeregrino/ai-failover
The library has zero runtime dependencies. It works with Node.js >= 18, Bun, and Deno.
Quick Start
import { createAI } from "ai-failover";
// Auto-detects API keys from environment variables
const ai = createAI();
// Simple one-liner completion (returns string)
const answer = await ai.complete("What is TypeScript?");
console.log(answer);
// Full chat with message history (returns ChatResponse)
const response = await ai.chat({
messages: [
{ role: "system", content: "You are a helpful assistant." },
{ role: "user", content: "Explain closures in JavaScript" },
],
});
console.log(response.content); // AI response text
console.log(response.provider); // e.g. "groq"
console.log(response.model); // e.g. "llama-3.3-70b-versatile"
console.log(response.usage); // { promptTokens, completionTokens, totalTokens }
console.log(response.latencyMs); // e.g. 832
// Always clean up when done
ai.destroy();
Integration Guide
Step-by-step recipes, written so a human or an AI coding agent can follow them verbatim. Each recipe is complete: install → env → exact file paths → full code → verification. If you are an AI agent integrating this library, also read Rules for AI coding agents at the end of this section.
Which mode do I need?
| Where does this code run? | Mode | Client construction |
|---|---|---|
| Node/Bun server, CLI, script, cron | Direct | createAI() — keys from server env |
| Browser (Next.js, Vite, any SPA) | Proxy | createAI({ transport: httpTransport("<your endpoint>") }) |
| React Native / Expo app | Proxy | same as browser, with an absolute baseURL |
Proxy mode keeps the provider keys on your server and forwards stream chunks as they arrive (passthrough), so time-to-first-token is essentially the same as calling providers directly:
Browser / React Native                      Your backend                 LLM providers
useChat / ai.chat() ── HttpTransport ──►    createChatHandler(ai) ──►    Groq/Gemini/…
                                            (keys + failover here)
Recipe 1 — Node/Bun backend, CLI, or script (direct mode)
Step 1. Install:
npm install ai-failover # or: bun add ai-failover
Step 2. Put at least one provider key in the environment (.env, never committed). All supported variables: GROQ_API_KEY, GEMINI_API_KEY, OPENROUTER_API_KEY, MISTRAL_API_KEY, CEREBRAS_API_KEY, COHERE_API_KEY.
Step 3. Use it:
import { createAI } from "ai-failover";
const ai = createAI(); // auto-detects keys; DirectTransport by default
const answer = await ai.complete("Hello!");
console.log(answer);
ai.destroy(); // on process teardown
Verify: run the script. If it throws AllProvidersExhaustedError immediately, no API key was found in the environment.
Recipe 2 — Next.js app (App Router): UI + API route in one project
Step 1. Install in the Next.js project:
npm install ai-failover
Step 2. Create .env.local with provider keys. Use server-side names only; never prefix them with NEXT_PUBLIC_:
GROQ_API_KEY=gsk_...
GEMINI_API_KEY=AI...
Step 3. Create the API route at exactly app/api/ai/[...path]/route.ts:
import { createAI, createChatHandler } from "ai-failover";
const ai = createAI(); // reads keys from server env
const handler = createChatHandler(ai); // same-origin → no CORS needed
export const POST = handler;
Step 4. Create a shared client at lib/ai-client.ts (one instance for the whole app):
import { createAI, httpTransport } from "ai-failover";
// No provider keys here — talks to the route from Step 3
export const aiClient = createAI({ transport: httpTransport("/api/ai") });
Step 5. Use the hook in any client component:
"use client";
import { useChat } from "ai-failover/react";
import { aiClient } from "@/lib/ai-client";
export default function Chat() {
const { messages, input, setInput, handleSubmit, isLoading, stop } =
useChat({ client: aiClient });
return (
<form onSubmit={handleSubmit}>
{messages.map((m) => (
<p key={m.id}><b>{m.role}:</b> {typeof m.content === "string" ? m.content : "[image]"}</p>
))}
<input value={input} onChange={(e) => setInput(e.target.value)} />
<button type="submit" disabled={isLoading}>Send</button>
{isLoading && <button type="button" onClick={stop}>Stop</button>}
</form>
);
}
Verify (with npm run dev running):
curl -X POST http://localhost:3000/api/ai/chat \
-H 'Content-Type: application/json' \
-d '{"messages":[{"role":"user","content":"Say ping"}]}'
# → {"content":"...","provider":"groq",...}
curl -N -X POST http://localhost:3000/api/ai/chat/stream \
-H 'Content-Type: application/json' \
-d '{"messages":[{"role":"user","content":"Count to 3"}]}'
# → data: {"content":"1",...} … data: [DONE]
Deploying to Vercel/serverless? It works, but with degraded proactive rate-limit tracking (see Known Limitations #1).
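The streaming endpoint's output shown in the curl example is a sequence of `data:` lines ending with a `[DONE]` sentinel. As a rough illustration of how a client could read that text, here is a minimal sketch. `parseSseText` and `ParsedSse` are hypothetical names, not part of ai-failover, and the sketch assumes each payload is a JSON object with a `content` field, as the curl output suggests.

```typescript
// Hypothetical helper for illustration; not part of ai-failover's API.
interface ParsedSse {
  contents: string[]; // the `content` field of each JSON payload, in order
  done: boolean; // true once the `[DONE]` sentinel was seen
}

function parseSseText(raw: string): ParsedSse {
  const result: ParsedSse = { contents: [], done: false };
  for (const line of raw.split(/\r?\n/)) {
    if (!line.startsWith("data:")) continue; // skip blank separator lines
    const payload = line.slice("data:".length).trim();
    if (payload === "[DONE]") {
      result.done = true;
      break;
    }
    // Assumption: every non-sentinel payload is a JSON object.
    const parsed = JSON.parse(payload) as { content?: string };
    if (typeof parsed.content === "string") result.contents.push(parsed.content);
  }
  return result;
}
```

A real client would also need to buffer partial lines, because a network chunk can end in the middle of a `data:` line; this sketch assumes the whole body is already available as one string.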
Recipe 3 — React Native / Expo app + a backend for the keys
The mobile app never holds provider keys; it talks to a backend you control.
Step 1. Backend (skip if you already did Recipe 2: that route works for mobile too; just enable CORS or serve it from the same domain as your API). Standalone Bun/Node version:
// server.ts — deploy on Railway/Fly/VPS, or run on your LAN during development
import { createAI, createChatHandler } from "ai-failover";
const ai = createAI();
Bun.serve({ port: 8787, fetch: createChatHandler(ai, { cors: true }) });
console.log("AI proxy on :8787");
Step 2. In the Expo/RN project:
npm install ai-failover
Step 3. Create lib/ai-client.ts. ⚠️ Use an absolute URL. localhost does NOT work from a device/emulator. In development use your machine's LAN IP (e.g. http://192.168.1.50:8787); in production use your deployed URL:
import { createAI, httpTransport } from "ai-failover";
export const aiClient = createAI({
transport: httpTransport({
baseURL: process.env.EXPO_PUBLIC_AI_API_URL ?? "https://api.yourapp.com/ai",
// Your app's own user auth (NOT provider keys). Optional.
headers: async () => ({ Authorization: `Bearer ${await getSessionToken()}` }),
}),
});
Step 4 (optional, Expo SDK 52+). True incremental streaming. Without this, streaming still works but tokens arrive in one batch at the end (RN's fetch limitation):
import { fetch as expoFetch } from "expo/fetch";
// add inside httpTransport({ ... }):
// fetch: expoFetch as unknown as typeof fetch,
Step 5. Components use exactly the same hook as on the web:
import { useChat } from "ai-failover/react";
import { aiClient } from "../lib/ai-client";
const { messages, input, setInput, handleSubmit, isLoading } = useChat({ client: aiClient });
// wire to <TextInput>/<Pressable> as usual
Verify: from the device, send a message; on the backend you should see the request and the provider being picked. To test the endpoint without the app, run the same curl as in Recipe 2 against http://<LAN-IP>:8787/chat.
Recipe 4 — Vite/CRA SPA (backend on another origin)
Same as Recipe 3 but in a browser: run the Step-1 backend with cors: true (or cors: "https://app.example.com" to lock the origin) and point the transport at its absolute URL:
export const aiClient = createAI({
transport: httpTransport("https://api.yourapp.com/ai"),
});
Rules for AI coding agents
Follow these when integrating ai-failover into a project:
- Never put provider keys (GROQ_API_KEY, …) in client code, in NEXT_PUBLIC_*/EXPO_PUBLIC_* env vars, or in committed files. They belong only in the server environment.
- In any browser or React Native file, construct the client with createAI({ transport: httpTransport(...) }). A bare createAI() is correct only in server/CLI code.
- Import ai-failover/node (imageMessageFromFile) only in Node code: it uses fs and breaks client bundles. For images in client apps, pass a Blob/File, bytes, or a data: URI to buildImageMessage.
- The handler serves POST <mount>/chat and POST <mount>/chat/stream. The httpTransport baseURL must equal that mount path (e.g. handler at app/api/ai/[...path]/route.ts → baseURL: "/api/ai").
- Create one client per app (module-level export) and reuse it; call destroy() only on teardown. Do not create a client per request/render.
- getStatus() through HttpTransport returns only local counters (providers: []), because provider state lives on the server.
- Error handling: server-side, catch AllProvidersExhaustedError (the handler already maps it to HTTP 503); client-side, HttpTransport throws FailoverError carrying the server's message.
- React hooks live in the ai-failover/react subpath and require React ≥ 18 (optional peer dependency).
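To make the mount-path rule concrete, the sketch below shows one way the two endpoints named in the rules could be matched against a mount path. `matchRoute` and `HandlerRoute` are hypothetical names used only for illustration; this is not the library's router, just a small model of why baseURL and the mount path must agree.

```typescript
// Hypothetical illustration of the <mount>/chat and <mount>/chat/stream rule.
type HandlerRoute = "chat" | "stream" | null;

function matchRoute(method: string, pathname: string, mount: string): HandlerRoute {
  if (method.toUpperCase() !== "POST") return null; // both endpoints are POST-only
  const base = mount.replace(/\/+$/, ""); // tolerate a trailing slash on the mount
  if (pathname === `${base}/chat`) return "chat";
  if (pathname === `${base}/chat/stream`) return "stream";
  return null; // wrong mount or unknown path
}
```

With a handler mounted at /api/ai, a client whose baseURL is /api would request /api/chat, which this sketch (like a mismatched real setup) would not recognize.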
Environment Variables
Set API keys for auto-detection. Only providers with keys are enabled:
GROQ_API_KEY=gsk_...
GEMINI_API_KEY=AI...
OPENROUTER_API_KEY=sk-or-...
MISTRAL_API_KEY=...
CEREBRAS_API_KEY=csk-...
COHERE_API_KEY=...
Optional behavior overrides:
AI_FAILOVER_STRATEGY=priority # priority | round-robin | least-used
AI_FAILOVER_TIMEOUT=30000 # Request timeout in ms
AI_FAILOVER_MAX_RETRIES=3 # Max provider retries before giving up
AI_FAILOVER_COOLDOWN=60000 # Cooldown after rate limit in ms
API Reference
createAI(config?)
Creates a FailoverClient instance. All parameters are optional.
import { createAI } from "ai-failover";
const ai = createAI(); // auto-detect from env
// or with explicit config
const ai = createAI({
strategy: "round-robin", // "priority" | "round-robin" | "least-used"
maxRetries: 3, // max providers to try before throwing
timeout: 15_000, // per-request timeout in ms
cooldownMs: 30_000, // how long to skip a provider after rate-limit
providers: {
groq: { apiKey: "gsk_..." },
gemini: { apiKey: "AI..." },
cerebras: { enabled: false }, // disable a provider
openrouter: {
apiKey: "sk-or-...",
models: ["google/gemini-2.0-flash-exp:free"],
},
},
});
FailoverConfig fields:
| Field | Type | Default | Description |
|-------|------|---------|-------------|
| strategy | "priority" \| "round-robin" \| "least-used" | "priority" | How to select the next provider |
| maxRetries | number | number of providers | Max providers to try per request |
| timeout | number | 30000 | Per-request timeout in ms (also settable per request) |
| cooldownMs | number | 60000 | Base cooldown after a 429 without retry-after |
| maxCooldownMs | number | 900000 | Cap for the exponential backoff on consecutive 429s |
| errorThreshold | number | 2 | Consecutive non-429 errors before a provider gets a short cooldown |
| errorCooldownMs | number | 30000 | Cooldown applied when errorThreshold is reached |
| defaultModel | string | — | Used as request.model when the selected provider serves it |
| providers | Partial<Record<ProviderName, ProviderConfig>> | auto-detect | Provider-specific configuration |
| onFailover | (from, to, error) => void | — | Shorthand for ai.on("failover", …) |
| transport | Transport | DirectTransport | httpTransport(...) to route through your backend (Integration Guide). When set, the provider/strategy options are ignored — they apply server-side |
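The defaults in the table and the AI_FAILOVER_* environment overrides can be pictured as a simple layered merge. The sketch below is an assumption about how such a merge might look, not the library's actual resolution code: `resolveSettings`, `ResolvedSettings`, and the precedence order (explicit config over environment over defaults) are all hypothetical.

```typescript
// Hypothetical sketch: layering defaults, env overrides, and explicit config.
type Strategy = "priority" | "round-robin" | "least-used";

interface ResolvedSettings {
  strategy: Strategy;
  timeout: number;
  cooldownMs: number;
  maxCooldownMs: number;
  errorThreshold: number;
  errorCooldownMs: number;
}

// Default values copied from the FailoverConfig table.
const DEFAULTS: ResolvedSettings = {
  strategy: "priority",
  timeout: 30_000,
  cooldownMs: 60_000,
  maxCooldownMs: 900_000,
  errorThreshold: 2,
  errorCooldownMs: 30_000,
};

function resolveSettings(
  explicit: Partial<ResolvedSettings>,
  env: Record<string, string | undefined>,
): ResolvedSettings {
  const fromEnv: Partial<ResolvedSettings> = {};
  const strategy = env["AI_FAILOVER_STRATEGY"];
  if (strategy === "priority" || strategy === "round-robin" || strategy === "least-used") {
    fromEnv.strategy = strategy; // unknown values are ignored
  }
  const timeout = Number(env["AI_FAILOVER_TIMEOUT"]);
  if (Number.isFinite(timeout) && timeout > 0) fromEnv.timeout = timeout;
  const cooldown = Number(env["AI_FAILOVER_COOLDOWN"]);
  if (Number.isFinite(cooldown) && cooldown > 0) fromEnv.cooldownMs = cooldown;
  // Assumed precedence: explicit config wins, then env, then defaults.
  return { ...DEFAULTS, ...fromEnv, ...explicit };
}
```

Passing the environment in as a plain record (rather than reading process.env inside the function) keeps the sketch usable in browsers and easy to test.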
ProviderConfig fields:
| Field | Type | Description |
|-------|------|-------------|
| apiKey | string | Override env-based API key |
| models | string[] | Restrict/replace the model catalog. Unknown ids are accepted with default capabilities, so you can use newly released models without a library update. First entry becomes the default model |
| enabled | boolean | Set to false to disable |
| priority | number | Lower = tried first (default 100) |
| baseUrl | string | Override the provider's API base URL (gateways, compatible endpoints) |
ai.chat(request)
Send a chat completion request. Returns a Promise<ChatResponse>.
const response = await ai.chat({
messages: [
{ role: "system", content: "You are a helpful assistant." },
{ role: "user", content: "Hello!" },
],
temperature: 0.7, // optional
maxTokens: 1024, // optional
topP: 0.9, // optional
stop: ["\n\n"], // optional stop sequences
provider: "groq", // optional: force a specific provider
signal: controller.signal, // optional: AbortSignal
});
ChatRequest fields:
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| messages | ChatMessage[] | Yes | Conversation messages |
| model | string | No | Override provider's default model |
| temperature | number | No | Sampling temperature (0-2) |
| maxTokens | number | No | Max tokens to generate |
| topP | number | No | Nucleus sampling |
| stop | string[] | No | Stop sequences |
| provider | ProviderName | No | Force a specific provider |
| signal | AbortSignal | No | Cancellation signal |
ChatResponse fields:
| Field | Type | Description |
|-------|------|-------------|
| content | string | Generated text |
| model | string | Model that was used |
| provider | ProviderName | Provider that was used |
| usage | TokenUsage | { promptTokens, completionTokens, totalTokens } |
| finishReason | string | e.g. "stop", "length" |
| latencyMs | number | Round-trip time in ms |
ChatMessage structure:
interface ChatMessage {
role: "system" | "user" | "assistant";
content: string | Array<TextContent | ImageContent>;
}
// Text-only message
{ role: "user", content: "Hello" }
// Multimodal message (text + image)
{
role: "user",
content: [
{ type: "text", text: "What's in this image?" },
{ type: "image_url", image_url: { url: "data:image/jpeg;base64,..." } },
],
}
ai.stream(request)
Stream a chat completion. Returns a Promise<ChatStream> (an AsyncIterable<ChatChunk>).
const stream = await ai.stream({
messages: [{ role: "user", content: "Write a poem" }],
onChunk(chunk) {
process.stdout.write(chunk.content); // real-time output
},
onDone(response) {
console.log(`Done! ${response.usage.totalTokens} tokens`);
},
onError(error) {
console.error("Stream error:", error);
},
});
// You MUST consume the async iterator for the stream to run
for await (const _ of stream) {}
The StreamRequest extends ChatRequest with three optional callbacks:
| Callback | Signature | Description |
|----------|-----------|-------------|
| onChunk | (chunk: ChatChunk) => void | Called for each streamed token/chunk |
| onDone | (response: ChatResponse) => void | Called when the stream completes |
| onError | (error: Error) => void | Called on stream error |
ChatChunk fields:
| Field | Type | Description |
|-------|------|-------------|
| content | string | Partial text for this chunk |
| model | string | Model being used |
| provider | ProviderName | Provider being used |
| finishReason | string \| null | Set on final chunk |
| usage | TokenUsage \| null | Set on final chunk (if provider supports it) |
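Because finishReason and usage are only set on the final chunk, a consumer that wants the full text plus the final metadata has to fold the chunks together. The sketch below illustrates that fold. The `ChatChunk` and `TokenUsage` shapes are local copies based on the tables in this README (with provider simplified to a string), and `collectChunks` is a hypothetical helper; it uses a synchronous iterable for brevity, whereas the real stream is an async iterable.

```typescript
// Local stand-ins mirroring the documented shapes; not imported from the library.
interface TokenUsage {
  promptTokens: number;
  completionTokens: number;
  totalTokens: number;
}

interface ChatChunk {
  content: string;
  model: string;
  provider: string;
  finishReason: string | null;
  usage: TokenUsage | null;
}

interface Collected {
  text: string;
  finishReason: string | null;
  usage: TokenUsage | null;
}

// Hypothetical helper: concatenates chunk text and keeps the last non-null metadata.
function collectChunks(chunks: Iterable<ChatChunk>): Collected {
  const out: Collected = { text: "", finishReason: null, usage: null };
  for (const chunk of chunks) {
    out.text += chunk.content;
    if (chunk.finishReason !== null) out.finishReason = chunk.finishReason;
    if (chunk.usage !== null) out.usage = chunk.usage;
  }
  return out;
}
```

If the provider never reports usage, the result's usage stays null, which matches the "if provider supports it" caveat in the table.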
Alternative consumption patterns:
// Pure async iteration (no callbacks)
const stream = await ai.stream({
messages: [{ role: "user", content: "Hello" }],
});
for await (const chunk of stream) {
process.stdout.write(chunk.content);
}
// Cancel a stream
stream.controller.abort();
// Convert to web ReadableStream (for HTTP responses)
const readable = stream.toReadableStream();
ai.complete(prompt, options?)
Shorthand for a single-message chat. Returns Promise<string>.
const text = await ai.complete("What is TypeScript?");
// Equivalent to:
// const { content } = await ai.chat({ messages: [{ role: "user", content: "..." }] });
Accepts all ChatRequest options except messages:
const text = await ai.complete("Translate to Spanish: Hello world", {
temperature: 0.3,
provider: "mistral",
});
ai.getStatus()
Returns the current ClientStatus with information about all providers.
const status = ai.getStatus();
console.log(status.strategy); // "priority"
console.log(status.totalRequests); // 42
console.log(status.totalTokens); // 15230
for (const p of status.providers) {
console.log(`${p.name}: configured=${p.configured}, available=${p.available}`);
// p.cooldownUntil — timestamp if rate-limited, else null
// p.usage — { minuteRequests, dayRequests, minuteTokens, dayTokens, ... }
}
ai.on(event, handler) / ai.off(event, handler)
Subscribe/unsubscribe to lifecycle events. See Events for all event types.
const handler = ({ from, to, error }) => {
console.log(`Failover: ${from} -> ${to}`);
};
ai.on("failover", handler);
ai.off("failover", handler); // unsubscribe
ai.destroy()
Clean up timers, listeners, and internal state. Call this when you're done using the client.
ai.destroy();
buildImageMessage(source, prompt?, options?)
Async helper that builds a ChatMessage ready for vision analysis. Universal — works in Node, browsers, and React Native (no fs, no Buffer).
import { buildImageMessage } from "ai-failover";
Parameters:
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| source | string \| Blob \| ArrayBuffer \| Uint8Array | Yes | http(s):// URL, data: URI, a File/Blob (e.g. from <input type="file">), or raw bytes |
| prompt | string | No | Text prompt for the image. Defaults to "Describe this image" |
| options.mimeType | string | No | MIME type for binary sources. Defaults to image/jpeg |
Returns: Promise<ChatMessage> — a message with role: "user" containing a text + image content array.
Local files (Node only): use imageMessageFromFile from the ai-failover/node subpath — it reads the file and infers the MIME type from the extension:
import { imageMessageFromFile } from "ai-failover/node";
const msg = await imageMessageFromFile("./photo.jpg", "What do you see?");
Examples:
import { createAI, buildImageMessage } from "ai-failover";
const ai = createAI();
// --- Remote URL (universal) ---
const msg = await buildImageMessage(
"https://example.com/chart.png",
"Summarize this chart"
);
const response = await ai.chat({ messages: [msg] });
// --- Browser: file picker ---
const file = inputElement.files[0]; // File extends Blob
const msg2 = await buildImageMessage(file, "What's in this photo?");
// --- React Native / Expo: bytes from expo-file-system ---
const base64 = await FileSystem.readAsStringAsync(uri, { encoding: "base64" });
const msg3 = await buildImageMessage(`data:image/jpeg;base64,${base64}`, "Describe");
// --- Raw bytes with explicit MIME ---
const msg4 = await buildImageMessage(bytes, "Extract the total", { mimeType: "image/png" });
// --- Node: local file ---
import { imageMessageFromFile } from "ai-failover/node";
const msg5 = await imageMessageFromFile("./menu.jpg", "List all dishes with prices");
The returned ChatMessage has this structure:
{
role: "user",
content: [
{ type: "text", text: "Your prompt here" },
{ type: "image_url", image_url: { url: "data:image/jpeg;base64,/9j/4AAQ..." } },
],
}
Note: Vision requests are automatically routed to a vision-capable provider (Gemini, Groq Vision, OpenRouter, Mistral Pixtral). If no vision provider is configured, the request will fail.
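The routing described in the note can be modeled as two steps: detect whether any message carries an image part, then keep only vision-capable providers. The sketch below is a simplified illustration under that assumption. `needsVision` and `candidateProviders` are hypothetical names, the message types are local copies of the documented shapes, and the provider set is taken from the Supported Providers table.

```typescript
// Local copies of the documented message shapes; not imported from the library.
type TextContent = { type: "text"; text: string };
type ImageContent = { type: "image_url"; image_url: { url: string } };

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string | Array<TextContent | ImageContent>;
}

// Providers marked "Yes" in the Vision column of the Supported Providers table.
const VISION_PROVIDERS = new Set(["groq", "gemini", "openrouter", "mistral"]);

// True if at least one message contains an image_url part.
function needsVision(messages: ChatMessage[]): boolean {
  return messages.some(
    (m) => Array.isArray(m.content) && m.content.some((part) => part.type === "image_url"),
  );
}

// Hypothetical filter: for vision requests, drop providers without vision support.
function candidateProviders(messages: ChatMessage[], configured: string[]): string[] {
  if (!needsVision(messages)) return configured;
  return configured.filter((name) => VISION_PROVIDERS.has(name));
}
```

An empty result for a vision request corresponds to the failure case in the note: no configured provider can handle images.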
Types Reference
All types are exported from "ai-failover" and can be imported for type safety:
import type {
// Provider
ProviderName, // "groq" | "gemini" | "openrouter" | "mistral" | "cerebras" | "cohere"
// Messages
ChatRole, // "system" | "user" | "assistant"
TextContent, // { type: "text"; text: string }
ImageContent, // { type: "image_url"; image_url: { url: string } }
MessageContent, // string | Array<TextContent | ImageContent>
ChatMessage, // { role: ChatRole; content: MessageContent }
// Request / Response
ChatRequest, // messages + optional model, temperature, maxTokens, etc.
ChatResponse, // content, model, provider, usage, finishReason, latencyMs
// Streaming
ChatChunk, // content, model, provider, finishReason, usage
ChatStream, // AsyncIterable<ChatChunk> + controller + toReadableStream()
StreamCallbacks, // onChunk, onDone, onError
StreamRequest, // ChatRequest & StreamCallbacks
// Token tracking
TokenUsage, // { promptTokens, completionTokens, totalTokens }
// Rate limits
RateLimits, // requestsPerMinute, requestsPerDay, tokensPerMinute, tokensPerDay
RateLimitState, // Current usage counters
// Provider info
ProviderCapabilities, // { vision, streaming, systemMessage, maxContextTokens }
ModelInfo, // { id, name, capabilities, rateLimits }
Provider, // Interface for provider implementations
// Configuration
FailoverStrategy, // "priority" | "round-robin" | "least-used"
ProviderConfig, // { apiKey?, models?, enabled?, priority?, baseUrl? }
FailoverConfig, // Full client configuration
// Events & Status
FailoverEvents, // Event type map
ProviderStatus, // { name, configured, available, cooldownUntil, usage }
ClientStatus, // { providers, strategy, totalRequests, totalTokens }
} from "ai-failover";
Vision / Image Analysis
There are two ways to send images for analysis:
Option 1: buildImageMessage helper (recommended)
The simplest approach. Handles URL fetching, base64 encoding, and MIME detection:
import { createAI, buildImageMessage } from "ai-failover";
const ai = createAI();
const msg = await buildImageMessage("https://example.com/photo.jpg", "What's in this image?");
const response = await ai.chat({ messages: [msg] });
console.log(response.content);
// Local files in Node:
import { imageMessageFromFile } from "ai-failover/node";
const msg2 = await imageMessageFromFile("./photo.jpg", "What's in this image?");
Option 2: Manual ChatMessage construction
For when you already have the image data or need custom control:
const response = await ai.chat({
messages: [
{
role: "user",
content: [
{ type: "text", text: "Describe this image" },
{
type: "image_url",
image_url: { url: "data:image/jpeg;base64,/9j/4AAQ..." },
},
],
},
],
});
The image_url.url field accepts:
- Data URIs: data:image/jpeg;base64,... (used by buildImageMessage)
- HTTP URLs: Some providers support direct URLs, but base64 data URIs are the most portable
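For manual construction, the data URI itself is just a MIME type plus base64-encoded bytes. The sketch below shows one portable way to build it from a Uint8Array without fs or Buffer. `base64Encode` and `toDataUri` are hypothetical helpers written for this illustration (the library's own encoding is not shown in this README); the image/jpeg default mirrors the documented default of options.mimeType.

```typescript
const B64_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Standard base64 with "=" padding, processing three bytes at a time.
function base64Encode(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const b0 = bytes[i] ?? 0;
    const b1 = bytes[i + 1] ?? 0;
    const b2 = bytes[i + 2] ?? 0;
    const triple = (b0 << 16) | (b1 << 8) | b2;
    out += B64_ALPHABET.charAt((triple >> 18) & 63);
    out += B64_ALPHABET.charAt((triple >> 12) & 63);
    out += i + 1 < bytes.length ? B64_ALPHABET.charAt((triple >> 6) & 63) : "=";
    out += i + 2 < bytes.length ? B64_ALPHABET.charAt(triple & 63) : "=";
  }
  return out;
}

// Hypothetical helper: wraps raw image bytes in a data: URI.
function toDataUri(bytes: Uint8Array, mimeType: string = "image/jpeg"): string {
  return `data:${mimeType};base64,${base64Encode(bytes)}`;
}
```

The resulting string can be placed directly in image_url.url. For large images, a platform-native encoder is likely faster than this loop; the hand-written version is used here only to stay runtime-agnostic.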
Streaming
// Async iteration
const stream = await ai.stream({
messages: [{ role: "user", content: "Write a poem" }],
});
for await (const chunk of stream) {
process.stdout.write(chunk.content);
}
// With callbacks
const stream = await ai.stream({
messages: [{ role: "user", content: "Write a poem" }],
onChunk(chunk) {
process.stdout.write(chunk.content);
},
onDone(response) {
console.log("Done!", response.usage);
},
onError(error) {
console.error("Error:", error);
},
});
for await (const _ of stream) {} // must consume
// Convert to ReadableStream (for HTTP responses in web servers)
const readable = stream.toReadableStream();
// Cancel mid-stream
stream.controller.abort();
Failover Strategies
| Strategy | Description |
| ------------- | --------------------------------------------------- |
| priority | Try providers in order of priority (default) |
| round-robin | Rotate through providers evenly across requests |
| least-used | Prefer the provider with the fewest past requests |
When a provider fails (rate-limit, error, timeout), the client automatically tries the next available provider according to the selected strategy. Failed providers enter a cooldown period.
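The three strategies differ only in the order in which available providers are tried. The sketch below is one plausible reading of the table, not the library's router: `orderCandidates` and `Candidate` are hypothetical, and the tie-breaking (priority order among equally used providers, rotation over the priority order for round-robin) is an assumption.

```typescript
type FailoverStrategy = "priority" | "round-robin" | "least-used";

interface Candidate {
  name: string;
  priority: number; // lower = tried first
  requests: number; // past requests served by this provider
}

// Returns provider names in the order they would be attempted.
function orderCandidates(
  candidates: Candidate[],
  strategy: FailoverStrategy,
  requestIndex: number, // running request count, used only for rotation
): string[] {
  const byPriority = [...candidates].sort((a, b) => a.priority - b.priority);
  if (strategy === "priority") return byPriority.map((c) => c.name);
  if (strategy === "least-used") {
    // Array.prototype.sort is stable, so ties keep their priority order.
    return [...byPriority].sort((a, b) => a.requests - b.requests).map((c) => c.name);
  }
  // round-robin: rotate the priority order by one position per request.
  if (byPriority.length === 0) return [];
  const start = requestIndex % byPriority.length;
  return [...byPriority.slice(start), ...byPriority.slice(0, start)].map((c) => c.name);
}
```

In each case the first name is the primary choice and the rest form the failover chain for that request.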
Recovery behavior
The router is designed so that a provider that comes back is used again automatically — and one that is down stops taxing your latency:
- 429 with retry-after: the provider's own value is used verbatim; it always wins.
- 429 without retry-after: exponential backoff on consecutive hits (cooldownMs, ×2, ×4 …), capped at maxCooldownMs (15 min default). A success resets the backoff.
- Non-429 failures (5xx, timeout, network): after errorThreshold consecutive errors (default 2) the provider gets a short errorCooldownMs cooldown (30 s default), so an outage doesn't add a doomed first attempt to every request.
- Stale local counters: daily usage is tracked in-process and may drift from the provider's real reset window. If every provider looks locally exhausted, the router probes the best candidate anyway (cooldowns from real 429s are still respected), and a successful probe resets its local counters. Local tracking is an optimization, never a permanent lockout.
- Cooldown expiry: expired cooldowns clear automatically; with the priority strategy, a recovered provider goes back to being tried first.
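The cooldown rules in the list above reduce to a little arithmetic. The sketch below works through it using the documented defaults. `rateLimitCooldownMs` and `errorCooldown` are hypothetical function names, and the assumption that the first 429 in a row uses cooldownMs unmultiplied follows from the "cooldownMs, ×2, ×4" sequence.

```typescript
interface CooldownOptions {
  cooldownMs: number;
  maxCooldownMs: number;
}

// Cooldown after a 429. `consecutive429s` is 1 for the first 429 in a row.
function rateLimitCooldownMs(
  consecutive429s: number,
  retryAfterMs: number | undefined,
  opts: CooldownOptions = { cooldownMs: 60_000, maxCooldownMs: 900_000 },
): number {
  if (retryAfterMs !== undefined) return retryAfterMs; // the provider's value wins
  const exponent = Math.max(0, consecutive429s - 1);
  return Math.min(opts.cooldownMs * 2 ** exponent, opts.maxCooldownMs);
}

// Cooldown after non-429 failures: zero until the threshold is reached.
function errorCooldown(
  consecutiveErrors: number,
  errorThreshold: number = 2,
  errorCooldownMs: number = 30_000,
): number {
  return consecutiveErrors >= errorThreshold ? errorCooldownMs : 0;
}
```

With the defaults, consecutive 429s without retry-after give 60 s, 120 s, 240 s, 480 s, and then the 900 s cap from the fifth hit onward.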
Events
ai.on("failover", ({ from, to, error }) => {
console.log(`Switched from ${from} to ${to}: ${error.message}`);
});
ai.on("rateLimit", ({ provider, retryAfterMs }) => {
console.log(`${provider} rate limited for ${retryAfterMs}ms`);
});
ai.on("request", ({ provider, model }) => {
console.log(`Sending request to ${provider}/${model}`);
});
ai.on("response", ({ provider, model, latencyMs, usage }) => {
console.log(`${provider}/${model}: ${latencyMs}ms, ${usage.totalTokens} tokens`);
});
ai.on("error", ({ provider, error }) => {
console.error(`Error from ${provider}: ${error.message}`);
});
ai.on("exhausted", ({ providers }) => {
console.log(`All providers failed: ${providers.join(", ")}`);
});
Event types:
| Event | Payload | Description |
|-------|---------|-------------|
| failover | { from: ProviderName, to: ProviderName, error: Error } | Provider switch occurred |
| rateLimit | { provider: ProviderName, retryAfterMs?: number } | Rate limit detected |
| request | { provider: ProviderName, model: string } | Request sent |
| response | { provider: ProviderName, model: string, latencyMs: number, usage: TokenUsage } | Response received |
| error | { provider: ProviderName, error: Error } | Provider error |
| exhausted | { providers: ProviderName[] } | All providers failed |
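The event table amounts to a map from event name to payload type, which is what lets on and off be type-checked per event. As a rough model of that idea, here is a minimal typed emitter. `TinyEmitter` and `FailoverEventMap` are hypothetical and cover only three of the events; the library's own FailoverEvents type and internals may differ.

```typescript
// Hypothetical subset of the event map, with provider names simplified to strings.
interface FailoverEventMap {
  failover: { from: string; to: string; error: Error };
  rateLimit: { provider: string; retryAfterMs?: number };
  exhausted: { providers: string[] };
}

type AnyHandler = (payload: unknown) => void;

class TinyEmitter<Events> {
  private readonly handlers = new Map<keyof Events, Set<AnyHandler>>();

  on<K extends keyof Events>(event: K, handler: (payload: Events[K]) => void): void {
    const set = this.handlers.get(event) ?? new Set<AnyHandler>();
    set.add(handler as AnyHandler);
    this.handlers.set(event, set);
  }

  // Removal works by identity, so pass the same function reference used in on().
  off<K extends keyof Events>(event: K, handler: (payload: Events[K]) => void): void {
    this.handlers.get(event)?.delete(handler as AnyHandler);
  }

  emit<K extends keyof Events>(event: K, payload: Events[K]): void {
    this.handlers.get(event)?.forEach((handler) => handler(payload));
  }
}
```

The identity-based removal is why the ai.off example earlier in this README keeps the handler in a variable instead of passing an inline arrow function.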
Error Handling
All error classes are exported and can be used for fine-grained catch logic:
import {
AllProvidersExhaustedError,
RateLimitError,
AuthenticationError,
TimeoutError,
ProviderError,
FailoverError,
} from "ai-failover";
try {
const response = await ai.chat({
messages: [{ role: "user", content: "Hello" }],
});
} catch (error) {
if (error instanceof AllProvidersExhaustedError) {
// Every provider failed — check individual errors
console.log("Providers tried:", error.providers);
console.log("Errors:", error.errors);
} else if (error instanceof RateLimitError) {
console.log(`${error.provider} rate limited, retry in ${error.retryAfterMs}ms`);
} else if (error instanceof AuthenticationError) {
console.log(`Bad API key for ${error.provider}`);
} else if (error instanceof TimeoutError) {
console.log(`${error.provider} timed out after ${error.timeoutMs}ms`);
}
}
Error hierarchy:
FailoverError (base)
├── ProviderError (single provider failure)
│ ├── RateLimitError (HTTP 429)
│ ├── AuthenticationError (HTTP 401)
│ └── TimeoutError (request timeout)
└── AllProvidersExhaustedError (all providers failed)
React Hooks
import { createAI } from "ai-failover";
import { useChat, useCompletion } from "ai-failover/react";
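const client = createAI(); // in browser/React Native code, pass { transport: httpTransport(...) } instead (see Integration Guide)
useChat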
function ChatComponent() {
const {
messages, // ChatMessageUI[] — message history with ids
input, // string — current input value
setInput, // (value: string) => void
handleSubmit, // (e: FormEvent) => void
isLoading, // boolean
error, // Error | null
stop, // () => void — cancel current stream
provider, // ProviderName | null — last used provider
} = useChat({ client });
return (
<form onSubmit={handleSubmit}>
{messages.map((m) => (
<div key={m.id}>
<strong>{m.role}:</strong> {m.content}
</div>
))}
<input value={input} onChange={(e) => setInput(e.target.value)} />
<button type="submit" disabled={isLoading}>Send</button>
{isLoading && <button type="button" onClick={stop}>Stop</button>}
{provider && <span>via {provider}</span>}
</form>
);
}
useCompletion
function CompletionComponent() {
const { completion, complete, isLoading, stop } = useCompletion({ client });
return (
<div>
<button onClick={() => complete("Write a haiku")} disabled={isLoading}>
Generate
</button>
{isLoading && <button onClick={stop}>Stop</button>}
<p>{completion}</p>
</div>
);
}
Note: React is an optional peer dependency. Hooks are imported from "ai-failover/react". They work in React Native too; pair them with httpTransport (see Integration Guide).
Known Limitations
Honest by design — these are accepted trade-offs, not bugs:
- Serverless (Vercel/Lambda): rate-limit tracking and cooldowns live in process memory. Each cold instance starts fresh, so proactive tracking ("I already spent today's quota") degrades; reactive failover (provider answers 429 → try the next one) still works within every request. For full effectiveness run the handler on a persistent server (VPS, Railway, Fly, a long-lived Bun/Node process).
- No mid-stream failover: if a provider dies after streaming began, the error is reported via onError; restarting with another provider would duplicate text the user already saw. Failover applies before the first token.
- Free-tier ceiling: the library maximizes free availability and resilience, not frontier-model quality. For top quality, plug a paid key (e.g. OpenRouter) into the same client.
- Catalog drift: providers deprecate free models and change quotas over time. providers[].models lets you point at new model ids from your app without waiting for a library update.
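The "failover applies before the first token" rule from the list above can be illustrated with a small generator. This is a simplified, synchronous model, not the library's streaming code: `streamWithFailover` and `ChunkSource` are hypothetical, and each source stands in for one provider's stream.

```typescript
// Hypothetical stand-in for opening one provider's stream.
type ChunkSource = () => Iterable<string>;

function* streamWithFailover(sources: ChunkSource[]): Generator<string, void, unknown> {
  let failures = 0;
  for (const open of sources) {
    let opened: { iterator: Iterator<string>; first: IteratorResult<string> } | null = null;
    try {
      const iterator = open()[Symbol.iterator]();
      // A failure while opening or fetching the first chunk is still recoverable.
      opened = { iterator, first: iterator.next() };
    } catch {
      failures += 1; // nothing was shown yet, so trying the next source is safe
    }
    if (opened === null) continue;
    // From here on the caller may have seen output, so errors propagate unchanged.
    const { iterator } = opened;
    let result = opened.first;
    while (!result.done) {
      yield result.value;
      result = iterator.next();
    }
    return;
  }
  throw new Error(`All ${failures} sources failed before the first chunk`);
}
```

A source that throws on its first chunk is skipped silently, while a source that throws after yielding text surfaces the error to the consumer instead of restarting elsewhere, which avoids duplicating text already shown.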
Using in Another Project (local link)
To use ai-failover locally without publishing to npm:
1. Build
cd /path/to/ai-failover
bun run build
This generates dist/ with ESM, CJS, and TypeScript declarations.
2. Link globally
bun link
3. Link in your project
cd /path/to/your-project
bun link ai-failover
4. Use it
import { createAI } from "ai-failover";
import { imageMessageFromFile } from "ai-failover/node";
const ai = createAI();
// Text chat
const res = await ai.chat({
messages: [{ role: "user", content: "Hello!" }],
});
console.log(res.content);
// Image analysis
const msg = await imageMessageFromFile("./photo.jpg", "What's in this photo?");
const res2 = await ai.chat({ messages: [msg] });
console.log(res2.content);
// Streaming
const stream = await ai.stream({
messages: [{ role: "user", content: "Write a story" }],
onChunk(chunk) { process.stdout.write(chunk.content); },
});
for await (const _ of stream) {}
ai.destroy();
// React hooks
import { createAI } from "ai-failover";
import { useChat, useCompletion } from "ai-failover/react";
const client = createAI();
function Chat() {
const { messages, input, setInput, handleSubmit, isLoading } = useChat({ client });
// ...
}
5. Environment variables
Add API keys in your project's .env or .env.local:
GROQ_API_KEY=gsk_...
GEMINI_API_KEY=...
CEREBRAS_API_KEY=...
Note: After modifying ai-failover source, run bun run build again to update the linked package.
Testing Locally
- Create a .env.local file in the project root with at least one API key:
GROQ_API_KEY=gsk_your_key_here
# GEMINI_API_KEY=
# CEREBRAS_API_KEY=
# OPENROUTER_API_KEY=
# MISTRAL_API_KEY=
# COHERE_API_KEY=
- Run the interactive chat:
bun run examples/chat.ts
Inside the chat, use the image: prefix for vision:
You > Hello, how are you?
You > image:./photo.jpg What do you see?
You > image:https://example.com/img.png Describe this
You > exit
- Run other examples:
bun run examples/basic-chat.ts
bun run examples/streaming.ts
bun run examples/custom-priority.ts
bun run examples/vision.ts
bun run examples/try-it.ts
- Unit tests (no API keys required):
bun test
- E2E tests (require API keys):
LIVE_TESTS=1 bun test test/e2e/
Complete Integration Example
A full example showing how an agent or application can integrate ai-failover:
import { createAI } from "ai-failover";
import { imageMessageFromFile } from "ai-failover/node";
import type { ChatMessage, ChatResponse } from "ai-failover";
// 1. Initialize — auto-detects API keys from environment
const ai = createAI({
strategy: "priority",
maxRetries: 3,
});
// 2. Monitor events (optional)
ai.on("failover", ({ from, to, error }) => {
console.warn(`Provider ${from} failed, switching to ${to}: ${error.message}`);
});
// 3. Simple text completion
const answer = await ai.complete("Explain what an API is in one sentence.");
console.log(answer);
// 4. Multi-turn conversation
const history: ChatMessage[] = [
{ role: "system", content: "You are a code reviewer." },
{ role: "user", content: "Review this function: const add = (a, b) => a + b;" },
];
const review = await ai.chat({ messages: history });
history.push({ role: "assistant", content: review.content });
// Continue the conversation
history.push({ role: "user", content: "Now add TypeScript types to it." });
const followUp = await ai.chat({ messages: history });
console.log(followUp.content);
// 5. Image analysis
const imgMsg = await imageMessageFromFile("./diagram.png", "Explain this architecture diagram");
const imgRes = await ai.chat({
messages: [
{ role: "system", content: "You are a software architect." },
imgMsg,
],
});
console.log(imgRes.content);
// 6. Streaming with real-time output
const streamMsg = await imageMessageFromFile("./receipt.jpg", "Extract all line items and total");
const stream = await ai.stream({
messages: [streamMsg],
onChunk(chunk) {
process.stdout.write(chunk.content);
},
onDone(res) {
console.log(`\n[${res.provider}/${res.model} — ${res.usage.totalTokens} tokens]`);
},
});
for await (const _ of stream) {}
// 7. Check provider status
const status = ai.getStatus();
console.log(`Requests: ${status.totalRequests}, Tokens: ${status.totalTokens}`);
// 8. Clean up
ai.destroy();
License
MIT
