ai-failover

v0.3.0

Published

6 days ago

Zero-dependency TypeScript library that unifies free LLM APIs with automatic failover, SSE streaming, and optional React hooks. Works in Node, browsers, and React Native (direct or via your own backend proxy).


manuelferegrino

ai llm failover groq gemini openrouter mistral cerebras cohere streaming react-hooks react-native

ai-failover

Zero-dependency TypeScript library that unifies 6 free LLM APIs with automatic failover, SSE streaming, vision (image analysis), and optional React hooks. Works in Node/Bun backends, browsers (React/Next.js), and React Native/Expo — directly or through your own backend proxy so API keys never ship to the client.

Supported Providers
Platform Support
Installation
Quick Start
Integration Guide — step-by-step recipes for Node, Next.js, React Native/Expo, and SPAs
- Rules for AI coding agents
Environment Variables
API Reference
Types Reference
Vision / Image Analysis
Streaming
Failover Strategies
Events
Error Handling
React Hooks
Known Limitations
Using in Another Project (local link)
Testing Locally

Supported Providers

| Provider | Free Tier | Vision | Models |
| --- | --- | --- | --- |
| Groq | 30 RPM, 14.4K RPD | Yes (Llama 3.2 Vision) | Llama 3.3 70B, Llama 3.1 8B, Qwen QWQ 32B |
| Gemini | 15 RPM, 1.5K RPD | Yes | Gemini 2.0 Flash, Flash Lite |
| Cerebras | 30 RPM, 1K RPD | No | Llama 3.3 70B, Llama 3.1 8B, Qwen 3 32B |
| OpenRouter | 20 RPM, 200 RPD | Yes (Gemini Flash) | Llama 3.3 70B, Qwen 3 32B, Gemini Flash |
| Mistral | 30 RPM | Yes (Pixtral 12B) | Mistral Small, Pixtral 12B |
| Cohere | 20 RPM, 1K RPD | No | Command R, Command R+ |

Platform Support

The same API (ai.chat(), ai.stream(), useChat) works everywhere — only the transport changes:

| Platform | Transport | API keys live | Streaming |
|----------|-----------|---------------|-----------|
| Node / Bun backend | DirectTransport (default) | Server env vars | Full SSE |
| Browser (React/Next.js) | HttpTransport → your backend | Server only | Full SSE |
| React Native / Expo | HttpTransport → your backend | Server only | SSE, or batched fallback* |
| Prototypes / personal scripts | DirectTransport with explicit keys | Wherever you put them | Full SSE |

* React Native's built-in fetch does not expose response bodies as streams. The library detects this and transparently falls back to delivering the full response through the same callbacks. For true incremental streaming use expo/fetch (SDK 52+) as the fetch option of httpTransport.
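
The fallback described in the footnote can be pictured as a feature check on the response body. The sketch below is illustrative only: `canStreamBody` and `pickDeliveryMode` are hypothetical names, not exports of ai-failover, and the assumption is that a streamable response exposes a `body` with a `getReader` method while React Native's built-in fetch does not.

```typescript
// Hypothetical helpers (not part of ai-failover's public API).
// Minimal structural type: only the part of a fetch Response we inspect.
interface BodyLike {
  body?: { getReader?: unknown } | null;
}

// True when the response exposes a readable stream we could consume chunk by chunk.
function canStreamBody(response: BodyLike): boolean {
  const body = response.body;
  return body != null && typeof body.getReader === "function";
}

// Decide how chunks would reach the callbacks: as they arrive, or all at once.
function pickDeliveryMode(response: BodyLike): "incremental" | "batched" {
  return canStreamBody(response) ? "incremental" : "batched";
}

// A browser/Node-style response versus a React Native-style response (no body stream).
console.log(pickDeliveryMode({ body: { getReader: () => ({}) } }));
console.log(pickDeliveryMode({ body: null }));
```

Either way the same callbacks fire; only the timing of the chunks differs.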

Security note: putting provider API keys in a browser bundle or mobile app exposes them to anyone. For anything you ship, use proxy mode — see the Integration Guide.

Installation

bun add ai-failover
# or
npm install ai-failover

Or install directly from GitHub:

npm install github:ManuelFeregrino/ai-failover
# or
bun add github:ManuelFeregrino/ai-failover

The library has zero runtime dependencies. It works with Node.js >= 18, Bun, and Deno.

Quick Start

import { createAI } from "ai-failover";

// Auto-detects API keys from environment variables
const ai = createAI();

// Simple one-liner completion (returns string)
const answer = await ai.complete("What is TypeScript?");
console.log(answer);

// Full chat with message history (returns ChatResponse)
const response = await ai.chat({
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "Explain closures in JavaScript" },
  ],
});

console.log(response.content);  // AI response text
console.log(response.provider); // e.g. "groq"
console.log(response.model);    // e.g. "llama-3.3-70b-versatile"
console.log(response.usage);    // { promptTokens, completionTokens, totalTokens }
console.log(response.latencyMs); // e.g. 832

// Always clean up when done
ai.destroy();

Integration Guide

Step-by-step recipes, written so a human or an AI coding agent can follow them verbatim. Each recipe is complete: install → env → exact file paths → full code → verification. If you are an AI agent integrating this library, also read Rules for AI coding agents at the end of this section.

Which mode do I need?

| Where does this code run? | Mode | Client construction |
|---|---|---|
| Node/Bun server, CLI, script, cron | Direct | createAI() (keys from server env) |
| Browser (Next.js, Vite, any SPA) | Proxy | createAI({ transport: httpTransport("<your endpoint>") }) |
| React Native / Expo app | Proxy | same as browser, with an absolute baseURL |

Proxy mode keeps the provider keys on your server and forwards stream chunks as they arrive (passthrough), so time-to-first-token is essentially the same as calling providers directly:

Browser / React Native                    Your backend                LLM providers
useChat / ai.chat()  ── HttpTransport ──► createChatHandler(ai) ──►  Groq/Gemini/…
                                          (keys + failover here)

Recipe 1 — Node/Bun backend, CLI, or script (direct mode)

Step 1. Install:

npm install ai-failover     # or: bun add ai-failover

Step 2. Put at least one provider key in the environment (.env, never committed). All supported variables: GROQ_API_KEY, GEMINI_API_KEY, OPENROUTER_API_KEY, MISTRAL_API_KEY, CEREBRAS_API_KEY, COHERE_API_KEY.

Step 3. Use it:

import { createAI } from "ai-failover";

const ai = createAI(); // auto-detects keys; DirectTransport by default
const answer = await ai.complete("Hello!");
console.log(answer);
ai.destroy(); // on process teardown

Verify: run the script. If it throws AllProvidersExhaustedError immediately, no API key was found in the environment.

Recipe 2 — Next.js app (App Router): UI + API route in one project

Step 1. Install in the Next.js project:

npm install ai-failover

Step 2. Create .env.local with provider keys. Server-side names only — never prefix them with NEXT_PUBLIC_:

GROQ_API_KEY=gsk_...
GEMINI_API_KEY=AI...

Step 3. Create the API route at exactly app/api/ai/[...path]/route.ts:

import { createAI, createChatHandler } from "ai-failover";

const ai = createAI(); // reads keys from server env
const handler = createChatHandler(ai); // same-origin → no CORS needed

export const POST = handler;

Step 4. Create a shared client at lib/ai-client.ts (one instance for the whole app):

import { createAI, httpTransport } from "ai-failover";

// No provider keys here — talks to the route from Step 3
export const aiClient = createAI({ transport: httpTransport("/api/ai") });

Step 5. Use the hook in any client component:

"use client";
import { useChat } from "ai-failover/react";
import { aiClient } from "@/lib/ai-client";

export default function Chat() {
  const { messages, input, setInput, handleSubmit, isLoading, stop } =
    useChat({ client: aiClient });

  return (
    <form onSubmit={handleSubmit}>
      {messages.map((m) => (
        <p key={m.id}><b>{m.role}:</b> {typeof m.content === "string" ? m.content : "[image]"}</p>
      ))}
      <input value={input} onChange={(e) => setInput(e.target.value)} />
      <button type="submit" disabled={isLoading}>Send</button>
      {isLoading && <button type="button" onClick={stop}>Stop</button>}
    </form>
  );
}

Verify (with npm run dev running):

curl -X POST http://localhost:3000/api/ai/chat \
  -H 'Content-Type: application/json' \
  -d '{"messages":[{"role":"user","content":"Say ping"}]}'
# → {"content":"...","provider":"groq",...}

curl -N -X POST http://localhost:3000/api/ai/chat/stream \
  -H 'Content-Type: application/json' \
  -d '{"messages":[{"role":"user","content":"Count to 3"}]}'
# → data: {"content":"1",...}  … data: [DONE]
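
The streaming endpoint above answers with `data:` lines and ends with `data: [DONE]`. If you need to read that output outside the library (for example in a smoke test), a parser could look roughly like this. It is a sketch under assumptions: `parseSseText` is a hypothetical helper, and the event shape (a JSON object with a `content` string) is inferred only from the sample output shown above.

```typescript
// Hypothetical parser; the event shape is inferred from the curl output above.
interface SseChunk {
  content: string;
}

function parseSseText(raw: string): { chunks: SseChunk[]; done: boolean } {
  const chunks: SseChunk[] = [];
  let done = false;
  for (const line of raw.split(/\r?\n/)) {
    if (!line.startsWith("data:")) continue; // skip blank separator lines
    const payload = line.slice("data:".length).trim();
    if (payload === "[DONE]") {
      done = true; // end-of-stream sentinel
      break;
    }
    const parsed = JSON.parse(payload) as Partial<SseChunk>;
    chunks.push({ content: typeof parsed.content === "string" ? parsed.content : "" });
  }
  return { chunks, done };
}

const sample = 'data: {"content":"1"}\n\ndata: {"content":"2"}\n\ndata: [DONE]\n\n';
const parsedSample = parseSseText(sample);
console.log(parsedSample.chunks.map((c) => c.content).join("")); // prints "12"
```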

Deploying to Vercel/serverless? It works, with degraded proactive rate-limit tracking — see Known Limitations #1.

Recipe 3 — React Native / Expo app + a backend for the keys

The mobile app never holds provider keys; it talks to a backend you control.

Step 1. Backend (skip if you already did Recipe 2: that route works for mobile too; just enable CORS or serve it from the same domain as the rest of your API). Standalone version (the example uses Bun.serve, so it runs on Bun as written):

// server.ts — deploy on Railway/Fly/VPS, or run on your LAN during development
import { createAI, createChatHandler } from "ai-failover";

const ai = createAI();
Bun.serve({ port: 8787, fetch: createChatHandler(ai, { cors: true }) });
console.log("AI proxy on :8787");

Step 2. In the Expo/RN project:

npm install ai-failover

Step 3. Create lib/ai-client.ts. ⚠️ Use an absolute URL. localhost does NOT work from a device/emulator — in development use your machine's LAN IP (e.g. http://192.168.1.50:8787), in production your deployed URL:

import { createAI, httpTransport } from "ai-failover";

export const aiClient = createAI({
  transport: httpTransport({
    baseURL: process.env.EXPO_PUBLIC_AI_API_URL ?? "https://api.yourapp.com/ai",
    // Your app's own user auth (NOT provider keys). Optional.
    headers: async () => ({ Authorization: `Bearer ${await getSessionToken()}` }),
  }),
});

Step 4 (optional, Expo SDK 52+). True incremental streaming. Without this, streaming still works but tokens arrive in one batch at the end (RN's fetch limitation):

import { fetch as expoFetch } from "expo/fetch";
// add inside httpTransport({ ... }):
//   fetch: expoFetch as unknown as typeof fetch,

Step 5. Components use exactly the same hook as on the web:

import { useChat } from "ai-failover/react";
import { aiClient } from "../lib/ai-client";

const { messages, input, setInput, handleSubmit, isLoading } = useChat({ client: aiClient });
// wire to <TextInput>/<Pressable> as usual

Verify: from the device, send a message; on the backend you should see the request and the provider being picked. To test the endpoint without the app: same curl as Recipe 2 against http://<LAN-IP>:8787/chat.

Recipe 4 — Vite/CRA SPA (backend on another origin)

Same as Recipe 3 but in a browser: run the Step-1 backend with cors: true (or cors: "https://app.example.com" to lock the origin) and point the transport at its absolute URL:

export const aiClient = createAI({
  transport: httpTransport("https://api.yourapp.com/ai"),
});

Rules for AI coding agents

Follow these when integrating ai-failover into a project:

- Never put provider keys (GROQ_API_KEY, …) in client code, in NEXT_PUBLIC_*/EXPO_PUBLIC_* env vars, or in committed files. They belong only in the server environment.
- In any browser or React Native file, construct the client with createAI({ transport: httpTransport(...) }). A bare createAI() is correct only in server/CLI code.
- Import ai-failover/node (imageMessageFromFile) only in Node code: it uses fs and breaks client bundles. For images in client apps, pass a Blob/File, bytes, or a data: URI to buildImageMessage.
- The handler serves POST <mount>/chat and POST <mount>/chat/stream. The httpTransport baseURL must equal that mount path (e.g. handler at app/api/ai/[...path]/route.ts → baseURL: "/api/ai").
- Create one client per app (module-level export) and reuse it; call destroy() only on teardown. Do not create a client per request/render.
- getStatus() through HttpTransport returns only local counters (providers: []); provider state lives on the server.
- Error handling: server-side, catch AllProvidersExhaustedError (the handler already maps it to HTTP 503); client-side, HttpTransport throws FailoverError carrying the server's message.
- React hooks live in the ai-failover/react subpath and require React ≥ 18 (optional peer dependency).
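
To make the mount-path rule concrete, here is how the two endpoint URLs relate to a given baseURL. This is only an illustration: `chatEndpoints` is a hypothetical helper, and the trailing-slash handling is an assumption rather than documented library behavior.

```typescript
// Hypothetical helper (not an ai-failover export): derive the two endpoints
// the handler serves from the baseURL passed to httpTransport.
function chatEndpoints(baseURL: string): { chat: string; stream: string } {
  const root = baseURL.replace(/\/+$/, ""); // drop trailing slashes (assumed normalization)
  return { chat: `${root}/chat`, stream: `${root}/chat/stream` };
}

// Handler mounted at app/api/ai/[...path]/route.ts, so the mount path is /api/ai.
const endpoints = chatEndpoints("/api/ai");
console.log(endpoints.chat);   // prints "/api/ai/chat"
console.log(endpoints.stream); // prints "/api/ai/chat/stream"
```

If the client's baseURL and the route's mount path disagree, requests would hit a path the handler does not serve.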

Environment Variables

Set API keys for auto-detection. Only providers with keys are enabled:

GROQ_API_KEY=gsk_...
GEMINI_API_KEY=AI...
OPENROUTER_API_KEY=sk-or-...
MISTRAL_API_KEY=...
CEREBRAS_API_KEY=csk-...
COHERE_API_KEY=...

Optional behavior overrides:

AI_FAILOVER_STRATEGY=priority     # priority | round-robin | least-used
AI_FAILOVER_TIMEOUT=30000         # Request timeout in ms
AI_FAILOVER_MAX_RETRIES=3         # Max provider retries before giving up
AI_FAILOVER_COOLDOWN=60000        # Cooldown after rate limit in ms
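
As a rough picture of how such overrides can be turned into configuration values, the sketch below parses the four variables with fallbacks taken from the FailoverConfig defaults documented in the API Reference. `readOverrides` is a hypothetical function, and the mapping of each variable to a config field is an assumption based on the names.

```typescript
// Hypothetical sketch; not the library's actual parsing code.
type Strategy = "priority" | "round-robin" | "least-used";

interface Overrides {
  strategy: Strategy;
  timeout: number;
  maxRetries: number | undefined; // undefined = leave the library default in place
  cooldownMs: number;
}

const STRATEGIES: readonly Strategy[] = ["priority", "round-robin", "least-used"];

// Accept only positive integers; anything else counts as "not set".
function parsePositiveInt(value: string | undefined): number | undefined {
  if (value === undefined) return undefined;
  const n = Number.parseInt(value, 10);
  return Number.isFinite(n) && n > 0 ? n : undefined;
}

function readOverrides(env: Record<string, string | undefined>): Overrides {
  const rawStrategy = env["AI_FAILOVER_STRATEGY"];
  return {
    strategy: STRATEGIES.find((s) => s === rawStrategy) ?? "priority",
    timeout: parsePositiveInt(env["AI_FAILOVER_TIMEOUT"]) ?? 30_000,
    maxRetries: parsePositiveInt(env["AI_FAILOVER_MAX_RETRIES"]),
    cooldownMs: parsePositiveInt(env["AI_FAILOVER_COOLDOWN"]) ?? 60_000,
  };
}

console.log(readOverrides({ AI_FAILOVER_STRATEGY: "least-used", AI_FAILOVER_TIMEOUT: "15000" }));
```

Unknown strategy names and malformed numbers fall back to the defaults here; whether the library rejects or ignores them is not stated in this document.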

API Reference

`createAI(config?)`

Creates a FailoverClient instance. All parameters are optional.

import { createAI } from "ai-failover";

const ai = createAI(); // auto-detect from env

// or with explicit config (separate name so both declarations can coexist)
const configuredAi = createAI({
  strategy: "round-robin",   // "priority" | "round-robin" | "least-used"
  maxRetries: 3,             // max providers to try before throwing
  timeout: 15_000,           // per-request timeout in ms
  cooldownMs: 30_000,        // how long to skip a provider after rate-limit
  providers: {
    groq: { apiKey: "gsk_..." },
    gemini: { apiKey: "AI..." },
    cerebras: { enabled: false }, // disable a provider
    openrouter: {
      apiKey: "sk-or-...",
      models: ["google/gemini-2.0-flash-exp:free"],
    },
  },
});

FailoverConfig fields:

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| strategy | "priority" \| "round-robin" \| "least-used" | "priority" | How to select the next provider |
| maxRetries | number | number of providers | Max providers to try per request |
| timeout | number | 30000 | Per-request timeout in ms (also settable per request) |
| cooldownMs | number | 60000 | Base cooldown after a 429 without retry-after |
| maxCooldownMs | number | 900000 | Cap for the exponential backoff on consecutive 429s |
| errorThreshold | number | 2 | Consecutive non-429 errors before a provider gets a short cooldown |
| errorCooldownMs | number | 30000 | Cooldown applied when errorThreshold is reached |
| defaultModel | string | none | Used as request.model when the selected provider serves it |
| providers | Partial<Record<ProviderName, ProviderConfig>> | auto-detect | Provider-specific configuration |
| onFailover | (from, to, error) => void | none | Shorthand for ai.on("failover", …) |
| transport | Transport | DirectTransport | httpTransport(...) to route through your backend (Integration Guide). When set, the provider/strategy options are ignored; they apply server-side |

ProviderConfig fields:

| Field | Type | Description |
|-------|------|-------------|
| apiKey | string | Override env-based API key |
| models | string[] | Restrict/replace the model catalog. Unknown ids are accepted with default capabilities, so you can use newly released models without a library update. First entry becomes the default model |
| enabled | boolean | Set to false to disable |
| priority | number | Lower = tried first (default 100) |
| baseUrl | string | Override the provider's API base URL (gateways, compatible endpoints) |

`ai.chat(request)`

Send a chat completion request. Returns a Promise<ChatResponse>.

const response = await ai.chat({
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "Hello!" },
  ],
  temperature: 0.7,  // optional
  maxTokens: 1024,   // optional
  topP: 0.9,         // optional
  stop: ["\n\n"],    // optional stop sequences
  provider: "groq",  // optional: force a specific provider
  signal: controller.signal, // optional: AbortSignal
});

ChatRequest fields:

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| messages | ChatMessage[] | Yes | Conversation messages |
| model | string | No | Override provider's default model |
| temperature | number | No | Sampling temperature (0-2) |
| maxTokens | number | No | Max tokens to generate |
| topP | number | No | Nucleus sampling |
| stop | string[] | No | Stop sequences |
| provider | ProviderName | No | Force a specific provider |
| signal | AbortSignal | No | Cancellation signal |

ChatResponse fields:

| Field | Type | Description |
|-------|------|-------------|
| content | string | Generated text |
| model | string | Model that was used |
| provider | ProviderName | Provider that was used |
| usage | TokenUsage | { promptTokens, completionTokens, totalTokens } |
| finishReason | string | e.g. "stop", "length" |
| latencyMs | number | Round-trip time in ms |

ChatMessage structure:

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string | Array<TextContent | ImageContent>;
}

// Text-only message
{ role: "user", content: "Hello" }

// Multimodal message (text + image)
{
  role: "user",
  content: [
    { type: "text", text: "What's in this image?" },
    { type: "image_url", image_url: { url: "data:image/jpeg;base64,..." } },
  ],
}

`ai.stream(request)`

Stream a chat completion. Returns a Promise<ChatStream> (an AsyncIterable<ChatChunk>).

const stream = await ai.stream({
  messages: [{ role: "user", content: "Write a poem" }],
  onChunk(chunk) {
    process.stdout.write(chunk.content); // real-time output
  },
  onDone(response) {
    console.log(`Done! ${response.usage.totalTokens} tokens`);
  },
  onError(error) {
    console.error("Stream error:", error);
  },
});

// You MUST consume the async iterator for the stream to run
for await (const _ of stream) {}

The StreamRequest extends ChatRequest with three optional callbacks:

| Callback | Signature | Description |
|----------|-----------|-------------|
| onChunk | (chunk: ChatChunk) => void | Called for each streamed token/chunk |
| onDone | (response: ChatResponse) => void | Called when the stream completes |
| onError | (error: Error) => void | Called on stream error |

ChatChunk fields:

| Field | Type | Description |
|-------|------|-------------|
| content | string | Partial text for this chunk |
| model | string | Model being used |
| provider | ProviderName | Provider being used |
| finishReason | string \| null | Set on final chunk |
| usage | TokenUsage \| null | Set on final chunk (if provider supports it) |

Alternative consumption patterns:

// Pure async iteration (no callbacks)
const stream = await ai.stream({
  messages: [{ role: "user", content: "Hello" }],
});
for await (const chunk of stream) {
  process.stdout.write(chunk.content);
}

// Cancel a stream
stream.controller.abort();

// Convert to web ReadableStream (for HTTP responses)
const readable = stream.toReadableStream();

`ai.complete(prompt, options?)`

Shorthand for a single-message chat. Returns Promise<string>.

const text = await ai.complete("What is TypeScript?");
// Equivalent to:
// const { content } = await ai.chat({ messages: [{ role: "user", content: "..." }] });

Accepts all ChatRequest options except messages:

const text = await ai.complete("Translate to Spanish: Hello world", {
  temperature: 0.3,
  provider: "mistral",
});

`ai.getStatus()`

Returns the current ClientStatus with information about all providers.

const status = ai.getStatus();

console.log(status.strategy);       // "priority"
console.log(status.totalRequests);   // 42
console.log(status.totalTokens);     // 15230

for (const p of status.providers) {
  console.log(`${p.name}: configured=${p.configured}, available=${p.available}`);
  // p.cooldownUntil — timestamp if rate-limited, else null
  // p.usage — { minuteRequests, dayRequests, minuteTokens, dayTokens, ... }
}

`ai.on(event, handler)` / `ai.off(event, handler)`

Subscribe/unsubscribe to lifecycle events. See Events for all event types.

const handler = ({ from, to, error }) => {
  console.log(`Failover: ${from} -> ${to}`);
};

ai.on("failover", handler);
ai.off("failover", handler); // unsubscribe

`ai.destroy()`

Clean up timers, listeners, and internal state. Call this when you're done using the client.

ai.destroy();

`buildImageMessage(source, prompt?, options?)`

Async helper that builds a ChatMessage ready for vision analysis. Universal — works in Node, browsers, and React Native (no fs, no Buffer).

import { buildImageMessage } from "ai-failover";

Parameters:

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| source | string \| Blob \| ArrayBuffer \| Uint8Array | Yes | http(s):// URL, data: URI, a File/Blob (e.g. from <input type="file">), or raw bytes |
| prompt | string | No | Text prompt for the image. Defaults to "Describe this image" |
| options.mimeType | string | No | MIME type for binary sources. Defaults to image/jpeg |

Returns: Promise<ChatMessage> — a message with role: "user" containing a text + image content array.

Local files (Node only): use imageMessageFromFile from the ai-failover/node subpath — it reads the file and infers the MIME type from the extension:

import { imageMessageFromFile } from "ai-failover/node";

const msg = await imageMessageFromFile("./photo.jpg", "What do you see?");

Examples:

import { createAI, buildImageMessage } from "ai-failover";

const ai = createAI();

// --- Remote URL (universal) ---
const msg = await buildImageMessage(
  "https://example.com/chart.png",
  "Summarize this chart"
);
const response = await ai.chat({ messages: [msg] });

// --- Browser: file picker ---
const file = inputElement.files[0]; // File extends Blob
const msg2 = await buildImageMessage(file, "What's in this photo?");

// --- React Native / Expo: bytes from expo-file-system ---
const base64 = await FileSystem.readAsStringAsync(uri, { encoding: "base64" });
const msg3 = await buildImageMessage(`data:image/jpeg;base64,${base64}`, "Describe");

// --- Raw bytes with explicit MIME ---
const msg4 = await buildImageMessage(bytes, "Extract the total", { mimeType: "image/png" });

// --- Node: local file ---
import { imageMessageFromFile } from "ai-failover/node";
const msg5 = await imageMessageFromFile("./menu.jpg", "List all dishes with prices");

The returned ChatMessage has this structure:

{
  role: "user",
  content: [
    { type: "text", text: "Your prompt here" },
    { type: "image_url", image_url: { url: "data:image/jpeg;base64,/9j/4AAQ..." } },
  ],
}

Note: Vision requests are automatically routed to a vision-capable provider (Gemini, Groq Vision, OpenRouter, Mistral Pixtral). If no vision provider is configured, the request will fail.

Types Reference

All types are exported from "ai-failover" and can be imported for type safety:

import type {
  // Provider
  ProviderName,         // "groq" | "gemini" | "openrouter" | "mistral" | "cerebras" | "cohere"

  // Messages
  ChatRole,             // "system" | "user" | "assistant"
  TextContent,          // { type: "text"; text: string }
  ImageContent,         // { type: "image_url"; image_url: { url: string } }
  MessageContent,       // string | Array<TextContent | ImageContent>
  ChatMessage,          // { role: ChatRole; content: MessageContent }

  // Request / Response
  ChatRequest,          // messages + optional model, temperature, maxTokens, etc.
  ChatResponse,         // content, model, provider, usage, finishReason, latencyMs

  // Streaming
  ChatChunk,            // content, model, provider, finishReason, usage
  ChatStream,           // AsyncIterable<ChatChunk> + controller + toReadableStream()
  StreamCallbacks,      // onChunk, onDone, onError
  StreamRequest,        // ChatRequest & StreamCallbacks

  // Token tracking
  TokenUsage,           // { promptTokens, completionTokens, totalTokens }

  // Rate limits
  RateLimits,           // requestsPerMinute, requestsPerDay, tokensPerMinute, tokensPerDay
  RateLimitState,       // Current usage counters

  // Provider info
  ProviderCapabilities, // { vision, streaming, systemMessage, maxContextTokens }
  ModelInfo,            // { id, name, capabilities, rateLimits }
  Provider,             // Interface for provider implementations

  // Configuration
  FailoverStrategy,     // "priority" | "round-robin" | "least-used"
  ProviderConfig,       // { apiKey?, models?, enabled?, priority?, baseUrl? }
  FailoverConfig,       // Full client configuration

  // Events & Status
  FailoverEvents,       // Event type map
  ProviderStatus,       // { name, configured, available, cooldownUntil, usage }
  ClientStatus,         // { providers, strategy, totalRequests, totalTokens }
} from "ai-failover";

Vision / Image Analysis

There are two ways to send images for analysis:

Option 1: `buildImageMessage` helper (recommended)

The simplest approach. Handles URL fetching, base64 encoding, and MIME detection:

import { createAI, buildImageMessage } from "ai-failover";

const ai = createAI();

const msg = await buildImageMessage("https://example.com/photo.jpg", "What's in this image?");
const response = await ai.chat({ messages: [msg] });
console.log(response.content);

// Local files in Node:
import { imageMessageFromFile } from "ai-failover/node";
const msg2 = await imageMessageFromFile("./photo.jpg", "What's in this image?");

Option 2: Manual `ChatMessage` construction

For when you already have the image data or need custom control:

const response = await ai.chat({
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: "Describe this image" },
        {
          type: "image_url",
          image_url: { url: "data:image/jpeg;base64,/9j/4AAQ..." },
        },
      ],
    },
  ],
});

The image_url.url field accepts:

- Data URIs: data:image/jpeg;base64,... (used by buildImageMessage)
- HTTP URLs: Some providers support direct URLs, but base64 data URIs are the most portable
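
If you build the data URI yourself for Option 2, the encoding step can be done without `fs` or `Buffer`. The following is a sketch, not the library's implementation: `toBase64` and `toDataUri` are hypothetical helpers, and the image/jpeg default simply mirrors the default documented for buildImageMessage.

```typescript
// Hypothetical helpers: encode raw image bytes as a base64 data URI
// using only plain JavaScript (no fs, no Buffer).
const B64_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

function toBase64(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    // Pack up to three bytes into one 24-bit group (missing bytes count as 0).
    const b0 = bytes[i] ?? 0;
    const b1 = bytes[i + 1] ?? 0;
    const b2 = bytes[i + 2] ?? 0;
    const triple = (b0 << 16) | (b1 << 8) | b2;
    out += B64_ALPHABET.charAt((triple >> 18) & 63);
    out += B64_ALPHABET.charAt((triple >> 12) & 63);
    // Pad with "=" when the final group has fewer than three bytes.
    out += i + 1 < bytes.length ? B64_ALPHABET.charAt((triple >> 6) & 63) : "=";
    out += i + 2 < bytes.length ? B64_ALPHABET.charAt(triple & 63) : "=";
  }
  return out;
}

function toDataUri(bytes: Uint8Array, mimeType = "image/jpeg"): string {
  return `data:${mimeType};base64,${toBase64(bytes)}`;
}

// The first three bytes of a JPEG file (FF D8 FF) encode to "/9j/".
console.log(toDataUri(new Uint8Array([0xff, 0xd8, 0xff]))); // prints "data:image/jpeg;base64,/9j/"
```

The resulting string can be placed directly in image_url.url.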

Streaming

// Async iteration
const stream = await ai.stream({
  messages: [{ role: "user", content: "Write a poem" }],
});

for await (const chunk of stream) {
  process.stdout.write(chunk.content);
}

// With callbacks (named differently so both examples can live in one file)
const callbackStream = await ai.stream({
  messages: [{ role: "user", content: "Write a poem" }],
  onChunk(chunk) {
    process.stdout.write(chunk.content);
  },
  onDone(response) {
    console.log("Done!", response.usage);
  },
  onError(error) {
    console.error("Error:", error);
  },
});

for await (const _ of callbackStream) {} // must consume

// Convert to ReadableStream (for HTTP responses in web servers)
const readable = stream.toReadableStream();

// Cancel mid-stream
stream.controller.abort();

Failover Strategies

| Strategy | Description |
| --- | --- |
| priority | Try providers in order of priority (default) |
| round-robin | Rotate through providers evenly across requests |
| least-used | Prefer the provider with the fewest past requests |

When a provider fails (rate-limit, error, timeout), the client automatically tries the next available provider according to the selected strategy. Failed providers enter a cooldown period.
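
The three strategies can be sketched as one selection function over the providers that are currently available. This is a simplified model, not the library's router: `Candidate` and `pickProvider` are hypothetical, and the round-robin cursor is assumed to be a counter the caller increments per request.

```typescript
// Hypothetical model of provider selection; not the library's router.
interface Candidate {
  name: string;
  priority: number;   // lower = tried first
  requests: number;   // past requests sent to this provider
  available: boolean; // false while in cooldown
}

function pickProvider(
  strategy: "priority" | "round-robin" | "least-used",
  candidates: readonly Candidate[],
  cursor = 0,
): Candidate | undefined {
  const usable = candidates.filter((c) => c.available);
  if (usable.length === 0) return undefined; // everything is cooling down
  switch (strategy) {
    case "priority":
      return [...usable].sort((a, b) => a.priority - b.priority)[0];
    case "round-robin":
      return usable[cursor % usable.length];
    case "least-used":
      return [...usable].sort((a, b) => a.requests - b.requests)[0];
  }
}

const pool: Candidate[] = [
  { name: "groq", priority: 1, requests: 10, available: false }, // in cooldown
  { name: "gemini", priority: 2, requests: 5, available: true },
  { name: "mistral", priority: 3, requests: 1, available: true },
];

console.log(pickProvider("priority", pool)?.name);       // prints "gemini"
console.log(pickProvider("least-used", pool)?.name);     // prints "mistral"
console.log(pickProvider("round-robin", pool, 1)?.name); // prints "mistral"
```

In this model a provider in cooldown is simply filtered out, which is why groq is skipped even though it has the best priority.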

Recovery behavior

The router is designed so that a provider that comes back is used again automatically — and one that is down stops taxing your latency:

- 429 with retry-after: the provider's own value is used verbatim; it always wins.
- 429 without retry-after: exponential backoff on consecutive hits (cooldownMs, ×2, ×4 …), capped at maxCooldownMs (15 min default). A success resets the backoff.
- Non-429 failures (5xx, timeout, network): after errorThreshold consecutive errors (default 2) the provider gets a short errorCooldownMs cooldown (30 s default), so an outage doesn't add a doomed first attempt to every request.
- Stale local counters: daily usage is tracked in-process and may drift from the provider's real reset window. If every provider looks locally exhausted, the router probes the best candidate anyway (cooldowns from real 429s are still respected); a successful probe resets its local counters. Local tracking is an optimization, never a permanent lockout.
- Cooldown expiry: expired cooldowns clear automatically; with the priority strategy, a recovered provider goes back to being tried first.
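
The 429 rules above reduce to a small calculation. The sketch below uses the documented defaults (cooldownMs 60000, maxCooldownMs 900000); `cooldownAfter429` is a hypothetical function, and counting the first hit as "no doubling yet" is an assumption about how the sequence cooldownMs, ×2, ×4 starts.

```typescript
// Hypothetical sketch of the 429 cooldown rules; not the library's code.
interface CooldownOptions {
  cooldownMs: number;    // base cooldown after a 429 without retry-after
  maxCooldownMs: number; // cap for the exponential backoff
}

const DEFAULT_COOLDOWN: CooldownOptions = { cooldownMs: 60_000, maxCooldownMs: 900_000 };

function cooldownAfter429(
  consecutive429s: number,
  retryAfterMs: number | undefined,
  opts: CooldownOptions = DEFAULT_COOLDOWN,
): number {
  // The provider's own retry-after value always wins.
  if (retryAfterMs !== undefined) return retryAfterMs;
  // First hit: base cooldown; each further consecutive hit doubles it.
  const doublings = Math.max(0, consecutive429s - 1);
  return Math.min(opts.cooldownMs * Math.pow(2, doublings), opts.maxCooldownMs);
}

console.log(cooldownAfter429(1, undefined)); // prints 60000
console.log(cooldownAfter429(3, undefined)); // prints 240000
console.log(cooldownAfter429(5, undefined)); // prints 900000 (capped)
console.log(cooldownAfter429(5, 1500));      // prints 1500 (retry-after wins)
```

With the defaults, the cap is reached on the fifth consecutive 429, since 60000 × 16 already exceeds 900000.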

Events

ai.on("failover", ({ from, to, error }) => {
  console.log(`Switched from ${from} to ${to}: ${error.message}`);
});

ai.on("rateLimit", ({ provider, retryAfterMs }) => {
  console.log(`${provider} rate limited for ${retryAfterMs}ms`);
});

ai.on("request", ({ provider, model }) => {
  console.log(`Sending request to ${provider}/${model}`);
});

ai.on("response", ({ provider, model, latencyMs, usage }) => {
  console.log(`${provider}/${model}: ${latencyMs}ms, ${usage.totalTokens} tokens`);
});

ai.on("error", ({ provider, error }) => {
  console.error(`Error from ${provider}: ${error.message}`);
});

ai.on("exhausted", ({ providers }) => {
  console.log(`All providers failed: ${providers.join(", ")}`);
});

Event types:

| Event | Payload | Description |
|-------|---------|-------------|
| failover | { from: ProviderName, to: ProviderName, error: Error } | Provider switch occurred |
| rateLimit | { provider: ProviderName, retryAfterMs?: number } | Rate limit detected |
| request | { provider: ProviderName, model: string } | Request sent |
| response | { provider: ProviderName, model: string, latencyMs: number, usage: TokenUsage } | Response received |
| error | { provider: ProviderName, error: Error } | Provider error |
| exhausted | { providers: ProviderName[] } | All providers failed |

Error Handling

All error classes are exported and can be used for fine-grained catch logic:

import {
  AllProvidersExhaustedError,
  RateLimitError,
  AuthenticationError,
  TimeoutError,
  ProviderError,
  FailoverError,
} from "ai-failover";

try {
  const response = await ai.chat({
    messages: [{ role: "user", content: "Hello" }],
  });
} catch (error) {
  if (error instanceof AllProvidersExhaustedError) {
    // Every provider failed — check individual errors
    console.log("Providers tried:", error.providers);
    console.log("Errors:", error.errors);
  } else if (error instanceof RateLimitError) {
    console.log(`${error.provider} rate limited, retry in ${error.retryAfterMs}ms`);
  } else if (error instanceof AuthenticationError) {
    console.log(`Bad API key for ${error.provider}`);
  } else if (error instanceof TimeoutError) {
    console.log(`${error.provider} timed out after ${error.timeoutMs}ms`);
  }
}

Error hierarchy:

FailoverError (base)
├── ProviderError (single provider failure)
│   ├── RateLimitError    (HTTP 429)
│   ├── AuthenticationError (HTTP 401)
│   └── TimeoutError       (request timeout)
└── AllProvidersExhaustedError (all providers failed)

React Hooks

import { createAI } from "ai-failover";
import { useChat, useCompletion } from "ai-failover/react";

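// Direct mode shown for brevity. In browser or React Native code, use
// createAI({ transport: httpTransport("/api/ai") }) so keys stay on the server.
const client = createAI();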

`useChat`

function ChatComponent() {
  const {
    messages,     // ChatMessageUI[] — message history with ids
    input,        // string — current input value
    setInput,     // (value: string) => void
    handleSubmit, // (e: FormEvent) => void
    isLoading,    // boolean
    error,        // Error | null
    stop,         // () => void — cancel current stream
    provider,     // ProviderName | null — last used provider
  } = useChat({ client });

  return (
    <form onSubmit={handleSubmit}>
      {messages.map((m) => (
        <div key={m.id}>
          <strong>{m.role}:</strong> {m.content}
        </div>
      ))}
      <input value={input} onChange={(e) => setInput(e.target.value)} />
      <button type="submit" disabled={isLoading}>Send</button>
      {isLoading && <button type="button" onClick={stop}>Stop</button>}
      {provider && <span>via {provider}</span>}
    </form>
  );
}

`useCompletion`

function CompletionComponent() {
  const { completion, complete, isLoading, stop } = useCompletion({ client });

  return (
    <div>
      <button onClick={() => complete("Write a haiku")} disabled={isLoading}>
        Generate
      </button>
      {isLoading && <button onClick={stop}>Stop</button>}
      <p>{completion}</p>
    </div>
  );
}

Note: React is an optional peer dependency. Hooks are imported from "ai-failover/react". They work in React Native too — pair them with httpTransport (see Integration Guide).

Known Limitations

Honest by design — these are accepted trade-offs, not bugs:

- Serverless (Vercel/Lambda): rate-limit tracking and cooldowns live in process memory. Each cold instance starts fresh, so proactive tracking ("I already spent today's quota") degrades; reactive failover (provider answers 429 → try the next one) still works within every request. For full effectiveness, run the handler on a persistent server (VPS, Railway, Fly, a long-lived Bun/Node process).
- No mid-stream failover: if a provider dies after streaming has begun, the error is reported via onError, because restarting with another provider would duplicate text the user already saw. Failover applies only before the first token.
- Free-tier ceiling: the library maximizes free availability and resilience, not frontier-model quality. For top quality, plug a paid key (e.g. OpenRouter) into the same client.
- Catalog drift: providers deprecate free models and change quotas over time. providers[].models lets you point at new model ids from your app without waiting for a library update.
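
The "no mid-stream failover" rule above can be pictured with a small, library-independent sketch. It is not ai-failover's actual implementation: `TokenSource` and `streamWithFailover` are hypothetical names introduced only to show why switching providers is safe before the first token and unsafe after it.

```typescript
// Hypothetical types for illustration; not ai-failover's internals.
type TokenSource = () => AsyncIterable<string>;

// Tries each source in order. A failure before the first token moves on to
// the next source; a failure after text was emitted is reported and ends
// the stream, so the user never sees duplicated output.
async function* streamWithFailover(
  sources: TokenSource[],
  onError: (err: unknown) => void,
): AsyncGenerator<string> {
  let lastError: unknown = new Error("no providers configured");
  for (const open of sources) {
    let emitted = false;
    try {
      for await (const token of open()) {
        emitted = true;
        yield token;
      }
      return; // stream finished normally
    } catch (err) {
      lastError = err;
      if (emitted) {
        // Text already reached the user: report and stop, do not restart.
        onError(err);
        return;
      }
      // Nothing was emitted yet, so trying the next source is safe.
    }
  }
  // Every source failed before producing a token.
  onError(lastError);
}
```

With a first source that throws immediately and a healthy second source, the consumer receives only the second source's tokens and `onError` is never called. With a first source that throws after one token, the consumer receives that single token and one `onError` call.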

Using in Another Project (local link)

To use ai-failover locally without publishing to npm:

1. Build

cd /path/to/ai-failover
bun run build

This generates dist/ with ESM, CJS, and TypeScript declarations.

2. Link globally

bun link

3. Link in your project

cd /path/to/your-project
bun link ai-failover

4. Use it

import { createAI } from "ai-failover";
import { imageMessageFromFile } from "ai-failover/node";

const ai = createAI();

// Text chat
const res = await ai.chat({
  messages: [{ role: "user", content: "Hello!" }],
});
console.log(res.content);

// Image analysis
const msg = await imageMessageFromFile("./photo.jpg", "What's in this photo?");
const res2 = await ai.chat({ messages: [msg] });
console.log(res2.content);

// Streaming
const stream = await ai.stream({
  messages: [{ role: "user", content: "Write a story" }],
  onChunk(chunk) { process.stdout.write(chunk.content); },
});
for await (const _ of stream) {}

ai.destroy();

// React hooks (separate file, e.g. a .tsx component)
import { createAI } from "ai-failover";
import { useChat, useCompletion } from "ai-failover/react";

const client = createAI();

function Chat() {
  const { messages, input, setInput, handleSubmit, isLoading } = useChat({ client });
  // ...
}

5. Environment variables

Add API keys in your project's .env or .env.local:

GROQ_API_KEY=gsk_...
GEMINI_API_KEY=...
CEREBRAS_API_KEY=...

Note: After modifying ai-failover source, run bun run build again to update the linked package.

Testing Locally

Create a .env.local file in the project root with at least one API key:

GROQ_API_KEY=gsk_your_key_here
# GEMINI_API_KEY=
# CEREBRAS_API_KEY=
# OPENROUTER_API_KEY=
# MISTRAL_API_KEY=
# COHERE_API_KEY=

Run the interactive chat:

bun run examples/chat.ts

Inside the chat, use the image: prefix for vision:

You > Hello, how are you?
You > image:./photo.jpg What do you see?
You > image:https://example.com/img.png Describe this
You > exit
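
As a rough illustration of the input format above, the sketch below splits a line into an image source and a prompt. It is a guess at one reasonable way to parse the `image:` prefix, not the code in examples/chat.ts; `ParsedInput` and `parseChatInput` are hypothetical names, and paths containing spaces are not handled.

```typescript
// Hypothetical parser for the "image:<source> <prompt>" format shown above.
interface ParsedInput {
  imageSource: string | null; // file path or URL, or null for plain text
  prompt: string;
}

function parseChatInput(line: string): ParsedInput {
  const trimmed = line.trim();
  if (!trimmed.startsWith("image:")) {
    return { imageSource: null, prompt: trimmed };
  }
  const rest = trimmed.slice("image:".length).trimStart();
  if (rest === "") {
    // "image:" with no source: treat the line as ordinary text.
    return { imageSource: null, prompt: trimmed };
  }
  // The source ends at the first whitespace; everything after it is the prompt.
  const firstSpace = rest.search(/\s/);
  if (firstSpace === -1) {
    return { imageSource: rest, prompt: "" };
  }
  return {
    imageSource: rest.slice(0, firstSpace),
    prompt: rest.slice(firstSpace).trim(),
  };
}
```

The resulting source and prompt could then be handed to an image helper such as imageMessageFromFile for local paths, assuming the prompt is non-empty or a default prompt is supplied.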

Run other examples:

bun run examples/basic-chat.ts
bun run examples/streaming.ts
bun run examples/custom-priority.ts
bun run examples/vision.ts
bun run examples/try-it.ts

Unit tests (no API keys required):

bun test

E2E tests (require API keys):

LIVE_TESTS=1 bun test test/e2e/
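
The opt-in behind LIVE_TESTS=1 can be sketched as a small guard. This is an assumption about how such gating might look, not the repository's actual test setup; `shouldRunLiveTests` is a hypothetical helper, and the variable names are the ones from the .env.local example above.

```typescript
// Env var names taken from the .env.local example in this README.
const API_KEY_VARS = [
  "GROQ_API_KEY",
  "GEMINI_API_KEY",
  "CEREBRAS_API_KEY",
  "OPENROUTER_API_KEY",
  "MISTRAL_API_KEY",
  "COHERE_API_KEY",
] as const;

type Env = Record<string, string | undefined>;

// Hypothetical guard: live tests run only when explicitly opted in
// AND at least one provider key is set to a non-empty value.
function shouldRunLiveTests(env: Env): boolean {
  const optedIn = env["LIVE_TESTS"] === "1";
  const hasKey = API_KEY_VARS.some((name) => (env[name] ?? "").trim() !== "");
  return optedIn && hasKey;
}
```

In a Bun test file, the result could be combined with `test.skipIf(!shouldRunLiveTests(process.env))` so that a plain `bun test` run never touches the network.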

Complete Integration Example

A full example showing how an agent or application can integrate ai-failover:

import { createAI } from "ai-failover";
import { imageMessageFromFile } from "ai-failover/node";
import type { ChatMessage, ChatResponse } from "ai-failover";

// 1. Initialize — auto-detects API keys from environment
const ai = createAI({
  strategy: "priority",
  maxRetries: 3,
});

// 2. Monitor events (optional)
ai.on("failover", ({ from, to, error }) => {
  console.warn(`Provider ${from} failed, switching to ${to}: ${error.message}`);
});

// 3. Simple text completion
const answer = await ai.complete("Explain what an API is in one sentence.");
console.log(answer);

// 4. Multi-turn conversation
const history: ChatMessage[] = [
  { role: "system", content: "You are a code reviewer." },
  { role: "user", content: "Review this function: const add = (a, b) => a + b;" },
];

const review = await ai.chat({ messages: history });
history.push({ role: "assistant", content: review.content });

// Continue the conversation
history.push({ role: "user", content: "Now add TypeScript types to it." });
const followUp = await ai.chat({ messages: history });
console.log(followUp.content);

// 5. Image analysis
const imgMsg = await imageMessageFromFile("./diagram.png", "Explain this architecture diagram");
const imgRes = await ai.chat({
  messages: [
    { role: "system", content: "You are a software architect." },
    imgMsg,
  ],
});
console.log(imgRes.content);

// 6. Streaming with real-time output
const streamMsg = await imageMessageFromFile("./receipt.jpg", "Extract all line items and total");
const stream = await ai.stream({
  messages: [streamMsg],
  onChunk(chunk) {
    process.stdout.write(chunk.content);
  },
  onDone(res) {
    console.log(`\n[${res.provider}/${res.model} — ${res.usage.totalTokens} tokens]`);
  },
});
for await (const _ of stream) {}

// 7. Check provider status
const status = ai.getStatus();
console.log(`Requests: ${status.totalRequests}, Tokens: ${status.totalTokens}`);

// 8. Clean up
ai.destroy();

License

MIT
