ai-retry

v1.9.1

Retry and fallback mechanisms for AI SDK

chriszirkel

ai ai-sdk fallback retry

Automatically handle API failures, content filtering, timeouts and other errors by switching between different AI models and providers.

ai-retry wraps the provided base model with a set of retry conditions (retryables). When a request fails with an error or the response is not satisfying, it iterates through the given retryables to find a suitable fallback model. It automatically tracks which models have been tried and how many attempts have been made to prevent infinite loops.

It supports two types of retries:

Error-based retries: when the model throws an error (e.g. timeouts or API errors)
Result-based retries: when the model returns a successful response that still needs retrying (e.g. content filtering)

Installation

This library supports both AI SDK v5 and v6. The main branch reflects the latest stable version for AI SDK v6. See the v0 branch for the AI SDK v5 documentation.

[!WARNING] Version compatibility:
Use ai-retry version 0.x for AI SDK v5.
Use ai-retry version 1.x for AI SDK v6.

# AI SDK v5
npm install ai-retry@0

# AI SDK v6
npm install ai-retry@1

Usage

Create a retryable model by providing a base model and a list of retryables or fallback models. When a request fails, the retryables are evaluated in order, and the first one that returns a model is used for the retry.

[!NOTE] ai-retry supports language models, embedding models, and image models.

import { openai } from '@ai-sdk/openai';
import { generateText, streamText } from 'ai';
import { createRetryable } from 'ai-retry';

// Create a retryable model
const retryableModel = createRetryable({
  // Base model
  model: openai('gpt-4o-mini'),
  retries: [
    // Retry strategies and fallbacks...
  ],
});

// Use like any other AI SDK model
const result = await generateText({
  model: retryableModel,
  prompt: 'Hello world!',
});

console.log(result.text);

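// Or with streaming (use a new variable name to avoid redeclaring `result`)
const streamResult = streamText({
  model: retryableModel,
  prompt: 'Write a story about a robot...',
});

// textStream yields plain string chunks
for await (const textPart of streamResult.textStream) {
  process.stdout.write(textPart);
}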

This also works with embedding models:

import { openai } from '@ai-sdk/openai';
import { embed } from 'ai';
import { createRetryable } from 'ai-retry';

// Create a retryable model
const retryableModel = createRetryable({
  // Base model
  model: openai.textEmbedding('text-embedding-3-large'),
  retries: [
    // Retry strategies and fallbacks...
  ],
});

// Use like any other AI SDK model
const result = await embed({
  model: retryableModel,
  value: 'Hello world!',
});

console.log(result.embedding);

This also works with image models:

import { openai } from '@ai-sdk/openai';
import { generateImage } from 'ai';
import { createRetryable } from 'ai-retry';

const retryableModel = createRetryable({
  model: openai.image('dall-e-3'),
  retries: [
    // Retry strategies and fallbacks...
  ],
});

const result = await generateImage({
  model: retryableModel,
  prompt: 'A sunset over mountains',
});

console.log(result.images);

Vercel AI Gateway

You can use ai-retry with Vercel AI Gateway by providing the model as a string. Internally, the string is resolved with the default gateway provider instance from the AI SDK.

import { gateway } from 'ai';
import { createRetryable } from 'ai-retry';

const retryableModel = createRetryable({
  model: 'openai/gpt-5',
  retries: ['anthropic/claude-sonnet-4'],
});

// Is the same as:
const sameRetryableModel = createRetryable({
  model: gateway('openai/gpt-5'),
  retries: [gateway('anthropic/claude-sonnet-4')],
});

By default, the gateway provider resolves model strings as language models. To use an embedding model, create the instance explicitly with the textEmbeddingModel method:

import { gateway } from 'ai';
import { createRetryable } from 'ai-retry';

const retryableModel = createRetryable({
  model: gateway.textEmbeddingModel('openai/text-embedding-3-large'),
  retries: [
    // Retry strategies and fallbacks...
  ],
});

Retryables

The entries passed to the retries array are called retryables, and they control the retry behavior. There are two types of retryables:

Static retryables are plain model instances (language, embedding, or image) that are always used when an error occurs. They are also called fallback models.
Dynamic retryables are functions that receive the current attempt context (error/result and previous attempts) and decide whether to retry with a different model based on custom logic.

You can think of the retries array as a big if-else block, where each dynamic retryable is an if branch that can match a certain error/result condition, and static retryables are else branches that match all other conditions. The analogy is not perfect, but the key point holds: order matters, because retryables are evaluated in order until one matches:

import { anthropic } from '@ai-sdk/anthropic';
import { azure } from '@ai-sdk/azure';
import { openai } from '@ai-sdk/openai';
import { APICallError } from 'ai';
import { createRetryable, isErrorAttempt } from 'ai-retry';

const retryableModel = createRetryable({
  // Base model
  model: openai('gpt-4'),
  // Retryables are evaluated top-down in order
  retries: [
    // Dynamic retryables act like if-branches:
    // If a 429 (too many requests) error happens, retry with this model
    (context) => {
      const { current } = context;
      return isErrorAttempt(current) &&
        APICallError.isInstance(current.error) &&
        current.error.statusCode === 429
        ? { model: azure('gpt-4-mini') } // Retry
        : undefined; // Skip
    },

    // If the error message contains "service overloaded", retry with this model
    (context) => {
      const { current } = context;
      return isErrorAttempt(current) &&
        current.error instanceof Error &&
        current.error.message.includes('service overloaded')
        ? { model: azure('gpt-4-mini') } // Retry
        : undefined; // Skip
    },

    // Static retryables act like else branches:
    // Else, always fallback to this model
    anthropic('claude-3-haiku-20240307'),
    // Same as:
    // { model: anthropic('claude-3-haiku-20240307'), maxAttempts: 1 }
  ],
});

In this example, if the base model fails with a 429 or a service overloaded error, it will retry with gpt-4-mini on Azure. In any other error case, it will fall back to claude-3-haiku-20240307 on Anthropic. If the order were reversed, the static retryable would catch all errors first, and the dynamic retryables would never be reached.
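To make the evaluation order concrete, here is a minimal standalone sketch of top-down matching. It does not use ai-retry: the Attempt, Decision, and Rule types and the model labels are hypothetical simplifications, and the real library additionally skips models whose attempts are exhausted.

```typescript
// Hypothetical, simplified types for illustration only (not ai-retry's API).
type Attempt = { statusCode?: number; message: string };
type Decision = { model: string };
type Rule = (attempt: Attempt) => Decision | undefined;

// Evaluate rules top-down and return the first decision, like if / else if / else.
function pickFallback(rules: Rule[], attempt: Attempt): Decision | undefined {
  for (const rule of rules) {
    const decision = rule(attempt);
    if (decision !== undefined) return decision;
  }
  return undefined; // no rule matched: the original error would be rethrown
}

// Dynamic rules: match only specific failures.
const rateLimited: Rule = (a) =>
  a.statusCode === 429 ? { model: 'azure/gpt-4-mini' } : undefined;
const overloaded: Rule = (a) =>
  a.message.includes('service overloaded') ? { model: 'azure/gpt-4-mini' } : undefined;

// Static rule: matches every failure, like an else branch.
const always: Rule = () => ({ model: 'anthropic/claude-3-haiku-20240307' });

const ordered: Rule[] = [rateLimited, overloaded, always];
const reversed: Rule[] = [always, rateLimited, overloaded];
```

With `reversed`, a 429 is caught by the static rule before the dynamic rules get a chance, which is the pitfall described above.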

Errors vs Results

Dynamic retryables can be further divided based on what triggers them:

Error-based retryables handle API errors where the request throws an error (e.g., timeouts, rate limits, service unavailable, etc.)
Result-based retryables handle successful responses that still need retrying (e.g., content filtering, guardrails, etc.)

Both types of retryables have the same interface and receive the current attempt as context. You can use the isErrorAttempt and isResultAttempt type guards to check the type of the current attempt.

import { anthropic } from '@ai-sdk/anthropic';
import { azure } from '@ai-sdk/azure';
import { openai } from '@ai-sdk/openai';
import { createRetryable, isErrorAttempt, isResultAttempt } from 'ai-retry';
import type { Retryable } from 'ai-retry';

// Error-based retryable: handles thrown errors (e.g., timeouts, rate limits)
const errorBasedRetry: Retryable = (context) => {
  if (isErrorAttempt(context.current)) {
    const { error } = context.current;
    // The request threw an error - e.g., network timeout, 429 rate limit
    console.log('Request failed with error:', error);
    return { model: anthropic('claude-3-haiku-20240307') };
  }
  return undefined;
};

// Result-based retryable: handles successful responses that need retrying
const resultBasedRetry: Retryable = (context) => {
  if (isResultAttempt(context.current)) {
    const { result } = context.current;
    // The request succeeded, but the response indicates a problem
    if (result.finishReason.unified === 'content-filter') {
      console.log('Content was filtered, trying different model');
      return { model: openai('gpt-4') };
    }
  }
  return undefined;
};

const retryableModel = createRetryable({
  model: azure('gpt-4-mini'),
  retries: [
    // Error-based: catches thrown errors like timeouts, rate limits, etc.
    errorBasedRetry,

    // Result-based: catches successful responses that need retrying
    resultBasedRetry,
  ],
});

Result-based retryables apply to language models for both generate (generateText, generateObject) and streaming (streamText, streamObject) calls. For streams, the retry decision is made when the upstream finish part arrives, and a retry only fires if no content has been emitted yet. A response that finishes with finishReason 'content-filter' without emitting any content can therefore still trigger a fallback. Once any content chunk has been forwarded, the stream is committed and result-based retries are skipped.
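The commit rule for streams can be sketched as a small state check. This is an illustration under assumptions, not ai-retry's implementation: the StreamPart union is a hypothetical simplification of real stream parts, and every text-delta is treated as forwarded content.

```typescript
// Hypothetical stream part shapes for illustration; the AI SDK's real parts differ.
type StreamPart =
  | { type: 'text-delta'; delta: string }
  | { type: 'finish'; finishReason: string };

// Decide whether a result-based retry may fire for a stream: only when the
// finish part arrives before any content chunk has been forwarded.
function mayRetryStream(
  parts: Iterable<StreamPart>,
  shouldRetry: (finishReason: string) => boolean,
): boolean {
  let committed = false; // set once content has been forwarded downstream
  for (const part of parts) {
    if (part.type === 'text-delta') {
      committed = true;
    } else if (part.type === 'finish') {
      return !committed && shouldRetry(part.finishReason);
    }
  }
  return false; // stream ended without a finish part
}

const isContentFiltered = (reason: string) => reason === 'content-filter';
```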

Fallbacks

If you don't need precise error matching with custom logic and just want to fall back to different models on any error, you can simply provide a list of models.

[!NOTE] Use the object syntax { model: openai('gpt-4') } if you need to provide additional options like maxAttempts, delay, etc.

import { anthropic } from '@ai-sdk/anthropic';
import { openai } from '@ai-sdk/openai';
import { createRetryable } from 'ai-retry';

const retryableModel = createRetryable({
  // Base model
  model: openai('gpt-4o-mini'),
  // List of fallback models
  retries: [
    openai('gpt-3.5-turbo'), // Fallback for first error
    // Same as:
    // { model: openai('gpt-3.5-turbo'), maxAttempts: 1 },

    anthropic('claude-3-haiku-20240307'), // Fallback for second error
    // Same as:
    // { model: anthropic('claude-3-haiku-20240307'), maxAttempts: 1 },
  ],
});

In this example, if the base model fails, it will retry with gpt-3.5-turbo. If that also fails, it will retry with claude-3-haiku-20240307. If that fails again, the whole retry process stops and a RetryError is thrown.

Custom

If you need more control over when to retry and which model to use, you can create your own custom retryable. This function receives a context object containing the current attempt (error or result) and all previous attempts. It must return a retry object with the model to use, or undefined to skip to the next retryable. The object you return has the same shape as the objects you can put in the retries array.

[!NOTE] You can return additional options like maxAttempts, delay, etc. along with the model.

[!TIP] If you'd like the same flexibility with a typed, composable condition system, see Experimental: Composable Conditions.

import { anthropic } from '@ai-sdk/anthropic';
import { openai } from '@ai-sdk/openai';
import { APICallError } from 'ai';
import { createRetryable, isErrorAttempt } from 'ai-retry';
import type { Retryable } from 'ai-retry';

// Custom retryable that retries on rate limit errors (429)
const rateLimitRetry: Retryable = (context) => {
  // Only handle error attempts
  if (isErrorAttempt(context.current)) {
    // Get the error from the current attempt
    const { error } = context.current;

    // Check for rate limit error
    if (APICallError.isInstance(error) && error.statusCode === 429) {
      // Retry with a different model
      return { model: anthropic('claude-3-haiku-20240307') };
    }
  }

  // Skip to next retryable
  return undefined;
};

const retryableModel = createRetryable({
  // Base model
  model: openai('gpt-4o-mini'),
  retries: [
    // Use custom rate limit retryable
    rateLimitRetry,

    // Other retryables...
  ],
});

In this example, if the base model fails with a 429 error, it will retry with claude-3-haiku-20240307. For any other error, it will skip to the next retryable (if any) or throw the original error.

All Retries Failed

If all retry attempts failed, a RetryError is thrown containing all individual errors. If no retry was attempted (e.g. because all retryables returned undefined), the original error is thrown directly.

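import { anthropic } from '@ai-sdk/anthropic';
import { azure } from '@ai-sdk/azure';
import { openai } from '@ai-sdk/openai';
import { generateText, RetryError } from 'ai';
import { createRetryable } from 'ai-retry';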

const retryableModel = createRetryable({
  // Base model = first attempt
  model: azure('gpt-4-mini'),
  retries: [
    // Fallback model 1 = Second attempt
    openai('gpt-3.5-turbo'),
    // Fallback model 2 = Third attempt
    anthropic('claude-3-haiku-20240307'),
  ],
});

try {
  const result = await generateText({
    model: retryableModel,
    prompt: 'Hello world!',
  });
} catch (error) {
  // RetryError is an official AI SDK error
  // (isInstance also works across duplicated package copies, unlike instanceof)
  if (RetryError.isInstance(error)) {
    console.error('All retry attempts failed:', error.errors);
  } else {
    console.error('Request failed:', error);
  }
}

Errors are tracked per unique model (provider + modelId). In this example, the first error (from gpt-4-mini on Azure) triggers a retry with gpt-3.5-turbo. If that also fails, the request is retried with claude-3-haiku-20240307. If that fails too, the retry process stops and a RetryError containing all three errors is thrown.
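The error-reporting rule above (rethrow the original error when nothing was retried, otherwise aggregate every failure) can be sketched without any dependencies. AggregateRetryError and runWithFallbacks are hypothetical names, the calls are synchronous for brevity, and ai-retry itself throws the AI SDK's RetryError rather than this class.

```typescript
// Hypothetical stand-in for the AI SDK's RetryError.
class AggregateRetryError extends Error {
  readonly errors: unknown[];
  constructor(errors: unknown[]) {
    super(`All ${errors.length} attempts failed`);
    this.name = 'AggregateRetryError';
    this.errors = errors;
  }
}

type Call = () => string;

// Try the base call, then each fallback in order, collecting every failure.
function runWithFallbacks(base: Call, fallbacks: Call[]): string {
  const errors: unknown[] = [];
  for (const call of [base, ...fallbacks]) {
    try {
      return call();
    } catch (err) {
      errors.push(err);
    }
  }
  // Only the base call ran: surface its error unchanged.
  if (errors.length === 1) throw errors[0];
  // At least one retry ran: report all failures together.
  throw new AggregateRetryError(errors);
}
```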

Built-in Retryables

There are several built-in dynamic retryables available for common use cases:

[!TIP] Missing a retryable for your use case? Open an issue and let's discuss it!

[!NOTE] Looking for a composable alternative? See Experimental: Composable Conditions for a condition().action() API that builds on small primitives.

contentFilterTriggered: Content filter was triggered based on the prompt or completion.
requestTimeout: Request timeout occurred.
requestNotRetryable: Request failed with a non-retryable error.
retryAfterDelay: Retry the same model after a delay, with optional exponential backoff, respecting retry-after headers.
serviceOverloaded: Response with status code 529 (service overloaded).
serviceUnavailable: Response with status code 503 (service unavailable).
schemaMismatch: Response JSON doesn't match the expected schema from structured output modes (Output.object(), Output.array(), Output.choice()).
noImageGenerated: Image generation failed with NoImageGeneratedError.

Content Filter

Automatically switch to a different model when content filtering blocks your request.

[!NOTE] For streaming requests this retryable can only fire if the content filter trips before any content has been emitted. Once a text chunk flows through, the stream is committed and the fallback is skipped.

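import { azure } from '@ai-sdk/azure';
import { openai } from '@ai-sdk/openai';
import { createRetryable } from 'ai-retry';
import { contentFilterTriggered } from 'ai-retry/retryables';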

const retryableModel = createRetryable({
  model: azure('gpt-4-mini'),
  retries: [
    contentFilterTriggered(openai('gpt-4o-mini')), // Try OpenAI if Azure filters
  ],
});

Request Timeout

Handle timeouts by switching to potentially faster models.

[!NOTE] You need to use an abortSignal with a timeout on your request.

When a request times out, the requestTimeout retryable will automatically create a fresh abort signal for the retry attempt. This prevents the retry from immediately failing due to the already-aborted signal from the original request. If you do not provide a timeout value, a default of 60 seconds is used for the retry attempt.

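import { azure } from '@ai-sdk/azure';
import { generateText } from 'ai';
import { createRetryable } from 'ai-retry';
import { requestTimeout } from 'ai-retry/retryables';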

const retryableModel = createRetryable({
  model: azure('gpt-4'),
  retries: [
    // Defaults to 60 seconds timeout for the retry attempt
    requestTimeout(azure('gpt-4-mini')),

    // Or specify a custom timeout for the retry attempt
    requestTimeout(azure('gpt-4-mini'), { timeout: 30_000 }),
  ],
});

const result = await generateText({
  model: retryableModel,
  prompt: 'Write a vegetarian lasagna recipe for 4 people.',
  abortSignal: AbortSignal.timeout(60_000), // Original request timeout
});

Service Overloaded

Handle service overload errors (status code 529) by retrying with a delay or switching to a different provider.

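import { anthropic } from '@ai-sdk/anthropic';
import { openai } from '@ai-sdk/openai';
import { streamText } from 'ai';
import { createRetryable } from 'ai-retry';
import { serviceOverloaded } from 'ai-retry/retryables';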

const retryableModel = createRetryable({
  model: anthropic('claude-sonnet-4-0'),
  retries: [
    // Retry with delay and exponential backoff
    serviceOverloaded(anthropic('claude-sonnet-4-0'), {
      delay: 5_000,
      backoffFactor: 2,
      maxAttempts: 5,
    }),
    // Or switch to a different provider
    serviceOverloaded(openai('gpt-4')),
  ],
});

const result = streamText({
  model: retryableModel,
  prompt: 'Write a story about a robot...',
});

Service Unavailable

Handle service unavailable errors (status code 503) by switching to a different provider.

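import { azure } from '@ai-sdk/azure';
import { openai } from '@ai-sdk/openai';
import { createRetryable } from 'ai-retry';
import { serviceUnavailable } from 'ai-retry/retryables';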

const retryableModel = createRetryable({
  model: azure('gpt-4'),
  retries: [
    serviceUnavailable(openai('gpt-4')), // Switch to OpenAI if Azure is unavailable
  ],
});

No Image Generated

Handle image generation failures by switching to a different model.

import { openai } from '@ai-sdk/openai';
import { google } from '@ai-sdk/google';
import { generateImage } from 'ai';
import { createRetryable } from 'ai-retry';
import { noImageGenerated } from 'ai-retry/retryables';

const retryableModel = createRetryable({
  model: openai.image('dall-e-3'),
  retries: [
    noImageGenerated(google.image('gemini-3-pro-image-preview')), // Switch to Gemini if DALL-E fails to generate an image
  ],
});

const result = await generateImage({
  model: retryableModel,
  prompt: 'A sunset over mountains',
});

Request Not Retryable

Handle cases where the base model fails with a non-retryable error.

[!NOTE] You can check if an error is retryable with the isRetryable property on an APICallError.

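import { azure } from '@ai-sdk/azure';
import { openai } from '@ai-sdk/openai';
import { createRetryable } from 'ai-retry';
import { requestNotRetryable } from 'ai-retry/retryables';

const retryableModel = createRetryable({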
  model: azure('gpt-4-mini'),
  retries: [
    requestNotRetryable(openai('gpt-4')), // Switch provider if error is not retryable
  ],
});

Retry After Delay

Errors marked as retryable, such as 429 (Too Many Requests) or 503 (Service Unavailable), are retried after a delay. Both the delay and the exponential backoff factor are configurable. If the response contains a retry-after header, it takes priority over the configured delay.

Note that this retryable does not accept a model parameter; it always retries the model from the latest failed attempt.

import { openai } from '@ai-sdk/openai';
import { createRetryable } from 'ai-retry';
import { retryAfterDelay } from 'ai-retry/retryables';

const retryableModel = createRetryable({
  model: openai('gpt-4'), // Base model
  retries: [
    // Up to 3 attempts on the base model in total (the original request
    // counts as the first), with a fixed 2s delay between them
    retryAfterDelay({ delay: 2_000, maxAttempts: 3 }),

    // Or use exponential backoff: the delay doubles on each retry
    retryAfterDelay({ delay: 2_000, backoffFactor: 2, maxAttempts: 3 }),

    // Or retry only if the response contains a retry-after header
    retryAfterDelay({ maxAttempts: 3 }),
  ],
});

By default, if a retry-after-ms or retry-after header is present in the response, it will be prioritized over the configured delay. The delay from the header will be capped at 60 seconds for safety.
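The header-first delay policy described above can be sketched as follows. This is a standalone approximation, not ai-retry's code: the function names are hypothetical, and the assumption that the first retry waits exactly `delay` (with `backoffFactor` applied per subsequent retry) may not match the library's internal indexing.

```typescript
const MAX_HEADER_DELAY_MS = 60_000; // cap on header-derived delays, per the text above

type HeaderMap = Record<string, string | undefined>;

// Read a delay from response headers, preferring retry-after-ms over retry-after.
// retry-after may hold delta-seconds or an HTTP date (RFC 9110).
function delayFromHeaders(headers: HeaderMap, now: number = Date.now()): number | undefined {
  const ms = headers['retry-after-ms']?.trim();
  if (ms) {
    const value = Number(ms);
    if (Number.isFinite(value) && value >= 0) return Math.min(value, MAX_HEADER_DELAY_MS);
  }
  const after = headers['retry-after']?.trim();
  if (after) {
    if (/^\d+$/.test(after)) return Math.min(Number(after) * 1000, MAX_HEADER_DELAY_MS);
    const date = Date.parse(after);
    if (!Number.isNaN(date)) return Math.min(Math.max(date - now, 0), MAX_HEADER_DELAY_MS);
  }
  return undefined;
}

// Exponential backoff for the n-th retry (0-based): delay * backoffFactor^n.
function backoffDelay(retryIndex: number, delay: number, backoffFactor = 1): number {
  return delay * backoffFactor ** retryIndex;
}

// Header-derived delay wins; otherwise use the configured backoff.
function nextDelay(
  retryIndex: number,
  options: { delay?: number; backoffFactor?: number },
  headers: HeaderMap,
): number {
  return delayFromHeaders(headers) ?? backoffDelay(retryIndex, options.delay ?? 0, options.backoffFactor);
}
```

Note that `retryAfterDelay({ maxAttempts: 3 })` without a `delay` behaves like "header or nothing" in this sketch, because the configured fallback delay is 0.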

Schema Mismatch

Automatically retry with a different model when the response JSON doesn't match the expected schema.

This is a result-based retryable that validates the model's JSON output against the schema set by structured output modes like Output.object(), Output.array(), and Output.choice(). Normally, schema validation happens outside the model in generateText, so a schema validation error would not be seen by the retryable model. This retryable catches it early and retries with a fallback model.

[!NOTE] This retryable works with generateText and any structured output mode that provides a schema: Output.object(), Output.array(), and Output.choice().

import { openai } from '@ai-sdk/openai';
import { generateText, Output } from 'ai';
import { createRetryable } from 'ai-retry';
import { schemaMismatch } from 'ai-retry/retryables';
import { z } from 'zod';

const retryableModel = createRetryable({
  model: openai('gpt-4o-mini'), // Weak base model
  retries: [
    // Retry with stronger model on schema mismatch
    schemaMismatch(openai('gpt-5')),
  ],
});

const result = await generateText({
  model: retryableModel,
  output: Output.object({
    schema: z.object({
      name: z.string(),
      age: z.number(),
    }),
  }),
  prompt: 'Generate a person with name and age.',
});

console.log(result.output); // e.g. { name: "Alice", age: 30 }

Experimental: Composable Conditions

[!WARNING] This API is experimental and may change. It is not exported from the package root; opt in via one of the per-model deep imports:
import { ... } from 'ai-retry/experimental/language-model';
import { ... } from 'ai-retry/experimental/image-model';
import { ... } from 'ai-retry/experimental/embedding-model';
Each entry point also re-exports createRetryable already typed for that model family, so you can either import everything from one path:
import {
  createRetryable,
  error,
  httpStatus,
} from 'ai-retry/experimental/language-model';
or pull retryables from the dedicated /retryables subpath:
import {
  error,
  httpStatus,
} from 'ai-retry/experimental/language-model/retryables';
// or
import * as retryables from 'ai-retry/experimental/language-model/retryables';

A condition().action() API for retryables. Conditions are built from small primitives (error(fn), result(fn)), composed with .and / .or / .not, and turned into a Retryable by one of two terminal actions: .switch({ model }) or .retry({ delay }). The result drops into the same retries: [...] array as the stable helpers, so you can mix the two styles freely.

import { anthropic } from '@ai-sdk/anthropic';
import { openai } from '@ai-sdk/openai';
import { generateText } from 'ai';
import {
  createRetryable,
  error,
  finishReason,
  httpStatus,
} from 'ai-retry/experimental/language-model';

const retryableModel = createRetryable({
  model: openai('gpt-4'),
  retries: [
    // Switch on 529 or any "overloaded" message
    httpStatus(529, 'overloaded').switch({
      model: anthropic('claude-3-haiku-20240307'),
    }),

    // Switch when the response was content-filtered
    finishReason('content-filter').switch({ model: openai('gpt-4o') }),

    // Retry the same model with exponential backoff on retryable errors
    error.isRetryable(true).retry({ delay: 1_000, backoffFactor: 2 }),
  ],
});

Picking an entry point

Pick the entry point that matches the model you pass to createRetryable. Each module exposes the helpers that make sense for that model family already typed for it, so you don't need to add type annotations yourself.

Low-level conditions

The primitive builders error(...) and result(...) take a predicate and turn it into a condition; their namespaces bundle the most common field matchers on top.

| Helper | Matches when | Available in |
| --- | --- | --- |
| error(predicate) | The current attempt failed and predicate(err, ctx) returns true | all three entry points |
| error.isRetryable(flag) | APICallError.isRetryable === flag (default true) | all three entry points |
| error.statusCode(...patterns) | Numbers match exactly; regex matches the stringified code (e.g. /^5\d\d$/ for 5xx) | all three entry points |
| error.message(...patterns) | Substring (case-insensitive) or regex match against the error message | all three entry points |
| result(predicate) | The current attempt succeeded and predicate(res, ctx) returns true | language-model only |
| result.finishReason(...reasons) | The result's finishReason.unified matches one of the given values | language-model only |

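import { anthropic } from '@ai-sdk/anthropic';
import { APICallError } from 'ai';
import { error } from 'ai-retry/experimental/language-model';

// Switch to a fallback model when the API responds with status 418
error((e) => APICallError.isInstance(e) && e.statusCode === 418).switch({
  model: anthropic('claude-3-haiku-20240307'),
});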

High-level conditions

Convenience matchers built on top of the low-level ones for the common cases. Each returns a condition that you finalize with .switch(...) or .retry(...).

| Helper | language-model | image-model | embedding-model |
| --- | :-: | :-: | :-: |
| httpStatus(...patterns) | ✓ | ✓ | ✓ |
| timeout() | ✓ | ✓ | ✓ |
| aborted() | ✓ | ✓ | ✓ |
| finishReason(...reasons) | ✓ | — | — |
| schemaInvalid() | ✓ | — | — |
| noImage() | — | ✓ | — |

What each one matches:

| Helper | Matches when |
| --- | --- |
| httpStatus(...patterns) | Numbers match the status code; strings match the message (substring); regex matches either |
| timeout() | Error.name === 'TimeoutError' (AbortSignal.timeout() fired) |
| aborted() | Error.name === 'AbortError' (manual controller.abort()) |
| finishReason(...reasons) | The result's finishReason.unified matches one of the given values |
| schemaInvalid() | The result text fails JSON-schema validation against the call's responseFormat |
| noImage() | The image model threw NoImageGeneratedError |
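The pattern rules for httpStatus(...patterns) can be approximated in a few lines. This is a hypothetical standalone matcher, not the library's code; the HttpLikeError shape is assumed, and matching strings case-insensitively is an assumption borrowed from error.message, since the table only says "substring".

```typescript
// Hypothetical error shape for illustration; real errors come from the AI SDK.
interface HttpLikeError {
  statusCode?: number;
  message: string;
}

type StatusPattern = number | string | RegExp;

// numbers -> exact status code
// strings -> message substring (case-insensitive here, by assumption)
// regexes -> tested against the stringified status code or the message
// (avoid the /g flag: RegExp.test is stateful with it)
function matchesHttpStatus(err: HttpLikeError, ...patterns: StatusPattern[]): boolean {
  const code = err.statusCode === undefined ? undefined : String(err.statusCode);
  return patterns.some((p) => {
    if (typeof p === 'number') return err.statusCode === p;
    if (typeof p === 'string') return err.message.toLowerCase().includes(p.toLowerCase());
    return (code !== undefined && p.test(code)) || p.test(err.message);
  });
}
```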

Each high-level helper is a thin wrapper around the low-level ones. For example, timeout() is roughly:

function timeout() {
  return error((err) => err instanceof Error && err.name === 'TimeoutError');
}

and finishReason(...) just delegates to result.finishReason(...):

function finishReason(...reasons: Array<string>) {
  return result.finishReason(...reasons);
}

Actions

Every condition exposes two terminal actions that turn it into a Retryable:

.switch({ model, ...options }) falls back to a different model when the condition matches. Optional fields (maxAttempts, delay, backoffFactor, timeout, options) are the same as on a normal Retry object. maxAttempts defaults to 1.
.retry({ delay?, backoffFactor?, maxAttempts?, ... }) retries the current model when the condition matches. Honors Retry-After and Retry-After-Ms response headers when present, capped at 60 seconds. maxAttempts defaults to 2 (one original attempt + one retry); values below 2 throw, since the retry budget is consumed by the original failure.

Combinators

Compose conditions with .and, .or, .not:

import { error, httpStatus } from 'ai-retry/experimental/language-model';

httpStatus(429).or(error.message('overloaded'));
httpStatus(503).and(error.message('temporary'));
error.isRetryable(true).not();
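Combinators like these are usually built by wrapping predicates. The sketch below is a hypothetical, dependency-free model of .and / .or / .not over a simplified error shape; ai-retry's real conditions also receive the retry context and expose the .switch / .retry actions, which are omitted here.

```typescript
// Hypothetical minimal condition type (not ai-retry's API).
interface Condition<T> {
  test: (value: T) => boolean;
  and: (other: Condition<T>) => Condition<T>;
  or: (other: Condition<T>) => Condition<T>;
  not: () => Condition<T>;
}

function condition<T>(test: (value: T) => boolean): Condition<T> {
  return {
    test,
    and: (other) => condition<T>((v) => test(v) && other.test(v)),
    or: (other) => condition<T>((v) => test(v) || other.test(v)),
    not: () => condition<T>((v) => !test(v)),
  };
}

// Hypothetical error shape with the fields the examples above inspect.
interface SketchError {
  statusCode?: number;
  message: string;
  isRetryable?: boolean;
}

const statusIs = (code: number) => condition<SketchError>((e) => e.statusCode === code);
const messageIncludes = (text: string) =>
  condition<SketchError>((e) => e.message.toLowerCase().includes(text.toLowerCase()));
const retryableIs = (flag = true) =>
  condition<SketchError>((e) => (e.isRetryable ?? false) === flag);

// Same shapes as the combinator examples above.
const rateLimitedOrOverloaded = statusIs(429).or(messageIncludes('overloaded'));
const temporary503 = statusIs(503).and(messageIncludes('temporary'));
const notRetryable = retryableIs(true).not();
```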

Mapping from Built-in retryables

Each stable retryable has an equivalent in the new shape (imports from ai-retry/experimental/language-model unless noted):

| Built-in | Composable form |
| --- | --- |
| contentFilterTriggered(m) | error(/* check e.data.error.code === 'content_filter' */).or(finishReason('content-filter')).switch({ model: m }) |
| requestTimeout(m) | timeout().switch({ model: m, timeout: 60_000 }) |
| requestNotRetryable(m) | error.isRetryable(false).switch({ model: m }) |
| schemaMismatch(m) | schemaInvalid().switch({ model: m }) |
| serviceOverloaded(m) | httpStatus(529, 'overloaded').switch({ model: m }) |
| serviceUnavailable(m) | error.statusCode(503).switch({ model: m }) |
| noImageGenerated(m) | noImage().switch({ model: m }) (from image-model) |
| retryAfterDelay({ delay, backoffFactor }) | error.isRetryable(true).retry({ delay, backoffFactor }) |

[!NOTE] error.isRetryable(true) matches whatever the AI SDK's APICallError marks retryable. By default that's status codes 408, 409, 429, and any 5xx, plus network errors and provider-specific overrides (e.g. Anthropic flips it on error.type === 'overloaded_error'). It picks up more cases than a manual status-code list.
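To see why the default rule covers more than a hand-written list, here is a hypothetical status-code-only version of it compared against a typical manual list. Network errors and provider-specific overrides are not modeled, so this is narrower than what the AI SDK actually marks retryable.

```typescript
// Default status-code rule quoted in the note above: 408, 409, 429, and 5xx.
// Network errors and provider-specific overrides are intentionally not modeled.
function isRetryableStatus(statusCode: number | undefined): boolean {
  if (statusCode === undefined) return false;
  return statusCode === 408 || statusCode === 409 || statusCode === 429 || statusCode >= 500;
}

// A typical hand-written list, for comparison.
const manualList = [429, 503];
const sampleCodes = [408, 409, 429, 500, 503, 529];

// Codes the default rule retries but the manual list would miss.
const missedByManualList = sampleCodes.filter(
  (code) => isRetryableStatus(code) && !manualList.includes(code),
);
```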

Options

Disabling Retries

You can disable retries entirely, which is useful for testing or specific environments. When disabled, the base model will execute directly without any retry logic.

const retryableModel = createRetryable({
  model: openai('gpt-4'), // Base model
  retries: [
    /* ... */
  ],
  disabled: true, // Retries are completely disabled
});

// Or disable based on environment
const retryableModel = createRetryable({
  model: openai('gpt-4'), // Base model
  retries: [
    /* ... */
  ],
  disabled: process.env.NODE_ENV === 'test', // Disable in test environment
});

// Or use a function for dynamic control
const retryableModel = createRetryable({
  model: openai('gpt-4'), // Base model
  retries: [
    /* ... */
  ],
  disabled: () => !featureFlags.isEnabled('ai-retries'), // Check feature flag
});

Retry Delays

You can delay retries with an optional exponential backoff. The delay respects abort signals, so requests can still be cancelled during the delay period.

const retryableModel = createRetryable({
  model: openai('gpt-4'),
  retries: [
    // Up to 3 attempts in total (the original request counts as the first),
    // with a fixed 2s delay between them
    { model: openai('gpt-4'), delay: 2_000, maxAttempts: 3 },

    // Or use exponential backoff: the delay doubles on each retry
    { model: openai('gpt-4'), delay: 2_000, backoffFactor: 2, maxAttempts: 3 },
  ],
});

const result = await generateText({
  model: retryableModel,
  prompt: 'Write a vegetarian lasagna recipe for 4 people.',
  // Will be respected during delays
  abortSignal: AbortSignal.timeout(60_000),
});

You can also use delays with built-in retryables:

import { serviceOverloaded } from 'ai-retry/retryables';

const retryableModel = createRetryable({
  model: openai('gpt-4'),
  retries: [
    // Wait 5 seconds before retrying on service overload
    serviceOverloaded(openai('gpt-4'), { maxAttempts: 3, delay: 5_000 }),
  ],
});
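An abort-aware delay like the one described above can be sketched with setTimeout and an abort listener. This is a standalone illustration, not ai-retry's implementation; abortableDelay is a hypothetical name, and it assumes a runtime that supports signal.reason (modern browsers, Node.js 18+).

```typescript
// Resolve after `ms` milliseconds unless `signal` aborts first, in which case
// reject with the signal's reason and cancel the pending timer.
function abortableDelay(ms: number, signal?: AbortSignal): Promise<void> {
  return new Promise((resolve, reject) => {
    if (signal?.aborted) {
      reject(signal.reason);
      return;
    }
    const timer = setTimeout(() => {
      signal?.removeEventListener('abort', onAbort);
      resolve();
    }, ms);
    function onAbort() {
      clearTimeout(timer);
      reject(signal!.reason);
    }
    signal?.addEventListener('abort', onAbort, { once: true });
  });
}
```

Clearing the timer on abort matters: otherwise a cancelled 60-second backoff would keep the process alive until it elapsed.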

Timeouts

When a retry specifies a timeout value, a fresh AbortSignal.timeout() is created for that retry attempt. If the original abortSignal is still alive, the fresh deadline is composed with it via AbortSignal.any() so user cancellation still works mid-retry. If the original signal is already aborted (for example it carried a request-level deadline that already fired), it is dropped so the retry runs against the fresh deadline alone.

If the original abortSignal is already aborted at the time of retry and the chosen retry does not supply a timeout, ai-retry rethrows the original error rather than firing a misleading retry against the dead signal. onError still fires for observability, but onRetry is skipped. Setting retry.timeout is the explicit opt-in for retrying past an aborted signal.

const retryableModel = createRetryable({
  model: openai('gpt-4'),
  retries: [
    // Provide a fresh 30 second timeout for the retry
    {
      model: openai('gpt-3.5-turbo'),
      timeout: 30_000,
    },
  ],
});

// Even if the original request times out, the retry gets a fresh signal
const result = await generateText({
  model: retryableModel,
  prompt: 'Write a story',
  // Original request timeout
  abortSignal: AbortSignal.timeout(60_000),
});
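The signal rules above can be modeled as a small decision function. signalForRetry and RetryPlan are hypothetical names used only for illustration; this is not ai-retry's code, and AbortSignal.any requires Node.js 20.3+ or a recent browser.

```typescript
type RetryPlan = { run: true; signal: AbortSignal | undefined } | { run: false };

// Choose the signal for a retry attempt according to the rules described above.
function signalForRetry(original: AbortSignal | undefined, timeoutMs: number | undefined): RetryPlan {
  if (timeoutMs === undefined) {
    // No fresh deadline: reuse the original signal, or skip the retry if it is
    // already aborted (the caller would then rethrow the original error).
    return original?.aborted ? { run: false } : { run: true, signal: original };
  }
  const fresh = AbortSignal.timeout(timeoutMs);
  if (original === undefined || original.aborted) {
    // Drop a dead (or missing) signal and rely on the fresh deadline alone.
    return { run: true, signal: fresh };
  }
  // Compose both so user cancellation still works mid-retry.
  return { run: true, signal: AbortSignal.any([original, fresh]) };
}
```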

Max Attempts

By default, each retryable will only attempt to retry once per model to avoid infinite loops. You can customize this behavior by returning a maxAttempts value from your retryable function. Note that the initial request with the base model is counted as the first attempt.

const retryableModel = createRetryable({
  model: openai('gpt-4'),
  retries: [
    // Try this once
    anthropic('claude-3-haiku-20240307'),
    // Try this one more time (initial + 1 retry)
    { model: openai('gpt-4'), maxAttempts: 2 },
    // Already tried, won't be retried again
    anthropic('claude-3-haiku-20240307'),
  ],
});

The attempts are counted per unique model (provider + modelId). That means if multiple retryables return the same model, it won't be retried again once maxAttempts is reached.
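Per-model attempt counting can be sketched with a map keyed by provider and modelId. AttemptTracker and ModelRef are hypothetical names for illustration, not ai-retry exports; the test walks through the Max Attempts example above.

```typescript
// Hypothetical model descriptor; AI SDK models expose provider and modelId.
interface ModelRef {
  provider: string;
  modelId: string;
}

// Counts attempts per unique provider + modelId.
class AttemptTracker {
  private readonly counts = new Map<string, number>();

  private key(model: ModelRef): string {
    return `${model.provider}:${model.modelId}`;
  }

  record(model: ModelRef): void {
    const k = this.key(model);
    this.counts.set(k, (this.counts.get(k) ?? 0) + 1);
  }

  // An attempt with `model` is allowed while its total stays below maxAttempts.
  canAttempt(model: ModelRef, maxAttempts = 1): boolean {
    return (this.counts.get(this.key(model)) ?? 0) < maxAttempts;
  }
}
```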

Provider Options

You can override provider-specific options for each retry attempt. This is useful when you want to use different configurations for fallback models.

const retryableModel = createRetryable({
  model: openai('gpt-5'),
  retries: [
    // Use different provider options for the retry
    {
      model: openai('gpt-4o-2024-08-06'),
      providerOptions: {
        openai: {
          user: 'fallback-user',
          structuredOutputs: false,
        },
      },
    },
  ],
});

// Original provider options are used for the first attempt
const result = await generateText({
  model: retryableModel,
  prompt: 'Write a story',
  providerOptions: {
    openai: {
      user: 'primary-user',
    },
  },
});

The retry's providerOptions completely replace the original ones during retry attempts. This works for all model types (language, embedding, and image) and all operations (generate, stream, embed, and image generation).

Call Options

You can override various call options when retrying requests. This is useful for adjusting parameters like temperature, max tokens, or even the prompt itself for retry attempts. Call options are specified in the options field of the retry object.

import { anthropic } from '@ai-sdk/anthropic';
import { openai } from '@ai-sdk/openai';
import { createRetryable } from 'ai-retry';

const retryableModel = createRetryable({
  model: openai('gpt-4'),
  retries: [
    {
      model: anthropic('claude-3-haiku-20240307'),
      options: {
        // Override generation parameters for more deterministic output
        temperature: 0.3,
        topP: 0.9,
        maxOutputTokens: 500,
        // Set a seed for reproducibility
        seed: 42,
      },
    },
  ],
});

The following options can be overridden:

[!NOTE] Override options completely replace the original values (they are not merged). If you don't specify an option, the original value from the request is used.

Language Model Options

| Option | Description |
| ---------------- | ---------------------------------------------- |
| prompt | Override the entire prompt for the retry |
| temperature | Temperature setting for controlling randomness |
| topP | Nucleus sampling parameter |
| topK | Top-K sampling parameter |
| maxOutputTokens | Maximum number of tokens to generate |
| seed | Random seed for deterministic generation |
| stopSequences | Stop sequences to end generation |
| presencePenalty | Presence penalty for reducing repetition |
| frequencyPenalty | Frequency penalty for reducing repetition |
| headers | Additional HTTP headers |
| providerOptions | Provider-specific options |

Embedding Model Options

| Option | Description |
| --------------- | ---------------------------- |
| values | Override the values to embed |
| headers | Additional HTTP headers |
| providerOptions | Provider-specific options |

Image Model Options

| Option | Description |
| --------------- | -------------------------------- |
| n | Number of images to generate |
| size | Size of generated images |
| aspectRatio | Aspect ratio of generated images |
| seed | Random seed for reproducibility |
| headers | Additional HTTP headers |
| providerOptions | Provider-specific options |

Dynamic Call Options

You can also override call options dynamically from inside the onRetry callback instead of declaring them statically on the retry object. This is useful when the override depends on something known only at runtime, such as the prompt that just failed, the model about to be tried next, or the error that triggered the retry. The overrides apply only to the upcoming retry attempt and can change the same fields as the static options on a retry.

A common use case is sanitizing provider-scoped metadata when falling back to a different provider, for example stripping providerOptions.azure.itemId references from the previous prompt before retrying on OpenAI:

import { azure } from '@ai-sdk/azure';
import { openai } from '@ai-sdk/openai';
import { createRetryable } from 'ai-retry';

const retryableModel = createRetryable({
  model: azure('gpt-5-chat'),
  retries: [openai('gpt-5-chat')],
  onRetry: (context) => {
    const { current, attempts } = context;
    // The attempt that just failed (current.model is the model about to be tried)
    const previous = attempts.at(-1);

    if (previous && current.model.provider !== previous.model.provider) {
      // Strip provider-scoped metadata from the prompt before retrying on a different provider.
      // stripProviderMetadata is a helper you provide (it is not imported above).
      return {
        options: {
          prompt: stripProviderMetadata(current.options.prompt),
        },
      };
    }
  },
});

Inside the onRetry callback, context.current.model is the model that's about to be tried next, while context.current.options and context.current.error describe the failed attempt that triggered the retry. The previous model is available at context.attempts.at(-1).model.
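The stripProviderMetadata helper in the example above is not imported from ai-retry, so it is presumably your own code. One possible shape is sketched below, assuming the Azure item references live in providerOptions.azure.itemId on messages and content parts. Message and Part are simplified, hypothetical stand-ins for the AI SDK's prompt types, so adapt the traversal to the real types in your project.

```typescript
// Simplified, hypothetical stand-ins for the AI SDK prompt types.
type ProviderOptions = Record<string, Record<string, unknown>>;

interface Part {
  type: string;
  providerOptions?: ProviderOptions;
  [key: string]: unknown;
}

interface Message {
  role: 'system' | 'user' | 'assistant' | 'tool';
  content: string | Part[];
  providerOptions?: ProviderOptions;
}

// Removes `itemId` from `providerOptions.azure`, dropping the `azure` entry
// (and the whole providerOptions object) when nothing else is left.
function withoutAzureItemId(
  options: ProviderOptions | undefined,
): ProviderOptions | undefined {
  if (!options?.azure) return options;
  const { itemId: _itemId, ...restAzure } = options.azure;
  const { azure: _azure, ...others } = options;
  const next: ProviderOptions =
    Object.keys(restAzure).length > 0 ? { ...others, azure: restAzure } : others;
  return Object.keys(next).length > 0 ? next : undefined;
}

// Hypothetical helper: returns a copy of the prompt (the input is not mutated)
// with Azure item references removed from every message and content part.
function stripProviderMetadata(prompt: Message[]): Message[] {
  return prompt.map((message) => ({
    ...message,
    providerOptions: withoutAzureItemId(message.providerOptions),
    content:
      typeof message.content === 'string'
        ? message.content
        : message.content.map((part) => ({
            ...part,
            providerOptions: withoutAzureItemId(part.providerOptions),
          })),
  }));
}
```

Only the Azure-specific key is removed; options for other providers on the same message or part are kept.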

onRetry may also be async, which is useful if computing the override needs to do work (e.g. fetching a fresh credential):

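import { anthropic } from '@ai-sdk/anthropic';
import { openai } from '@ai-sdk/openai';
import { createRetryable } from 'ai-retry';

const retryableModel = createRetryable({
  model: openai('gpt-4o-mini'),
  retries: [anthropic('claude-sonnet-4-20250514')],
  onRetry: async (context) => {
    const { current } = context;

    // refreshAuthHeaders is your own (hypothetical) helper, not part of ai-retry
    const headers = await refreshAuthHeaders(current.model.provider);
    return { options: { headers } };
  },
});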

Precedence for the upcoming retry attempt (highest to lowest):

1. The value returned from onRetry
2. The options returned from the retryable
3. The original call options from the request
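The precedence order can be pictured as successive object spreads. This sketch is an assumption about the semantics, not ai-retry's code: CallOptions is a simplified stand-in for LanguageModelV3CallOptions, resolveRetryOptions and definedOnly are hypothetical helpers, and keys explicitly set to undefined are treated as unspecified.

```typescript
// Hypothetical, simplified subset of language model call options.
interface CallOptions {
  prompt?: string;
  temperature?: number;
  maxOutputTokens?: number;
  providerOptions?: Record<string, Record<string, unknown>>;
}

// Copies only the keys that are actually set, so an undefined value
// never shadows a lower-precedence layer.
function definedOnly(options: CallOptions = {}): CallOptions {
  const entries = Object.entries(options).filter(([, value]) => value !== undefined);
  return Object.fromEntries(entries) as CallOptions;
}

function resolveRetryOptions(
  original: CallOptions,
  retryOptions?: CallOptions,
  onRetryOverrides?: CallOptions,
): CallOptions {
  // Later spreads win: onRetry > retryable options > original request.
  // Each key is replaced as a whole; nested objects are not merged.
  return { ...original, ...definedOnly(retryOptions), ...definedOnly(onRetryOverrides) };
}

const resolved = resolveRetryOptions(
  { prompt: 'Write a story', temperature: 0.9, providerOptions: { openai: { user: 'primary-user' } } },
  { temperature: 0.3, maxOutputTokens: 500 }, // from the retry object
  { temperature: 0.1 }, // from onRetry
);
console.log(resolved.prompt, resolved.temperature, resolved.maxOutputTokens);
// Write a story 0.1 500
```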

Logging

You can use the following callbacks to log retry attempts and errors:

onError is invoked if an error occurs.
onRetry is invoked before attempting a retry.
onSuccess is invoked after a successful request with the model that handled it.

import { openai } from '@ai-sdk/openai';
import { createRetryable } from 'ai-retry';

const retryableModel = createRetryable({
  model: openai('gpt-4o-mini'),
  retries: [
    /* your retryables */
  ],
  onError: (context) => {
    console.error(
      `Attempt ${context.attempts.length} with ${context.current.model.provider}/${context.current.model.modelId} failed:`,
      context.current.error,
    );
  },
  onRetry: (context) => {
    console.log(
      `Retrying attempt ${context.attempts.length + 1} with model ${context.current.model.provider}/${context.current.model.modelId}...`,
    );
  },
  onSuccess: (context) => {
    console.log(
      `Request handled by ${context.current.model.provider}/${context.current.model.modelId}`,
    );
  },
});

Reset

By default, every new request starts with the base model, even if a previous request was retried with a different model. The reset option changes this by making the last successfully retried model sticky: subsequent requests keep using that model instead of switching back to the base model. The reset value controls how long the retry model stays sticky before resetting to the base model.

| Value | Description |
| ---------------- | -------------------------------------------------------- |
| after-request | Reset immediately after the next request (default) |
| after-N-requests | Keep the retry model for the next N requests, then reset |
| after-N-seconds | Keep the retry model for N seconds, then reset |

Reset after each request (default)

const retryableModel = createRetryable({
  model: openai('gpt-4o-mini'),
  retries: [anthropic('claude-sonnet-4-20250514')],
  reset: 'after-request', // default: always start with the base model
});

Keep the retry model for N requests

const retryableModel = createRetryable({
  model: openai('gpt-4o-mini'),
  retries: [anthropic('claude-sonnet-4-20250514')],
  reset: 'after-5-requests', // use the retry model for 5 more requests before resetting
});

Keep the retry model for N seconds

const retryableModel = createRetryable({
  model: openai('gpt-4o-mini'),
  retries: [anthropic('claude-sonnet-4-20250514')],
  reset: 'after-30-seconds', // use the retry model for 30 seconds before resetting
});
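The sticky behavior can be modeled as a small state machine. The sketch below is an assumption about the semantics, not ai-retry's internals: parseReset, ResetPolicy, and StickyModel are hypothetical names, a model becomes sticky only after a retry succeeds, and after-N-requests counts the requests started after that success. What happens when the sticky model itself fails is not modeled.

```typescript
// Mirrors the documented Reset type.
type Reset =
  | 'after-request'
  | `after-${number}-requests`
  | `after-${number}-seconds`;

type ResetPolicy =
  | { kind: 'requests'; count: number }
  | { kind: 'seconds'; seconds: number };

// 'after-request' is treated as "sticky for 0 further requests".
function parseReset(reset: Reset): ResetPolicy {
  if (reset === 'after-request') return { kind: 'requests', count: 0 };
  const match = /^after-(\d+)-(requests|seconds)$/.exec(reset);
  if (!match) throw new Error(`Invalid reset value: ${reset}`);
  const n = Number(match[1]);
  return match[2] === 'requests'
    ? { kind: 'requests', count: n }
    : { kind: 'seconds', seconds: n };
}

// Hypothetical tracker: which model should the next request start with?
class StickyModel<M> {
  private sticky?: { model: M; remaining: number; expiresAt: number };

  constructor(
    private readonly base: M,
    private readonly policy: ResetPolicy,
    private readonly now: () => number = Date.now,
  ) {}

  // Call when a request succeeded only after retrying with `model`.
  recordRetrySuccess(model: M): void {
    this.sticky = {
      model,
      remaining: this.policy.kind === 'requests' ? this.policy.count : Infinity,
      expiresAt:
        this.policy.kind === 'seconds' ? this.now() + this.policy.seconds * 1000 : Infinity,
    };
  }

  // Returns the model for the next request and consumes one sticky request.
  nextModel(): M {
    const s = this.sticky;
    if (!s || s.remaining <= 0 || this.now() >= s.expiresAt) {
      this.sticky = undefined; // reset back to the base model
      return this.base;
    }
    s.remaining -= 1;
    return s.model;
  }
}

const tracker = new StickyModel('gpt-4o-mini', parseReset('after-2-requests'));
tracker.recordRetrySuccess('claude-sonnet-4');
console.log(tracker.nextModel(), tracker.nextModel(), tracker.nextModel());
// claude-sonnet-4 claude-sonnet-4 gpt-4o-mini
```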

Telemetry

[!NOTE] Experimental: Span names and attributes may change in patch versions.

ai-retry can emit OpenTelemetry spans for each request and every retry attempt. The spans are created on the active OpenTelemetry context, so they nest automatically under the AI SDK's own spans (e.g. ai.generateText.doGenerate) when you also enable experimental_telemetry on generateText/streamText. A single trace then shows the individual attempts — which model each used, why it was retried, and the backoff between them — that the SDK's own span otherwise hides.

Setup

Telemetry uses the optional peer dependency @opentelemetry/api (already present if you use the AI SDK). Register an OpenTelemetry SDK once at startup, then opt in per model:

import { anthropic } from '@ai-sdk/anthropic';
import { openai } from '@ai-sdk/openai';
import { createRetryable } from 'ai-retry';

const retryableModel = createRetryable({
  model: openai('gpt-4o'),
  retries: [anthropic('claude-sonnet-4-5')],
  experimental_telemetry: { isEnabled: true },
});

The settings mirror the AI SDK's experimental_telemetry shape:

import type { AttributeValue, Tracer } from '@opentelemetry/api';

interface RetryTelemetrySettings {
  isEnabled?: boolean; // off by default while experimental
  tracer?: Tracer; // defaults to trace.getTracer('ai-retry')
  functionId?: string; // groups telemetry by function
  metadata?: Record<string, AttributeValue>;
}

Spans are emitted only when isEnabled is true. By default the global tracer is used, which is a no-op until an OpenTelemetry SDK is registered — so enabling it in code that runs without an SDK has no effect and no cost.

[!NOTE] Prompts and generated content are not recorded — only metadata (models, outcomes, errors, timing). The AI SDK's own telemetry records the prompt/response on its spans when you enable recordInputs/recordOutputs.

Spans

Each request creates one operation span (ai_retry.doGenerate, ai_retry.doStream, or ai_retry.doEmbed) with one child ai_retry.attempt span per attempt:

ai_retry.doGenerate            outcome=success, attempts=2
├─ ai_retry.attempt #1         outcome=retry,   type=error   (529 → fallback)
└─ ai_retry.attempt #2         outcome=success, type=result

Operation span attributes:

| Attribute | Description |
| --------- | ----------- |
| ai_retry.operation | doGenerate, doStream, or doEmbed |
| ai_retry.outcome | success or failure |
| ai_retry.attempts | total number of attempts |
| ai_retry.model.start | the model the request started with (provider/modelId) |
| ai_retry.model.final | the model that produced the final outcome |
| ai_retry.error.{name,message,status,cause.name,cause.message,cause.status} | the failing error (on failure); status when it carries an HTTP status code |
| ai_retry.function.id, ai_retry.metadata.* | from the telemetry settings |

Attempt span (ai_retry.attempt) attributes:

| Attribute | Description |
| --------- | ----------- |
| ai_retry.attempt.number | 1-based attempt index |
| ai_retry.attempt.model | model used (provider/modelId) |
| ai_retry.attempt.outcome | success, retry, or failure |
| ai_retry.attempt.type | result or error |
| ai_retry.attempt.finish_reason | finish reason (result attempts) |
| ai_retry.attempt.delay_ms | backoff scheduled before the next attempt |
| ai_retry.attempt.timeout_ms | timeout budget, when the retry set one |
| ai_retry.attempt.error.{name,message,status,cause.name,cause.message,cause.status} | the error (error attempts); status when it carries an HTTP status code |

Attempt spans also carry the standard gen_ai.request.model / gen_ai.provider.name attributes so observability tools (Langfuse, etc.) recognize and render them.

[!NOTE] Streaming: retries only happen before the first content chunk (see Streaming), so an ai_retry.doStream attempt is marked success once content begins flowing. Retries triggered by errors while the stream is being read, but before any content has arrived, appear as additional attempt spans.

See examples/telemetry for a runnable example that exports to Langfuse.

Streaming

Errors during streaming requests can occur in two ways:

When the stream is initially created (e.g. network error, API error, etc.) by calling streamText.
While the stream is being processed (e.g. timeout, API error, etc.) by reading from the returned result.textStream async iterable.

In the second case, errors during stream processing are not always retried, because the stream might already have emitted actual content that the consumer has processed. Retrying stops as soon as the first content chunk (e.g. text-delta, tool-call) is emitted. The chunk types considered content are the same ones that are passed to onChunk().

[!IMPORTANT] Streaming limitation: Retries and fallbacks only apply before the first content chunk is emitted. Once streaming begins delivering content, the response is committed to the current model. Mid-stream errors will propagate to the caller rather than triggering a fallback. If reliable retries are critical for your use case, consider using generateText instead of streamText.

Preamble buffering

Every stream begins with a non-content preamble (stream-start, then optionally response-metadata and text-start / reasoning-start) that providers emit as soon as the response headers arrive, before any content flows. Because a retry can still happen during this window, ai-retry does not forward the preamble immediately. It buffers the leading non-content parts and flushes them only when the first content chunk arrives (or when the stream finishes with no content). If a retry fires before any content, the buffered preamble is discarded and replaced by the fallback's, so the consumer always sees exactly one preamble — the one belonging to the model that actually produced the output, with its own warnings and response-metadata. Without this, a fallback's stream-start would be emitted a second time after the primary's, which some consumers (e.g. streamText) reject.

[!NOTE] One side effect: the consumer's "stream started" signal now arrives at first-content time rather than when the response headers arrive (typically a sub-second difference). For UIs that show a typing indicator off stream-start this is negligible.
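The buffering and commit rules above can be modeled over pre-recorded arrays of stream parts. This is a simplified sketch, not ai-retry's code: StreamPart, CONTENT_TYPES, and replayWithPreambleBuffering are hypothetical, each attempt is a finished array instead of a live stream, and only text-delta and tool-call count as content here, whereas the library uses the full set of chunk types passed to onChunk().

```typescript
// Simplified, hypothetical stream parts (real AI SDK parts carry more fields).
type StreamPart =
  | { type: 'stream-start'; model: string }
  | { type: 'response-metadata'; modelId: string }
  | { type: 'text-start' }
  | { type: 'text-delta'; delta: string }
  | { type: 'tool-call'; toolName: string }
  | { type: 'error'; error: string }
  | { type: 'finish' };

// Assumption: only these two part types count as content in this sketch.
const CONTENT_TYPES = new Set<StreamPart['type']>(['text-delta', 'tool-call']);

// Replays attempts in order and returns the parts the consumer would receive.
function replayWithPreambleBuffering(attempts: StreamPart[][]): StreamPart[] {
  const output: StreamPart[] = [];

  for (const parts of attempts) {
    const preamble: StreamPart[] = [];
    let committed = false;
    let retry = false;

    for (const part of parts) {
      if (committed) {
        // After the first content part, everything (including errors) is forwarded.
        output.push(part);
      } else if (part.type === 'error') {
        // Error before any content: discard this attempt's preamble and fall back.
        retry = true;
        break;
      } else if (CONTENT_TYPES.has(part.type)) {
        // First content part: flush the buffered preamble, then commit.
        output.push(...preamble, part);
        committed = true;
      } else {
        preamble.push(part); // still in the preamble: hold it back
      }
    }

    if (!retry) {
      // Stream ended; if it produced no content, flush the preamble now.
      if (!committed) output.push(...preamble);
      return output;
    }
  }

  // Every attempt failed before content; a real implementation would surface the error here.
  return output;
}
```

With a primary that fails after its preamble and a fallback that succeeds, the consumer sees a single stream-start, the fallback's, followed by the fallback's content.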

API Reference

`createRetryable(options: RetryableModelOptions): LanguageModelV3 | EmbeddingModelV3 | ImageModelV3`

Creates a retryable model that works with language models, embedding models, and image models.

interface RetryableModelOptions<
  MODEL extends LanguageModelV3 | EmbeddingModelV3 | ImageModelV3,
> {
  model: MODEL;
  retries: Array<Retryable<MODEL> | MODEL>;
  disabled?: boolean | (() => boolean);
  reset?: Reset;
  experimental_telemetry?: RetryTelemetrySettings;
  onError?: (context: RetryContext<MODEL>) => void;
  onRetry?: (
    context: RetryContext<MODEL>,
  ) => void | OnRetryOverrides<MODEL> | Promise<void | OnRetryOverrides<MODEL>>;
  onSuccess?: (context: SuccessContext<MODEL>) => void;
}

Options:

model: The base model to use for the initial request.
retries: Array of retryables (functions, models, or retry objects) to attempt on failure.
disabled: Disable all retry logic. Can be a boolean or function returning boolean. Default: false (retries enabled).
reset: Controls when to reset back to the base model after a successful retry. Default: after-request.
experimental_telemetry: OpenTelemetry instrumentation for retries. Off by default. See Telemetry.
onError: Callback invoked when an error occurs.
onRetry: Callback invoked before attempting a retry. May optionally return an OnRetryOverrides object (or a Promise of one) to override options.* for the upcoming attempt only. See Dynamic Call Options.
onSuccess: Callback invoked after a successful request. Receives the model that handled the request and all previous attempts.

`Reset`

Controls when the sticky model resets back to the base model after a successful retry.

type Reset =
  | 'after-request'
  | `after-${number}-requests`
  | `after-${number}-seconds`;

after-request — reset immediately after the next request (default).
after-N-requests — keep the retry model for the next N requests, then reset.
after-N-seconds — keep the retry model for N seconds, then reset.

`Retryable`

A Retryable is a function that receives a RetryContext describing the current attempt (its model and the error or result it produced) together with all previous attempts. It evaluates the error or result and returns a Retry to trigger a retry, or undefined to skip.

type Retryable = (context: RetryContext) => Retry | Promise<Retry> | undefined;
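As an illustration of this contract, here is a small custom retryable that falls back on HTTP 429 rate-limit errors. So that it runs on its own, it redeclares simplified stand-ins for the documented types instead of importing them; onRateLimit and statusOf are hypothetical names, and reading statusCode assumes the error is shaped like the AI SDK's APICallError. In a real project you would type it against the library's own Retryable.

```typescript
// Simplified, hypothetical stand-ins for the documented ai-retry types.
interface Model {
  provider: string;
  modelId: string;
}

type RetryAttempt =
  | { type: 'error'; error: unknown; model: Model }
  | { type: 'result'; result: { finishReason: string }; model: Model };

interface RetryContext {
  current: RetryAttempt;
  attempts: RetryAttempt[];
}

interface Retry {
  model: Model;
  delay?: number; // milliseconds before retrying
}

type Retryable = (context: RetryContext) => Retry | Promise<Retry> | undefined;

// Reads a numeric `statusCode` (as on the AI SDK's APICallError), if present.
function statusOf(error: unknown): number | undefined {
  if (typeof error === 'object' && error !== null && 'statusCode' in error) {
    const { statusCode } = error as { statusCode: unknown };
    return typeof statusCode === 'number' ? statusCode : undefined;
  }
  return undefined;
}

// Hypothetical retryable: on HTTP 429, retry with `fallback` after 1 s;
// for any other error or result, return undefined to skip.
function onRateLimit(fallback: Model): Retryable {
  return ({ current }) => {
    if (current.type === 'error' && statusOf(current.error) === 429) {
      return { model: fallback, delay: 1_000 };
    }
    return undefined;
  };
}
```

Returning undefined lets the next retryable in the retries array inspect the same context.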

`Retry`

A Retry specifies the model to retry and optional settings. The available options depend on the model type (language model, embedding model, or image model).

interface Retry {
  model: LanguageModelV3 | EmbeddingModelV3 | ImageModelV3;
  maxAttempts?: number; // Maximum attempts per model, counting the base model's initial request (default: 1)
  delay?: number; // Delay in milliseconds before retrying
  backoffFactor?: number; // Multiplier for exponential backoff
  timeout?: number; // Timeout in milliseconds for the retry attempt
  providerOptions?: ProviderOptions; // @deprecated - use options.providerOptions instead
  options?:
    | LanguageModelV3CallOptions
    | EmbeddingModelV3CallOptions
    | ImageModelV3CallOptions; // Call options to override for this retry
}
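The delay and backoffFactor fields suggest the usual exponential schedule. The sketch below assumes the n-th retry waits delay * backoffFactor^(n - 1) milliseconds; that formula and the backoffDelay and backoffSchedule helpers are illustrative assumptions, so check the Retry Delays section for the exact rule ai-retry applies.

```typescript
// Assumed formula: the n-th retry (1-based) waits delay * backoffFactor^(n - 1) ms.
function backoffDelay(delay: number, backoffFactor = 1, retryNumber = 1): number {
  return delay * backoffFactor ** (retryNumber - 1);
}

// Delays for the first `retries` retries under the assumed formula.
function backoffSchedule(delay: number, backoffFactor: number, retries: number): number[] {
  return Array.from({ length: retries }, (_, i) => backoffDelay(delay, backoffFactor, i + 1));
}

console.log(backoffSchedule(1_000, 2, 4).join(', ')); // 1000, 2000, 4000, 8000
console.log(backoffSchedule(500, 1, 3).join(', ')); // 500, 500, 500 (no backoff)
```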

`RetryContext`

The RetryContext object contains information about the current attempt and all previous attempts.

interface RetryContext {
  current: RetryAttempt;
  attempts: Array<RetryAttempt>;
}

`SuccessContext`

The SuccessContext object is passed to the onSuccess callback after a successful request.

interface SuccessContext {
  current: SuccessAttempt;
  attempts: Array<RetryAttempt>;
}

`SuccessAttempt`

A SuccessAttempt represents the successful attempt with the model, result, and call options used. The result type depends on the model type.

interface SuccessAttempt {
  type: 'success';
  model: LanguageModelV3 | EmbeddingModelV3 | ImageModelV3;
  result:
    | LanguageModelResult
    | LanguageModelStream
    | EmbeddingModelEmbed
    | ImageModelGenerate;
  options:
    | LanguageModelV3CallOptions
    | EmbeddingModelV3CallOptions
    | ImageModelV3CallOptions;
}

`RetryAttempt`

A RetryAttempt represents a single attempt with a specific model, which can be either an error or a successful result that triggered a retry. Each attempt includes the call options that were used for that specific attempt. For retry attempts, this will reflect any overridden options from the retry configuration.

// For language, embedding, and image models
type RetryAttempt =
  | {
      type: 'error';
      error: unknown;
      model: LanguageModelV3 | EmbeddingModelV3 | ImageModelV3;
      options:
        | LanguageModelV3CallOptions
        | EmbeddingModelV3CallOptions
        | ImageModelV3CallOptions;
    }
  | {
      type: 'result';
      result: LanguageModelResult;
      model: LanguageModelV3;
      options: LanguageModelV3CallOptions;
    };

// Note: Result-based retries only apply to language models (both generate and stream paths). They do not apply to embedding or image models. For streaming, retries are only possible before any content has been emitted; once a text-delta flows through, the stream is committed.

// Type guards for discriminating attempts
function isErrorAttempt(attempt: RetryAttempt): attempt is RetryErrorAttempt;
function isResultAttempt(attempt: RetryAttempt): attempt is RetryResultAttempt;
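Because RetryAttempt is a discriminated union on type, guards like these narrow it so that error or result can be accessed safely. The sketch redeclares simplified stand-in types (models reduced to provider/modelId strings) and local copies of the two guards so it runs on its own; in real code, use the guards documented above rather than these copies. describeAttempts is a hypothetical helper, e.g. for use in onError or onSuccess.

```typescript
// Simplified, hypothetical shapes mirroring the documented RetryAttempt union.
interface RetryErrorAttempt {
  type: 'error';
  error: unknown;
  model: string; // simplified: "provider/modelId"
}

interface RetryResultAttempt {
  type: 'result';
  result: { finishReason: string };
  model: string;
}

type RetryAttempt = RetryErrorAttempt | RetryResultAttempt;

// Local copies of the guards: both narrow on the `type` discriminant.
function isErrorAttempt(attempt: RetryAttempt): attempt is RetryErrorAttempt {
  return attempt.type === 'error';
}

function isResultAttempt(attempt: RetryAttempt): attempt is RetryResultAttempt {
  return attempt.type === 'result';
}

// Hypothetical helper: one line per attempt, e.g. for logging.
function describeAttempts(attempts: RetryAttempt[]): string[] {
  return attempts.map((attempt) =>
    isErrorAttempt(attempt)
      ? `${attempt.model}: error (${String(attempt.error)})`
      : `${attempt.model}: result (${attempt.result.finishReason})`,
  );
}

const history: RetryAttempt[] = [
  { type: 'error', error: 'timeout', model: 'openai/gpt-4o-mini' },
  { type: 'result', result: { finishReason: 'content-filter' }, model: 'azure/gpt-4o' },
];
console.log(describeAttempts(history).join('; '));
// openai/gpt-4o-mini: error (timeout); azure/gpt-4o: result (content-filter)

// filter() with a type guard yields RetryResultAttempt[].
const resultAttempts = history.filter(isResultAttempt);
```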

License

MIT
