ai-retry
v1.7.1
Retry and fallback mechanisms for AI SDK
ai-retry
Automatically handle API failures, content filtering, timeouts and other errors by switching between different AI models and providers.
ai-retry wraps the provided base model with a set of retry conditions (retryables). When a request fails with an error or the response is unsatisfactory, it iterates through the given retryables to find a suitable fallback model. It automatically tracks which models have been tried and how many attempts have been made to prevent infinite loops.
It supports two types of retries:
- Error-based retries: when the model throws an error (e.g. timeouts, API errors, etc.)
- Result-based retries: when the model returns a successful response that needs retrying (e.g. content filtering, etc.)
Installation
This library supports both AI SDK v5 and v6. The main branch reflects the latest stable version for AI SDK v6. See the v0 branch for the AI SDK v5 documentation.
[!WARNING] Version compatibility:
- Use ai-retry version 0.x for AI SDK v5.
- Use ai-retry version 1.x for AI SDK v6.
# AI SDK v5
npm install ai-retry@0
# AI SDK v6
npm install ai-retry@1
Usage
Create a retryable model by providing a base model and a list of retryables or fallback models. When an error occurs, it will evaluate each retryable in order and use the first one that indicates a retry should be attempted with a different model.
[!NOTE] ai-retry supports language models, embedding models, and image models.
import { openai } from '@ai-sdk/openai';
import { generateText, streamText } from 'ai';
import { createRetryable } from 'ai-retry';
// Create a retryable model
const retryableModel = createRetryable({
// Base model
model: openai('gpt-4-mini'),
retries: [
// Retry strategies and fallbacks...
],
});
// Use like any other AI SDK model
const result = await generateText({
model: retryableModel,
prompt: 'Hello world!',
});
console.log(result.text);
// Or with streaming
const stream = streamText({
model: retryableModel,
prompt: 'Write a story about a robot...',
});
// textStream yields plain string chunks
for await (const chunk of stream.textStream) {
console.log(chunk);
}
This also works with embedding models:
import { openai } from '@ai-sdk/openai';
import { embed } from 'ai';
import { createRetryable } from 'ai-retry';
// Create a retryable model
const retryableModel = createRetryable({
// Base model
model: openai.textEmbedding('text-embedding-3-large'),
retries: [
// Retry strategies and fallbacks...
],
});
// Use like any other AI SDK model
const result = await embed({
model: retryableModel,
value: 'Hello world!',
});
console.log(result.embedding);
This also works with image models:
import { openai } from '@ai-sdk/openai';
import { generateImage } from 'ai';
import { createRetryable } from 'ai-retry';
const retryableModel = createRetryable({
model: openai.image('dall-e-3'),
retries: [
// Retry strategies and fallbacks...
],
});
const result = await generateImage({
model: retryableModel,
prompt: 'A sunset over mountains',
});
console.log(result.images);
Vercel AI Gateway
You can use ai-retry with Vercel AI Gateway by providing the model as a string. Internally, the model will be resolved with the default gateway provider instance from AI SDK.
import { gateway } from 'ai';
import { createRetryable } from 'ai-retry';
const retryableModel = createRetryable({
model: 'openai/gpt-5',
retries: ['anthropic/claude-sonnet-4'],
});
// Is the same as:
const retryableModel = createRetryable({
model: gateway('openai/gpt-5'),
retries: [gateway('anthropic/claude-sonnet-4')],
});
By default, the gateway provider resolves model strings as language models. If you want to use an embedding model, you need to use the textEmbeddingModel method.
import { gateway } from 'ai';
import { createRetryable } from 'ai-retry';
const retryableModel = createRetryable({
model: gateway.textEmbeddingModel('openai/text-embedding-3-large'),
});
Retryables
The objects passed to the retries are called retryables and control the retry behavior. We can distinguish between two types of retryables:
- Static retryables are simply model instances (language or embedding) that will always be used when an error occurs. They are also called fallback models.
- Dynamic retryables are functions that receive the current attempt context (error/result and previous attempts) and decide whether to retry with a different model based on custom logic.
You can think of the retries array as a big if-else block, where each dynamic retryable is an if branch that can match a certain error/result condition, and static retryables are the else branches that match all other conditions. The analogy is not perfect, because order matters: retryables are evaluated top-down until one matches:
import { anthropic } from '@ai-sdk/anthropic';
import { azure } from '@ai-sdk/azure';
import { openai } from '@ai-sdk/openai';
import { createRetryable } from 'ai-retry';
const retryableModel = createRetryable({
// Base model
model: openai('gpt-4'),
// Retryables are evaluated top-down in order
retries: [
// Dynamic retryables act like if-branches:
// If error.code == 429 (too many requests) happens, retry with this model
(context) => {
return context.current.error.statusCode === 429
? { model: azure('gpt-4-mini') } // Retry
: undefined; // Skip
},
// If error.message ~= "service overloaded", retry with this model
(context) => {
return context.current.error.message.includes('service overloaded')
? { model: azure('gpt-4-mini') } // Retry
: undefined; // Skip
},
// Static retryables act like else branches:
// Else, always fallback to this model
anthropic('claude-3-haiku-20240307'),
// Same as:
// { model: anthropic('claude-3-haiku-20240307'), maxAttempts: 1 }
],
});
In this example, if the base model fails with status code 429 or a "service overloaded" error, it will retry with gpt-4-mini on Azure. In any other error case, it will fall back to claude-3-haiku-20240307 on Anthropic. If the order were reversed, the static retryable would catch all errors first, and the dynamic retryables would never be reached.
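The ordering rule can be illustrated without the library. The following is a minimal sketch, not ai-retry's implementation: `pickRetry`, `Ctx`, and the string model names are hypothetical stand-ins chosen for this example.

```typescript
// Hypothetical, simplified types: not the ai-retry API.
type Ctx = { statusCode?: number; message: string };
type DynamicRetryable = (ctx: Ctx) => { model: string } | undefined;
// A plain string stands in for a static fallback model.
type SketchRetryable = string | DynamicRetryable;

// Walk the list top-down and return the first model that matches.
function pickRetry(retries: SketchRetryable[], ctx: Ctx): string | undefined {
  for (const retryable of retries) {
    if (typeof retryable === 'string') return retryable; // static: always matches
    const match = retryable(ctx);
    if (match) return match.model; // dynamic: matches only when it returns a retry
  }
  return undefined; // nothing matched
}

const onRateLimit: DynamicRetryable = (ctx) =>
  ctx.statusCode === 429 ? { model: 'azure/gpt-4-mini' } : undefined;

const rateLimited: Ctx = { statusCode: 429, message: 'Too many requests' };

// Dynamic retryable first: the 429 branch is reachable.
const ordered = pickRetry([onRateLimit, 'anthropic/claude-3-haiku'], rateLimited);
// Static fallback first: it shadows the 429 branch.
const reversed = pickRetry(['anthropic/claude-3-haiku', onRateLimit], rateLimited);

console.log(ordered, reversed);
```

With the dynamic retryable first, the rate-limit branch wins; with the static fallback first, the dynamic branch is never consulted.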
Errors vs Results
Dynamic retryables can be further divided based on what triggers them:
- Error-based retryables handle API errors where the request throws an error (e.g., timeouts, rate limits, service unavailable, etc.)
- Result-based retryables handle successful responses that still need retrying (e.g., content filtering, guardrails, etc.)
Both types of retryables have the same interface and receive the current attempt as context. You can use the isErrorAttempt and isResultAttempt type guards to check the type of the current attempt.
import { generateText } from 'ai';
import { createRetryable, isErrorAttempt, isResultAttempt } from 'ai-retry';
import type { Retryable } from 'ai-retry';
// Error-based retryable: handles thrown errors (e.g., timeouts, rate limits)
const errorBasedRetry: Retryable = (context) => {
if (isErrorAttempt(context.current)) {
const { error } = context.current;
// The request threw an error - e.g., network timeout, 429 rate limit
console.log('Request failed with error:', error);
return { model: anthropic('claude-3-haiku-20240307') };
}
return undefined;
};
// Result-based retryable: handles successful responses that need retrying
const resultBasedRetry: Retryable = (context) => {
if (isResultAttempt(context.current)) {
const { result } = context.current;
// The request succeeded, but the response indicates a problem
if (result.finishReason.unified === 'content-filter') {
console.log('Content was filtered, trying different model');
return { model: openai('gpt-4') };
}
}
return undefined;
};
const retryableModel = createRetryable({
model: azure('gpt-4-mini'),
retries: [
// Error-based: catches thrown errors like timeouts, rate limits, etc.
errorBasedRetry,
// Result-based: catches successful responses that need retrying
resultBasedRetry,
],
});
Result-based retryables apply to language models for both generate (generateText, generateObject) and streaming (streamText, streamObject) calls. For streams, the retry decision happens when the upstream finish part arrives and only fires if no content has been emitted yet, so behavior like finishReason: 'content-filter' on an otherwise empty response can still trigger a fallback. Once any content chunk has been forwarded, the stream is committed and result-based retries are skipped.
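The commit rule for streams can be pictured with a small model. This is a sketch under simplifying assumptions: the `Part` shape and `canRetryOnFinish` are hypothetical and only loosely resemble real stream parts.

```typescript
// Hypothetical stream parts: a text chunk or the final finish event.
type Part =
  | { type: 'text-delta'; text: string }
  | { type: 'finish'; finishReason: string };

// A result-based retry is only possible if the finish part arrives
// before any content has been forwarded to the caller.
function canRetryOnFinish(parts: Part[], retryOn: string): boolean {
  for (const part of parts) {
    if (part.type === 'text-delta') return false; // content emitted: stream is committed
    if (part.type === 'finish') return part.finishReason === retryOn;
  }
  return false; // stream ended without a finish part
}

// Empty response that was content-filtered: a fallback can still fire.
const emptyFiltered = canRetryOnFinish(
  [{ type: 'finish', finishReason: 'content-filter' }],
  'content-filter',
);

// Text was already forwarded: the retry is skipped.
const committed = canRetryOnFinish(
  [
    { type: 'text-delta', text: 'Once upon' },
    { type: 'finish', finishReason: 'content-filter' },
  ],
  'content-filter',
);

console.log(emptyFiltered, committed);
```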
Fallbacks
If you don't need precise error matching with custom logic and just want to fall back to different models on any error, you can simply provide a list of models.
[!NOTE] Use the object syntax { model: openai('gpt-4') } if you need to provide additional options like maxAttempts, delay, etc.
import { anthropic } from '@ai-sdk/anthropic';
import { openai } from '@ai-sdk/openai';
import { generateText, streamText } from 'ai';
import { createRetryable } from 'ai-retry';
const retryableModel = createRetryable({
// Base model
model: openai('gpt-4-mini'),
// List of fallback models
retries: [
openai('gpt-3.5-turbo'), // Fallback for first error
// Same as:
// { model: openai('gpt-3.5-turbo'), maxAttempts: 1 },
anthropic('claude-3-haiku-20240307'), // Fallback for second error
// Same as:
// { model: anthropic('claude-3-haiku-20240307'), maxAttempts: 1 },
],
});
In this example, if the base model fails, it will retry with gpt-3.5-turbo. If that also fails, it will retry with claude-3-haiku-20240307. If that fails again, the whole retry process stops and a RetryError is thrown.
Custom
If you need more control over when to retry and which model to use, you can create your own custom retryable. This function is called with a context object containing the current attempt (error or result) and all previous attempts and needs to return a retry model or undefined to skip to the next retryable. The object you return from the retryable function is the same as the one you provide in the retries array.
[!NOTE] You can return additional options like maxAttempts, delay, etc. along with the model.
[!TIP] If you'd like the same flexibility with a typed, composable condition system, see Experimental: Composable Conditions.
import { anthropic } from '@ai-sdk/anthropic';
import { openai } from '@ai-sdk/openai';
import { APICallError } from 'ai';
import { createRetryable, isErrorAttempt } from 'ai-retry';
import type { Retryable } from 'ai-retry';
// Custom retryable that retries on rate limit errors (429)
const rateLimitRetry: Retryable = (context) => {
// Only handle error attempts
if (isErrorAttempt(context.current)) {
// Get the error from the current attempt
const { error } = context.current;
// Check for rate limit error
if (APICallError.isInstance(error) && error.statusCode === 429) {
// Retry with a different model
return { model: anthropic('claude-3-haiku-20240307') };
}
}
// Skip to next retryable
return undefined;
};
const retryableModel = createRetryable({
// Base model
model: openai('gpt-4-mini'),
retries: [
// Use custom rate limit retryable
rateLimitRetry,
// Other retryables...
],
});
In this example, if the base model fails with a 429 error, it will retry with claude-3-haiku-20240307. For any other error, it will skip to the next retryable (if any) or throw the original error.
All Retries Failed
If all retry attempts failed, a RetryError is thrown containing all individual errors.
If no retry was attempted (e.g. because all retryables returned undefined), the original error is thrown directly.
import { RetryError } from 'ai';
const retryableModel = createRetryable({
// Base model = first attempt
model: azure('gpt-4-mini'),
retries: [
// Fallback model 1 = Second attempt
openai('gpt-3.5-turbo'),
// Fallback model 2 = Third attempt
anthropic('claude-3-haiku-20240307'),
],
});
try {
const result = await generateText({
model: retryableModel,
prompt: 'Hello world!',
});
} catch (error) {
// RetryError is an official AI SDK error
if (error instanceof RetryError) {
console.error('All retry attempts failed:', error.errors);
} else {
console.error('Request failed:', error);
}
}
Errors are tracked per unique model (provider + modelId). That means on the first error, it will retry with gpt-3.5-turbo. If that also fails, it will retry with claude-3-haiku-20240307. If that fails again, the whole retry process stops and a RetryError is thrown.
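The difference between "the original error is rethrown" and "all errors are collected" can be sketched with synchronous stand-ins. Everything here is hypothetical: `FakeModel`, `runWithFallbacks`, and `AllAttemptsFailed` only mimic the behavior described above and are not part of ai-retry or the AI SDK.

```typescript
// Hypothetical stand-ins: each "model" is a function that returns text or throws.
type FakeModel = () => string;

// Plays the role of RetryError in this sketch.
class AllAttemptsFailed extends Error {
  errors: unknown[];
  constructor(errors: unknown[]) {
    super(`All ${errors.length} attempts failed`);
    this.errors = errors;
  }
}

function runWithFallbacks(base: FakeModel, fallbacks: FakeModel[]): string {
  const errors: unknown[] = [];
  for (const model of [base, ...fallbacks]) {
    try {
      return model(); // first success wins
    } catch (error) {
      errors.push(error); // remember the failure and move on
    }
  }
  // No retry was attempted: surface the original error unchanged.
  if (errors.length === 1) throw errors[0];
  // At least one retry ran and everything failed: surface all errors together.
  throw new AllAttemptsFailed(errors);
}

// Helper that builds a model which always fails with the given message.
const fail = (message: string): FakeModel => () => {
  throw new Error(message);
};

console.log(runWithFallbacks(fail('rate limited'), [() => 'fallback answer']));
```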
Built-in Retryables
There are several built-in dynamic retryables available for common use cases:
[!TIP] Missing a retryable for your use case? Open an issue and let's discuss it!
[!NOTE] Looking for a composable alternative? See Experimental: Composable Conditions for a condition().action() API that builds on small primitives.
- contentFilterTriggered: Content filter was triggered based on the prompt or completion.
- requestTimeout: Request timeout occurred.
- requestNotRetryable: Request failed with a non-retryable error.
- retryAfterDelay: Retry with delay and exponential backoff and respect retry-after headers.
- serviceOverloaded: Response with status code 529 (service overloaded).
- serviceUnavailable: Response with status code 503 (service unavailable).
- schemaMismatch: Response JSON doesn't match the expected schema from structured output modes (Output.object(), Output.array(), Output.choice()).
- noImageGenerated: Image generation failed with NoImageGeneratedError.
Content Filter
Automatically switch to a different model when content filtering blocks your request.
[!NOTE] For streaming requests this retryable can only fire if the content filter trips before any content has been emitted. Once a text chunk flows through, the stream is committed and the fallback is skipped.
import { contentFilterTriggered } from 'ai-retry/retryables';
const retryableModel = createRetryable({
model: azure('gpt-4-mini'),
retries: [
contentFilterTriggered(openai('gpt-4-mini')), // Try OpenAI if Azure filters
],
});
Request Timeout
Handle timeouts by switching to potentially faster models.
[!NOTE] You need to use an abortSignal with a timeout on your request.
When a request times out, the requestTimeout retryable will automatically create a fresh abort signal for the retry attempt. This prevents the retry from immediately failing due to the already-aborted signal from the original request. If you do not provide a timeout value, a default of 60 seconds is used for the retry attempt.
import { requestTimeout } from 'ai-retry/retryables';
const retryableModel = createRetryable({
model: azure('gpt-4'),
retries: [
// Defaults to 60 seconds timeout for the retry attempt
requestTimeout(azure('gpt-4-mini')),
// Or specify a custom timeout for the retry attempt
requestTimeout(azure('gpt-4-mini'), { timeout: 30_000 }),
],
});
const result = await generateText({
model: retryableModel,
prompt: 'Write a vegetarian lasagna recipe for 4 people.',
abortSignal: AbortSignal.timeout(60_000), // Original request timeout
});
Service Overloaded
Handle service overload errors (status code 529) by switching to a different provider.
import { serviceOverloaded } from 'ai-retry/retryables';
const retryableModel = createRetryable({
model: anthropic('claude-sonnet-4-0'),
retries: [
// Retry with delay and exponential backoff
serviceOverloaded(anthropic('claude-sonnet-4-0'), {
delay: 5_000,
backoffFactor: 2,
maxAttempts: 5,
}),
// Or switch to a different provider
serviceOverloaded(openai('gpt-4')),
],
});
const result = streamText({
model: retryableModel,
prompt: 'Write a story about a robot...',
});
Service Unavailable
Handle service unavailable errors (status code 503) by switching to a different provider.
import { serviceUnavailable } from 'ai-retry/retryables';
const retryableModel = createRetryable({
model: azure('gpt-4'),
retries: [
serviceUnavailable(openai('gpt-4')), // Switch to OpenAI if Azure is unavailable
],
});
No Image Generated
Handle image generation failures by switching to a different model.
import { openai } from '@ai-sdk/openai';
import { google } from '@ai-sdk/google';
import { generateImage } from 'ai';
import { createRetryable } from 'ai-retry';
import { noImageGenerated } from 'ai-retry/retryables';
const retryableModel = createRetryable({
model: openai.image('dall-e-3'),
retries: [
noImageGenerated(google.image('gemini-3-pro-image-preview')), // Switch to Gemini if DALL-E fails to generate an image
],
});
const result = await generateImage({
model: retryableModel,
prompt: 'A sunset over mountains',
});
Request Not Retryable
Handle cases where the base model fails with a non-retryable error.
[!NOTE] You can check if an error is retryable with the isRetryable property on an APICallError.
import { requestNotRetryable } from 'ai-retry/retryables';
const retryable = createRetryable({
model: azure('gpt-4-mini'),
retries: [
requestNotRetryable(openai('gpt-4')), // Switch provider if error is not retryable
],
});
Retry After Delay
If an error is retryable, such as 429 (Too Many Requests) or 503 (Service Unavailable) errors, it will be retried after a delay.
The delay and exponential backoff can be configured. If the response contains a retry-after header, it will be prioritized over the configured delay.
Note that this retryable does not accept a model parameter; it always retries the model from the latest failed attempt.
import { retryAfterDelay } from 'ai-retry/retryables';
const retryableModel = createRetryable({
model: openai('gpt-4'), // Base model
retries: [
// Retry base model 3 times with fixed 2s delay
retryAfterDelay({ delay: 2_000, maxAttempts: 3 }),
// Or retry with exponential backoff (2s, 4s, 8s)
retryAfterDelay({ delay: 2_000, backoffFactor: 2, maxAttempts: 3 }),
// Or retry only if the response contains a retry-after header
retryAfterDelay({ maxAttempts: 3 }),
],
});
By default, if a retry-after-ms or retry-after header is present in the response, it will be prioritized over the configured delay. The delay from the header will be capped at 60 seconds for safety.
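The interplay between backoff, headers, and the 60 second cap can be sketched as plain arithmetic. This is an illustration, not the library's code: `backoffDelay`, `resolveDelay`, and `parseNonNegative` are hypothetical helpers, checking retry-after-ms before retry-after is an assumption, and only numeric header values are handled (an HTTP-date retry-after is ignored here).

```typescript
const MAX_HEADER_DELAY_MS = 60_000; // cap stated in the text above

// retryNumber is 1-based: the first retry waits `delay`,
// the second `delay * backoffFactor`, and so on.
function backoffDelay(delay: number, backoffFactor: number, retryNumber: number): number {
  return delay * Math.pow(backoffFactor, retryNumber - 1);
}

// Parse a header value as a non-negative number, or return undefined.
function parseNonNegative(value: string | undefined): number | undefined {
  if (value === undefined || value.trim() === '') return undefined;
  const parsed = Number(value);
  return Number.isFinite(parsed) && parsed >= 0 ? parsed : undefined;
}

// A header-provided delay wins over the configured one, but is capped.
function resolveDelay(
  headers: Record<string, string | undefined>,
  configuredDelay: number,
): number {
  const fromMs = parseNonNegative(headers['retry-after-ms']);
  if (fromMs !== undefined) return Math.min(fromMs, MAX_HEADER_DELAY_MS);
  const fromSeconds = parseNonNegative(headers['retry-after']);
  if (fromSeconds !== undefined) return Math.min(fromSeconds * 1000, MAX_HEADER_DELAY_MS);
  return configuredDelay;
}

// delay: 2_000 with backoffFactor: 2 gives 2s, 4s, 8s.
const schedule = [1, 2, 3].map((n) => backoffDelay(2_000, 2, n));
console.log(schedule);
console.log(resolveDelay({ 'retry-after': '120' }, 2_000)); // header exceeds the cap
```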
Schema Mismatch
Automatically retry with a different model when the response JSON doesn't match the expected schema.
This is a result-based retryable that validates the model's JSON output against the schema set by structured output modes like Output.object(), Output.array(), and Output.choice().
Normally, schema validation happens outside the model in generateText, so a schema validation error would not be seen by the retryable model. This retryable catches it early and retries with a fallback model.
[!NOTE] This retryable works with generateText and any structured output mode that provides a schema: Output.object(), Output.array(), and Output.choice().
import { openai } from '@ai-sdk/openai';
import { anthropic } from '@ai-sdk/anthropic';
import { generateText, Output } from 'ai';
import { createRetryable } from 'ai-retry';
import { schemaMismatch } from 'ai-retry/retryables';
import { z } from 'zod';
const retryableModel = createRetryable({
model: openai('gpt-4-mini'), // Weak base model
retries: [
// Retry with stronger model on schema mismatch
schemaMismatch(openai('gpt-5')),
],
});
const result = await generateText({
model: retryableModel,
output: Output.object({
schema: z.object({
name: z.string(),
age: z.number(),
}),
}),
prompt: 'Generate a person with name and age.',
});
console.log(result.object); // { name: "Alice", age: 30 }
Experimental: Composable Conditions
[!WARNING] This API is experimental and may change. It is not exported from the package root; opt in via one of the per-model deep imports:
import { ... } from 'ai-retry/experimental/language-model';
import { ... } from 'ai-retry/experimental/image-model';
import { ... } from 'ai-retry/experimental/embedding-model';
Each entry point also re-exports createRetryable already typed for that model family, so you can either import everything from one path:
import {
createRetryable,
error,
httpStatus,
} from 'ai-retry/experimental/language-model';
or pull retryables from the dedicated /retryables subpath:
import {
error,
httpStatus,
} from 'ai-retry/experimental/language-model/retryables';
// or
import * as retryables from 'ai-retry/experimental/language-model/retryables';
A condition().action() API for retryables. Conditions are built from small primitives (error(fn), result(fn)), composed with .and / .or / .not, and turned into a Retryable by one of two terminal actions: .switch({ model }) or .retry({ delay }). The result drops into the same retries: [...] array as the stable helpers, so you can mix the two styles freely.
import { anthropic } from '@ai-sdk/anthropic';
import { openai } from '@ai-sdk/openai';
import { generateText } from 'ai';
import {
createRetryable,
error,
finishReason,
httpStatus,
} from 'ai-retry/experimental/language-model';
const retryableModel = createRetryable({
model: openai('gpt-4'),
retries: [
// Switch on 529 or any "overloaded" message
httpStatus(529, 'overloaded').switch({
model: anthropic('claude-3-haiku-20240307'),
}),
// Switch when the response was content-filtered
finishReason('content-filter').switch({ model: openai('gpt-4o') }),
// Retry the same model with exponential backoff on retryable errors
error.isRetryable(true).retry({ delay: 1_000, backoffFactor: 2 }),
],
});
Picking an entry point
Pick the entry point that matches the model you pass to createRetryable. Each module exposes the helpers that make sense for that model family, already typed for it, so you don't need to add type annotations yourself.
Low-level conditions
The primitive builders error(...) and result(...) take a predicate and turn it into a condition; their namespaces bundle the most common field matchers on top.
| Helper | Matches when | Available in |
| --------------------------------- | ------------------------------------------------------------------------------------ | ---------------------- |
| error(predicate) | The current attempt failed and predicate(err, ctx) returns true | all three entry points |
| error.isRetryable(flag) | APICallError.isRetryable === flag (default true) | all three entry points |
| error.statusCode(...patterns) | Numbers match exactly; regex matches the stringified code (e.g. /^5\d\d$/ for 5xx) | all three entry points |
| error.message(...patterns) | Substring (case-insensitive) or regex match against the error message | all three entry points |
| result(predicate) | The current attempt succeeded and predicate(res, ctx) returns true | language-model only |
| result.finishReason(...reasons) | The result's finishReason.unified matches one of the given values | language-model only |
import { APICallError } from 'ai';
import { error } from 'ai-retry/experimental/language-model';
error((e) => APICallError.isInstance(e) && e.statusCode === 418).switch({
model: fallback,
});
High-level conditions
Convenience matchers built on top of the low-level ones for the common cases. Each returns a condition that you finalize with .switch(...) or .retry(...).
| Helper | language-model | image-model | embedding-model |
| -------------------------- | :------------: | :---------: | :-------------: |
| httpStatus(...patterns) | ✓ | ✓ | ✓ |
| timeout() | ✓ | ✓ | ✓ |
| aborted() | ✓ | ✓ | ✓ |
| finishReason(...reasons) | ✓ | — | — |
| schemaInvalid() | ✓ | — | — |
| noImage() | — | ✓ | — |
What each one matches:
| Helper | Matches when |
| -------------------------- | ------------------------------------------------------------------------------------------ |
| httpStatus(...patterns) | Numbers match the status code; strings match the message (substring); regex matches either |
| timeout() | Error.name === 'TimeoutError' (AbortSignal.timeout() fired) |
| aborted() | Error.name === 'AbortError' (manual controller.abort()) |
| finishReason(...reasons) | The result's finishReason.unified matches one of the given values |
| schemaInvalid() | The result text fails JSON-schema validation against the call's responseFormat |
| noImage() | The image model threw NoImageGeneratedError |
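The pattern rules in the httpStatus row can be spelled out as a small predicate. This is a sketch of the described matching semantics only: `matchesHttpStatus` and `HttpError` are hypothetical names, and the case-insensitive string comparison is an assumption borrowed from the error.message row of the low-level table.

```typescript
type Pattern = number | string | RegExp;
// Hypothetical error shape with just the two fields the patterns look at.
type HttpError = { statusCode?: number; message: string };

function matchesHttpStatus(err: HttpError, ...patterns: Pattern[]): boolean {
  const code = err.statusCode === undefined ? undefined : String(err.statusCode);
  return patterns.some((pattern) => {
    // Numbers match the status code exactly.
    if (typeof pattern === 'number') return err.statusCode === pattern;
    // Strings match as a substring of the message (case-insensitive here).
    if (typeof pattern === 'string') {
      return err.message.toLowerCase().includes(pattern.toLowerCase());
    }
    // Regexes may match either the stringified code or the message.
    return (code !== undefined && pattern.test(code)) || pattern.test(err.message);
  });
}

const overloadedByMessage = matchesHttpStatus(
  { statusCode: 500, message: 'Service Overloaded, try later' },
  529,
  'overloaded',
);
const anyServerError = matchesHttpStatus({ statusCode: 503, message: 'unavailable' }, /^5\d\d$/);

console.log(overloadedByMessage, anyServerError);
```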
Each high-level helper is a thin wrapper around the low-level ones. For example, timeout() is roughly:
function timeout() {
return error((err) => err instanceof Error && err.name === 'TimeoutError');
}
and finishReason(...) just delegates to result.finishReason(...):
function finishReason(...reasons: Array<string>) {
return result.finishReason(...reasons);
}
Actions
Every condition exposes two terminal actions that turn it into a Retryable:
- .switch({ model, ...options }) falls back to a different model when the condition matches. Optional fields (maxAttempts, delay, backoffFactor, timeout, options) are the same as on a normal Retry object. maxAttempts defaults to 1.
- .retry({ delay?, backoffFactor?, maxAttempts?, ... }) retries the current model when the condition matches. Honors Retry-After and Retry-After-Ms response headers when present, capped at 60 seconds. maxAttempts defaults to 2 (one original attempt + one retry); values below 2 throw, since the retry budget is consumed by the original failure.
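The maxAttempts defaults of the two actions can be illustrated as option normalization. This is a sketch of the stated defaults only: `resolveSwitch`, `resolveRetry`, and the option types are hypothetical, and models are plain strings for brevity.

```typescript
// Hypothetical option shapes covering only the fields used below.
type SwitchOptions = { model: string; maxAttempts?: number; delay?: number };
type RetryOptions = { delay?: number; backoffFactor?: number; maxAttempts?: number };

// Switching to another model: one attempt on the new model by default.
function resolveSwitch(options: SwitchOptions) {
  return { ...options, maxAttempts: options.maxAttempts ?? 1 };
}

// Retrying the same model: the original attempt already counts,
// so the budget must allow at least one more.
function resolveRetry(options: RetryOptions = {}) {
  const maxAttempts = options.maxAttempts ?? 2;
  if (maxAttempts < 2) {
    throw new RangeError('maxAttempts must be >= 2: the original failure uses the first attempt');
  }
  return { ...options, maxAttempts };
}

console.log(resolveSwitch({ model: 'fallback-model' }));
console.log(resolveRetry({ delay: 1_000, backoffFactor: 2 }));
```

Rejecting a budget of 1 up front avoids a retryable that silently never retries.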
Combinators
Compose conditions with .and, .or, .not:
import { error, httpStatus } from 'ai-retry/experimental/language-model';
httpStatus(429).or(error.message('overloaded'));
httpStatus(503).and(error.message('temporary'));
error.isRetryable(true).not();
Mapping from Built-in retryables
Each stable retryable has an equivalent in the new shape (imports from ai-retry/experimental/language-model unless noted):
| Built-in | Composable form |
| ------------------------------------------- | ------------------------------------------------------------------------------------------------------------------- |
| contentFilterTriggered(m) | error(/* check e.data.error.code === 'content_filter' */).or(finishReason('content-filter')).switch({ model: m }) |
| requestTimeout(m) | timeout().switch({ model: m, timeout: 60_000 }) |
| requestNotRetryable(m) | error.isRetryable(false).switch({ model: m }) |
| schemaMismatch(m) | schemaInvalid().switch({ model: m }) |
| serviceOverloaded(m) | httpStatus(529, 'overloaded').switch({ model: m }) |
| serviceUnavailable(m) | error.statusCode(503).switch({ model: m }) |
| noImageGenerated(m) | noImage().switch({ model: m }) (from image-model) |
| retryAfterDelay({ delay, backoffFactor }) | error.isRetryable(true).retry({ delay, backoffFactor }) |
[!NOTE] error.isRetryable(true) matches whatever the AI SDK's APICallError marks retryable. By default that's status codes 408, 409, 429, and any 5xx, plus network errors and provider-specific overrides (e.g. Anthropic flips it on error.type === 'overloaded_error'). It picks up more cases than a manual status-code list.
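For comparison, the status-code part of that default can be written as a manual check. This is only a sketch of the codes listed in the note: `isRetryableStatus` is a hypothetical helper, and it does not model network errors or provider-specific overrides, which is why matching on isRetryable covers more cases.

```typescript
// Status codes the note lists as retryable by default: 408, 409, 429, and any 5xx.
function isRetryableStatus(statusCode: number): boolean {
  return (
    statusCode === 408 || // request timeout
    statusCode === 409 || // conflict
    statusCode === 429 || // too many requests
    statusCode >= 500 // server errors
  );
}

const retryableCodes = [400, 408, 429, 503, 529].filter(isRetryableStatus);
console.log(retryableCodes);
```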
Options
Disabling Retries
You can disable retries entirely, which is useful for testing or specific environments. When disabled, the base model will execute directly without any retry logic.
const retryableModel = createRetryable({
model: openai('gpt-4'), // Base model
retries: [
/* ... */
],
disabled: true, // Retries are completely disabled
});
// Or disable based on environment
const retryableModel = createRetryable({
model: openai('gpt-4'), // Base model
retries: [
/* ... */
],
disabled: process.env.NODE_ENV === 'test', // Disable in test environment
});
// Or use a function for dynamic control
const retryableModel = createRetryable({
model: openai('gpt-4'), // Base model
retries: [
/* ... */
],
disabled: () => !featureFlags.isEnabled('ai-retries'), // Check feature flag
});
Retry Delays
You can delay retries with an optional exponential backoff. The delay respects abort signals, so requests can still be cancelled during the delay period.
const retryableModel = createRetryable({
model: openai('gpt-4'),
retries: [
// Retry model 3 times with fixed 2s delay
{ model: openai('gpt-4'), delay: 2_000, maxAttempts: 3 },
// Or retry with exponential backoff (2s, 4s, 8s)
{ model: openai('gpt-4'), delay: 2_000, backoffFactor: 2, maxAttempts: 3 },
],
});
const result = await generateText({
model: retryableModel,
prompt: 'Write a vegetarian lasagna recipe for 4 people.',
// Will be respected during delays
abortSignal: AbortSignal.timeout(60_000),
});
You can also use delays with built-in retryables:
import { serviceOverloaded } from 'ai-retry/retryables';
const retryableModel = createRetryable({
model: openai('gpt-4'),
retries: [
// Wait 5 seconds before retrying on service overload
serviceOverloaded(openai('gpt-4'), { maxAttempts: 3, delay: 5_000 }),
],
});
Timeouts
When a retry specifies a timeout value, a fresh AbortSignal.timeout() is created for that retry attempt. If the original abortSignal is still alive, the fresh deadline is composed with it via AbortSignal.any() so user cancellation still works mid-retry. If the original signal is already aborted (for example it carried a request-level deadline that already fired), it is dropped so the retry runs against the fresh deadline alone.
If the original abortSignal is already aborted at the time of retry and the chosen retry does not supply a timeout, ai-retry rethrows the original error rather than firing a misleading retry against the dead signal. onError still fires for observability, but onRetry is skipped. Setting retry.timeout is the explicit opt-in for retrying past an aborted signal.
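The signal rules above amount to a small decision table. The sketch below encodes it with hypothetical names (`planRetrySignal`, `SignalPlan`) and a structural `SignalLike` stand-in instead of a real AbortSignal. The 'original' branch, where the caller's signal is still alive and no timeout is given, is an assumption, since the text does not describe that case explicitly.

```typescript
// Structural stand-in for AbortSignal: only the `aborted` flag matters here.
type SignalLike = { aborted: boolean };

type SignalPlan =
  | { kind: 'rethrow' } // original signal is dead and no timeout was given
  | { kind: 'original' } // keep the caller's signal, if any (assumed behavior)
  | { kind: 'fresh'; timeout: number } // fresh deadline only
  | { kind: 'composed'; timeout: number }; // fresh deadline combined with the live original

function planRetrySignal(original: SignalLike | undefined, timeout?: number): SignalPlan {
  const dead = original?.aborted === true;
  if (timeout === undefined) {
    // Without a timeout there is nothing to replace a dead signal with.
    return dead ? { kind: 'rethrow' } : { kind: 'original' };
  }
  // A dead or missing original is dropped; a live one is kept for user cancellation.
  if (original === undefined || dead) return { kind: 'fresh', timeout };
  return { kind: 'composed', timeout };
}

console.log(planRetrySignal({ aborted: true }, 30_000)); // fresh deadline only
console.log(planRetrySignal({ aborted: true })); // rethrow the original error
console.log(planRetrySignal({ aborted: false }, 30_000)); // composed with the original
```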
const retryableModel = createRetryable({
model: openai('gpt-4'),
retries: [
// Provide a fresh 30 second timeout for the retry
{
model: openai('gpt-3.5-turbo'),
timeout: 30_000,
},
],
});
// Even if the original request times out, the retry gets a fresh signal
const result = await generateText({
model: retryableModel,
prompt: 'Write a story',
// Original request timeout
abortSignal: AbortSignal.timeout(60_000),
});
Max Attempts
By default, each retryable will only attempt to retry once per model to avoid infinite loops. You can customize this behavior by returning a maxAttempts value from your retryable function. Note that the initial request with the base model is counted as the first attempt.
const retryableModel = createRetryable({
model: openai('gpt-4'),
retries: [
// Try this once
anthropic('claude-3-haiku-20240307'),
// Try this one more time (initial + 1 retry)
{ model: openai('gpt-4'), maxAttempts: 2 },
// Already tried, won't be retried again
anthropic('claude-3-haiku-20240307'),
],
});
The attempts are counted per unique model (provider + modelId). That means if multiple retryables return the same model, it won't be retried again once the maxAttempts is reached.
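Per-model counting can be sketched with a map keyed by provider and modelId. The names below (`ModelRef`, `record`, `canTry`) are hypothetical and only replay the example above; they are not the library's internals.

```typescript
type ModelRef = { provider: string; modelId: string };

// Attempts are counted per unique provider + modelId pair.
const attempts = new Map<string, number>();
const keyOf = (model: ModelRef): string => `${model.provider}:${model.modelId}`;

function record(model: ModelRef): void {
  attempts.set(keyOf(model), (attempts.get(keyOf(model)) ?? 0) + 1);
}

function canTry(model: ModelRef, maxAttempts = 1): boolean {
  return (attempts.get(keyOf(model)) ?? 0) < maxAttempts;
}

const gpt4: ModelRef = { provider: 'openai', modelId: 'gpt-4' };
const haiku: ModelRef = { provider: 'anthropic', modelId: 'claude-3-haiku-20240307' };

// The initial request with the base model counts as its first attempt.
record(gpt4);

// Replay the retries array from the example, assuming every attempt fails.
const plan: Array<[ModelRef, number]> = [
  [haiku, 1], // first use of this model
  [gpt4, 2], // base model again: 1 attempt used, budget of 2
  [haiku, 1], // already used once, budget exhausted
];
const decisions = plan.map(([model, maxAttempts]) => {
  const allowed = canTry(model, maxAttempts);
  if (allowed) record(model);
  return allowed;
});

console.log(decisions); // → [true, true, false]
```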
Provider Options
You can override provider-specific options for each retry attempt. This is useful when you want to use different configurations for fallback models.
const retryableModel = createRetryable({
model: openai('gpt-5'),
retries: [
// Use different provider options for the retry
{
model: openai('gpt-4o-2024-08-06'),
providerOptions: {
openai: {
user: 'fallback-user',
structuredOutputs: false,
},
},
},
],
});
// Original provider options are used for the first attempt
const result = await generateText({
model: retryableModel,
prompt: 'Write a story',
providerOptions: {
openai: {
user: 'primary-user',
},
},
});
The retry's providerOptions will completely replace the original ones during retry attempts. This works for all model types (language and embedding) and all operations (generate, stream, embed).
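"Completely replace" means there is no deep merge between the two objects. A minimal sketch of that semantics, using a hypothetical `optionsForAttempt` helper and the option names from the example above:

```typescript
type ProviderOptions = Record<string, Record<string, unknown>>;

// The retry's providerOptions, when present, are used as-is.
// Nothing from the original request's providerOptions is merged in.
function optionsForAttempt(
  original: ProviderOptions | undefined,
  retry: ProviderOptions | undefined,
): ProviderOptions | undefined {
  return retry ?? original;
}

const fromRequest: ProviderOptions = {
  openai: { user: 'primary-user', structuredOutputs: true },
};
const fromRetry: ProviderOptions = {
  openai: { user: 'fallback-user' },
};

const used = optionsForAttempt(fromRequest, fromRetry);
// `structuredOutputs` from the original request does not carry over.
console.log(used);
```

If a setting from the original request must survive the fallback, it has to be repeated in the retry's providerOptions.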
Call Options
You can override various call options when retrying requests. This is useful for adjusting parameters like temperature, max tokens, or even the prompt itself for retry attempts. Call options are specified in the options field of the retry object.
const retryableModel = createRetryable({
model: openai('gpt-4'),
retries: [
{
model: anthropic('claude-3-haiku'),
options: {
// Override generation parameters for more deterministic output
temperature: 0.3,
topP: 0.9,
maxOutputTokens: 500,
// Set a seed for reproducibility
seed: 42,
},
},
],
});
The following options can be overridden:
[!NOTE] Override options completely replace the original values (they are not merged). If you don't specify an option, the original value from the request is used.
Language Model Options
| Option | Description |
| -------------------------------------------------------------------------------------------------- | ---------------------------------------------- |
| prompt | Override the entire prompt for the retry |
| temperature | Temperature setting for controlling randomness |
| topP | Nucleus sampling parameter |
| topK | Top-K sampling parameter |
| maxOutputTokens | Maximum number of tokens to generate |
| seed | Random seed for deterministic generation |
| stopSequences | Stop sequences to end generation |
| presencePenalty | Presence penalty for reducing repetition |
| frequencyPenalty | Frequency penalty for reducing repetition |
| headers | Additional HTTP headers |
| providerOptions | Provider-specific options |
Embedding Model Options
| Option | Description |
| ---------------------------------------------------------------------------------------- | ---------------------------- |
| values | Override the values to embed |
| headers | Additional HTTP headers |
| providerOptions | Provider-specific options |
Image Model Options
| Option | Description |
| ------------------------------------------------------------------------------------------------- | -------------------------------- |
| n | Number of images to generate |
| size | Size of generated images |
| aspectRatio | Aspect ratio of generated images |
| seed | Random seed for reproducibility |
| headers | Additional HTTP headers |
| providerOptions | Provider-specific options |
Dynamic Call Options
You can also override call options dynamically from inside the onRetry callback, instead of declaring them statically on the retry object. This is useful when the override depends on something only known at runtime, like the prompt that just failed, the model that's about to be tried next, or the error that triggered the retry. The overrides apply to the upcoming retry attempt only, and can change the same fields as the static options on a retry. The callback may also be async if computing the override needs to do work (e.g. fetching a fresh credential).
A common use case is sanitizing provider-scoped metadata when falling back to a different provider, for example stripping providerOptions.azure.itemId references from the previous prompt before retrying on OpenAI:
import { createRetryable } from 'ai-retry';
import { azure } from '@ai-sdk/azure';
import { openai } from '@ai-sdk/openai';
const retryableModel = createRetryable({
model: azure('gpt-5-chat'),
retries: [openai('gpt-5-chat')],
onRetry: (context) => {
const { current, attempts } = context;
const previous = attempts.at(-1);
// attempts.at(-1) is typed as possibly undefined, so guard before comparing
if (previous && current.model.provider !== previous.model.provider) {
// Strip provider-scoped metadata from the prompt before retrying on a different provider
return {
options: {
prompt: stripProviderMetadata(current.options.prompt),
},
};
}
},
});
Inside the onRetry callback, context.current.model is the model that's about to be tried next, while context.current.options and context.current.error describe the failed attempt that triggered the retry. The previous model is available at context.attempts.at(-1).model.
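The stripProviderMetadata helper used in the example above is not part of ai-retry and has to be supplied by you. A minimal sketch might look like the following. It assumes the prompt is an array of messages whose messages and content parts may each carry a providerOptions object, and it drops the whole entry for one provider (azure by default) rather than only itemId. The types are simplified stand-ins, not the AI SDK's real definitions.

```typescript
// Simplified stand-ins for the AI SDK prompt types (assumption).
type ProviderOptions = Record<string, Record<string, unknown>>;

interface PromptPart {
  type: string;
  providerOptions?: ProviderOptions;
  [key: string]: unknown;
}

interface PromptMessage {
  role: string;
  content: string | PromptPart[];
  providerOptions?: ProviderOptions;
}

// Return a copy of `options` without the entry for `provider`.
function omitProvider(
  options: ProviderOptions | undefined,
  provider: string,
): ProviderOptions | undefined {
  if (!options) return undefined;
  const rest: ProviderOptions = {};
  for (const [key, value] of Object.entries(options)) {
    if (key !== provider) rest[key] = value;
  }
  return Object.keys(rest).length > 0 ? rest : undefined;
}

// Hypothetical helper: strip one provider's metadata from every message and part.
// The input prompt is not mutated.
function stripProviderMetadata(
  prompt: PromptMessage[],
  provider = 'azure',
): PromptMessage[] {
  return prompt.map((message) => ({
    ...message,
    providerOptions: omitProvider(message.providerOptions, provider),
    content:
      typeof message.content === 'string'
        ? message.content
        : message.content.map((part) => ({
            ...part,
            providerOptions: omitProvider(part.providerOptions, provider),
          })),
  }));
}
```

Returning a fresh copy keeps the recorded options of the failed attempt intact, which matters if you log or inspect context.attempts later.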
onRetry may also be async, which is useful if computing the override needs to do work (e.g. fetching a fresh credential):
const retryableModel = createRetryable({
model: openai('gpt-4o-mini'),
retries: [anthropic('claude-sonnet-4-20250514')],
onRetry: async (context) => {
const { current } = context;
const headers = await refreshAuthHeaders(current.model.provider);
return { options: { headers } };
},
});
Precedence for the upcoming retry attempt (highest to lowest):
- The value returned from onRetry
- The options returned from the retryable
- The original call options from the request
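To make the ordering concrete, here is a small sketch that resolves options across the three layers. It assumes precedence is applied per option (the first layer that defines an option wins). resolveByPrecedence is a hypothetical helper for illustration, not a function exported by ai-retry.

```typescript
type Options = Record<string, unknown>;

// Layers are passed from highest to lowest precedence.
function resolveByPrecedence(...layers: Array<Options | undefined>): Options {
  const resolved: Options = {};
  for (const layer of layers) {
    if (!layer) continue;
    for (const [key, value] of Object.entries(layer)) {
      // Keep the first defined value seen for each option.
      if (value !== undefined && !(key in resolved)) resolved[key] = value;
    }
  }
  return resolved;
}

const requestOptions = { temperature: 0.7, maxOutputTokens: 1000 };
const retryableOptions = { temperature: 0.3, seed: 42 };
const onRetryOverrides = { temperature: 0 };

const resolved = resolveByPrecedence(
  onRetryOverrides, // highest: value returned from onRetry
  retryableOptions, // then: options returned from the retryable
  requestOptions, // lowest: original call options
);

// temperature: 0 (onRetry), seed: 42 (retryable), maxOutputTokens: 1000 (request)
console.log(resolved);
```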
Logging
You can use the following callbacks to log retry attempts and errors:
- onError is invoked if an error occurs.
- onRetry is invoked before attempting a retry.
- onSuccess is invoked after a successful request with the model that handled it.
const retryableModel = createRetryable({
model: openai('gpt-4-mini'),
retries: [
/* your retryables */
],
onError: (context) => {
console.error(
`Attempt ${context.attempts.length} with ${context.current.model.provider}/${context.current.model.modelId} failed:`,
context.current.error,
);
},
onRetry: (context) => {
console.log(
`Retrying attempt ${context.attempts.length + 1} with model ${context.current.model.provider}/${context.current.model.modelId}...`,
);
},
onSuccess: (context) => {
console.log(
`Request handled by ${context.current.model.provider}/${context.current.model.modelId}`,
);
},
});
Reset
By default, every new request starts with the base model, even if a previous request was retried with a different model. The reset option changes this behavior by making the last successfully retried model sticky: subsequent requests continue using that model instead of switching back to the base model. The reset value controls how long the retry model stays sticky before resetting to the base model.
| Value | Description |
| ------------------ | ------------------------------------------------------------ |
| after-request | Reset immediately after the next request (default) |
| after-N-requests | Keep the retry model for the next N requests, then reset |
| after-N-seconds | Keep the retry model for N seconds, then reset |
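The three value shapes in the table can be told apart with a small parser. The sketch below is only an illustration of the string format. The ResetPolicy type and parseReset function are hypothetical names and do not describe how ai-retry parses the option internally.

```typescript
type Reset =
  | 'after-request'
  | `after-${number}-requests`
  | `after-${number}-seconds`;

// Hypothetical structured form of a reset value.
type ResetPolicy =
  | { kind: 'request' }
  | { kind: 'requests'; count: number }
  | { kind: 'seconds'; count: number };

function parseReset(reset: Reset): ResetPolicy {
  if (reset === 'after-request') return { kind: 'request' };
  // Only whole, non-negative numbers are accepted in this sketch.
  const match = /^after-(\d+)-(requests|seconds)$/.exec(reset);
  if (!match) throw new Error(`Unsupported reset value: ${reset}`);
  const count = Number(match[1]);
  return match[2] === 'requests'
    ? { kind: 'requests', count }
    : { kind: 'seconds', count };
}

console.log(parseReset('after-5-requests'));
console.log(parseReset('after-30-seconds'));
```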
Reset after each request (default)
const retryableModel = createRetryable({
model: openai('gpt-4o-mini'),
retries: [anthropic('claude-sonnet-4-20250514')],
reset: 'after-request', // default: always start with the base model
});
Keep the retry model for N requests
const retryableModel = createRetryable({
model: openai('gpt-4o-mini'),
retries: [anthropic('claude-sonnet-4-20250514')],
reset: 'after-5-requests', // use the retry model for 5 more requests before resetting
});
Keep the retry model for N seconds
const retryableModel = createRetryable({
model: openai('gpt-4o-mini'),
retries: [anthropic('claude-sonnet-4-20250514')],
reset: 'after-30-seconds', // use the retry model for 30 seconds before resetting
});
Streaming
Errors during streaming requests can occur in two ways:
- When the stream is initially created (e.g. network error, API error, etc.) by calling streamText.
- While the stream is being processed (e.g. timeout, API error, etc.) by reading from the returned result.textStream async iterable.
In the second case, errors during stream processing are not always retried, because the stream might have already emitted content that the consumer has processed. Retrying stops as soon as the first content chunk (e.g. of type text-delta, tool-call, etc.) is emitted. The chunk types considered content are the same as the ones passed to onChunk().
[!IMPORTANT] Streaming limitation: Retries and fallbacks only apply before the first content chunk is emitted. Once streaming begins delivering content, the response is committed to the current model. Mid-stream errors will propagate to the caller rather than triggering a fallback. If reliable retries are critical for your use case, consider using generateText instead of streamText.
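The commit-on-first-content rule can be sketched with plain async generators. This is a simplified model of the behavior described above, not ai-retry's implementation: the Chunk type, the CONTENT_TYPES set (a partial, assumed list), and streamWithFallback are all hypothetical. Non-content chunks are held back until the first content chunk arrives so that a fallback does not replay them.

```typescript
type Chunk = { type: string; text?: string };

// Assumption: a partial list of chunk types that count as content.
const CONTENT_TYPES = new Set(['text-delta', 'tool-call']);

async function* streamWithFallback(
  sources: Array<() => AsyncIterable<Chunk>>,
): AsyncGenerator<Chunk> {
  let lastError: unknown = new Error('No stream sources provided');
  for (const createStream of sources) {
    let committed = false;
    const pending: Chunk[] = [];
    try {
      for await (const chunk of createStream()) {
        if (!committed && !CONTENT_TYPES.has(chunk.type)) {
          // Buffer metadata chunks until real content shows up.
          pending.push(chunk);
          continue;
        }
        if (!committed) {
          committed = true;
          yield* pending;
        }
        yield chunk;
      }
      if (!committed) {
        // Stream finished without any content chunk.
        committed = true;
        yield* pending;
      }
      return;
    } catch (error) {
      // After content was emitted, the error goes to the caller.
      if (committed) throw error;
      // Otherwise remember it and try the next source.
      lastError = error;
    }
  }
  throw lastError;
}
```

A source that fails before its first text-delta is silently replaced by the next one, while a source that fails after emitting content surfaces the error to the consumer.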
API Reference
createRetryable(options: RetryableModelOptions): LanguageModelV3 | EmbeddingModelV3 | ImageModelV3
Creates a retryable model that works with language models, embedding models, and image models.
interface RetryableModelOptions<
MODEL extends LanguageModelV3 | EmbeddingModelV3 | ImageModelV3,
> {
model: MODEL;
retries: Array<Retryable<MODEL> | MODEL>;
disabled?: boolean | (() => boolean);
reset?: Reset;
onError?: (context: RetryContext<MODEL>) => void;
onRetry?: (
context: RetryContext<MODEL>,
) => void | OnRetryOverrides<MODEL> | Promise<void | OnRetryOverrides<MODEL>>;
onSuccess?: (context: SuccessContext<MODEL>) => void;
}
Options:
- model: The base model to use for the initial request.
- retries: Array of retryables (functions, models, or retry objects) to attempt on failure.
- disabled: Disable all retry logic. Can be a boolean or function returning boolean. Default: false (retries enabled).
- reset: Controls when to reset back to the base model after a successful retry. Default: after-request.
- onError: Callback invoked when an error occurs.
- onRetry: Callback invoked before attempting a retry. May optionally return an OnRetryOverrides object (or a Promise of one) to override options.* for the upcoming attempt only. See Dynamic Call Options via onRetry.
- onSuccess: Callback invoked after a successful request. Receives the model that handled the request and all previous attempts.
Reset
Controls when the sticky model resets back to the base model after a successful retry.
type Reset =
| 'after-request'
| `after-${number}-requests`
| `after-${number}-seconds`;
- after-request: reset immediately after the next request (default).
- after-N-requests: keep the retry model for the next N requests, then reset.
- after-N-seconds: keep the retry model for N seconds, then reset.
Retryable
A Retryable is a function that receives a RetryContext with the current error or result and model and all previous attempts.
It should evaluate the error/result and decide whether to retry by returning a Retry or to skip by returning undefined.
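A custom retryable following this contract might look like the sketch below. To stay self-contained it declares simplified stand-in types (Model, Attempt, Context, Retry) instead of importing the real ones, and the rate-limit check on the error message is an assumption about what a provider error could look like. onRateLimit is a hypothetical helper name.

```typescript
// Simplified stand-ins for the documented types (assumption).
interface Model {
  provider: string;
  modelId: string;
}

type Attempt =
  | { type: 'error'; error: unknown; model: Model }
  | { type: 'result'; result: unknown; model: Model };

interface Context {
  current: Attempt;
  attempts: Attempt[];
}

interface Retry {
  model: Model;
  maxAttempts?: number;
  delay?: number;
}

type Retryable = (context: Context) => Retry | Promise<Retry> | undefined;

// Hypothetical retryable: fall back when the error message mentions a rate limit.
function onRateLimit(fallback: Model): Retryable {
  return ({ current }) => {
    // Only error attempts are handled here; results are skipped.
    if (current.type !== 'error') return undefined;
    const message =
      current.error instanceof Error
        ? current.error.message
        : String(current.error);
    // Return a Retry to switch models, or undefined to let the next retryable decide.
    return /rate limit/i.test(message)
      ? { model: fallback, delay: 1000 }
      : undefined;
  };
}
```

Returning undefined rather than throwing keeps the retryable composable: the remaining entries in retries still get a chance to handle the failure.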
type Retryable = (context: RetryContext) => Retry | Promise<Retry> | undefined;
Retry
A Retry specifies the model to retry and optional settings. The available options depend on the model type (language model, embedding model, or image model).
interface Retry {
model: LanguageModelV3 | EmbeddingModelV3 | ImageModelV3;
maxAttempts?: number; // Maximum retry attempts per model (default: 1)
delay?: number; // Delay in milliseconds before retrying
backoffFactor?: number; // Multiplier for exponential backoff
timeout?: number; // Timeout in milliseconds for the retry attempt
providerOptions?: ProviderOptions; // @deprecated - use options.providerOptions instead
options?:
| LanguageModelV3CallOptions
| EmbeddingModelV3CallOptions
| ImageModelV3CallOptions; // Call options to override for this retry
}
RetryContext
The RetryContext object contains information about the current attempt and all previous attempts.
interface RetryContext {
current: RetryAttempt;
attempts: Array<RetryAttempt>;
}
SuccessContext
The SuccessContext object is passed to the onSuccess callback after a successful request.
interface SuccessContext {
current: SuccessAttempt;
attempts: Array<RetryAttempt>;
}
SuccessAttempt
A SuccessAttempt represents the successful attempt with the model, result, and call options used. The result type depends on the model type.
interface SuccessAttempt {
type: 'success';
model: LanguageModelV3 | EmbeddingModelV3 | ImageModelV3;
result:
| LanguageModelResult
| LanguageModelStream
| EmbeddingModelEmbed
| ImageModelGenerate;
options:
| LanguageModelV3CallOptions
| EmbeddingModelV3CallOptions
| ImageModelV3CallOptions;
}
RetryAttempt
A RetryAttempt represents a single attempt with a specific model, which can be either an error or a successful result that triggered a retry. Each attempt includes the call options that were used for that specific attempt. For retry attempts, this will reflect any overridden options from the retry configuration.
// For language, embedding, and image models
type RetryAttempt =
| {
type: 'error';
error: unknown;
model: LanguageModelV3 | EmbeddingModelV3 | ImageModelV3;
options:
| LanguageModelV3CallOptions
| EmbeddingModelV3CallOptions
| ImageModelV3CallOptions;
}
| {
type: 'result';
result: LanguageModelResult;
model: LanguageModelV3;
options: LanguageModelV3CallOptions;
};
// Note: Result-based retries only apply to language models (both generate and stream paths). They do not apply to embedding or image models. For streaming, retries are only possible before any content has been emitted; once a text-delta flows through, the stream is committed.
// Type guards for discriminating attempts
function isErrorAttempt(attempt: RetryAttempt): attempt is RetryErrorAttempt;
function isResultAttempt(attempt: RetryAttempt): attempt is RetryResultAttempt;
License
MIT
