best-ai-retry
v1.0.1
Zero-dependency AI API resilience: retry, circuit breaker, fallback, rate limits, token budgets, cost tracking.
best-ai-retry
Zero-dependency resilience library for AI API calls. Retry with smart backoff, circuit breaking, model/provider fallback, rate limit awareness, request queuing, token budgets, cost tracking, and hedged requests.
Works with any AI provider — OpenAI, Anthropic, Google, Mistral, or your own API.
Install
npm install best-ai-retry
Quick Start
import { withRetry, createCircuitBreaker, withFallback } from 'best-ai-retry';
// Simple retry with exponential backoff
const simpleResult = await withRetry(
() => openai.chat.completions.create({ model: 'gpt-4', messages }),
{ maxRetries: 3 }
);
// Full resilience pipeline
const breaker = createCircuitBreaker({ failureThreshold: 5 });
const result = await withFallback([
{
name: 'gpt-4',
fn: () => breaker.execute(() =>
withRetry(() => openai.chat.completions.create({ model: 'gpt-4', messages }), { maxRetries: 3 })
),
},
{
name: 'claude',
fn: () => anthropic.messages.create({ model: 'claude-sonnet-4-20250514', max_tokens: 1024, messages }),
},
], {
shouldFallback: (err) => err.status === 429 || err.status >= 500,
});
console.log(`Answered by: ${result.usedName}`);
API
Retry — withRetry(fn, options?)
Retries a function with configurable backoff, jitter, and Retry-After header support.
import { withRetry } from 'best-ai-retry';
const result = await withRetry(
() => openai.chat.completions.create({ model: 'gpt-4', messages }),
{
maxRetries: 5, // default: 3
baseDelay: 1000, // default: 1000ms
maxDelay: 60000, // default: 60000ms
backoff: 'exponential', // 'exponential' | 'linear' | 'fixed'
jitter: 'full', // 'full' | 'decorrelated' | 'none'
useRetryAfter: true, // honor Retry-After header (default: true)
timeout: 30000, // per-attempt timeout in ms
retryOn: (error) => error.status === 429 || error.status >= 500,
onRetry: (error, attempt) => console.log(`Retry #${attempt}...`),
onSuccess: (result, attempt) => { /* ... */ },
onFailure: (error) => { /* ... */ },
}
);
By default, retries on HTTP status 429, 500, 502, 503, and 529, and on network errors (ECONNRESET, ETIMEDOUT, ECONNREFUSED, EPIPE, ENOTFOUND, ENETUNREACH, EAI_AGAIN).
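The default rule can be pictured as a small predicate. This is only a sketch, not the library's source: the name `isRetryableByDefault` is hypothetical, and it assumes errors expose a numeric `status` or a Node.js-style string `code`.

```typescript
// Hypothetical sketch of the default retry rule described above.
// Assumes errors carry an HTTP `status` or a Node.js-style `code`.
const RETRYABLE_STATUSES = new Set([429, 500, 502, 503, 529]);
const RETRYABLE_CODES = new Set([
  'ECONNRESET', 'ETIMEDOUT', 'ECONNREFUSED', 'EPIPE',
  'ENOTFOUND', 'ENETUNREACH', 'EAI_AGAIN',
]);

interface ApiError {
  status?: number;
  code?: string;
}

function isRetryableByDefault(error: ApiError): boolean {
  if (error.status !== undefined && RETRYABLE_STATUSES.has(error.status)) return true;
  return error.code !== undefined && RETRYABLE_CODES.has(error.code);
}
```

The documented list does not include 504, so a custom `retryOn` such as `error.status === 429 || error.status >= 500` is needed to cover it.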
Backoff strategies:
| Strategy | Formula |
|---|---|
| exponential | baseDelay * 2^attempt |
| linear | baseDelay * (attempt + 1) |
| fixed | baseDelay |
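
To make the formulas concrete, here is a minimal sketch that evaluates them. It assumes `attempt` is zero-based and that the result is capped at `maxDelay`; the function name `backoffDelay` is illustrative and not part of the library.

```typescript
type Backoff = 'exponential' | 'linear' | 'fixed';

// Uncapped delay for a given strategy; `attempt` is assumed to start at 0.
function rawDelay(strategy: Backoff, attempt: number, baseDelay: number): number {
  switch (strategy) {
    case 'exponential':
      return baseDelay * 2 ** attempt;
    case 'linear':
      return baseDelay * (attempt + 1);
    default:
      return baseDelay; // 'fixed'
  }
}

// Assumption: maxDelay acts as an upper bound on the computed delay.
function backoffDelay(
  strategy: Backoff,
  attempt: number,
  baseDelay = 1000,
  maxDelay = 60000,
): number {
  return Math.min(rawDelay(strategy, attempt, baseDelay), maxDelay);
}

const attempts = [0, 1, 2, 3];
const exponential = attempts.map((a) => backoffDelay('exponential', a)); // 1000, 2000, 4000, 8000
const linear = attempts.map((a) => backoffDelay('linear', a));           // 1000, 2000, 3000, 4000
const fixed = attempts.map((a) => backoffDelay('fixed', a));             // 1000, 1000, 1000, 1000
```
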
Jitter strategies:
| Strategy | Formula |
|---|---|
| full | random() * delay |
| decorrelated | min(maxDelay, baseDelay + random() * (prevDelay * 3 - baseDelay)) |
| none | No jitter |
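
The jitter formulas can be sketched the same way. The function below is illustrative only: `applyJitter` and its parameter order are assumptions, and the random source is injectable so the arithmetic can be checked deterministically.

```typescript
type Jitter = 'full' | 'decorrelated' | 'none';

function applyJitter(
  strategy: Jitter,
  delay: number,      // delay produced by the backoff strategy
  prevDelay: number,  // delay used on the previous attempt
  baseDelay: number,
  maxDelay: number,
  random: () => number = Math.random,
): number {
  switch (strategy) {
    case 'full':
      // Uniform in [0, delay)
      return random() * delay;
    case 'decorrelated':
      // Uniform between baseDelay and 3x the previous delay, capped at maxDelay
      return Math.min(maxDelay, baseDelay + random() * (prevDelay * 3 - baseDelay));
    default:
      return delay; // 'none'
  }
}

const half = () => 0.5; // fixed "random" value for a worked example
const fullExample = applyJitter('full', 4000, 2000, 1000, 60000, half);
// 0.5 * 4000 = 2000
const decorrelatedExample = applyJitter('decorrelated', 4000, 2000, 1000, 60000, half);
// 1000 + 0.5 * (2000 * 3 - 1000) = 3500
```
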
Circuit Breaker — createCircuitBreaker(options?)
Prevents cascading failures by fast-failing when a service is down.
import { createCircuitBreaker, CircuitOpenError } from 'best-ai-retry';
const breaker = createCircuitBreaker({
failureThreshold: 5, // consecutive failures to open (default: 5)
resetTimeout: 30000, // ms before trying a probe request (default: 30000)
halfOpenMax: 1, // probe requests in half-open state (default: 1)
onStateChange: (from, to) => console.log(`Circuit: ${from} -> ${to}`),
});
try {
const result = await breaker.execute(
() => anthropic.messages.create({ model: 'claude-sonnet-4-20250514', max_tokens: 1024, messages })
);
} catch (err) {
if (err instanceof CircuitOpenError) {
console.log('Service is down, try again later');
}
}
breaker.state; // 'closed' | 'open' | 'half-open'
breaker.stats; // { failures, successes, total, failureRate }
breaker.reset();
Sliding window mode opens the circuit based on failure rate instead of consecutive failures:
const breaker = createCircuitBreaker({
samplingWindow: 60000, // track over 60s
failureRate: 0.5, // open at 50% failure rate
resetTimeout: 10000,
});
States: closed (normal) -> open (fast-fail) -> half-open (probe) -> closed or open.
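The state transitions can be sketched as a small state machine. This is a simplified illustration in consecutive-failure mode, not the library's implementation: the class `BreakerSketch` and its methods are hypothetical, time is passed in explicitly, and the number of half-open probes (`halfOpenMax`) is not limited.

```typescript
type BreakerState = 'closed' | 'open' | 'half-open';

class BreakerSketch {
  state: BreakerState = 'closed';
  private failures = 0;
  private openedAt = 0;
  private readonly failureThreshold: number;
  private readonly resetTimeout: number;

  constructor(failureThreshold = 5, resetTimeout = 30000) {
    this.failureThreshold = failureThreshold;
    this.resetTimeout = resetTimeout;
  }

  // Should a request be let through at time `now` (in ms)?
  allow(now: number): boolean {
    if (this.state === 'open' && now - this.openedAt >= this.resetTimeout) {
      this.state = 'half-open'; // reset timeout elapsed: allow a probe
    }
    return this.state !== 'open';
  }

  // A success (including a successful probe) closes the circuit.
  onSuccess(): void {
    this.failures = 0;
    this.state = 'closed';
  }

  // A failed probe, or too many consecutive failures, opens the circuit.
  onFailure(now: number): void {
    this.failures += 1;
    if (this.state === 'half-open' || this.failures >= this.failureThreshold) {
      this.state = 'open';
      this.openedAt = now;
    }
  }
}
```

A caller would check `allow(Date.now())` before each request, fail fast when it returns `false`, and report the outcome with `onSuccess()` or `onFailure(Date.now())`.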
Fallback — withFallback(strategies, options?)
Try multiple providers/models in order until one succeeds.
import { withFallback } from 'best-ai-retry';
const result = await withFallback([
{ name: 'gpt-4', fn: () => openai.chat.completions.create({ model: 'gpt-4', messages }) },
{ name: 'claude', fn: () => anthropic.messages.create({ model: 'claude-sonnet-4-20250514', max_tokens: 1024, messages }) },
{ name: 'gpt-3.5', fn: () => openai.chat.completions.create({ model: 'gpt-3.5-turbo', messages }) },
], {
shouldFallback: (error) => error.status === 429 || error.status >= 500,
onFallback: (from, to, error) => console.log(`${from} failed, trying ${to}`),
});
result.value; // the response
result.usedIndex; // 0, 1, or 2
result.usedName; // 'gpt-4', 'claude', or 'gpt-3.5'
result.errors; // errors from failed strategies
Rate Limit Parsing — parseRateLimits(headers)
Parse rate limit headers from OpenAI, Anthropic, or any provider using standard headers.
import { parseRateLimits, shouldWait } from 'best-ai-retry';
const response = await fetch('https://api.openai.com/v1/chat/completions', { ... });
const limits = parseRateLimits(response.headers);
// {
// requestsLimit: 60,
// requestsRemaining: 45,
// requestsReset: 12000, // ms until reset
// tokensLimit: 90000,
// tokensRemaining: 72000,
// tokensReset: 8000,
// }
const { wait, delay } = shouldWait(limits);
if (wait) {
await new Promise(r => setTimeout(r, delay));
}
Supported headers:
| Provider | Headers |
|---|---|
| OpenAI | x-ratelimit-limit-requests, x-ratelimit-remaining-requests, x-ratelimit-reset-requests, x-ratelimit-limit-tokens, x-ratelimit-remaining-tokens, x-ratelimit-reset-tokens |
| Anthropic | anthropic-ratelimit-requests-limit, anthropic-ratelimit-requests-remaining, anthropic-ratelimit-requests-reset, anthropic-ratelimit-tokens-limit, anthropic-ratelimit-tokens-remaining, anthropic-ratelimit-tokens-reset |
| Generic | ratelimit-limit, ratelimit-remaining, ratelimit-reset, retry-after |
Parses reset values in multiple formats: seconds (30), duration strings (6m30s, 200ms), and ISO timestamps.
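As an illustration of the three reset formats, a parser along these lines would normalize them to milliseconds. The helper `parseResetMs` is hypothetical and only approximates what the library may do internally; in particular, it treats a bare number as seconds.

```typescript
// Hypothetical helper: convert a reset header value to milliseconds from `now`.
function parseResetMs(value: string, now: number = Date.now()): number | undefined {
  const text = value.trim();

  // Plain number, interpreted as seconds: "30"
  if (/^\d+(\.\d+)?$/.test(text)) return parseFloat(text) * 1000;

  // Duration string made of <number><unit> parts: "6m30s", "200ms"
  if (/^(\d+(\.\d+)?(ms|h|m|s))+$/.test(text)) {
    const unitMs: Record<string, number> = { ms: 1, s: 1000, m: 60000, h: 3600000 };
    const part = /(\d+(?:\.\d+)?)(ms|h|m|s)/g; // "ms" is listed first so it wins over "m"
    let total = 0;
    let match: RegExpExecArray | null;
    while ((match = part.exec(text)) !== null) {
      total += parseFloat(match[1]) * unitMs[match[2]];
    }
    return total;
  }

  // ISO timestamp: time remaining until that instant, never negative
  const at = Date.parse(text);
  return Number.isNaN(at) ? undefined : Math.max(0, at - now);
}

const fromSeconds = parseResetMs('30');      // 30000
const fromDuration = parseResetMs('6m30s');  // 6 * 60000 + 30 * 1000 = 390000
const fromMillis = parseResetMs('200ms');    // 200
```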
Request Queue — createQueue(options?)
Queue requests with concurrency, RPM, and TPM limits.
import { createQueue } from 'best-ai-retry';
const queue = createQueue({
concurrency: 5, // max parallel requests (default: 5)
rpm: 60, // max requests per minute
tpm: 90000, // max tokens per minute
});
// Requests are queued and executed within limits
const result = await queue.add(
() => openai.chat.completions.create({ model: 'gpt-4', messages }),
{ tokens: 500, priority: 10 }
);
queue.size; // pending items
queue.pending; // currently executing
queue.clear(); // cancel all pending
Higher priority values run first. Default priority is 0.
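The ordering rule can be shown with a tiny stand-alone sketch. It is not the library's queue: `PriorityOrderSketch` is a hypothetical class, and breaking ties by insertion order is an assumption the README does not state.

```typescript
interface Job {
  id: string;
  priority: number;
  seq: number; // insertion counter, used only to break ties
}

class PriorityOrderSketch {
  private jobs: Job[] = [];
  private seq = 0;

  add(id: string, priority = 0): void {
    this.jobs.push({ id, priority, seq: this.seq++ });
    // Highest priority first; equal priorities keep insertion order (assumed).
    this.jobs.sort((a, b) => b.priority - a.priority || a.seq - b.seq);
  }

  next(): string | undefined {
    return this.jobs.shift()?.id;
  }
}

const order = new PriorityOrderSketch();
order.add('batch-summary');        // default priority 0
order.add('user-facing-chat', 10); // jumps ahead of the queued batch job
order.add('nightly-report');       // default priority 0
const drained = [order.next(), order.next(), order.next()];
// 'user-facing-chat', 'batch-summary', 'nightly-report'
```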
Token Budget — createTokenBudget(options)
Enforce total and per-request token limits.
import { createTokenBudget, BudgetExceededError } from 'best-ai-retry';
const budget = createTokenBudget({
maxTokens: 1_000_000,
maxTokensPerRequest: 4096,
warnAt: 0.8,
onWarn: (used, total) => console.log(`Warning: ${used}/${total} tokens used`),
});
budget.check(2000); // throws BudgetExceededError if over budget
budget.consume(2000); // deduct tokens
budget.remaining; // 998000
budget.used; // 2000
budget.reset();
Cost Tracker — createCostTracker(pricing?)
Track API costs with built-in pricing for popular models.
import { createCostTracker } from 'best-ai-retry';
const tracker = createCostTracker();
// Record usage after each API call
tracker.record({
model: 'gpt-4',
inputTokens: 1500,
outputTokens: 500,
});
tracker.record({
model: 'claude-sonnet-4-20250514',
inputTokens: 2000,
outputTokens: 800,
});
tracker.totalCost; // cumulative $ cost
tracker.costByModel; // { 'gpt-4': 0.06, 'claude-sonnet-4-20250514': 0.018 }
tracker.requestCount; // 2
tracker.summary(); // formatted summary string
tracker.reset();
Custom pricing (per 1M tokens):
const tracker = createCostTracker({
'my-custom-model': { input: 5, output: 15 },
});
Built-in pricing: gpt-4, gpt-4-turbo, gpt-4o, gpt-4o-mini, gpt-3.5-turbo, o1, o1-mini, claude-opus-4-20250514, claude-sonnet-4-20250514, claude-haiku-4-5-20251001, claude-3-opus-20240229, claude-3-sonnet-20240229, claude-3-haiku-20240307, claude-3-5-sonnet-20241022.
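Since prices are given per 1M tokens, the cost of a single call is a straightforward weighted sum. The sketch below works through the custom pricing above; `costUsd` is an illustrative helper, not a library export, and it assumes prices apply linearly per token.

```typescript
interface Pricing {
  input: number;  // USD per 1M input tokens
  output: number; // USD per 1M output tokens
}

function costUsd(pricing: Pricing, inputTokens: number, outputTokens: number): number {
  return (inputTokens * pricing.input + outputTokens * pricing.output) / 1_000_000;
}

const customPricing: Pricing = { input: 5, output: 15 };
const exampleCost = costUsd(customPricing, 1500, 500);
// (1500 * 5 + 500 * 15) / 1,000,000 = 15000 / 1,000,000 = 0.015 USD
```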
Hedged Requests — withHedge(fns, options?)
Race multiple providers — use the fastest response.
import { withHedge } from 'best-ai-retry';
const result = await withHedge([
() => openai.chat.completions.create({ model: 'gpt-4', messages }),
() => anthropic.messages.create({ model: 'claude-sonnet-4-20250514', max_tokens: 1024, messages }),
], {
delay: 2000, // start 2nd request 2s after 1st (default: 2000)
});
result.value; // fastest response
result.winnerIndex; // 0 or 1
Starts the first function immediately. If it hasn't resolved after delay ms, starts the next. Returns whichever resolves first.
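The timing behaviour described here can be sketched with plain promises and timers. This is an approximation under stated assumptions, not the library's code: `hedgeSketch` is a hypothetical name, it rejects only once every function has failed, and it does not cancel the losing requests.

```typescript
interface HedgeResult<T> {
  value: T;
  winnerIndex: number;
}

function hedgeSketch<T>(fns: Array<() => Promise<T>>, delay = 2000): Promise<HedgeResult<T>> {
  return new Promise((resolve, reject) => {
    if (fns.length === 0) {
      reject(new Error('no functions to hedge'));
      return;
    }
    let done = false;
    let started = 0;
    let failed = 0;
    let timer: ReturnType<typeof setTimeout> | undefined;

    const startNext = (): void => {
      if (done || started >= fns.length) return;
      const index = started++;
      fns[index]().then(
        (value) => {
          if (done) return; // a faster call already won
          done = true;
          clearTimeout(timer);
          resolve({ value, winnerIndex: index });
        },
        () => {
          failed += 1;
          // Assumption: give up only after every function has failed.
          if (!done && failed === fns.length) {
            done = true;
            reject(new Error('all hedged requests failed'));
          }
        },
      );
      // If this call is still pending after `delay` ms, start the next one.
      if (started < fns.length) timer = setTimeout(startNext, delay);
    };

    startNext();
  });
}
```

Hedging trades extra requests (and their cost) for lower tail latency, so a `delay` close to the provider's typical response time keeps duplicate calls rare.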
Pipeline — createPipeline(config)
Compose multiple strategies into a single resilient call.
import { createPipeline, createCircuitBreaker } from 'best-ai-retry';
const breaker = createCircuitBreaker({ failureThreshold: 3, resetTimeout: 30000 });
const pipeline = createPipeline({
timeout: 30000,
circuitBreaker: breaker,
retry: { maxRetries: 3, baseDelay: 1000 },
fallback: [
{ name: 'primary', fn: () => callOpenAI() },
{ name: 'backup', fn: () => callAnthropic() },
],
});
const result = await pipeline.execute(() => callOpenAI());
Execution order: timeout -> circuit breaker -> retry -> fallback. Each layer is optional.
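One way to read this execution order is as nested wrappers, with the first layer outermost. The sketch below only traces that nesting with synchronous stand-in layers; `compose`, `layer`, and the outermost-first interpretation are assumptions for illustration, not the library's internals.

```typescript
type Call = () => string;
type Layer = (next: Call) => Call;

const trace: string[] = [];

// A stand-in layer that records when it is entered and exited.
const layer = (name: string): Layer => (next) => () => {
  trace.push(`enter ${name}`);
  try {
    return next();
  } finally {
    trace.push(`exit ${name}`);
  }
};

// reduceRight makes the first layer in the list the outermost wrapper.
function compose(layers: Layer[], call: Call): Call {
  return layers.reduceRight((inner, wrap) => wrap(inner), call);
}

const run = compose(
  [layer('timeout'), layer('circuit breaker'), layer('retry'), layer('fallback')],
  () => 'response',
);
const response = run();
// trace: enter timeout, enter circuit breaker, enter retry, enter fallback,
//        exit fallback, exit retry, exit circuit breaker, exit timeout
```

Under this reading, omitting a layer simply removes one wrapper from the list.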
Error Types
All custom errors extend Error and include relevant metadata:
import {
RetryExhaustedError, // .attempts, .lastError
CircuitOpenError, // .state, .resetTimeout
TimeoutError, // .timeout
BudgetExceededError, // .requested, .remaining
AllFailedError, // .errors[]
} from 'best-ai-retry';
Real-World Example
A production-ready setup combining retry, circuit breaking, fallback, cost tracking, and token budgets:
import {
withRetry,
createCircuitBreaker,
withFallback,
createCostTracker,
createTokenBudget,
BudgetExceededError,
} from 'best-ai-retry';
const budget = createTokenBudget({
maxTokens: 500_000,
warnAt: 0.8,
onWarn: (used, total) => console.warn(`Token budget at ${Math.round(used/total*100)}%`),
});
const cost = createCostTracker();
const openaiBreaker = createCircuitBreaker({
failureThreshold: 5,
resetTimeout: 30000,
onStateChange: (from, to) => console.log(`OpenAI circuit: ${from} -> ${to}`),
});
async function chat(messages: any[]) {
// Check budget before calling
budget.check(4096);
const result = await withFallback([
{
name: 'gpt-4',
fn: () => openaiBreaker.execute(() =>
withRetry(
() => openai.chat.completions.create({ model: 'gpt-4', messages, max_tokens: 4096 }),
{ maxRetries: 3, baseDelay: 1000 }
)
),
},
{
name: 'claude',
fn: () => withRetry(
() => anthropic.messages.create({ model: 'claude-sonnet-4-20250514', messages, max_tokens: 4096 }),
{ maxRetries: 2, baseDelay: 500 }
),
},
], {
shouldFallback: (err) => err.status === 429 || err.status >= 500,
onFallback: (from, to) => console.log(`Falling back: ${from} -> ${to}`),
});
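// Track cost and tokens. OpenAI reports prompt_tokens/completion_tokens;
// Anthropic reports input_tokens/output_tokens.
const usage = result.value.usage;
const inputTokens = usage.prompt_tokens ?? usage.input_tokens;
const outputTokens = usage.completion_tokens ?? usage.output_tokens;
budget.consume(inputTokens + outputTokens);
cost.record({
model: result.usedName === 'gpt-4' ? 'gpt-4' : 'claude-sonnet-4-20250514',
inputTokens,
outputTokens,
});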
return result.value;
}
Requirements
- Node.js >= 18
- Zero runtime dependencies
License
MIT
