ai-resilience

v0.3.1

Published

20 days ago

Axios retry++ primitives for resilient AI and backend systems.

Vulnerabilities: 0 High, 0 Medium, 0 Low

ankitchandore

axios retry axios-retry resilience ai

ai-resilience

Reliable Axios retries, AI response recovery, and provider fallback for TypeScript apps.

ai-resilience helps AI and backend systems survive flaky APIs, malformed model output, rate limits, provider outages, and distributed retry problems.

Highlights

| Capability                 | What it helps with                                                            |
| -------------------------- | ----------------------------------------------------------------------------- |
| Smart Axios retries        | Retry temporary network errors, rate limits, and retryable HTTP status codes  |
| Semantic AI retry          | Retry successful 200 OK responses when the AI output is invalid               |
| JSON repair and validation | Repair malformed JSON and validate responses with Zod                         |
| Provider fallback          | Move from a failing provider to a healthy backup provider                     |
| Circuit breakers           | Stop sending traffic to unhealthy providers temporarily                       |
| Distributed coordination   | Coordinate retries, locks, and rate limits across processes                   |
| Telemetry hooks            | Add logs, metrics, lifecycle hooks, and OpenTelemetry-style spans             |
| TypeScript-first API       | Strong types for retry config, policies, hooks, providers, and metrics        |

Installation

npm install ai-resilience axios axios-retry

Quick Start

Add retry behavior to any Axios instance.

import axios from "axios";
import { applyAiResilience, ConsoleRetryLogger } from "ai-resilience";

const client = axios.create({
  baseURL: "https://api.example.com",
});

applyAiResilience(client, {
  retries: 3,
  strategy: "exponential",
  baseDelayMs: 150,
  maxDelayMs: 10_000,
  jitter: "full",
  logger: new ConsoleRetryLogger("info"),
  hooks: {
    onRetry: ({ attempt, nextDelayMs }) => {
      console.log(`Retry ${attempt} in ${nextDelayMs}ms`);
    },
  },
});

const response = await client.get("/health");
console.log(response.data);

Why This Exists

Traditional HTTP retries only help when the request fails.

AI systems have a different problem: the request can succeed, but the content can still be unusable. A model might return malformed JSON, an empty answer, a refusal, a truncated response, or a payload that does not match your schema.

ai-resilience handles both layers:

| Layer              | Example problem                             | Tool                     |
| ------------------ | ------------------------------------------- | ------------------------ |
| HTTP failure       | 429, 503, timeout, connection reset         | applyAiResilience        |
| AI content failure | Invalid JSON, schema mismatch, empty answer | requestWithSemanticRetry |
| Provider failure   | OpenAI is down or too slow                  | createProviderFallback   |
| Platform failure   | Multiple workers retrying the same job      | RedisRetryCoordinator    |

Semantic Retry for AI Responses

Use requestWithSemanticRetry when the response must be valid, structured, and schema-safe.

import axios from "axios";
import { requestWithSemanticRetry } from "ai-resilience";
import { z } from "zod";

const client = axios.create({
  baseURL: "https://api.example.com",
});

const result = await requestWithSemanticRetry(client, {
  request: {
    method: "post",
    url: "/generate",
    data: {
      prompt: "Return a JSON object with title and tags.",
    },
  },
  schema: z.object({
    title: z.string(),
    tags: z.array(z.string()),
  }),
  repairJson: true,
  requireJson: true,
  hooks: {
    onSemanticRetry: ({ issue, attempt }) => {
      console.log(`Semantic retry ${attempt}: ${issue.kind}`);
    },
  },
});

console.log(result.data);

Semantic retry can detect:

- Invalid JSON
- Schema mismatches
- Empty responses
- Refusals
- Truncated answers
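
To make the failure kinds above concrete, here is a minimal sketch of how raw model output could be sorted into them. It is not the package's detector: `classifyOutput`, `REFUSAL_PATTERNS`, and the `requiredKeys` check are hypothetical stand-ins (the real library validates with Zod), and the bracket-counting truncation test is a rough heuristic that also counts brackets inside strings.

```typescript
type FailureKind =
  | "invalid_json"
  | "schema_mismatch"
  | "empty_response"
  | "refusal"
  | "truncated";

// Hypothetical refusal heuristics; real detectors are likely broader.
const REFUSAL_PATTERNS: RegExp[] = [
  /^i(?:'m| am) sorry/i,
  /^i can(?:'t|not) (?:help|assist)/i,
];

// Returns the detected failure kind, or null when the output looks usable.
function classifyOutput(raw: string, requiredKeys: string[]): FailureKind | null {
  const text = raw.trim();
  if (text.length === 0) return "empty_response";
  if (REFUSAL_PATTERNS.some((pattern) => pattern.test(text))) return "refusal";

  let parsed: unknown;
  try {
    parsed = JSON.parse(text);
  } catch {
    // More opening than closing brackets suggests the output was cut off.
    const opens = (text.match(/[{\[]/g) ?? []).length;
    const closes = (text.match(/[}\]]/g) ?? []).length;
    return opens > closes ? "truncated" : "invalid_json";
  }

  // Stand-in for schema validation: require an object with the given keys.
  if (typeof parsed !== "object" || parsed === null) return "schema_mismatch";
  const record = parsed as Record<string, unknown>;
  return requiredKeys.every((key) => key in record) ? null : "schema_mismatch";
}
```

A retry loop would call a classifier like this on each 200 OK response and re-issue the request only when the returned kind is one it is configured to retry.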

Provider Fallback

Use provider fallback when one AI provider should automatically hand off to another.

import { createProviderFallback } from "ai-resilience";

// openaiClient and anthropicClient are assumed to be SDK clients you have already configured.
const fallback = createProviderFallback(
  [
    {
      id: "openai-primary",
      type: "openai",
      priority: 1,
      request: (input) => openaiClient.responses.create(input),
    },
    {
      id: "anthropic-backup",
      type: "anthropic",
      priority: 2,
      request: (input) => anthropicClient.messages.create(input),
    },
  ],
  {
    strategy: "priority",
    circuitBreaker: {
      failureThreshold: 3,
      cooldownMs: 30_000,
    },
    hooks: {
      onProviderFallback: ({ fromProviderId, toProviderId }) => {
        console.log(`Fallback: ${fromProviderId} -> ${toProviderId}`);
      },
    },
  },
);

const response = await fallback.request({
  prompt: "Summarize this text.",
});

const metrics = fallback.snapshot();
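
The `circuitBreaker` options above (`failureThreshold`, `cooldownMs`) describe a familiar pattern. The sketch below shows one common way such a breaker behaves; it is an assumption about the general technique, not the package's implementation, and the `CircuitBreaker` class and its method names are hypothetical.

```typescript
type BreakerState = "closed" | "open" | "half-open";

class CircuitBreaker {
  private failures = 0;
  private openedAt: number | null = null;
  private readonly failureThreshold: number;
  private readonly cooldownMs: number;
  private readonly now: () => number;

  // `now` is injectable so the cooldown can be exercised without real waiting.
  constructor(
    failureThreshold: number,
    cooldownMs: number,
    now: () => number = () => Date.now(),
  ) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
    this.now = now;
  }

  state(): BreakerState {
    if (this.openedAt === null) return "closed";
    // After the cooldown, allow a trial request ("half-open").
    return this.now() - this.openedAt >= this.cooldownMs ? "half-open" : "open";
  }

  canRequest(): boolean {
    return this.state() !== "open";
  }

  recordSuccess(): void {
    this.failures = 0;
    this.openedAt = null;
  }

  recordFailure(): void {
    this.failures += 1;
    // Opening (or re-opening after a failed trial) restarts the cooldown.
    if (this.failures >= this.failureThreshold) this.openedAt = this.now();
  }
}
```

With `failureThreshold: 3` and `cooldownMs: 30_000`, a provider would be skipped after three consecutive failures and offered a single trial request 30 seconds later.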

Distributed Tools

Retry Coordination

RedisRetryCoordinator coordinates retries and locks across multiple processes.

import { RedisRetryCoordinator } from "ai-resilience";

// `redis` is assumed to be a connected Redis-compatible client created elsewhere in your app.
const coordinator = new RedisRetryCoordinator({
  namespace: "my-api",
  redis,
});

await coordinator.incrementRetry("tenant-a:request-123");

const locked = await coordinator.acquireLock("provider-routing");
console.log(locked);
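
For readers unfamiliar with cross-process locks, the sketch below models the usual semantics with an in-memory map: a lock is granted only if no unexpired holder exists, and it expires on its own so a crashed worker cannot hold it forever. This mirrors what Redis offers through `SET key value NX PX ttl`, but it is an illustration only; `InMemoryLockStore`, the `owner` argument, and `ttlMs` are hypothetical and not part of the package's API.

```typescript
interface LockEntry {
  owner: string;
  expiresAt: number;
}

class InMemoryLockStore {
  private locks = new Map<string, LockEntry>();
  private readonly now: () => number;

  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  // Grants the lock only when it is free or the previous holder has expired.
  acquire(key: string, owner: string, ttlMs: number): boolean {
    const existing = this.locks.get(key);
    if (existing && existing.expiresAt > this.now()) return false;
    this.locks.set(key, { owner, expiresAt: this.now() + ttlMs });
    return true;
  }

  // Only the current owner may release, so a slow worker cannot free
  // a lock that has since been handed to someone else.
  release(key: string, owner: string): boolean {
    const existing = this.locks.get(key);
    if (!existing || existing.owner !== owner) return false;
    this.locks.delete(key);
    return true;
  }
}
```

A real multi-process setup needs the check and the write to happen atomically on the shared store, which is the reason for delegating to Redis rather than process memory.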

Rate Limiting

DistributedRateLimiter provides Redis-compatible fixed-window rate limiting.

import { DistributedRateLimiter } from "ai-resilience";

// `redis` is assumed to be a connected Redis-compatible client created elsewhere in your app.
const limiter = new DistributedRateLimiter(redis);

const result = await limiter.consume({
  key: "tenant-a:openai",
  limit: 100,
  windowSeconds: 60,
});

console.log(result.allowed);
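
Fixed-window limiting counts requests per key inside aligned time windows and rejects anything over the limit until the next window starts. The in-memory sketch below illustrates that algorithm under stated assumptions: `FixedWindowLimiter` is hypothetical, and the `remaining` and `resetAtMs` fields are illustrative extras (the example above only shows `allowed`).

```typescript
interface ConsumeRequest {
  key: string;
  limit: number;
  windowSeconds: number;
}

interface ConsumeResult {
  allowed: boolean;
  remaining: number;
  resetAtMs: number;
}

class FixedWindowLimiter {
  // Old buckets are never purged here; a Redis-backed version would
  // typically set an expiry on each counter key instead.
  private counters = new Map<string, number>();
  private readonly now: () => number;

  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  consume({ key, limit, windowSeconds }: ConsumeRequest): ConsumeResult {
    const windowMs = windowSeconds * 1000;
    // Every timestamp in the same window maps to the same bucket index.
    const windowIndex = Math.floor(this.now() / windowMs);
    const bucket = `${key}:${windowIndex}`;
    const count = (this.counters.get(bucket) ?? 0) + 1;
    this.counters.set(bucket, count);
    return {
      allowed: count <= limit,
      remaining: Math.max(0, limit - count),
      resetAtMs: (windowIndex + 1) * windowMs,
    };
  }
}
```

One known trade-off of fixed windows is burstiness at the boundary: a caller can spend a full quota at the end of one window and another full quota at the start of the next.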

Adaptive Routing

AdaptiveRouter ranks providers by latency, failure rate, and cost.

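import { AdaptiveRouter } from "ai-resilience";

// `providers` and `metricsByProvider` are assumed to be defined elsewhere
// (the provider list and the latest per-provider metrics).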
const router = new AdaptiveRouter(providers, {
  latencyWeight: 1,
  failureWeight: 2,
  costWeight: 0.5,
});

const provider = router.select(metricsByProvider);
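
One plausible reading of the weights above is a weighted sum where the lowest score wins. The sketch below shows that idea; the scoring formula, the max-based normalization, the `ProviderMetrics` field names, and `selectProvider` itself are all assumptions made for illustration, not the package's actual ranking logic.

```typescript
interface ProviderMetrics {
  avgLatencyMs: number;
  failureRate: number; // fraction of failed requests, 0 to 1
  costPerRequest: number;
}

interface Weights {
  latencyWeight: number;
  failureWeight: number;
  costWeight: number;
}

// Scales values to 0..1 relative to the largest one so that
// milliseconds, ratios, and prices can be combined in one score.
function normalize(values: number[]): number[] {
  const max = Math.max(...values);
  return values.map((value) => (max === 0 ? 0 : value / max));
}

// Returns the id of the provider with the lowest weighted score.
function selectProvider(
  metrics: Record<string, ProviderMetrics>,
  weights: Weights,
): string {
  const ids = Object.keys(metrics);
  if (ids.length === 0) throw new Error("No providers to choose from");

  const latency = normalize(ids.map((id) => metrics[id].avgLatencyMs));
  const failure = normalize(ids.map((id) => metrics[id].failureRate));
  const cost = normalize(ids.map((id) => metrics[id].costPerRequest));

  let bestIndex = 0;
  let bestScore = Infinity;
  ids.forEach((_, i) => {
    const score =
      weights.latencyWeight * latency[i] +
      weights.failureWeight * failure[i] +
      weights.costWeight * cost[i];
    if (score < bestScore) {
      bestScore = score;
      bestIndex = i;
    }
  });
  return ids[bestIndex];
}
```

Under this model, `failureWeight: 2` makes reliability count twice as much as latency, so a slower but more dependable provider can still be ranked first.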

Streaming Recovery

recoverStream collects streaming chunks and calls your recovery hook when the gap between consecutive chunks exceeds the configured threshold.

import { recoverStream } from "ai-resilience";

// `stream` is assumed to be a streaming response obtained elsewhere.
const chunks = await recoverStream(stream, {
  maxChunkGapMs: 5_000,
  onRecover: (chunks) => {
    console.log(`Stream recovery started after ${chunks.length} chunks`);
  },
});
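
As a rough illustration of gap detection, the sketch below collects chunks from any `AsyncIterable` and reports when the time between two chunks exceeds a threshold. It is a simplified assumption, not how `recoverStream` is implemented: `collectWithGapDetection` and `onGap` are hypothetical names, and this version only notices a gap once the next chunk arrives rather than firing a timer while the stream is stalled.

```typescript
interface GapOptions<T> {
  maxChunkGapMs: number;
  onGap: (chunksSoFar: T[], gapMs: number) => void;
  // Injectable clock so the behaviour can be checked without real delays.
  now?: () => number;
}

async function collectWithGapDetection<T>(
  stream: AsyncIterable<T>,
  options: GapOptions<T>,
): Promise<T[]> {
  const now = options.now ?? (() => Date.now());
  const chunks: T[] = [];
  let lastArrival = now();

  for await (const chunk of stream) {
    const arrivedAt = now();
    const gapMs = arrivedAt - lastArrival;
    if (gapMs > options.maxChunkGapMs) {
      // Pass a copy so the hook cannot mutate the collected chunks.
      options.onGap([...chunks], gapMs);
    }
    chunks.push(chunk);
    lastArrival = arrivedAt;
  }
  return chunks;
}
```

Detecting a stream that never resumes would need a timer raced against the next chunk, which this sketch leaves out for brevity.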

Telemetry

withTelemetry wraps an async operation with an OpenTelemetry-style tracer.

The package does not force @opentelemetry/api as a runtime dependency. Pass any tracer object that supports startSpan.

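import { withTelemetry } from "ai-resilience";

// `tracer`, `client`, and `payload` are assumed to be defined elsewhere:
// a tracer exposing startSpan, an Axios instance, and a request body.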
const result = await withTelemetry(
  tracer,
  "ai.request",
  () => client.post("/chat/completions", payload),
  {
    provider: "openai",
  },
);
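
If you do not use OpenTelemetry, a small hand-written tracer may be enough. The sketch below shows one possible shape, loosely modelled on the OpenTelemetry span methods `setAttribute`, `recordException`, and `end`. Which span methods `withTelemetry` actually calls is not documented here, so `MinimalSpan`, `MinimalTracer`, `createRecordingTracer`, and the local `traceAsync` helper are all hypothetical and only illustrate the wrapping pattern.

```typescript
type AttributeValue = string | number | boolean;

interface MinimalSpan {
  setAttribute(key: string, value: AttributeValue): void;
  recordException(error: unknown): void;
  end(): void;
}

interface MinimalTracer {
  startSpan(name: string): MinimalSpan;
}

// A tracer that appends span events to an array instead of exporting them.
function createRecordingTracer(log: string[]): MinimalTracer {
  return {
    startSpan(name: string): MinimalSpan {
      log.push(`start ${name}`);
      return {
        setAttribute: (key, value) => { log.push(`attr ${key}=${value}`); },
        recordException: (error) => { log.push(`error ${String(error)}`); },
        end: () => { log.push(`end ${name}`); },
      };
    },
  };
}

// Local stand-in for the wrapping pattern: start a span, run the
// operation, record any error, and always end the span.
async function traceAsync<T>(
  tracer: MinimalTracer,
  name: string,
  operation: () => Promise<T>,
  attributes: Record<string, AttributeValue> = {},
): Promise<T> {
  const span = tracer.startSpan(name);
  for (const [key, value] of Object.entries(attributes)) {
    span.setAttribute(key, value);
  }
  try {
    return await operation();
  } catch (error) {
    span.recordException(error);
    throw error;
  } finally {
    span.end();
  }
}
```

Ending the span in `finally` keeps traces complete even when the wrapped request throws.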

Feature Overview

| Area                | Features                                                                                   |
| ------------------- | ------------------------------------------------------------------------------------------ |
| Retry engine        | Fixed, linear, and exponential retry strategies                                            |
| Jitter              | none, full, equal, and decorrelated jitter                                                 |
| Retry rules         | Method controls, status-code controls, and async custom retry conditions                   |
| Hooks               | Retry, success, give-up, semantic retry, and provider fallback hooks                       |
| Logging             | Structured logger interface and console logger                                             |
| AI validation       | JSON parsing, JSON repair, Zod schema validation, and AI failure detection                 |
| Providers           | Priority routing, round-robin routing, least-latency routing, and health-weighted routing  |
| Protection          | Circuit breakers, provider health tracking, and adaptive fallback delay                    |
| Distributed systems | Redis-compatible retry coordination and fixed-window rate limiting                         |
| Observability       | Metrics snapshots, lifecycle hooks, and OpenTelemetry-style spans                          |

API Reference

| API                                                | Purpose                                                            |
| -------------------------------------------------- | ------------------------------------------------------------------ |
| applyAiResilience(instance, config)                | Installs retry behavior on an existing Axios instance              |
| createAiResilienceClient(config)                   | Creates a new Axios instance with retry behavior already applied   |
| requestWithSemanticRetry(instance, config)         | Runs an Axios request and retries invalid AI responses             |
| semanticRetry(operation, policy)                   | Wraps any async operation with semantic validation and retry       |
| parseJsonResponse(data, options)                   | Parses, repairs, and validates JSON responses                      |
| applySemanticRecovery(instance, policy)            | Installs a response interceptor for semantic validation and repair |
| createProviderFallback(providers, options)         | Creates a provider fallback engine                                 |
| RedisRetryCoordinator                              | Coordinates retries and locks across processes                     |
| DistributedRateLimiter                             | Provides Redis-compatible fixed-window rate limiting               |
| AdaptiveRouter                                     | Selects providers using latency, failure, and cost signals         |
| recoverStream(stream, options)                     | Collects stream chunks and triggers stream recovery hooks          |
| withTelemetry(tracer, name, operation, attributes) | Wraps an async operation in a telemetry span                       |

Retry Configuration

type AiResilienceRetryConfig = {
  retries?: number;
  strategy?: "fixed" | "linear" | "exponential";
  baseDelayMs?: number;
  maxDelayMs?: number;
  jitter?: "none" | "full" | "equal" | "decorrelated";
  retryStatusCodes?: number[];
  retryMethods?: string[];
  retryCondition?: (error) => boolean | Promise<boolean>;
  hooks?: RetryHooks;
  logger?: RetryLogger;
  metadata?: Record<string, unknown>;
};
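
To show how `strategy`, `baseDelayMs`, `maxDelayMs`, and `jitter` could interact, here is a sketch of a delay calculator. The formulas follow the commonly cited definitions of each strategy and jitter mode and are not taken from the package source, so actual delays may differ. `computeDelayMs`, `baseDelayFor`, and the injectable `random` option are hypothetical.

```typescript
type Strategy = "fixed" | "linear" | "exponential";
type Jitter = "none" | "full" | "equal" | "decorrelated";

interface DelayOptions {
  strategy: Strategy;
  baseDelayMs: number;
  maxDelayMs: number;
  jitter: Jitter;
  // Injectable random source in [0, 1) so results can be made deterministic.
  random?: () => number;
}

// Un-jittered delay for a 1-based attempt number, capped at maxDelayMs.
function baseDelayFor(attempt: number, opts: DelayOptions): number {
  let raw = opts.baseDelayMs; // "fixed"
  if (opts.strategy === "linear") raw = opts.baseDelayMs * attempt;
  if (opts.strategy === "exponential") raw = opts.baseDelayMs * 2 ** (attempt - 1);
  return Math.min(raw, opts.maxDelayMs);
}

// previousDelayMs is only used by decorrelated jitter.
function computeDelayMs(
  attempt: number,
  opts: DelayOptions,
  previousDelayMs: number = opts.baseDelayMs,
): number {
  const random = opts.random ?? Math.random;
  const capped = baseDelayFor(attempt, opts);

  // Full jitter: anywhere between 0 and the capped delay.
  if (opts.jitter === "full") return Math.floor(random() * capped);
  // Equal jitter: half fixed, half random.
  if (opts.jitter === "equal") return Math.floor(capped / 2 + random() * (capped / 2));
  // Decorrelated jitter: random between the base and 3x the previous delay.
  if (opts.jitter === "decorrelated") {
    const upper = previousDelayMs * 3;
    const value = opts.baseDelayMs + random() * (upper - opts.baseDelayMs);
    return Math.min(opts.maxDelayMs, Math.floor(value));
  }
  return capped;
}
```

With the Quick Start values (`baseDelayMs: 150`, `maxDelayMs: 10_000`, exponential), the un-jittered delays under this model would be 150, 300, and 600 ms for the first three retries, and full jitter would pick a random value below each of those.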

Semantic Retry Policy

type AiRetryPolicy = {
  maxSemanticRetries?: number;
  retryOnFailureKinds?: Array<
    | "invalid_json"
    | "schema_mismatch"
    | "empty_response"
    | "refusal"
    | "truncated"
  >;
  repairJson?: boolean;
  requireJson?: boolean;
  detectRefusals?: boolean;
  detectTruncation?: boolean;
  schema?: z.ZodType;
  validate?: (
    data,
    response,
  ) => SemanticValidationResult | Promise<SemanticValidationResult>;
  hooks?: SemanticRetryHooks;
  logger?: RetryLogger;
  metadata?: Record<string, unknown>;
};

Scripts

npm run build
npm test
npm run lint
npm run format

License

MIT
