@p-vbordei/rate-limit-guard

v1.0.0

Published a month ago

Cross-session API rate limit guard for LLM providers

Vulnerabilities: 0 High, 0 Medium, 0 Low

vlad1987654123

llm rate-limit circuit-breaker api-retry 429 ai-agents

rate-limit-guard

A cross-session circuit breaker and rate-limit guard for LLM providers (e.g. OpenAI, OpenRouter). It shares rate-limit state across multiple processes and instances (CLIs, workers, cron jobs, etc.) so that a limit hit by one of them stops the others from retrying against the same endpoint, which prevents API retry amplification.

License

Apache License 2.0 (100% independent and open-source).

Features

Cross-Process Synchronization: Uses an atomic-renamed JSON file to share rate limit states across multiple Node.js processes, workers, CLIs, or subprocesses.
Retry Amplification Prevention: Stops concurrent tasks from repeatedly hitting an endpoint that is already rate-limited, so retries do not consume additional quota.
Genuine vs. Transient Filter: Evaluates HTTP rate limit headers (such as x-ratelimit-* and retry-after) and last-known bucket capacities to distinguish genuine quota exhaustion (e.g., hourly requests-per-hour limits) from transient upstream provider hiccups (e.g., 5-second provider capacity issues).
Human-Readable Durations: Built-in helper to format remaining cooldown durations into compact, readable strings (e.g., 2m 30s or 1h 15m).
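
The cross-process synchronization described above relies on an atomically renamed JSON file. The following is a minimal sketch of that general technique, not the library's actual implementation: the LimitState shape and the writeStateAtomically and readState helpers are hypothetical names chosen for illustration. It assumes the temporary file and the target live in the same directory, so that the rename replaces the target in a single step and readers never see a half-written file.

```typescript
import { mkdirSync, mkdtempSync, readFileSync, renameSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { dirname, join } from "node:path";

// Hypothetical state shape; the real file format is not documented here.
interface LimitState {
  blockedUntilMs: number;
}

// Write to a temporary sibling file, then rename it over the target.
function writeStateAtomically(path: string, state: LimitState): void {
  mkdirSync(dirname(path), { recursive: true });
  const tmpPath = `${path}.${process.pid}.${Date.now()}.tmp`;
  writeFileSync(tmpPath, JSON.stringify(state));
  renameSync(tmpPath, path);
}

// A missing or unparsable file is treated as "no active limit".
function readState(path: string): LimitState | null {
  try {
    return JSON.parse(readFileSync(path, "utf8")) as LimitState;
  } catch {
    return null;
  }
}

// Demo: round-trip a state object through a throwaway directory.
const demoDir = mkdtempSync(join(tmpdir(), "rate-limit-sketch-"));
const demoPath = join(demoDir, "state.json");
const demoState: LimitState = { blockedUntilMs: Date.now() + 300_000 };
writeStateAtomically(demoPath, demoState);
console.log(readState(demoPath));
```

Including the process ID in the temporary file name keeps two processes that write at the same moment from overwriting each other's temporary file; whichever rename happens last wins.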

Installation

npm install rate-limit-guard

Usage

1. Basic Cooldown Check

Create a RateLimitGuard instance to check for an active cooldown and to record new rate limits:

import { RateLimitGuard, formatRemaining } from 'rate-limit-guard';

// Initialize with a shared state file path (defaults to ~/.rate_limits/default.json)
const guard = new RateLimitGuard();

// Check if provider is currently blocked:
const remaining = guard.getRemainingCooldown();
if (remaining !== null) {
  console.log(`Rate limit is active. Please wait ${formatRemaining(remaining)}.`);
  // Wait, failover, or exit early...
  process.exit(1);
}

// Perform the LLM API request (callLlmApi is a placeholder for your own client call)...
try {
  const res = await callLlmApi();
  // Clear any existing limit state on success
  guard.clear();
} catch (error: any) {
  if (error.status === 429) {
    // Record rate limit state from headers
    guard.recordLimit({
      headers: error.headers,
      defaultCooldownSeconds: 300 // fallback if no reset headers present
    });
  }
}
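
The example above prints the remaining cooldown with formatRemaining. To illustrate the kind of compact output described under Features (2m 30s, 1h 15m), here is a minimal sketch of such a formatter. It is not the library's code: formatDurationSketch is a hypothetical name, and the sketch assumes the input is a number of seconds, which the source does not state.

```typescript
// Hypothetical helper: formats a duration given in seconds (an assumption)
// using at most the two largest non-zero units.
function formatDurationSketch(totalSeconds: number): string {
  const s = Math.max(0, Math.floor(totalSeconds));
  const hours = Math.floor(s / 3600);
  const minutes = Math.floor((s % 3600) / 60);
  const seconds = s % 60;

  if (hours > 0) {
    return minutes > 0 ? `${hours}h ${minutes}m` : `${hours}h`;
  }
  if (minutes > 0) {
    return seconds > 0 ? `${minutes}m ${seconds}s` : `${minutes}m`;
  }
  return `${seconds}s`;
}

console.log(formatDurationSketch(150));  // 2m 30s
console.log(formatDurationSketch(4500)); // 1h 15m
```

Dropping the seconds once the duration exceeds an hour keeps the string short, at the cost of sub-minute precision that rarely matters for long cooldowns.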

2. Differentiating Genuine Quota Limits from Transient Hiccups

Large LLM routers multiplex many backends. When an upstream provider behind the router runs out of capacity, the router may return a transient 429 that clears within seconds, whereas a genuine account limit can last for hours.

Use isGenuineLimit so that a transient 5-second upstream hiccup does not block your agent for hours:

// Inside the catch block of the request from the previous example:
if (error.status === 429) {
  const headers = error.headers;

  // Checks headers and historical state to decide whether this is a real account limit (e.g. an hourly limit with reset >= 60s)
  const isGenuine = guard.isGenuineLimit({ headers });

  if (isGenuine) {
    console.log("Genuine account quota limit reached. Tripping cross-session breaker.");
    guard.recordLimit({ headers });
  } else {
    console.log("Transient upstream provider error. Retrying with a model failover immediately.");
  }
}
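
To make the genuine-versus-transient distinction more concrete, the sketch below classifies a 429 using only the standard retry-after header and the 60-second threshold mentioned in the comment above. It is a simplified illustration, not how isGenuineLimit is implemented: the library is described as also consulting x-ratelimit-* headers and last-known bucket capacities, which this sketch ignores. The names parseRetryAfterSeconds and looksGenuine are hypothetical, and header keys are assumed to be lowercase.

```typescript
type HeaderMap = Record<string, string | undefined>;

// retry-after is either a whole number of seconds or an HTTP date.
function parseRetryAfterSeconds(headers: HeaderMap, nowMs: number = Date.now()): number | null {
  const raw = headers["retry-after"];
  if (raw === undefined) return null;
  const value = raw.trim();
  if (/^\d+$/.test(value)) return Number(value);
  const dateMs = Date.parse(value);
  if (Number.isNaN(dateMs)) return null;
  return Math.max(0, Math.ceil((dateMs - nowMs) / 1000));
}

// Assumption: a wait of at least thresholdSeconds indicates a genuine quota limit;
// a shorter wait, or no usable header, is treated as transient.
function looksGenuine(headers: HeaderMap, thresholdSeconds = 60, nowMs: number = Date.now()): boolean {
  const wait = parseRetryAfterSeconds(headers, nowMs);
  return wait !== null && wait >= thresholdSeconds;
}

console.log(looksGenuine({ "retry-after": "5" }));    // false
console.log(looksGenuine({ "retry-after": "3600" })); // true
```

Treating a missing header as transient is a design choice in this sketch; a caller that prefers to fail safe could instead fall back to a default cooldown, as defaultCooldownSeconds does in the first example.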
