@p-vbordei/rate-limit-guard
v1.0.0
Cross-session API rate limit guard for LLM providers
rate-limit-guard
A cross-session circuit breaker and rate limit guard for LLM providers (e.g. OpenAI, OpenRouter). It coordinates rate limit states across multiple processes and instances (CLIs, workers, cron jobs, etc.) to prevent API retry amplification.
License
Apache License 2.0 (100% independent and open-source).
Features
- Cross-Process Synchronization: Uses an atomic-renamed JSON file to share rate limit states across multiple Node.js processes, workers, CLIs, or subprocesses.
- Retry Amplification Prevention: Stops concurrent tasks from hammering an endpoint that is already rate-limited, so that retries from many processes do not multiply and burn through the remaining quota.
- Genuine vs. Transient Filter: Evaluates HTTP rate limit headers (such as x-ratelimit-* and retry-after) and last-known bucket capacities to differentiate between genuine quota exhaustions (e.g., hourly RPH limits) and transient upstream provider hiccups (e.g., 5-second provider capacity issues).
- Human-Readable Durations: Built-in helper to format remaining cooldown durations into compact, readable strings (e.g., 2m 30s or 1h 15m).
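The cross-process synchronization described in the feature list relies on an atomically renamed JSON file. The following is a minimal sketch of that general technique, not the library's actual implementation: the `LimitState` shape, the function names, and the temp-file naming are all assumptions made for illustration. Only the default path mirrors the one mentioned in this README.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical state shape; the library's real file schema may differ.
interface LimitState {
  blockedUntilMs: number; // epoch milliseconds at which the cooldown ends
  reason: string;
}

// Mirrors the default location mentioned in this README.
const DEFAULT_STATE_PATH = path.join(os.homedir(), ".rate_limits", "default.json");

// Write to a temp file in the same directory, then rename it over the target.
// A same-filesystem rename replaces the target in one step, so readers see
// either the old file or the new one, never a half-written file.
function writeStateAtomically(filePath: string, state: LimitState): void {
  const dir = path.dirname(filePath);
  fs.mkdirSync(dir, { recursive: true });
  const tmpPath = path.join(
    dir,
    `.${path.basename(filePath)}.${process.pid}.${Date.now()}.tmp`
  );
  fs.writeFileSync(tmpPath, JSON.stringify(state), "utf8");
  fs.renameSync(tmpPath, filePath);
}

// A missing or unparsable file is treated as "no limit recorded".
function readState(filePath: string): LimitState | null {
  try {
    return JSON.parse(fs.readFileSync(filePath, "utf8")) as LimitState;
  } catch {
    return null;
  }
}

// Returns the remaining cooldown in milliseconds, or null if not blocked.
function remainingCooldownMs(
  filePath: string,
  nowMs: number = Date.now()
): number | null {
  const state = readState(filePath);
  if (state === null || state.blockedUntilMs <= nowMs) return null;
  return state.blockedUntilMs - nowMs;
}
```

Because every process reads the same file, a limit recorded by one worker becomes visible to every CLI or cron job that checks the same path before sending a request. Writing the temp file into the same directory as the target matters: a rename across filesystems is not atomic.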
Installation
npm install rate-limit-guard
Usage
1. Basic Cooldown Check
Create a RateLimitGuard instance to check and record rate limit state:
import { RateLimitGuard, formatRemaining } from 'rate-limit-guard';
// Initialize with a shared state file path (defaults to ~/.rate_limits/default.json)
const guard = new RateLimitGuard();
// Check if provider is currently blocked:
const remaining = guard.getRemainingCooldown();
if (remaining !== null) {
  console.log(`Rate limit is active. Please wait ${formatRemaining(remaining)}.`);
  // Wait, failover, or exit early...
  process.exit(1);
}
// Perform LLM API request...
try {
  const res = await callLlmApi();
  // Clear any existing limit state on success
  guard.clear();
} catch (error: any) {
  if (error.status === 429) {
    // Record rate limit state from headers
    guard.recordLimit({
      headers: error.headers,
      defaultCooldownSeconds: 300 // fallback if no reset headers present
    });
  }
}
2. Differentiating Genuine Quota Limits from Transient Hiccups
Large LLM routers multiplex multiple backends. When an upstream provider hits capacity, the router may return a transient 429 that clears within seconds, whereas a genuine account limit can last for hours.
Use isGenuineLimit to avoid blocking your agent for hours on a transient 5-second model jitter:
if (error.status === 429) {
  const headers = error.headers;
  // Checks headers and historical states to see if it's a real account limit (e.g. hourly limit reset >= 60s)
  const isGenuine = guard.isGenuineLimit({ headers });
  if (isGenuine) {
    console.log("Genuine account quota limit reached. Tripping cross-session breaker.");
    guard.recordLimit({ headers });
  } else {
    console.log("Transient upstream provider error. Retrying with a model failover immediately.");
  }
}
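To make the genuine-versus-transient distinction concrete, here is a rough sketch of one possible heuristic. It is not how isGenuineLimit is implemented: the function names and the `HeaderMap` type are hypothetical, and the sketch looks only at the standard Retry-After header, using the 60-second threshold mentioned in the comment above. Provider-specific x-ratelimit-* headers and last-known bucket capacities are left out because their formats vary by provider.

```typescript
type HeaderMap = Record<string, string | undefined>;

// Retry-After is either a whole number of seconds or an HTTP-date (RFC 9110).
// Returns the delay in seconds, or null if the header is absent or unparsable.
function parseRetryAfterSeconds(
  value: string | undefined,
  nowMs: number = Date.now()
): number | null {
  if (value === undefined) return null;
  const trimmed = value.trim();
  if (/^\d+$/.test(trimmed)) return Number(trimmed);
  const dateMs = Date.parse(trimmed);
  if (Number.isNaN(dateMs)) return null;
  return Math.max(0, Math.ceil((dateMs - nowMs) / 1000));
}

// Hypothetical heuristic: a 429 whose reset is at least `thresholdSeconds`
// away is treated as a genuine quota limit; anything shorter, or a response
// with no usable hint, is treated as transient.
function looksLikeGenuineLimit(
  headers: HeaderMap,
  thresholdSeconds: number = 60,
  nowMs: number = Date.now()
): boolean {
  // Header names are case-insensitive, so normalize before the lookup.
  const lower: HeaderMap = {};
  for (const [key, val] of Object.entries(headers)) {
    lower[key.toLowerCase()] = val;
  }
  const retryAfter = parseRetryAfterSeconds(lower["retry-after"], nowMs);
  if (retryAfter === null) return false;
  return retryAfter >= thresholdSeconds;
}
```

Treating a missing header as transient is a deliberate choice in this sketch: it favors a quick retry or failover over tripping a long cross-session cooldown on weak evidence. A stricter policy could do the opposite.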