ratelimit-flex

v4.3.0

Published

23 days ago

TypeScript rate limiting for Node.js: Express, Fastify, NestJS, Hono. Redis, PostgreSQL, MongoDB, DynamoDB or in-memory stores. Sliding window, token bucket, fixed window. Prometheus & OpenTelemetry.


Introduction

ratelimit-flex is a TypeScript-first Node.js rate limiting library for HTTP APIs: Express middleware, Fastify plugin, NestJS integration, and Hono middleware. Use Redis, PostgreSQL, MongoDB, or DynamoDB (or in-memory / cluster stores) for distributed rate limiting with sliding window, token bucket, and fixed window algorithms — plus optional Prometheus and OpenTelemetry metrics.

Features

Three algorithms: Sliding window, Token bucket, Fixed window — implemented across MemoryStore, RedisStore (Lua), PgStore, MongoStore (exact for all strategies), and DynamoStore (exact fixed window & token bucket; sliding window uses a weighted approximation on DynamoDB — see docs/stores/dynamo.md)
Frameworks: Express and Fastify (separate entry for Fastify to keep bundles lean); NestJS (ratelimit-flex/nestjs) and Hono (ratelimit-flex/hono)
Stores: MemoryStore, RedisStore, ClusterStore, PgStore, MongoStore, DynamoStore
Request queuing: Queue over-limit requests instead of rejecting them immediately (expressQueuedRateLimiter, fastifyQueuedRateLimiter, createRateLimiterQueue)
TypeScript-first: strict types, discriminated options where it matters
Redis resilience: insurance limiter fallback, circuit breaker, counter sync on recovery; or fail-open / fail-closed when Redis is unavailable without insurance
In-memory block shielding: InMemoryShield / inMemoryBlock — cache blocked keys in process memory so hot keys stop hitting Redis under attack
Metrics & observability (Express & Fastify): aggregated snapshots, Prometheus, OpenTelemetry — metrics: true ([full docs][doc-metrics])
Weighted requests: incrementCost (or store.increment(..., { cost })) so expensive endpoints consume more quota than cheap ones
Presets: singleInstancePreset, multiInstancePreset, redisWithShieldPreset, hybridWindowsPreset, resilientRedisPreset, clusterPreset, queuedClusterPreset, apiGatewayPreset, authEndpointPreset, publicApiPreset, postgresPreset, mongoPreset, dynamoPreset
Limiter composition: compose.all(), compose.overflow(), compose.firstAvailable(), compose.race(), compose.windows(), compose.withBurst(), nested ComposedStore
Programmatic key management: KeyManager for blocks, penalties, rewards, events, audit log, and optional admin HTTP API
Security: key cardinality, Redis namespaces, Lua usage, and locking down admin routes

Installation

npm install ratelimit-flex

yarn add ratelimit-flex

pnpm add ratelimit-flex

Peer dependencies (install only what you use):

| Package | When you need it |
|---------|------------------|
| express (+ @types/express for TS) | Express middleware |
| fastify, fastify-plugin | Fastify plugin (ratelimit-flex/fastify) |
| @nestjs/common, @nestjs/core (+ optional @nestjs/graphql for GraphQL context) | NestJS module (ratelimit-flex/nestjs) |
| hono | Hono middleware (ratelimit-flex/hono) |
| ioredis | RedisStore with url (or use your own Redis client adapter) |
| pg | PgStore (ratelimit-flex/postgres) |
| mongodb | MongoStore (ratelimit-flex/mongo) |
| @aws-sdk/client-dynamodb, @aws-sdk/lib-dynamodb | DynamoStore (ratelimit-flex/dynamo) |
| prom-client | Optional: metrics.prometheus.registry integration |
| @opentelemetry/api | Optional: metrics.openTelemetry.meter integration |

All peers are optional at install time, but the peer package for each integration you import must be installed when that integration is loaded.

Node.js: >= 20.19.0 (see package.json engines).

Quick Start

Redis (shared limits across instances)

import express from 'express';
import { expressRateLimiter, multiInstancePreset } from 'ratelimit-flex';

const app = express();
app.use(expressRateLimiter(multiInstancePreset({ url: process.env.REDIS_URL! })));
app.get('/health', (_req, res) => res.json({ ok: true }));

PostgreSQL

See docs/stores/postgres.md for schema, indexes, and operations notes.

import express from 'express';
import { Pool } from 'pg';
import { expressRateLimiter, postgresPreset } from 'ratelimit-flex';
import { pgStoreSchema } from 'ratelimit-flex/postgres';

const pool = new Pool({ connectionString: process.env.DATABASE_URL });
// Run once during deploy / migrations (not per request):
await pool.query(pgStoreSchema);

const app = express();
app.use(expressRateLimiter(postgresPreset({ pool })));

MongoDB

See docs/stores/mongo.md for TTL indexes and client shapes.

import express from 'express';
import { MongoClient } from 'mongodb';
import { expressRateLimiter, mongoPreset } from 'ratelimit-flex';

const client = new MongoClient(process.env.MONGODB_URI!);
await client.connect();

const app = express();
app.use(expressRateLimiter(mongoPreset({ client, dbName: 'myapp' })));

DynamoDB

See docs/stores/dynamo.md for table creation, TTL, and sliding-window behavior.

import {
  CreateTableCommand,
  DynamoDBClient,
  UpdateTimeToLiveCommand,
} from '@aws-sdk/client-dynamodb';
import { DynamoDBDocumentClient } from '@aws-sdk/lib-dynamodb';
import express from 'express';
import { dynamoPreset, expressRateLimiter } from 'ratelimit-flex';
import {
  dynamoStoreEnableTtlParams,
  dynamoStoreTableSchema,
} from 'ratelimit-flex/dynamo';

const raw = new DynamoDBClient({ region: process.env.AWS_REGION ?? 'us-east-1' });
// Once at deploy (prefer CDK / Terraform in production):
await raw.send(new CreateTableCommand(dynamoStoreTableSchema));
await raw.send(new UpdateTimeToLiveCommand(dynamoStoreEnableTtlParams));

const doc = DynamoDBDocumentClient.from(raw);
const app = express();
app.use(expressRateLimiter(dynamoPreset({ client: doc, tableName: 'rate_limits' })));

Express (in-process defaults)

import express from 'express';
import rateLimit, { RateLimitStrategy } from 'ratelimit-flex';

const app = express();

// The three app.use calls below are alternatives; register the one that fits your traffic.
// Sliding window (default) - smooth, accurate rate limiting
app.use(rateLimit({
  strategy: RateLimitStrategy.SLIDING_WINDOW, // optional, this is the default
  maxRequests: 100,
  windowMs: 60_000,
}));

// Token bucket - allows bursts
app.use(rateLimit({
  strategy: RateLimitStrategy.TOKEN_BUCKET,
  tokensPerInterval: 20,
  interval: 60_000,
  bucketSize: 60,
}));

// Fixed window - simplest, lowest memory
app.use(rateLimit({
  strategy: RateLimitStrategy.FIXED_WINDOW,
  maxRequests: 100,
  windowMs: 60_000,
}));

app.get('/health', (_req, res) => res.json({ ok: true }));
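To make the token bucket parameters above concrete, here is a minimal, library-independent sketch of the refill arithmetic. The class `TokenBucketSketch` and its `tryConsume` method are hypothetical names used only for illustration. The sketch assumes continuous refill capped at `bucketSize` and a bucket that starts full; ratelimit-flex's own implementation may differ on either point.

```typescript
// Illustrative only: a generic token bucket, not ratelimit-flex's internal code.
class TokenBucketSketch {
  private tokens: number;
  private lastRefillMs: number;
  private readonly tokensPerInterval: number;
  private readonly intervalMs: number;
  private readonly bucketSize: number;

  constructor(tokensPerInterval: number, intervalMs: number, bucketSize: number, nowMs: number) {
    this.tokensPerInterval = tokensPerInterval;
    this.intervalMs = intervalMs;
    this.bucketSize = bucketSize;
    this.tokens = bucketSize; // assumption: start full, so an initial burst is allowed
    this.lastRefillMs = nowMs;
  }

  /** Try to take `cost` tokens at time `nowMs`; returns true when the request is allowed. */
  tryConsume(nowMs: number, cost = 1): boolean {
    // Refill in proportion to the time elapsed, never exceeding the bucket size.
    const elapsed = Math.max(0, nowMs - this.lastRefillMs);
    const refill = (elapsed / this.intervalMs) * this.tokensPerInterval;
    this.tokens = Math.min(this.bucketSize, this.tokens + refill);
    this.lastRefillMs = nowMs;

    if (this.tokens < cost) return false;
    this.tokens -= cost;
    return true;
  }
}

// Same numbers as the example above: 20 tokens per minute, burst capacity of 60.
const bucket = new TokenBucketSketch(20, 60_000, 60, 0);
let allowedInBurst = 0;
for (let i = 0; i < 100; i++) {
  if (bucket.tryConsume(0)) allowedInBurst++;
}
```

Under these assumptions a client can spend the full `bucketSize` at once, then regains capacity at `tokensPerInterval` per `interval`, which is why the token bucket strategy tolerates bursts while still bounding the long-run rate.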

Fastify (same strategies)

import Fastify from 'fastify';
import { fastifyRateLimiter, RateLimitStrategy } from 'ratelimit-flex/fastify';

const app = Fastify();

// Sliding window (default)
await app.register(fastifyRateLimiter, {
  strategy: RateLimitStrategy.SLIDING_WINDOW,
  maxRequests: 100,
  windowMs: 60_000,
});

app.get('/health', async () => ({ ok: true }));

⚠️ Security Considerations: Before deploying to production, review Security and abuse for guidance on key cardinality, Redis namespaces, and admin API authentication.

Framework Integration

NestJS

// app.module.ts
import { Controller, Inject, Injectable, Module, Post } from '@nestjs/common';
import { ConfigModule, ConfigService } from '@nestjs/config';
import { KeyManager, RedisStore } from 'ratelimit-flex';
import { RateLimit, RateLimitModule, SkipRateLimit, RATE_LIMIT_KEY_MANAGER } from 'ratelimit-flex/nestjs';

@Module({
  imports: [
    RateLimitModule.forRoot({
      maxRequests: 100,
      windowMs: 60_000,
    }),
  ],
})
export class AppModule {}

// Async config with ConfigService (use in @Module({ imports: [...] }))
@Module({
  imports: [
    RateLimitModule.forRootAsync({
      imports: [ConfigModule],
      inject: [ConfigService],
      useFactory: (config: ConfigService) => ({
        store: new RedisStore({ url: config.get('REDIS_URL')!, /* ... */ }),
        maxRequests: config.get('RATE_LIMIT_MAX'),
      }),
    }),
  ],
})
export class AppModuleAsync {}

// Per-route override
import { RateLimit, SkipRateLimit } from 'ratelimit-flex/nestjs';

@Controller('auth')
export class AuthController {
  @RateLimit({ maxRequests: 5, windowMs: 60_000 })
  @Post('login')
  async login() {
    // ...
  }
}

@SkipRateLimit()
@Controller('health')
export class HealthController {
  // ...
}

// Inject store/keyManager in services
@Injectable()
export class AdminService {
  constructor(@Inject(RATE_LIMIT_KEY_MANAGER) private km: KeyManager) {}
  async blockUser(key: string) {
    await this.km.block(key, 3600_000);
  }
}

NestJS: Per-Route Configuration

RateLimitGuard uses the same RateLimitEngine and backing store for the whole app (or feature module). Per-route @RateLimit({ ... }) can override maxRequests, windowMs, cost, and keyGenerator.

Per-route strategy: The module uses one strategy for all routes. To apply different algorithms (e.g. token bucket vs sliding window) to different routes, register multiple RateLimitModule instances in separate feature modules with different strategy settings.

Performance note: The guard caches one RateLimitEngine per handler. Prefer static limits in decorators; avoid mutating reflected metadata at runtime.

NestJS: KeyManager Lifecycle

Simple rule: The module destroys KeyManagers it creates. User-supplied KeyManagers are never touched by the module.

Auto-created (from penaltyBox): Module calls keyManager.destroy() on onModuleDestroy
User-supplied (passed via keyManager option): You manage the lifecycle — call destroy() in your own OnModuleDestroy hook
Testing: await app.close() handles cleanup for auto-created KeyManagers
Non-Nest apps: Call keyManager.destroy() when shutting down

NestJS: `globalGuard` and module scope

globalGuard: true (default):

Registers APP_GUARD for automatic rate limiting on all routes
Makes the module global — RATE_LIMIT_* injection tokens available everywhere
Use @SkipRateLimit() decorator to exclude specific controllers/routes

globalGuard: false:

Does NOT register APP_GUARD
Module is NOT global — feature modules must add RateLimitModule to their own imports array to access the tokens
Manually apply @UseGuards(RateLimitGuard) where needed

Upgrading from v2.x? See [Migration Guide][doc-migration] for breaking changes in v3.0.0.

Hono

import { Hono } from 'hono';
import { rateLimiter } from 'ratelimit-flex/hono';

const app = new Hono();

// Basic usage
const limiter = rateLimiter({
  maxRequests: 100,
  windowMs: 60_000,
  keyGenerator: (c) => c.req.header('x-api-key') ?? 'anon',
});

app.use('*', limiter);

// Per-route
app.post('/login', rateLimiter({ maxRequests: 5, windowMs: 60_000 }), async (c) => {
  return c.json({ ok: true });
});

// With Redis and in-memory shield
import { RedisStore } from 'ratelimit-flex';

const REDIS_URL = process.env.REDIS_URL!;

app.use(
  '*',
  rateLimiter({
    store: new RedisStore({ url: REDIS_URL }),
    maxRequests: 100,
    windowMs: 60_000,
    standardHeaders: 'draft-8',
    inMemoryBlock: true, // Enable DoS protection
  }),
);

// With metrics
const limiterWithMetrics = rateLimiter({
  maxRequests: 100,
  windowMs: 60_000,
  metrics: {
    enabled: true,
    intervalMs: 10_000,
  },
});

app.use('*', limiterWithMetrics);

// Access metrics
app.get('/metrics', (c) => {
  const snapshot = limiterWithMetrics.getMetricsSnapshot();
  return c.json(snapshot);
});

// Cleanup on shutdown
process.on('SIGTERM', async () => {
  await limiterWithMetrics.shutdown();
  process.exit(0);
});

// Queued rate limiter (wait instead of reject)
import { queuedRateLimiter } from 'ratelimit-flex/hono';

const apiLimiter = queuedRateLimiter({
  maxRequests: 10,
  windowMs: 60_000,
  maxQueueSize: 50,
  maxQueueTimeMs: 30_000,
});
app.use('/api/*', apiLimiter);
// Graceful shutdown: await apiLimiter.queue.shutdown({ drainTimeoutMs: 10_000, reason: 'server-shutdown' })
// or await apiLimiter.shutdown() — see “Request queuing” for Node/Bun/Workers notes.

// WebSocket rate limiting
import { webSocketLimiter } from 'ratelimit-flex/hono';
import { upgradeWebSocket } from 'hono/cloudflare-workers';

app.get(
  '/ws',
  webSocketLimiter({
    maxRequests: 10,
    windowMs: 60_000,
    keyGenerator: (c) => c.req.header('x-api-key') ?? 'anon',
  }),
  upgradeWebSocket(() => ({
    onMessage(event, ws) {
      ws.send('pong');
    },
  })),
);

Hono: engine parity

Same options as Express: rateLimiter accepts the full merged RateLimitOptions surface — including limits, compose.windows / ComposedStore, draft, groupedWindowStores, penaltyBox, keyManager, onLayerBlock, and incrementCost. Composed layers are available as c.get('rateLimitComposed') (same idea as Express req.rateLimitComposed).

queuedRateLimiter: Uses the same merge path as rateLimiter (full engine options: limits, composed store, inMemoryBlock, metrics, cost / incrementCost, allowlist/blocklist, standard headers, etc.). The returned handler matches rateLimiter for observability (metricsManager, shield, keyManager, openTelemetryAdapter, event hooks, shutdown, …) and adds queue. It still drives RateLimiterQueue via store.increment only — it does not run RateLimitEngine, so engine-only behavior is unavailable: no draft, no pre-increment keyManager / penaltyBox enforcement, and no c.get('rateLimitComposed'). Same trade-off as Express expressQueuedRateLimiter.

skipFailedRequests / skipSuccessfulRequests: The middleware awaits next() after a successful consume, then uses resolvedHonoRollbackStatus (exported from ratelimit-flex/hono) so a missing c.res, 0, or invalid c.res.status values are treated as 200 before applying the rollback. Rollbacks use resolveIncrementOpts / matchingDecrementOptions for weighted, grouped, and composed stores (same as Express / Fastify).

Cloudflare Workers: Pass waitUntil: (p) => c.executionCtx.waitUntil(p) so post-response decrement work for skip-response rules is scheduled on the execution context (optional on Node).

Custom rollback rules: If you need logic beyond HTTP status (e.g. body shape), add middleware after rateLimiter and call store.decrement with resolveIncrementOpts / matchingDecrementOptions; use HONO_RATE_LIMIT_INCREMENT_COST with the cost option for weighted quota.

Core Features

In-memory block shielding

Problem statement

Under DoS conditions, every blocked request still hits Redis — 100k req/sec from an attacker means 100k Redis calls/sec from your own app. InMemoryShield caches blocked keys in local memory so subsequent requests for the same key never touch the store. Result: 7x+ faster under attack, 99%+ fewer store calls.

Quick start

import { expressRateLimiter, RedisStore, shield } from 'ratelimit-flex';

// Option 1: via middleware options (simplest)
app.use(expressRateLimiter({
  store: new RedisStore({ url: REDIS_URL, /* ... */ }),
  maxRequests: 100,
  windowMs: 60_000,
  inMemoryBlock: true, // shield kicks in at maxRequests
}));

// Option 2: explicit shield with custom config
const shielded = shield(new RedisStore({ /* ... */ }), {
  blockOnConsumed: 100,   // start shielding once a key has consumed this much quota
  maxBlockedKeys: 10_000, // upper bound on keys cached in process memory
  onBlock: (key) => console.log(`Shielded: ${key}`),
});
app.use(expressRateLimiter({ store: shielded, maxRequests: 100, windowMs: 60_000 }));

Metrics

const metrics = limiter.shield?.getMetrics();
// {
//   blockedKeyCount: 42,        // keys currently blocked in memory
//   storeCallsSaved: 98721,     // total store calls avoided
//   totalKeysBlocked: 150,      // total keys blocked since startup
//   totalKeysExpired: 80,       // keys removed due to window expiry
//   totalKeysEvicted: 28,       // keys removed due to LRU eviction
//   hitRate: 0.993,             // cache hit rate
//   storeCalls: 684             // actual store calls made
// }
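The example numbers above appear to be internally consistent, which gives a hint about how to read them. The short check below is a sketch under two assumptions that the example supports but the text does not state outright: that `hitRate` is the share of calls answered from memory, and that every key ever blocked is either still blocked, expired, or evicted.

```typescript
// Numbers copied from the example snapshot above.
const shieldSnapshot = {
  blockedKeyCount: 42,
  storeCallsSaved: 98_721,
  totalKeysBlocked: 150,
  totalKeysExpired: 80,
  totalKeysEvicted: 28,
  hitRate: 0.993,
  storeCalls: 684,
};

// Assumed relationship: hit rate = calls served from memory / all calls seen by the shield.
const derivedHitRate =
  shieldSnapshot.storeCallsSaved /
  (shieldSnapshot.storeCallsSaved + shieldSnapshot.storeCalls);

// Assumed relationship: blocked-so-far = currently blocked + expired + evicted.
const accountedKeys =
  shieldSnapshot.blockedKeyCount +
  shieldSnapshot.totalKeysExpired +
  shieldSnapshot.totalKeysEvicted;
```

If these relationships hold for your deployment, a falling hit rate or a rising eviction count is a cheap signal that `maxBlockedKeys` is too small for the attack surface.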

MetricsManager and periodic snapshots: When metrics are enabled, middleware passes the same InMemoryShield instance used as the engine store into MetricsManager. Each onMetrics snapshot may include shield — that object is shield.getMetrics() for that instance (blocked-key counts, hit rate, store calls avoided, etc.). Request, block, and latency totals still describe traffic through the engine, which calls increment on the outer store. If you pass an InMemoryShield as store and set inMemoryBlock: true, a second shield wraps the first; snapshot.shield reflects the outer layer only, and in non-production a one-time console.warn flags possible double-shielding (intentional stacking is supported).

How it works

Each request first checks an in-memory map for the key: if the key is still “shielded” (blocked and not yet expired), the limiter returns the cached blocked result in about 0.01 ms — no Redis round-trip. If there is no entry, or it expired, the request takes the slow path: increment() on the backing store (typically 2–5 ms for Redis, depending on network and load). When the store shows the key has consumed enough quota, the shield records that state locally and keeps serving blocked responses from RAM until the block window expires or you invalidate the entry (for example via KeyManager).

InMemoryShield implements RateLimitStore: wrap Redis, a composed store, or any custom implementation; use it with compose.* and multi-layer setups; expose shield metrics alongside Prometheus/OpenTelemetry; opt into onBlock, onExpire, and onShieldHit callbacks; and wire KeyManager so reward, unblock, and delete operations clear stale shield entries.
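The fast path / slow path split described above can be sketched in a few lines. This is not the library's `InMemoryShield`: the names `BlockCacheSketch`, `IncrementResult`, and `fakeStore` are hypothetical, there is no LRU eviction, and the backing store is a synchronous stand-in for what would be an async Redis call.

```typescript
// Illustrative only: a tiny block cache in front of a slower store.
type IncrementResult = { blocked: boolean; resetAtMs: number };
type Increment = (key: string, nowMs: number) => IncrementResult;

class BlockCacheSketch {
  private readonly blockedUntil = new Map<string, number>();
  private readonly increment: Increment;
  storeCalls = 0;
  storeCallsSaved = 0;

  constructor(increment: Increment) {
    this.increment = increment;
  }

  check(key: string, nowMs: number): IncrementResult {
    // Fast path: the key is known to be blocked and the block has not expired.
    const until = this.blockedUntil.get(key);
    if (until !== undefined && nowMs < until) {
      this.storeCallsSaved++;
      return { blocked: true, resetAtMs: until };
    }

    // Slow path: drop any stale entry and ask the backing store.
    this.blockedUntil.delete(key);
    this.storeCalls++;
    const result = this.increment(key, nowMs);
    if (result.blocked) this.blockedUntil.set(key, result.resetAtMs);
    return result;
  }

  /** Manual invalidation, e.g. after an operator unblocks the key. */
  invalidate(key: string): void {
    this.blockedUntil.delete(key);
  }
}

// Stand-in store: allows 2 requests per key, then blocks until t = 1000 ms.
const counts = new Map<string, number>();
const fakeStore: Increment = (key, _nowMs) => {
  const used = (counts.get(key) ?? 0) + 1;
  counts.set(key, used);
  return { blocked: used > 2, resetAtMs: 1_000 };
};

const cache = new BlockCacheSketch(fakeStore);
for (let i = 0; i < 10; i++) cache.check('attacker', 0);
```

In this run only the first three requests reach the store (two allowed plus the one that trips the block); the remaining seven are answered from the map, which is the effect the shield aims for under attack.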

Programmatic key management

ratelimit-flex exposes a KeyManager for programmatic control of rate limit keys. Block abusive clients, apply penalty/reward points, inspect state, and react to events — all with full TypeScript types, an audit trail, and optional Redis persistence.

Basic usage

import express from 'express';
import { KeyManager, MemoryStore, RateLimitStrategy, expressRateLimiter } from 'ratelimit-flex';

const app = express();
const store = new MemoryStore({ strategy: RateLimitStrategy.SLIDING_WINDOW, windowMs: 60_000, maxRequests: 100 });
const keyManager = new KeyManager({ store, maxRequests: 100, windowMs: 60_000 });

const limiter = expressRateLimiter({ store, keyManager });
app.use(limiter);

// Programmatic control — from an admin route, webhook handler, etc.
await keyManager.block('abusive-ip', 3600_000, { type: 'manual', message: 'Spam detected' });
await keyManager.penalty('suspicious-user', 5);
await keyManager.reward('verified-user', 10);
const state = await keyManager.get('any-key');

Escalating penalties

import { KeyManager, exponentialEscalation } from 'ratelimit-flex';

const keyManager = new KeyManager({
  store,
  maxRequests: 100,
  windowMs: 60_000,
  penaltyBlockThreshold: 3,
  penaltyEscalation: exponentialEscalation(60_000), // 1min, 2min, 4min, 8min...
});
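The comment in the snippet above lists block durations of 1, 2, 4, and 8 minutes for a 60,000 ms base. The arithmetic behind that sequence is plain doubling, sketched here with a hypothetical helper `exponentialDurationSketch`; it is not the library's `exponentialEscalation`, which may take further options (such as a maximum duration) that this sketch does not model.

```typescript
// Illustrative only: duration of the n-th automatic block under plain doubling.
function exponentialDurationSketch(baseMs: number, offense: number): number {
  if (!Number.isInteger(offense) || offense < 1) {
    throw new RangeError('offense must be a positive integer');
  }
  // 1st offense: base, 2nd: 2 * base, 3rd: 4 * base, ...
  return baseMs * 2 ** (offense - 1);
}

// Durations in milliseconds for the first four offenses with a one-minute base.
const firstFourBlocks = [1, 2, 3, 4].map((n) => exponentialDurationSketch(60_000, n));
```

Because the duration doubles on every repeat offense, it is usually worth deciding on an upper bound for long-lived keys before adopting this kind of schedule.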

Event-driven alerting

keyManager.on('blocked', ({ key, reason }) => {
  alerting.send(`Key ${key} blocked: ${reason.type}`);
});

Admin API authentication

The admin router is a SECURITY-SENSITIVE surface. Mounting it without authentication exposes block/unblock/reward endpoints to every caller on the network. The auth option is required — unauthenticated admin routes are only available via the explicit development opt-in below.

Bearer token (recommended for service-to-service)

import { createAdminRouter } from 'ratelimit-flex';

app.use('/admin/ratelimit', createAdminRouter(keyManager, {
  auth: { type: 'bearer', token: process.env.ADMIN_TOKEN! },
}));

Basic auth (simple setups)

app.use('/admin/ratelimit', createAdminRouter(keyManager, {
  auth: {
    type: 'basic',
    username: 'admin',
    password: process.env.ADMIN_PASSWORD!,
  },
}));

Custom middleware (for JWT, OAuth, existing auth systems)

import { requireAuth } from './my-auth';

app.use('/admin/ratelimit', createAdminRouter(keyManager, {
  auth: { type: 'middleware', handler: requireAuth(['admin']) },
}));

Audit logging

createAdminRouter(keyManager, {
  auth: { type: 'bearer', token },
  onAdminAction: (action) => {
    auditLogger.info('admin-action', action);
  },
});

Development escape hatch

In development or tests where you genuinely don't need auth:

createAdminRouter(keyManager, {
  auth: { type: 'unsafe-no-auth', acknowledgeRisk: true },
});

This logs a warning at construction time. Never use this in production.

If you already protect admin routes with your own middleware, pass it as type: 'middleware'. You can keep your outer guard and add the router's check for defense in depth, or move authentication entirely into the router:

app.use('/admin/ratelimit', createAdminRouter(keyManager, {
  auth: { type: 'middleware', handler: authMiddleware },
}));

Fastify (`fastifyAdminPlugin` / `createFastifyAdminPlugin`)

The Fastify plugin takes a nested options object with the same auth, onAdminAction, and onAuthFailure fields:

await app.register(fastifyAdminPlugin, {
  keyManager,
  prefix: '/admin/ratelimit',
  options: {
    auth: { type: 'bearer', token: process.env.ADMIN_TOKEN! },
  },
});

What `KeyManager` provides

KeyManager gives you typed block reasons (manual, penalty-escalation, abuse-pattern, custom), an event emitter (blocked, unblocked, penalized, rewarded, and more), an audit log with filtering, escalation strategies for automatic penalty blocks, optional admin REST endpoints (createAdminRouter, fastifyAdminPlugin), and optional Redis-backed block persistence (RedisBlockStore) so block state can be shared across processes.

Redis-backed block persistence

Share block state across processes using RedisBlockStore:

import { KeyManager, RedisBlockStore, RedisStore, RateLimitStrategy } from 'ratelimit-flex';
import Redis from 'ioredis';

// Create a single Redis client instance
const redis = new Redis(process.env.REDIS_URL!);

// Share the client between RedisStore (for rate limit counters) and RedisBlockStore (for blocks)
const store = new RedisStore({
  client: redis,
  strategy: RateLimitStrategy.SLIDING_WINDOW,
  windowMs: 60_000,
  maxRequests: 100,
});

const blockStore = new RedisBlockStore(redis, { keyPrefix: 'rlf:blocks:' });

const keyManager = new KeyManager({
  store,
  blockStore,
  maxRequests: 100,
  windowMs: 60_000,
  syncIntervalMs: 5000, // Pull remote blocks every 5 seconds
});

// Blocks are now persisted to Redis and visible across all processes
await keyManager.block('abusive-ip', 3600_000, { type: 'manual', message: 'Spam' });

Cross-process consistency: KeyManager syncs blocks from Redis every syncIntervalMs (default 5000ms). Call await keyManager.syncBlocks() manually for immediate consistency.

Migrating from `penaltyBox`

Why you cannot set penaltyBox and keyManager together: mergeRateLimiterOptions throws if both appear in the same options object. penaltyBox uses the engine’s built-in violation counter and penaltyUntil map. A user-supplied KeyManager adds a separate blocking and penalty-point system (penalty(), escalation, audit). Allowing both would pit two policies against each other for the same keys.

Option A — keep penaltyBox: If you only need “N real rate-limit blocks within violationWindowMs, then ban for penaltyDurationMs”, keep penaltyBox and do not pass your own keyManager. (Frameworks may still synthesize an internal KeyManager for Nest lifecycle or related wiring when you only use penaltyBox; that is not the same as configuring both options yourself.)

Option B — migrate to an explicit KeyManager: Drop penaltyBox and drive bans through KeyManager. Map fields roughly like this:

| penaltyBox | KeyManager |
|--------------|----------------|
| violationsThreshold | penaltyBlockThreshold (penalty points before an automatic block) |
| penaltyDurationMs | penaltyBlockDurationMs (base duration), or replace with penaltyEscalation for longer blocks on repeat offenses |
| onPenalty | keyManager.on('blocked', …) (and/or audit entries) |

The engine does not call keyManager.penalty() when a request hits the rate limit — you wire that yourself, typically from onLimitReached, so each limit hit adds a penalty point toward the threshold:

Before (penaltyBox):

app.use(
  expressRateLimiter({
    store,
    maxRequests: 100,
    windowMs: 60_000,
    penaltyBox: {
      violationsThreshold: 3,
      violationWindowMs: 3_600_000,
      penaltyDurationMs: 60_000,
    },
  }),
);

After (KeyManager + onLimitReached + escalation):

import { expressRateLimiter, KeyManager, exponentialEscalation } from 'ratelimit-flex';

const keyGenerator = (req: import('express').Request) =>
  /* same key you use for rate limiting, e.g. forwarded IP */ String(req.ip ?? '');

const keyManager = new KeyManager({
  store,
  maxRequests: 100,
  windowMs: 60_000,
  penaltyBlockThreshold: 3,
  penaltyEscalation: exponentialEscalation(60_000), // 1m, 2m, 4m, … after each threshold breach
});

app.use(
  expressRateLimiter({
    store,
    maxRequests: 100,
    windowMs: 60_000,
    keyGenerator,
    keyManager,
    onLimitReached: async (req) => {
      await keyManager.penalty(keyGenerator(req), 1);
    },
  }),
);

keyManager.on('blocked', ({ key, reason }) => {
  console.log(`Blocked: ${key}`, reason);
});

Semantics note: penaltyBox counts blocks in a sliding violationWindowMs (default one hour). KeyManager penalty points are tracked in an adjustment window tied to the limiter’s windowMs, not to violationWindowMs. If your old config relied on a long violation window and a short rate-limit window, either keep penaltyBox or add your own sliding-window counting before calling penalty().
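If you take the second route, the "own sliding-window counting" can be small. The sketch below is one possible shape, with a hypothetical class `ViolationWindowSketch`; it is not part of ratelimit-flex, keeps everything in process memory, and does not cap the number of tracked keys, which a production version should.

```typescript
// Illustrative only: count rate-limit hits per key over a long window.
class ViolationWindowSketch {
  private readonly hits = new Map<string, number[]>();
  private readonly windowMs: number;
  private readonly threshold: number;

  constructor(windowMs: number, threshold: number) {
    this.windowMs = windowMs;
    this.threshold = threshold;
  }

  /** Record one violation; returns true when the threshold is reached inside the window. */
  record(key: string, nowMs: number): boolean {
    const cutoff = nowMs - this.windowMs;
    // Keep only violations that are still inside the window, then add this one.
    const recent = (this.hits.get(key) ?? []).filter((t) => t > cutoff);
    recent.push(nowMs);

    if (recent.length >= this.threshold) {
      this.hits.delete(key); // start counting afresh after escalating
      return true;
    }
    this.hits.set(key, recent);
    return false;
  }
}

// Mirrors the penaltyBox example: 3 violations within one hour.
const violations = new ViolationWindowSketch(3_600_000, 3);
const firstHit = violations.record('client-a', 0);
const secondHit = violations.record('client-a', 1_000_000);
const thirdHit = violations.record('client-a', 2_000_000);
```

A handler such as onLimitReached could call `record(key, Date.now())` and forward to `keyManager.penalty()` only when it returns true. How that interacts with `penaltyBlockThreshold` depends on your configuration, so treat the wiring as something to verify rather than a drop-in replacement.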

Benefits of Option B:

Typed block reasons (manual, penalty-escalation, abuse-pattern, custom)
Event system for real-time alerting
Audit log with filtering
Escalation strategies (exponential, fibonacci, etc.)
Admin HTTP endpoints
Redis-backed block persistence

Security and abuse

Operational limits

Key cardinality protection

MemoryStore caps the number of distinct keys it tracks in memory. ClusterStore forwards work to a MemoryStore on the cluster primary, so the same LRU eviction and default cap apply to that in-process state. When the cap is reached, the least-recently-used key is evicted.

Default: 100,000 keys per MemoryStore instance (including the primary-side store used by ClusterStore).

This protects against unbounded memory growth from:

High-cardinality key generators (e.g., per-URL limits)
Misconfigured reverse proxies that pass spoofed IPs through
Deliberate attacks that cycle through millions of fake identifiers
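The eviction behavior described above (a fixed cap with least-recently-used eviction) can be illustrated with a `Map`, which preserves insertion order. `LruKeyCapSketch` is a hypothetical name; this is a sketch of the general technique, not MemoryStore's implementation, and it treats a cap of 0 as unlimited to match the convention used by this section.

```typescript
// Illustrative only: a key-capped map with least-recently-used eviction.
class LruKeyCapSketch<V> {
  private readonly entries = new Map<string, V>();
  private readonly maxKeys: number;
  totalEvictions = 0;

  constructor(maxKeys: number) {
    this.maxKeys = maxKeys; // 0 means "no cap" in this sketch
  }

  get(key: string): V | undefined {
    const value = this.entries.get(key);
    if (value === undefined) return undefined;
    // Re-insert so the key moves to the "most recently used" end.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key: string, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, value);
    if (this.maxKeys > 0 && this.entries.size > this.maxKeys) {
      // The first key in iteration order is the least recently used one.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) {
        this.entries.delete(oldest);
        this.totalEvictions++;
      }
    }
  }

  has(key: string): boolean {
    return this.entries.has(key);
  }
}

const capped = new LruKeyCapSketch<number>(2);
capped.set('a', 1);
capped.set('b', 1);
capped.get('a');    // touch 'a', so 'b' becomes the eviction candidate
capped.set('c', 1); // over the cap: 'b' is evicted
```

The practical consequence is that an evicted key starts again from a fresh counter, so a cap that is too low can let a high-cardinality attacker reset limits for legitimate keys; this is why eviction counts are worth monitoring.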

Tune via maxKeys:

import { MemoryStore, RateLimitStrategy } from 'ratelimit-flex';

const store = new MemoryStore({
  strategy: RateLimitStrategy.SLIDING_WINDOW,
  windowMs: 60_000,
  maxRequests: 100,
  maxKeys: 50_000,  // tighter cap for memory-constrained environments
  onEvict: (key, reason) => {
    // Optional: track eviction rate as a health signal
    metrics.increment('ratelimit.evictions', { reason });
  },
});

To disable the cap (NOT recommended in production):

new MemoryStore({ /* ... */, maxKeys: 0 });

Monitoring eviction pressure

If totalEvictions (from MemoryStore.getMetrics()) grows rapidly, your maxKeys is too low or your keyGenerator is producing high-cardinality keys that should be normalized.

When metrics are enabled (metrics.enabled and the built-in pipeline), check Prometheus text or your registry for:

ratelimit_store_active_keys{store="memory"} — current distinct keys
ratelimit_store_total_evictions{store="memory"} — cumulative LRU evictions (lifetime for that store instance)
ratelimit_store_max_keys{store="memory"} — configured cap (0 means unlimited)

The same values appear on each interval in MetricsSnapshot.store when the engine store is (or unwraps to) a MemoryStore.

`keyGenerator` and storage keys

Rate limit state (InMemoryShield block maps, KeyManager bookkeeping, Redis keys, etc.) still grows with distinct storage keys. A keyGenerator that returns a new high-cardinality value per request (full URL including unbounded query strings, raw JWTs, unbounded device fingerprints) lets attackers inflate memory or Redis usage—even below the maxKeys cap.

Mitigations: Prefer stable, low-cardinality identifiers (user id, tenant id, API key id). Normalize or hash untrusted inputs before using them as keys. The library does not cap key string length—enforce a maximum or digest in your keyGenerator if inputs are user-controlled. Use InMemoryShieldOptions.maxBlockedKeys and related limits where applicable.
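One way to apply the length advice is to bound every key before it reaches the store. The helper names below (`fnv1a32Hex`, `normalizeKeySketch`) are hypothetical and not part of ratelimit-flex. The digest is 32-bit FNV-1a computed over UTF-16 code units, chosen only because it needs no imports; it is not collision-resistant, so for attacker-controlled input a cryptographic hash such as SHA-256 from your runtime's crypto module is the safer choice.

```typescript
// Illustrative only: 32-bit FNV-1a over UTF-16 code units, returned as 8 hex chars.
function fnv1a32Hex(input: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return ('00000000' + (hash >>> 0).toString(16)).slice(-8);
}

// Bound the key length: short keys pass through, long keys keep a readable
// prefix and get a fixed-size digest of the full value appended.
function normalizeKeySketch(raw: string, maxLength = 64): string {
  if (maxLength < 9) throw new RangeError('maxLength must be at least 9');
  const cleaned = raw.trim();
  if (cleaned.length <= maxLength) return cleaned;
  const prefix = cleaned.slice(0, maxLength - 9);
  return `${prefix}#${fnv1a32Hex(cleaned)}`;
}

const shortKey = normalizeKeySketch('  tenant-42  ');
const longKey = normalizeKeySketch('x'.repeat(500));
```

A keyGenerator could wrap its return value in such a helper so that no single request can create an arbitrarily long storage key, while stable identifiers stay readable in logs and Redis.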

Redis namespace (`keyPrefix`)

RedisStore (and RedisBlockStore) prefix all logical keys. Use a different keyPrefix (and/or Redis DB index) per application or tenant when multiple services share one Redis so counters and blocks do not collide. Document the convention for your org.

Lua scripts (`RedisStore`)

All Lua in RedisStore is static source in the package. Quota and key data are passed only as KEYS / ARGV to EVAL—never build Lua by concatenating user input into the script body.

Atomicity & Distributed Systems

Redis operations are atomic

All RedisStore operations use Lua scripts for atomicity. Each rate limit check executes as a single atomic operation on the Redis server—no race conditions, no requests slipping through under concurrent load from multiple processes or nodes.

What this means:

Sliding window: ZREMRANGEBYSCORE (prune expired) + ZADD (add entries) + ZCARD (count) + PEXPIRE — all in one EVAL
Fixed window: INCRBY + conditional PEXPIRE + PTTL — all in one EVAL
Token bucket: HGET (read state) + refill calculation + token deduction + HSET (write) + PEXPIRE — all in one EVAL
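To show why the sliding-window steps need to run as one unit, here is the same sequence on a plain array instead of a Redis sorted set. It is a sketch that follows the order listed above (prune, add, count); the function name `slidingWindowStep` is hypothetical, the expiry step is omitted, and the library's actual Lua script may differ in details such as whether rejected requests are recorded.

```typescript
// Illustrative only: the sliding-window steps on an in-memory, per-key log.
type SlidingResult = { count: number; blocked: boolean };

function slidingWindowStep(
  timestamps: number[], // request times for one key, oldest first (mutated in place)
  nowMs: number,
  windowMs: number,
  maxRequests: number,
): SlidingResult {
  // 1. Prune entries that fell out of the window (the ZREMRANGEBYSCORE step).
  const cutoff = nowMs - windowMs;
  for (;;) {
    const oldest = timestamps[0];
    if (oldest === undefined || oldest > cutoff) break;
    timestamps.shift();
  }

  // 2. Record this request (the ZADD step).
  timestamps.push(nowMs);

  // 3. Count what remains in the window (the ZCARD step).
  const count = timestamps.length;
  return { count, blocked: count > maxRequests };
}

const log: number[] = [];
const r1 = slidingWindowStep(log, 0, 1_000, 2);
const r2 = slidingWindowStep(log, 0, 1_000, 2);
const r3 = slidingWindowStep(log, 0, 1_000, 2); // third request in the window is blocked
```

In a single JavaScript event loop these three steps cannot interleave with another request. Issued as separate Redis commands from several processes they could, and two clients might both read a count below the limit; running them inside one script removes that gap.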

No interleaving: Other Redis clients cannot execute commands between the steps of a rate limit operation. The entire check-and-increment logic runs atomically.

Why Lua? Redis EVAL executes Lua scripts as atomic blocks. While a script runs, Redis does not process other commands from other clients. This guarantees that:

Concurrent requests from multiple app instances cannot race
Distributed systems get consistent, accurate rate limiting
Multi-step operations (read → calculate → write) are safe

Script caching: Most Redis clients (including ioredis and node-redis) automatically cache Lua scripts server-side after the first execution using EVALSHA. Subsequent calls reuse the cached script, reducing network overhead. The library passes the full script source on every call; the client handles optimization transparently.

MemoryStore & ClusterStore: In-process stores use JavaScript synchronous operations (no atomicity concerns within a single event loop). ClusterStore uses IPC message passing with acknowledgments to coordinate across Node.js cluster workers.

Distributed deployment considerations

When running multiple app instances with RedisStore:

Shared state: All instances see the same counters in Redis
Consistent limits: A user hitting 100 req/min is enforced globally, not per instance
No coordination needed: Each instance independently calls Redis; Lua atomicity handles races
Network latency: Redis round-trip adds ~1-5ms per request (use InMemoryShield to cache blocked keys and eliminate Redis calls for hot attackers)

Cluster vs Redis:

ClusterStore: Coordinates rate limits across Node.js cluster workers on a single machine (IPC, no network)
RedisStore: Coordinates across multiple machines/containers/regions (network, shared Redis)

For multi-instance deployments (Kubernetes, serverless, multiple VMs), use RedisStore. For single-machine concurrency (one server, multiple CPU cores), use ClusterStore.

Limiter composition

Combine multiple rate limiters with the compose builder. Every composition mode implements RateLimitStore, so composed stores plug directly into expressRateLimiter / fastifyRateLimiter.

Composition Modes

| Mode | Behavior | Use case |
|------|----------|----------|
| all | Block if any layer blocks | Multi-window limiting (10/sec AND 100/min) |
| overflow | Try primary first; if blocked, try burst pool | Steady rate + burst allowance |
| first-available | Try layers in order; first that allows wins | Failover chain (Redis → memory) |
| race | Fire all layers in parallel; fastest wins | Multi-region latency optimization |
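The decision rules for the first two modes in the table can be written down in a few lines. This sketch uses hypothetical names (`LayerCheck`, `allSketch`, `overflowSketch`, `counterLayer`) and synchronous layers; it only models the allow/block outcome, and it assumes that `all` consults every layer on each request, which the table does not specify.

```typescript
// Illustrative only: allow/block decisions for two composition modes.
type LayerCheck = () => boolean; // returns true when this layer allows the request

// "all": the request passes only if every layer allows it.
// Assumption: every layer is consulted (no short-circuit), so each one counts the request.
function allSketch(layers: LayerCheck[]): boolean {
  return layers.map((check) => check()).every((ok) => ok);
}

// "overflow": use the primary layer; fall back to the burst pool only when it blocks.
function overflowSketch(primary: LayerCheck, burst: LayerCheck): boolean {
  return primary() || burst();
}

// Stand-in layer: allows the first `max` calls, then blocks.
function counterLayer(max: number): LayerCheck {
  let used = 0;
  return () => ++used <= max;
}

const perSecond = counterLayer(2);
const perMinute = counterLayer(3);
const allResults = [1, 2, 3].map(() => allSketch([perSecond, perMinute]));

const steady = counterLayer(1);
const burstPool = counterLayer(1);
const overflowResults = [1, 2, 3].map(() => overflowSketch(steady, burstPool));
```

With `all`, the tightest layer decides (the third request fails the 2-request layer even though the 3-request layer still has room). With `overflow`, the burst pool is only drawn on once the primary layer is exhausted.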

Quick Examples

Multi-window (10/sec AND 100/min):

import { compose, expressRateLimiter } from 'ratelimit-flex';

const store = compose.windows(
  { windowMs: 1_000, maxRequests: 10 },
  { windowMs: 60_000, maxRequests: 100 },
);

app.use(expressRateLimiter({ store }));

Burst allowance (steady + burst):

const store = compose.withBurst({
  steady: { windowMs: 1_000, maxRequests: 5 },
  burst:  { windowMs: 60_000, maxRequests: 20 },
});

app.use(expressRateLimiter({ store }));

Failover chain (Redis → memory):

const store = compose.firstAvailable(
  compose.layer('redis', redisStore),
  compose.layer('memory', memoryStore),
);

app.use(expressRateLimiter({ store }));

Full documentation: See [docs/COMPOSITION.md][doc-composition] for:

Nested composition patterns
Per-layer observability
Redis composition presets
Migration from limits array

Request queuing

Source of truth: Full FIFO semantics, head-of-line blocking, and multi-key patterns are documented in JSDoc on src/queue/RateLimiterQueue.ts (RateLimiterQueueOptions, RateLimiterQueue). That file is the canonical explanation; this section summarizes it for README readers.

Typical use case: Outbound API throttling (one queue per external API, single key for all requests).

Head-of-line blocking (by design): The queue is one FIFO array. If you share that queue across different keys, a waiting request for key A sits in front of a request for key B — even when B still has rate-limit capacity — because release order follows enqueue order, not per-key fairness.

flowchart LR
  A["enqueue: key A — over limit, waits first"] --> B["enqueue: key B — has quota but queued after A"]
  B --> C["B cannot skip ahead — one FIFO per RateLimiterQueue"]

Over-limit requests wait in a FIFO queue until the backing store.increment allows them (not the full RateLimitEngine path — see Engine vs queued parity and Failure modes).

For many distinct keys hitting the same Express/Fastify scope, treat KeyedRateLimiterQueue (or separate queues per key) as the default pattern — a single RateLimiterQueue is FIFO across unrelated keys (head‑of‑line blocking).
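The difference between one shared FIFO and per-key queues can be sketched as follows. This is a simplified model, not the library's RateLimiterQueue: `hasQuota` is a hypothetical stand-in for "the backing store would allow this key right now", and the functions only decide which waiting requests may be released.

```typescript
type Waiting = { id: string; key: string };

// Single FIFO: release from the head only, and stop at the first request that cannot proceed.
function drainFifo(queue: Waiting[], hasQuota: (key: string) => boolean): string[] {
  const released: string[] = [];
  while (true) {
    const head = queue[0];
    if (head === undefined || !hasQuota(head.key)) break;
    queue.shift();
    released.push(head.id);
  }
  return released;
}

// Keyed variant: one FIFO per key, so a stalled key only delays itself.
function drainKeyed(queue: Waiting[], hasQuota: (key: string) => boolean): string[] {
  const perKey = new Map<string, Waiting[]>();
  for (const w of queue) {
    const list = perKey.get(w.key) ?? [];
    list.push(w);
    perKey.set(w.key, list);
  }
  const released: string[] = [];
  perKey.forEach((list) => {
    released.push(...drainFifo(list, hasQuota));
  });
  return released;
}

// Key A is over its limit; key B still has capacity.
const quota = (key: string): boolean => key === "B";
const shared = drainFifo([{ id: "a1", key: "A" }, { id: "b1", key: "B" }], quota);
const keyed = drainKeyed([{ id: "a1", key: "A" }, { id: "b1", key: "B" }], quota);
```

With the shared queue nothing is released because `a1` sits at the head; with per-key queues `b1` goes through while `a1` keeps waiting.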

Quick Start

Express:

import { expressQueuedRateLimiter } from 'ratelimit-flex';

app.use('/api', expressQueuedRateLimiter({
  maxRequests: 5,
  windowMs: 10_000,
  maxQueueSize: 50,
  maxQueueTimeMs: 30_000,
}));

Fastify:

import { fastifyQueuedRateLimiter } from 'ratelimit-flex/fastify';

await app.register(fastifyQueuedRateLimiter, {
  maxRequests: 5,
  windowMs: 10_000,
  maxQueueSize: 50,
  maxQueueTimeMs: 30_000,
});

Graceful shutdown

Rate limiter queues expose a shutdown({ drainTimeoutMs, reason }) method that gracefully rejects pending requests during process termination.

Express:

const limiter = expressQueuedRateLimiter({ maxRequests: 10, windowMs: 60_000 });
app.use(limiter);

process.on('SIGTERM', async () => {
  const result = await limiter.queue.shutdown({ drainTimeoutMs: 10_000 });
  console.log(`Queue drained: ${result.drained}, rejected: ${result.rejected}`);
  await server.close();
});

Fastify:

Fastify's onClose hook automatically calls queue.shutdown() when the server closes. No manual wiring needed.

await app.register(fastifyQueuedRateLimiter, {
  maxRequests: 10,
  windowMs: 60_000,
});
// queue is drained automatically on app.close()

Hono:

import { Hono } from 'hono';
import { queuedRateLimiter } from 'ratelimit-flex/hono';

const app = new Hono();
const limiter = queuedRateLimiter({ maxRequests: 10, windowMs: 60_000 });
app.use('*', limiter);

// On Cloudflare Workers, there's no process shutdown — the queue drains
// when the worker instance is evicted. On Node/Bun, call shutdown manually:
process.on('SIGTERM', async () => {
  await limiter.queue.shutdown({ drainTimeoutMs: 5_000 });
});

Requests rejected during shutdown receive a 503 Service Unavailable response with Retry-After: 10. Clients should retry. The error code on the thrown ShutdownError is E_RATELIMIT_SHUTDOWN.

Outbound API throttling:

import { createRateLimiterQueue } from 'ratelimit-flex';

const githubQueue = createRateLimiterQueue({
  maxRequests: 30,
  windowMs: 60_000,
  maxQueueSize: 200,
});

await githubQueue.removeTokens('github-api');
const response = await fetch('https://api.github.com/repos/...');

Multi-key fairness: Prefer KeyedRateLimiterQueue (or one RateLimiterQueue per logical key); see Engine vs queued parity and [Multi-key patterns][doc-queuing].

Full documentation: See [docs/QUEUING.md][doc-queuing] for:

Multi-key patterns
Graceful shutdown
Store ownership
Advanced patterns (per-tenant, priority queuing)

Backing store implementation (waiting / slot accounting):

Redis: ZSET with ZREMRANGEBYSCORE + ZADD + ZCARD in atomic Lua
Memory: Sorted array of timestamps per key
Boundary behavior: Smooth - no 2x burst at window edges

Choosing a strategy

Examples use expressRateLimiter; fastifyRateLimiter, rateLimiter (Hono), and presets accept the same strategy fields unless the integration README calls out an exception.

Sliding window (default)

Smooth limiting without the boundary spikes typical of naive fixed windows.

Algorithm implementation (rate-limit store — not queue):

Redis: ZSET-based Lua prune + score + cardinality
Memory: Per-key sliding timestamp list within windowMs

import { expressRateLimiter, RateLimitStrategy } from 'ratelimit-flex';

app.use(
  expressRateLimiter({
    strategy: RateLimitStrategy.SLIDING_WINDOW, // default
    windowMs: 60_000,
    maxRequests: 100,
  }),
);
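For intuition, the in-memory variant described above (a per-key timestamp list pruned to `windowMs`) can be sketched in a few lines. This is an illustrative model, not MemoryStore's code: the function name is hypothetical, `now` is injected for determinism, and the choice not to record blocked hits is an assumption.

```typescript
// Sliding-window log: one timestamp per accepted hit, pruned to the last windowMs.
function createSlidingWindow(windowMs: number, maxRequests: number) {
  const hits = new Map<string, number[]>();
  return function increment(key: string, now: number): { allowed: boolean; remaining: number } {
    const cutoff = now - windowMs;
    // Drop timestamps that have slid out of the window.
    const recent = (hits.get(key) ?? []).filter((t) => t > cutoff);
    const allowed = recent.length < maxRequests;
    if (allowed) recent.push(now); // assumption: blocked hits are not recorded
    hits.set(key, recent);
    return { allowed, remaining: maxRequests - recent.length };
  };
}

const increment = createSlidingWindow(60_000, 2);
const first = increment("ip:1", 0);
const second = increment("ip:1", 30_000);
const third = increment("ip:1", 59_000);  // blocked: two hits within the last 60 s
const fourth = increment("ip:1", 61_000); // allowed: the hit at t=0 has expired
```

Because the window moves with each request, capacity returns one hit at a time rather than all at once at a boundary.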

Token bucket (for bursty traffic)

Refills tokens on a schedule; clients can burst up to bucketSize. Best for spiky traffic (mobile apps, retries, webhooks).

Implementation:

Redis: HASH with atomic refill calculation + token deduction in Lua
Memory: Stores { tokens, lastRefill } per key
Burst control: Allows bursts when bucket is full

import { expressRateLimiter, RateLimitStrategy } from 'ratelimit-flex';

app.use(
  expressRateLimiter({
    strategy: RateLimitStrategy.TOKEN_BUCKET,
    tokensPerInterval: 20,  // Add 20 tokens per minute
    interval: 60_000,       // Every 60 seconds
    bucketSize: 60,         // Max 60 tokens (allows 3x burst)
  }),
);
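The `{ tokens, lastRefill }` bookkeeping can be sketched as below, using the same numbers as the example above. This is a simplified model rather than the library's implementation: it assumes continuous refill proportional to elapsed time (a store could equally refill in whole intervals), a bucket that starts full, and an injected `now`.

```typescript
interface BucketState {
  tokens: number;
  lastRefill: number;
}

function createTokenBucket(tokensPerInterval: number, interval: number, bucketSize: number) {
  const buckets = new Map<string, BucketState>();
  return function take(key: string, now: number, cost = 1): boolean {
    // Assumption: a key seen for the first time starts with a full bucket.
    const state = buckets.get(key) ?? { tokens: bucketSize, lastRefill: now };
    // Credit tokens in proportion to elapsed time, capped at bucketSize.
    const elapsed = now - state.lastRefill;
    const refilled = Math.min(bucketSize, state.tokens + (elapsed / interval) * tokensPerInterval);
    const allowed = refilled >= cost;
    buckets.set(key, { tokens: allowed ? refilled - cost : refilled, lastRefill: now });
    return allowed;
  };
}

const take = createTokenBucket(20, 60_000, 60);

// Burst: 61 attempts at the same instant against a full 60-token bucket.
let burst = 0;
for (let i = 0; i < 61; i++) {
  if (take("client", 0)) burst++;
}

// One interval later, 20 tokens have been credited to the empty bucket.
const afterOneInterval = take("client", 60_000);
```

The burst loop accepts exactly `bucketSize` requests, after which throughput settles to the refill rate.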

Fixed window (simplest)

One counter per fixed time slice. Simplest and lowest memory; acceptable when occasional boundary spikes are OK.

Implementation:

Redis: INCRBY + PEXPIRE in atomic Lua script
Memory: Single counter per key
Warning: Users can burst up to 2x the limit at window boundaries (with maxRequests: 100, 100 requests at 11:59:59 and another 100 at 12:00:00)

import { expressRateLimiter, RateLimitStrategy } from 'ratelimit-flex';

app.use(
  expressRateLimiter({
    strategy: RateLimitStrategy.FIXED_WINDOW,
    windowMs: 60_000,
    maxRequests: 100,
  }),
);
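The boundary behavior is easy to reproduce with a small model of a fixed-window counter. This sketch is not the library's code: the function name is hypothetical, and aligning windows to multiples of `windowMs` is an assumption (a store could also start the window at the first hit).

```typescript
function createFixedWindow(windowMs: number, maxRequests: number) {
  const counters = new Map<string, { windowStart: number; count: number }>();
  return function hit(key: string, now: number): boolean {
    // Assumption: windows are aligned to multiples of windowMs.
    const windowStart = Math.floor(now / windowMs) * windowMs;
    const current = counters.get(key);
    // A counter from an older window is treated as zero.
    const count = current !== undefined && current.windowStart === windowStart ? current.count : 0;
    if (count >= maxRequests) return false;
    counters.set(key, { windowStart, count: count + 1 });
    return true;
  };
}

const hit = createFixedWindow(60_000, 100);
let accepted = 0;
for (let i = 0; i < 150; i++) {
  if (hit("ip:1", 59_999)) accepted++; // last millisecond of the first window
}
for (let i = 0; i < 150; i++) {
  if (hit("ip:1", 60_000)) accepted++; // first millisecond of the next window
}
```

Across those two adjacent milliseconds the counter accepts 200 requests against a limit of 100, which is the spike the sliding window avoids.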

Performance Benchmarks

Benchmarks measured on an Apple M1 Pro with Node.js v20, using an isolated test harness. Your results may vary based on hardware, network latency (Redis), and load patterns.

Throughput (requests/second)

| Store | Strategy | Throughput | Notes |
|-------|----------|------------|-------|
| MemoryStore | Sliding Window | ~450,000 | Single process, in-memory only |
| MemoryStore | Fixed Window | ~750,000 | Lowest overhead |
| MemoryStore | Token Bucket | ~550,000 | Refill calculation overhead |
| RedisStore | Sliding Window | ~35,000 | Network-bound, local Redis |
| RedisStore | Fixed Window | ~45,000 | Simpler Lua script |
| InMemoryShield (hit) | n/a | ~1,800,000 | Blocked keys cached in memory |
| InMemoryShield (miss) | n/a | ~35,000 | Falls through to Redis |

Latency Overhead (p50 / p95 / p99)

| Store | p50 | p95 | p99 | Notes |
|-------|-----|-----|-----|-------|
| MemoryStore | 0.05ms | 0.12ms | 0.25ms | Pure JavaScript, no I/O |
| RedisStore (local) | 1.8ms | 4.2ms | 8.5ms | Includes network + Lua execution |
| RedisStore (remote) | 5-15ms | 15-30ms | 30-50ms | Depends on network latency |
| InMemoryShield (hit) | 0.01ms | 0.03ms | 0.06ms | Hash map lookup only |
| InMemoryShield (miss) | 1.8ms | 4.2ms | 8.5ms | Same as RedisStore |

Memory Usage (per 10k keys)

| Store | Strategy | Memory | Notes |
|-------|----------|--------|-------|
| MemoryStore | Sliding Window | ~2.5 MB | Stores timestamps per hit |
| MemoryStore | Fixed Window | ~0.8 MB | Single counter per key |
| MemoryStore | Token Bucket | ~1.2 MB | Stores tokens + lastRefill |
| InMemoryShield | n/a | ~1.5 MB | Blocked keys + expiry times |

Scalability

Single process (MemoryStore):

Linear scaling with CPU cores (use Node.js cluster or ClusterStore)
No network overhead
Memory grows with unique keys

Multi-process (RedisStore):

Horizontal scaling across machines
Network latency adds ~1-5ms per request (local Redis)
Shared state across all instances

InMemoryShield + Redis:

Best of both: shared state + local caching for hot keys
Roughly 50x higher throughput for blocked keys under attack (~1,800,000 vs ~35,000 checks/s in the throughput table above)
99%+ reduction in Redis calls for repeat offenders
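The shielding idea can be sketched as a small cache of blocked keys in front of a slower backend. This is a simplified model and not InMemoryShield itself: `createShield`, the `Decision` shape, and the fake backend are all hypothetical, and `now` is injected for determinism.

```typescript
type Decision = { allowed: boolean; retryAfterMs: number };

// Wrap a backend check (for example a Redis round trip) with a local "blocked until" cache.
function createShield(backend: (key: string, now: number) => Decision) {
  const blockedUntil = new Map<string, number>();
  let backendCalls = 0;
  return {
    check(key: string, now: number): Decision {
      const until = blockedUntil.get(key);
      if (until !== undefined && now < until) {
        // Still blocked: answer locally without touching the backend.
        return { allowed: false, retryAfterMs: until - now };
      }
      blockedUntil.delete(key);
      backendCalls++;
      const decision = backend(key, now);
      if (!decision.allowed) blockedUntil.set(key, now + decision.retryAfterMs);
      return decision;
    },
    get backendCalls(): number {
      return backendCalls;
    },
  };
}

// Fake backend that always blocks for 1 s, standing in for the shared store.
const shield = createShield(() => ({ allowed: false, retryAfterMs: 1_000 }));
for (let t = 0; t < 1_000; t += 10) {
  shield.check("attacker", t); // 100 checks inside the block period
}
```

Of the 100 checks, only the first reaches the backend; the rest are served from process memory until the block expires.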

Benchmark Methodology

The published numbers above were gathered with isolated harnesses at a fixed point in time and are hardware-dependent. npm run benchmark in this repo runs only MemoryStore increment micro-benchmarks (round-robin keys, configurable BENCHMARK_OPS; no Redis). For RedisStore, reproduce with your Redis topology and tooling (see examples/redis/README.md).

Run the MemoryStore script:

git clone https://github.com/ashwinpaulallen/ratelimit-flex.git
cd ratelimit-flex
npm install
npm run benchmark

Optional: BENCHMARK_OPS=500000 npm run benchmark (--expose-gc if you tweak the script for heap deltas).

Note: These are micro-benchmarks. Real-world performance depends on your application's request patterns, key cardinality, network topology, and Redis configuration.

Weighted / cost-based rate limiting

By default each request consumes one quota unit. For endpoints that should count more (file uploads, heavy database work, high GraphQL complexity), use a cost greater than 1.

Middleware / engine — set incrementCost on the rate limiter options (number or function of the request):

import { expressRateLimiter } from 'ratelimit-flex';

app.use(
  expressRateLimiter({
    maxRequests: 100,
    windowMs: 60_000,
    incrementCost: (req) =>
      String((req as import('express').Request).path ?? '').startsWith('/upload') ? 10 : 1,
  }),
);

Custom pipelines — call the store directly with increment / decrement options:

await store.increment(key, { cost: 10 });
// … later, undo the same weight (e.g. custom skip logic):
await store.decrement(key, { cost: 10 });

Dynamic caps plus cost still work together: increment accepts { maxRequests?, cost? } on window strategies.

Helpers resolveIncrementOpts(options, req) and matchingDecrementOptions(incOpts) are exported if you build your own middleware and need the same increment/decrement pairing as the built-in engine.

Redis implementation note: for sliding windows with cost > 1, each ZSET member is a distinct random value so Redis never silently merges two hits into one.
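Why distinct members matter can be shown with a Map standing in for a ZSET, since re-adding an existing member only updates its score. The helpers below are hypothetical (`resolveCost` is not the exported resolveIncrementOpts), and a sequence counter replaces the random member suffix so the example is deterministic.

```typescript
type Req = { path: string };
type Cost = number | ((req: Req) => number);

// Hypothetical helper: normalise a cost option to a positive integer.
function resolveCost(cost: Cost | undefined, req: Req): number {
  const raw = typeof cost === "function" ? cost(req) : (cost ?? 1);
  return Math.max(1, Math.floor(raw));
}

// A Map keyed by member mimics a ZSET: setting an existing member only overwrites its score.
function addHits(
  zset: Map<string, number>,
  now: number,
  units: number,
  memberFor: (i: number) => string,
): number {
  for (let i = 0; i < units; i++) zset.set(memberFor(i), now);
  return zset.size; // plays the role of ZCARD
}

const upload: Req = { path: "/upload/avatar" };
const cost = resolveCost((req) => (req.path.startsWith("/upload") ? 10 : 1), upload);

// Same member for every unit: ten additions collapse into a single entry.
const naiveSize = addHits(new Map<string, number>(), 1_000, cost, () => "1000");

// Unique member per unit: all ten units are counted.
let seq = 0;
const distinctSize = addHits(new Map<string, number>(), 1_000, cost, () => `1000:${seq++}`);
```

With a shared member the cardinality would undercount a cost-10 request as one hit, which is the silent merge the note above refers to.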

Store backends

Choose a backend from latency, consistency, and operational constraints. Deeper setup for SQL, MongoDB, and DynamoDB lives in docs/stores/postgres.md, docs/stores/mongo.md, and docs/stores/dynamo.md.

| Backend | Atomic sliding window | TTL cleanup | Latency | Best for |
|---------|----------------------|-------------|---------|----------|
| MemoryStore | exact (in-process) | in-process | <0.01ms | Single process |
| RedisStore | exact (Lua ZSET) | Redis EXPIRE | 1–5ms | Multi-instance, high TPS |
| ClusterStore | exact (IPC) | in-process | <0.1ms | Node cluster, one host |
| PgStore | exact (JSONB array) | background sweep | 2–10ms | Postgres shops without Redis |
| MongoStore | exact (aggregation pipeline) | TTL index | 2–10ms | MongoDB shops |
| DynamoStore | approximate (weighted sub-window) | DynamoDB TTL | 5–20ms | AWS-native deployments |

Deployment guide

When to use MemoryStore

Use MemoryStore when:

One Node process serves all traffic (no horizontal scale)
Local development and prototyping
Automated tests
Small deployments with a single instance

Counters live only in that process. No Redis required.

import { expressRateLimiter, MemoryStore, RateLimitStrategy } from 'ratelimit-flex';

const store = new MemoryStore({
  strategy: RateLimitStrategy.SLIDING_WINDOW,
  windowMs: 60_000,
  maxRequests: 100,
});

app.use(expressRateLimiter({ store, windowMs: 60_000, maxRequests: 100 }));

If you omit store, the middleware creates a MemoryStore from windowMs / maxRequests (or token-bucket fields).

When to use ClusterStore

Use ClusterStore when:

Node.js native cluster module (not PM2)
No Redis available or desired
Single server with multiple CPU cores

// primary.ts (ESM — top-level await)
import cluster from 'node:cluster';
import { ClusterStorePrimary } from 'ratelimit-flex';

if (cluster.isPrimary) {
  ClusterStorePrimary.init();
  for (let i = 0; i < 4; i++) cluster.fork();
} else {
  await import('./app.js');
}

// app.ts (worker)
import express from 'express';
import { expressRateLimiter, clusterPreset } from 'ratelimit-flex';

const app = express();
app.use(expressRateLimiter(clusterPreset({ maxRequests: 100, windowMs: 60_000 })));

IPC protocol version: Worker init and primary init_ack carry protocolVersion (constants CLUSTER_IPC_PROTOCOL_VERSION and MIN_CLUSTER_IPC_PROTOCOL_VERSION in src/cluster/protocol.ts). During rolling deploys, if a worker’s version is newer than the primary, the primary responds with init_nack so the process fails fast instead of corrupting counters. Legacy peers that omit protocolVersion are treated as version 1.
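The handshake rule described above can be sketched as a small decision function. The message shapes, the `reason` field, and the constant values are hypothetical (the real constants live in src/cluster/protocol.ts), and rejecting versions below the minimum is an assumption inferred from the constant's name.

```typescript
// Hypothetical values standing in for CLUSTER_IPC_PROTOCOL_VERSION and MIN_CLUSTER_IPC_PROTOCOL_VERSION.
const PRIMARY_PROTOCOL_VERSION = 2;
const MIN_SUPPORTED_PROTOCOL_VERSION = 1;

type InitMessage = { type: "init"; protocolVersion?: number };
type InitReply =
  | { type: "init_ack"; protocolVersion: number }
  | { type: "init_nack"; reason: string };

function handleInit(msg: InitMessage): InitReply {
  // Legacy peers that omit protocolVersion are treated as version 1.
  const workerVersion = msg.protocolVersion ?? 1;
  if (workerVersion > PRIMARY_PROTOCOL_VERSION) {
    // A newer worker talking to an older primary fails fast instead of corrupting counters.
    return {
      type: "init_nack",
      reason: `worker protocol ${workerVersion} is newer than primary ${PRIMARY_PROTOCOL_VERSION}`,
    };
  }
  if (workerVersion < MIN_SUPPORTED_PROTOCOL_VERSION) {
    // Assumption: versions below the supported minimum are rejected as well.
    return {
      type: "init_nack",
      reason: `worker protocol ${workerVersion} is older than minimum ${MIN_SUPPORTED_PROTOCOL_VERSION}`,
    };
  }
  return { type: "init_ack", protocolVersion: PRIMARY_PROTOCOL_VERSION };
}

const legacyReply = handleInit({ type: "init" });
const newerReply = handleInit({ type: "init", protocolVersion: 3 });
```

In a rolling deploy this means workers should be upgraded after the primary, or restarted together with it.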

When to use RedisStore

Use RedisStore when:

Multiple Node processes (e.g. PM2 cluster)
Multiple servers behind a load balancer
Kubernetes, Docker Swarm, or similar
Microservices where the same client can hit different instances
You need one global limit across replicas

import { expressRateLimiter, RedisStore, RateLimitStrategy } from 'ratelimit-flex';

const store = new RedisStore({
  strategy: RateLimitStrategy.SLIDING_WINDOW,
  windowMs: 60_000,
  maxRequests: 100,
  url: process.env.REDIS_URL!,
});

app.use(expressRateLimiter({ store, strategy: RateLimitStrategy.SLIDING_WINDOW }));

Point every instance at the same Redis URL or shared client. Use a distinct key prefix (keyPrefix) per app or per limiter if several services share one Redis.

Clients and adapters: The default url path uses the optional peer dependency ioredis. For @redis/client (node-redis), use adaptNodeRedisClient; for ioredis, use adaptIoRedisClient (see RedisLikeClient in the API reference). Bun and Upstash need thin wrappers (Lua EVAL required); copy-paste starters live in examples/redis/README.md (they are not published packages, so maintain them locally).

Lua EVAL, EVALSHA, and connections: RedisStore always invokes eval(fullScript, …) on your client. It does not embed EVALSHA. Clients often optimize repeated EVAL into EVALSHA after Redis caches the script. Reuse one long-lived client per process (or warm serverless instance) where possible—per-request connections add latency and can reduce script-cache hits on the Redis side.

Multi-window: The limits: [{ windowMs, max }, …] option (see Multi-window limits (limits)) defaults to one MemoryStore per window. Pass a sliding/fixed-window RedisStore as store together with limits to reuse connection settings (and optional resilience, cloned per slot) and get one Redis-backed slot per window with distinct key prefixes. A MemoryStore with limits is accepted and ignored (same as omitting store). Alternatively use compose.windows(redisTemplate, …), multiWindowPreset, or groupedWindowStores.

When to use PgStore

Use PgStore when:

You already run PostgreSQL and prefer not to add Redis
You want exact sliding, fixed-window, and token-bucket semantics with atomic INSERT … ON CONFLICT / JSONB sliding arrays
A small amount of background sweep work for expired rows is acceptable

import { expressRateLimiter, postgresPreset } from 'ratelimit-flex';
import { pgStoreSchema } from 'ratelimit-flex/postgres';
import { Pool } from 'pg';

const pool = new Pool({ connectionString: process.env.DATABASE_URL! });
await pool.query(pgStoreSchema);

app.use(expressRateLimiter(postgresPreset({ pool })));

Run pgStoreSchema (or your migration) once at deploy. Tune autoSweepIntervalMs / worker estimates if you shard many app processes—see docs/stores/postgres.md.

When to use MongoStore

Use MongoStore when:

Your system of record is MongoDB (Atlas or self-hosted)
You are on MongoDB 4.2+ (aggregation pipelines in findOneAndUpdate)
You can maintain a TTL index on the reset field for passive expiry

import { expressRateLimiter, mongoPreset } from 'ratelimit-flex';
import { MongoClient } from 'mongodb';

const client = new MongoClient(process.env.MONGODB_URI!);
await client.connect();

app.use(expressRateLimiter(mongoPreset({ client, dbName: 'myapp' })));

See docs/stores/mongo.md for index requirements and failure modes.

When to use DynamoStore

Use DynamoStore when:

You deploy on AWS and want a managed, serverless-friendly store
Fixed window and token bucket must be exact on DynamoDB
Sliding window can be approximate (weighted sub-windows; typically <2% error, higher near window edges—see docs/stores/dynamo.md)

import { dynamoPreset, expressRateLimiter } from 'ratelimit-flex';
import { DynamoDBDocumentClient } from '@aws-sdk/lib-dynamodb';

const doc = DynamoDBDocumentClient.from(/* DynamoDBClient */);
app.use(expressRateLimiter(dynamoPreset({ client: doc, tableName: 'rate_limits' })));

Create the table and enable TTL on the ttl attribute once (CDK / Terraform / console). dynamoPreset defaults to fixed window so out-of-the-box counting is exact; pass strategy: RateLimitStrategy.SLIDING_WINDOW when you accept the weighted approximation.
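To give a feel for what a weighted approximation trades away, here is the common two-counter estimate for a sliding window. It is an illustration only and may differ from the sub-window scheme DynamoStore actually uses (see docs/stores/dynamo.md); the function name and parameters are hypothetical.

```typescript
// Estimate hits in the sliding window from two fixed-window counters:
// weight the previous window by the fraction of it that still overlaps the sliding window.
function estimateSlidingCount(
  prevCount: number,
  currCount: number,
  now: number,
  windowMs: number,
): number {
  const elapsedInCurrent = now % windowMs;            // time since the current window started
  const overlap = 1 - elapsedInCurrent / windowMs;    // share of the previous window still in view
  return prevCount * overlap + currCount;
}

// 15 s into the current minute: 75% of the previous window still counts.
const estimate = estimateSlidingCount(80, 30, 60_000 + 15_000, 60_000);
// 80 * 0.75 + 30 = 90
```

The estimate assumes hits were spread evenly over the previous window, which is why the error grows when traffic clusters near a window edge.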

Deployment topology

| Setup | Store | What’s shared | What’s per-process |
|-------|--------|----------------|---------------------|
| Single process | MemoryStore | Everything (one process) | N/A |
| Node.js native cluster (same host, forked workers) | ClusterStore + ClusterStorePrimary | Rate limit counters (on primary) | Allowlist, blocklist, penalty |
| PM2 cluster (same host) | RedisStore | Rate limit counters | Allowlist, blocklist, penalty |
| Multiple servers + LB | RedisStore | Rate limit counters | Allowlist, blocklist, penalty |
| Multiple servers + LB | PgStore / MongoStore (shared DB) | Rate limit rows in the database | Allowlist, blocklist, penalty |
| Kubernetes pods | RedisStore | Rate limit counters | Allowlist, blocklist, penalty |
| Kubernetes pods | PgStore / MongoStore | Rate limit rows | Allowlist, blocklist, penalty |
| AWS Lambda / Fargate / multi-AZ | DynamoStore | DynamoDB table & TTL | Allowlist, blocklist, penalty |
| Microservices (one global limit) | RedisStore (same namespace/prefix) | Rate limit counters | Allowlist, blocklist, penalty |
| Microservices (per-service limits) | RedisStore (different prefix/DB) | Per-service counters | Allowlist, blocklist, penalty |

PM2 vs Node cluster: ClusterStore (Node’s native cluster IPC with ClusterStorePrimary on the primary) is not for PM2 cluster mode. PM2 runs independent worker processes and uses its own IPC to the daemon, not a Node cluster primary/worker tree. For PM2, use RedisStore (or another shared store). At startup, ClusterStore detects PM2 (PM2_HOME or pm_id) and throws a clear error if the process is not a Node cluster worker.
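A startup guard of the kind described above could look roughly like this. It is a sketch under assumptions, not ClusterStore's actual check: the function names and error text are hypothetical, and the environment is passed in as a plain object so the example stays testable (in an app you would pass `process.env` and `cluster.isWorker`).

```typescript
type Env = Record<string, string | undefined>;

// PM2 sets PM2_HOME and pm_id in the environment of the processes it manages.
function looksLikePm2(env: Env): boolean {
  return env["PM2_HOME"] !== undefined || env["pm_id"] !== undefined;
}

// Throw early when running under PM2 without being a Node cluster worker.
function assertClusterStoreUsable(env: Env, isNodeClusterWorker: boolean): void {
  if (looksLikePm2(env) && !isNodeClusterWorker) {
    throw new Error(
      "ClusterStore needs Node's native cluster module; under PM2 use RedisStore or another shared store.",
    );
  }
}

const underPm2 = looksLikePm2({ pm_id: "0" });
const plainNode = looksLikePm2({});
```

Failing at startup is preferable to silently running one isolated counter per PM2 worker.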

Sticky sessions: If your load balancer uses sticky sessions, MemoryStore can appear to work, but it is fragile—deploys and restarts reset counters per instance. RedisStore survives restarts and stays consistent across nodes.

Auto-detection and warnings

detectEnvironment() returns flags such as isKubernetes, isDocker, isCluster, isMultiInstance, and a recommended store ('memory' | 'redis'). Use it in your own startup logging or configuration.

import { detectEnvironment } from 'ratelimit-flex';

const env = detectEnvironment();
if (env.recommended === 'redis' && !process.env.REDIS_URL) {
  console.warn('Production-like environment detected; consider Redis for shared limits.');
}

Express and Fastify integrations also call warnIfMemoryStoreInCluster once at startup: if a MemoryStore is used and the process looks like a multi-instance environment (e.g. Docker, Kubernetes, PM2), a one-time stderr warning is printed.

Suppress with:

RATELIMIT_FLEX_NO_MEMORY_WARN=1

Similarly, if RedisStore is used without an insurance limiter (resilience.insuranceLimiter) in a multi-instance-looking environment, a one-time stderr reminder suggests resilientRedisPreset or configuring insurance for failover protection.

Suppress with:

RATELIMIT_FLEX_NO_RESILIENCE_WARN=1

Presets

Presets return a Partial<RateLimitOptions> you can pass to expressRateLimiter / fastifyRateLimiter (or spread and override).

`singleInstancePreset(options?)`

When: Dev, tests, single-process apps.

Sliding window, 100 req / min (defaults), in-memory (no store in preset—middleware builds MemoryStore).

import { expressRateLimiter, singleInstancePreset } from 'ratelimit-flex';

app.use(expressRateLimiter(singleInstancePreset({ maxRequests: 200 })));

`multiInstancePreset(redisOptions, options?)`

When: Production with Redis, multiple workers or nodes.

RedisStore, sliding window, 100 req / min
onRedisError: fail-open by default (override via redisOptions.onRedisError)

import { expressRateLimiter, multiInstancePreset } from 'ratelimit-flex';

app.use(
  expressRateLimiter(
    multiInstancePreset({ url: process.env.REDIS_URL! }, { maxRequests: 500 }),
  ),
);

`resilientRedisPreset(redisOptions, options?)`

When: Production Redis with insurance (in-memory fallback), circuit breaker, optional counter sync on recovery, and per-worker limit scaling. See Redis resilience for behavior, examples, and comparison with fail-open / fail-closed.

`clusterPreset(options?)`

When: Node.js native cluster module (not PM2), single server with multiple CPU cores, no Redis.

ClusterStore, sliding window, 100 req / min
Requires ClusterStorePrimary.init() on the primary process

// primary.ts
import cluster from 'node:cluster';
import { ClusterStorePrimary } from 'ratelimit-flex/cluster';

if (cluster.isPrimary) {
  ClusterStorePrimary.init();
  for (let i = 0; i < 4; i++) cluster.fork();
} else {
  await import('./app.js');
}

// app.ts (worker)
import { expressRateLimiter, clusterPreset } from 'ratelimit-flex';

app.use(expressRateLimiter(clusterPreset({ maxRequests: 100, windowMs: 60_000 })));

`queuedClusterPreset(options?)`

When: Node.js native cluster + request queuing (queue over-limit requests instead of rejecting them).

ClusterStore + expressQueuedRateLimiter / fastifyQueuedRateLimiter
Sliding window, 100 req / min, queue size 100, 30s max wait
Requires ClusterStorePrimary.init() on the primary process

// primary.ts
import cluster from 'node:cluster';
import { ClusterStorePrimary } from 'ratelimit-flex/cluster';

if (cluster.isPrimary) {
  ClusterStorePrimary.init();
  for (let i = 0; i < 4; i++) cluster.fork();
} else {
  await import('./app.js');
}

// app.ts (worker)
import { expressQueuedRateLimiter, queuedClusterPreset } from 'ratelimit-flex';

app.use('/api', expressQueuedRateLimiter(queuedClusterPreset({
  maxRequests: 50,
  windowMs: 60_000,
  maxQueueSize: 200,
})));

`apiGatewayPreset(redisOptions, options?)`

When: API gateway–style traffic, key per client credential.

Token bucket (~30 tokens/min, burst 60), x-api-key key generator
fail-closed when Redis is down (override possible)

import { expressRateLimiter, apiGatewayPreset } from 'ratelimit-flex';

app.use('/v1', expressRateLimiter(apiGatewayPreset({ url: process.env.REDIS_URL! })));

`authEndpointPreset(redisOptions, options?)`

When: Login, signup, password reset—brute-force protection.

Fixed window, 5 req / min per IP (default), IP-based key
fail-closed when Redis is down

import { expressRateLimiter, authEndpointPreset } from 'ratelimit-flex';

app.post(
  '/login',
  expressRateLimiter(authEndpointPreset({ url: process.env.REDIS_URL! }, { maxRequests: 10 })),
  loginHandler,
);

`publicApiPreset(options?)`

When: Public HTTP APIs with a simple in-memory limit and structured JSON errors.

Sliding window, 60 req / min, default message object

import { expressRateLimiter, publicApiPreset } from 'ratelimit-flex';

app.use('/public', expressRateLimiter(publicApiPreset()));

Redis failure handling

| Mode | Behavior if Redis errors during quota check |
|------|-----------------------------------------------|
| fail-open (default for RedisStore) | Request is allowed; warning logged |
| fail-closed | Request is treated as blocked; middleware responds 503 with { error: 'Service temporarily unavailable' } |

Recommendation: fail-open for most general APIs (availability over strict quota). fail-closed for auth, payments, or when you must not serve traffic without a working limiter.

// Fail-open (default)
new RedisStore({ url: REDIS_URL, strategy: RateLimitStrategy.SLIDING_WINDOW, windowMs: 60_000, maxRequests: 100 });

// Fail-closed
new RedisStore({
  url: REDIS_URL,
  strategy: RateLimitStrategy.SLIDING_WINDOW,
  windowMs: 60_000,
  maxRequests: 100,
  onRedisError: 'fail-closed',
});

Policy vs counters: Allowlist, blocklist, and penalty box are enforced in the RateLimitEngine (in-memory) before the store runs. They still apply when Redis is down. Only quota / window / bucket counting depends on RedisStore.increment.
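The two failure modes boil down to what happens in the catch branch around the store call. The sketch below is a simplified, synchronous model and not the middleware's code: `decide` and the `Outcome` shape are hypothetical, and the 429 status for an ordinary over-limit response is the conventional choice rather than something stated in this section.

```typescript
type OnStoreError = "fail-open" | "fail-closed";
type Outcome = { allowed: boolean; status?: number; error?: string };

// `increment` stands in for the store call; it returns true when the request fits the quota
// and throws when the store is unreachable.
function decide(increment: () => boolean, onStoreError: OnStoreError): Outcome {
  try {
    return increment() ? { allowed: true } : { allowed: false, status: 429 };
  } catch {
    if (onStoreError === "fail-open") {
      return { allowed: true }; // availability over strict quota
    }
    return { allowed: false, status: 503, error: "Service temporarily unavailable" };
  }
}

const redisDown = (): boolean => {
  throw new Error("ECONNREFUSED");
};

const failOpen = decide(redisDown, "fail-open");
const failClosed = decide(redisDown, "fail-closed");
const overLimit = decide(() => false, "fail-closed");
```

Note that a healthy store reporting "over limit" is handled the same way in both modes; the setting only matters when the store call itself fails.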

Redis resilience

When Redis is unavailable, the default fail-open / fail-closed modes either allow every request or block every request globally—there is no per-client quota during the outage. An insurance limiter fixes that: a dedicated MemoryStore that activates automatically when the circuit breaker decides Redis is unhealthy, so each process still enforces per-process limits. Configure that in-memory cap as roughly total shared limit ÷ expected worker count (e.g. 300 requests/minute across 5 replicas → 60 per process) so failover traffic stays in the same ballpark as your global Redis budget.
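The sizing rule and the breaker behavior can be sketched together. This is an illustrative model only: `perProcessLimit` and `createBreaker` are hypothetical helpers, the breaker counts consecutive failures and retries the primary after a fixed recovery time, and the library's circuit breaker may track state differently.

```typescript
// Insurance cap per process: total shared limit divided by the expected worker count.
function perProcessLimit(totalLimit: number, expectedWorkers: number): number {
  if (expectedWorkers < 1) throw new RangeError("expectedWorkers must be at least 1");
  return Math.max(1, Math.floor(totalLimit / expectedWorkers));
}

// Minimal breaker: opens after `failureThreshold` consecutive failures and
// lets the primary be probed again once `recoveryTimeMs` has passed.
function createBreaker(failureThreshold: number, recoveryTimeMs: number) {
  let failures = 0;
  let openedAt: number | null = null;
  return {
    usePrimary(now: number): boolean {
      return openedAt === null || now - openedAt >= recoveryTimeMs;
    },
    recordSuccess(): void {
      failures = 0;
      openedAt = null;
    },
    recordFailure(now: number): void {
      failures++;
      if (failures >= failureThreshold) openedAt = now;
    },
  };
}

const insuranceCap = perProcessLimit(300, 5); // 300 req/min across 5 replicas

const breaker = createBreaker(3, 5_000);
breaker.recordFailure(0);
breaker.recordFailure(10);
const beforeTrip = breaker.usePrimary(20);       // two failures: still on Redis
breaker.recordFailure(20);
const afterTrip = breaker.usePrimary(30);        // third failure: switch to the insurance store
const afterRecovery = breaker.usePrimary(5_020); // recovery time elapsed: probe Redis again
```

While the breaker is open, each process enforces `insuranceCap` locally, so the fleet as a whole stays close to the shared budget.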

Manual setup (`RedisStore` + `resilience`)

import { expressRateLimiter, RedisStore, MemoryStore, RateLimitStrategy } from 'ratelimit-flex';

const insuranceStore = new MemoryStore({
  strategy: RateLimitStrategy.SLIDING_WINDOW,
  windowMs: 60_000,
  maxRequests: 60, // 300 / 5 workers
});

const store = new RedisStore({
  strategy: RateLimitStrategy.SLIDING_WINDOW,
  windowMs: 60_000,
  maxRequests: 300,
  url: process.env.REDIS_URL!,
  resilience: {
    insuranceLimiter: { store: insuranceStore },
    circuitBreaker: { failureThreshold: 3, recoveryTimeMs: 5000 },
    hooks: {
      onFailover: (err) => console.error('Redis down, using
