# @wildix/xbees-conversations-utils (v1.3.0)
## streamRateLimitHandlingProxy

This package wraps a `StreamChat` client in a proxy that adds shared rate-limit protection around Stream SDK calls without changing the normal sync API contract. Async calls go through retry / cooldown handling, while sync methods stay sync. If a proxied call returns a channel-like object, that object is wrapped too, so the same protection continues on nested channel operations.
### Public entry points

- `createRateLimitedStreamProxy(stream, {redis, logger})`: creates the proxy, attaches response interceptors once, and returns a rate-limit-aware `StreamChat` instance. Code: `src/streamRateLimitHandlingProxy/createRateLimitedStreamProxy.ts`
- `withStreamRateLimitOptions(options)`: builds a marker token that can be passed as the last argument of a proxied async call to override retry behavior for that single call. Code: `src/streamRateLimitHandlingProxy/withStreamRateLimitOptions.ts`
- Public exports are re-exported from `src/streamRateLimitHandlingProxy/index.ts` and then from `src/index.ts`.
### How it works

- `createRateLimitedStreamProxy(...)` tries to register axios response interceptors on the underlying Stream transport.
- The proxy classifies Stream methods into groups generated in `generatedMethodNames.ts`:
  - pure sync methods: returned as-is;
  - sync methods returning channel/client objects: returned sync, but results are re-wrapped;
  - async methods: executed through the rate-limit handler.
- Every async call is normalized through `executeWithRateLimitHandling(...)`.
- Returned channel-like objects are detected and wrapped again, so calls like `stream.channel(...).sendMessage(...)` stay protected end to end.

If the underlying Stream SDK transport does not expose an axios instance, the proxy still works, but shared protection driven by response headers becomes less effective because no interceptors can be attached.
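The classification above can be sketched with a plain `Proxy`. This is a simplified illustration, not the actual implementation: the method-group membership shown here is hypothetical (the real package generates it in `generatedMethodNames.ts`), and the handler is a stub for the retry / cooldown logic described in the next sections.

```typescript
type AnyObject = Record<string, any>;

// Hypothetical group membership; the real lists are generated from the SDK surface.
const PURE_SYNC = new Set(["getUserAgent"]);
const SYNC_RETURNING_CHANNEL = new Set(["channel"]);

function isChannelLike(value: unknown): value is AnyObject {
  // Heuristic: channel-like objects expose async operations such as sendMessage.
  return !!value && typeof (value as AnyObject).sendMessage === "function";
}

async function executeWithRateLimitHandling<T>(fn: () => Promise<T>): Promise<T> {
  // Stub for the retry / cooldown handling described below.
  return fn();
}

export function wrapWithRateLimitProxy<T extends object>(target: T): T {
  return new Proxy(target, {
    get(obj, prop) {
      const value = (obj as AnyObject)[prop as string];
      if (typeof value !== "function") return value;

      // Pure sync methods are returned as-is.
      if (PURE_SYNC.has(prop as string)) return value.bind(obj);

      // Sync methods returning channel/client objects stay sync,
      // but the result is re-wrapped so nested calls stay protected.
      if (SYNC_RETURNING_CHANNEL.has(prop as string)) {
        return (...args: unknown[]) => wrapWithRateLimitProxy(value.apply(obj, args));
      }

      // Everything else is treated as async and routed through the handler;
      // channel-like results are wrapped again on the way out.
      return (...args: unknown[]) =>
        executeWithRateLimitHandling(async () => {
          const result = await value.apply(obj, args);
          return isChannelLike(result) ? wrapWithRateLimitProxy(result) : result;
        });
    },
  });
}
```

With this sketch, `wrap(stream).channel(...).sendMessage(...)` stays protected end to end: `channel(...)` returns synchronously but wrapped, and `sendMessage(...)` then goes through the async handler.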
### Protection layers

#### 1. Soft global throttle

The interceptor reads `x-ratelimit-limit` and `x-ratelimit-remaining`. When usage becomes high, it enables a short shared Redis cooldown before real 429 responses start happening. This smooths bursts instead of waiting for hard throttling.

- Trigger source: response headers on successful or failed requests
- Effect:
  - usage `>= 95%`: delay next requests by `3000ms`;
  - usage between `85%` and `95%`: delay by `1500ms`;
  - usage between `70%` and `85%`: delay by `500ms`;
  - usage `< 70%`: clear soft throttle.
- Scope: global shared throttle
- Code: `helpers/processStreamRateLimitHeaders.ts`
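The tiered mapping above can be sketched as a small pure function. The function name and shape are assumptions for illustration; the real logic lives in `helpers/processStreamRateLimitHeaders.ts`.

```typescript
// Maps x-ratelimit-limit / x-ratelimit-remaining to a soft-throttle delay,
// following the thresholds documented above.
export function softThrottleDelayMs(limit: number, remaining: number): number {
  if (!Number.isFinite(limit) || limit <= 0 || !Number.isFinite(remaining)) {
    return 0; // malformed headers: do not throttle
  }
  const usage = (limit - remaining) / limit; // fraction of the window consumed
  if (usage >= 0.95) return 3000;
  if (usage >= 0.85) return 1500;
  if (usage >= 0.7) return 500;
  return 0; // below 70%: clear the soft throttle
}
```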
#### 2. Hard per-operation 429 cooldown

If Stream returns HTTP 429, the handler converts that error into `RateLimitExceededException`, calculates the retry delay from `retry-after` or fallback values, stores a Redis cooldown for the specific operation, and optionally retries the call.

If another worker hits the same operation while that cooldown is still active, the request is short-circuited locally before touching Stream. To keep that synthetic error useful, the latest real `x-ratelimit-*` snapshot is cached in Redis next to the cooldown and reused in the locally generated exception.

- Trigger source: HTTP 429 / `retry-after` / `x-ratelimit-*`
- Effect: per-operation block, optional retry, synthetic local short-circuit when the cooldown is already known
- Scope: operation-level
- Code:
  - execution flow: `helpers/executeWithRateLimitHandling.ts`
  - 429 parsing: `helpers/processStreamRateLimitException.ts`
  - exception creation: `helpers/createRateLimitExceededException.ts`
  - snapshot serialization: `helpers/rateLimitSnapshot.ts`
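The per-operation short-circuit can be sketched as follows, with an in-memory `Map` standing in for Redis. The key layout, helper names, and exception fields are illustrative assumptions, not the package's actual API.

```typescript
// Illustrative synthetic exception; the real one is built in
// helpers/createRateLimitExceededException.ts.
class RateLimitExceededException extends Error {
  constructor(
    public readonly operation: string,
    public readonly retryAfterMs: number,
  ) {
    super(`Rate limit exceeded for ${operation}`);
  }
}

// In-memory stand-in for the shared Redis cooldown store: operation -> epoch ms.
const cooldownUntil = new Map<string, number>();

// Called after a real 429: record when this operation may be tried again.
export function setOperationCooldown(operation: string, delayMs: number, now = Date.now()): void {
  cooldownUntil.set(operation, now + delayMs);
}

// Preflight check: if another worker already observed a 429 for this operation,
// fail fast locally instead of hitting Stream again.
export function checkOperationCooldown(operation: string, now = Date.now()): void {
  const until = cooldownUntil.get(operation);
  if (until !== undefined && until > now) {
    throw new RateLimitExceededException(operation, until - now);
  }
}
```

In the real package the synthetic exception would also carry the cached `x-ratelimit-*` snapshot, so callers see realistic limit data even on locally generated errors.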
#### 3. App-wide budget cooldown

The interceptor also reads `x-budget-*` headers. If overall Stream app budget usage becomes high, it enables a global Redis cooldown. While this cooldown is active, requests are delayed in small sleep chunks so they can resume early if a fresher response lowers the cooldown.

- Trigger source: `x-budget-limit-ms`, `x-budget-remaining-ms`, `x-budget-used-ms`
- Effect:
  - usage `>= 80%`: enable strong cooldown (30s-60s);
  - usage between `70%` and `80%`: keep a relaxed cooldown (5s-10s);
  - usage between `60%` and `70%`: keep a minimal cooldown (1s-2s);
  - usage `< 60%`: clear cooldown.
- Scope: whole application, not a single operation
- Code: `helpers/processStreamBudgetHeaders.ts`
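The budget tiers above can be sketched the same way as the soft throttle. Function names and the range-picking approach are assumptions; the real logic lives in `helpers/processStreamBudgetHeaders.ts`.

```typescript
type BudgetCooldown = { minMs: number; maxMs: number } | null;

// Maps x-budget-* usage to a cooldown range, following the tiers above.
export function budgetCooldownRange(limitMs: number, usedMs: number): BudgetCooldown {
  if (!Number.isFinite(limitMs) || limitMs <= 0) return null; // malformed headers
  const usage = usedMs / limitMs;
  if (usage >= 0.8) return { minMs: 30_000, maxMs: 60_000 }; // strong cooldown
  if (usage >= 0.7) return { minMs: 5_000, maxMs: 10_000 }; // relaxed cooldown
  if (usage >= 0.6) return { minMs: 1_000, maxMs: 2_000 }; // minimal cooldown
  return null; // below 60%: clear the cooldown
}

// A concrete delay would be drawn from the range with jitter, e.g.:
export function pickCooldownMs(
  range: { minMs: number; maxMs: number },
  rng: () => number = Math.random,
): number {
  return Math.round(range.minMs + rng() * (range.maxMs - range.minMs));
}
```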
### Retry and cooldown behavior

- Local preflight checks happen before request execution when `enableCooldown` is on:
  - active per-operation cooldown: fail fast with a synthetic `RateLimitExceededException`;
  - active soft throttle: delay the request briefly;
  - active budget cooldown: delay the request globally.
- Real 429 responses can still be retried if:
  - `attempt < maxAttempts`;
  - computed delay `<= maxRetryableDelayMs`.
- Delay calculation rules:
  - if Stream provides `Retry-After` / reset information, that server value is preferred even if it is larger than `maxDelayMs`;
  - `maxDelayMs` only caps the fallback delay when server retry timing is missing or malformed;
  - `maxRetryableDelayMs` decides whether the computed delay is still retryable or should be propagated immediately.
- If retry is not allowed, the processed rate-limit exception is propagated to the caller.

Core logic: `src/streamRateLimitHandlingProxy/helpers/executeWithRateLimitHandling.ts`
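The delay and retry rules above can be condensed into two small decision functions. These are a sketch of the documented rules, not the package's actual signatures; the real flow is in `helpers/executeWithRateLimitHandling.ts`.

```typescript
interface RetryConfig {
  maxAttempts: number;
  maxDelayMs: number; // caps only the locally computed fallback delay
  maxRetryableDelayMs: number; // above this, propagate instead of retrying
}

// Rule 1: server-provided retry timing wins, even when it exceeds maxDelayMs.
// Rule 2: otherwise the fallback delay applies, capped by maxDelayMs.
export function computeRetryDelayMs(
  serverRetryAfterMs: number | undefined,
  fallbackDelayMs: number,
  config: RetryConfig,
): number {
  if (serverRetryAfterMs !== undefined && Number.isFinite(serverRetryAfterMs) && serverRetryAfterMs >= 0) {
    return serverRetryAfterMs;
  }
  return Math.min(fallbackDelayMs, config.maxDelayMs);
}

// Rule 3: retry only while attempts remain and the delay is still acceptable.
export function shouldRetry(attempt: number, delayMs: number, config: RetryConfig): boolean {
  return attempt < config.maxAttempts && delayMs <= config.maxRetryableDelayMs;
}
```

With the documented defaults (`maxDelayMs = 5000`, `maxRetryableDelayMs = 10000`), a server `Retry-After` of 8 seconds is still honored and retried, while a 12-second one is propagated to the caller immediately.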
### Configuration

Default behavior is defined in `src/streamRateLimitHandlingProxy/constants.ts`:

- `maxAttempts = 3`
- `enableCooldown = true`
- `maxDelayMs = 5000`
- `maxRetryableDelayMs = 10000`

The same file also contains:

- Redis key prefixes for hard cooldown, soft throttle, and budget cooldown
- fallback retry values and jitter
- budget thresholds and cooldown ranges
- soft-throttle thresholds and delays

Notes:

- `enableCooldown` controls local Redis-based preflight checks and shared cooldown reuse.
- `maxDelayMs` is a fallback cap, not a hard upper bound for a server-provided `Retry-After`.
- Missing or invalid rate-limit headers do not break the proxy, but they reduce how much shared throttling state can be inferred.
Per-call overrides are passed as the last argument:

```ts
stream.queryChannels(filters, sort, options, withStreamRateLimitOptions({
  maxAttempts: 1,
  maxRetryableDelayMs: 0,
}));
```

Argument extraction is implemented in `helpers/splitCallArgsAndOptions.ts`.
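One way the marker-token extraction could work is sketched below. The `Symbol` tagging scheme and these exact signatures are assumptions for illustration; the real extraction is in `helpers/splitCallArgsAndOptions.ts`.

```typescript
// Private tag so ordinary option objects are never mistaken for the marker.
const RATE_LIMIT_OPTIONS = Symbol("streamRateLimitOptions");

interface StreamRateLimitOptions {
  maxAttempts?: number;
  maxRetryableDelayMs?: number;
}

// Builds the marker token passed as the last argument of a proxied call.
export function withStreamRateLimitOptions(options: StreamRateLimitOptions) {
  return { [RATE_LIMIT_OPTIONS]: true as const, options };
}

// Splits the real SDK arguments from a trailing marker token, if present.
export function splitCallArgsAndOptions(args: unknown[]): {
  callArgs: unknown[];
  options?: StreamRateLimitOptions;
} {
  const last = args[args.length - 1] as any;
  if (last && last[RATE_LIMIT_OPTIONS] === true) {
    return { callArgs: args.slice(0, -1), options: last.options };
  }
  return { callArgs: args };
}
```

Because the tag is a unique `Symbol`, plain option objects pass through to the SDK untouched; only tokens built by `withStreamRateLimitOptions(...)` are stripped.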
