throttleai

v1.1.0

Lightweight, token-based AI governance for TypeScript

ThrottleAI is a zero-dependency governor for concurrency, rate, and token budgets, with adapters for fetch / OpenAI / tools / Express / Hono.

Every AI application eventually hits a wall: rate limits, blown budgets, noisy tenants, or cascading failures from uncontrolled concurrency. ThrottleAI sits between your code and the model call, enforcing limits with a lease-based protocol that guarantees cleanup — even when things go wrong.


At a glance

| | |
|---|---|
| Zero dependencies | Nothing to audit, nothing to break. Pure TypeScript. |
| Lease-based | Acquire before calling, release after. Auto-expire on timeout. No leaked slots. |
| 5 limiters | Concurrency, request rate, token rate, fairness, adaptive tuning — mix and match. |
| 5 adapters | fetch, OpenAI, tool wrapper, Express middleware, Hono middleware — tree-shakeable. |
| 3 presets | quiet(), balanced(), aggressive() — start in seconds, tune later. |
| Observability built in | Structured events, formatted logs, snapshot inspection, stats collector. |
| Test-friendly | Deterministic clock injection, no timers in your test suite. |
| Dual build | ESM + CJS via tsup. Works everywhere Node 18+ runs. |


Install

pnpm add throttleai    # or npm / yarn / bun

60-second quickstart

import { createGovernor, withLease, presets } from "throttleai";

const gov = createGovernor(presets.balanced());

const result = await withLease(
  gov,
  { actorId: "user-1", action: "chat" },
  async () => await callMyModel(),
);

if (result.granted) {
  console.log(result.result);
} else {
  console.log("Throttled:", result.decision.recommendation);
}

That's it. The governor enforces concurrency, rate limits, and fairness. Leases auto-expire if you forget to release.


Why ThrottleAI exists

AI applications hit rate limits, blow budgets, and create stampedes. Without governance, a single runaway loop can exhaust your API quota, a noisy tenant can starve everyone else, and a slow upstream can cascade into timeouts across your stack.

ThrottleAI solves this with five composable limiters:

  • Concurrency — cap in-flight calls with weighted slots and interactive reserve
  • Rate — requests/min and tokens/min with rolling windows
  • Fairness — no single actor monopolizes capacity
  • Adaptive — auto-tune concurrency based on deny rate and upstream latency
  • Leases — acquire before, release after, auto-expire on timeout

You configure what you need and skip the rest. Most apps only need concurrency.
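
For instance, a concurrency-only governor takes a single config field. A minimal sketch (the full configuration shape appears in the Configuration section below):

import { createGovernor } from "throttleai";

// Only the concurrency limiter is configured; rate, fairness,
// and adaptive tuning stay off.
const gov = createGovernor({
  concurrency: { maxInFlight: 3 },
});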


Choose your limiter

| Limiter | What it caps | When to use |
|---------|-------------|-------------|
| Concurrency | Simultaneous in-flight calls | Always — this is the most important knob |
| Rate | Requests per minute | When the upstream API has a documented rate limit |
| Token rate | Tokens per minute | When you have a per-minute token budget |
| Fairness | Per-actor share of capacity | Multi-tenant apps where one user shouldn't hog slots |
| Adaptive | Auto-tuned concurrency ceiling | When upstream latency is unpredictable |

Start with concurrency. Add rate only if needed. See the Tuning Cheatsheet for scenario-based guidance.


Presets

import { presets } from "throttleai";

// Single user, CLI tools — 1 call at a time, 10 req/min
createGovernor(presets.quiet());

// SaaS backend — 5 concurrent (2 interactive reserve), 60 req/min, fairness
createGovernor(presets.balanced());

// Batch processing — 20 concurrent, 300 req/min, fairness + adaptive tuning
createGovernor(presets.aggressive());

// Override any field
createGovernor({ ...presets.balanced(), leaseTtlMs: 30_000 });

| Preset | maxInFlight | interactiveReserve | req/min | tok/min | Fairness | Adaptive | Best for |
|--------|:-----------:|:------------------:|:-------:|:-------:|:--------:|:--------:|----------|
| quiet() | 1 | 0 | 10 | — | No | No | CLI tools, scripts, single-user |
| balanced() | 5 | 2 | 60 | 100K | Yes | No | SaaS backends, API servers |
| aggressive() | 20 | 5 | 300 | 500K | Yes | Yes | Batch pipelines, high-volume |


Common patterns

Server endpoint: 429 vs queue

// Option A: immediate deny with 429
const denyResult = await withLease(gov, request, fn);
// denyResult.granted === false → respond with 429

// Option B: wait with bounded retries
const waitResult = await withLease(gov, request, fn, {
  strategy: "wait-then-deny",
  maxAttempts: 3,
  maxWaitMs: 5_000,
});

UI interactive vs background

// User-facing chat gets priority
gov.acquire({ actorId: "user", action: "chat", priority: "interactive" });

// Background embedding can wait
gov.acquire({ actorId: "pipeline", action: "embed", priority: "background" });

With interactiveReserve: 2, background tasks are blocked when only 2 slots remain, keeping those for interactive requests.
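
Concretely, with the balanced() numbers (a sketch of the arithmetic, not new API):

const gov = createGovernor({
  concurrency: { maxInFlight: 5, interactiveReserve: 2 },
});
// Background work can hold at most 5 - 2 = 3 slots at once;
// interactive requests may use all 5.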

Streaming calls

const decision = gov.acquire({ actorId: "user", action: "stream" });
if (!decision.granted) return;

try {
  const stream = await openai.chat.completions.create({ stream: true, ... });
  for await (const chunk of stream) {
    // process chunk
  }
  gov.release(decision.leaseId, { outcome: "success" });
} catch (err) {
  gov.release(decision.leaseId, { outcome: "error" });
  throw err;
}

Acquire once, release once — the lease holds for the entire stream duration.

Weighted calls

// Embedding: cheap (weight 1, the default)
gov.acquire({ actorId: "user", action: "embed" });

// GPT-4 with vision: expensive (weight 4 → consumes 4 concurrency slots)
gov.acquire({
  actorId: "user",
  action: "vision",
  estimate: { weight: 4 },
});

Idempotency

// Same key = same lease (no double-acquire)
const d1 = gov.acquire({ actorId: "user", action: "chat", idempotencyKey: "req-123" });
const d2 = gov.acquire({ actorId: "user", action: "chat", idempotencyKey: "req-123" });
// d1.leaseId === d2.leaseId — only one slot consumed

Observability: see why it throttles

import { createGovernor, formatEvent, formatSnapshot } from "throttleai";

const gov = createGovernor({
  ...presets.balanced(),
  onEvent: (e) => console.log(formatEvent(e)),
  // [deny] actor=user-1 action=chat reason=concurrency retryAfterMs=500 — All 5 slots in use...
});

// Point-in-time view
console.log(formatSnapshot(gov.snapshot()));
// concurrency=3/5 rate=12/60 leases=3

Stats collector

import { createGovernor, createStatsCollector } from "throttleai";

const stats = createStatsCollector();
const gov = createGovernor({ ...presets.balanced(), onEvent: stats.handler });

// Periodically check metrics
setInterval(() => {
  const s = stats.snapshot();
  console.log(`deny rate: ${(s.denyRate * 100).toFixed(1)}%, avg latency: ${s.avgLatencyMs.toFixed(0)}ms`);
}, 10_000);

Configuration

createGovernor({
  // Concurrency (optional)
  concurrency: {
    maxInFlight: 5,          // max simultaneous weight
    interactiveReserve: 1,   // slots reserved for interactive priority
  },

  // Rate limiting (optional)
  rate: {
    requestsPerMinute: 60,   // request-rate cap
    tokensPerMinute: 100_000, // token-rate cap
    windowMs: 60_000,         // rolling window (default 60s)
  },

  // Advanced (optional)
  fairness: true,             // prevent actor monopolization
  adaptive: true,             // auto-tune concurrency from deny rate + latency
  strict: true,               // throw on double release / unknown ID (dev mode)

  // Lease settings
  leaseTtlMs: 60_000,         // auto-expire (default 60s)
  reaperIntervalMs: 5_000,    // sweep interval (default 5s)

  // Observability
  onEvent: (e) => { /* acquire, deny, release, expire, warn */ },
});

API

createGovernor(config): Governor

Factory function. Returns a Governor instance.

governor.acquire(request): AcquireDecision

Request a lease. Returns:

// Granted
{ granted: true, leaseId: string, expiresAt: number }

// Denied
{ granted: false, reason, retryAfterMs, recommendation, limitsHint? }

Deny reasons: "concurrency" | "rate" | "budget" | "policy"
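
A quick sketch of inspecting a denial, using only the fields documented above:

const d = gov.acquire({ actorId: "user-1", action: "chat" });
if (!d.granted) {
  // e.g. reason="rate", retryAfterMs=1200, plus a human-readable recommendation
  console.warn(`denied (${d.reason}): retry in ~${d.retryAfterMs}ms (${d.recommendation})`);
}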

governor.release(leaseId, report?): void

Release a lease. Always call this — even on errors.
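
If you manage leases by hand, try/finally is the simplest way to guarantee that (a sketch; callMyModel is the quickstart placeholder, and withLease below automates the same pattern):

const d = gov.acquire({ actorId: "user-1", action: "chat" });
if (d.granted) {
  try {
    await callMyModel();
  } finally {
    gov.release(d.leaseId); // runs on success and on throw
  }
}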

withLease(governor, request, fn, options?)

Execute fn under a lease with automatic release.

withLease(gov, request, fn, {
  // strategy options:
  //   "deny"           — default, fail immediately
  //   "wait"           — retry with backoff until maxWaitMs
  //   "wait-then-deny" — retry up to maxAttempts
  strategy: "wait-then-deny",
  maxWaitMs: 10_000,     // max total wait (default 10s)
  maxAttempts: 3,        // for "wait-then-deny" (default 3)
  initialBackoffMs: 250, // starting backoff (default 250ms)
});

governor.snapshot(): GovernorSnapshot

Point-in-time state: concurrency, rate, tokens, last deny.

formatEvent(event): string / formatSnapshot(snap): string

One-line human-readable formatters.

createStatsCollector(): StatsCollector

Zero-dep stats collector. Wire to onEvent for grants, denials, outcomes, latency tracking, and deny-rate calculation.

createTestClock(startMs?): Clock

Deterministic clock for testing. Advances manually — no flaky timers.
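
A hypothetical test sketch: the Clock interface is not documented in this README, so the advance() method and the clock config field below are assumptions to verify against the API docs:

import { createGovernor, createTestClock, presets } from "throttleai";

const clock = createTestClock(0);
const gov = createGovernor({
  ...presets.quiet(),
  leaseTtlMs: 1_000,
  clock, // assumed injection point, not confirmed above
});

gov.acquire({ actorId: "t", action: "chat" });
clock.advance(2_000); // assumed method: jump past the lease TTL
// The reaper can now expire the lease deterministically, with no real timers.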

Status getters

gov.activeLeases         // active lease count
gov.concurrencyActive    // in-flight weight
gov.concurrencyAvailable // remaining capacity
gov.rateCount            // requests in current window
gov.tokenRateCount       // tokens in current window

governor.dispose(): void

Stop the TTL reaper. Call on shutdown.
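
A minimal Node shutdown sketch:

process.once("SIGTERM", () => {
  gov.dispose(); // stops the TTL reaper so its interval timer can't keep the process alive
  process.exit(0);
});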


Adapters

Tree-shakeable wrappers — import only what you use. No runtime deps.

| Adapter | Import | Auto-reports |
|---------|--------|--------------|
| fetch | throttleai/adapters/fetch | outcome (from HTTP status) + latency |
| OpenAI | throttleai/adapters/openai | outcome + latency + token usage |
| Tool | throttleai/adapters/tools | outcome + latency + custom weight |
| Express | throttleai/adapters/express | outcome (from res.statusCode) + latency |
| Hono | throttleai/adapters/hono | outcome + latency |

All adapters return { ok: true, result, latencyMs } on grant (the fetch adapter exposes its grant payload as response rather than result, as shown below) and { ok: false, decision } on deny.

fetch

import { wrapFetch } from "throttleai/adapters/fetch";
const throttledFetch = wrapFetch(fetch, { governor: gov });
const r = await throttledFetch("https://api.example.com/v1/chat");
if (r.ok) console.log(r.response.status);
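
On deny, the same r carries the decision documented in the API section (a short sketch continuing the snippet above):

if (!r.ok) {
  // decision carries reason, retryAfterMs, and a recommendation
  console.warn(`throttled (${r.decision.reason}): retry in ${r.decision.retryAfterMs}ms`);
}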

OpenAI-compatible

import { wrapChatCompletions } from "throttleai/adapters/openai";
const chat = wrapChatCompletions(openai.chat.completions.create, { governor: gov });
const r = await chat({ model: "gpt-4", messages });
if (r.ok) console.log(r.result.choices[0].message.content);

Tool call

import { wrapTool } from "throttleai/adapters/tools";
const embed = wrapTool(myEmbedFn, { governor: gov, toolId: "embed", costWeight: 2 });
const r = await embed("hello");
if (r.ok) console.log(r.result);

Express

import { throttleMiddleware } from "throttleai/adapters/express";
app.use("/ai", throttleMiddleware({ governor: gov }));
// 429 + Retry-After header + JSON body on deny

See examples/express-adaptive/ for a full runnable server with adaptive tuning.

Hono

import { throttle } from "throttleai/adapters/hono";
app.use("/ai/*", throttle({ governor: gov }));
// 429 JSON on deny, leaseId stored on context

Docs

| Document | What it covers |
|----------|----------------|
| Handbook | End-to-end usage guide: architecture, patterns, production checklist |
| Tuning cheatsheet | Scenario-based config guide, decision tree, knob reference |
| Troubleshooting | Common issues: always denied, stalls, adaptive oscillation |
| API stability | Public vs internal API surface, versioning policy |
| Release manifest | Release process and artifact details |
| Repo hygiene | Asset policy and history rewrite log |


Tuning quick reference

| You see this | Adjust this |
|---|---|
| reason: "concurrency" | Increase maxInFlight or decrease call duration |
| reason: "rate" | Increase requestsPerMinute / tokensPerMinute |
| reason: "policy" (fairness) | Lower softCapRatio or increase maxInFlight |
| High retryAfterMs | Reduce leaseTtlMs so expired leases free faster |
| Background tasks starved | Increase maxInFlight or reduce interactiveReserve |
| Interactive latency high | Increase interactiveReserve |
| Adaptive shrinks too fast | Lower alpha or raise targetDenyRate |

For deeper guidance, see the Tuning Cheatsheet.


Examples

See examples/ for runnable demos:

npx tsx examples/node-basic.ts

Stability

ThrottleAI follows Semantic Versioning. The public API — everything exported from throttleai and throttleai/adapters/* — is stable as of v1.0.0. Breaking changes require a major version bump.

For details on what's considered public vs internal, see API stability. For security reporting, see SECURITY.md.


License

MIT