llm-hard-cap

v0.2.1

Hard spend caps for OpenAI, Anthropic Claude, Google Gemini, and any LLM API. Real-time token cost tracking and daily / monthly / per-user USD limits that stop runaway AI bills before they happen.

Downloads

524

Hard spend limits for OpenAI, Anthropic Claude, Google Gemini, and any LLM API. Track token costs in real time, enforce daily / monthly / per-user USD caps, and stop runaway AI bills before they happen.

npm install llm-hard-cap

llm-hard-cap is a zero-dependency TypeScript library that puts a hard ceiling on what your application can spend on LLM APIs. It supports OpenAI (GPT-4o, GPT-4-turbo, o1, o3-mini), Anthropic Claude (Opus 4.7, Sonnet 4.6, Haiku 4.5), Google Gemini (1.5/2.0/2.5), Mistral, DeepSeek, and any custom model you add.

If you've ever woken up to a $30,000 OpenAI bill from a runaway loop, or shipped an AI feature without per-user limits and learned the hard way that one user can drain your monthly quota in an hour — this is for you.


Why this exists

LLM provider dashboards show you what you spent yesterday. Rate limits stop you at 10,000 RPM, not at $500. So when a bug, retry loop, or one heavy user starts burning tokens, you find out from the bill.

llm-hard-cap enforces spend at the call site, before the request goes out:

  • Hard caps in USD, not RPM. daily: 10 means "$10/day, full stop."
  • Per-user / per-route scoping. Free users get $0.10/day; paid users get $5; an experimental route gets $1.
  • Pre-flight estimate + post-flight reconciliation. Block expensive calls before they hit the API, then record the actual cost from the response.
  • Provider-agnostic. Built-in pricing for 25+ models; bring your own for fine-tunes, Bedrock, Ollama, etc.
  • Zero dependencies. ~3 KB gzipped. TypeScript-first. Works in Node 18+, Bun, Deno.
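The pre-flight estimate described in the list above comes down to simple arithmetic. The following is a minimal sketch of that arithmetic, not the library's actual code: the names `ModelRate`, `callCostUsd`, and `wouldExceedCap` are hypothetical, the rates are placeholders rather than real provider prices, and the strict `>` comparison is an assumption.

```typescript
// Rates are USD per 1M tokens. The numbers used here are placeholders, not real prices.
interface ModelRate {
  input: number;
  output: number;
}

// Cost of one call in USD, given token counts and a per-1M-token rate.
function callCostUsd(rate: ModelRate, inputTokens: number, outputTokens: number): number {
  return (inputTokens * rate.input + outputTokens * rate.output) / 1_000_000;
}

// Pre-flight decision: refuse when spend so far plus the projected cost passes the cap.
// Whether the real library uses ">" or ">=" is not stated in this README.
function wouldExceedCap(spentUsd: number, projectedUsd: number, capUsd: number): boolean {
  return spentUsd + projectedUsd > capUsd;
}

const placeholderRate: ModelRate = { input: 0.8, output: 2.4 };
// 500k input tokens and 250k output tokens: 0.4 USD + 0.6 USD
const projectedCost = callCostUsd(placeholderRate, 500_000, 250_000);
// With 9.5 USD already spent against a 10 USD cap, this call would be refused.
const blocked = wouldExceedCap(9.5, projectedCost, 10);
```

Because the check runs on an estimate, the recorded spend should come from the real token counts in the response, which is the "post-flight reconciliation" half of the feature.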

Quick start

import OpenAI from "openai";
import { BudgetGuard } from "llm-hard-cap";

const openai = new OpenAI();
const guard = new BudgetGuard({
  limits: { daily: 10, monthly: 200 },
});

const response = await guard.wrap(
  { model: "gpt-4o-mini", estimatedInputTokens: 500, estimatedOutputTokens: 300 },
  () =>
    openai.chat.completions.create({
      model: "gpt-4o-mini",
      messages: [{ role: "user", content: "Hello!" }],
    }),
);

If the estimated cost would push today's spend past $10, guard.wrap throws BudgetExceededError before the OpenAI call is made. Actual token usage is recorded automatically after the call returns.


Per-user spend limits in 5 lines

Free vs. paid tiers without writing a quota system:

const guard = new BudgetGuard({ limits: { daily: 100 } }); // global ceiling

const userGuard = guard.for(`user:${userId}`, {
  daily: plan === "pro" ? 5 : 0.1,
});

await userGuard.wrap({ model: "gpt-4o", estimatedInputTokens: 1000 }, callOpenAi);

Each scope is tracked independently. Reset a scope when a user upgrades, or query their current usage:

await guard.usage("user:alice"); // { day: 0.0427, month: 1.13, total: 9.42, requests: 87 }
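To make "tracked independently" concrete, here is a minimal sketch of per-scope bookkeeping. It is an illustration under assumptions, not the library's implementation: `ScopeLedger` and its methods are hypothetical names, and the sketch leaves out the day and month rollover that a real tracker needs.

```typescript
// Same shape as the object shown by guard.usage() above.
interface ScopeUsage {
  day: number;
  month: number;
  total: number;
  requests: number;
}

// Hypothetical in-memory ledger: one independent tally per scope string.
class ScopeLedger {
  private readonly tallies = new Map<string, ScopeUsage>();

  // Add one call's cost to a scope. Window rollover is omitted in this sketch.
  record(scope: string, costUsd: number): void {
    const tally = this.tallies.get(scope) ?? { day: 0, month: 0, total: 0, requests: 0 };
    tally.day += costUsd;
    tally.month += costUsd;
    tally.total += costUsd;
    tally.requests += 1;
    this.tallies.set(scope, tally);
  }

  // Return a copy so callers cannot mutate the stored tally.
  usageFor(scope: string): ScopeUsage {
    return { ...(this.tallies.get(scope) ?? { day: 0, month: 0, total: 0, requests: 0 }) };
  }

  // Forget a scope, for example after a plan upgrade.
  reset(scope: string): void {
    this.tallies.delete(scope);
  }
}

const ledger = new ScopeLedger();
ledger.record("user:alice", 0.02);
ledger.record("user:alice", 0.02);
ledger.record("user:bob", 0.5);
```

Spending recorded for `user:bob` never touches the tally for `user:alice`, so one heavy user cannot consume another user's allowance.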

Three modes of use

1. wrap — protect a single call (recommended)

const result = await guard.wrap(
  { model: "claude-sonnet-4-6", estimatedInputTokens: 800, estimatedOutputTokens: 400 },
  () => anthropic.messages.create({ /* ... */ }),
);

wrap runs a pre-flight estimate, executes the call, and then records actual usage from the response. It auto-detects OpenAI-style (usage.prompt_tokens / usage.completion_tokens) and Anthropic-style (usage.input_tokens / usage.output_tokens) responses.

For other providers, pass an extract function:

await guard.wrap(
  { model: "gemini-1.5-pro", estimatedInputTokens: 1000 },
  () => gemini.generateContent({ /* ... */ }),
  (r) => ({
    inputTokens: r.usageMetadata.promptTokenCount,
    outputTokens: r.usageMetadata.candidatesTokenCount,
  }),
);
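The three response shapes mentioned in this section can be normalized by one small function. The sketch below shows one plausible way to do that; `normalizeUsage` and `TokenUsage` are hypothetical names, and the library's own detection logic may differ.

```typescript
interface TokenUsage {
  inputTokens: number;
  outputTokens: number;
}

// Loose views of the provider payloads; only the usage fields matter here.
interface UsageLike {
  prompt_tokens?: unknown;
  completion_tokens?: unknown;
  input_tokens?: unknown;
  output_tokens?: unknown;
}
interface UsageMetadataLike {
  promptTokenCount?: unknown;
  candidatesTokenCount?: unknown;
}

// Try OpenAI-style, then Anthropic-style, then Gemini-style field names.
function normalizeUsage(response: unknown): TokenUsage | undefined {
  if (typeof response !== "object" || response === null) return undefined;
  const r = response as { usage?: UsageLike | null; usageMetadata?: UsageMetadataLike | null };
  const asNumber = (v: unknown): number | undefined => (typeof v === "number" ? v : undefined);
  const candidates: Array<[unknown, unknown]> = [
    [r.usage?.prompt_tokens, r.usage?.completion_tokens],
    [r.usage?.input_tokens, r.usage?.output_tokens],
    [r.usageMetadata?.promptTokenCount, r.usageMetadata?.candidatesTokenCount],
  ];
  for (const [rawInput, rawOutput] of candidates) {
    const inputTokens = asNumber(rawInput);
    if (inputTokens !== undefined) {
      // A missing output count is treated as zero in this sketch.
      return { inputTokens, outputTokens: asNumber(rawOutput) ?? 0 };
    }
  }
  return undefined; // no recognizable usage block
}
```

A function with this shape could be passed as the third argument to wrap for a provider whose responses are not auto-detected.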

2. estimate — pre-flight check only

const { projectedUsd } = await guard.estimate({
  model: "gpt-4o",
  estimatedInputTokens: 50_000,
});
// Throws BudgetExceededError if not affordable, otherwise returns the cost.

3. check — record after the fact

When you already have real token counts (custom client, streaming, batch jobs):

await guard.check({
  model: "gpt-4o",
  inputTokens: response.usage.prompt_tokens,
  outputTokens: response.usage.completion_tokens,
});

Limits you can set

new BudgetGuard({
  limits: {
    perRequest: 0.25, // refuse any single call over $0.25
    daily: 10,        // $10 per UTC day
    monthly: 200,     // $200 per UTC month
    total: 1000,      // $1000 lifetime cap (per scope)
  },
  onExceeded: "throw", // or "warn" / "block"
});

Limits are checked in this order: perRequest, daily, monthly, total. In the default "throw" mode, the first violation throws BudgetExceededError, which exposes .window, .limitUsd, .currentUsd, .projectedUsd, and .scope.
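The ordered check can be pictured as a short loop that stops at the first limit a call would break. This is a sketch under assumptions: `firstViolation` and its types are hypothetical, the strict `>` comparison is a guess, and `projectedUsd` is taken here to mean current spend plus the cost of the call.

```typescript
type LimitWindow = "perRequest" | "day" | "month" | "total";

interface Limits {
  perRequest?: number;
  daily?: number;
  monthly?: number;
  total?: number;
}

interface Spent {
  day: number;
  month: number;
  total: number;
}

interface Violation {
  window: LimitWindow;
  limitUsd: number;
  currentUsd: number;
  projectedUsd: number;
}

// Walk the limits in the documented order and report the first one the call would break.
function firstViolation(limits: Limits, spent: Spent, callUsd: number): Violation | undefined {
  const checks: Array<[LimitWindow, number | undefined, number]> = [
    ["perRequest", limits.perRequest, 0], // a single call starts from zero
    ["day", limits.daily, spent.day],
    ["month", limits.monthly, spent.month],
    ["total", limits.total, spent.total],
  ];
  for (const [win, limitUsd, currentUsd] of checks) {
    const projectedUsd = currentUsd + callUsd;
    if (limitUsd !== undefined && projectedUsd > limitUsd) {
      return { window: win, limitUsd, currentUsd, projectedUsd };
    }
  }
  return undefined; // every configured limit still has room
}

const sampleLimits: Limits = { perRequest: 0.25, daily: 10, monthly: 200 };
const sampleSpent: Spent = { day: 9.9, month: 50, total: 50 };
```

With these sample values, a $0.20 call trips the daily window, a $0.30 call trips perRequest first, and a $0.05 call passes.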

What onExceeded does when a limit is hit

| Mode | check() | estimate() | wrap() |
|------|---------|------------|--------|
| "throw" | throws BudgetExceededError | throws | throws (call never runs) |
| "block" | returns { recorded: false }, spend not recorded | returns { allowed: false } | throws (call never runs) |
| "warn" | logs, still records the spend (recorded: true) | logs, returns allowed: true | logs, runs the call, records it |

check() resolves to { costUsd, summary, recorded } and estimate() to { projectedUsd, summary, allowed }. In "block" mode, inspect allowed/recorded to decide what to do; wrap() can't return a value without making the call, so it throws instead.
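The estimate() column of the table can be expressed as a small decision function. This is only a sketch of the documented behavior: `resolveEstimate` and `SketchBudgetError` are hypothetical stand-ins, and the logger is passed in as a parameter rather than assumed.

```typescript
type OnExceeded = "throw" | "block" | "warn";

// Stand-in error type for this sketch; the real library throws BudgetExceededError.
class SketchBudgetError extends Error {}

// Decide what an estimate-style check does once a limit is known to be exceeded.
function resolveEstimate(
  mode: OnExceeded,
  overLimit: boolean,
  warn: (message: string) => void,
): { allowed: boolean } {
  if (!overLimit) return { allowed: true };
  switch (mode) {
    case "throw":
      throw new SketchBudgetError("budget exceeded");
    case "block":
      return { allowed: false }; // caller inspects the flag
    case "warn":
      warn("budget exceeded, continuing");
      return { allowed: true };
  }
}
```

In "block" mode the caller is responsible for checking `allowed` before making the request, which is why that mode suits code that wants to degrade gracefully instead of catching exceptions.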

Unknown models

By default an unpriced model name (e.g. a typo) throws UnknownModelError so a mistake can't silently disable the guard. Add the model via pricing, or pass onUnknownModel: "zero" to treat unknown models as free.


Handling rejected calls

import { BudgetExceededError } from "llm-hard-cap";

try {
  await guard.wrap({ model: "gpt-4o", estimatedInputTokens: 1000 }, call);
} catch (err) {
  if (err instanceof BudgetExceededError) {
    return res.status(429).json({
      error: "budget_exceeded",
      window: err.window, // "perRequest" | "day" | "month" | "total"
      limitUsd: err.limitUsd,
      currentUsd: err.currentUsd,
    });
  }
  throw err;
}

Persistence

The default storage is in-memory — fine for short-lived scripts and tests. For real apps:

import { BudgetGuard, FileStorage } from "llm-hard-cap";

const guard = new BudgetGuard({
  limits: { daily: 10 },
  storage: new FileStorage("./.llm-hard-cap.json"),
});

For distributed / multi-process setups, implement the Storage interface against Redis, Postgres, or your existing database:

import type { Storage, SpendEvent, UsageSummary } from "llm-hard-cap";

class RedisStorage implements Storage {
  async record(event: SpendEvent) { /* INCRBYFLOAT keys */ }
  async summary(scope: string): Promise<UsageSummary> { /* GET keys */ }
  async reset(scope?: string) { /* DEL */ }
}
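A shared store needs counters that roll over with the UTC day and month. One way to get that, sketched here under assumptions, is to encode the window into the key name so that a new day simply starts a new counter. The `windowKeys` helper and the `spend:` key format are hypothetical and not part of the library.

```typescript
// Build counter keys for one scope at a given instant.
// toISOString() is always UTC, so the slices below give the UTC day and month.
function windowKeys(scope: string, at: Date): { day: string; month: string; total: string } {
  const iso = at.toISOString(); // e.g. "2025-03-09T23:59:59.000Z"
  return {
    day: `spend:${scope}:day:${iso.slice(0, 10)}`, // YYYY-MM-DD
    month: `spend:${scope}:month:${iso.slice(0, 7)}`, // YYYY-MM
    total: `spend:${scope}:total`,
  };
}

const sampleKeys = windowKeys("user:alice", new Date("2025-03-09T23:59:59Z"));
```

A record implementation could increment all three keys by the cost of the call, and a summary implementation could read them back for the current instant. Expiring the day and month keys after their window ends keeps the store from growing without bound.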

Supported models out of the box

Pricing is built-in for:

| Provider | Models |
|----------|--------|
| OpenAI | gpt-4o, gpt-4o-mini, gpt-4-turbo, gpt-4, gpt-3.5-turbo, o1, o1-mini, o3-mini |
| Anthropic | claude-opus-4-7, claude-opus-4-6, claude-sonnet-4-6, claude-sonnet-4, claude-haiku-4-5, plus all Claude 3.x snapshots |
| Google | gemini-1.5-pro, gemini-1.5-flash, gemini-2.0-flash, gemini-2.5-pro |
| Mistral | mistral-large-latest, mistral-small-latest |
| DeepSeek | deepseek-chat, deepseek-reasoner |

Snapshot IDs like gpt-4o-2024-11-20 fall back via prefix match. Override or extend at any time:

new BudgetGuard({
  limits: { daily: 5 },
  pricing: {
    "my-fine-tune": { input: 0.8, output: 2.4 }, // USD per 1M tokens
  },
});
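The README does not say how the prefix fallback picks between overlapping names. The sketch below assumes an exact match first and then the longest matching prefix, which keeps a gpt-4o-mini snapshot from being priced as gpt-4o. `resolvePricing` is a hypothetical helper and the rates in the sample table are placeholders.

```typescript
// USD per 1M tokens, matching the unit used by the pricing option.
interface ModelPricing {
  input: number;
  output: number;
}

// Exact match first, then the longest table key that is a prefix of the model ID.
function resolvePricing(
  model: string,
  table: Record<string, ModelPricing>,
): ModelPricing | undefined {
  if (Object.prototype.hasOwnProperty.call(table, model)) return table[model];
  let best: string | undefined;
  for (const key of Object.keys(table)) {
    if (model.startsWith(key) && (best === undefined || key.length > best.length)) {
      best = key;
    }
  }
  return best === undefined ? undefined : table[best];
}

// Placeholder rates for illustration only.
const sampleTable: Record<string, ModelPricing> = {
  "gpt-4o": { input: 1, output: 2 },
  "gpt-4o-mini": { input: 0.1, output: 0.2 },
};
```

An ID that matches no key resolves to undefined here, which corresponds to the case where the library would raise UnknownModelError by default.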

FAQ

How is this different from OpenAI's usage limits in the dashboard?

OpenAI's caps are organization-wide, settle a day later, and don't tell you who spent what. llm-hard-cap enforces in real time, per scope (user, route, environment), and rejects calls before they leave your server.

Does this work with streaming responses?

Yes. Use estimate before opening the stream, then check once you receive the final usage event (OpenAI emits usage in the last chunk if you pass stream_options: { include_usage: true }).
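For the streaming case, a small collector can hold on to the usage block while chunks arrive. This sketch assumes OpenAI-style chunks, where usage is absent or null until the final chunk. `UsageCollector` and the chunk type are hypothetical and cover only the fields used here.

```typescript
// Minimal view of a streamed chunk: usage is null or missing until the last one.
interface StreamChunk {
  usage?: { prompt_tokens: number; completion_tokens: number } | null;
}

interface FinalUsage {
  inputTokens: number;
  outputTokens: number;
}

// Feed every chunk through push(); result() returns the last usage block seen.
class UsageCollector {
  private latest: FinalUsage | undefined;

  push(chunk: StreamChunk): void {
    if (chunk.usage) {
      this.latest = {
        inputTokens: chunk.usage.prompt_tokens,
        outputTokens: chunk.usage.completion_tokens,
      };
    }
  }

  result(): FinalUsage | undefined {
    return this.latest;
  }
}

const collector = new UsageCollector();
const fakeChunks: StreamChunk[] = [
  { usage: null },
  {},
  { usage: { prompt_tokens: 9, completion_tokens: 21 } },
];
for (const chunk of fakeChunks) collector.push(chunk);
```

In a real handler, push would be called inside the `for await` loop over the stream, and the counts from result() would then be passed to guard.check along with the model name. If result() is still undefined when the stream ends, there is nothing to record.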

Does this work with prompt caching discounts?

Not automatically. Compute the cost yourself from the cached vs. non-cached token split, then either reflect the discount in a pricing override, or call calculateCost and use check with the real counts.
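The cached vs. non-cached split is a three-term sum. The sketch below assumes the reported input count already includes the cached tokens, which is not true for every provider, so adjust the subtraction if yours reports them separately. `costWithCache` and the rates are hypothetical.

```typescript
// USD per 1M tokens, with a separate discounted rate for cached input.
interface CachedRate {
  input: number;
  cachedInput: number;
  output: number;
}

interface CachedUsage {
  inputTokens: number; // assumed to include the cached tokens
  cachedInputTokens: number;
  outputTokens: number;
}

// Price fresh input, cached input, and output separately, then sum.
function costWithCache(rate: CachedRate, usage: CachedUsage): number {
  const freshInput = usage.inputTokens - usage.cachedInputTokens;
  return (
    (freshInput * rate.input +
      usage.cachedInputTokens * rate.cachedInput +
      usage.outputTokens * rate.output) /
    1_000_000
  );
}

// Placeholder rates: cached input billed at half the normal input rate.
const cachedCost = costWithCache(
  { input: 2, cachedInput: 1, output: 8 },
  { inputTokens: 1_000_000, cachedInputTokens: 400_000, outputTokens: 100_000 },
); // 1.2 + 0.4 + 0.8 USD
```

Without the cache split, the same call would be priced at 2.0 + 0.8 = 2.8 USD, so ignoring the discount overstates spend and can trip a cap early.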

Does it count tokens for me?

No — it expects you to pass token counts (from the API response, or your own pre-flight estimate via tiktoken / @anthropic-ai/tokenizer). This keeps the package zero-dependency and accurate.

What happens on rate limit / 5xx errors?

The wrapped call propagates the error untouched. Nothing is recorded if the response doesn't include usage. This means failed calls don't count against your budget — exactly what you want.

Is it safe for multi-process servers?

The default MemoryStorage is per-process. Use FileStorage for single-host setups or implement Storage against Redis / Postgres for distributed apps.


API reference

new BudgetGuard(options)

| Option | Type | Default |
|--------|------|---------|
| limits | BudgetLimits | required |
| onExceeded | "throw" \| "warn" \| "block" | "throw" |
| storage | Storage | MemoryStorage |
| pricing | Record<string, ModelPricing> | — |
| onUnknownModel | "throw" \| "zero" | "throw" |
| onSpend | (event: SpendEvent) => void | — |

Methods

  • guard.wrap(args, call, extract?) — pre-check, run, record. Returns the call's result.
  • guard.estimate(args) — pre-check only. Throws on violation.
  • guard.check(args) — record actual usage. Throws on violation.
  • guard.for(scope, limits?) — scoped child guard (per-user / per-route).
  • guard.usage(scope?) — current { day, month, total, requests }.
  • guard.reset(scope?) — clear usage for a scope (or all).

Examples

See examples/ for runnable scripts.


Comparison

| | llm-hard-cap | Provider dashboards | API gateway proxies |
|--|--|--|--|
| Real-time enforcement | ✅ | ❌ (delayed) | ✅ |
| Per-user / per-scope | ✅ | ❌ | partial |
| Zero infrastructure | ✅ | ✅ | ❌ (extra hop) |
| Works across providers | ✅ | one each | ✅ |
| Refuses calls before request | ✅ | ❌ | ✅ |
| Bundle size | < 4 KB | n/a | n/a |


Contributing

Issues and PRs welcome. Pricing updates appreciated — providers change rates often.

License

MIT