token-budget-guard

v1.0.1

Control token usage before expensive AI API calls

A lightweight utility to enforce token budgets for LLM requests and prevent cost spikes, latency issues, and runaway context growth.

It works with multiple AI providers and can automatically trim context, warn, or fail fast when a request exceeds a configured token limit.

Install

npm install token-budget-guard

Quick example

import { withTokenBudget } from "token-budget-guard";

await withTokenBudget({
  maxTokens: 2000,
  prompt,
  context,
  expectedOutputTokens: 200,
  strategy: "trim_context",
  call: async ({ prompt, context }) => aiClient(prompt, context),
});

Usage

import { withTokenBudget } from "token-budget-guard";

const { result, usage } = await withTokenBudget({
  model: "gpt-4",
  maxTokens: 8000,
  prompt,
  context,
  expectedOutputTokens: 500,
  strategy: "trim_context",
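  call: async ({ prompt, context }) => {
    // Join the prompt and the (possibly trimmed) context into one string:
    // message content must be a string or typed parts, not an array of bare strings.
    return client.responses.create({
      model: "gpt-4",
      input: [{ role: "user", content: [prompt, ...context].join("\n\n") }],
    });
  },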
});

console.log(usage);

Features

  • Token estimation with lightweight heuristic (~4 chars/token)
  • Budget enforcement before expensive AI API calls
  • Multiple strategies: fail-fast, trim_context, warn_only
  • Provider adapters (OpenAI, Anthropic, Gemini, Bedrock, Azure, Cohere)
  • Optional cost estimation for supported providers

Compatible AI Providers

  • OpenAI
  • Anthropic
  • Google Gemini
  • AWS Bedrock
  • Azure OpenAI
  • Cohere

Why

Tokens affect cost, latency, and reliability. This utility makes token usage explicit and enforceable. Monitoring per-request usage helps detect cost spikes, inefficient prompts, and unexpected growth in context size.

Use cases

  • Prevent unexpected token cost spikes
  • Control prompt + context growth
  • Add guardrails to production AI APIs
  • Monitor token usage per request
  • Enforce safe token budgets in LLM pipelines

How it works

prompt + context
      ↓
token estimation
      ↓
budget check
      ↓
strategy applied
  • fail-fast
  • trim_context
  • warn_only
      ↓
safe AI API call
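
The flow above can be pictured as one small function. This is a minimal sketch, not the library's source: applyBudget, BudgetInput, and BudgetDecision are hypothetical names, the "fail_fast" spelling is an assumption (only "trim_context" and "warn_only" appear in the examples), and dropping the oldest context items first is also an assumption.

```typescript
// Hypothetical sketch of the estimate -> check -> strategy flow.
type Strategy = "fail_fast" | "trim_context" | "warn_only";

interface BudgetInput {
  maxTokens: number;
  prompt: string;
  context: string[];
  expectedOutputTokens: number;
  strategy: Strategy;
}

interface BudgetDecision {
  context: string[];    // context that would be passed to the API call
  totalTokens: number;  // estimated prompt + context + expected output
  removedCount: number; // context items dropped by trimming
  overBudget: boolean;  // true if the estimate still exceeds maxTokens
}

// Rough heuristic from this README: about 4 characters per token.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

function totalFor(input: BudgetInput, context: string[]): number {
  const contextTokens = context.reduce((sum, item) => sum + estimateTokens(item), 0);
  return estimateTokens(input.prompt) + contextTokens + input.expectedOutputTokens;
}

function applyBudget(input: BudgetInput): BudgetDecision {
  let context = [...input.context];
  let totalTokens = totalFor(input, context);

  // Within budget: nothing to do.
  if (totalTokens <= input.maxTokens) {
    return { context, totalTokens, removedCount: 0, overBudget: false };
  }

  if (input.strategy === "fail_fast") {
    throw new Error(`Estimated ${totalTokens} tokens exceeds budget of ${input.maxTokens}`);
  }

  if (input.strategy === "warn_only") {
    console.warn(`Over budget: ${totalTokens} > ${input.maxTokens}`);
    return { context, totalTokens, removedCount: 0, overBudget: true };
  }

  // trim_context. Assumption: drop the oldest (first) items until the estimate fits.
  // The prompt and expected output tokens are never reduced.
  while (context.length > 0 && totalTokens > input.maxTokens) {
    context = context.slice(1);
    totalTokens = totalFor(input, context);
  }
  return {
    context,
    totalTokens,
    removedCount: input.context.length - context.length,
    overBudget: totalTokens > input.maxTokens,
  };
}
```

Note that trimming can still leave a request over budget when the prompt and expected output alone exceed maxTokens, which is why the sketch reports overBudget separately.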

Token estimation

Uses a rough heuristic (~4 chars/token). Counts may differ from model-specific tokenizers, especially for non-English text or code/JSON.
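
To make the heuristic concrete, here is a minimal sketch of a 4-characters-per-token estimator and a per-request breakdown. Rounding up, counting by string length, and every field name other than totalTokens are assumptions for illustration; the library's internals may differ.

```typescript
// Heuristic from this README: roughly 4 characters per token.
const CHARS_PER_TOKEN = 4;

// Assumption: count UTF-16 code units and round up.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

interface UsageEstimate {
  promptTokens: number;
  contextTokens: number;
  expectedOutputTokens: number;
  totalTokens: number;
}

// Sum the estimate for the prompt, every context item, and the expected output.
function estimateUsage(
  prompt: string,
  context: string[],
  expectedOutputTokens: number
): UsageEstimate {
  const promptTokens = estimateTokens(prompt);
  const contextTokens = context.reduce((sum, item) => sum + estimateTokens(item), 0);
  return {
    promptTokens,
    contextTokens,
    expectedOutputTokens,
    totalTokens: promptTokens + contextTokens + expectedOutputTokens,
  };
}

const usage = estimateUsage(
  "Summarize the following notes.",
  ["Note one.", "Note two, slightly longer."],
  200
);
console.log(usage);
```

Because the estimate depends only on length, dense JSON or non-English text of the same length gets the same count as plain English, which is where a model-specific tokenizer is likely to disagree.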

Planned improvements

  • Custom tokenizer adapters
  • Observability hooks
  • Cost estimation
  • Provider integrations

Real examples

Custom tokenizer adapter + hooks

import { withTokenBudget } from "token-budget-guard";

const { result, usage } = await withTokenBudget({
  model: "gpt-4",
  maxTokens: 2000,
  prompt,
  context,
  expectedOutputTokens: 300,
  tokenizer: {
    name: "tiktoken",
    estimate: (text, model) => myTokenizerEstimate(text, model),
  },
  onWarn: ({ reason, usage, maxTokens }) => {
    if (reason === "over_budget") {
      console.warn("Over budget", usage.totalTokens, maxTokens);
    }
  },
  onTrim: ({ removedCount, trimmedContext }) => {
    console.info("Trimmed", removedCount, "items", trimmedContext);
  },
  strategy: "trim_context",
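  call: async ({ prompt, context }) => {
    // Join the prompt and the (possibly trimmed) context into one string:
    // message content must be a string or typed parts, not an array of bare strings.
    return client.responses.create({
      model: "gpt-4",
      input: [{ role: "user", content: [prompt, ...context].join("\n\n") }],
    });
  },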
});

console.log(usage);

Warn-only behavior

await withTokenBudget({
  model: "gpt-4",
  maxTokens: 100,
  prompt,
  context,
  expectedOutputTokens: 20,
  strategy: "warn_only",
  onWarn: ({ usage, maxTokens }) => {
    console.warn("Over budget", usage.totalTokens, maxTokens);
  },
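  call: async ({ prompt, context }) => {
    // warn_only does not trim, so the full context is sent; join it with the
    // prompt into one string rather than passing an array of bare strings.
    return client.responses.create({
      model: "gpt-4",
      input: [{ role: "user", content: [prompt, ...context].join("\n\n") }],
    });
  },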
});

Cost estimation

const { usage, cost } = await withTokenBudget({
  model: "gpt-4o-mini",
  maxTokens: 2000,
  prompt,
  expectedOutputTokens: 200,
  cost: {
    provider: "openai",
    model: "gpt-4o-mini",
    pricingOverrides: {
      "gpt-4o-mini": { inputPer1M: 0.15, outputPer1M: 0.6, currency: "USD" },
    },
  },
  call: async ({ prompt }) => client.responses.create({ model: "gpt-4o-mini", input: prompt }),
});

console.log(cost);
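
The per-1M pricing fields above imply simple arithmetic. The sketch below shows one way to compute it, assuming cost is the token count divided by one million times the matching rate; estimateCost and the shape of its return value are hypothetical and may not match the cost object the library returns.

```typescript
// Pricing shape taken from the pricingOverrides example above.
interface Pricing {
  inputPer1M: number;  // price per 1,000,000 input tokens
  outputPer1M: number; // price per 1,000,000 output tokens
  currency: string;
}

// Hypothetical helper: per-request cost from estimated token counts.
function estimateCost(inputTokens: number, outputTokens: number, pricing: Pricing) {
  const inputCost = (inputTokens / 1_000_000) * pricing.inputPer1M;
  const outputCost = (outputTokens / 1_000_000) * pricing.outputPer1M;
  return {
    inputCost,
    outputCost,
    totalCost: inputCost + outputCost,
    currency: pricing.currency,
  };
}

// 1,800 input tokens and 200 output tokens at the rates from the example above.
const pricing: Pricing = { inputPer1M: 0.15, outputPer1M: 0.6, currency: "USD" };
const estimate = estimateCost(1800, 200, pricing);
console.log(estimate.totalCost.toFixed(5), estimate.currency); // 0.00039 USD
```

Working it by hand: 1,800 / 1,000,000 × 0.15 = 0.00027 and 200 / 1,000,000 × 0.6 = 0.00012, for a total of 0.00039 USD.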

Provider adapters

Supported adapters: OpenAI, Anthropic, Gemini, AWS Bedrock, Azure OpenAI, Cohere. Each adapter provides a default payload mapping and accepts a toPayload override when you need to customize the request shape.

Default mappings (high level):

  • OpenAI: messages[] (system/user/assistant), max_output_tokens
  • Anthropic: messages[] + optional system, max_tokens
  • Gemini: contents[] + generationConfig.maxOutputTokens
  • Bedrock: { modelId, body: JSON.stringify({ prompt, context, max_output_tokens }) }
  • Azure OpenAI: deployment + messages[], max_output_tokens
  • Cohere: message + chat_history[], max_output_tokens

Example with each adapter:

import {
  withOpenAI,
  withAnthropic,
  withGemini,
  withBedrock,
  withAzureOpenAI,
  withCohere,
} from "token-budget-guard";

const budget = {
  maxTokens: 2000,
  prompt,
  context: ["Earlier assistant message", "Follow-up detail"],
  expectedOutputTokens: 200,
};
const systemPrompt = "You are a helpful assistant.";

// OpenAI (messages[] + max_output_tokens)
const openai = await withOpenAI({
  ...budget,
  model: "gpt-4o-mini",
  systemPrompt,
  call: async (payload) => openaiClient(payload),
});

// Anthropic (messages[] + system + max_tokens)
const anthropic = await withAnthropic({
  ...budget,
  model: "claude-3-5-sonnet-latest",
  systemPrompt,
  call: async (payload) => anthropicClient(payload),
});

// Gemini (contents[] + generationConfig.maxOutputTokens)
const gemini = await withGemini({
  ...budget,
  model: "gemini-1.5-flash",
  call: async (payload) => geminiClient(payload),
});

// Bedrock (modelId + JSON body)
const bedrock = await withBedrock({
  ...budget,
  model: "amazon.titan-text-lite-v1",
  call: async (payload) => bedrockClient(payload),
});

// Azure OpenAI (deployment + messages[] + max_output_tokens)
const azure = await withAzureOpenAI({
  ...budget,
  model: "gpt-4o-mini",
  deployment: "my-deployment",
  systemPrompt,
  call: async (payload) => azureClient(payload),
});

// Cohere (message + chat_history[] + max_output_tokens)
const cohere = await withCohere({
  ...budget,
  model: "command-r",
  call: async (payload) => cohereClient(payload),
});

console.log(
  openai.usage,
  anthropic.usage,
  gemini.usage,
  bedrock.usage,
  azure.usage,
  cohere.usage
);
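
Each default mapping can be thought of as a small pure function from the budget fields to a provider request body. The sketch below illustrates the Anthropic-style shape (messages[] + optional system + max_tokens). toAnthropicStylePayload is a hypothetical helper, not the library's toPayload signature, and folding context into a single user message ahead of the prompt is an assumption.

```typescript
// Request body shape following the "messages[] + optional system, max_tokens" mapping.
interface AnthropicStylePayload {
  model: string;
  system?: string;
  max_tokens: number;
  messages: { role: "user" | "assistant"; content: string }[];
}

interface MappingArgs {
  model: string;
  prompt: string;
  context: string[];
  expectedOutputTokens: number;
  systemPrompt?: string;
}

// Hypothetical mapping function; the library's real default may differ.
function toAnthropicStylePayload(args: MappingArgs): AnthropicStylePayload {
  // Assumption: context items are placed before the prompt in one user message.
  const content = [...args.context, args.prompt].join("\n\n");
  return {
    model: args.model,
    // Only include `system` when a system prompt was provided.
    ...(args.systemPrompt ? { system: args.systemPrompt } : {}),
    // The expected output size doubles as the output cap sent to the provider.
    max_tokens: args.expectedOutputTokens,
    messages: [{ role: "user", content }],
  };
}

const payload = toAnthropicStylePayload({
  model: "claude-3-5-sonnet-latest",
  prompt: "What changed since the last message?",
  context: ["Earlier assistant message", "Follow-up detail"],
  expectedOutputTokens: 200,
  systemPrompt: "You are a helpful assistant.",
});
console.log(JSON.stringify(payload, null, 2));
```

A custom toPayload override would presumably replace a function like this one when the request needs a different shape.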

Limitations

  • Heuristic estimation (~4 chars/token) can diverge from model tokenizers, especially for non-English text or code/JSON.
  • Only context is trimmed; prompt and expected output tokens are never reduced automatically.
  • If a custom tokenizer throws or returns an invalid value, the heuristic fallback is used.
  • Cost estimates rely on defaults or overrides and may become stale; verify prices for your provider/region.
  • Bedrock and Azure pricing vary by region and model; defaults may be incomplete.
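
The tokenizer fallback described above can be sketched as a guarded call. This is an illustration under assumptions: safeEstimate is a hypothetical name, the adapter shape follows the tokenizer option shown earlier (name plus estimate(text, model)), and treating NaN, Infinity, and negative numbers as invalid is a guess at what "invalid value" covers.

```typescript
// Adapter shape from the "Custom tokenizer adapter + hooks" example.
interface TokenizerAdapter {
  name: string;
  estimate: (text: string, model?: string) => number;
}

// Heuristic fallback: roughly 4 characters per token.
const heuristicEstimate = (text: string): number => Math.ceil(text.length / 4);

// Hypothetical wrapper: prefer the custom tokenizer, fall back to the heuristic.
function safeEstimate(text: string, model?: string, tokenizer?: TokenizerAdapter): number {
  if (!tokenizer) return heuristicEstimate(text);
  try {
    const value = tokenizer.estimate(text, model);
    // Assumption: only finite, non-negative numbers count as valid.
    if (Number.isFinite(value) && value >= 0) return value;
  } catch {
    // A throwing tokenizer falls through to the heuristic below.
  }
  return heuristicEstimate(text);
}

const broken: TokenizerAdapter = {
  name: "broken",
  estimate: () => {
    throw new Error("tokenizer unavailable");
  },
};
console.log(safeEstimate("Hello, world!", "gpt-4", broken)); // 4 (heuristic: 13 chars / 4, rounded up)
```

One consequence worth knowing: a silently failing tokenizer produces heuristic counts, so budgets tuned against exact token counts may behave differently than expected.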

Keywords

  • ai
  • llm
  • openai
  • anthropic
  • token-budget
  • token-estimation
  • ai-cost-control
  • ai-observability
  • developer-tools
  • nodejs

Support

If this library helps you:

⭐ Star the repository
☕ Buy me a coffee: https://buymeacoffee.com/mostafahanafy

License

MIT