# llm-token-budget

v1.0.0

Token budget manager for LLM applications — track usage, enforce limits, estimate costs per user/session. Zero dependencies. Works with OpenAI, Anthropic, Groq, Ollama, and more.
## Why?
When building LLM-powered applications, you quickly face these problems:
- Cost overruns — one user sends 10,000 messages and your bill explodes
- No visibility — which user/session is using the most tokens?
- No enforcement — you want per-user free tiers and paid quotas
- Multi-model complexity — different pricing for GPT-4o vs Claude vs Groq
llm-token-budget solves all four with a simple, zero-dependency API.
## Install

```bash
npm install llm-token-budget
```

## Quick Start
```js
const { BudgetManager } = require('llm-token-budget');

// Create a manager with per-user limits
const budget = new BudgetManager({
  perUserTokenLimit: 100_000, // 100K tokens per user
  perUserCostLimit: 1.00,     // $1.00 max spend per user
  defaultModel: 'gpt-4o-mini',
});

// Before calling your LLM — check if the user is within budget
const check = budget.check({
  estimatedInputTokens: 500,
  estimatedOutputTokens: 500,
  userKey: 'user-123',
  model: 'gpt-4o-mini',
});

if (!check.allowed) {
  console.log(`Blocked: ${check.reason}`);
  // → "user_token_limit_exceeded" or "user_cost_limit_exceeded"
} else {
  // Call your LLM here...
  const response = await openai.chat.completions.create({ ... });

  // After the call — record actual usage
  budget.record({
    inputTokens: response.usage.prompt_tokens,
    outputTokens: response.usage.completion_tokens,
    userKey: 'user-123',
    model: 'gpt-4o-mini',
  });
}
```

## API Reference
### new BudgetManager(options?)
| Option | Type | Description |
|--------|------|-------------|
| globalTokenLimit | number | Max total tokens across ALL users |
| globalCostLimit | number | Max total cost (USD) across ALL users |
| perUserTokenLimit | number | Max tokens per user key |
| perUserCostLimit | number | Max cost (USD) per user key |
| perSessionTokenLimit | number | Max tokens per session |
| defaultModel | string | Default model for cost calculation |
| customPricing | object | Custom pricing overrides { model: { input, output } } |
| resetIntervalMs | number | Auto-reset all counters every N ms (e.g. 86400000 for daily) |
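As a sketch, the options above can be combined into a single configuration object. The values below are illustrative only, not recommendations, and the per-1M-token pricing unit for `customPricing` is inferred from the `calculateCost` example later in this README:

```js
// Illustrative BudgetManager options combining several limits (example values only)
const options = {
  globalTokenLimit: 10_000_000,           // hard token cap across ALL users
  globalCostLimit: 50.00,                 // $50 total spend cap
  perUserTokenLimit: 100_000,             // per-user token cap
  perUserCostLimit: 1.00,                 // $1.00 per user
  perSessionTokenLimit: 20_000,           // per-session token cap
  defaultModel: 'gpt-4o-mini',            // used when check/record omit `model`
  customPricing: {
    // assumed unit: USD per 1M tokens, matching the built-in pricing examples
    'my-fine-tune': { input: 0.5, output: 1.5 },
  },
  resetIntervalMs: 24 * 60 * 60 * 1000,   // reset all counters daily
};

console.log(options.resetIntervalMs); // 86400000
```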
### .check(params) → { allowed, reason?, wouldCost }
Pre-flight check before making an LLM call.
```js
const result = budget.check({
  estimatedInputTokens: 500,
  estimatedOutputTokens: 500,
  userKey: 'user-abc',        // optional
  sessionId: 'sess-xyz',      // optional
  model: 'claude-3-5-haiku',  // optional, uses defaultModel if omitted
});

// result.allowed   → true | false
// result.reason    → 'user_token_limit_exceeded' | 'user_cost_limit_exceeded' | ...
// result.wouldCost → estimated cost in USD
```

Limit reasons:

- `global_token_limit_exceeded`
- `global_cost_limit_exceeded`
- `user_token_limit_exceeded`
- `user_cost_limit_exceeded`
- `session_token_limit_exceeded`
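When surfacing these reasons to end users, a lookup table keeps the handling in one place. A minimal sketch (the messages here are our own, not part of the library):

```js
// Hypothetical mapping from limit reasons to user-facing messages
const REASON_MESSAGES = {
  global_token_limit_exceeded: 'Service-wide token budget reached. Try again later.',
  global_cost_limit_exceeded: 'Service-wide spend cap reached. Try again later.',
  user_token_limit_exceeded: 'You have used your token quota.',
  user_cost_limit_exceeded: 'You have used your spend quota.',
  session_token_limit_exceeded: 'This session hit its token quota. Start a new session.',
};

function messageFor(reason) {
  return REASON_MESSAGES[reason] ?? 'Budget exceeded.';
}

console.log(messageFor('user_token_limit_exceeded')); // "You have used your token quota."
```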
### .record(params) → { cost, global }
Record actual token usage after a successful LLM call.
```js
budget.record({
  inputTokens: response.usage.prompt_tokens,
  outputTokens: response.usage.completion_tokens,
  userKey: 'user-abc',
  sessionId: 'sess-xyz',
  model: 'gpt-4o',
});
```

### .getUserStats(userKey) → UsageStats

```js
const stats = budget.getUserStats('user-abc');
// { inputTokens, outputTokens, totalCost, requests, blocked }
```

### .getGlobalStats() → GlobalStats

```js
const stats = budget.getGlobalStats();
// { inputTokens, outputTokens, totalTokens, totalCost, requests }
```

### .getAllUserStats() → Array<{ userKey, stats }>

Returns all users sorted by cost (highest first) — useful for admin dashboards.
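For example, assuming the `{ userKey, stats }` shape described above, a simple top-spenders view can be formatted like this (the sample data below is invented for illustration):

```js
// Sample of the shape .getAllUserStats() is documented to return (data invented here)
const allStats = [
  { userKey: 'user-abc', stats: { inputTokens: 9_000, outputTokens: 3_000, totalCost: 0.12, requests: 40, blocked: 2 } },
  { userKey: 'user-def', stats: { inputTokens: 1_000, outputTokens: 500, totalCost: 0.01, requests: 5, blocked: 0 } },
];

// Already sorted by cost (highest first); format one dashboard row per user
const rows = allStats.map(({ userKey, stats }) =>
  `${userKey}: $${stats.totalCost.toFixed(2)} (${stats.requests} requests, ${stats.blocked} blocked)`);

console.log(rows.join('\n'));
// user-abc: $0.12 (40 requests, 2 blocked)
// user-def: $0.01 (5 requests, 0 blocked)
```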
### .middleware(options?) — Express/Fastify Middleware
Drop-in middleware for Express, Fastify, or any Connect-compatible framework.
```js
const express = require('express');
const { BudgetManager } = require('llm-token-budget');

const app = express();
const budget = new BudgetManager({ perUserTokenLimit: 50_000 });

app.use('/api/chat', budget.middleware({
  getUserKey: (req) => req.user?.id || req.ip,
  getSessionId: (req) => req.headers['x-session-id'],
  getModel: (req) => req.body?.model || 'gpt-4o-mini',
  onBlocked: (req, res, reason) => {
    res.status(429).json({
      error: 'quota_exceeded',
      reason,
      message: 'You have exceeded your token quota. Upgrade to Pro for more.',
    });
  },
}));
```

When blocked, the default response is:

```json
{
  "error": "budget_exceeded",
  "reason": "user_token_limit_exceeded",
  "message": "Token budget exceeded. Please try again later or upgrade your plan."
}
```

## Utility Functions
### estimateTokens(text) → number
Quick token estimate with no external dependencies (accuracy roughly ±15%). For production-grade counts, pair it with tiktoken or a model-specific tokenizer.
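The package does not document its heuristic, but a common dependency-free approximation for English text is about four characters per token. A sketch of that idea (our own, not the library's actual implementation):

```js
// Rough character-based token estimate (~4 chars per token for English text).
// This is a sketch of the general heuristic, not llm-token-budget's internals.
function roughEstimateTokens(text) {
  return Math.ceil(text.length / 4);
}

console.log(roughEstimateTokens('Hello, how are you today?')); // 25 chars → 7
```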
```js
const { estimateTokens } = require('llm-token-budget');
estimateTokens('Hello, how are you today?'); // → ~7
```

### calculateCost(inputTokens, outputTokens, model) → CostResult

```js
const { calculateCost } = require('llm-token-budget');
calculateCost(10_000, 2_000, 'gpt-4o');
// → { inputCost: 0.025, outputCost: 0.02, totalCost: 0.045, currency: 'USD' }
```

## Supported Models (Built-in Pricing)

| Provider | Models |
|----------|--------|
| OpenAI | gpt-4o, gpt-4o-mini, gpt-4-turbo, gpt-3.5-turbo, o1, o1-mini, o3-mini |
| Anthropic | claude-3-5-sonnet, claude-3-5-haiku, claude-3-opus, claude-sonnet-4, claude-haiku-4 |
| Groq | llama-3.3-70b, llama-3.1-8b, mixtral-8x7b, gemma2-9b |
| Ollama | ollama (local, $0 cost) |

Custom models: pass the customPricing option or calculateCost(i, o, 'my-model', { 'my-model': { input: 0.5, output: 1.5 } }).
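The gpt-4o numbers above imply that pricing is expressed in USD per 1M tokens (10,000 input tokens at $2.50/1M is $0.025; 2,000 output tokens at $10.00/1M is $0.02). The underlying arithmetic can be sketched as:

```js
// Sketch of the cost arithmetic, assuming prices are USD per 1M tokens
// ($2.50 input / $10.00 output per 1M matches the gpt-4o example above).
function cost(inputTokens, outputTokens, pricing) {
  const inputCost = (inputTokens / 1_000_000) * pricing.input;
  const outputCost = (outputTokens / 1_000_000) * pricing.output;
  return { inputCost, outputCost, totalCost: inputCost + outputCost };
}

console.log(cost(10_000, 2_000, { input: 2.50, output: 10.00 }));
// → inputCost ≈ 0.025, outputCost ≈ 0.02, totalCost ≈ 0.045
```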
## Patterns

### Daily Reset (Free Tier)
```js
const budget = new BudgetManager({
  perUserTokenLimit: 10_000,             // 10K tokens/day free
  resetIntervalMs: 24 * 60 * 60 * 1000,  // reset daily
});
```

### Per-User Pricing Tiers

```js
function getLimitForUser(user) {
  if (user.plan === 'pro') return { perUserTokenLimit: 1_000_000 };
  if (user.plan === 'free') return { perUserTokenLimit: 10_000 };
  return { perUserTokenLimit: 0 }; // blocked
}

// Create a separate BudgetManager per tier, or check dynamically
```

### Multi-tenant SaaS

```js
const budget = new BudgetManager({
  globalCostLimit: 100.00,  // $100/month hard cap
  perUserCostLimit: 5.00,   // $5/user/month
  defaultModel: 'gpt-4o-mini',
});

// Admin dashboard
app.get('/admin/usage', (req, res) => {
  res.json({
    global: budget.getGlobalStats(),
    users: budget.getAllUserStats(),
  });
});
```

## TypeScript
Full TypeScript support included (no @types/ package needed).
```ts
import { BudgetManager, BudgetManagerOptions, CheckResult } from 'llm-token-budget';

const options: BudgetManagerOptions = {
  perUserTokenLimit: 100_000,
  defaultModel: 'gpt-4o',
};

const budget = new BudgetManager(options);
const result: CheckResult = budget.check({ estimatedInputTokens: 500, userKey: 'user-1' });
```

## Comparison

| Feature | llm-token-budget | Manual counters | Commercial solutions |
|---------|------------------|-----------------|----------------------|
| Zero dependencies | ✅ | ✅ | ❌ |
| Per-user limits | ✅ | Manual | ✅ |
| Cost tracking | ✅ | Manual | ✅ |
| Express middleware | ✅ | Manual | ✅ |
| TypeScript support | ✅ | Varies | ✅ |
| Open source | ✅ | ✅ | ❌ |
| Free | ✅ | ✅ | ❌ |
## Contributing
Issues and PRs welcome at github.com/mariusfit/llm-token-budget.
## License
MIT © 2026 Marius Fit
