# llm-cost-guard
v1.2.3
Hard budget limits and kill switches for LLM API calls. No more surprise bills.
Drop-in spend tracking, rolling-window budgets, and automatic kill switches for any LLM provider. Wraps OpenAI, Anthropic, and Gemini SDKs transparently — or track usage manually. Zero runtime dependencies.
## Install

```sh
npm install llm-cost-guard
```

## Quick Start

### Track a single call

```js
import { createGuard } from "llm-cost-guard";

const guard = createGuard({
  budgets: [{ id: "global-hourly", limitUsd: 50, windowMs: 60 * 60 * 1000 }],
});

const result = await guard.track({
  model: "gpt-5",
  inputTokens: 1200,
  outputTokens: 800,
  userId: "u_123",
  feature: "chat",
});

console.log(`Cost: $${result.event.costUsd.toFixed(6)}`);
```

### Wrap an OpenAI client (auto-tracking)
```js
import OpenAI from "openai";
import { createGuard } from "llm-cost-guard";

const guard = createGuard({
  budgets: [
    { id: "global", limitUsd: 100, windowMs: 3_600_000 },
    { id: "per-user", limitUsd: 10, windowMs: 86_400_000, scopeBy: "user" },
  ],
});

const openai = guard.wrap(new OpenAI(), { userId: "u_42", feature: "assistant" });

// Every call is tracked automatically — budget enforced on every response
await openai.responses.create({ model: "gpt-5", input: "Summarize this transcript" });
```

### Listen for alerts and kills
```js
guard.onBudgetAlert((alert) => {
  console.warn(`⚠️ [${alert.thresholdPercent}%] ${alert.scopeKey}: $${alert.usageUsd.toFixed(4)} / $${alert.limitUsd}`);
});

guard.onKill((event) => {
  console.error(`🛑 Kill switch: ${event.scopeKey} blew past $${event.limitUsd}`);
});
```

### Query usage summaries
```js
const usage = await guard.getUsage({ windowMs: 3_600_000 });

console.log(`Last hour: $${usage.totalSpendUsd.toFixed(4)} across ${usage.totalCalls} calls`);
console.log("By model:", usage.byModel);
console.log("By user:", usage.byUser);
```

### Express middleware precheck
```js
import express from "express";
import { createGuard, createExpressMiddleware } from "llm-cost-guard";

const guard = createGuard({
  budgets: [{ id: "api", limitUsd: 500, windowMs: 86_400_000 }],
});

const app = express();

app.use(
  createExpressMiddleware(guard, {
    precheck: { enabled: true, maxSpendUsd: 500, windowMs: 86_400_000 },
    overBudgetStatusCode: 429,
    overBudgetMessage: "Daily LLM budget exceeded",
  })
);
```

## Budget Rules
Budget rules are the core of llm-cost-guard. Each rule defines a spending limit over a rolling time window.
```js
const guard = createGuard({
  budgets: [
    // $100/hour global limit
    { id: "global", limitUsd: 100, windowMs: 3_600_000 },
    // $10/day per user
    { id: "user-daily", limitUsd: 10, windowMs: 86_400_000, scopeBy: "user" },
    // $25/hour per feature
    { id: "feature-hourly", limitUsd: 25, windowMs: 3_600_000, scopeBy: "feature" },
    // $5/day per user+feature combo
    { id: "user-feature", limitUsd: 5, windowMs: 86_400_000, scopeBy: "user_feature" },
    // Scope to a specific model
    { id: "expensive-model", limitUsd: 50, windowMs: 3_600_000, model: "gpt-5.2-pro" },
    // Alert only (no kill switch)
    { id: "soft-limit", limitUsd: 200, windowMs: 86_400_000, killSwitch: false },
  ],
});
```

| Field | Type | Default | Description |
|---|---|---|---|
| id | string | auto | Rule identifier (used in alert scope keys) |
| limitUsd | number | required | Spending limit in USD |
| windowMs | number | required | Rolling time window in milliseconds |
| scopeBy | "global" \| "user" \| "feature" \| "user_feature" | "global" | How to partition spend |
| model | string | — | Restrict rule to a specific model |
| userId | string | — | Restrict rule to a specific user |
| feature | string | — | Restrict rule to a specific feature |
| killSwitch | boolean | true | Throw BudgetExceededError when limit is breached |
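Conceptually, a rolling-window rule is just a sum of event costs newer than `now - windowMs`, compared against the limit. A minimal standalone sketch of that idea (illustrative only — `isOverBudget` and the event shape below are not library exports):

```javascript
// Illustrative sketch of a rolling-window budget check.
// Events carry { createdAt, costUsd }, mirroring tracked usage events.
function isOverBudget(events, rule, now = Date.now()) {
  const windowStart = now - rule.windowMs;
  const spend = events
    .filter((e) => e.createdAt >= windowStart) // only events inside the window
    .reduce((sum, e) => sum + e.costUsd, 0);
  return spend > rule.limitUsd;
}

const rule = { limitUsd: 10, windowMs: 3_600_000 }; // $10/hour
const now = Date.now();
const events = [
  { createdAt: now - 1_000, costUsd: 6 },       // inside the window
  { createdAt: now - 10_000, costUsd: 5 },      // inside the window
  { createdAt: now - 7_200_000, costUsd: 100 }, // two hours ago (ignored)
];
console.log(isOverBudget(events, rule, now)); // true: $11 in the last hour
```

Because the window rolls, spend "frees up" automatically as old events age out — no reset schedule needed.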
## SDK Wrapping
guard.wrap() creates a transparent proxy around any LLM SDK client. It intercepts responses, extracts token usage, and tracks spend automatically.
```js
// OpenAI
const openai = guard.wrap(new OpenAI(), { userId: "u_1" });

// Anthropic
const anthropic = guard.wrap(new Anthropic(), { feature: "summarizer" });

// Any client that returns { usage: { input_tokens, output_tokens } }
const client = guard.wrap(myClient, {
  metadataExtractor: (args) => ({
    userId: args[0]?.metadata?.userId,
    feature: "custom",
  }),
});
```

Supported usage response formats:
- `usage.prompt_tokens` / `usage.completion_tokens` (OpenAI)
- `usage.input_tokens` / `usage.output_tokens` (Anthropic)
- `usageMetadata.promptTokenCount` / `usageMetadata.candidatesTokenCount` (Gemini)
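All three shapes reduce to the same two numbers; a hypothetical sketch of how such normalization can work (`normalizeUsage` is illustrative, not a library export or its actual internals):

```javascript
// Normalize the three documented usage shapes to { inputTokens, outputTokens }.
// Illustrative only — not the library's internals.
function normalizeUsage(response) {
  const u = response.usage ?? response.usageMetadata ?? {};
  const inputTokens =
    u.prompt_tokens ?? u.input_tokens ?? u.promptTokenCount ?? 0;
  const outputTokens =
    u.completion_tokens ?? u.output_tokens ?? u.candidatesTokenCount ?? 0;
  return { inputTokens, outputTokens };
}

// OpenAI-style response
console.log(normalizeUsage({ usage: { prompt_tokens: 12, completion_tokens: 3 } }));
// → { inputTokens: 12, outputTokens: 3 }

// Gemini-style response
console.log(normalizeUsage({ usageMetadata: { promptTokenCount: 5, candidatesTokenCount: 1 } }));
// → { inputTokens: 5, outputTokens: 1 }
```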
## Alerts & Kill Switch
Threshold alerts fire at 80%, 90%, and 100% of each budget rule. Alerts only fire once per threshold per scope (they reset when usage drops below 80%).
```js
// Subscribe to alerts
const unsubscribe = guard.onBudgetAlert((alert) => {
  // alert.thresholdPercent → 80 | 90 | 100
  // alert.usageUsd → current spend
  // alert.limitUsd → budget limit
  // alert.scopeKey → e.g. "global|global" or "per-user|user:u_42"
  slack.send(`Budget alert: ${alert.scopeKey} at ${alert.thresholdPercent}%`);
});

// Subscribe to kills
guard.onKill((event) => {
  pagerduty.trigger(`LLM budget exceeded: ${event.scopeKey}`);
});

// Unsubscribe when done
unsubscribe();
```

When a budget is exceeded and `killSwitch` is enabled (the default), `guard.track()` throws a `BudgetExceededError`:
```js
import { BudgetExceededError } from "llm-cost-guard";

try {
  await guard.track({ model: "gpt-5", inputTokens: 50000, outputTokens: 20000 });
} catch (err) {
  if (err instanceof BudgetExceededError) {
    console.error(err.event.scopeKey, "→", err.message);
  }
}
```

Set `throwOnKill: false` in the config to suppress throws and handle kills via callbacks only.
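The once-per-threshold behavior described above can be sketched as a tiny state machine over already-fired thresholds (a sketch of the idea, not the library's internals — `checkThresholds` is hypothetical):

```javascript
// Fire each of 80/90/100 at most once per scope; re-arm when usage
// drops back below 80%. Illustrative only.
const THRESHOLDS = [80, 90, 100];

function checkThresholds(fired, usageUsd, limitUsd) {
  const percent = (usageUsd / limitUsd) * 100;
  if (percent < 80) {
    fired.clear(); // usage dropped back down, so re-arm all thresholds
    return [];
  }
  const newlyFired = THRESHOLDS.filter((t) => percent >= t && !fired.has(t));
  newlyFired.forEach((t) => fired.add(t));
  return newlyFired;
}

const fired = new Set();
console.log(checkThresholds(fired, 85, 100));  // [80]
console.log(checkThresholds(fired, 92, 100));  // [90]
console.log(checkThresholds(fired, 92, 100));  // [] (90 already fired)
console.log(checkThresholds(fired, 50, 100));  // [] (state reset)
console.log(checkThresholds(fired, 101, 100)); // [80, 90, 100]
```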
## Built-in Pricing
Includes pricing for 40+ models across OpenAI, Anthropic, Google, DeepSeek, and MiniMax. Override or extend:
```js
import { BUILT_IN_PRICING, calculateCostUsd } from "llm-cost-guard";

// Check a model's pricing
console.log(BUILT_IN_PRICING["gpt-5"]);
// → { inputPerMillionUsd: 1.25, outputPerMillionUsd: 10 }

// Calculate cost manually
const cost = calculateCostUsd("claude-sonnet-4-6", 5000, 2000);
console.log(`$${cost.toFixed(6)}`);

// Add custom model pricing
const guard = createGuard({
  budgets: [{ limitUsd: 100, windowMs: 3_600_000 }],
  pricing: {
    "my-fine-tune": { inputPerMillionUsd: 6, outputPerMillionUsd: 12 },
  },
});
```

Unknown models throw `UnknownModelPricingError` by default. Set `onUnknownModel: "zero"` to treat them as free.
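The cost itself is simple per-million arithmetic. A standalone sketch of the formula, using the gpt-5 rates shown above (`costUsd` here is an illustration, not the library's `calculateCostUsd`):

```javascript
// cost = (tokens / 1e6) * per-million rate, summed over input and output.
function costUsd(pricing, inputTokens, outputTokens) {
  return (
    (inputTokens / 1_000_000) * pricing.inputPerMillionUsd +
    (outputTokens / 1_000_000) * pricing.outputPerMillionUsd
  );
}

// Same token counts as the Quick Start example: 1200 in, 800 out.
const gpt5 = { inputPerMillionUsd: 1.25, outputPerMillionUsd: 10 };
console.log(costUsd(gpt5, 1200, 800).toFixed(6)); // "0.009500"
```

So the $50/hour rule from the Quick Start allows roughly 5,000 such calls per hour before the kill switch trips.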
## HTTP Middleware

Pre-check guards for Express and Fastify that reject requests before they hit your LLM code.

### Express
```js
import { createGuard, createExpressMiddleware } from "llm-cost-guard";

const guard = createGuard({
  budgets: [{ id: "api", limitUsd: 500, windowMs: 86_400_000 }],
});

app.use(
  createExpressMiddleware(guard, {
    precheck: { enabled: true, maxSpendUsd: 500, windowMs: 86_400_000 },
    userIdResolver: (req) => req.headers["x-user-id"],
    featureResolver: (req) => req.headers["x-feature"],
    overBudgetStatusCode: 429,
    overBudgetMessage: "Budget exceeded",
  })
);
```

### Fastify
```js
import { createGuard, createFastifyPreHandler } from "llm-cost-guard";

const guard = createGuard({
  budgets: [{ id: "api", limitUsd: 500, windowMs: 86_400_000 }],
});

fastify.addHook(
  "preHandler",
  createFastifyPreHandler(guard, {
    precheck: { enabled: true, maxSpendUsd: 500, windowMs: 86_400_000 },
  })
);
```

## Custom Storage
The default `MemoryStorageAdapter` is process-local, with time-window queries optimized via binary search. For distributed deployments, implement the `StorageAdapter` interface:
```ts
import { createGuard, StorageAdapter, UsageEvent, StorageQuery } from "llm-cost-guard";

class RedisStorageAdapter implements StorageAdapter {
  async append(event: UsageEvent): Promise<void> {
    // Sorted set keyed by timestamp makes range queries cheap
    await redis.zadd("llm:usage", event.createdAt, JSON.stringify(event));
  }

  async list(filter?: StorageQuery): Promise<UsageEvent[]> {
    const min = filter?.since ?? 0;
    const max = filter?.until ?? "+inf";
    const raw = await redis.zrangebyscore("llm:usage", min, max);
    // matchesFilter: your helper applying the remaining filter fields
    return raw.map((s: string) => JSON.parse(s)).filter((e: UsageEvent) => matchesFilter(e, filter));
  }

  async reset(): Promise<void> {
    await redis.del("llm:usage");
  }
}

const guard = createGuard({
  budgets: [{ limitUsd: 100, windowMs: 3_600_000 }],
  storage: new RedisStorageAdapter(),
});
```

## Input Validation
guard.track() validates all inputs. Invalid calls throw InvalidTrackInputError:
```js
import { InvalidTrackInputError } from "llm-cost-guard";

try {
  await guard.track({ model: "", inputTokens: -1, outputTokens: NaN });
} catch (err) {
  if (err instanceof InvalidTrackInputError) {
    console.error(err.message);
    // "model must be a non-empty string"
  }
}
```

## CLI
```sh
npx llm-cost-guard status
```

Prints built-in model pricing. Useful for verifying what's included.
## Configuration Reference
| Option | Type | Default | Description |
|---|---|---|---|
| budgets | BudgetRule[] | required | Budget rules to enforce |
| pricing | PricingCatalog | built-in catalog | Override or extend model pricing |
| storage | StorageAdapter | MemoryStorageAdapter | Pluggable usage event storage |
| throwOnKill | boolean | true | Throw BudgetExceededError on budget breach |
| onUnknownModel | "error" \| "zero" | "error" | Behavior for models not in pricing catalog |
| now | () => number | Date.now | Injectable clock (useful for testing) |
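Pulling the options together, a sketch of a fully specified config — the fake clock, custom pricing entry, and budget values here are illustrative choices, not defaults:

```javascript
import { createGuard } from "llm-cost-guard";
import { MemoryStorageAdapter } from "llm-cost-guard/storage";

let fakeNow = 0; // advance manually in tests for deterministic windows

const guard = createGuard({
  budgets: [{ id: "global", limitUsd: 100, windowMs: 3_600_000 }],
  pricing: { "my-fine-tune": { inputPerMillionUsd: 6, outputPerMillionUsd: 12 } },
  storage: new MemoryStorageAdapter(),
  throwOnKill: false,     // handle kills via guard.onKill instead of throws
  onUnknownModel: "zero", // treat unlisted models as free
  now: () => fakeNow,     // injectable clock
});
```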
## Exports

```js
// Main
import { createGuard, BudgetExceededError, InvalidTrackInputError, UnknownModelPricingError } from "llm-cost-guard";

// Middleware
import { createExpressMiddleware, createFastifyPreHandler } from "llm-cost-guard/middleware";

// Pricing utilities
import { BUILT_IN_PRICING, calculateCostUsd, getModelPricing } from "llm-cost-guard/pricing";

// Storage
import { MemoryStorageAdapter } from "llm-cost-guard/storage";
```

Built with teeth. 🌑
