agent-cost-guard
Spending limits for AI agents. 3 lines of code. Zero dependencies.
The Problem
A LangChain agent loop cost $47K in November 2025 when four agents ping-ponged for 11 days before anyone noticed. A separate retry storm cost $47K in February 2026 after 2.3M API calls ran unchecked over a weekend. OpenAI removed dashboard budget caps -- client-side enforcement is the only option left.
The Solution
import OpenAI from "openai";
import { guard } from "agent-cost-guard";
const openai = guard(new OpenAI(), { budget: "$5.00" });
// All calls are now tracked and enforced
How It Works
- Wraps your OpenAI SDK client via method-level interception (no HTTP proxies, no monkey-patching globals)
- Reads token usage from API responses -- ground truth, not estimation
- Calculates cost with cached-token discounts and reasoning-token awareness (see the sketch below)
- Enforces hard limits -- throws BudgetExceededError when the budget is exceeded
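For intuition, here is a minimal sketch of that cost math, using the usage field names the OpenAI API returns. This is illustrative only, not the package's internals:
// Illustrative sketch: derive a per-call cost from the usage object the API
// returns. Prices are dollars per 1M tokens, as in the tables below.
interface Usage {
  prompt_tokens: number;
  completion_tokens: number; // includes reasoning tokens on o3 / o4-mini
  prompt_tokens_details?: { cached_tokens?: number };
}
interface ModelPrice { input: number; output: number; cachedInput: number }

function costOf(usage: Usage, price: ModelPrice): number {
  const cached = usage.prompt_tokens_details?.cached_tokens ?? 0;
  const uncached = usage.prompt_tokens - cached;
  // Cached input tokens get the discounted rate; reasoning tokens are part of
  // completion_tokens and are billed at the output rate.
  return (uncached * price.input + cached * price.cachedInput +
          usage.completion_tokens * price.output) / 1_000_000;
}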
What's Supported
- Chat Completions API (streaming + non-streaming)
- Responses API (non-streaming)
- Embeddings API
- Cached token cost calculation (50-75% discount depending on model)
- Reasoning token tracking (o3, o4-mini)
Installation
npm install agent-cost-guard
Requires the openai SDK v4+ as a peer dependency. Node.js >= 18.
API Reference
guard(client, options)
Wraps an OpenAI client with budget tracking and enforcement.
import OpenAI from "openai";
import { guard } from "agent-cost-guard";
const openai = guard(new OpenAI(), {
budget: "$5.00", // Required. String ("$5.00") or number (5)
warn: 0.8, // Optional. Warning threshold as fraction (0-1). Default: 0.8
onWarn: (info) => { // Optional. Callback when warn threshold is reached
console.log(`Warning: ${info.percent}% of budget used`);
},
onExceeded: "throw", // Optional. "throw" (default) or "warn-only"
pricing: { // Optional. Custom model pricing overrides (per 1M tokens)
"my-fine-tune": { input: 3.00, output: 6.00, cachedInput: 1.50 },
},
});
| Option | Type | Default | Description |
|--------|------|---------|-------------|
| budget | string \| number | required | Budget in dollars. Accepts "$5.00" or 5. |
| warn | number | 0.8 | Fraction (0-1) of budget at which onWarn fires. |
| onWarn | (info: WarnInfo) => void | undefined | Called once when spending crosses the warn threshold. |
| onExceeded | "throw" \| "warn-only" | "throw" | Whether to throw or just warn when budget is exceeded. |
| pricing | Record<string, ModelPrice> | undefined | Custom pricing per model (per 1M tokens). |
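For example, a soft-limit setup that warns early and never throws, using only the options documented above:
import OpenAI from "openai";
import { guard } from "agent-cost-guard";

// Soft limit: fire onWarn at 50% of budget; log instead of throwing on overrun.
const openai = guard(new OpenAI(), {
  budget: 2,                                              // $2.00, numeric form
  warn: 0.5,
  onWarn: (info) => console.warn(`Budget ${info.percent}% used`),
  onExceeded: "warn-only",
});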
client.$budget
After wrapping, the client exposes a $budget property with live budget status.
console.log(openai.$budget.spent); // 0.0023 (dollars spent so far)
console.log(openai.$budget.remaining); // 4.9977 (dollars remaining)
console.log(openai.$budget.calls); // 1 (number of API calls tracked)
console.log(openai.$budget.budget); // 5 (total budget in dollars)
openai.$budget.reset(); // Reset counters to zero
| Property | Type | Description |
|----------|------|-------------|
| spent | number | Total dollars spent so far. |
| remaining | number | Remaining budget in dollars. |
| calls | number | Number of API calls tracked. |
| budget | number | Total budget in dollars. |
| reset() | () => void | Reset spent amount and call count to zero. |
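A typical pattern is to check $budget between agent steps and stop cleanly before the hard limit is hit. In this sketch, runNextStep is a hypothetical stand-in for your agent logic:
// Stop an agent loop early once 90% of the budget has been spent.
let done = false;
while (!done) {
  if (openai.$budget.remaining < 0.1 * openai.$budget.budget) {
    console.warn(
      `Stopping early: $${openai.$budget.spent.toFixed(4)} spent across ${openai.$budget.calls} calls`
    );
    break;
  }
  done = await runNextStep(openai); // hypothetical: one agent step using the wrapped client
}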
BudgetExceededError
Thrown when the budget is exceeded (unless onExceeded is "warn-only").
import { BudgetExceededError } from "agent-cost-guard";
try {
await openai.chat.completions.create({ model: "gpt-4o", messages: [{ role: "user", content: "hi" }] });
} catch (err) {
if (err instanceof BudgetExceededError) {
console.log(err.spent); // Total dollars spent
console.log(err.budget); // Budget limit in dollars
console.log(err.lastCall); // { model, inputTokens, outputTokens, cachedTokens, reasoningTokens, cost }
}
}
| Property | Type | Description |
|----------|------|-------------|
| spent | number | Total dollars spent when the error was thrown. |
| budget | number | The budget limit in dollars. |
| lastCall | LastCallInfo \| undefined | Details about the call that pushed spending over budget. |
With Streaming
stream_options: { include_usage: true } is injected automatically so you don't have to remember it.
const stream = await openai.chat.completions.create({
model: "gpt-4o-mini",
messages: [{ role: "user", content: "hello" }],
stream: true,
});
// stream_options: { include_usage: true } is injected automatically
for await (const chunk of stream) {
process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}
// Cost tracked after the stream completes
With Responses API
const response = await openai.responses.create({
model: "gpt-4.1",
input: "Explain quantum computing",
});
With Embeddings
const embeddings = await openai.embeddings.create({
model: "text-embedding-3-small",
input: "hello world",
});
Supported Models
Built-in pricing for 11 OpenAI models (per 1M tokens):
| Model | Input | Output | Cached Input |
|-------|------:|-------:|-------------:|
| gpt-4o | $2.50 | $10.00 | $1.25 |
| gpt-4o-mini | $0.15 | $0.60 | $0.075 |
| gpt-4.1 | $2.00 | $8.00 | $0.50 |
| gpt-4.1-mini | $0.40 | $1.60 | $0.10 |
| gpt-4.1-nano | $0.10 | $0.40 | $0.025 |
| gpt-5 | $1.25 | $10.00 | $0.125 |
| gpt-5-mini | $0.25 | $2.00 | $0.025 |
| o3 | $2.00 | $8.00 | $0.50 |
| o3-mini | $1.10 | $4.40 | $0.55 |
| o4-mini | $1.10 | $4.40 | $0.275 |
| o1 | $15.00 | $60.00 | $7.50 |
Model names are matched by longest prefix, so gpt-4o-2024-11-20 automatically resolves to gpt-4o pricing.
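A rough sketch of that longest-prefix lookup (illustrative names, not the package's internals):
type ModelPrice = { input: number; output: number; cachedInput: number };

// Pick the pricing entry whose key is the longest prefix of the model name,
// so "gpt-4o-2024-11-20" resolves to "gpt-4o" rather than a shorter match.
function resolvePricing(
  model: string,
  table: Record<string, ModelPrice>
): ModelPrice | undefined {
  const best = Object.keys(table)
    .filter((key) => model.startsWith(key))
    .sort((a, b) => b.length - a.length)[0];
  return best !== undefined ? table[best] : undefined;
}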
Custom Pricing
Add pricing for any model -- fine-tunes, embedding models, or new releases:
const openai = guard(new OpenAI(), {
budget: "$10.00",
pricing: {
"text-embedding-3-small": { input: 0.02, output: 0, cachedInput: 0 },
"ft:gpt-4o-mini:my-org": { input: 0.30, output: 1.20, cachedInput: 0.15 },
},
});
Custom pricing takes precedence over built-in pricing. For models not found in either, cost resolves to $0 (a warning, not a silent failure -- check $budget.spent to verify tracking).
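One way to verify tracking for an unrecognized model (the model name here is hypothetical):
const before = openai.$budget.spent;
await openai.chat.completions.create({
  model: "my-unlisted-model",                     // hypothetical fine-tune
  messages: [{ role: "user", content: "ping" }],
});
if (openai.$budget.spent === before) {
  console.warn("No cost recorded -- add a pricing entry for this model.");
}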
FAQ
How does this compare to Langfuse/Helicone? They are observability platforms -- dashboards, traces, analytics. agent-cost-guard is a one-line enforcement layer that stops runaway spending in real time. They are complementary: use both.
How does this compare to LiteLLM Proxy? LiteLLM requires deploying and maintaining a gateway server. agent-cost-guard is npm install + 3 lines of code. No infrastructure.
How does this compare to agent-guard? agent-guard depends on tiktoken and uses HTTP interception. agent-cost-guard is zero-dependency and uses method-level wrapping -- lighter, simpler, and no token estimation drift.
What about provider dashboard limits? OpenAI's dashboard limits are org-level monthly caps. agent-cost-guard provides per-session, per-agent, real-time enforcement you can configure in code.
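For example, per-agent enforcement is just one wrapped client per agent, each with its own budget:
import OpenAI from "openai";
import { guard } from "agent-cost-guard";

// Each agent gets its own budget, so one runaway loop can't drain the others.
const researcher = guard(new OpenAI(), { budget: "$2.00" });
const summarizer = guard(new OpenAI(), { budget: "$0.50" });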
Roadmap
- Anthropic SDK support
- Gemini SDK support
- MCP server for budget enforcement
- Persistent cross-session budgets
- Framework integrations (Vercel AI SDK, LangChain.js)
