agent-cost-guard
Spending limits for AI agents. 3 lines of code. Zero dependencies.
The Problem
A LangChain agent loop cost $47K in November 2025 when four agents ping-ponged for 11 days before anyone noticed. A separate retry storm cost $47K in February 2026 after 2.3M API calls ran unchecked over a weekend. OpenAI removed dashboard budget caps -- client-side enforcement is the only option left.
The Solution
import OpenAI from "openai";
import { guard } from "agent-cost-guard";
const openai = guard(new OpenAI(), { budget: "$5.00" });
// All calls are now tracked and enforced
How It Works
- Wraps your OpenAI SDK client via method-level interception (no HTTP proxies, no monkey-patching globals)
- Reads token usage from API responses -- ground truth, not estimation
- Calculates cost with cached-token discounts and reasoning-token awareness (see the sketch below)
- Enforces hard limits -- throws BudgetExceededError when the budget is exceeded
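For intuition, here is a minimal sketch of that cost math, using the usage field names the OpenAI API returns. This is illustrative only, not the package's internals:
// Illustrative sketch: derive a per-call cost from the usage object the API
// returns. Prices are dollars per 1M tokens, as in the tables below.
interface Usage {
  prompt_tokens: number;
  completion_tokens: number; // includes reasoning tokens on o3 / o4-mini
  prompt_tokens_details?: { cached_tokens?: number };
}
interface ModelPrice { input: number; output: number; cachedInput: number }

function costOf(usage: Usage, price: ModelPrice): number {
  const cached = usage.prompt_tokens_details?.cached_tokens ?? 0;
  const uncached = usage.prompt_tokens - cached;
  // Cached input tokens get the discounted rate; reasoning tokens are part of
  // completion_tokens and are billed at the output rate.
  return (uncached * price.input + cached * price.cachedInput +
          usage.completion_tokens * price.output) / 1_000_000;
}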
What's Supported
- Chat Completions API (streaming + non-streaming)
- Responses API (non-streaming)
- Embeddings API
- Cached token cost calculation (50-75% discount depending on model)
- Reasoning token tracking (o3, o4-mini)
Installation
npm install agent-cost-guard
Requires the openai SDK v4+ as a peer dependency. Node.js >= 18.
API Reference
guard(client, options)
Wraps an OpenAI client with budget tracking and enforcement.
import OpenAI from "openai";
import { guard } from "agent-cost-guard";
const openai = guard(new OpenAI(), {
budget: "$5.00", // Required. String ("$5.00") or number (5)
warn: 0.8, // Optional. Warning threshold as fraction (0-1). Default: 0.8
onWarn: (info) => { // Optional. Callback when warn threshold is reached
console.log(`Warning: ${info.percent}% of budget used`);
},
onExceeded: "throw", // Optional. "throw" (default) or "warn-only"
pricing: { // Optional. Custom model pricing overrides (per 1M tokens)
"my-fine-tune": { input: 3.00, output: 6.00, cachedInput: 1.50 },
},
});
| Option | Type | Default | Description |
|--------|------|---------|-------------|
| budget | string \| number | required | Budget in dollars. Accepts "$5.00" or 5. |
| warn | number | 0.8 | Fraction (0-1) of budget at which onWarn fires. |
| onWarn | (info: WarnInfo) => void | undefined | Called once when spending crosses the warn threshold. |
| onExceeded | "throw" \| "warn-only" | "throw" | Whether to throw or just warn when budget is exceeded. |
| pricing | Record<string, ModelPrice> | undefined | Custom pricing per model (per 1M tokens). |
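For example, a soft-limit setup that warns early and never throws, using only the options documented above:
import OpenAI from "openai";
import { guard } from "agent-cost-guard";

// Soft limit: fire onWarn at 50% of budget; log instead of throwing on overrun.
const openai = guard(new OpenAI(), {
  budget: 2,                                              // $2.00, numeric form
  warn: 0.5,
  onWarn: (info) => console.warn(`Budget ${info.percent}% used`),
  onExceeded: "warn-only",
});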
client.$budget
After wrapping, the client exposes a $budget property with live budget status.
console.log(openai.$budget.spent); // 0.0023 (dollars spent so far)
console.log(openai.$budget.remaining); // 4.9977 (dollars remaining)
console.log(openai.$budget.calls); // 1 (number of API calls tracked)
console.log(openai.$budget.budget); // 5 (total budget in dollars)
openai.$budget.reset(); // Reset counters to zero
| Property | Type | Description |
|----------|------|-------------|
| spent | number | Total dollars spent so far. |
| remaining | number | Remaining budget in dollars. |
| calls | number | Number of API calls tracked. |
| budget | number | Total budget in dollars. |
| reset() | () => void | Reset spent amount and call count to zero. |
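A typical pattern is to check $budget between agent steps and stop cleanly before the hard limit is hit. In this sketch, runNextStep is a hypothetical stand-in for your agent logic:
// Stop an agent loop early once 90% of the budget has been spent.
let done = false;
while (!done) {
  if (openai.$budget.remaining < 0.1 * openai.$budget.budget) {
    console.warn(
      `Stopping early: $${openai.$budget.spent.toFixed(4)} spent across ${openai.$budget.calls} calls`
    );
    break;
  }
  done = await runNextStep(openai); // hypothetical: one agent step using the wrapped client
}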
BudgetExceededError
Thrown when the budget is exceeded (unless onExceeded is "warn-only").
import { BudgetExceededError } from "agent-cost-guard";
try {
await openai.chat.completions.create({ model: "gpt-4o", messages: [{ role: "user", content: "hi" }] });
} catch (err) {
if (err instanceof BudgetExceededError) {
console.log(err.spent); // Total dollars spent
console.log(err.budget); // Budget limit in dollars
console.log(err.lastCall); // { model, inputTokens, outputTokens, cachedTokens, reasoningTokens, cost }
}
}
| Property | Type | Description |
|----------|------|-------------|
| spent | number | Total dollars spent when the error was thrown. |
| budget | number | The budget limit in dollars. |
| lastCall | LastCallInfo \| undefined | Details about the call that pushed spending over budget. |
With Streaming
stream_options: { include_usage: true } is injected automatically so you don't have to remember it.
const stream = await openai.chat.completions.create({
model: "gpt-4o-mini",
messages: [{ role: "user", content: "hello" }],
stream: true,
});
// stream_options: { include_usage: true } is injected automatically
for await (const chunk of stream) {
process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}
// Cost tracked after the stream completes
With Responses API
const response = await openai.responses.create({
model: "gpt-4.1",
input: "Explain quantum computing",
});
With Embeddings
const embeddings = await openai.embeddings.create({
model: "text-embedding-3-small",
input: "hello world",
});
Supported Models
Built-in pricing for 11 OpenAI models (per 1M tokens):
| Model | Input | Output | Cached Input |
|-------|------:|-------:|-------------:|
| gpt-4o | $2.50 | $10.00 | $1.25 |
| gpt-4o-mini | $0.15 | $0.60 | $0.075 |
| gpt-4.1 | $2.00 | $8.00 | $0.50 |
| gpt-4.1-mini | $0.40 | $1.60 | $0.10 |
| gpt-4.1-nano | $0.10 | $0.40 | $0.025 |
| gpt-5 | $1.25 | $10.00 | $0.125 |
| gpt-5-mini | $0.25 | $2.00 | $0.025 |
| o3 | $2.00 | $8.00 | $0.50 |
| o3-mini | $1.10 | $4.40 | $0.55 |
| o4-mini | $1.10 | $4.40 | $0.275 |
| o1 | $15.00 | $60.00 | $7.50 |
Model names are matched by longest prefix, so gpt-4o-2024-11-20 automatically resolves to gpt-4o pricing.
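A rough sketch of that longest-prefix lookup (illustrative names, not the package's internals):
type ModelPrice = { input: number; output: number; cachedInput: number };

// Pick the pricing entry whose key is the longest prefix of the model name,
// so "gpt-4o-2024-11-20" resolves to "gpt-4o" rather than a shorter match.
function resolvePricing(
  model: string,
  table: Record<string, ModelPrice>
): ModelPrice | undefined {
  const best = Object.keys(table)
    .filter((key) => model.startsWith(key))
    .sort((a, b) => b.length - a.length)[0];
  return best !== undefined ? table[best] : undefined;
}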
Custom Pricing
Add pricing for any model -- fine-tunes, embedding models, or new releases:
const openai = guard(new OpenAI(), {
budget: "$10.00",
pricing: {
"text-embedding-3-small": { input: 0.02, output: 0, cachedInput: 0 },
"ft:gpt-4o-mini:my-org": { input: 0.30, output: 1.20, cachedInput: 0.15 },
},
});
Custom pricing takes precedence over built-in pricing. For models not found in either, cost resolves to $0 (a warning, not a silent failure -- check $budget.spent to verify tracking).
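One way to verify tracking for an unrecognized model (the model name here is hypothetical):
const before = openai.$budget.spent;
await openai.chat.completions.create({
  model: "my-unlisted-model",                     // hypothetical fine-tune
  messages: [{ role: "user", content: "ping" }],
});
if (openai.$budget.spent === before) {
  console.warn("No cost recorded -- add a pricing entry for this model.");
}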
FAQ
How does this compare to Langfuse/Helicone? They are observability platforms -- dashboards, traces, analytics. agent-cost-guard is a one-line enforcement layer that stops runaway spending in real time. They are complementary: use both.
How does this compare to LiteLLM Proxy? LiteLLM requires deploying and maintaining a gateway server. agent-cost-guard is npm install + 3 lines of code. No infrastructure.
How does this compare to agent-guard? agent-guard depends on tiktoken and uses HTTP interception. agent-cost-guard is zero-dependency and uses method-level wrapping -- lighter, simpler, and no token estimation drift.
What about provider dashboard limits? OpenAI's dashboard limits are org-level monthly caps. agent-cost-guard provides per-session, per-agent, real-time enforcement you can configure in code.
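For example, per-agent enforcement is just one wrapped client per agent, each with its own budget:
import OpenAI from "openai";
import { guard } from "agent-cost-guard";

// Each agent gets its own budget, so one runaway loop can't drain the others.
const researcher = guard(new OpenAI(), { budget: "$2.00" });
const summarizer = guard(new OpenAI(), { budget: "$0.50" });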
Roadmap
- Anthropic SDK support
- Gemini SDK support
- MCP server for budget enforcement
- Persistent cross-session budgets
- Framework integrations (Vercel AI SDK, LangChain.js)
