agensor
v0.1.2
Complete end-user monetization stack for agent products. Credit enforcement, A2A payments, agent ratings.
agensor
Stop losing money on power users. Add real-time credit enforcement to your AI agent product in 30 minutes.
Agensor is an open-source SDK that lets you charge your users for what they actually consume — LLM tokens, tool calls, and agent runs — enforced in real time, mid-execution, with no billing infrastructure to build yourself.
Why this exists
You're building an AI agent product. Your costs scale with usage — every extra LLM call, every tool execution costs you money. But you're charging flat subscriptions.
One power user can wipe out a month of margin. Once you've seen that Anthropic bill, you understand the problem.
Existing options are all painful:
- Build it yourself — 3–4 weeks minimum, then maintain it forever
- Stripe Metered Billing — not designed for agent steps, no mid-run enforcement
- OpenMeter / Lago — great infrastructure, but you still build all the middleware
Agensor is the opinionated, agent-native version of all that plumbing.
5-minute quickstart
Early access: Agensor is in private beta. Join the waitlist to get your API key. npm publish coming on early access launch.
1. Install
npm install agensor
2. Create a meter
import { createMeter } from 'agensor'
const meter = createMeter({
apiKey: process.env.AGENSOR_API_KEY!, // sk_... from app.agensor.dev
baseUrl: 'https://api.agensor.dev',
// baseUrl: 'mock' ← swap this in for local dev / tests (no real server needed)
})
3. Wrap your LLM client
import Anthropic from '@anthropic-ai/sdk'
const client = meter.wrapAnthropic(new Anthropic(), {
getUserId: () => req.user.id,
})
// Use exactly as you would the raw SDK — billing is automatic
const response = await client.messages.create({
model: 'claude-haiku-4-5',
max_tokens: 1024,
messages: [{ role: 'user', content: 'Summarise this document...' }],
})
4. Handle budget exhaustion
import { BudgetExhaustedError } from 'agensor'
try {
const result = await myAgent.run(input)
} catch (e) {
if (e instanceof BudgetExhaustedError) {
return res.status(402).json({ error: 'out_of_credits' })
}
throw e
}
That's it. Every LLM call made through the wrapped client now checks and debits the user's credit balance in real time. If a user runs out mid-run, the agent stops immediately.
Compatibility
| Agensor | Anthropic SDK | OpenAI SDK |
|---------|---------------|------------|
| 0.1.x   | ≥ 0.24.0      | ≥ 4.0.0    |
Agensor wraps client.messages.create() (Anthropic) and
client.chat.completions.create() (OpenAI). If either SDK ships a breaking
rename of these methods, billing will fail loudly with a TypeError (not silently).
Pin your SDK versions in production.
How credits work
Credits are a virtual currency your users hold in their wallet.
| Concept | Detail |
|---------|--------|
| 1 credit | = $0.001 (configurable per account) |
| Top-up | User pays via Stripe → credits added to their wallet |
| Spend | SDK debits credits per LLM call and tool call, automatically |
| Enforcement | Balance checked in-memory before each call — zero added latency |
| Zero balance | BudgetExhaustedError thrown → agent stops |
| Sync | Pending debits flushed to the server every 30 seconds in the background |
You define credit prices for your own tools. The SDK handles LLM token costs automatically using a built-in price catalogue (Anthropic and OpenAI models, updated per release).
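The token-to-credit conversion can be sketched as below. The prices and the `creditsForCall` helper here are purely illustrative — the real catalogue ships inside the SDK and its numbers will differ:

```typescript
// Hypothetical per-model price table: USD per million tokens.
// Illustrative numbers only — the SDK's built-in catalogue is authoritative.
const PRICES_USD_PER_MTOK: Record<string, { input: number; output: number }> = {
  'claude-haiku-4-5': { input: 1, output: 5 },
}

const USD_PER_CREDIT = 0.001 // 1 credit = $0.001 (the default rate)

// Convert one call's token usage into credits, rounding up so the
// developer never under-bills a fractional credit.
function creditsForCall(
  model: string,
  inputTokens: number,
  outputTokens: number,
): number {
  const price = PRICES_USD_PER_MTOK[model]
  if (!price) throw new Error(`Unknown model: ${model}`)
  const usd =
    (inputTokens / 1_000_000) * price.input +
    (outputTokens / 1_000_000) * price.output
  return Math.ceil(usd / USD_PER_CREDIT)
}

// e.g. an 800-in / 200-out call at these hypothetical prices costs 2 credits
const credits = creditsForCall('claude-haiku-4-5', 800, 200)
```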
Example economics:
User buys $10 → 10,000 credits land in their wallet
Your agent makes 5 Claude Haiku calls (1k tokens each) → ~50 credits debited
User sees remaining balance in your UI via the Agensor user portal widget
Code examples
Wrap Anthropic
import { createMeter, BudgetExhaustedError } from 'agensor'
import Anthropic from '@anthropic-ai/sdk'
const meter = createMeter({ apiKey: process.env.AGENSOR_API_KEY! })
const client = meter.wrapAnthropic(new Anthropic(), {
getUserId: () => req.user.id,
onBudgetExhausted: 'throw',
})
// Streaming is fully supported — reservation made before stream starts
const stream = await client.messages.create({
model: 'claude-sonnet-4-6',
max_tokens: 2048,
stream: true,
messages: [{ role: 'user', content: 'Write a full report on...' }],
})
// Max-token reservation is debited upfront; surplus is refunded when the stream finishes
Wrap OpenAI
import OpenAI from 'openai'
const client = meter.wrapOpenAI(new OpenAI(), {
getUserId: () => req.user.id,
})
const response = await client.chat.completions.create({
model: 'gpt-4o',
messages: [{ role: 'user', content: 'Help me with...' }],
})
Meter tool calls
const tools = meter.wrapTools(
{
web_search: { fn: searchFn, pricing: 10 }, // 10 cr per call
generate_image: { fn: imageFn, pricing: 50 }, // 50 cr per call
execute_code: { fn: executeFn, pricing: { credits: 2, per: 'perSecond' } },
process_file: { fn: processFileFn, pricing: { credits: 1, per: 'perByte' } },
},
{ getUserId: () => req.user.id },
)
// Pass directly to your agent framework — budgets are enforced automatically
const agent = new Agent({ tools, llm: client })
perCall cost is checked before execution — the tool never runs if the user can't afford it.
perSecond and perByte costs are measured after execution, since the cost isn't known upfront.
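The two settlement points can be sketched as follows. This is a simplified illustration of the behaviour described above, not the SDK's internals; `runMeteredTool` and its signature are invented for the sketch (perByte works the same way, measuring the result's size instead of elapsed time):

```typescript
type Pricing = number | { credits: number; per: 'perSecond' }

// Illustrative sketch: fixed perCall costs are debited before the tool runs;
// duration-based costs are measured and debited after it finishes.
async function runMeteredTool<T>(
  fn: () => Promise<T>,
  pricing: Pricing,
  debit: (credits: number) => void, // throws if the user can't afford it
): Promise<T> {
  if (typeof pricing === 'number') {
    debit(pricing) // perCall: cost known upfront, so the tool never runs unpaid
    return fn()
  }
  const start = Date.now()
  const result = await fn()
  // perSecond: cost unknown until the tool finishes, so settle afterwards
  debit(Math.ceil(((Date.now() - start) / 1000) * pricing.credits))
  return result
}
```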
Named runs with budget reservation
Use runs when you want a hard cap on what a single agent execution can spend:
const run = await meter.startRun(req.user.id, {
maxCredits: 500, // hard ceiling — agent stops if hit
reserveCredits: 500, // escrowed from wallet at run start
expiresAfter: '30m', // reservation auto-returns if agent hangs
metadata: {
task: 'research',
query: req.body.query,
},
})
try {
const result = await myAgent.execute(input, { billingRun: run })
const receipt = await run.commit() // settle actual spend, return unused
console.log(`Used ${receipt.creditsUsed} of ${run.maxCredits} credits`)
} catch (e) {
await run.cancel() // full refund of reservation
throw e
}
The run cap is enforced in-process — once spentCredits >= maxCredits, BudgetExhaustedError is thrown without any server round-trip. The global wallet is also checked on each call: a user cannot spend more than min(run.maxCredits, globalBalance).
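That double check can be sketched as a single guard. This is illustrative only (the local `BudgetExhaustedError` is a stand-in for the SDK's class, and `assertAffordable` is an invented name):

```typescript
// Stand-in for the SDK's error class, for this sketch only.
class BudgetExhaustedError extends Error {}

// A call proceeds only if both the run cap and the global wallet cover its cost.
function assertAffordable(
  cost: number,
  run: { spentCredits: number; maxCredits: number },
  globalBalance: number,
): void {
  if (run.spentCredits + cost > run.maxCredits || cost > globalBalance) {
    throw new BudgetExhaustedError('out of credits')
  }
}
```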
Configuration
const meter = createMeter({
apiKey: 'sk_...', // required — from app.agensor.dev dashboard
baseUrl: 'https://api.agensor.dev', // optional — default shown
// baseUrl: 'mock' // dev/test mode — no real server needed
syncIntervalMs: 30_000, // optional — how often to flush debits (ms)
})
Mock mode (baseUrl: 'mock') returns canned balances and accepts syncs without a real server. Use it in tests and local development.
Cleanup
Call meter.destroy() when you're done — in tests, or on process shutdown — to clear the background sync interval:
afterEach(() => meter.destroy())
How the balance sync works
To keep the enforcement path at zero latency, Agensor uses an optimistic in-memory balance:
- On first call for a user, balance is fetched from the server and cached
- Every LLM/tool debit is applied locally — no network call on the hot path
- A background loop (default: every 30s) flushes pending debits to the server
- The server returns the corrected balance, which updates the local cache
- If the server is unreachable, the SDK continues using the local estimate and logs a warning — it never blocks your product
This means there can be a brief overshoot window if a user's balance hits zero at exactly the wrong moment. This is an intentional trade-off: blocking every LLM call on a server round-trip adds 50–200ms to every agent step. Use startRun() with reserveCredits for tighter per-session control.
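The cache described above can be sketched in a few lines. The class shape and names here are assumptions for illustration, not SDK source:

```typescript
// Minimal sketch of an optimistic in-memory balance with background flushing.
class LocalBalance {
  private pending = 0
  constructor(private cached: number) {} // seeded by the first server fetch

  // Hot path: pure in-memory arithmetic — no network round-trip.
  debit(credits: number): void {
    if (credits > this.cached) throw new Error('BudgetExhausted')
    this.cached -= credits
    this.pending += credits
  }

  // Background loop: flush pending debits, adopt the server's corrected balance.
  async flush(sync: (pendingDebits: number) => Promise<number>): Promise<void> {
    const batch = this.pending
    this.pending = 0
    try {
      this.cached = await sync(batch)
    } catch {
      // Server unreachable: keep the local estimate, retry next interval
      this.pending += batch
    }
  }

  get balance(): number {
    return this.cached
  }
}
```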
Horizontal scaling
The balance cache is per-process. Each Node.js instance maintains its own in-memory credit counter and syncs independently to the server.
This works correctly for:
- Single-server deployments
- One SDK instance per user session (e.g. serverless functions)
This creates unbounded overspend if:
- Multiple processes share the same userId simultaneously (load-balanced servers, multiple workers)
- A user has two concurrent sessions hitting different instances
v0.1 constraint: for horizontally scaled deployments, ensure requests
from the same userId are routed to the same instance (sticky sessions),
or use startRun() which reserves credits atomically at the server level
before the run begins.
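One way to get sticky routing without load-balancer cookies is to hash the userId to an instance index deterministically. This is a generic sketch, not anything Agensor ships:

```typescript
// Deterministically map a userId to one of N instances so all of a
// user's requests land on the same process (illustrative sketch).
function instanceFor(userId: string, instanceCount: number): number {
  let h = 0
  for (const ch of userId) {
    h = (h * 31 + ch.charCodeAt(0)) >>> 0 // simple 32-bit rolling hash
  }
  return h % instanceCount
}
```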
Layer 2 (coming): reserveBalance() will enforce budgets server-side
per-request with no per-process state — making multi-instance deploys safe
without sticky sessions.
Error reference
| Error | When it's thrown |
|-------|-----------------|
| BudgetExhaustedError | User's credit balance or run cap is insufficient for the next call |
| AgensorError | Server returned an unexpected error (ledger, Stripe, etc.) |
BudgetExhaustedError carries .userId, .available, and .required — use these to show a meaningful message to your user.
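For example, those fields can be turned into the 402 payload from the quickstart. The field names come from the sentence above; the payload shape beyond `error: 'out_of_credits'` and the message wording are our own:

```typescript
// The three fields carried by BudgetExhaustedError, per the docs above.
interface BudgetInfo {
  userId: string
  available: number
  required: number
}

// Build a user-facing 402 response body from the error's fields.
function outOfCreditsPayload(err: BudgetInfo) {
  return {
    error: 'out_of_credits',
    message: `This action needs ${err.required} credits, but you have ${err.available}.`,
    shortfall: err.required - err.available, // how many credits to top up
  }
}
```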
Roadmap
Layer 1 — Human → Agent billing ✅ live now
| What | Status |
|------|--------|
| Anthropic + OpenAI wrappers | ✅ |
| Tool metering (perCall / perSecond / perByte) | ✅ |
| Run handles with hard credit caps | ✅ |
| Managed credit ledger + developer dashboard | ✅ |
| Embeddable user portal widget | ✅ |
| Mock mode for local dev + tests | ✅ |
Layer 2 — Scale hardening (next)
- Server-side atomic balance enforcement — eliminates the per-process overshoot window, making multi-instance deploys safe without sticky sessions
- npm publish on early access launch
Layer 3 — Agent-to-agent payments (future)
Agents paying agents for subtasks. A2A micropayments with call tree attribution and automatic settlement.
Layer 4 — Agent reputation (emerging)
Trust scores and discovery registry emerging from transaction history.
Technical detail
See SPEC.md for the full technical specification: credit enforcement architecture, streaming reservation design, RunHandle concurrency model, A2A payment design, and key decision log.
License
Apache 2.0 — free to use, self-host, and modify.
