agensor
v0.1.2
Complete end-user monetization stack for agent products. Credit enforcement, A2A payments, agent ratings.
agensor
Stop losing money on power users. Add real-time credit enforcement to your AI agent product in 30 minutes.
Agensor is an open-source SDK that lets you charge your users for what they actually consume — LLM tokens, tool calls, and agent runs — enforced in real time, mid-execution, with no billing infrastructure to build yourself.
Why this exists
You're building an AI agent product. Your costs scale with usage — every extra LLM call, every tool execution costs you money. But you're charging flat subscriptions.
One power user can wipe out a month of margin. Once you've seen that Anthropic bill, you understand the problem.
Existing options are all painful:
- Build it yourself — 3–4 weeks minimum, then maintain it forever
- Stripe Metered Billing — not designed for agent steps, no mid-run enforcement
- OpenMeter / Lago — great infrastructure, but you still build all the middleware
Agensor is the opinionated, agent-native version of all that plumbing.
5-minute quickstart
Early access: Agensor is in private beta. Join the waitlist to get your API key. npm publish coming on early access launch.
1. Install
npm install agensor
2. Create a meter
import { createMeter } from 'agensor'
const meter = createMeter({
apiKey: process.env.AGENSOR_API_KEY!, // sk_... from app.agensor.dev
baseUrl: 'https://api.agensor.dev',
// baseUrl: 'mock' ← swap this in for local dev / tests (no real server needed)
})
3. Wrap your LLM client
import Anthropic from '@anthropic-ai/sdk'
const client = meter.wrapAnthropic(new Anthropic(), {
getUserId: () => req.user.id,
})
// Use exactly as you would the raw SDK — billing is automatic
const response = await client.messages.create({
model: 'claude-haiku-4-5',
max_tokens: 1024,
messages: [{ role: 'user', content: 'Summarise this document...' }],
})
4. Handle budget exhaustion
import { BudgetExhaustedError } from 'agensor'
try {
const result = await myAgent.run(input)
} catch (e) {
if (e instanceof BudgetExhaustedError) {
return res.status(402).json({ error: 'out_of_credits' })
}
throw e
}
That's it. Every LLM call made through the wrapped client now checks and debits the user's credit balance in real time. If a user runs out mid-run, the agent stops immediately.
Compatibility
| Agensor | Anthropic SDK | OpenAI SDK |
|---------|---------------|------------|
| 0.1.x   | ≥ 0.24.0      | ≥ 4.0.0    |
Agensor wraps client.messages.create() (Anthropic) and
client.chat.completions.create() (OpenAI). If either SDK ships a breaking
rename of these methods, billing will fail loudly with a TypeError (not silently).
Pin your SDK versions in production.
How credits work
Credits are a virtual currency your users hold in their wallet.
| Concept | Detail |
|---------|--------|
| 1 credit | = $0.001 (configurable per account) |
| Top-up | User pays via Stripe → credits added to their wallet |
| Spend | SDK debits credits per LLM call and tool call, automatically |
| Enforcement | Balance checked in-memory before each call — zero added latency |
| Zero balance | BudgetExhaustedError thrown → agent stops |
| Sync | Pending debits flushed to the server every 30 seconds in the background |
You define credit prices for your own tools. The SDK handles LLM token costs automatically using a built-in price catalogue (Anthropic and OpenAI models, updated per release).
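The token-to-credit conversion can be sketched as below. The prices and the `creditsForCall` helper here are purely illustrative — the real catalogue ships inside the SDK and its numbers will differ:

```typescript
// Hypothetical per-model price table: USD per million tokens.
// Illustrative numbers only — the SDK's built-in catalogue is authoritative.
const PRICES_USD_PER_MTOK: Record<string, { input: number; output: number }> = {
  'claude-haiku-4-5': { input: 1, output: 5 },
}

const USD_PER_CREDIT = 0.001 // 1 credit = $0.001 (the default rate)

// Convert one call's token usage into credits, rounding up so the
// developer never under-bills a fractional credit.
function creditsForCall(
  model: string,
  inputTokens: number,
  outputTokens: number,
): number {
  const price = PRICES_USD_PER_MTOK[model]
  if (!price) throw new Error(`Unknown model: ${model}`)
  const usd =
    (inputTokens / 1_000_000) * price.input +
    (outputTokens / 1_000_000) * price.output
  return Math.ceil(usd / USD_PER_CREDIT)
}

// e.g. an 800-in / 200-out call at these hypothetical prices costs 2 credits
const credits = creditsForCall('claude-haiku-4-5', 800, 200)
```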
Example economics:
User buys $10 → 10,000 credits land in their wallet
Your agent makes 5 Claude Haiku calls (1k tokens each) → ~50 credits debited
User sees remaining balance in your UI via the Agensor user portal widget
Code examples
Wrap Anthropic
import { createMeter, BudgetExhaustedError } from 'agensor'
import Anthropic from '@anthropic-ai/sdk'
const meter = createMeter({ apiKey: process.env.AGENSOR_API_KEY! })
const client = meter.wrapAnthropic(new Anthropic(), {
getUserId: () => req.user.id,
onBudgetExhausted: 'throw',
})
// Streaming is fully supported — reservation made before stream starts
const stream = await client.messages.create({
model: 'claude-sonnet-4-6',
max_tokens: 2048,
stream: true,
messages: [{ role: 'user', content: 'Write a full report on...' }],
})
// Max-token reservation is debited upfront; surplus is refunded when the stream finishes
Wrap OpenAI
import OpenAI from 'openai'
const client = meter.wrapOpenAI(new OpenAI(), {
getUserId: () => req.user.id,
})
const response = await client.chat.completions.create({
model: 'gpt-4o',
messages: [{ role: 'user', content: 'Help me with...' }],
})
Meter tool calls
const tools = meter.wrapTools(
{
web_search: { fn: searchFn, pricing: 10 }, // 10 cr per call
generate_image: { fn: imageFn, pricing: 50 }, // 50 cr per call
execute_code: { fn: executeFn, pricing: { credits: 2, per: 'perSecond' } },
process_file: { fn: processFileFn, pricing: { credits: 1, per: 'perByte' } },
},
{ getUserId: () => req.user.id },
)
// Pass directly to your agent framework — budgets are enforced automatically
const agent = new Agent({ tools, llm: client })
perCall cost is checked before execution — the tool never runs if the user can't afford it.
perSecond and perByte costs are measured after execution, since the cost isn't known upfront.
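The two settlement points can be sketched as follows. This is a simplified illustration of the behaviour described above, not the SDK's internals; `runMeteredTool` and its signature are invented for the sketch (perByte works the same way, measuring the result's size instead of elapsed time):

```typescript
type Pricing = number | { credits: number; per: 'perSecond' }

// Illustrative sketch: fixed perCall costs are debited before the tool runs;
// duration-based costs are measured and debited after it finishes.
async function runMeteredTool<T>(
  fn: () => Promise<T>,
  pricing: Pricing,
  debit: (credits: number) => void, // throws if the user can't afford it
): Promise<T> {
  if (typeof pricing === 'number') {
    debit(pricing) // perCall: cost known upfront, so the tool never runs unpaid
    return fn()
  }
  const start = Date.now()
  const result = await fn()
  // perSecond: cost unknown until the tool finishes, so settle afterwards
  debit(Math.ceil(((Date.now() - start) / 1000) * pricing.credits))
  return result
}
```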
Named runs with budget reservation
Use runs when you want a hard cap on what a single agent execution can spend:
const run = await meter.startRun(req.user.id, {
maxCredits: 500, // hard ceiling — agent stops if hit
reserveCredits: 500, // escrowed from wallet at run start
expiresAfter: '30m', // reservation auto-returns if agent hangs
metadata: {
task: 'research',
query: req.body.query,
},
})
try {
const result = await myAgent.execute(input, { billingRun: run })
const receipt = await run.commit() // settle actual spend, return unused
console.log(`Used ${receipt.creditsUsed} of ${run.maxCredits} credits`)
} catch (e) {
await run.cancel() // full refund of reservation
throw e
}
The run cap is enforced in-process — once spentCredits >= maxCredits, BudgetExhaustedError is thrown without any server round-trip. The global wallet is also checked on each call: a user cannot spend more than min(run.maxCredits, globalBalance).
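That double check can be sketched as a single guard. This is illustrative only (the local `BudgetExhaustedError` is a stand-in for the SDK's class, and `assertAffordable` is an invented name):

```typescript
// Stand-in for the SDK's error class, for this sketch only.
class BudgetExhaustedError extends Error {}

// A call proceeds only if both the run cap and the global wallet cover its cost.
function assertAffordable(
  cost: number,
  run: { spentCredits: number; maxCredits: number },
  globalBalance: number,
): void {
  if (run.spentCredits + cost > run.maxCredits || cost > globalBalance) {
    throw new BudgetExhaustedError('out of credits')
  }
}
```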
Configuration
const meter = createMeter({
apiKey: 'sk_...', // required — from app.agensor.dev dashboard
baseUrl: 'https://api.agensor.dev', // optional — default shown
// baseUrl: 'mock' // dev/test mode — no real server needed
syncIntervalMs: 30_000, // optional — how often to flush debits (ms)
})
Mock mode (baseUrl: 'mock') returns canned balances and accepts syncs without a real server. Use it in tests and local development.
Cleanup
Call meter.destroy() when you're done — in tests, or on process shutdown — to clear the background sync interval:
afterEach(() => meter.destroy())
How the balance sync works
To keep the enforcement path at zero latency, Agensor uses an optimistic in-memory balance:
- On first call for a user, balance is fetched from the server and cached
- Every LLM/tool debit is applied locally — no network call on the hot path
- A background loop (default: every 30s) flushes pending debits to the server
- The server returns the corrected balance, which updates the local cache
- If the server is unreachable, the SDK continues using the local estimate and logs a warning — it never blocks your product
This means there can be a brief overshoot window if a user's balance hits zero at exactly the wrong moment. This is an intentional trade-off: blocking every LLM call on a server round-trip adds 50–200ms to every agent step. Use startRun() with reserveCredits for tighter per-session control.
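The cache described above can be sketched in a few lines. The class shape and names here are assumptions for illustration, not SDK source:

```typescript
// Minimal sketch of an optimistic in-memory balance with background flushing.
class LocalBalance {
  private pending = 0
  constructor(private cached: number) {} // seeded by the first server fetch

  // Hot path: pure in-memory arithmetic — no network round-trip.
  debit(credits: number): void {
    if (credits > this.cached) throw new Error('BudgetExhausted')
    this.cached -= credits
    this.pending += credits
  }

  // Background loop: flush pending debits, adopt the server's corrected balance.
  async flush(sync: (pendingDebits: number) => Promise<number>): Promise<void> {
    const batch = this.pending
    this.pending = 0
    try {
      this.cached = await sync(batch)
    } catch {
      // Server unreachable: keep the local estimate, retry next interval
      this.pending += batch
    }
  }

  get balance(): number {
    return this.cached
  }
}
```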
Horizontal scaling
The balance cache is per-process. Each Node.js instance maintains its own in-memory credit counter and syncs independently to the server.
This works correctly for:
- Single-server deployments
- One SDK instance per user session (e.g. serverless functions)
This creates unbounded overspend if:
- Multiple processes share the same userId simultaneously (load-balanced servers, multiple workers)
- A user has two concurrent sessions hitting different instances
v0.1 constraint: for horizontally scaled deployments, ensure requests
from the same userId are routed to the same instance (sticky sessions),
or use startRun() which reserves credits atomically at the server level
before the run begins.
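One way to get sticky routing without load-balancer cookies is to hash the userId to an instance index deterministically. This is a generic sketch, not anything Agensor ships:

```typescript
// Deterministically map a userId to one of N instances so all of a
// user's requests land on the same process (illustrative sketch).
function instanceFor(userId: string, instanceCount: number): number {
  let h = 0
  for (const ch of userId) {
    h = (h * 31 + ch.charCodeAt(0)) >>> 0 // simple 32-bit rolling hash
  }
  return h % instanceCount
}
```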
Layer 2 (coming): reserveBalance() will enforce budgets server-side
per-request with no per-process state — making multi-instance deploys safe
without sticky sessions.
Error reference
| Error | When it's thrown |
|-------|-----------------|
| BudgetExhaustedError | User's credit balance or run cap is insufficient for the next call |
| AgensorError | Server returned an unexpected error (ledger, Stripe, etc.) |
BudgetExhaustedError carries .userId, .available, and .required — use these to show a meaningful message to your user.
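For example, those fields can be turned into the 402 payload from the quickstart. The field names come from the sentence above; the payload shape beyond `error: 'out_of_credits'` and the message wording are our own:

```typescript
// The three fields carried by BudgetExhaustedError, per the docs above.
interface BudgetInfo {
  userId: string
  available: number
  required: number
}

// Build a user-facing 402 response body from the error's fields.
function outOfCreditsPayload(err: BudgetInfo) {
  return {
    error: 'out_of_credits',
    message: `This action needs ${err.required} credits, but you have ${err.available}.`,
    shortfall: err.required - err.available, // how many credits to top up
  }
}
```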
Roadmap
Layer 1 — Human → Agent billing ✅ live now
| What | Status |
|------|--------|
| Anthropic + OpenAI wrappers | ✅ |
| Tool metering (perCall / perSecond / perByte) | ✅ |
| Run handles with hard credit caps | ✅ |
| Managed credit ledger + developer dashboard | ✅ |
| Embeddable user portal widget | ✅ |
| Mock mode for local dev + tests | ✅ |
Layer 2 — Scale hardening (next)
- Server-side atomic balance enforcement — eliminates the per-process overshoot window, making multi-instance deploys safe without sticky sessions
- npm publish on early access launch
Layer 3 — Agent-to-agent payments (future)
Agents paying agents for subtasks. A2A micropayments with call tree attribution and automatic settlement.
Layer 4 — Agent reputation (emerging)
Trust scores and discovery registry emerging from transaction history.
Technical detail
See SPEC.md for the full technical specification: credit enforcement architecture, streaming reservation design, RunHandle concurrency model, A2A payment design, and key decision log.
License
Apache 2.0 — free to use, self-host, and modify.
