@useanvil/sdk
v0.1.6
Anvil: execution correctness layer for AI-triggered actions
Anvil
Anvil is an execution correctness layer for risky actions.
Put it in front of the calls that can move money, mutate a system, trigger an irreversible workflow step, or cause an expensive side effect if they happen twice.
Anvil helps teams make sure those actions:
- do not run twice by accident
- stay inside policy before they run
- leave behind durable receipts operators can inspect later
- help verify whether the downstream side effect actually happened
- can be observed safely in ghost mode before enforce mode
- stay under centralized rollout control
- leave an operational record teams can trust
The point is not “more infrastructure.” The point is fewer bad outcomes, safer automation rollout, less hand-rolled defensive code, and clearer proof that AI or software is operating correctly in production.
Production Truth
For design-partner production deployments, the durable model is:
- Postgres is the source of truth for receipts, receipt events, audit records, durable commands, workflow recovery state, and control history.
- Redis is coordination only: locks, leases, counters, and short-lived in-flight coordination.
- Local JSONL is optional debug/export output only.
Practical consequence:
- if Postgres is unavailable before a risky side effect starts, Anvil fails closed
- if a process crashes after the provider may have accepted the action, the durable attempt record is still in Postgres
- if Redis is flushed, recovery still works because the durable truth is not in Redis
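The ordering behind "fails closed" can be shown with a small TypeScript sketch. It is not Anvil's implementation. DurableStore, InMemoryStore, and runFailClosed are hypothetical names, and an in-memory availability flag stands in for Postgres. The durable attempt record is written before the side effect starts. If that write fails, the side effect never runs. If the process dies after the call starts, the "attempted" record is left behind for recovery.

```typescript
// Hypothetical durable store; in the production model this role belongs to Postgres.
interface DurableStore {
  recordAttempt(key: string): Promise<void>;
  recordOutcome(key: string, outcome: 'succeeded' | 'errored'): Promise<void>;
}

// In-memory stand-in whose availability can be toggled to simulate an outage.
class InMemoryStore implements DurableStore {
  readonly records = new Map<string, string>();
  available: boolean;

  constructor(available = true) {
    this.available = available;
  }

  async recordAttempt(key: string): Promise<void> {
    if (!this.available) throw new Error('durable store unavailable');
    this.records.set(key, 'attempted');
  }

  async recordOutcome(key: string, outcome: 'succeeded' | 'errored'): Promise<void> {
    if (!this.available) throw new Error('durable store unavailable');
    this.records.set(key, outcome);
  }
}

async function runFailClosed<T>(
  store: DurableStore,
  key: string,
  execute: () => Promise<T>,
): Promise<T> {
  // Attempt record first: if the store is down this throws and the side effect never starts.
  await store.recordAttempt(key);
  // From here on, a crash leaves an 'attempted' record behind for recovery to inspect.
  const result = await execute().catch(async (err: unknown) => {
    // An error does not prove the provider did nothing (e.g. a timeout);
    // a real system would classify it before allowing any retry.
    await store.recordOutcome(key, 'errored');
    throw err;
  });
  await store.recordOutcome(key, 'succeeded');
  return result;
}
```

Redis does not appear anywhere in this sketch. That reflects the rule above: coordination state can be lost without losing the attempt record.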
Runtime boot rule:
- If ANVIL_POSTGRES_URL is set, Anvil now auto-registers the built-in PostgresPersistenceAdapter at startup.
- You only need to call registerPersistenceAdapter() yourself when you want a custom adapter or a test double.
Lock lease invariant:
- ANVIL_LOCK_TTL_SECONDS defaults to 30 and must be longer than the slowest real execute() call.
- ANVIL_LOCK_KEEPALIVE_ENABLED=true renews the Redis lock every half-TTL while the action is still running.
- Example: if a Stripe charge can take 25s, a 30s lease leaves only 5s of margin. That is risky in production. Increase the TTL rather than relying on luck.
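The lease arithmetic above can be made explicit with two hypothetical helpers. leaseMarginSeconds and renewalTimes are illustrative names, not Anvil APIs. They model only the two stated rules: the TTL must exceed the slowest execute() call, and keepalive renews the lock every half-TTL.

```typescript
// Margin between the lock TTL and the slowest observed execute() duration, in seconds.
// A small or negative margin means the lock can expire while the side effect is still running.
function leaseMarginSeconds(ttlSeconds: number, slowestExecuteSeconds: number): number {
  return ttlSeconds - slowestExecuteSeconds;
}

// Seconds after acquisition at which a half-TTL keepalive would renew the lock
// while an action of the given duration is still running.
function renewalTimes(ttlSeconds: number, executeSeconds: number): number[] {
  if (ttlSeconds <= 0) throw new Error('TTL must be positive');
  const interval = ttlSeconds / 2;
  const times: number[] = [];
  for (let t = interval; t < executeSeconds; t += interval) {
    times.push(t);
  }
  return times;
}

console.log(leaseMarginSeconds(30, 25)); // 5 seconds of margin without keepalive
console.log(renewalTimes(30, 70)); // renewals at 15, 30, 45 and 60 seconds
```

With keepalive on, the lock stays valid until the last renewal plus the TTL. For a 70s action on a 30s lease, that is 60 + 30 = 90s. Without keepalive, only the raw margin protects you.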
Repo Layout
The repo is organized around a few clear layers:
- src/: runtime SDK, CLI, control plane, receipts, policies, and UI generators
- src/ui/: generated local product surfaces like Start, Ghost Report, Policy Studio, and Control Center
- src/sinks/: optional output adapters for observability and local metrics
- policies/: bundled starter and production-ready policy files
- demo/: runnable demos
- eval/fixtures/: bundled Ghost evaluation datasets
- scripts/: verification, benchmarking, evidence, and ops drills
- tests/: unit and integration coverage
- docs/: product docs, PRDs, and operations notes
- preview/: static prototype and marketing previews
- artifacts/: generated local output only, not source of truth
- dist/: build output only
Ghost Modes
There are two different Ghost stories in this repo, and they are not the same thing:
- Runtime ghost mode is the SDK/CLI/app rollout mode for live traffic. It still runs the real provider call and returns the real result or thrown error to the app. It only records what Anvil would have done as a separate shadow decision: would_allow, would_dedup, would_block, or would_unknown.
- Ghost replay reconstructs execution truth from messy logs after the fact.
Ghost replay works like this:
- parse raw logs into normalized events
- group normalized events into entities
- run deterministic inference for attempts, final status, retries, and duplicates
- optionally attach AI only for narrow low-confidence unknown cases
The deterministic layer is the source of truth. AI is optional enrichment only and never gets to rewrite a deterministic success or failure.
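Here is a minimal sketch of the deterministic steps: group events by entity, then infer attempts, retries, final status, and duplicates. It assumes a hypothetical NormalizedEvent shape with one entity id and one event kind per parsed line. The real pipeline's schema and rules are not shown here. The sketch only illustrates how a rule-based pass yields a reproducible answer, and how "unknown" falls out for entities with no recorded outcome.

```typescript
// Hypothetical normalized event: one parsed log line, already attributed to a business entity.
interface NormalizedEvent {
  entityId: string; // e.g. an order id
  kind: 'attempt' | 'success' | 'failure';
  ts: number; // epoch millis
}

type FinalStatus = 'success' | 'failure' | 'unknown';

interface EntityInference {
  attempts: number;
  retries: number;
  finalStatus: FinalStatus;
  duplicate: boolean; // more than one success recorded for the same entity
}

function inferEntities(events: NormalizedEvent[]): Map<string, EntityInference> {
  // Step 1: group normalized events into entities.
  const groups = new Map<string, NormalizedEvent[]>();
  for (const e of events) {
    const list = groups.get(e.entityId) ?? [];
    list.push(e);
    groups.set(e.entityId, list);
  }

  // Step 2: deterministic inference over each entity's ordered history.
  const out = new Map<string, EntityInference>();
  for (const [entityId, list] of groups) {
    const ordered = [...list].sort((a, b) => a.ts - b.ts);
    const attempts = ordered.filter((e) => e.kind === 'attempt').length;
    const successes = ordered.filter((e) => e.kind === 'success').length;
    const last = ordered[ordered.length - 1] as NormalizedEvent; // groups are never empty
    out.set(entityId, {
      attempts,
      retries: Math.max(0, attempts - 1),
      // If the latest event is an attempt with no recorded outcome, the truth is unknown.
      finalStatus: last.kind === 'attempt' ? 'unknown' : last.kind,
      duplicate: successes > 1,
    });
  }
  return out;
}
```

In this framing, an optional AI step would only look at entities whose finalStatus is 'unknown'.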
Run the Ghost pipeline directly:
npm run eval -- --logs ./eval/fixtures/adversarial_logs.ndjson --truth ./eval/fixtures/adversarial_ground_truth.json --out ./artifacts/ghost-eval
That writes:
- artifacts/.../summary.json
- artifacts/.../comparisons.json
- artifacts/.../operations.json
Practical rule:
- If it defines behavior, it belongs in src/, policies/, tests/, or docs/.
- If it is generated by a command, it belongs in artifacts/ or dist/.
Runtime Ghost Mode v2 Migration
Old runtime ghost behavior skipped the protected execute() callback and returned synthetic runtime statuses such as blocked, duplicate, or executed with result = null. That was wrong for live traffic because enabling ghost could suppress real side effects or break caller-visible return contracts.
Ghost Mode v2 fixes that contract:
- execute() still runs in runtime ghost mode.
- The app still gets the real return value or the real thrown error.
- The top-level runtime status is the real execution outcome.
- The shadow decision now lives in response.ghost and audit record.ghost as would_allow, would_dedup, would_block, or would_unknown.
- Ghost observation state is kept separate from enforce-mode idempotency state.
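The v2 contract can be sketched as a wrapper that always runs the real call and attaches a separate shadow decision. GhostObserver and its rules (block by action name, dedup by key) are hypothetical simplifications, not Anvil's decision logic. The sketch shows two properties. The caller always sees the real value or the real error. The observer's seen-set is ghost-only state that never feeds enforce-mode idempotency.

```typescript
type ShadowDecision = 'would_allow' | 'would_dedup' | 'would_block' | 'would_unknown';

interface GhostResponse<T> {
  status: 'executed'; // the real outcome; a thrown error propagates instead of being wrapped
  result: T;
  ghost: ShadowDecision;
}

class GhostObserver {
  // Ghost-only observation state, deliberately separate from any enforce-mode store.
  private readonly seen = new Set<string>();
  private readonly blockedActions: ReadonlySet<string>;

  constructor(blockedActions: ReadonlySet<string>) {
    this.blockedActions = blockedActions;
  }

  // would_unknown is not modeled here (e.g. when the observer cannot read its own state).
  private decide(key: string, action: string): ShadowDecision {
    if (this.blockedActions.has(action)) return 'would_block';
    if (this.seen.has(key)) return 'would_dedup';
    this.seen.add(key);
    return 'would_allow';
  }

  // execute() runs unconditionally: the shadow decision is recorded, never enforced.
  async run<T>(key: string, action: string, execute: () => Promise<T>): Promise<GhostResponse<T>> {
    const ghost = this.decide(key, action);
    const result = await execute(); // real errors reach the caller unchanged
    return { status: 'executed', result, ghost };
  }
}
```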
For the runtime hot-path map, Redis round-trip counts, benchmark command, and provider-facing idempotency audit, see docs/GHOST_MODE_V2_GA_READINESS.md.
Quick Start
- Install dependencies:
npm install
- Start with the guided front door:
anvil start
That command now writes the Start page, runs a protected local execution, blocks the duplicate replay, and prints a real receipt path, so the first experience ends with proof instead of setup trivia.
- Create your local config if you want a .env scaffold:
npm run init
- Check your environment:
npm run doctor
By default, runtime installs now start in ANVIL_MODE=enforce. Switch to ANVIL_MODE=ghost explicitly when you want observe-only rollout on real traffic.
- If you are bringing up production durability:
npm run db:migrate
- Re-run the dead-simple local demo directly any time:
npm run demo
- See value immediately with the built-in ghost sample:
npm run ghost:sample
- Measure runtime overhead on your own Redis before rollout changes:
npm run build --silent
npm run bench:guard
That benchmark compares enforce and runtime ghost in separate child processes and reports the delta in throughput plus average and p95 latency.
- Run the duplicate-prevention demo with a Stripe test key:
npm run demo:stripe -- --stripe-key sk_test_...
- For existing product code, start with the default adoption namespace:
import { safe } from '@useanvil/sdk';
There is one mental model: point Anvil at the risky thing and protect it. You do not have to know Anvil's primitive taxonomy first.
- safe.protect(...): protect one inline risky side effect.
- safe.wrap(...): protect an existing function without changing its contract.
Describe the risky call in plain business terms and Anvil infers the surface (payment / subscription / external mutation / agent step) and applies the right keying, metadata, reporting vocabulary, and policy defaults automatically:
import { safe } from '@useanvil/sdk';
// Inferred as a payment charge — keyed on orderId, reported as payments.charge.
await safe.protect({
orderId,
amount: cents,
execute: async () =>
stripe.charges.create({ amount: cents, currency: 'usd', customer: customerId })
});
// Inferred as a refund (the `reason` field is the high-confidence refund signal).
await safe.protect({ orderId, amount: cents, reason: 'customer_request', execute });
// Inferred as an external mutation — keyed on stable business identity.
await safe.protect({
provider: 'salesforce', resourceType: 'contact', resourceId, operation: 'update-email',
method: 'PATCH', execute
});
// Inferred as an agent/workflow step — enforces workflow budgets.
await safe.protect({ workflow, stepId, toolName, execute });
// Inferred as inbound event processing — dedups a retried webhook on its eventId,
// so your handler runs exactly once. (Outbound delivery has a deliveryId; inbound never does.)
await safe.protect({ eventId: evt.id, provider: 'stripe', execute: () => handleEvent(evt) });
// Inferred as a payout / transfer — keyed on the durable payout/transfer id.
await safe.protect({ payoutId, amount: cents, execute });
await safe.protect({ transferId, amount: cents, destinationAccount, execute });
// Inferred as an OUTBOUND webhook delivery — keyed on eventId + deliveryId.
// (An eventId WITHOUT a deliveryId is inferred as inbound webhook.received instead.)
await safe.protect({ endpoint, eventId, deliveryId, provider: 'stripe', execute });
// Inferred as an email send — keyed on messageId + recipient.
await safe.protect({ messageId, recipient, execute });
// Inferred as a support ticket create — keyed on provider + externalCaseId.
await safe.protect({ provider: 'zendesk', externalCaseId, execute });
execute() receives the same computed context the explicit surface would hand it, so migrating loses nothing. Most importantly, that includes the idempotencyKey to forward to your provider for provider-level idempotency, plus the resolved surface, the business metadata, the provider-ready providerMetadata, and any workflow:
await safe.protect({
paymentIntentId,
amount: cents,
confirm: true,
execute: async ({ idempotencyKey, providerMetadata }) =>
stripe.paymentIntents.confirm(paymentIntentId, { idempotencyKey, metadata: providerMetadata })
});
Anvil leads with inference but never guesses an idempotency key. When the data is ambiguous (e.g. a paymentIntentId that could be a confirm or a capture), it throws AnvilAmbiguousProtectionError, naming the candidate surfaces and the field (or the as override) that resolves it. For the confirm/capture case, you just add the intent boolean; no taxonomy needed:
// Disambiguate a payment intent with a plain business word:
await safe.protect({ paymentIntentId, amount: cents, confirm: true, execute }); // → payment.confirm
await safe.protect({ paymentIntentId, amount: cents, capture: true, execute }); // → payment.capture
// Or, if you prefer, pin the surface explicitly — same key, same result:
await safe.protect({ as: 'payment.capture', paymentIntentId, amount: cents, execute });
safe.wrap(...) works the same way. Business fields may be plain values or factories of the call context, so keys can be derived from arguments:
const createCharge = safe.wrap({
// Wrap in an arrow so the Stripe resource method keeps its `this` binding.
fn: (params) => stripe.charges.create(params),
orderId: ({ args }) => args[0].metadata.orderId,
amount: ({ args }) => args[0].amount
}); // → inferred payment.charge
Fully explicit / advanced path: the productized surfaces are still first-class and remain the right choice when you want zero inference or provider-shaped naming. Use them directly, or pin any universal call with as:
- safe.payments.charge|confirm|capture|refund(...)
- safe.subscriptions.create|change|cancel|resume(...)
- safe.external.mutation(...)
- safe.agent.toolCall(...) / safe.agent.step(...)
You can also stay fully explicit on safe.protect/safe.wrap by passing key
and action yourself — that is the original contract and is unchanged.
// Explicit: you supply the key and canonical action yourself.
const charge = await safe.protect({
key: safe.key.payment(orderId),
action: 'payment.charge',
amount: cents,
execute: async () =>
stripe.charges.create({ amount: cents, currency: 'usd', customer: customerId })
});
- Use guard() directly only when you explicitly want Anvil's structured response object in your own code:
import { guard, key } from '@useanvil/sdk';
await guard({
key: key.payment('order-8821'),
action: 'payment.charge',
amount: 4999,
execute: async () => ({ ok: true })
});
The policy option defaults to stripe-v1, so you only need to pass it when using a different policy.
Practical rule:
- Start with safe.protect() (inline) or safe.wrap() (existing function) and describe the risky thing in business terms. Let Anvil infer the surface.
- Reach for safe.payments.*, safe.subscriptions.*, safe.external.mutation(), or safe.agent.toolCall() when you want to be fully explicit, or pin a universal call with as.
- Pass key + action to safe.protect() / safe.wrap() when you want to control the idempotency key yourself.
- Use guard() when you want the full structured AnvilResponse.
Safe adoption rule:
- If a team is unsure which surface to choose, that is exactly the case the universal path is for: call safe.protect({ ...business fields, execute }) and let inference decide.
- If Anvil throws AnvilAmbiguousProtectionError, it is telling you the data is genuinely ambiguous. Add the field it names or pin the surface with as. Do not work around it, because a wrong key means a wrong dedup decision.
- Move to direct guard() handling only when you actually want downstream code to consume Anvil statuses like duplicate, blocked, or unknown.
Universal protection — design decisions:
- Smart defaults, safe fallbacks, never magic. Inference keys off stable, structured business fields (provider/resourceId, subscriptionId, paymentIntentId, workflow+stepId+toolName), never function or variable names. Callee-name heuristics are fine for CLI suggestions but too weak for runtime keying.
- Refuse over guess. When two surfaces are plausible from the same data (confirm vs capture, or a bare subscriptionId), Anvil throws instead of guessing. A wrong classification is a wrong idempotency key, which in a dedup engine can mean a double-charge or a swallowed write.
- Fail loud on bad input. Malformed input is rejected up front with AnvilInvalidProtectionInputError before anything is classified or executed, so a poisoned value can never reach the key, the payload hash, or a budget comparison. A non-finite or negative amount (NaN / Infinity / -5) is refused outright, because it would otherwise silently defeat budget caps (NaN > limit is always false). A missing or non-function execute fails with a clear message instead of a deep crash. Empty or whitespace-only identity fields are treated as absent, so a typo can never key on " ".
- Name the missing field. When you are one field short of a valid surface (e.g. an external mutation without method, or a customerId with no priceId), the error names exactly what to add rather than a generic "could not infer". A confusing error is what pushes people toward hand-rolling keys, the footgun this layer exists to remove.
- Single source of truth. The universal path does not re-implement keying or metadata. It classifies, then dispatches to the existing product surfaces, so Ghost, audit, and receipts keep speaking the correct business language for free.
- Zero-loss migration (semantic fidelity). Because the universal path dispatches to the same surfaces, the result, the thrown errors (provider errors, duplicate replay, blocked), the receipts, and the metadata are byte-for-byte what the explicit surface produces. The execute() callback receives the same computed context too (idempotencyKey, surface, metadata, providerMetadata, workflow), so moving an explicit call to the universal form never drops information you were relying on.
- Override precedence. as pins the surface; an explicit key pins the idempotency key; an explicit action (on safe.protect / safe.wrap) keeps the original low-level contract. Explicit always wins over inference.
- Backward compatible. Existing callers that pass action (and key) are routed down the exact original path, including wrap's lightweight hosted lock/complete protocol. The universal path is purely additive. Hosted mode still works: universal protect/wrap dispatch through the product surfaces into guard(), which has its own full hosted-API path (ANVIL_API_KEY), so no local Redis is required.
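A toy classifier makes several of the design decisions above concrete. It validates input first, honors as, refuses ambiguity, treats blank identity fields as absent, and names the missing field. This is a hypothetical sketch that covers only a few surfaces. The error class names, the surface string 'webhook.delivery', and the matching order are assumptions for illustration, not Anvil's classifier.

```typescript
type Fields = Record<string, unknown>;

// Stand-ins for the SDK's ambiguity and invalid-input errors (names are illustrative).
class AmbiguousSurfaceError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'AmbiguousSurfaceError';
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

class InvalidInputError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'InvalidInputError';
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// Empty or whitespace-only strings count as absent, so a typo never becomes a key.
function present(fields: Fields, name: string): boolean {
  const value = fields[name];
  if (value === undefined || value === null) return false;
  return typeof value === 'string' ? value.trim().length > 0 : true;
}

function inferSurface(fields: Fields): string {
  // Fail loud on bad input before classifying anything.
  if (typeof fields.execute !== 'function') {
    throw new InvalidInputError('execute must be a function');
  }
  if (fields.amount !== undefined) {
    const amount = fields.amount;
    if (typeof amount !== 'number' || !Number.isFinite(amount) || amount < 0) {
      throw new InvalidInputError(`amount must be a finite, non-negative number (got ${String(amount)})`);
    }
  }
  // Explicit override always wins over inference.
  if (present(fields, 'as')) return String(fields.as);

  // Refuse over guess: a bare paymentIntentId could be a confirm or a capture.
  if (present(fields, 'paymentIntentId')) {
    const confirm = fields.confirm === true;
    const capture = fields.capture === true;
    if (confirm === capture) {
      throw new AmbiguousSurfaceError(
        'paymentIntentId matches payment.confirm and payment.capture; add confirm: true, capture: true, or as',
      );
    }
    return confirm ? 'payment.confirm' : 'payment.capture';
  }
  // An eventId without a deliveryId is inbound; with one it is an outbound delivery.
  if (present(fields, 'eventId')) {
    return present(fields, 'deliveryId') ? 'webhook.delivery' : 'webhook.received';
  }
  if (present(fields, 'orderId') && present(fields, 'amount')) {
    return present(fields, 'reason') ? 'payment.refund' : 'payment.charge';
  }
  // Name the missing field when one short of a valid external mutation.
  const mutationFields = ['provider', 'resourceType', 'resourceId', 'operation'];
  if (mutationFields.every((f) => present(fields, f))) {
    if (!present(fields, 'method')) {
      throw new InvalidInputError('external mutation is missing method (e.g. PATCH or POST)');
    }
    return 'external.mutation';
  }
  throw new InvalidInputError('could not infer a surface; pass as, or an explicit key and action');
}
```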
Integration proof:
- See docs/INTEGRATION_PROOF.md for four production-style examples: existing backend function, API handler, worker/job, and webhook handler.
Production deployment:
- See docs/OPERATIONS.md for the runbook.
- See docs/DESIGN_PARTNER_PRODUCTION_ARCHITECTURE.md for the blessed topology.
What Anvil Is
Anvil is a runtime execution control layer for side-effectful actions.
Use it when an agent, workflow, or service can:
- move money
- mutate an external system
- trigger a step that should not run twice
In practice, Anvil sits between “request received” and “real-world side effect happens.”
Anvil is strongest on:
- payments and refunds
- external API calls
- workflow steps with real-world side effects
What Anvil Is Not
Anvil is not:
- a general workflow engine
- a broad authorization platform
- a replacement for business logic or provider-native idempotency
If a path is read-only or low-risk, Anvil is probably unnecessary.
The strongest setup is usually:
- provider-native idempotency where available
- Anvil around the full side-effectful workflow
That combination is stronger than either layer alone.
Common Commands
npm run init
npm run doctor
npm run start
npm run demo
npm run cc
npm run policy:studio
npm run ghost:sample
npm run demo:stripe -- --stripe-key sk_test_...
npm run verify
Claude Code / Agent Quick Start
Born protected: the agent pack
The highest-leverage protection is to never make it a separate step. init-agents
teaches your AI coding tools to write risky calls already wrapped, in ghost mode,
at generation time:
npx @useanvil/sdk init-agents
It detects which agents your repo uses and adds one drop-in guidance file for each:
- Claude Code → .claude/skills/anvil-protect/SKILL.md
- Cursor → .cursor/rules/anvil.mdc
- Codex / others → an AGENTS.md section
All three are compiled from a single source (agent-pack/anvil-wrap-guidance.md),
so every agent writes the byte-identical wrap({ fn, key, action }) shape. After
this, asking any of them to "add a Stripe charge" yields code that compiles, runs,
sits in ghost mode, and shows up in your dashboard — with a durable idempotency key
or an honest TODO(anvil) when one can't be inferred. Reads are never wrapped.
npx @useanvil/sdk init-agents --all # install for every agent regardless of detection
npx @useanvil/sdk init-agents --check # CI-friendly: exit 1 if a dropped file is stale
The pack is regenerated on every publish and guarded by a drift test, so the files you install always match the SDK you installed.
MCP server: protect & introspect from your agent
The agent pack makes new code born-protected. The MCP server lets the agent retrofit existing calls and verify protection on demand — same codemod, so the wraps are byte-identical. It runs locally over stdio and inherits "which repo / which file" from the agent's workspace, so there's nothing to point at.
Five small tools: find_action, protect_action (wraps the real call in ghost
mode, returns a unified diff + the durable key it chose or an honest TODO,
reversible via .anvil-bak), check_protection, list_observations, and
set_mode (the only way to enable enforcement).
Configure once — no global install (npx fetches it):
claude mcp add anvil -- npx -y @useanvil/sdk mcp
Or add to Claude Code's .mcp.json / Cursor's .cursor/mcp.json:
{
"mcpServers": {
"anvil": { "command": "npx", "args": ["-y", "@useanvil/sdk", "mcp"] }
}
}
Run anvil mcp --config to print this any time. Then just ask "protect the payment.charge in this repo", and the agent wraps it and shows the diff.
Inline protection
For AI-triggered payments or API calls, the safest copy-paste path for an existing system is to describe the risky thing and let Anvil infer the surface:
import { safe } from '@useanvil/sdk';
await safe.protect({
orderId,
amount: cents,
execute: async () => stripe.charges.create({
amount: cents,
currency: 'usd',
customer: customerId
})
});
Prefer to pin the key and action yourself? That contract is unchanged:
await safe.protect({
key: safe.key.payment(orderId),
action: 'payment.charge',
amount: cents,
execute: async () => stripe.charges.create({ amount: cents, currency: 'usd', customer: customerId })
});
If you need to persist the key in a queue or workflow step, round-trip it safely:
import { key, safe } from '@useanvil/sdk';
const storedKey = safe.key.payment(orderId).toString();
await safe.protect({
key: key.from(storedKey),
action: 'payment.charge',
amount: cents,
execute: async () => doWork()
});
Product surface example:
import { safe } from '@useanvil/sdk';
await safe.payments.confirm({
paymentIntentId: 'pi_123',
amount: 4999,
execute: async ({ paymentIntentId, idempotencyKey }) =>
stripe.paymentIntents.confirm(paymentIntentId, {
idempotencyKey
})
});
Stripe money movement rule:
- Use safe.payments.charge(...), safe.payments.confirm(...), safe.payments.capture(...), or safe.payments.refund(...) as the default path.
- Key charges and refunds from durable business identity like orderId, refund business identity, or the payment intent being acted on. Do not key from request IDs, retry attempts, or webhook delivery IDs.
- Keep Stripe idempotency turned on with the idempotencyKey Anvil gives you, but do not confuse that with full execution correctness.
What Stripe idempotency alone does not solve:
- it does not enforce your retry or amount policy before the call fires
- it does not give operators one durable receipt across retries, duplicates, and unknown outcomes
- it does not tell you whether a provider timeout is safe to retry without reconciliation
- it does not protect non-Stripe work around the same money movement path
Replay safety rule:
- replay is safe only when Anvil captured a replayable original provider result, so duplicate callers can receive the same value without re-running the charge or refund
- replay is not safe when the original result was not serializable or otherwise not replayable; in that case Anvil preserves the protection decision and throws instead of fabricating a fake provider result
- when runtime truth is unknown, hold retries until Stripe reconciliation or manual operator review says the next attempt is safe
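The three replay rules map onto a small decision function. This is a hypothetical sketch. StoredOutcome, the JSON round-trip used as the replayability test, and the error messages are assumptions. It only shows how the rules fit together: replay a captured result, otherwise throw rather than fabricate one, and hold when the outcome is unknown. The duplicate path never re-runs the side effect.

```typescript
// Hypothetical record of the first execution for a given idempotency key.
type StoredOutcome =
  | { state: 'succeeded'; replayable: true; result: unknown }
  | { state: 'succeeded'; replayable: false }
  | { state: 'unknown' };

// Here a result counts as replayable only if it survives JSON serialization (an assumption).
function captureOutcome(result: unknown): StoredOutcome {
  try {
    const json = JSON.stringify(result);
    if (json === undefined) return { state: 'succeeded', replayable: false }; // functions, undefined
    return { state: 'succeeded', replayable: true, result: JSON.parse(json) as unknown };
  } catch {
    return { state: 'succeeded', replayable: false }; // e.g. circular structures
  }
}

// Decide what a duplicate caller receives; this path never calls the provider.
function handleDuplicate(outcome: StoredOutcome): unknown {
  if (outcome.state === 'unknown') {
    throw new Error('outcome unknown: hold retries until reconciliation or operator review');
  }
  if (!outcome.replayable) {
    throw new Error('duplicate blocked: original result is not replayable');
  }
  return outcome.result;
}
```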
Built-In Adapters
Plain English: built-in adapters mean you do not have to hand-build the Anvil wrapper for the most common risky calls.
Your app still performs the real provider call inside execute, but Anvil now gives you the recommended key shape, metadata, receipt context, and verification posture by default.
Most application code should prefer safe.payments.*, safe.external.*, safe.subscriptions.*, or safe.agent.toolCall(...) first. Reach for safe.adapters.* only when the provider-shaped surface is the more natural fit than the product-shaped one.
Adapters return an AnvilResponse on purpose. If you need a non-breaking call surface for existing code paths, prefer wrap(), protect(), or adapters.protected.* instead of swapping call sites straight to guard() or an adapter response object.
Built in today:
- Stripe charge
- Stripe refund
- Stripe payout
- Stripe transfer
- Stripe subscription cancel
- Webhook delivery
- Email send
- Support ticket create
- Generic external API mutation
Example: Stripe charge
import Stripe from 'stripe';
import { adapters } from '@useanvil/sdk';
const stripe = new Stripe(process.env.STRIPE_SECRET_KEY!);
await adapters.stripe.charge({
orderId: 'order-8821',
amount: 4999,
execute: async (context) =>
stripe.charges.create({
amount: context.amount,
currency: context.currency,
metadata: context.stripeMetadata
}, {
idempotencyKey: context.idempotencyKey
})
});
Same adapter, production-safe return contract:
const charge = await adapters.protected.stripe.charge({
orderId: 'order-8821',
amount: 4999,
execute: async (context) =>
stripe.charges.create({
amount: context.amount,
currency: context.currency,
metadata: context.stripeMetadata
}, {
idempotencyKey: context.idempotencyKey
})
});
Example: email send
import { adapters } from '@useanvil/sdk';
await adapters.comms.emailSend({
recipient: '[email protected]',
messageId: 'welcome-8821',
provider: 'resend',
template: 'welcome',
execute: async () => sendEmailSomehow()
});
Example: support ticket create
import { adapters } from '@useanvil/sdk';
await adapters.support.ticketCreate({
provider: 'zendesk',
externalCaseId: 'case-8821',
customerId: 'cust-42',
execute: async () => createTicketSomehow()
});
If the risky call is a plain outbound HTTP write, the default path should be safe.external.mutation(...), not a custom wrapper.
Minimum fields for a safe generic mutation:
- method: the write verb that will fire
- provider: the external or internal system being changed
- resourceType: what business object is being changed
- resourceId: the stable business identifier for that object
- operation: the business mutation being attempted
- target: optional request path or URL for operator context
- actorId / workflow: optional governance and runtime context when the mutation sits inside a job or agent workflow
Keying rule:
- Prefer the same stable business identity you would use to explain the mutation to an operator: provider + resourceType + resourceId + operation.
- Do not key by request IDs, trace IDs, queue attempt IDs, retry counters, timestamps, webhook delivery attempts, or process-local UUIDs.
- If the same business mutation can be sent to multiple URLs over time, keep the same business key and record the URL in target.
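Below is a hypothetical key builder that follows this keying rule. Only the stable business identity goes into the key. target is kept as operator context and deliberately left out. The MutationIdentity type mirrors the minimum fields listed above. The colon-joined format is an illustrative assumption, not the format Anvil uses.

```typescript
interface MutationIdentity {
  method: 'POST' | 'PUT' | 'PATCH' | 'DELETE';
  provider: string;
  resourceType: string;
  resourceId: string;
  operation: string;
  target?: string; // operator context only; deliberately NOT part of the key
}

// Key = provider + resourceType + resourceId + operation. Request IDs, retry counters,
// timestamps, and URLs never enter it, so retries and URL changes map to the same key.
function mutationKey(m: MutationIdentity): string {
  const parts = [m.provider, m.resourceType, m.resourceId, m.operation];
  for (const part of parts) {
    if (part.trim().length === 0) throw new Error('key parts must not be blank');
  }
  // Encode each part so a ':' inside an identifier cannot collide with the separator.
  return parts.map((p) => encodeURIComponent(p)).join(':');
}
```

For example, the same Salesforce contact update sent to an API v59 path and then a v60 path produces one key, salesforce:contact:0038A00000F1ABC:update-email, because target is not an input.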
Examples:
Update Salesforce contact
import { safe } from '@useanvil/sdk';
await safe.external.mutation({
provider: 'salesforce',
resourceType: 'contact',
resourceId: '0038A00000F1ABC',
operation: 'update-email',
method: 'PATCH',
target: '/services/data/v60.0/sobjects/Contact/0038A00000F1ABC',
actorId: 'ops-user-7',
execute: async () =>
salesforce.patch('/services/data/v60.0/sobjects/Contact/0038A00000F1ABC', {
Email: '[email protected]'
})
});
Patch HubSpot record
await safe.external.mutation({
provider: 'hubspot',
resourceType: 'company',
resourceId: '9482211',
operation: 'patch-lifecycle-stage',
method: 'PATCH',
target: '/crm/v3/objects/companies/9482211',
workflow: {
workflowId: 'nightly-crm-sync'
},
execute: async () =>
hubspot.patch('/crm/v3/objects/companies/9482211', {
properties: {
lifecyclestage: 'customer'
}
})
});
Cancel account in internal billing API
await safe.external.mutation({
provider: 'internal-billing-api',
resourceType: 'account',
resourceId: account.id,
operation: 'cancel-account',
method: 'POST',
target: `/v1/accounts/${account.id}/cancel`,
actorId: req.user.id,
execute: async () =>
billingClient.post(`/v1/accounts/${account.id}/cancel`, {
reason: 'fraud-review'
})
});
Create support artifact in a third-party tool
await safe.external.mutation({
provider: 'zendesk',
resourceType: 'ticket',
resourceId: caseId,
operation: 'create-followup-ticket',
method: 'POST',
target: '/api/v2/tickets',
reconciliationMode: 'custom',
execute: async () =>
zendesk.tickets.create({
external_id: caseId,
subject: 'Follow-up required'
})
});
If you explicitly need the lower-level AnvilResponse surface or a workflow-local api.call shape, use adapters.external.mutation(...). If you need downstream verification for a generic mutation, register a custom reconciliation adapter.
Agent Workflow Limits
Anvil now supports first-class workflow limits for agent systems:
- max tool calls per workflow
- max retries per workflow
- max spend budget per workflow
- max recursion / step depth per workflow
These limits apply when you pass a workflowId into the guarded call. Anvil now auto-tracks tool call count, run-level retry pressure, and guarded spend for the run by default, then enforces those limits before the side effect runs. stepDepth remains optional caller-supplied context when the workflow runtime already knows nesting.
Example policy:
{
"name": "agent-runtime-v1",
"allowed_actions": ["api.call"],
"workflow_limits": {
"max_tool_calls": 20,
"max_retries": 6,
"max_spend": 20000,
"max_step_depth": 12
},
"limits": {
"api.call": {
"max_retries": 5,
"execute_timeout_ms": 8000
}
}
}
Plain English:
- max_tool_calls: stop the workflow before tool call number 21
- max_retries: stop the workflow before retry number 7
- max_spend: stop the workflow before the cumulative guarded amount goes above $200.00 in the default bundled policies
- max_step_depth: stop runaway recursion or planner depth before step 13
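Each limit in the plain-English list reduces to one check: block if the next value would exceed the cap. WorkflowBudget is a hypothetical in-memory sketch of that bookkeeping. It uses the field names from the example policy. Anvil's real tracking is durable and per-run, and this sketch does not attempt either.

```typescript
interface WorkflowLimits {
  max_tool_calls: number;
  max_retries: number;
  max_spend: number;
  max_step_depth: number;
}

type Verdict = { allowed: true } | { allowed: false; reason: keyof WorkflowLimits };

class WorkflowBudget {
  private toolCalls = 0;
  private retries = 0;
  private spend = 0;
  private readonly limits: WorkflowLimits;

  constructor(limits: WorkflowLimits) {
    this.limits = limits;
  }

  // Check every limit BEFORE the side effect; commit the counters only if all checks pass.
  beforeToolCall(opts: { amount?: number; isRetry?: boolean; stepDepth?: number }): Verdict {
    const amount = opts.amount ?? 0;
    const nextRetries = this.retries + (opts.isRetry ? 1 : 0);
    if (this.toolCalls + 1 > this.limits.max_tool_calls) return { allowed: false, reason: 'max_tool_calls' };
    if (nextRetries > this.limits.max_retries) return { allowed: false, reason: 'max_retries' };
    if (this.spend + amount > this.limits.max_spend) return { allowed: false, reason: 'max_spend' };
    if (opts.stepDepth !== undefined && opts.stepDepth > this.limits.max_step_depth) {
      return { allowed: false, reason: 'max_step_depth' };
    }
    this.toolCalls += 1;
    this.retries = nextRetries;
    this.spend += amount;
    return { allowed: true };
  }
}
```

With the example policy, calls 1 through 20 pass and call 21 is refused. A spend that would land exactly on 20000 is still allowed. Only going above it is blocked.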
Zero-Config Agent Runtime
The intended system is simple:
- Anvil is the runtime control point in front of a risky tool call
- agent and workflow teams use it when a step can mutate a real system, move money, or create an expensive side effect
- they use it at the exact point where the tool would normally run
- their workflow changes from “call the provider directly” to “call the provider through Anvil once”
Concrete example:
- an agent tries to verify a customer five times in the same run after a planner loop starts drifting
- the developer passes workflowId, stepId, and toolName
- Anvil derives the business key, blocks duplicate step execution, increments the run counter, and blocks the fifth tool call before the provider sees it
What now works out of the box with the bundled policies:
- idempotent tool steps from workflowId + stepId
- automatic per-run tool call counting
- automatic per-run retry counting when the same guarded work is reclaimed after failure
- automatic per-run guarded spend accumulation when amount is provided
- explicit status, safeToRetry, and nextAction
- destructive-action flags with destructive and requiresConfirmation
- durable receipts through anvil.receipts
Agent Starter
For an existing agent or workflow, the lowest-risk path is safe.agent.toolCall(...) because it preserves the normal tool return contract while still enforcing Anvil. A full starter lives in demo/agent-starter.ts.
import { safe } from '@useanvil/sdk';
await safe.agent.toolCall({
workflow: {
workflowId: run.id
},
stepId: `verify-customer-${customerId}`,
toolName: 'kyc.verify',
action: 'api.call',
amount: 80,
metadata: {
provider: 'persona',
customer_id: customerId
},
execute: async () =>
persona.verifications.create({
customerId
})
});
Why this matters:
- the workflow limits now live in policy, not scattered in agent code
- Anvil now owns the tool-count, retry-count, and guarded-spend bookkeeping for the guarded steps
- receipts now capture workflow context cleanly
- the same runtime can govern both ordinary risky API calls and agent tool execution
- if the tool succeeds, your workflow still gets the tool result instead of a new wrapper object
Direct OpenAI Tool Execution
If your team is using the OpenAI Responses API directly, the minimal safe path is:
- generate one stable workflowId when the agent run starts
- use the provider's function_call.call_id as the Anvil stepId
- hand the model's function_call items to runOpenAIToolCalls(...)
- send the returned function_call_output items back to client.responses.create(...)
This stays faithful to the normal OpenAI loop. Anvil only governs whether the risky tool is allowed to run and whether a retry should replay or stop.
import OpenAI from 'openai';
import { runOpenAIToolCalls, safe } from '@useanvil/sdk';
const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const workflowId = crypto.randomUUID(); // create once per agent run
const response = await client.responses.create({
model: 'gpt-5.2',
input: 'Refund order ord_8821 and then cancel the internal billing account if the refund succeeds.',
tools: [
{
type: 'function',
name: 'refund_order',
description: 'Refund a completed order exactly once.',
parameters: {
type: 'object',
properties: {
orderId: { type: 'string' },
paymentIntentId: { type: 'string' },
amountCents: { type: 'integer' }
},
required: ['orderId', 'paymentIntentId', 'amountCents']
}
},
{
type: 'function',
name: 'cancel_billing_account',
description: 'Cancel an account in the internal billing API exactly once.',
parameters: {
type: 'object',
properties: {
accountId: { type: 'string' },
reason: { type: 'string' }
},
required: ['accountId', 'reason']
}
}
]
});
const toolCalls = response.output.filter((item) => item.type === 'function_call');
const toolOutputs = await runOpenAIToolCalls({
calls: toolCalls,
config: { workflowId, policy: 'agent-runtime-v1' },
tools: {
refund_order: {
action: 'payment.refund',
amount: ({ parsedArguments }) => parsedArguments.amountCents,
metadata: ({ parsedArguments }) => ({
provider: 'stripe',
order_id: parsedArguments.orderId,
payment_intent_id: parsedArguments.paymentIntentId
}),
execute: async ({ orderId, paymentIntentId, amountCents }) =>
safe.payments.refund({
provider: 'stripe',
paymentIntentId,
amount: amountCents,
businessId: orderId,
execute: async () =>
stripe.refunds.create({
payment_intent: paymentIntentId,
amount: amountCents
})
})
},
cancel_billing_account: {
action: 'api.call',
metadata: ({ parsedArguments }) => ({
provider: 'internal_billing_api',
account_id: parsedArguments.accountId,
operation: 'account.cancel'
}),
execute: async ({ accountId, reason }) =>
safe.external.mutation({
system: 'internal_billing_api',
operation: 'account.cancel',
method: 'POST',
target: `/accounts/${accountId}/cancel`,
businessId: accountId,
execute: async () =>
billingClient.post(`/accounts/${accountId}/cancel`, { reason })
})
}
}
});
await client.responses.create({
model: 'gpt-5.2',
previous_response_id: response.id,
input: toolOutputs
});
Practical rules:
- workflowId comes from your runtime. It must stay stable across retries of the same agent run.
- stepId comes from OpenAI's call_id. Do not replace it with a request-local UUID.
- If Anvil blocks or cannot confirm execution, runOpenAIToolCalls(...) returns a structured function_call_output payload the model can reason about.
- If the exact same tool call is replayed later with the same call_id, Anvil replays the original result instead of running the side effect again, when replay is safe.
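Why the step identity matters can be shown with a minimal in-memory sketch. ReplayStore, runOnce, and the ids below are hypothetical and are not Anvil's implementation (which persists receipts in Postgres and coordinates through Redis); the sketch only illustrates that the workflowId plus stepId key decides between replay and re-execution.

```typescript
// Hypothetical in-memory stand-in for a durable receipt store (not Anvil's code).
// The key is workflowId + stepId, so the step identity decides replay vs. re-execution.
// Concurrent duplicates are out of scope here; a real guard also needs a lock.
class ReplayStore {
  private results = new Map<string, unknown>();
  executions = 0;

  async runOnce<T>(workflowId: string, stepId: string, execute: () => Promise<T>): Promise<T> {
    const key = `${workflowId}:${stepId}`;
    if (this.results.has(key)) {
      // Same workflow and same step: replay the stored result, skip the side effect.
      return this.results.get(key) as T;
    }
    this.executions += 1;
    const result = await execute();
    this.results.set(key, result);
    return result;
  }
}

const store = new ReplayStore();
const runId = 'run-123'; // one stable workflowId per agent run (hypothetical value)
const issueRefund = async () => ({ refundId: `re_${store.executions}` });

async function replayDemo() {
  // Retry with the same OpenAI call_id: replayed, so the refund runs once.
  const first = await store.runOnce(runId, 'call_abc', issueRefund);
  const retry = await store.runOnce(runId, 'call_abc', issueRefund);
  // Retry with a request-local id: looks like a new step, so the refund runs again.
  await store.runOnce(runId, `local-${Date.now()}`, issueRefund);
  return { first, retry, executions: store.executions };
}
```

The request-local id in the last call is exactly the mistake the rule above warns against: the side effect runs a second time because nothing ties it to the first attempt.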
Direct Anthropic Tool Execution
If your team is using the Anthropic Messages API directly, the minimal safe path is the same:
- generate one stable workflowId per agent run
- use the provider's tool_use.id as the Anvil stepId
- hand the tool_use blocks to runAnthropicToolUses(...)
- append the returned tool_result blocks immediately after the assistant tool-use message
import Anthropic from '@anthropic-ai/sdk';
import { randomUUID } from 'node:crypto';
import { runAnthropicToolUses, safe } from '@useanvil/sdk';
// Assumes a `zendesk` API client is initialized elsewhere in your app.
const client = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });
const workflowId = randomUUID(); // create once per agent run
// Keep the tool definitions and the user turn so the follow-up request can resend them.
const tools = [
  {
    name: 'create_refund_case',
    description: 'Create a support artifact exactly once for a risky refund incident.',
    input_schema: {
      type: 'object' as const,
      properties: {
        orderId: { type: 'string' },
        customerId: { type: 'string' },
        summary: { type: 'string' }
      },
      required: ['orderId', 'customerId', 'summary']
    }
  }
];
const userMessage = {
  role: 'user' as const,
  content: 'Create a support case for the customer who saw a duplicate refund attempt.'
};
const message = await client.messages.create({
  model: 'claude-opus-4-7',
  max_tokens: 1024,
  tools,
  messages: [userMessage]
});
const toolUses = message.content.filter((block) => block.type === 'tool_use');
const toolResults = await runAnthropicToolUses({
  toolUses,
  config: { workflowId, policy: 'agent-runtime-v1' },
  tools: {
    create_refund_case: {
      action: 'api.call',
      metadata: ({ input }) => ({
        provider: 'zendesk',
        order_id: input.orderId,
        customer_id: input.customerId,
        operation: 'ticket.create'
      }),
      execute: async ({ orderId, customerId, summary }) =>
        safe.external.mutation({
          system: 'zendesk',
          operation: 'ticket.create',
          method: 'POST',
          target: '/api/v2/tickets',
          businessId: `refund-case:${orderId}`,
          execute: async () =>
            zendesk.tickets.create({
              subject: `Refund incident for ${orderId}`,
              comment: { body: summary },
              requester_id: customerId
            })
        })
    }
  }
});
// The follow-up request repeats the tools and the full conversation:
// user turn, assistant tool_use turn, then the tool_result blocks.
await client.messages.create({
  model: 'claude-opus-4-7',
  max_tokens: 1024,
  tools,
  messages: [
    userMessage,
    { role: 'assistant', content: message.content },
    { role: 'user', content: toolResults }
  ]
});
Practical rules:
- workflowId is your run identity, not Anthropic's request id.
- tool_use.id is the step identity. That is what makes duplicate retries collapse onto the same guarded execution.
- Blocked or unknown outcomes come back as tool_result blocks with is_error: true, so Claude can reason about the stop condition instead of blindly retrying.
- If the tool use is retried after a successful completion, Anvil returns the prior result and does not run the side effect again.
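For teams wiring their own loop, the mapping from a guarded outcome to a tool_result block can be sketched as follows. The block fields (type, tool_use_id, content, is_error) follow the Anthropic Messages API; the GuardOutcome type and the toToolResult helper are hypothetical and only illustrate the idea, not what runAnthropicToolUses(...) does internally.

```typescript
// Hypothetical outcome shape; names mirror the runtime response contract.
type GuardOutcome =
  | { kind: 'executed'; output: unknown }
  | { kind: 'duplicate'; output: unknown }
  | { kind: 'blocked'; reason: string; nextAction: string }
  | { kind: 'unknown'; reason: string; nextAction: string };

// Shape of an Anthropic Messages API tool_result content block.
interface ToolResultBlock {
  type: 'tool_result';
  tool_use_id: string;
  content: string;
  is_error?: boolean;
}

function toToolResult(toolUseId: string, outcome: GuardOutcome): ToolResultBlock {
  if (outcome.kind === 'executed' || outcome.kind === 'duplicate') {
    // Fresh or replayed result: hand the output back as JSON text.
    return { type: 'tool_result', tool_use_id: toolUseId, content: JSON.stringify(outcome.output) };
  }
  // Blocked or unknown: flag the error and say what to do next,
  // so the model reasons about the stop condition instead of retrying blindly.
  return {
    type: 'tool_result',
    tool_use_id: toolUseId,
    content: JSON.stringify({ status: outcome.kind, reason: outcome.reason, nextAction: outcome.nextAction }),
    is_error: true
  };
}
```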
Framework Integrations
Framework wrappers are the right choice when your team already lives inside that framework's tool loop and wants to keep the framework ergonomics intact.
Recommended path:
- use the direct OpenAI and Anthropic integrations when you are already calling those provider SDKs directly
- use withAnvilGuard(...) for the Vercel AI SDK when your tool execution already runs through tool()
- use anvilToolkit(...) for LangChain or LangGraph when your tools already run through invoke()
The trust boundary is simple:
- workflowId must come from your runtime and stay stable across retries of the same run
- stepId must come from a stable per-tool-call identity if the framework can provide one
- if the framework cannot provide stable step identity, Anvil can still enforce workflow budgets, but durable per-call replay semantics become weaker
Vercel AI SDK
withAnvilGuard(...) is the production path when your tools execute through the Vercel AI SDK.
What it guarantees:
- if the SDK provides toolCallId, Anvil uses it as the guarded stepId
- repeated execution of the same toolCallId replays the original result when replay is safe
- blocked or unknown outcomes throw readable tool errors so the model can adapt
What it does not guarantee unless the SDK provides toolCallId:
- durable call-level deduplication across retries outside the current process
import { generateText, tool } from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';
import { withAnvilGuard } from '@useanvil/sdk/integrations/vercel-ai';
// Assumes `request`, `stripe`, and `billingClient` are provided by your app.
const workflowId = request.id; // stable for this agent run
const tools = withAnvilGuard(
{
refund_order: tool({
description: 'Refund an order exactly once.',
parameters: z.object({
orderId: z.string(),
paymentIntentId: z.string(),
amountCents: z.number().int().positive()
}),
execute: async ({ orderId, paymentIntentId, amountCents }) =>
stripe.refunds.create({
payment_intent: paymentIntentId,
amount: amountCents,
metadata: { orderId }
})
}),
cancel_billing_account: tool({
description: 'Cancel an account in the internal billing API exactly once.',
parameters: z.object({
accountId: z.string(),
reason: z.string()
}),
execute: async ({ accountId, reason }) =>
billingClient.post(`/accounts/${accountId}/cancel`, { reason })
})
},
{
workflowId,
actions: {
refund_order: 'payment.refund',
cancel_billing_account: 'api.call'
},
requireToolCallId: true
}
);
await generateText({
model: openai('gpt-5.2'),
tools,
prompt: 'Refund order ord_8821 and cancel the internal billing account.'
});
Operational rule:
- set requireToolCallId: true if you want the integration to fail fast instead of silently falling back to a generated local step id
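The fail-fast option comes down to a small decision. In this hypothetical sketch, resolveStepId and its options are illustrative and are not the integration's exported API. It prefers the SDK's toolCallId. When that id is missing, it either throws (requireToolCallId: true) or falls back to a process-local id that only deduplicates within the current process.

```typescript
interface StepIdOptions {
  requireToolCallId?: boolean;
}

// Counter-based fallback ids are unique only inside this process,
// which is why they cannot deduplicate retries after a restart.
let localStepCounter = 0;

function resolveStepId(toolName: string, toolCallId: string | undefined, opts: StepIdOptions = {}): string {
  if (toolCallId) {
    // Stable identity from the SDK: retries of this call share one key.
    return toolCallId;
  }
  if (opts.requireToolCallId) {
    // Fail fast rather than silently weakening replay guarantees.
    throw new Error(`Tool "${toolName}" was called without a toolCallId`);
  }
  localStepCounter += 1;
  return `local:${toolName}:${localStepCounter}`;
}
```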
LangChain And LangGraph
anvilToolkit(...) is the right wrapper when your tools already execute through LangChain.
The honest default:
- generic LangChain does not expose a stable per-tool-call id through this wrapper surface
- Anvil therefore generates a fresh local stepId by default
- that still enforces workflow-level tool counts, retry budgets, and spend budgets
- it does not give durable per-call deduplication across process restarts unless you provide a stable getStepId(...)
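import { DynamicStructuredTool } from '@langchain/core/tools';
import { z } from 'zod';
import { anvilToolkit } from '@useanvil/sdk/integrations/langchain';
// Example schemas for the two tools below; `request`, `stripe`, and `salesforce` come from your app.
const refundSchema = z.object({
  orderId: z.string(),
  paymentIntentId: z.string(),
  amountCents: z.number().int().positive()
});
const salesforceSchema = z.object({
  contactId: z.string(),
  email: z.string().email()
});
const workflowId = request.id; // stable for this agent run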
const tools = anvilToolkit(
[
new DynamicStructuredTool({
name: 'refund_order',
description: 'Refund an order exactly once.',
schema: refundSchema,
func: async ({ orderId, paymentIntentId, amountCents }) =>
stripe.refunds.create({
payment_intent: paymentIntentId,
amount: amountCents,
metadata: { orderId }
})
}),
new DynamicStructuredTool({
name: 'patch_salesforce_contact',
description: 'Patch a Salesforce contact exactly once.',
schema: salesforceSchema,
func: async ({ contactId, email }) =>
salesforce.patch(`/services/data/v60.0/sobjects/Contact/${contactId}`, {
Email: email
})
})
],
{
workflowId,
actions: {
refund_order: 'payment.refund',
patch_salesforce_contact: 'api.call'
}
}
);
If you are in LangGraph and your runtime already has durable step identity, pass it in:
const tools = anvilToolkit(myTools, {
workflowId: graphThreadId,
actions: {
refund_order: 'payment.refund'
},
getStepId: ({ toolName, options }) => {
const configurable = options?.configurable as { checkpoint_id?: string; task_id?: string } | undefined;
if (!configurable?.task_id) return undefined;
return `langgraph:${toolName}:${configurable.task_id}`;
}
});
Operational rule:
- if you cannot provide a durable getStepId(...), trust anvilToolkit(...) for workflow budgets and policy enforcement, not for provider-style per-call replay across restarts
Framework retry semantics:
- replayable duplicate: framework wrapper returns the original tool result
- non-replayable duplicate: framework wrapper throws a clear Anvil error instead of re-running the side effect
- blocked or unknown outcome: framework wrapper throws a readable error so the framework can surface it back into the loop
- conflict: if the winning result is replayable, the wrapper returns it; otherwise it throws and tells the caller to wait and inspect the receipt
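The four retry cases reduce to a return-or-throw decision. In this sketch, WrapperOutcome and settle are hypothetical, and the real wrappers' error classes and messages may differ. It shows the mapping a framework wrapper applies before handing control back to the loop.

```typescript
// Hypothetical outcomes a framework wrapper might receive from the guard.
type WrapperOutcome =
  | { kind: 'executed'; result: unknown }
  | { kind: 'duplicate'; replayable: boolean; result?: unknown }
  | { kind: 'conflict'; replayable: boolean; result?: unknown; receiptId: string }
  | { kind: 'blocked' | 'unknown'; reason: string };

// Return a value the framework passes back to the model, or throw a readable error.
function settle(outcome: WrapperOutcome): unknown {
  switch (outcome.kind) {
    case 'executed':
      return outcome.result;
    case 'duplicate':
      if (outcome.replayable) return outcome.result; // replay the original tool result
      throw new Error('Duplicate tool call: result is not replayable, side effect was not re-run.');
    case 'conflict':
      if (outcome.replayable) return outcome.result; // the winning attempt's result
      throw new Error(`Another attempt holds this step; wait and inspect receipt ${outcome.receiptId}.`);
    case 'blocked':
    case 'unknown':
      throw new Error(`Tool call ${outcome.kind}: ${outcome.reason}`);
  }
}
```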
Runtime Response Contract
Every guarded call now returns or persists the same high-signal runtime fields:
- status: executed, duplicate, blocked, or unknown
- safeToRetry: whether a retry is currently safe
- nextAction: none, retry, do_not_retry, reconcile, or review
- destructive: whether the action looks irreversible by default
- requiresConfirmation: whether the caller should force an explicit confirmation step
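Written as a TypeScript type, the contract looks roughly like the sketch below. Only the field names and values come from the list above. The GuardedCallResult name and the planNextStep helper are hypothetical. They show how a caller might branch on nextAction and safeToRetry together rather than on status alone.

```typescript
// Field names and literal values from the runtime response contract;
// the type name itself is illustrative.
interface GuardedCallResult {
  status: 'executed' | 'duplicate' | 'blocked' | 'unknown';
  safeToRetry: boolean;
  nextAction: 'none' | 'retry' | 'do_not_retry' | 'reconcile' | 'review';
  destructive: boolean;
  requiresConfirmation: boolean;
}

// Hypothetical caller-side policy: retry only when the guard says it is safe,
// and send reconcile/review cases to an operator instead of the retry loop.
function planNextStep(res: GuardedCallResult): 'done' | 'retry' | 'escalate' | 'stop' {
  if (res.nextAction === 'none') return 'done';
  if (res.nextAction === 'retry' && res.safeToRetry) return 'retry';
  if (res.nextAction === 'reconcile' || res.nextAction === 'review') return 'escalate';
  return 'stop';
}
```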
Example unknown outcome:
const response = await agent.toolCall({
workflow: { workflowId: run.id },
stepId: 'issue-refund',
toolName: 'stripe.refunds.create',
action: 'payment.refund',
amount: 15000,
metadata: {
operation: 'refund'
},
execute: async () => stripe.refunds.create({ payment_intent: paymentIntentId, amount: 15000 })
});
if (response.status === 'unknown') {
console.log(response.safeToRetry, response.nextAction);
}
Receipts API
The simplest operator surface is now:
import { anvil } from '@useanvil/sdk';
const recent = anvil.receipts.list({ action: 'payment.charge', limit: 10 });
const receipt = anvil.receipts.show(recent[0]!.id);
If you need explicit lifecycle control for long-lived runs:
const current = await anvil.workflowState.read(run.id);
await anvil.workflowState.reset(run.id);
- Open the visual policy editor:
anvil policy studio
This writes a self-contained HTML policy editor you can open in a browser, tune visually, and download as valid Anvil policy JSON.
Start Here If You Are New
If you only want the easiest path to understanding the product, run:
anvil start
That writes ./artifacts/anvil-start.html, a simple guided home page that explains:
- what Anvil does in plain English
- which command to run first
- how ghost mode, policies, and enforce mode fit together
- the fastest path for business users and the fastest path for engineers
Why Teams Use It Instead Of Hand-Rolling
Teams usually do not adopt Anvil just for a Redis lock.
They adopt it because it combines:
- duplicate prevention
- policy enforcement before risky execution
- action-specific required business context when a path needs stronger guardrails
- durable execution receipts and unknown-outcome recovery
- downstream reconciliation when teams need to verify real provider truth
- explicit unknown-outcome handling
- audit visibility and ghost-to-enforce rollout
- centralized rollout control across services
- per-action defaults like retries, amount caps, and timeouts
That is the part most in-house versions either skip or only discover after incidents.
The simple reason to wrap the call is:
- the expensive mistake is in execution, not in the request object
- the painful incident is the charge, refund, workflow step, or external mutation happening twice or outside policy
- the thing teams need in production is one product that covers prevention, policy, receipts, recovery, rollout safety, and operational visibility together
That is what Anvil is for.
First Production Path
The best first path to protect is usually:
- high-value enough to matter
- common enough to produce real signal
- simple enough that unknowns can be investigated quickly
Good examples:
- payment.charge
- payment.refund
- one external API step with clear ownership
CLI
anvil start
anvil init
anvil init-agents
anvil mcp --config
anvil doctor
anvil policy studio
anvil ghost --sample
anvil ghost --logs ./your-logs.ndjson
anvil ghost --logs ./your-logs.ndjson --report ./artifacts/anvil-ghost-report.html
anvil control show
anvil receipts list --current-status unknown
anvil receipts inspect <receipt-id>
anvil receipts resolve <receipt-id> manual_review_required --note "Investigate provider outcome"
anvil receipts reconcile-auto <receipt-id>
anvil control resolve checkout-service payment.refund
anvil control set-mode enforce
anvil control kill-switch on
anvil control set-service-policy checkout-service payments-prod-v3
anvil control set-action-mode checkout-service payment.refund ghost
anvil audit incident --action payment.charge
anvil demo
anvil demo stripe --stripe-key sk_test_...
anvil ghost --report writes a polished HTML report you can open in a browser, hand to engineering or business stakeholders, and export with built-in JSON/CSV download buttons.
anvil policy studio writes a visual policy editor at ./artifacts/anvil-policy-studio.html by default. Use it to add actions, set max amounts and retries, then download a policy JSON file and point ANVIL_POLICY_PATH at it.
Make A Policy By Hand
You do not need Policy Studio.
An Anvil policy is just JSON. Put a file in ./policies/ like this:
{
"name": "my-payments-policy",
"allowed_actions": ["payment.charge", "payment.refund", "api.call"],
"limits": {
"payment.charge": {
"max_amount": 50000,
"max_retries": 2,
"execute_timeout_ms": 15000
},
"payment.refund": {
"max_amount": 25000,
"max_retries": 1,
"execute_timeout_ms": 15000
},
"api.call": {
"max_retries": 3,
"execute_timeout_ms": 8000
}
}
}
Then point Anvil at it:
export ANVIL_POLICY_PATH=./policies/my-payments-policy.json
Then use it in code:
import { guard, key } from '@useanvil/sdk';
await guard({
key: key.payment(orderId),
action: 'payment.charge',
policy: 'my-payments-policy',
amount: 4999,
execute: async () => stripe.charges.create({ amount: 4999, currency: 'usd' })
});
Rules of thumb:
- allowed_actions is the allowlist.
- max_amount is in cents.
- max_retries is how many tries Anvil will allow.
- allowed_services and blocked_services let you scope a policy to specific services.
- required_metadata lets you require stable facts like actor_id, request_id, or approval_ticket.
- custom_rule points to a local JS/MJS/CJS module when the JSON model is not enough.
- custom_rule.run_in_ghost lets that custom code run during ghost replay when your logs contain enough context for it to be meaningful.
- If you set max_amount, you must pass amount to guard().
- If you set allowed_services, set ANVIL_SERVICE in the running service.
- If you set required_metadata or use custom_rule, pass metadata to guard().
- Every action in allowed_actions should have a matching row in limits.
- If several policy files live in the same directory, Anvil can load them by policy name.
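Several of these rules can be checked before a deploy. A minimal sketch, assuming a hypothetical checkPolicyCall helper that is not part of the SDK: it applies the rules of thumb above to a parsed policy and a planned call, and returns readable problems instead of throwing. custom_rule is left out because its behavior lives in your own module.

```typescript
// Subset of the policy JSON fields described above.
interface AnvilPolicy {
  name: string;
  allowed_actions: string[];
  allowed_services?: string[];
  blocked_services?: string[];
  required_metadata?: string[];
  limits: Record<string, { max_amount?: number; max_retries?: number; execute_timeout_ms?: number }>;
}

// Hypothetical description of a call you plan to make through guard().
interface PlannedCall {
  action: string;
  amount?: number; // cents
  metadata?: Record<string, unknown>;
  service?: string; // what ANVIL_SERVICE would be set to
}

function checkPolicyCall(policy: AnvilPolicy, call: PlannedCall): string[] {
  const problems: string[] = [];
  // Every allowed action should have a matching limits row.
  for (const action of policy.allowed_actions) {
    if (!policy.limits[action]) problems.push(`limits is missing a row for ${action}`);
  }
  if (!policy.allowed_actions.includes(call.action)) problems.push(`${call.action} is not in allowed_actions`);
  // A capped action must receive an amount, and the amount must fit under the cap.
  const cap = policy.limits[call.action]?.max_amount;
  if (cap !== undefined) {
    if (call.amount === undefined) problems.push(`${call.action} has max_amount, so amount is required`);
    else if (call.amount > cap) problems.push(`amount ${call.amount} exceeds max_amount ${cap}`);
  }
  if (policy.allowed_services) {
    if (!call.service) problems.push('allowed_services is set, so ANVIL_SERVICE must be set');
    else if (!policy.allowed_services.includes(call.service)) problems.push(`service ${call.service} is not allowed`);
  }
  if (call.service && policy.blocked_services?.includes(call.service)) {
    problems.push(`service ${call.service} is blocked`);
  }
  for (const field of policy.required_metadata ?? []) {
    if (call.metadata?.[field] === undefined) problems.push(`metadata.${field} is required`);
  }
  return problems;
}
```

Run against the my-payments-policy example above, a payment.charge call without an amount is reported as a problem, which mirrors the runtime rule that capped actions must receive amount.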
Advanced example:
{
"name": "anvil-payments-governed",
"allowed_actions": ["payment.charge", "payment.refund", "api.call"],
"allowed_services": ["checkout-service", "payments-worker"],
"blocked_services": ["admin-dashboard"],
"required_metadata": ["actor_id", "request_id"],
"custom_rule": {
"module": "./custom/anvil-payments-governed-rule.mjs",
"export": "default",
"timeout_ms": 500,
"run_in_ghost": false,
"config": {
"approvalAmountCents": 100000,
"requiredApprovalKey": "approval_ticket"
}
},
"limits": {
"payment.charge": { "max_amount": 200000, "max_retries": 1, "execute_timeout_ms": 12000 },
"payment.refund": { "max_amount": 100000, "max_retries": 1, "execute_timeout_ms": 12000 },
"api.call": { "max_retries": 2, "execute_timeout_ms": 6000 }
}
}
And the runtime call:
await guard({
key: key.payment(orderId),
action: 'payment.charge',
policy: 'anvil-payments-governed',
amount: 150000,
metadata: {
actor_id: 'user-42',
request_id: 'req-991',
approval_ticket: 'ap-7'
},
execute: async () => doCharge()
});
Bundled examples:
- ./policies/stripe-v1.json
- ./policies/anvil-payments-starter.json
- ./policies/anvil-payments-prod.json
- ./policies/anvil-payments-governed.json
Central Control Plane
Phase 1 adds a minimal Redis-backed control layer for multi-service rollout consistency.
- Set ANVIL_SERVICE=<service-name> in each service.
- Store the central JSON document in Redis at anvil:control:config by default.
- Runtime resolves mode in this order: kill switch, per-action override, per-service default, global default.
- Runtime resolves policy from the service assignment first, then falls back to the local policy configuration.
- If central control cannot be read, Anvil falls back to the existing local env and policy behavior.
- anvil control resolve <service> <action> previews the exact resolved mode and policy before you roll anything forward.
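The resolution order can be written as a short function. This is a sketch under assumptions. The ControlDocument shape is hypothetical, and the real JSON stored at anvil:control:config may be structured differently. The kill switch is modeled only as a 'kill_switch' source, because its exact runtime effect is not spelled out here.

```typescript
type Mode = 'ghost' | 'enforce';

// Hypothetical shape for the central control document; illustrative only.
interface ControlDocument {
  killSwitch: boolean;
  globalMode: Mode;
  services: Record<string, { defaultMode?: Mode; policy?: string; actionModes?: Record<string, Mode> }>;
}

interface LocalConfig {
  mode: Mode; // e.g. from local env
  policy: string; // e.g. from ANVIL_POLICY_PATH
}

type Resolved =
  | { source: 'kill_switch'; policy: string }
  | { source: 'action_override' | 'service_default' | 'global_default' | 'local_fallback'; mode: Mode; policy: string };

function resolveControl(doc: ControlDocument | null, service: string, action: string, local: LocalConfig): Resolved {
  // Central control unreadable: fall back to local env and policy behavior.
  if (!doc) return { source: 'local_fallback', mode: local.mode, policy: local.policy };

  const svc = doc.services[service];
  // Policy: service assignment first, then the local policy configuration.
  const policy = svc?.policy ?? local.policy;

  // Mode: kill switch, per-action override, per-service default, global default.
  if (doc.killSwitch) return { source: 'kill_switch', policy };
  const override = svc?.actionModes?.[action];
  if (override) return { source: 'action_override', mode: override, policy };
  const serviceDefault = svc?.defaultMode;
  if (serviceDefault) return { source: 'service_default', mode: serviceDefault, policy };
  return { source: 'global_default', mode: doc.globalMode, policy };
}
```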
ANVIL_POLICY_PATH can still point at a single file. If you place multiple policy JSON files in that same directory, Anvil can now load sibling files by policy name for per-service assignments.
If an action has a max_amount, Anvil now requires you to pass amount at runtime. That keeps money-moving actions from silently bypassing the cap.
If a policy row has execute_timeout_ms, Anvil will use that as the default timeout for the action unless the caller overrides it.
Reliability And Metrics
- Lock TTL is automatically extended to cover the action timeout plus a safety buffer, so slow provider calls do not reopen the key early.
- Redis Sentinel is supported through ANVIL_REDIS_SENTINELS and ANVIL_REDIS_SENTINEL_NAME if you want failover without changing the guard path.
- If you explicitly set ANVIL_REDIS_UNAVAILABLE_BEHAVIOR=passthrough, Anvil can execute without idempotency protection when Redis is unreachable. Use this only for low-risk paths.
- You can hook Redis failures directly with hooks.onRedisError.
- Built-in in-process counters are available through @useanvil/sdk/sinks/metrics for Prometheus-style scraping, including latency histogram buckets, Redis error counters, and audit write failure counters.
- anvil doctor --json now emits a machine-readable operational health report you can wire into deployment checks.
- anvil prove idempotency is the product-facing proof command. It hammers one key under concurrency, writes ./artifacts/anvil-idempotency-proof.json, and lets anvil doctor report the last verified proof.
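The timing arithmetic can be sketched under explicit assumptions. The 30-second default TTL and the half-TTL keepalive interval come from the lock-lease notes at the top of this README. The 5-second safety buffer and all three helper names are hypothetical stand-ins for whatever Anvil uses internally. The per-call timeout falls back to the policy's execute_timeout_ms unless the caller overrides it.

```typescript
const DEFAULT_LOCK_TTL_SECONDS = 30; // ANVIL_LOCK_TTL_SECONDS default
const SAFETY_BUFFER_SECONDS = 5; // hypothetical buffer, for illustration only

// Caller override wins; otherwise use the policy row's execute_timeout_ms.
function effectiveTimeoutMs(policyTimeoutMs: number | undefined, callerTimeoutMs?: number): number | undefined {
  return callerTimeoutMs ?? policyTimeoutMs;
}

// Extend the lock so a slow provider call cannot outlive it and reopen the key early.
function effectiveLockTtlSeconds(timeoutMs: number | undefined, configuredTtl = DEFAULT_LOCK_TTL_SECONDS): number {
  if (timeoutMs === undefined) return configuredTtl;
  const needed = Math.ceil(timeoutMs / 1000) + SAFETY_BUFFER_SECONDS;
  return Math.max(configuredTtl, needed);
}

// With keepalive enabled, the lock is renewed every half-TTL while execute() runs.
function keepaliveIntervalMs(ttlSeconds: number): number {
  return (ttlSeconds * 1000) / 2;
}
```

For the payment.charge row in the example policy (execute_timeout_ms 15000), the configured 30-second TTL already covers the timeout; a 45-second timeout would push the TTL to 50 seconds under these assumed numbers.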
Performance Check
Use the local benchmark harness before making a public throughput claim:
npm run build
BENCH_REQUESTS=10000 BENCH_CONCURRENCY=200 npm run bench:guard
This measures the enforced happy path against your configured Redis and prints throughput plus p50/p95/p99 latency. Treat the result as environment-specific, not a universal ceiling.
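To read those numbers, note that a percentile is a rank statistic over the recorded latencies, and throughput is completed requests per second of wall-clock time. Below is a generic nearest-rank sketch. The benchmark harness may compute percentiles differently, for example with interpolation or histogram buckets.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of samples are <= it.
function percentile(samplesMs: number[], p: number): number {
  if (samplesMs.length === 0) throw new Error('no samples');
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based rank
  return sorted[Math.max(rank, 1) - 1]!;
}

// Throughput is completed requests divided by elapsed wall-clock seconds.
function throughputPerSecond(requests: number, elapsedMs: number): number {
  return requests / (elapsedMs / 1000);
}
```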
For the production-facing idempotency verification your design partner can actually run:
npm run build
ANVIL_REDIS_URL=redis://127.0.0.1:6379 anvil prove idempotency --requests 1000 --concurrency 50
That writes ./artifacts/anvil-idempotency-proof.json and prints a pass/fail line like:
PASS — 1 executed, 999 protected duplicates, 0 unknown, Redis RTT p95 3ms — idempotency held across 1000 concurrent attempts.
anvil doctor will then show the last proved date, request count, and proof result.
npm run prove:guard still exists as the older developer proof harness. Use anvil prove idempotency when you want the product surface a design partner can run and understand.
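The invariant behind the proof is simple to state: N concurrent attempts on one key should produce exactly one execution and N - 1 protected duplicates. The in-process sketch below illustrates that invariant only. A plain Map stands in for the Redis lock, so it covers a single process, which the real proof against Redis goes beyond. The key and helper names are hypothetical.

```typescript
// Map stands in for Redis: the first caller to claim a key wins.
// The has/set pair is atomic here only because it runs synchronously on one thread;
// across processes, that guarantee is what an atomic Redis SET NX provides.
const claims = new Map<string, Promise<string>>();
let sideEffects = 0;

async function guardedOnce(key: string, execute: () => Promise<string>): Promise<'executed' | 'duplicate'> {
  const existing = claims.get(key);
  if (existing) {
    await existing; // wait for the winner; never re-run the side effect
    return 'duplicate';
  }
  const attempt = execute();
  claims.set(key, attempt);
  await attempt;
  return 'executed';
}

async function proveIdempotency(requests: number) {
  const key = 'payment:order-1'; // hypothetical key
  const execute = async () => {
    sideEffects += 1;
    await Promise.resolve(); // yield, so the other attempts interleave
    return 'ch_1';
  };
  const outcomes = await Promise.all(Array.from({ length: requests }, () => guardedOnce(key, execute)));
  return {
    executed: outcomes.filter((o) => o === 'executed').length,
    duplicates: outcomes.filter((o) => o === 'duplicate').length,
    sideEffects
  };
}
```

With 1000 attempts this reports 1 executed and 999 duplicates, the same shape as the PASS line above. The real command additionally exercises network round trips and a shared Redis.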
For a fuller release artifact that bundles doctor output, proof runs, and optional drills:
npm run build
ANVIL_REDIS_URL=redis://127.0.0.1:6379 npm run evidence:release
Set RELEASE_EVIDENCE_INCLUDE_DRILL=1 to include a Redis recovery drill in the output report.
Use npm run drill:redis-recovery for a local degraded-mode and recovery drill. It manages its own temporary Redis instance by default so you can validate fail-closed, passthrough, and recovery behavior without touching a shared environment.
Production Shape
Recommended baseline:
- Run at least two app instances behind your normal service load balancer.
- Keep Redis in the same region and low-latency network boundary as the app.
- Use Redis Sentinel or your managed Redis failover equivalent for production deployments.
- Expose anvil doctor --json or healthCheckDetailed() in deployment checks.
- Expose metrics from createMetricsAggregator() to your existing telemetry stack.
- Keep proof reports as release artifacts before making scale claims.
Repo Layout
- src/ contains the runtime and CLI.
- src/ui/ contains the generated product surfaces: Start, Policy Studio, and Ghost Report.
- policies/ contains bundled starter policies.
- demo/ contains live demo entrypoints.
- artifacts/ is the default home for generated HTML reports and local audit output.
- docs/product/ contains the product and spec documents.
Operations
For the short operational model and failure semantics, see docs/OPERATIONS.md.
Verification
npm run verify