@perufitlife/durable-agent-runtime

v0.1.0

Published

a month ago

renzom13

ai agent llm idempotency durable-execution event-sourcing anthropic openai postgres supabase tool-use function-calling constitutional-ai guardrails

Durable Agent Runtime

Stop your AI agent from doing the same thing twice, doing dumb things, or doing things you never agreed to.
A durable runtime for LLM agents. Postgres-backed. Model-agnostic. ~5KB gzipped. Production-tested on a real CRM serving 10k+ daily WhatsApp conversations.

npm install @perufitlife/durable-agent-runtime

The 30-second pitch

Your LLM agent has tools. It calls them. Sometimes it calls the same tool twice (you get charged twice). Sometimes it calls a tool with bad inputs (you ship to a city that doesn't exist). Sometimes it does things you explicitly told it not to (it auto-confirms a payment).

This package wraps your tools in three layers:

Idempotency — same tool, same input, same window → executes ONCE, returns cached result the next N times.
Constitutional rules — predicates you write in TypeScript. Tool can't execute unless rules pass.
Event log — every tool call is an immutable row in Postgres. Audit trail, replay, postmortem in 30 seconds instead of 8 hours of reading chats.

That's it. No new infrastructure. No vendor lock-in. Works with Claude, OpenAI, Gemini, local models — anything that calls a function.
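The "same window" part of the idempotency layer can be pictured with a small in-memory sketch. The README does not say how DAR defines its window, so the fixed-duration expiry below is an assumption, and `WindowedResults` and `once` are hypothetical names that are not part of the package API.

```typescript
// Illustrative only: a result cache whose entries expire after `windowMs`.
interface Entry<T> {
  value: T;
  storedAt: number; // timestamp in milliseconds
}

class WindowedResults<T> {
  private readonly entries = new Map<string, Entry<T>>();
  private readonly windowMs: number;

  constructor(windowMs: number) {
    this.windowMs = windowMs;
  }

  // Returns the stored entry if the key was stored less than windowMs ago.
  lookup(key: string, now: number): Entry<T> | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (now - entry.storedAt >= this.windowMs) {
      this.entries.delete(key); // window elapsed: the next call executes again
      return undefined;
    }
    return entry;
  }

  store(key: string, value: T, now: number): void {
    this.entries.set(key, { value, storedAt: now });
  }
}

// Runs `work` at most once per key per window; later calls get the stored value.
function once<T>(
  cache: WindowedResults<T>,
  key: string,
  now: number,
  work: () => T,
): { value: T; cached: boolean } {
  const hit = cache.lookup(key, now);
  if (hit !== undefined) return { value: hit.value, cached: true };
  const value = work();
  cache.store(key, value, now);
  return { value, cached: false };
}

const charges = new WindowedResults<string>(60_000);
const firstCharge = once(charges, "charge:ORD-1", Date.now(), () => "charged");
const secondCharge = once(charges, "charge:ORD-1", Date.now(), () => "charged");
console.log(firstCharge.cached, secondCharge.cached);
```

The clock is passed in as `now` so the expiry logic can be tested without waiting. A process-local map like this does not survive restarts, which is the gap a Postgres-backed store is meant to close.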

Why this exists

We run a Peruvian COD e-commerce CRM with an LLM agent handling thousands of WhatsApp conversations per day. We learned the hard way:

| Real incident | What happened | What DAR does |
|---|---|---|
| 551 duplicate orders | A cron retried 19 times in a window. The "import to courier" tool fired all 19 times. 29 orders × 19 ticks = 551 duplicates. | Idempotency key import:order-123:hash makes call #2..19 return the cached result instantly. |
| Zombie orders in Yunguyo | Agent promised home delivery in a city where no courier had coverage. Customer paid the advance. No one ever shipped. | Constitutional rule: dispatch_order requires validate_coverage(zone) === true. Tool call rejected before execution. |
| Brand identity leak | Agent introduced itself as "Lucía from Peru Fit Life Store" when the actual Shopify store was a different brand. Customers got confused. | Rule: every outbound message must match store_config.brand_name. Mismatched message rejected with reason logged. |

Each of these cost real money. Each was preventable. With DAR enabled, each is now stopped by an idempotency key or a rule.
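The key import:order-123:hash from the first incident ends in a hash of the tool input. One way such a key could be derived is to hash a canonical serialization of the input, so that property order does not change the key. This is a sketch under that assumption: `canonicalize` and `idempotencyKeyFor` are hypothetical helpers, not functions exported by the package, and the 16-character truncation is an arbitrary choice.

```typescript
import { createHash } from "node:crypto";

type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Serialize with object keys sorted, so {a:1,b:2} and {b:2,a:1} give the same string.
function canonicalize(value: Json): string {
  if (value === null || typeof value !== "object") return JSON.stringify(value);
  if (Array.isArray(value)) return `[${value.map((item) => canonicalize(item)).join(",")}]`;
  const obj = value;
  const entries = Object.keys(obj)
    .sort()
    .map((k) => `${JSON.stringify(k)}:${canonicalize(obj[k])}`);
  return `{${entries.join(",")}}`;
}

// Hypothetical helper: "<prefix>:<entity id>:<first 16 hex chars of SHA-256(input)>".
function idempotencyKeyFor(prefix: string, entityId: string, input: Json): string {
  const digest = createHash("sha256").update(canonicalize(input)).digest("hex");
  return `${prefix}:${entityId}:${digest.slice(0, 16)}`;
}

const first = idempotencyKeyFor("import", "order-123", { courier: "shalom", zone: "lima" });
const retry = idempotencyKeyFor("import", "order-123", { zone: "lima", courier: "shalom" });
console.log(first === retry); // true: property order in the input does not matter
```

With a key like this, a retry that carries the same payload maps to the same row, while a retry with a changed payload gets a new key and is treated as a new call.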

Quickstart

1. Run the migration

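-- migrations/001_init.sql
-- gen_random_uuid() is built into PostgreSQL 13+; on older versions, run CREATE EXTENSION pgcrypto first.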
CREATE TABLE agent_tool_events (
  event_id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  idempotency_key   TEXT NOT NULL UNIQUE,
  tool_name         TEXT NOT NULL,
  effect            TEXT NOT NULL,
  input             JSONB NOT NULL,
  output            JSONB,
  status            TEXT NOT NULL,
  rejection_reason  TEXT,
  correlation_id    TEXT,
  parent_event_id   UUID REFERENCES agent_tool_events(event_id),
  created_at        TIMESTAMPTZ NOT NULL DEFAULT now(),
  completed_at      TIMESTAMPTZ
);

CREATE INDEX idx_agent_tool_events_tool_name ON agent_tool_events(tool_name);
CREATE INDEX idx_agent_tool_events_correlation_id ON agent_tool_events(correlation_id);
CREATE INDEX idx_agent_tool_events_created_at ON agent_tool_events(created_at DESC);
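The UNIQUE constraint on idempotency_key is what lets the database arbitrate between concurrent callers. One way to use it is to insert a row with ON CONFLICT DO NOTHING and treat an empty result as "someone else already claimed this key". The function below only builds the query object; `buildClaimQuery` is a hypothetical helper and the 'pending' status value is an assumption, since the README does not list the values DAR stores in the status column.

```typescript
// Shape accepted by node-postgres: pool.query({ text, values }).
interface ClaimQuery {
  text: string;
  values: unknown[];
}

// Hypothetical helper: builds an insert that succeeds only for the first caller of a key.
function buildClaimQuery(
  idempotencyKey: string,
  toolName: string,
  effect: string,
  input: unknown,
  correlationId?: string,
): ClaimQuery {
  return {
    text: `INSERT INTO agent_tool_events
  (idempotency_key, tool_name, effect, input, status, correlation_id)
VALUES ($1, $2, $3, $4::jsonb, 'pending', $5)
ON CONFLICT (idempotency_key) DO NOTHING
RETURNING event_id`,
    // JSONB parameters are sent as JSON text and cast on the server side.
    values: [idempotencyKey, toolName, effect, JSON.stringify(input), correlationId ?? null],
  };
}

const claim = buildClaimQuery(
  "dispatch:ORD-123:shalom",
  "dispatch_order",
  "irreversible",
  { orderId: "ORD-123", courier: "shalom", zone: "yunguyo" },
  "session-xyz",
);
console.log(claim.values.length); // 5
```

If the query returns one row, the caller owns the execution. If it returns no rows, a row with that key already exists and the caller can read its output instead of running the handler.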

2. Wrap your tools

import { createRuntime, defineTool, defineRule } from '@perufitlife/durable-agent-runtime';
import { PostgresEventStore } from '@perufitlife/durable-agent-runtime/postgres';
import { Pool } from 'pg';

const runtime = createRuntime({
  store: new PostgresEventStore(new Pool({ connectionString: process.env.DATABASE_URL })),
});

// 1. Declare your tool with its effect type
const dispatchOrder = defineTool({
  name: 'dispatch_order',
  effect: 'irreversible',          // ⚠️ tells DAR to be strict
  idempotencyKey: (input) => `dispatch:${input.orderId}:${input.courier}`,
  handler: async (input: { orderId: string; courier: string; zone: string }) => {
    // your actual side effect; courierAPI stands for your own courier client, it is not part of DAR
    return await courierAPI.createShipment(input);
  },
});

// 2. Add constitutional rules
runtime.addRule(defineRule({
  name: 'courier-must-cover-zone',
  appliesTo: 'dispatch_order',
  check: async (input) => {
    // coverageService stands for your own coverage lookup, it is not part of DAR
    const covered = await coverageService.isCovered(input.courier, input.zone);
    return covered
      ? { allow: true }
      : { allow: false, reason: `${input.courier} does not cover ${input.zone}` };
  },
}));

// 3. Execute through the runtime
const result = await runtime.execute(dispatchOrder, {
  orderId: 'ORD-123',
  courier: 'shalom',
  zone: 'yunguyo',
});

// If the rule fails → result.status === 'rejected', courier API never called.
// If the rule passes and you call this again with the same input → result.status === 'cached'.
// If it's a fresh call → result.status === 'executed'.
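A caller will usually branch on those three statuses, for example to feed a rejection reason back to the model so it can pick another action. The sketch below assumes a result shape: only the status strings 'executed', 'cached', and 'rejected' come from the quickstart, while the `output` and `reason` fields and the `toAgentMessage` helper are hypothetical.

```typescript
// Assumed result shape; check the package's exported types for the real one.
type ExecuteResult<T> =
  | { status: "executed"; output: T }
  | { status: "cached"; output: T }
  | { status: "rejected"; reason: string };

// Turns a result into the text handed back to the LLM as the tool response.
function toAgentMessage<T>(result: ExecuteResult<T>): string {
  switch (result.status) {
    case "executed":
      return `done: ${JSON.stringify(result.output)}`;
    case "cached":
      // Same outcome as before; the side effect was not repeated.
      return `already done earlier: ${JSON.stringify(result.output)}`;
    case "rejected":
      // The handler never ran; tell the model why so it can adjust.
      return `not allowed: ${result.reason}`;
  }
}

console.log(toAgentMessage({ status: "rejected", reason: "shalom does not cover yunguyo" }));
// prints "not allowed: shalom does not cover yunguyo"
```

Modelling the result as a discriminated union means the compiler flags any status the switch forgets to handle.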

What "durable" means here

Most agent frameworks treat tool calls as fire-and-forget. DAR treats them as event-sourced:

┌──────────────┐    ┌───────────────┐    ┌──────────────┐    ┌──────────────┐
│ LLM decides  │───▶│ DAR checks    │───▶│ DAR checks   │───▶│ Tool handler │
│ to call tool │    │ rules         │    │ idempotency  │    │ executes     │
└──────────────┘    └───────────────┘    └──────────────┘    └──────────────┘
                            │                    │                   │
                            ▼                    ▼                   ▼
                    ┌─────────────────────────────────────────────────────┐
                    │       agent_tool_events  (Postgres)                 │
                    │  every attempt, rejection, result — immutable row   │
                    └─────────────────────────────────────────────────────┘

This gives you:

Crash safety: if your process dies mid-execution, the next replay reads the event log and skips completed work.
Idempotent retries: callers can retry without risking duplicate effects.
Auditable history: SELECT * FROM agent_tool_events WHERE correlation_id = 'session-xyz' returns every tool call, rejection, and result recorded for that session.
Hot-reloadable rules: change predicates.ts and redeploy. The next tool call uses the new rules, with zero downtime.
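The flow in the diagram (rules, then idempotency, then the handler, with every outcome logged) can be sketched in memory. This is not the package's implementation: `InMemoryRuntime`, `Tool`, `Verdict`, and `EventRow` are hypothetical names, an array stands in for agent_tool_events, and the sketch ignores concurrency and crashes, which is exactly what the Postgres store is for.

```typescript
type Verdict = { allow: true } | { allow: false; reason: string };
type Status = "executed" | "cached" | "rejected";

interface EventRow {
  idempotencyKey: string;
  toolName: string;
  status: Status;
  output?: unknown;
  rejectionReason?: string;
}

interface Tool<I, O> {
  name: string;
  idempotencyKey: (input: I) => string;
  handler: (input: I) => Promise<O>;
}

class InMemoryRuntime {
  readonly log: EventRow[] = []; // stand-in for the agent_tool_events table
  private readonly completed = new Map<string, unknown>();

  async execute<I, O>(
    tool: Tool<I, O>,
    input: I,
    rules: Array<(input: I) => Promise<Verdict>> = [],
  ): Promise<EventRow> {
    const idempotencyKey = tool.idempotencyKey(input);
    const base = { idempotencyKey, toolName: tool.name };

    // 1. Rules: the first failing rule rejects the call before anything runs.
    for (const rule of rules) {
      const verdict = await rule(input);
      if (!verdict.allow) {
        return this.record({ ...base, status: "rejected", rejectionReason: verdict.reason });
      }
    }
    // 2. Idempotency: a key that already completed returns its stored output.
    if (this.completed.has(idempotencyKey)) {
      return this.record({ ...base, status: "cached", output: this.completed.get(idempotencyKey) });
    }
    // 3. Handler: the side effect runs once and its output is stored.
    const output = await tool.handler(input);
    this.completed.set(idempotencyKey, output);
    return this.record({ ...base, status: "executed", output });
  }

  private record(row: EventRow): EventRow {
    this.log.push(row);
    return row;
  }
}

async function demo(): Promise<{ statuses: Status[]; shipments: number; logged: number }> {
  const runtime = new InMemoryRuntime();
  let shipments = 0;
  const dispatch: Tool<{ orderId: string; zone: string }, { shipment: number }> = {
    name: "dispatch_order",
    idempotencyKey: (input) => `dispatch:${input.orderId}`,
    handler: async () => ({ shipment: ++shipments }),
  };
  const coverage = async (input: { zone: string }): Promise<Verdict> =>
    input.zone === "lima" ? { allow: true } : { allow: false, reason: `no coverage in ${input.zone}` };

  const a = await runtime.execute(dispatch, { orderId: "ORD-1", zone: "lima" }, [coverage]);
  const b = await runtime.execute(dispatch, { orderId: "ORD-1", zone: "lima" }, [coverage]);
  const c = await runtime.execute(dispatch, { orderId: "ORD-2", zone: "yunguyo" }, [coverage]);
  return { statuses: [a.status, b.status, c.status], shipments, logged: runtime.log.length };
}

demo().then((r) => console.log(r.statuses.join(", "))); // executed, cached, rejected
```

The handler runs once across the three calls, yet the log holds three rows, one per attempt.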

The Effect type system

Every tool declares one of four effects. DAR applies appropriate strictness:

| Effect | Examples | DAR behavior |
|---|---|---|
| read | get_order, search_customers | No idempotency required. Rules optional. |
| mutate-local | update_internal_status, log_event | Idempotency required. Rules optional. |
| mutate-external | send_whatsapp, charge_payment, update_shopify | Idempotency required. Rules strongly recommended. |
| irreversible | dispatch_to_courier, send_email, fire_webhook | Idempotency required. Rules required. Sandbox dry-run planned for v0.2. |

You can't ship a tool without declaring its effect. TypeScript enforces it. Reviewers see it at a glance.
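How a type system can enforce this is easy to show with a string-literal union and a required field. The sketch below mirrors the table above, but the names `Effect`, `POLICY`, `ToolSpec`, and `policyViolations` are hypothetical and the package's own types may look different.

```typescript
type Effect = "read" | "mutate-local" | "mutate-external" | "irreversible";
type RuleLevel = "optional" | "recommended" | "required";

interface EffectPolicy {
  idempotencyRequired: boolean;
  rules: RuleLevel;
}

// Record<Effect, ...> forces an entry for every effect; leaving one out fails to compile.
const POLICY: Record<Effect, EffectPolicy> = {
  "read": { idempotencyRequired: false, rules: "optional" },
  "mutate-local": { idempotencyRequired: true, rules: "optional" },
  "mutate-external": { idempotencyRequired: true, rules: "recommended" },
  "irreversible": { idempotencyRequired: true, rules: "required" },
};

interface ToolSpec {
  name: string;
  effect: Effect; // required: a spec without it is a compile-time error
  hasIdempotencyKey: boolean;
  ruleCount: number;
}

// Checks at registration time whatever the type system cannot express.
function policyViolations(spec: ToolSpec): string[] {
  const policy = POLICY[spec.effect];
  const problems: string[] = [];
  if (policy.idempotencyRequired && !spec.hasIdempotencyKey) {
    problems.push(`${spec.name}: effect "${spec.effect}" requires an idempotency key`);
  }
  if (policy.rules === "required" && spec.ruleCount === 0) {
    problems.push(`${spec.name}: effect "${spec.effect}" requires at least one rule`);
  }
  return problems;
}

console.log(
  policyViolations({ name: "dispatch_to_courier", effect: "irreversible", hasIdempotencyKey: true, ruleCount: 0 }),
);
```

The compiler covers the "must declare an effect" part. Requirements such as "irreversible tools need at least one rule" depend on what has been registered, so a runtime check like `policyViolations` is one place to put them.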

Comparison

| | LangChain / LlamaIndex | Temporal / Restate | DAR |
|---|---|---|---|
| Idempotent tool calls | ❌ | ✅ (heavy) | ✅ |
| Constitutional rules | ❌ | ❌ | ✅ |
| Postgres-only (no new infra) | ❌ | ❌ (own cluster) | ✅ |
| Model-agnostic | partial | yes | ✅ |
| Bundle size | huge | huge | ~5KB gzipped |
| Audit trail | ❌ | ✅ | ✅ |
| Designed for LLM agents specifically | ✅ | ❌ | ✅ |

We're not trying to replace Temporal for microservices. We're solving the specific shape of "LLM picks a tool, calls it, sometimes does dumb things."

Production status

Battle-tested: serving production traffic at Peru Fit Life and FitCRM since May 2026.
API stability: 0.x, so expect breaking changes until 1.0. Every minor release is tagged with migration notes.
License: MIT.
Built by: @renzomacar — building in public, follow along.

What's next

v0.2 — Sandbox dry-run for irreversible tools (simulate before commit)
v0.3 — Saga compensation (multi-step rollback)
v0.4 — Semantic deduplication via embeddings
v0.5 — Replay graph visualizer (TUI + web)
v1.0 — Speculative branching (Monte Carlo Tree Search applied to tool choice)

PRs welcome. Issues even more welcome.

TL;DR — your LLM agent will eventually screw something up. DAR is the airbag, the seatbelt, and the dashcam.