agentskeptic
v4.2.0
Structured tool activity vs downstream state at verify time: SQL (SQLite, Postgres, MySQL), HTTP witnesses, supported vector indexes, MongoDB documents, and S3-compatible objects when configured—deterministic verdict artifacts (see verification-state-stor
AgentSkeptic — state vs trace
Trust reality, not traces.
Declared tool effects vs read-only store facts.
Traces say success. Your data often disagrees. Read-only checks at verify time compare tool claims to stored state—before you ship or bill.
Bundled terminal proof
### Success (`wf_complete`)
workflow_id: wf_complete
workflow_status: complete
trust: TRUSTED: Every step matched the database under the configured verification rules.
steps:
- seq=0 tool=crm.upsert_contact result=Matched the database.
{
"schemaVersion": 15,
"workflowId": "wf_complete",
"status": "complete",
"steps": [{ "seq": 0, "toolId": "crm.upsert_contact", "status": "verified" }]
}
### Failure (`wf_missing`)
workflow_id: wf_missing
workflow_status: inconsistent
steps:
- seq=0 tool=crm.upsert_contact result:Expected row is missing from the database (the log implies a write that is not present).
reference_code: ROW_ABSENT
{
"schemaVersion": 15,
"workflowId": "wf_missing",
"status": "inconsistent",
"steps": [
{
"seq": 0,
"toolId": "crm.upsert_contact",
"status": "missing",
"reasons": [{ "code": "ROW_ABSENT" }]
}
]
}
Default path: one truth check
Compare recorded tool activity to your database and get an Outcome Certificate (stdout) plus a truth_check_verdict line on stderr:
npx agentskeptic check --workflow-id wf_example \
--project ./path/to/your-app \
--db ./path/to/readable.sqlite
With the conventional layout, --registry and --events default to ./path/to/your-app/agentskeptic/tools.json and events.ndjson. Pass them explicitly when your paths differ. Full reference: docs/integrate.md.
Exportable activation (advanced): BootstrapPackInput v1 + agentskeptic activate (writes proof/ under --out on exits 0–2; bootstrap is legacy — docs/bootstrap-pack-normative.md).
Lifecycle
- Keep `agentskeptic/tools.json` in version control; update it when the `toolId` → SQL mapping changes.
- Emit observations via the canonical SDK emitter, then append emitted rows to the gate buffer. Optionally mirror the same JSON lines to `agentskeptic/events.ndjson` for CI replay.
- On the code path before irreversible work you control (ship, bill, ticket close), call `await gate.assertSafeForIrreversibleAction()` so unsafe trust (or required emissions that never reached the gate) blocks that branch. It is not a substitute for wiring the gate everywhere it matters, and outcomes can still be `unknown` when `highStakesReliance` is not `permitted` (see `docs/outcome-certificate-normative.md`).
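The gating step in the list above can be sketched roughly as follows. This is a self-contained toy, not the library's implementation: `ToyGate`, `Trust`, `record`, `blockingReason`, and `billCustomer` are hypothetical names, and only the call shape `await gate.assertSafeForIrreversibleAction()` comes from the text. It also assumes that an `unknown` verdict blocks; the real rules are in `docs/outcome-certificate-normative.md`.

```typescript
// Toy stand-in for the real gate: hypothetical names, control flow only.
type Trust = "trusted" | "not_trusted" | "unknown";

class ToyGate {
  private readonly requiredToolIds: string[];
  private readonly verdicts: Record<string, Trust> = {};

  constructor(requiredToolIds: string[]) {
    this.requiredToolIds = requiredToolIds;
  }

  // Record the verdict for an observation that reached the gate.
  record(toolId: string, verdict: Trust): void {
    this.verdicts[toolId] = verdict;
  }

  // Returns the first blocking reason, or null when the branch may proceed.
  blockingReason(): string | null {
    for (const toolId of this.requiredToolIds) {
      if (!(toolId in this.verdicts)) {
        return `required emission never reached the gate: ${toolId}`;
      }
    }
    for (const toolId of Object.keys(this.verdicts)) {
      const verdict = this.verdicts[toolId];
      if (verdict !== "trusted") {
        return `unsafe trust for ${toolId}: ${verdict}`;
      }
    }
    return null;
  }

  // Mirrors the call shape from the text: throws so the caller's branch stops.
  async assertSafeForIrreversibleAction(): Promise<void> {
    const reason = this.blockingReason();
    if (reason !== null) throw new Error(reason);
  }
}

// Usage: the irreversible step runs only if the gate does not throw.
async function billCustomer(gate: ToyGate): Promise<string> {
  await gate.assertSafeForIrreversibleAction();
  return "billed";
}
```

Throwing, rather than returning a boolean, means a caller that forgets to inspect the result still cannot reach the irreversible branch.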
Install
npm install agentskeptic
Code
npx agentskeptic init --framework none --database sqlite --yes
import { join } from "node:path";
import { AgentSkeptic } from "agentskeptic";
const skeptic = new AgentSkeptic({
registryPath: join("agentskeptic", "tools.json"),
databaseUrl: join(process.cwd(), "demo.db"),
});
const certificate = await skeptic.check({
workflowId: "wf_complete",
observations: [
{
toolId: "crm.upsert_contact",
params: { recordId: "c_ok", fields: { name: "Alice", status: "active" } },
},
],
});
See docs/integrate.md (v2 integrator SSOT) and docs/migrate-2.md for 1.x → 2.0 renames.
Buy vs build: why not only SQL checks
The scar (one pattern, over and over): the trace says the tool succeeded—here crm.upsert_contact / contacts—but the row is missing or wrong. The repo demo names it wf_missing / ROW_ABSENT; the same failure shape applies whenever your registry maps tool activity to SQL state (ledgers, orders, tickets—not only CRM). That is not a logging problem—it is a money and risk problem the moment you ship, bill, close, or treat the run as audit evidence.
Why “we’ll just write SQL checks” stops working
- Drift: Scripts rot when schemas and workflows change; nobody keeps them current.
- No ownership: The author leaves; the checks become folklore.
- Not an org contract: Expectations live in heads and one-off files, not in a shared `tools.json` + NDJSON contract everyone replays.
- CI and audit: Ad-hoc checks are skipped locally and rarely ship as repeatable artifacts; when the issue is cross-team or compliance, scripts do not hold. Use CI lock / enforcement when you need pins (`docs/ci-enforcement.md`).
What you standardize on instead: when the row backs revenue or customer promises, you stop betting the business on whoever wrote the last script. AgentSkeptic is how the org owns the check: one verifier, one replayable contract, Quick → Contract when stakes go up—explore with Quick Verify (docs/quick-verify-normative.md), lock with contract mode and a tools.json registry when “we ran a query” is not evidence (docs/agentskeptic.md). That is the responsible default once the failure mode hurts.
Core mechanism: Read-only SQL checks that your database at verification time matches expectations derived from structured tool activity—not whether a trace step “succeeded.”
Read-only checks at verify time—not color.
- Repository: https://github.com/jwekavanagh/agentskeptic
- npm package: https://www.npmjs.com/package/agentskeptic
- Canonical site: https://agentskeptic.com
- Integrate: https://agentskeptic.com/integrate
- OpenAPI (canonical): https://agentskeptic.com/openapi-commercial-v1.yaml
- Verification Contract Manifest: https://agentskeptic.com/contract/v1.json
- llms.txt (agents, site): https://agentskeptic.com/llms.txt
- llms.txt (repo, raw): https://raw.githubusercontent.com/jwekavanagh/agentskeptic/refs/heads/main/llms.txt
- llms.txt (repo, blob): https://github.com/jwekavanagh/agentskeptic/blob/main/llms.txt
Advanced
Canonical runnable (same API as README ### Code): after npm run build, run node examples/decision-gate-canonical.mjs.
Try it (about one minute)
This is the fastest way to see ROW_ABSENT versus verified on the same screen—the concrete failure mode the section above is about (bundled CRM-style demo, not your production incident yet).
Prerequisite: Node.js ≥ 22.13 (built-in node:sqlite), or use Docker below.
Fast first run on your own DB (canonical local truth loop): after npm install and npm run build, run:
agentskeptic loop --workflow-id <id> --events <path> --registry <path> --db <sqlitePath>
This single command verifies state, emits TRUSTED / NOT TRUSTED / UNKNOWN, shows a next action when non-trusted, persists local run history, and auto-compares against your latest compatible prior run. Normative operator contract: docs/local-feedback-loop.md.
Advanced compatibility paths: agentskeptic quick, agentskeptic crossing, and agentskeptic verify-integrator-owned remain supported for specialized workflows and CI parity; they are no longer the default local operator path.
npm install
npm start
What you should see: npm start builds, seeds examples/demo.db, and runs two workflows from examples/events.ndjson with examples/tools.json. The first case ends complete / verified; the second inconsistent / missing with reason ROW_ABSENT. That contrast is the product on one screen.
npm install does not compile TypeScript. To run the CLI without npm start, run npm run build first so dist/ exists.
Docker quickstart (optional)
Use this when you want the bundled demo without Node 22.13+ on the host. The repo is bind-mounted so examples/demo.db stays on your machine.
Bash / macOS / Linux (repo root):
docker run --rm -it -v "$PWD:/work" -w /work node:22-bookworm bash -lc "npm install && npm start"
PowerShell (repo root):
docker run --rm -it -v "${PWD}:/work" -w /work node:22-bookworm bash -lc "npm install && npm start"
Minimal model (event → registry → result)
One structured observation (NDJSON line; full schema in Event line schema):
{"schemaVersion":1,"workflowId":"wf_complete","seq":0,"type":"tool_observed","toolId":"crm.upsert_contact","params":{"recordId":"c_ok","fields":{"name":"Alice","status":"active"}}}
Registry entry (excerpt; full file is examples/tools.json) telling the engine how that toolId maps to a row check:
{
"toolId": "crm.upsert_contact",
"verification": {
"kind": "sql_row",
"table": { "const": "contacts" },
"identityEq": [{ "column": { "const": "id" }, "value": { "pointer": "/recordId" } }],
"requiredFields": { "pointer": "/fields" }
}
}
When the row matches: workflow result (excerpt; demo prints full JSON to stdout):
{
"workflowId": "wf_complete",
"status": "complete",
"steps": [{ "seq": 0, "toolId": "crm.upsert_contact", "status": "verified" }]
}
When the row is missing or fields disagree, you get inconsistent / missing and reason codes such as ROW_ABSENT.
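To make the event → registry step concrete, here is a minimal sketch of how the registry excerpt above could be resolved against an observation's params to produce an expected row. It assumes the `pointer` values are RFC 6901 JSON Pointers evaluated against `params`, which the excerpt suggests but does not state. `ValueSpec`, `resolvePointer`, `resolveSpec`, and `deriveExpectation` are hypothetical names, not the engine's API.

```typescript
// A registry value is either a literal or a pointer into the event params.
type ValueSpec = { const: unknown } | { pointer: string };

interface SqlRowVerification {
  kind: "sql_row";
  table: ValueSpec;
  identityEq: { column: ValueSpec; value: ValueSpec }[];
  requiredFields: ValueSpec;
}

// Resolve a JSON Pointer such as "/recordId" against a parsed JSON value.
function resolvePointer(doc: unknown, pointer: string): unknown {
  let current: unknown = doc;
  // "" addresses the whole document; otherwise drop the empty token before the first "/".
  for (const rawToken of pointer === "" ? [] : pointer.split("/").slice(1)) {
    const token = rawToken.replace(/~1/g, "/").replace(/~0/g, "~");
    if (current === null || typeof current !== "object") return undefined;
    current = (current as Record<string, unknown>)[token];
  }
  return current;
}

function resolveSpec(spec: ValueSpec, params: unknown): unknown {
  return "const" in spec ? spec.const : resolvePointer(params, spec.pointer);
}

// Turn one registry entry plus one observation's params into an expected row.
function deriveExpectation(v: SqlRowVerification, params: unknown) {
  const identity: Record<string, unknown> = {};
  for (const eq of v.identityEq) {
    identity[String(resolveSpec(eq.column, params))] = resolveSpec(eq.value, params);
  }
  return {
    table: String(resolveSpec(v.table, params)),
    identity,
    requiredFields: resolveSpec(v.requiredFields, params),
  };
}

// The registry excerpt and the observation params from this section.
const entry: SqlRowVerification = {
  kind: "sql_row",
  table: { const: "contacts" },
  identityEq: [{ column: { const: "id" }, value: { pointer: "/recordId" } }],
  requiredFields: { pointer: "/fields" },
};
const expectation = deriveExpectation(entry, {
  recordId: "c_ok",
  fields: { name: "Alice", status: "active" },
});
console.log(JSON.stringify(expectation));
// {"table":"contacts","identity":{"id":"c_ok"},"requiredFields":{"name":"Alice","status":"active"}}
```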
What this is (and is not)
Retries, partial failures, and race conditions mean a success flag in a trace is not proof the intended row exists with the right values. The engine derives expected state from your registry and events and compares it to observed state with read-only SELECTs.
| This is | This is not |
|---------|-------------|
| A SQL ground-truth state check against expectations from structured tool activity | Generic observability, log search, or arbitrary unstructured logs |
| A verifier for persisted state after agent or automation workflows | A test runner for application code |
| Proof that observed DB state matched expectations at verification time | Proof that a tool executed, wrote, or caused that state |
This is for you if you need persisted-row SQL truth after agent or automation runs when the trace looks fine but the DB might not.
This is not for you if you need proof a tool executed, log search as verification, or a model where read-only SQL against your app DB is not the right check. Homepage “for you / not for you” copy lives in website/src/content/productCopy.ts (single source with the site).
Trust boundary (once): a green trace does not prove the row exists with the right values. Verification establishes only whether read-only SELECTs matched expected rows under your rules, not deep causality.
Declared → expected → observed (how reports reason about runs):
- Declared: what the captured tool activity encodes (`toolId`, parameters).
- Expected: what should hold in SQL under the rules (in Quick Verify, inferred; in contract mode, registry-driven from events).
- Observed: what read-only SQL returned at verification time.
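The expected-versus-observed comparison can be sketched as a small pure function. This is an illustration under stated assumptions, not the engine's logic: the in-memory `selectOne` stands in for a read-only SELECT, and the `mismatch` status with its `HYPOTHETICAL_FIELD_MISMATCH` code are placeholders invented here. Only `verified`, `missing`, and `ROW_ABSENT` appear in this README.

```typescript
type Row = Record<string, unknown>;

interface Expected {
  table: string;
  identity: Row; // columns that identify the row
  requiredFields: Row; // columns that must hold these values
}

interface StepOutcome {
  status: "verified" | "missing" | "mismatch"; // "mismatch" is a placeholder
  reasons?: { code: string }[];
}

// In-memory stand-in for a read-only SELECT filtered on the identity columns.
function selectOne(rows: Row[], identity: Row): Row | undefined {
  const keys = Object.keys(identity);
  for (const row of rows) {
    if (keys.every((k) => row[k] === identity[k])) return row;
  }
  return undefined;
}

// Compare what should hold (expected) with what the store returned (observed).
function classify(expected: Expected, observed: Row | undefined): StepOutcome {
  if (observed === undefined) {
    return { status: "missing", reasons: [{ code: "ROW_ABSENT" }] };
  }
  const differing = Object.keys(expected.requiredFields).filter(
    (k) => observed[k] !== expected.requiredFields[k],
  );
  if (differing.length > 0) {
    // Placeholder code: the real reason codes are not listed in this README.
    return { status: "mismatch", reasons: [{ code: "HYPOTHETICAL_FIELD_MISMATCH" }] };
  }
  return { status: "verified" };
}

// Usage with a one-row stand-in for the contacts table.
const contacts: Row[] = [{ id: "c_ok", name: "Alice", status: "active" }];
const expectedOk: Expected = {
  table: "contacts",
  identity: { id: "c_ok" },
  requiredFields: { name: "Alice", status: "active" },
};
const expectedAbsent: Expected = { ...expectedOk, identity: { id: "c_missing" } };

console.log(classify(expectedOk, selectOne(contacts, expectedOk.identity)).status);
console.log(classify(expectedAbsent, selectOne(contacts, expectedAbsent.identity)).status);
```

Note that `classify` never consults the trace's own success flag: the declared activity only supplies the expectation, and the verdict comes from the observed row.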
Contract path (registry + events)
CLI: after npm install and npm run build, use agentskeptic loop as the default local command (or node dist/cli.js loop). For Postgres, pass --postgres-url instead of --db (exactly one of the two).
Typical integration:
- Emit one NDJSON line per tool observation (see Event line schema).
- Add a registry entry per `toolId` (start from `examples/templates/`).
- Run the local truth loop:
npm run build
agentskeptic loop --workflow-id <id> --events <path> --registry <path> --db <sqlitePath>
Replay the bundled files: wf_complete / examples/events.ndjson / examples/tools.json / examples/demo.db (same flags as above).
From source without agentskeptic on PATH: node dist/cli.js with the same flags.
Why SQLite in the demo: file-backed ground truth with no extra services. The demo (re)creates examples/demo.db; verification still uses read-only SQL.
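For the first integration step (one NDJSON line per tool observation), the wire shape can be sketched as below. The Lifecycle section recommends the canonical SDK emitter, so treat this hand-rolled version only as an illustration of the line format. `makeEmitter` is a hypothetical helper, the field list is copied from the sample event line in this README, and a `seq` that starts at 0 and increases by one per workflow is an assumption.

```typescript
// Field names and order follow the sample event line in this README.
interface ToolObservedEvent {
  schemaVersion: 1;
  workflowId: string;
  seq: number;
  type: "tool_observed";
  toolId: string;
  params: Record<string, unknown>;
}

// Returns a function that serializes one observation per call as one NDJSON line.
function makeEmitter(workflowId: string) {
  let seq = 0; // assumption: per-workflow counter starting at 0
  return (toolId: string, params: Record<string, unknown>): string => {
    const event: ToolObservedEvent = {
      schemaVersion: 1,
      workflowId,
      seq,
      type: "tool_observed",
      toolId,
      params,
    };
    seq += 1;
    // NDJSON: exactly one JSON document per line, terminated by a newline.
    return JSON.stringify(event) + "\n";
  };
}

// Usage: reproduces the sample line from the minimal model section.
const emitObservation = makeEmitter("wf_complete");
const firstLine = emitObservation("crm.upsert_contact", {
  recordId: "c_ok",
  fields: { name: "Alice", status: "active" },
});
console.log(firstLine);
```

Each returned line could then be appended to the events file passed as --events.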
Quick Verify and assurance (optional)
Quick Verify (agentskeptic quick): inferred checks, no registry file; provisional, not audit-final—graduate to contract mode for explicit per-tool expectations. Full contract: docs/quick-verify-normative.md.
Input contract: We only accept structured tool activity—JSON or NDJSON that describes tool calls and parameters our ingest model can extract—not arbitrary logs, traces, or unstructured observability text. Verification uses read-only SQL against your database; API-only or non-SQL systems are out of scope for this tool.
npm run build
agentskeptic quick --input test/fixtures/quick-verify/pass-line.ndjson --db examples/demo.db --export-registry ./quick-export.json
For Postgres, use --postgres-url instead of --db. Passing - as --input reads from stdin.
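The input contract above (structured JSON or NDJSON only, never free-form logs) can be illustrated with a small pre-flight check you might run before handing a file to the CLI. This is a sketch, not the tool's ingest model: `checkNdjson` is a hypothetical helper, and requiring a string `toolId` plus an object `params` is an assumption based on the sample event line, not the documented Quick Verify schema.

```typescript
interface LineFinding {
  lineNumber: number;
  ok: boolean;
  problem?: string;
}

// Flag lines that are not JSON objects describing a tool call with parameters.
function checkNdjson(text: string): LineFinding[] {
  const findings: LineFinding[] = [];
  text.split("\n").forEach((raw, index) => {
    const line = raw.trim();
    if (line === "") return; // blank lines are skipped
    const lineNumber = index + 1;
    let value: unknown;
    try {
      value = JSON.parse(line);
    } catch {
      findings.push({ lineNumber, ok: false, problem: "not valid JSON" });
      return;
    }
    if (value === null || typeof value !== "object" || Array.isArray(value)) {
      findings.push({ lineNumber, ok: false, problem: "not a JSON object" });
      return;
    }
    const obj = value as Record<string, unknown>;
    if (typeof obj.toolId !== "string") {
      findings.push({ lineNumber, ok: false, problem: "no string toolId" });
    } else if (obj.params === null || typeof obj.params !== "object") {
      findings.push({ lineNumber, ok: false, problem: "no params object" });
    } else {
      findings.push({ lineNumber, ok: true });
    }
  });
  return findings;
}

// Usage: one structured line, one log-style line, one JSON line without tool fields.
const sampleInput = [
  '{"toolId":"crm.upsert_contact","params":{"recordId":"c_ok"}}',
  "2024-01-01 INFO tool call succeeded",
  '{"message":"no tool fields here"}',
].join("\n");
const findings = checkNdjson(sampleInput);
console.log(findings.map((f) => `${f.lineNumber}: ${f.ok ? "ok" : f.problem}`).join("\n"));
```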
Assurance (assurance run / assurance stale): multi-scenario sweeps and staleness over saved reports; success paths emit one AssuranceOutputV1 JSON line on stdout (embedded runReport)—Assurance subsystem, examples/assurance/manifest.json.
Sample output (contract demo)
The npm start driver prints human report + workflow JSON to stdout (one stream for the demo). Normal CLI: machine JSON on stdout, human report on stderr—Human truth report. Full success/failure transcripts (same strings as below) are in the acquisition fold at the top of this README.
Success (wf_complete)
Interpretation: Under the configured rules, expected state matched observed SQL for this step—state alignment, not proof of execution.
Failure (wf_missing)
Interpretation: Expected state from the tool activity implied a row observed SQL did not find—inconsistent—a gap traces alone often miss. Still not proof a write was attempted or rolled back.
How this differs from logs, tests, and observability
| Approach | What it tells you |
|----------|-------------------|
| Logs / traces | A step ran, duration, errors; not “row X has columns Y.” |
| Unit / integration tests | Code paths in your repo, not production agent runs against live DB state. |
| Metrics / APM | Health and latency, not semantic equality of persisted records. |
| Ad-hoc SQL checks / one-off scripts | Same failure mode as Buy vs build: drift, weak ownership, not a durable contract. |
| agentskeptic | Whether observed SQL matches expectations from declared tool parameters (contract mode), via read-only SQL; not proof the tool executed. |
When to run it
Run after a workflow (or CI replay of its log), before you treat the outcome as safe for customer-facing or regulated actions.
Inputs: NDJSON observations, registry JSON, read-only SQLite or Postgres. Semantics: docs/relational-verification.md.
Typical uses: block a release, trigger human review, open an incident, or attach a verification artifact to an audit trail.
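As one way to wire the "block a release" use, the sketch below reads a workflow result in the JSON shape shown in the bundled transcripts and turns it into an allow/deny decision. `releaseDecision` is a hypothetical helper. It relies only on the `status`, `steps`, and `reasons` fields visible in this README and deliberately ignores CLI exit codes, which are documented in docs/agentskeptic.md.

```typescript
interface StepResult {
  seq: number;
  toolId: string;
  status: string;
  reasons?: { code: string }[];
}

interface WorkflowResult {
  workflowId: string;
  status: string;
  steps: StepResult[];
}

// Allow only when the workflow is complete and every step is verified.
function releaseDecision(resultJson: string): { allow: boolean; reasons: string[] } {
  const result = JSON.parse(resultJson) as WorkflowResult;
  const reasons: string[] = [];
  for (const step of result.steps) {
    if (step.status !== "verified") {
      const codes = (step.reasons ?? []).map((r) => r.code).join(",") || "no reason code";
      reasons.push(`seq=${step.seq} tool=${step.toolId} status=${step.status} codes=${codes}`);
    }
  }
  return { allow: result.status === "complete" && reasons.length === 0, reasons };
}

// Usage with the two bundled demo results.
const completeJson =
  '{"schemaVersion":15,"workflowId":"wf_complete","status":"complete",' +
  '"steps":[{"seq":0,"toolId":"crm.upsert_contact","status":"verified"}]}';
const missingJson =
  '{"schemaVersion":15,"workflowId":"wf_missing","status":"inconsistent",' +
  '"steps":[{"seq":0,"toolId":"crm.upsert_contact","status":"missing","reasons":[{"code":"ROW_ABSENT"}]}]}';

console.log(releaseDecision(completeJson).allow); // true
console.log(releaseDecision(missingJson).reasons[0]);
// seq=0 tool=crm.upsert_contact status=missing codes=ROW_ABSENT
```

The collected reason strings are suitable for attaching to a review request or incident ticket.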
CI with over-time guarantees: use stateful agentskeptic enforce baseline/check/accept lifecycle—docs/ci-enforcement.md.
Further capabilities (reference)
Everything beyond core contract verification lives in docs/agentskeptic.md—subcommands, hooks, bundles, debug, plan transition, human report layout, exit codes.
Documentation map
| Doc | Purpose |
|-----|---------|
| docs/contract.md | Verification Contract Manifest SSOT — names, hashes, and versions the event/registry/registry-export schemas; one URL, one CI gate |
| docs/epistemic-contract.md | Normative epistemic contract (grounded output vs funnel; ranking limits; telemetry proxies)—single authored source; other docs link or generate from here |
| README — Buy vs build | Canonical buy vs build narrative (failure mode, scripts limits, Quick → Contract) |
| docs/agentskeptic.md | Authoritative CLI and behavior reference (SSOT) |
| docs/quick-verify-normative.md | Quick Verify normative contract |
| docs/verification-product.md | Product intent, trust boundary, authority matrix |
| docs/reconciliation-vocabulary.md | Reconciliation dimension IDs and UI mapping |
| docs/verification-operational-notes.md | First-run runbooks, TTFV, export vs replay coverage |
| docs/langgraph-reference-boundaries.md | LangGraph reference path: emitter/CLI boundaries and test chain |
| docs/langgraph-checkpoint-trust.md | LangGraph checkpoint trust: v3 wire, terminal contract, shared kernel, production gate |
| docs/relational-verification.md | Relational verification semantics |
| docs/ci-enforcement.md | CI enforcement and lock fixtures |
| docs/correctness-definition-normative.md | Correctness and limits (normative) |
Development and testing
Why SQLite: same note as under Contract path (file-backed demo DB; read-only verification SQL).
npm test runs npm run verification:truth (regeneration + contract gate, Postgres distribution, then full journey suite). Requires DATABASE_URL and TELEMETRY_DATABASE_URL (see website/.env.example). Ordering: docs/testing.md.
Full CI parity (Postgres + Playwright for Debug Console): set POSTGRES_ADMIN_URL and POSTGRES_VERIFICATION_URL, then npm run test:ci. See docs/testing.md, .github/workflows/ci.yml, and: docker run -d --name etl-pg -p 5432:5432 -e POSTGRES_PASSWORD=postgres postgres:16.
Commercial CLI (npm) vs OSS (this repo)
Commercial metering (published npm) uses AGENTSKEPTIC_API_KEY + POST /api/v1/usage/reserve as documented in docs/commercial.md — account-pooled quota per billing month.
OSS/unmetered CLI for single-run verification: clone this repo and use the OSS build (WF_BUILD_PROFILE=oss / default npm run build artifact). Stateful, over-time enforcement (agentskeptic enforce) needs the commercial CLI and a paid entitlement.
Canonical write-up: docs/commercial.md (npm package, Stripe, keys, telemetry, validation, entitlements; operator metrics in docs/funnel-observability.md—disable with AGENTSKEPTIC_TELEMETRY=0). OSS builds in this repo run contract verify / quick without a license server for stateless runs. Stateful agentskeptic enforce and over-time guarantees require a commercial build per docs/commercial-enforce-gate-normative.md. Example workflow: examples/github-actions/agentskeptic-commercial.yml.
Status, contributing, security
Maturity: 0.x (package.json). APIs, CLI flags, and JSON schemas may evolve; rely on tests and docs for current contracts.
Contributing: see CONTRIBUTING.md.
Security: see SECURITY.md.
License
Released under the MIT License — LICENSE.
