evalgate
v3.2.0
Eval-gated todos for agentic coding. Your agents can't tick the checkbox until the verifier passes.
Multi-agent coding is getting real. Claude Code subagents, Codex, Octogent, OpenHarness — they all let you run more agents in parallel. The problem nobody has solved is that more agents means more plausible-looking output that's actually wrong. The completion signal today is "the agent said it was done," which scales terribly.
evalgate fixes that with one primitive: a todo item can't be flipped to [x]
until its attached verifier exits 0.
Zero runtime dependencies. Plain markdown. Plugs into any agent or CI pipeline.
30-second demo
Your todo.md:
- [ ] Implement add(a, b)
  - eval: `npm run test:add`
  - retries: 2
- [ ] Implement subtract(a, b)
  - eval: `npm run test:subtract`
  - retries: 3
Run evalgate check:
evalgate · checking 2 contracts in todo.md
▸ Implement add(a, b) (implement-add) ... ✓ passed (412ms)
▸ Implement subtract(a, b) (implement-subtract) ... ✗ failed (exit 1, 388ms)
│ subtract
│ ✖ expected 2, got 8
│ at file:///.../subtract.test.js:6:10
Summary: 1 passed, 1 failed
add flips to [x]. subtract stays [ ], and the failure output is right there for the agent to read and retry.
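The gating rule from the demo can be sketched in a few lines of TypeScript. This is an illustration only, not evalgate's implementation: `flipPassed` and the `GateResult` shape are hypothetical names, and matching contracts by exact title is an assumption made to keep the sketch short.

```typescript
// Hypothetical result shape: one entry per contract that was checked.
interface GateResult {
  title: string;    // task title exactly as written in todo.md
  exitCode: number; // exit code of the attached verifier
}

// Flip "- [ ]" to "- [x]" only for contracts whose verifier exited 0.
function flipPassed(markdown: string, results: GateResult[]): string {
  const passedTitles = results
    .filter((r) => r.exitCode === 0)
    .map((r) => r.title);
  return markdown
    .split("\n")
    .map((line) => {
      // Only top-level unchecked task lines match; indented fields do not.
      const m = /^- \[ \] (.+)$/.exec(line);
      return m && passedTitles.indexOf(m[1]) !== -1 ? `- [x] ${m[1]}` : line;
    })
    .join("\n");
}

const todo = [
  "- [ ] Implement add(a, b)",
  "  - eval: `npm run test:add`",
  "- [ ] Implement subtract(a, b)",
  "  - eval: `npm run test:subtract`",
].join("\n");

const updated = flipPassed(todo, [
  { title: "Implement add(a, b)", exitCode: 0 },
  { title: "Implement subtract(a, b)", exitCode: 1 },
]);
console.log(updated);
```

Only the add item comes back checked; the subtract item is returned unchanged because its verifier exited non-zero.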
Why this matters
Today's agent orchestrators give you more workers. evalgate gives those workers
a contract: each todo is a unit of work with a built-in quality gate. That unlocks:
- Auto-retries with context. The agent sees the verifier output and fixes the root cause instead of re-asking the human.
- Honest progress bars. [x] means the tests actually passed.
- Safe parallelism. Spawn 8 workers on 8 todos; only the ones that actually work commit their checkboxes.
- Budget enforcement per task. Each contract declares its token budget; evalgate tracks spend and can report overruns.
- 24/7 autonomous operation. Trigger contracts on a schedule, file change, or webhook, with no human needed to kick things off.
Install
# Global CLI
npm install -g evalgate
# Or clone + link for development
git clone https://github.com/jorgejac1/evalgate
cd evalgate
pnpm install && pnpm build && npm link
Quick start
cd examples/basic
npm install
evalgate list # show contracts + verifiers
evalgate check # run pending verifiers, flip checkboxes
Contract format
Contracts live in any markdown file (convention: todo.md). A contract is a GFM
task-list item with indented sub-bullet fields:
- [ ] Task title
  - eval: `shell command`
  - eval.all: `cmd1` | `cmd2`
  - eval.any: `cmd1` | `cmd2`
  - eval.llm: judge prompt as plain text
  - eval.diff: src/file.ts contains "new pattern"
  - eval.http: http://localhost:3000/health
  - eval.http.status: 200
  - eval.http.contains: "healthy"
  - eval.http.timeout: 5000
  - eval.schema: output.json {"type":"object","required":["id"]}
  - retries: 3
  - retry-if: exit-code > 1
  - budget: 50k
  - id: stable-slug
  - on: schedule: "0 * * * *"
  - on: watch: "src/**/*.ts"
  - on: webhook: "/deploy-done"
Field reference
| Field | Required | Description |
| ----------- | -------- | ----------- |
| eval | yes* | Shell command. Exit 0 = pass, anything else = fail. |
| eval.all | yes* | Pipe-separated commands — all must exit 0. |
| eval.any | yes* | Pipe-separated commands — any one must exit 0. |
| eval.llm | yes* | Natural-language prompt judged by Claude. Answers PASS or FAIL. |
| eval.diff | yes* | Assert a structural change in a file: contains, not contains, deleted, created, changed. Zero deps. |
| eval.http | yes* | HTTP health check — GET request; passes if status matches (default 200). Uses built-in fetch (Node 18+). |
| eval.http.status | no | Expected HTTP status code. Defaults to 200. |
| eval.http.contains | no | Substring that must appear in the response body. |
| eval.http.timeout | no | Request timeout in ms. Defaults to 10000. |
| eval.schema | yes* | JSON schema validator — <file> <inline-json-schema>. Checks type, required[], properties[].type. Zero deps. |
| retries | no | Max retry attempts hint for orchestrators. |
| retry-if | no | Only consume a retry slot when condition is true: exit-code <op> <n>. Operators: == != > < >= <=. |
| budget | no | Token budget: 50k, 1.5m, or raw integer. |
| id | no | Stable slug for references and logs. Defaults to slugified title. |
| provider | no | Preferred model: opus, sonnet, or haiku. |
| role | no | Agent role hint: coordinator, worker, or linter. |
| mcp | no | Comma-separated list of MCP servers this contract may use. |
| on | no | Trigger: schedule: "<cron>", watch: "<glob>", or webhook: "<path>". |
* Exactly one eval variant is required for a gated contract. Items without any eval are ungated — they behave like normal checkboxes.
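To make the eval.all and eval.any rows concrete, here is a sketch of how pipe-separated commands might be split and how per-step exit codes combine. The helper names (`splitCommands`, `compositePasses`) are hypothetical, and the splitting rule (commands wrapped in backticks) is inferred from the examples in this README rather than taken from evalgate's parser.

```typescript
// Extract the backtick-quoted commands from a field value such as
// "`npm run build` | `npm run lint`". Pipes inside backticks are kept.
function splitCommands(value: string): string[] {
  const re = /`([^`]+)`/g;
  const commands: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(value)) !== null) {
    commands.push(m[1]);
  }
  return commands;
}

// Combine per-step exit codes: "all" needs every step to exit 0,
// "any" needs at least one step to exit 0.
function compositePasses(mode: "all" | "any", exitCodes: number[]): boolean {
  const ok = (code: number) => code === 0;
  return mode === "all" ? exitCodes.every(ok) : exitCodes.some(ok);
}

const steps = splitCommands("`npm run build` | `npm run lint` | `npm test`");
console.log(steps.length);                      // 3
console.log(compositePasses("all", [0, 0, 1])); // false
console.log(compositePasses("any", [1, 0, 1])); // true
```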
Verifier variants
Shell (a single command):
- [ ] Tests pass
  - eval: `npm test`
Composite all (every step must exit 0):
- [ ] Build and lint pass
  - eval.all: `npm run build` | `npm run lint` | `npm test`
Composite any (at least one step must exit 0):
- [ ] At least one mirror is up
  - eval.any: `curl -f https://mirror-1/health` | `curl -f https://mirror-2/health`
LLM judge (Claude evaluates the output; useful for prose, API contracts, or anything hard to assert mechanically):
- [ ] README explains the feature clearly
  - eval.llm: Does the README at ./README.md explain the auth flow in plain English?
Requires ANTHROPIC_API_KEY. Defaults to claude-haiku-4-5-20251001.
Semantic diff — assert that a specific structural change appeared in a file. Passes if the pattern matches the diff; fails if the file is unchanged or the pattern is absent. Zero external dependencies:
- [ ] Add rate-limit header to responses
  - eval.diff: src/middleware.ts contains "X-RateLimit-Remaining"
- [ ] Remove legacy auth module
  - eval.diff: src/auth-legacy.ts deleted
Supported assertions: contains "<text>", not contains "<text>", deleted, created, changed.
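The five assertion forms share a small grammar: a file path followed by an assertion keyword and, for the two contains forms, a quoted string. The sketch below parses that grammar; `parseDiffAssertion` and the `DiffAssertion` type are hypothetical and only mirror the syntax shown here, not evalgate's internal types.

```typescript
// Hypothetical parsed form of an eval.diff field value.
type DiffAssertion =
  | { file: string; kind: "contains" | "not contains"; text: string }
  | { file: string; kind: "deleted" | "created" | "changed" };

// Parse values such as: src/middleware.ts contains "X-RateLimit-Remaining"
// Returns null when the value matches none of the five supported forms.
function parseDiffAssertion(value: string): DiffAssertion | null {
  const trimmed = value.trim();
  const withText = /^(\S+)\s+(not contains|contains)\s+"(.*)"$/.exec(trimmed);
  if (withText) {
    const kind = withText[2] === "contains" ? "contains" : "not contains";
    return { file: withText[1], kind, text: withText[3] };
  }
  const bare = /^(\S+)\s+(deleted|created|changed)$/.exec(trimmed);
  if (bare) {
    return { file: bare[1], kind: bare[2] as "deleted" | "created" | "changed" };
  }
  return null;
}

console.log(parseDiffAssertion('src/middleware.ts contains "X-RateLimit-Remaining"'));
console.log(parseDiffAssertion("src/auth-legacy.ts deleted"));
console.log(parseDiffAssertion("src/auth-legacy.ts renamed")); // null
```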
HTTP verifier — issues a GET request and checks status code + optional body substring. Uses Node's built-in fetch (requires Node 18+). Zero external dependencies:
- [ ] Service is healthy
  - eval.http: http://localhost:3000/health
- [ ] API returns correct status
  - eval.http: http://localhost:3000/api/status
  - eval.http.status: 201
  - eval.http.contains: "ready"
  - eval.http.timeout: 5000
| Sub-field | Description |
| --------- | ----------- |
| eval.http: <url> | GET request to <url>. Passes if status matches (default 200). |
| eval.http.status: <n> | Expected HTTP status code. |
| eval.http.contains: <text> | Response body must contain this substring. |
| eval.http.timeout: <ms> | Request timeout in ms (default 10000). |
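The pass/fail rule described by these sub-fields can be written as a pure function over an already-received response. This is a sketch under assumptions: `httpPasses` and `HttpExpectation` are hypothetical names, and the request itself (including the timeout) is left to the caller.

```typescript
// Options mirroring the eval.http.* sub-fields; names here are illustrative.
interface HttpExpectation {
  status?: number;   // eval.http.status, defaults to 200
  contains?: string; // eval.http.contains, optional body substring
}

// Decide pass/fail from a response's status code and body text.
function httpPasses(
  status: number,
  body: string,
  expectation: HttpExpectation = {},
): boolean {
  const expectedStatus = expectation.status ?? 200;
  if (status !== expectedStatus) return false;
  if (expectation.contains !== undefined && body.indexOf(expectation.contains) === -1) {
    return false;
  }
  return true;
}

console.log(httpPasses(200, "ok"));                                                   // true
console.log(httpPasses(201, '{"state":"ready"}', { status: 201, contains: "ready" })); // true
console.log(httpPasses(200, "starting", { contains: "ready" }));                      // false
```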
Schema verifier — reads a JSON file and validates its structure against an inline schema. Zero external dependencies:
- [ ] Agent produced valid output
  - eval.schema: dist/output.json {"type":"object","required":["id","score"]}
- [ ] Response is an array
  - eval.schema: response.json {"type":"array"}
- [ ] Fields have correct types
  - eval.schema: result.json {"type":"object","properties":{"id":{"type":"string"},"score":{"type":"number"}}}
Inline schema supports: type (object, array, string, number, boolean), required (array of field names), properties with per-field type checks.
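A validator for this subset fits in a few dozen lines, which is roughly why it needs no dependencies. The sketch below is an independent illustration, not evalgate's code: `validateMini`, `MiniSchema`, and the error messages are all hypothetical.

```typescript
// The subset described above: type, required, and per-property type.
type MiniType = "object" | "array" | "string" | "number" | "boolean";
interface MiniSchema {
  type?: MiniType;
  required?: string[];
  properties?: { [key: string]: { type?: MiniType } };
}

// typeof, but distinguishing arrays and null from plain objects.
function typeOf(value: unknown): string {
  if (Array.isArray(value)) return "array";
  if (value === null) return "null";
  return typeof value;
}

// Returns a list of human-readable errors; an empty list means valid.
function validateMini(value: unknown, schema: MiniSchema): string[] {
  const errors: string[] = [];
  if (schema.type && typeOf(value) !== schema.type) {
    errors.push(`expected ${schema.type}, got ${typeOf(value)}`);
    return errors;
  }
  if (typeOf(value) !== "object") return errors;
  const obj = value as { [key: string]: unknown };
  for (const key of schema.required ?? []) {
    if (!(key in obj)) errors.push(`missing required field "${key}"`);
  }
  const props = schema.properties ?? {};
  for (const key of Object.keys(props)) {
    const expected = props[key].type;
    if (key in obj && expected && typeOf(obj[key]) !== expected) {
      errors.push(`field "${key}": expected ${expected}, got ${typeOf(obj[key])}`);
    }
  }
  return errors;
}

const schema: MiniSchema = JSON.parse('{"type":"object","required":["id","score"]}');
console.log(validateMini({ id: "a1", score: 0.9 }, schema)); // []
console.log(validateMini({ id: "a1" }, schema));             // one error for "score"
```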
Conditional retry — only consume a retry slot when the exit-code condition is met. Useful when certain exit codes indicate a transient failure (worth retrying) vs a permanent one (not worth retrying):
- [ ] Flaky network test: retry only on timeout (exit code 2)
  - eval: `./scripts/smoke-test.sh`
  - retries: 3
  - retry-if: exit-code == 2
- [ ] Build: retry on any non-zero exit code
  - eval: `npm run build`
  - retries: 2
  - retry-if: exit-code != 0
Supported operators: == != > < >= <=. Without retry-if, every failure unconditionally consumes a retry slot (existing behaviour).
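The retry-if rule reduces to parsing `exit-code <op> <n>` and comparing it with the verifier's exit code. A minimal sketch follows; `parseRetryIf` and `shouldRetry` are hypothetical helpers, and treating a malformed condition as "do not retry" is an assumption, not documented behaviour.

```typescript
type RetryOp = "==" | "!=" | ">" | "<" | ">=" | "<=";

const compare: Record<RetryOp, (a: number, b: number) => boolean> = {
  "==": (a, b) => a === b,
  "!=": (a, b) => a !== b,
  ">": (a, b) => a > b,
  "<": (a, b) => a < b,
  ">=": (a, b) => a >= b,
  "<=": (a, b) => a <= b,
};

// Parse a condition such as "exit-code == 2". Returns null if malformed.
function parseRetryIf(value: string): { op: RetryOp; n: number } | null {
  const m = /^exit-code\s*(==|!=|>=|<=|>|<)\s*(\d+)$/.exec(value.trim());
  return m ? { op: m[1] as RetryOp, n: Number(m[2]) } : null;
}

// Should this verifier result consume a retry slot?
function shouldRetry(exitCode: number, retryIf?: string): boolean {
  if (exitCode === 0) return false;       // passed, nothing to retry
  if (retryIf === undefined) return true; // no condition: every failure retries
  const cond = parseRetryIf(retryIf);
  if (cond === null) return false;        // assumption: malformed condition never retries
  return compare[cond.op](exitCode, cond.n);
}

console.log(shouldRetry(2, "exit-code == 2")); // true
console.log(shouldRetry(1, "exit-code == 2")); // false
console.log(shouldRetry(1));                   // true
```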
Trigger variants
# Run on a cron schedule
- on: schedule: "*/30 * * * *"
# Re-check when source files change
- on: watch: "src/**/*.ts"
# Fire when a webhook hits the daemon
- on: webhook: "/deploy-done"
CLI reference
| Command | Description |
| ------- | ----------- |
| evalgate check [path] | Run verifiers on all pending contracts. |
| evalgate list [path] | List all contracts with status and verifier. |
| evalgate retry <id> [path] | Rerun a single contract, injecting last failure as context. |
| evalgate log [path] | Show run history. Flags: --contract=<id>, --failed, --limit=N. |
| evalgate msg send <from> <to> <kind> [payload-json] [path] | Send a structured message between agents. |
| evalgate msg list [path] | List messages. Flags: --to=<agent>, --kind=<kind>. |
| evalgate serve [cwd] | Start the MCP server on stdio. |
| evalgate watch [path] | Start the trigger daemon (schedule / watch / webhook). |
| evalgate ui [path] [--port=N] | Launch web dashboard at localhost:7777. |
| evalgate dash [path] | ANSI terminal dashboard — live contract status. |
| evalgate budget [path] | Show token spend vs budget per contract. |
| evalgate budget <id> <tokens> [path] | Record token usage for a contract. |
| evalgate suggest "<title>" [path] | Find similar past completions for a new contract. |
| evalgate patterns [path] | Analyse failure patterns across all contracts. |
| evalgate export [path] [--format=json\|md] | Export full project snapshot. |
| evalgate diff <snap1.json> <snap2.json> [--format=text\|json\|md] | Compare two snapshots. |
| evalgate swarm [path] [--concurrency=N] [--resume] [--agent=cmd] | Spawn parallel agent workers. |
| evalgate swarm status [path] | Show last swarm run status. |
Swarm Cockpit
evalgate swarm spawns parallel agent workers — one per pending contract — each in
its own git worktree. Workers implement their contract independently, then evalgate
runs the verifier in the worktree. Only workers whose verifier passes get merged back.
# Run swarm with up to 3 parallel workers (default)
evalgate swarm todo.md
# Custom concurrency
evalgate swarm todo.md --concurrency=5
# Resume a previous run (skip already-done workers)
evalgate swarm todo.md --resume
# Use a custom agent command
evalgate swarm todo.md --agent="claude --model opus"
Output during a swarm run:
evalgate swarm · todo.md · concurrency 3
✓ Implement add(a, b) (implement-add) done
✗ Implement subtract(a, b) (implement-subtract) failed
✓ Add TypeScript types (add-typescript-types) done
Swarm summary: 2 merged, 1 failed, 0 skipped
Checking swarm status
After a run, inspect each worker's outcome:
evalgate swarm status todo.md
evalgate swarm status · swarm-a1b2c3d4 (2024-01-15 10:32:00)
✓ Implement add(a, b) done (implement-add)
duration: 8420ms
verifier: passed log: .evalgate/swarm/logs/implement-add.log
✗ Implement subtract(a, b) failed (implement-subtract)
duration: 12300ms
verifier: failed log: .evalgate/swarm/logs/implement-subtract.log
Retrying a failed worker
# Retry a single failed contract (shows last failure output first)
evalgate retry implement-subtract todo.md
The retry command pulls the last failure output from the durable run log and displays it before re-running the verifier, giving the agent concrete context to fix the root cause.
Web UI
# Browser dashboard at localhost:7777
evalgate ui todo.md
# Custom port
evalgate ui todo.md --port=8080
The web UI shows live contract status (auto-refreshing via SSE), run history, failure output per contract, and budget gauges. It serves from Node's built-in http module, with no framework and no external dependencies.
MCP server
evalgate serve exposes 15 tools over stdio MCP. Any MCP client (Claude Desktop,
Cursor, Windsurf) can invoke contracts as tools without touching the CLI.
Add to Claude Desktop
In ~/Library/Application Support/Claude/claude_desktop_config.json:
{
  "mcpServers": {
    "evalgate": {
      "command": "evalgate",
      "args": ["serve", "/path/to/your/project"]
    }
  }
}
Named workspaces (v0.11+)
Expose multiple todo.md files as named workspaces — each tool accepts an
optional workspace parameter to select which file to operate on:
{
  "mcpServers": {
    "evalgate": {
      "command": "evalgate",
      "args": [
        "serve", "/path/to/project",
        "--workspace", "auth=/path/to/auth/todo.md",
        "--workspace", "payments=/path/to/payments/todo.md"
      ]
    }
  }
}
Or programmatically:
import { startMcpServer } from "evalgate";
startMcpServer(process.cwd(), {
  workspaces: {
    auth: "/project/.conductor/tracks/auth/todo.md",
    payments: "/project/.conductor/tracks/payments/todo.md",
  },
});
Call list_workspaces from any MCP client to enumerate configured workspaces. All 15 tools accept workspace: "<name>" to target a specific file.
MCP tools
| Tool | Description |
| ---- | ----------- |
| list_workspaces | List all configured named workspaces. |
| list_all | All contracts with status. |
| list_pending | Contracts not yet passing. |
| list_triggers | Contracts with on: triggers and their next fire time. |
| run_eval | Run a single contract by id. |
| check_all | Run all pending contracts. |
| get_retry_context | Get last failure output formatted as a retry prompt. |
| get_run_history | Full run log, filterable by contract or status. |
| get_last_failure | Last failure details for a contract. |
| send_message | Send a structured agent message. |
| list_messages | List messages, filterable by recipient or kind. |
| get_provider_hints | Provider and role hints for all contracts. |
| report_token_usage | Record token spend for a contract. |
| suggest_template | Find similar past completions for a new task. |
| get_patterns | Failure pattern analysis across all contracts. |
| export_state | Full project snapshot as JSON or markdown. |
Claude Code integration
Wire the provided PostToolUse hook to auto-check todo.md whenever Claude Code
edits it:
chmod +x hooks/claude-code-posttooluse.sh
Add to ~/.claude/settings.json (or .claude/settings.json in your repo):
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write|MultiEdit",
        "hooks": [
          { "type": "command", "command": "/abs/path/to/hooks/claude-code-posttooluse.sh" }
        ]
      }
    ]
  }
}
Now whenever Claude Code edits todo.md, evalgate check runs automatically. If a verifier fails, the hook exits 2 and Claude Code feeds the failure output back into the agent's next turn.
Triggers
Start the trigger daemon with evalgate watch. It handles three trigger kinds:
Schedule (cron expression; runs the contract verifier at the specified interval):
- [ ] Sync exchange rates
  - eval: `node scripts/sync-rates.js`
  - on: schedule: "0 * * * *"
Watch (glob pattern; re-checks when matching files change):
- [ ] Auth tests must pass
  - eval: `pnpm test src/auth`
  - on: watch: "src/auth/**"
Webhook (HTTP endpoint; fires when a POST hits the daemon):
- [ ] Deploy smoke test
  - eval: `./scripts/smoke-test.sh staging`
  - on: webhook: "/deploy-done"
The webhook server listens on localhost:7778 by default.
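For readers unfamiliar with how a schedule trigger decides when to fire, the sketch below matches a five-field cron expression against a timestamp. It is not evalgate's `parseCron` / `matchesCron`: `cronMatches` and `fieldMatches` are hypothetical, they support only `*`, `*/n`, ranges, and comma lists, they evaluate in UTC, and they require every field to match (real cron implementations treat day-of-month and day-of-week specially).

```typescript
// Match one cron field against a value. Supports "*", "*/n", "a-b", and
// comma-separated lists of numbers; this is a subset of real cron syntax.
function fieldMatches(field: string, value: number): boolean {
  return field.split(",").some((part) => {
    if (part === "*") return true;
    const step = /^\*\/(\d+)$/.exec(part);
    if (step) return value % Number(step[1]) === 0;
    const range = /^(\d+)-(\d+)$/.exec(part);
    if (range) return value >= Number(range[1]) && value <= Number(range[2]);
    return /^\d+$/.test(part) && Number(part) === value;
  });
}

// True when a five-field expression (minute hour day-of-month month
// day-of-week) matches the given instant, evaluated in UTC.
function cronMatches(expr: string, at: Date): boolean {
  const fields = expr.trim().split(/\s+/);
  if (fields.length !== 5) {
    throw new Error(`expected 5 cron fields, got ${fields.length}`);
  }
  const values = [
    at.getUTCMinutes(),
    at.getUTCHours(),
    at.getUTCDate(),
    at.getUTCMonth() + 1, // cron months are 1-12
    at.getUTCDay(),       // 0 = Sunday
  ];
  return fields.every((f, i) => fieldMatches(f, values[i]));
}

const halfPastTen = new Date(Date.UTC(2024, 0, 15, 10, 30));
console.log(cronMatches("*/30 * * * *", halfPastTen)); // true
console.log(cronMatches("0 * * * *", halfPastTen));    // false
```

A daemon built on this idea would evaluate each contract's expression once per minute and run the verifier on a match.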
Agent messaging
Agents in a multi-worker setup can exchange structured messages through evalgate's
message bus. Messages persist to .evalgate/messages.ndjson.
# Coordinator tells worker-2 the schema is ready
evalgate msg send coordinator worker-2 schema_ready '{"table":"users"}'
# Worker lists its inbox
evalgate msg list --to=worker-2
# Or via MCP
send_message({ from: "coordinator", to: "worker-2", kind: "schema_ready", payload: {...} })
list_messages({ to: "worker-2" })
Message envelopes carry from, to, kind, payload, and a correlation_id for tracing request/response pairs across turns.
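The envelope fields map naturally onto a small type, and an inbox query is a filter over the NDJSON log. The sketch below assumes one JSON envelope per line; `Envelope`, `parseEnvelopes`, and `inbox` are hypothetical names, and the sample correlation ids are made up for illustration.

```typescript
// Envelope fields named in the text above; the payload type is an assumption.
interface Envelope {
  from: string;
  to: string;
  kind: string;
  payload: unknown;
  correlation_id: string;
}

// Parse NDJSON text (one JSON object per line), skipping blank lines.
function parseEnvelopes(ndjson: string): Envelope[] {
  return ndjson
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as Envelope);
}

// Messages addressed to one agent, optionally narrowed to one kind.
function inbox(messages: Envelope[], to: string, kind?: string): Envelope[] {
  return messages.filter(
    (m) => m.to === to && (kind === undefined || m.kind === kind),
  );
}

const messageLog = [
  '{"from":"coordinator","to":"worker-2","kind":"schema_ready","payload":{"table":"users"},"correlation_id":"c-1"}',
  '{"from":"worker-2","to":"coordinator","kind":"ack","payload":null,"correlation_id":"c-1"}',
].join("\n");

const forWorker2 = inbox(parseEnvelopes(messageLog), "worker-2");
console.log(forWorker2.length, forWorker2[0].kind); // 1 schema_ready
```

Because the reply reuses the request's correlation_id, a coordinator can pair the two messages by filtering on that field.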
Budget tracking
Declare a token budget on any contract. Workers report their spend; evalgate tracks cumulative usage and warns on overrun.
- [ ] Generate migration SQL
  - eval: `pnpm db:validate`
  - budget: 50k
  - id: gen-migration
# Worker reports spend after its turn
evalgate budget gen-migration 12400
# See all contracts vs budget
evalgate budget
# gen-migration 12,400 / 50,000 24%
# auth-refactor 61,200 / 50,000 122% ⚠ over budget
The report_token_usage MCP tool lets agents self-report without shelling out.
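The budget notation and the overrun check are simple enough to sketch. `parseBudget` and `budgetLine` below are hypothetical helpers, not evalgate exports; rounding the percentage down is an assumption inferred from the sample output above (12,400 of 50,000 shown as 24%).

```typescript
// Parse "50k", "1.5m", or a raw integer into a token count.
function parseBudget(value: string): number {
  const m = /^(\d+(?:\.\d+)?)([km]?)$/i.exec(value.trim());
  if (!m) throw new Error(`invalid budget: ${value}`);
  const suffix = m[2].toLowerCase();
  const scale = suffix === "k" ? 1000 : suffix === "m" ? 1000000 : 1;
  return Math.round(Number(m[1]) * scale);
}

// Summarise spend against a declared budget for one contract.
function budgetLine(id: string, spent: number, budget: string): string {
  const limit = parseBudget(budget);
  const pct = Math.floor((spent / limit) * 100); // assumption: round down
  const flag = spent > limit ? " over budget" : "";
  return `${id} ${spent} / ${limit} ${pct}%${flag}`;
}

console.log(parseBudget("1.5m"));                       // 1500000
console.log(budgetLine("gen-migration", 12400, "50k")); // gen-migration 12400 / 50000 24%
console.log(budgetLine("auth-refactor", 61200, "50k")); // auth-refactor 61200 / 50000 122% over budget
```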
Terminal dashboard
# ANSI live dashboard in the terminal
evalgate dash todo.md
The ANSI dashboard is useful inside tmux or when you want a heads-up display without leaving the terminal. It refreshes on every run event via the durable log.
Memory and learning
evalgate indexes successful contract completions so future contracts can get a head start:
# Find templates similar to a new task
evalgate suggest "migrate users table to Postgres"
# 85% match: migrate_products_table — passed in 1 attempt
# 72% match: migrate_orders_table — passed in 2 attempts, retries=3
# See failure patterns across all contracts
evalgate patterns
# Most common failure: exit 1 in test:subtract (8 occurrences)
# Fastest to fix after failure: lint contracts (avg 1.2 retries)
The suggest_template and get_patterns MCP tools expose the same data to agents directly, so they can self-tune without human intervention.
Persistence
All state lives in .evalgate/ at the project root:
.evalgate/
  runs.ndjson        # full run history (contract id, exit code, output, duration)
  messages.ndjson    # agent message log
  budget.ndjson      # token spend per contract
  swarm-state.json   # last swarm run state
  swarm/logs/        # per-worker agent session logs
NDJSON format: human-readable, grep-friendly, no database required.
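Because the run log is append-only NDJSON, "last failure for a contract" is a backwards scan over parsed lines. The sketch below illustrates that idea with an assumed record shape: the field names (`contractId`, `exitCode`, `output`, `durationMs`) and the sample records are hypothetical, and the real runs.ndjson may use different keys.

```typescript
// Assumed record shape; the real field names in runs.ndjson may differ.
interface RunRecord {
  contractId: string;
  exitCode: number;
  output: string;
  durationMs: number;
}

// Return the most recent failing run for a contract, or undefined.
function lastFailure(ndjson: string, contractId: string): RunRecord | undefined {
  const runs = ndjson
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as RunRecord);
  // Later lines are newer, so scan from the end.
  for (let i = runs.length - 1; i >= 0; i--) {
    if (runs[i].contractId === contractId && runs[i].exitCode !== 0) {
      return runs[i];
    }
  }
  return undefined;
}

const runLog = [
  '{"contractId":"implement-subtract","exitCode":1,"output":"expected 2, got 8","durationMs":388}',
  '{"contractId":"implement-add","exitCode":0,"output":"","durationMs":412}',
  '{"contractId":"implement-subtract","exitCode":1,"output":"expected 2, got 6","durationMs":401}',
].join("\n");

const failure = lastFailure(runLog, "implement-subtract");
console.log(failure?.output); // expected 2, got 6
```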
Swarm hooks (v2.3+)
runSwarm accepts optional inline callbacks alongside the existing swarmEvents emitter. Hooks fire per-worker and are useful for orchestrators that need inline reactions without setting up a global listener.
import { runSwarm, estimateUsd } from "evalgate";
import type { BudgetExceededEvent } from "evalgate";
const controller = new AbortController();
await runSwarm({
  todoPath: ".evalgate/todo.md",
  concurrency: 4,
  // Called when a worker starts; worker.id / worker.contractTitle are populated.
  onWorkerStart(worker) {
    console.log(`→ started ${worker.contractTitle}`);
  },
  // Called when a worker reaches a terminal state (done / failed / timeout).
  onWorkerComplete(worker, { status, failureKind }) {
    console.log(`← ${status} ${worker.contractTitle}${failureKind ? ` (${failureKind})` : ""}`);
  },
  // Called when a contract's cumulative spend crosses its declared budget.
  // Abort the controller here to stop spawning new workers.
  onBudgetExceeded(evt: BudgetExceededEvent) {
    console.warn(`budget exceeded: ${evt.totalTokens} tok / $${evt.estimatedUsd.toFixed(4)}`);
    controller.abort();
  },
  // Pass an AbortSignal to stop new-worker spawning without killing in-flight workers.
  // In-flight workers run to completion (or their agentTimeoutMs).
  signal: controller.signal,
});
resumeSwarm is a thin convenience wrapper that picks up pending workers from a prior run:
import { resumeSwarm } from "evalgate";
// stateFile is the path to .evalgate/swarm-state.json
await resumeSwarm(".evalgate/swarm-state.json", { concurrency: 2 });
estimateUsd replaces hardcoded pricing math in downstream consumers:
import { estimateUsd } from "evalgate";
const usd = estimateUsd(inputTokens, outputTokens); // default: sonnet4 rates
const usd2 = estimateUsd(inputTokens, outputTokens, "opus4"); // opus4 rates
const usd3 = estimateUsd(inputTokens, outputTokens, "haiku4"); // haiku4 rates
Programmatic API
import { parseTodo, runContract, updateTodo } from "evalgate";
import { readFileSync, writeFileSync } from "node:fs";
const src = readFileSync("todo.md", "utf8");
const contracts = parseTodo(src);
const results = [];
for (const c of contracts.filter((c) => !c.checked && c.verifier)) {
  results.push(await runContract(c, process.cwd()));
}
writeFileSync("todo.md", updateTodo(src, results));
Full export surface:
// Core
import { parseTodo, runContract, runShell, updateTodo } from "evalgate";
// Swarm orchestration
import { runSwarm, resumeSwarm, retryWorker, loadState, swarmEvents } from "evalgate";
import type { SwarmOptions, SwarmResult, SwarmState, SwarmEvent, BudgetExceededEvent } from "evalgate";
// Persistence
import { appendRun, queryRuns, getLastFailure, getLastRun, onRun } from "evalgate";
import { sendMessage, listMessages } from "evalgate";
import { reportTokenUsage, queryBudgetRecords, getTotalTokens, getBudgetSummary, estimateUsd } from "evalgate";
// Memory + analysis
import { suggest, detectPatterns, exportSnapshot, snapshotToMarkdown, diffSnapshots, diffToMarkdown } from "evalgate";
// Servers
import { startMcpServer, startUiServer, startWatcher, startDash } from "evalgate";
import type { McpServerOptions } from "evalgate";
// Cron helpers
import { parseCron, matchesCron, nextFireMs } from "evalgate";
Other integrations
Codex / any CLI / git hooks
evalgate check is provider-agnostic. Wire it anywhere a shell command runs:
# post-commit hook inside a worktree
evalgate check todo.md || echo "Contracts failed. Review before merging."
CI (GitHub Actions)
- name: Check evalgate contracts
  run: npx evalgate check
Roadmap
| Version | Feature | Status |
| ------- | ------- | ------ |
| v0.1 | Parser, shell verifier, CLI (check, list), Claude Code hook | Shipped |
| v0.2 | MCP server (15 tools), evalgate serve | Shipped |
| v0.3 | Triggers (schedule, watch, webhook), evalgate watch | Shipped |
| v0.4 | Durable run log, structured agent messaging, evalgate retry | Shipped |
| v0.5 | Web UI (evalgate ui), ANSI dashboard (evalgate dash) | Shipped |
| v0.6 | Budget tracking, provider/role hints, MCP-scoped contracts | Shipped |
| v0.7 | Memory/learning (suggest, patterns), export, failure analysis | Shipped |
| v0.8 | Composite verifiers (eval.all, eval.any), LLM-judge (eval.llm), diff, GitHub Actions CI, Biome linter | Shipped |
| v0.9 | Swarm orchestrator — parallel workers, git worktrees, verifier-gated merge | Shipped |
| v0.10 | Export swarm/worktree/spawn APIs for orchestrator consumers, retryWorker with failure-context injection | Shipped |
| v0.11 | MCP named workspaces — expose multiple todo.md files as a single MCP server with workspace routing | Shipped |
| v0.12 | Structured swarm events — "eval-result", "cost", "task-complete" typed events on swarmEvents; SwarmEvent discriminated union exported | Shipped |
| v0.13 | Re-check watch mode — evalgate check --watch re-runs failing contracts on file change; TDD inner loop for agents | Shipped |
| v0.14 | Semantic-diff verifier — eval.diff kind: assert a structural change happened in a file (pattern/hash-based, zero deps) | Shipped |
| v1.0 | API stability declaration — stable public surface, VERSION export, coordinated with conductor v1.0. Agent-agnostic context injection (taskContext on SpawnOpts/SwarmOptions), {task} placeholder in agentArgs for non-Claude CLIs, concurrent merge fix (mutex serializes commit+merge to eliminate todo.md conflicts at any concurrency) | Shipped |
| v2.0 | Repo-level merge mutex, -X theirs conflict resolution, DiffVerifier, VERSION re-export | Shipped |
| v2.1 | FailureKind typed errors (worktree-create, agent-crash, agent-timeout, verifier-fail, verifier-timeout, merge-conflict), failureKind on WorkerState + TaskCompleteEvent, agent timeout (agentTimeoutMs on SpawnOpts), verifier timeouts (shell + LLM + composite aggregate timeoutMs), "worker-start" + "worker-retry" events on swarmEvents | Shipped |
| v2.2 | Eval type expansion — eval.http: (HTTP health check, uses built-in fetch), eval.schema: (JSON schema validation), conditional retry (retry-if: exit-code > 1) | Shipped |
| v2.3 | Plugin hooks + Resume API — onWorkerStart / onWorkerComplete / onBudgetExceeded hooks in SwarmOptions; AbortSignal support (signal option stops new-worker spawning cleanly); resumeSwarm(stateFile) public API; estimateUsd(input, output, model?) replaces hardcoded pricing math; log pagination (offset, from, to on queryRuns) | Shipped |
v1.0 Stability
As of v1.0.0 the public API surface exported from evalgate (src/index.ts) is stable and follows semantic versioning:
- Breaking changes require a major version bump
- New exports are added in minor releases
- Bug fixes ship as patches
Stable public exports: runSwarm, resumeSwarm, retryWorker, swarmEvents, estimateUsd, parseTodo, runContract, runShell, budget.*, log.*, telegram.*, startMcpServer, startCheckWatch, parseCron, matchesCron, nextFireMs, worktree.*, and all exported types from types.ts
Also exported: VERSION — the current package version as a string, useful for downstream consumers that want to display or validate the evalgate version.
import { VERSION } from "evalgate";
console.log(VERSION); // the installed version, e.g. "3.2.0"
Prior art and positioning
- conductor: multi-agent orchestrator built on top of evalgate. If you want track-scoped parallel workers with a web dashboard and CLI, use conductor. evalgate is its quality gate engine.
- Octogent / OpenHarness: orchestrate multiple Claude Code sessions. evalgate slots underneath them as the quality-gate layer they're missing.
- promptfoo / braintrust: eval LLM outputs at the prompt level. evalgate evals agent-produced changes at the task level.
- Claude Code native subagents: invisible spawning, no quality gate. evalgate contracts are plain markdown you can read, edit, and version.
conductor ← orchestrator built on evalgate (tracks, UI, retry)
↓
evalgate ← primitive, no deps, quality gate layer
↑
Octogent ← orchestrator, Claude Code only, no quality gate
OpenHarness ← full harness, multi-provider
Contributing
PRs welcome. Keep the zero-dependency constraint — it's a hard rule, not a preference.
New verifier kinds belong in types.ts first, then verifier.ts, then a test.
The parser is the most critical file; edge cases matter more than features.
Development setup
pnpm install # install dev deps (typescript, tsx, biome)
pnpm build # compile TypeScript → dist/
pnpm test # run all unit tests
pnpm typecheck # TypeScript strict check, no emit
pnpm lint # biome lint check (src/ and test/)
pnpm lint:fix # auto-fix lint issues
A pre-commit hook runs typecheck + lint automatically on every commit. If it blocks, run pnpm lint:fix to auto-fix, then re-stage.
Adding a verifier kind
- Add the interface to src/types.ts and union it into Verifier
- Handle it in src/parser.ts (buildVerifier)
- Handle it in src/verifier.ts (runContract)
- Add tests in test/parser.test.ts and a new test/<kind>.test.ts
- Export from src/index.ts
License
MIT
