@lannguyensi/harness
v0.27.0
Declarative control plane for agent harnesses — one YAML for grounding, tools, memory, and hooks.
harness
Declarative control plane for agent harnesses.
One zod-validated YAML manifest for grounding, tools, memory, hooks, policies, and workflows, plus a CLI that describes, validates, diffs, applies, audits, and enforces.
Most config tools tell you what an agent is configured to use.
harness tells you what an agent is allowed to do, under this exact context, and why.
A coding agent like Claude Code is configured across half a dozen
files (settings.json, CLAUDE.md, memory notes, MCP registrations,
hook scripts, per-project overrides), and no single file answers
"what can this agent do right now, and why is it set up that way?".
harness puts all of it in one YAML you read, validate, and diff;
generates the config the agent loads from it; and at runtime blocks
tool calls that violate the declared rules while recording every
decision.
See it work
One rule, declared in harness.yaml: no session may merge a PR
until it has logged a review.
Claude Code goes to merge PR 42. Before the tool call runs, the
runtime hands the event to harness, which checks it against the
manifest. The hook protocol wire shape is the legacy engine-vocabulary
envelope (operators see this on stderr; agents read it via
permissionDecisionReason when the policy declares no ux: block):
```console
$ harness policy intercept # Claude Code runs this before each tool call
{"decision":"block","reason":"review-before-merge: no matching ledger entry for tag `review:42`","hookSpecificOutput":{"hookEventName":"PreToolUse","permissionDecision":"deny","permissionDecisionReason":"review-before-merge: no matching ledger entry for tag `review:42`"}}
```

Built-in block-enforcement policies have shipped a ux: block since v0.17.0,
so the agent sees a plain-language three-section form
(docs/for-agents.md);
the engine-vocabulary text above stays in the audit ledger.
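
The envelope shown above can be assembled mechanically from a policy name and a reason. The following is a minimal TypeScript sketch that assumes only the field names visible in that sample output; the `denyEnvelope` helper and the `DenyEnvelope` type are hypothetical names for illustration, not part of harness's API.

```typescript
// Hypothetical type: mirrors the fields visible in the sample output above.
// harness's real schema may carry more fields than this.
type DenyEnvelope = {
  decision: "block";
  reason: string;
  hookSpecificOutput: {
    hookEventName: "PreToolUse";
    permissionDecision: "deny";
    permissionDecisionReason: string;
  };
};

// Hypothetical helper: prefixes the reason with the policy name and
// repeats it in both places where the sample output carries it.
function denyEnvelope(policy: string, detail: string): DenyEnvelope {
  const reason = policy + ": " + detail;
  return {
    decision: "block",
    reason,
    hookSpecificOutput: {
      hookEventName: "PreToolUse",
      permissionDecision: "deny",
      permissionDecisionReason: reason,
    },
  };
}

const envelope = denyEnvelope(
  "review-before-merge",
  "no matching ledger entry for tag `review:42`",
);
console.log(JSON.stringify(envelope));
```

The same reason string appears twice in the sample, once at the top level and once as permissionDecisionReason, so the sketch derives both from a single value.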
Blocked. harness explain says exactly why:
```console
$ harness explain review-before-merge --trace
name: review-before-merge
decision: deny
enforcement: block
reason: no matching ledger entry for tag `review:42`
ledgerTag: review:42
extract:
  PR_NUMBER: "42"
requiresEval:
  matchedCount: 0
  reason: no matching ledger entry for tag `review:42`
# ... (trimmed; the full trace also shows the matched trigger, every extracted variable, and the ledger query)
```

The rule pulled PR_NUMBER=42 out of the tool call and looked for a
review:42 entry in the evidence ledger. There wasn't one. So the
reviewer (or a review subagent) logs that entry, and the same merge
call, retried, goes straight through, no restart, no config edit:
```console
$ harness policy intercept # same call, after the review was logged
$ # (no output, exit 0: allowed)
```

Every one of those decisions is recorded:
```console
$ harness audit --since 1h --policy review-before-merge
timestamp            policy               outcome  reason
-------------------  -------------------  -------  --------------------------------------------
2026-05-14 19:09:03  review-before-merge  deny     no matching ledger entry for tag `review:42`
2026-05-14 19:09:13  review-before-merge  allow    1 matching ledger entry for tag `review:42`
```

Declare the rule once; every session is held to it, with a paper trail of every decision.
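
The walkthrough above (extract the PR number, build a `review:<n>` tag, look it up in the evidence ledger) can be sketched in a few lines of TypeScript. This is an illustration under stated assumptions, not harness's implementation: `LedgerEntry`, `Decision`, and `evaluateReviewBeforeMerge` are hypothetical names, and the ledger is modelled as a plain in-memory array.

```typescript
// Hypothetical shapes; the real ledger and decision records may differ.
type LedgerEntry = { tag: string };
type Decision = { outcome: "allow" | "deny"; reason: string };

// Evaluate "no merge until a review:<PR> entry exists" against a ledger.
function evaluateReviewBeforeMerge(
  toolArgs: { prNumber: number },
  ledger: readonly LedgerEntry[],
): Decision {
  // Extract the PR number from the tool call and build the ledger tag.
  const tag = "review:" + toolArgs.prNumber;
  // Count ledger entries carrying exactly that tag.
  const matched = ledger.filter((entry) => entry.tag === tag).length;
  if (matched === 0) {
    return { outcome: "deny", reason: "no matching ledger entry for tag `" + tag + "`" };
  }
  const noun = matched === 1 ? "entry" : "entries";
  return {
    outcome: "allow",
    reason: matched + " matching ledger " + noun + " for tag `" + tag + "`",
  };
}

// Same call twice: once before the review is logged, once after.
const ledger: LedgerEntry[] = [];
const firstAttempt = evaluateReviewBeforeMerge({ prNumber: 42 }, ledger);
ledger.push({ tag: "review:42" }); // the reviewer logs the evidence
const retry = evaluateReviewBeforeMerge({ prNumber: 42 }, ledger);
console.log(firstAttempt.outcome, retry.outcome);
```

Nothing about the rule changes between the two calls; only the ledger does, which is why the retry needs no restart and no config edit.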
Concepts in six lines
| Term | What it is |
|------|-----------|
| manifest | The one YAML file (harness.yaml) where you declare everything: tools, hooks, policies, memory. |
| apply | harness apply renders the manifest into the config files the agent runtime actually reads. |
| policy | A rule of the form when the agent does X, require evidence Y. Evaluated at runtime; can block the call. |
| evidence ledger | An append-only log of facts an agent records during a session. Policies check it; audit / explain replay it. |
| hook | A script the agent runtime runs at a lifecycle event (session start, before every tool call, ...). How policies get enforced. |
| policy pack | A reusable bundle of policies, hooks, and templates shipped under one name and enabled with a single manifest key. |
What harness does
```mermaid
flowchart LR
  declare["1. Declare<br/><code>harness.yaml</code>"]
  apply["2. Apply<br/><code>harness apply</code>"]
  enforce["3. Enforce<br/>hooks + policies<br/>at runtime"]
  record[("4. Record<br/>evidence ledger")]
  observe["5. Observe<br/><code>audit</code> / <code>explain</code> /<br/><code>session-export</code>"]
  declare --> apply
  apply --> enforce
  enforce --> record
  record --> observe
  observe -. refine .-> declare
```

Observe → refine → declare is the whole loop. The read-side surfaces
(audit, explain --trace, session-export) replay rows the runtime
already recorded, so what flows back into the manifest is grounded in
what actually happened.
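
Because the read side only replays recorded rows, a query such as "the last hour, one policy" reduces to a filter over an append-only list. The sketch below illustrates that idea in TypeScript; `AuditRow` and `auditSince` are hypothetical names, the rows are held in memory, and the timestamps are treated as UTC purely for the example.

```typescript
// Hypothetical row shape, modelled on the columns of the audit table.
type AuditRow = {
  timestamp: Date;
  policy: string;
  outcome: "allow" | "deny";
  reason: string;
};

// Return rows recorded within `windowMs` of `now`, optionally for one policy.
// The input is never mutated: the read side only replays what was recorded.
function auditSince(
  rows: readonly AuditRow[],
  now: Date,
  windowMs: number,
  policy?: string,
): AuditRow[] {
  const cutoff = now.getTime() - windowMs;
  return rows.filter(
    (row) =>
      row.timestamp.getTime() >= cutoff &&
      (policy === undefined || row.policy === policy),
  );
}

const recorded: AuditRow[] = [
  // Illustrative row outside the one-hour window (not from the README).
  { timestamp: new Date("2026-05-14T17:00:00Z"), policy: "review-before-merge", outcome: "deny", reason: "illustrative older row" },
  { timestamp: new Date("2026-05-14T19:09:03Z"), policy: "review-before-merge", outcome: "deny", reason: "no matching ledger entry for tag `review:42`" },
  { timestamp: new Date("2026-05-14T19:09:13Z"), policy: "review-before-merge", outcome: "allow", reason: "1 matching ledger entry for tag `review:42`" },
  // Illustrative row for a different, made-up policy name.
  { timestamp: new Date("2026-05-14T19:10:00Z"), policy: "some-other-policy", outcome: "allow", reason: "illustrative row" },
];

const oneHourMs = 60 * 60 * 1000;
const lastHour = auditSince(
  recorded,
  new Date("2026-05-14T19:30:00Z"),
  oneHourMs,
  "review-before-merge",
);
for (const row of lastHour) {
  console.log(row.timestamp.toISOString(), row.policy, row.outcome, row.reason);
}
```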
Pick your audience
- Operator? docs/for-humans.md: install through first apply, first real policy, diagnostics cheat sheet.
- Agent (or onboarding one)? docs/for-agents.md: workflow lifecycle, policy / ledger sequence, CLI cheat sheet by side-effect class, the audit triumvirate.
- Writing your own policy? docs/writing-custom-policies.md: three tripwires, four worked recipes (each validated in CI), author loop, field reference.
Install
```bash
npm i -g @lannguyensi/harness
```

The CLI binary is harness. Node 20 or newer is required.
First-time setup
In a hurry? docs/quickstart.md is the bare
command path, install to wired-in, no prose.
```bash
harness init --interactive
```

Guided wizard. Detects ~/.claude/ and ~/.codex/, MCP servers
already wired in settings.json, and the harness binary version. Picks a
profile (solo / team / custom) and writes a starting
harness.yaml. Ctrl-C aborts cleanly. Walkthrough +
limitations: docs/init-interactive.md.
Profiles at a glance
| Profile | External accounts / tools required | Best for |
|---------|------------------------------------|----------|
| solo | None. npm + Claude Code is enough. | Single operators who want the Understanding Gate without committing to a tasking system. |
| team | An agent-tasks account (hosted or self-hosted). | Teams that already use agent-tasks for PR review tracking. The merge gate (review:<pr-number> ledger tag) wires against the agent-tasks MCP. |
| full | Same as team plus @lannguyensi/agent-preflight and gh on PATH. | Operators who want every reference policy enforced (dogfood gate, preflight gates, review-subagent gate, merge gate). |
Not using agent-tasks? Pick solo. The team and full review gates currently match only the agent-tasks MCP tool names, so a gh pr create workflow stays unprotected by them today. Tool-agnostic gates that also match gh pr are tracked in the backlog.
If you prefer non-interactive (CI, fresh-VM provisioning), pick a template directly:
```bash
harness init --template solo # memory-router + understanding-before-execution pack
harness init --template team # solo + agent-tasks MCP + review-before-merge policy
harness init --template full # everything from the Appendix A reference manifest
```

Use harness init --probe for a JSON snapshot of detected runtimes
and MCPs without writing anything.
Try it without installing
harness dry-run reports which hooks fire and which policies match
for a given tool call, against the reference manifest, before any
ledger I/O:
```bash
git clone https://github.com/LanNguyenSi/harness && cd harness
npm install && npm run build
node dist/cli/main.js dry-run "merge PR 42" \
  --tool mcp__agent-tasks__pull_requests_merge \
  --tool-args '{"prNumber":42}' \
  --config docs/examples/full-manifest.yaml
```

docs/examples/full-manifest.yaml is a schema-coverage example, not a
runnable config (the file header spells out the contract). For a
manifest tailored to your machine, install globally and run
harness init --interactive.
Uninstall
harness uninstall is the single-command teardown: dry-run by default,
--apply to mutate, --restore-from <backup> to roll back. Full
inventory + recommended order in docs/uninstall.md.
Status
harness ships in phases. Phases 1 through 6 are released: read-only
inventory → managed edits → declarative truth → policy layer → polish
and dogfood lessons → the Understanding Gate Policy Pack. Phase 7, the
Risk Gate, is next. The current release is v0.20.0.
The phase-by-phase plan with acceptance criteria lives in
docs/ROADMAP.md; what shipped in each version is
in CHANGELOG.md.
Policy Packs
A Policy Pack is a reusable bundle of hooks, policies, instruction
template, and permission profiles shipped under one name and enabled
from harness.yaml with a single key:
```yaml
policy_packs:
  - name: understanding-before-execution
    config:
      mode: grill_me # fast_confirm | grill_me | strict
      permission_profile: safe-start # safe-start | implementation-after-approval | high-risk-grill-me
```

Manage packs with harness pack add / remove / list. Two packs ship
today: understanding-before-execution
(forces an Understanding Report before any write-capable tool fires)
and branch-protection
(blocks source mutations on protected branches without an explicit
override). Custom packs from path:, npm:, or git: sources are
out of scope for v1 (see the pack docs for the future-vocabulary
contract).
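
To make the pack config concrete, here is a small TypeScript sketch that checks the two keys against the values listed in the YAML comments above. It is a hand-rolled stand-in written for illustration: harness itself validates the manifest with zod, and `parsePackConfig`, `MODES`, and `PROFILES` are hypothetical names, not its API.

```typescript
// Allowed values, copied from the comments in the YAML example above.
const MODES = ["fast_confirm", "grill_me", "strict"] as const;
const PROFILES = [
  "safe-start",
  "implementation-after-approval",
  "high-risk-grill-me",
] as const;

type Mode = (typeof MODES)[number];
type Profile = (typeof PROFILES)[number];
type PackConfig = { mode: Mode; permission_profile: Profile };

// Hypothetical validator: narrows untyped input to PackConfig or throws.
function parsePackConfig(raw: {
  mode?: unknown;
  permission_profile?: unknown;
}): PackConfig {
  const mode = MODES.find((candidate) => candidate === raw.mode);
  if (mode === undefined) {
    throw new Error("mode must be one of: " + MODES.join(" | "));
  }
  const profile = PROFILES.find(
    (candidate) => candidate === raw.permission_profile,
  );
  if (profile === undefined) {
    throw new Error("permission_profile must be one of: " + PROFILES.join(" | "));
  }
  return { mode, permission_profile: profile };
}

const packConfig = parsePackConfig({
  mode: "grill_me",
  permission_profile: "safe-start",
});
console.log(packConfig);
```

Rejecting an unknown value at parse time, rather than at hook time, keeps a typo in the manifest from silently weakening a gate.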
What's next
Phase 7, Risk Gate. Today's policy model returns a binary
block/allow per matching trigger. Phase 7 lets harness reason about
the action itself (Action Envelope → Context Resolver → Risk
Classifier) and extends the decision space to allow / warn /
require_approval / deny. Motivating use case: block DROP TABLE
users, kubectl delete namespace prod, terraform destroy against
unverified production targets. Full plan in
docs/ROADMAP.md#phase-7--risk-gate.
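
As a rough picture of the wider decision space, the TypeScript sketch below classifies a command into one of the four outcomes named above. Everything in it is an assumption made for illustration: Phase 7 is not released, so `ActionEnvelope`, `classify`, the regex patterns, and the mapping from "verified target" to `require_approval` are hypothetical and may look nothing like the eventual design.

```typescript
// The four outcomes named in the Phase 7 description.
type RiskDecision = "allow" | "warn" | "require_approval" | "deny";

// Hypothetical envelope: the command text plus whether the target
// environment has been verified by some earlier step.
type ActionEnvelope = { command: string; targetVerified: boolean };

// Illustrative patterns for the three motivating commands.
// No `g` flag, so RegExp.test stays stateless across calls.
const DESTRUCTIVE_PATTERNS: readonly RegExp[] = [
  /\bdrop\s+table\b/i,
  /\bkubectl\s+delete\s+namespace\b/i,
  /\bterraform\s+destroy\b/i,
];

// Assumed mapping: destructive + unverified target is denied,
// destructive + verified target needs approval, the rest is allowed.
// "warn" is part of the decision space but unused by this toy mapping.
function classify(action: ActionEnvelope): RiskDecision {
  const destructive = DESTRUCTIVE_PATTERNS.some((pattern) =>
    pattern.test(action.command),
  );
  if (!destructive) {
    return "allow";
  }
  return action.targetVerified ? "require_approval" : "deny";
}

console.log(classify({ command: "DROP TABLE users", targetVerified: false }));
console.log(classify({ command: "terraform destroy", targetVerified: true }));
console.log(classify({ command: "git status", targetVerified: false }));
```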
Bring your favorite agent harness. Add governance.
Why this exists
On 2026-04-23, an agent-grounding checkout that was 16 commits
behind origin led two tasks to be incorrectly called "stale". The
check that would have caught it already existed:
agent-preflight
runs git fetch + git status and emits a structured ready +
confidence-score result. The missing piece was not the check, it was
the deterministic trigger: a SessionStart hook that invokes
preflight run and a policy that gates further work on the result.
Building that wiring needs an agreed-upon place for harness config to
live first. That conversation is the origin of this repo.
Related
- agent-grounding: evidence-ledger, claim-gate, review-claim-gate; grounding-mcp is the canonical client surface harness queries through queryLedgerByTag.
- agent-memory: the memory surfaces the control plane inventories.
- agent-tasks: MCP-registered task platform whose registration + health appear in harness describe.
- agent-preflight: local preflight validator; the canonical implementation of preflight-hook content harness wires.
- codebase-oracle: opt-in MCP for multi-repo RAG search; not in Full, wire via harness add mcp codebase-oracle --command codebase-oracle,mcp.
- agent-dx: ships git-batch-cli, a day-to-day tool whose inventory appears in harness describe.
License
MIT, see LICENSE.
