ai-engineering-harness
v1.2.3
Engineering discipline and workflow guardrails for AI coding agents (Claude, Cursor, Codex, Gemini).
ai-engineering-harness
Professional workflow guardrails for AI coding agents.
A markdown-first, open-source kit that helps agents restore context, plan before coding, verify with evidence, ship reviewer-ready summaries, and preserve durable project knowledge.
In 30 seconds
AI coding agents are fast at editing files, but they often skip engineering discipline:
- They start with stale context.
- They code before the plan is clear.
- They claim success without real evidence.
- They end sessions without durable handoff artifacts.
ai-engineering-harness gives them a repeatable operating contract:
Session Start → Discuss → Plan → Run → Verify → Ship → Remember

The result is a lighter-weight, easier-to-audit workflow for real software work, not just prompt-driven code generation.
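The phase sequence above reads naturally as a forward-only state machine. The TypeScript sketch below is a hypothetical model, not the harness's own phase-guard code: `PHASES` mirrors the workflow order, and `nextPhase` and `canTransition` are illustrative helpers that assume a guard only allows moving to the immediately following phase.

```typescript
// Hypothetical model of the harness phase order; the real phase guards may differ.
const PHASES = ["start", "discuss", "plan", "run", "verify", "ship", "remember"] as const;

type Phase = (typeof PHASES)[number];

// Returns the phase that follows `current`, or null after the final phase.
function nextPhase(current: Phase): Phase | null {
  const index = PHASES.indexOf(current);
  return index < PHASES.length - 1 ? PHASES[index + 1] : null;
}

// Assumption: a guard only permits moving to the immediately following phase.
function canTransition(from: Phase, to: Phase): boolean {
  return nextPhase(from) === to;
}

console.log(canTransition("plan", "run")); // true
console.log(canTransition("discuss", "run")); // false: planning was skipped
```

A real guard would also need a way back, for example from Verify to Run after a failed check; the sketch leaves that out to keep the ordering rule visible.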
Why this instead of manual Cursor rules or prompt packs?
| Approach | Limitation | Harness answer |
| --- | --- | --- |
| Hand-written rules | No proof they improve outcomes | Deterministic evals (aih eval) with A/B reports |
| Single-provider prompt repos | Fragmented install surfaces | Declarative provider manifests + one installer |
| Workflow markdown only | Hard to measure discipline | Phase guards, telemetry (aih insights), evidence artifacts |
See compatibility matrix and evals.
Why teams use it
- Professional workflow: command contracts, phase guards, and explicit stop conditions
- Easy to inspect: markdown artifacts live in the repo and are readable without a special UI
- Honest verification: VERIFY.md, REPORT.md, and PR_MESSAGE.md are grounded in real evidence
- Open-source friendly: works as a repo-level discipline layer, not a closed orchestration platform
Quickstart
First time? Start with Your First 5 Minutes.
Inside your target project:
```bash
npx ai-engineering-harness install
npx ai-engineering-harness status
npx ai-engineering-harness doctor
```

Non-interactive install:

```bash
npx ai-engineering-harness install --provider claude --yes
```

Note: --provider is preferred; --runtime is a deprecated alias.
The Node.js CLI (npx ai-engineering-harness ...) is the only supported install and lifecycle surface.
Wizard details: docs/npx-cli-ux.md, docs/terminal-wizard-ux.md
What it gives you
| Layer | Purpose |
| --- | --- |
| Agent system prompt | Senior role, MUST/MUST NOT rules, response formats |
| Session Start | Restore active session, memory, blockers, and next command |
| Commands | Canonical workflow contracts for start, discuss, plan, run, verify, ship, and remember, plus compatibility helpers |
| Prompt templates | Structured execution with blocked and ready branches |
| Session memory | Store work by session instead of flat root dumps |
| Tool discovery | Route to git, rg, worktree, markitdown, and code-graph fallbacks |
| Hooks | Guard phase transitions and record evidence |
| Skills | Package reusable or session-specific capability |
| Reports | Generate REPORT.md and PR_MESSAGE.md from real changes |
TypeScript & JSDoc Support
Full type definitions and JSDoc comments for IDE autocomplete:
```ts
import type { InstallOptions } from 'ai-engineering-harness'

const options: InstallOptions = {
  target: './my-project',
  dryRun: true,
  force: false,
}
```

See docs/typescript-usage.md for the full API reference.
Evals
The harness includes an eval subsystem for deterministic A/B comparisons between with-harness and without-harness task runs. Reports are tagged as synthetic-fixture by default, and can be promoted to live-provider-command when you run a configured provider CLI via --live-provider-command "<cmd>" or EVAL_PROVIDER_COMMAND.
```bash
npx ai-engineering-harness eval list
npx ai-engineering-harness eval run sample-bugfix --provider codex --yes
npx ai-engineering-harness eval report <run-id>
```

See docs/evals.md for the benchmark model and report format.
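To make the A/B idea concrete, here is a hypothetical TypeScript sketch of how a report might pick its tag and compare pass rates. Only the two tag names come from this README; `TaskRun`, `resolveReportSource`, and `passRateDelta` are illustrative, and the sketch assumes EVAL_PROVIDER_COMMAND is read from the environment and that the CLI flag takes precedence over it.

```typescript
// Hypothetical sketch: tag an eval report and compare with/without-harness runs.
// Only the two tag names come from the README; the rest is illustrative.
type ReportSource = "synthetic-fixture" | "live-provider-command";

interface TaskRun {
  taskId: string;
  passed: boolean;
}

// Assumption: the --live-provider-command value wins over EVAL_PROVIDER_COMMAND.
function resolveReportSource(
  flagCommand: string | undefined,
  env: Record<string, string | undefined>,
): ReportSource {
  const command = flagCommand ?? env["EVAL_PROVIDER_COMMAND"];
  return command !== undefined && command.trim() !== ""
    ? "live-provider-command"
    : "synthetic-fixture";
}

// Fraction of runs that passed; 0 for an empty list.
function passRate(runs: TaskRun[]): number {
  if (runs.length === 0) return 0;
  return runs.filter((run) => run.passed).length / runs.length;
}

// Positive values mean the with-harness runs passed more often.
function passRateDelta(withHarness: TaskRun[], withoutHarness: TaskRun[]): number {
  return passRate(withHarness) - passRate(withoutHarness);
}

const withHarness: TaskRun[] = [
  { taskId: "sample-bugfix", passed: true },
  { taskId: "sample-bugfix", passed: true },
];
const withoutHarness: TaskRun[] = [
  { taskId: "sample-bugfix", passed: true },
  { taskId: "sample-bugfix", passed: false },
];

console.log(resolveReportSource(undefined, {})); // synthetic-fixture
console.log(passRateDelta(withHarness, withoutHarness)); // 0.5
```

Keeping the tag next to the numbers makes it hard to mistake a fixture-based comparison for a result measured against a live provider.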
Insights
Summarize local harness telemetry from .harness/history/events.jsonl:
```bash
npx ai-engineering-harness insights
npx ai-engineering-harness insights --target . --json
```

See docs/insights.md.
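A summary of this kind can be approximated in a few lines. The sketch below assumes every line of events.jsonl is a JSON object with a string `type` field, which is an assumption about the schema rather than something this README documents; `summarizeEvents` and the sample event names are hypothetical.

```typescript
// Hypothetical sketch: count events per type in a JSON Lines log.
// Assumption: each line is a JSON object with a string "type" field.
function summarizeEvents(jsonl: string): Map<string, number> {
  const counts = new Map<string, number>();
  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue; // tolerate blank lines and a trailing newline
    let event: unknown;
    try {
      event = JSON.parse(line);
    } catch {
      continue; // skip malformed lines instead of failing the whole summary
    }
    const type =
      typeof event === "object" &&
      event !== null &&
      typeof (event as { type?: unknown }).type === "string"
        ? (event as { type: string }).type
        : "unknown";
    counts.set(type, (counts.get(type) ?? 0) + 1);
  }
  return counts;
}

const sample = [
  '{"type":"phase.enter","phase":"plan"}',
  '{"type":"phase.enter","phase":"run"}',
  '{"type":"verify.pass"}',
  "",
].join("\n");

console.log(summarizeEvents(sample));
```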
Comparison: with vs without
| Scenario | Without harness | With harness |
| --- | --- | --- |
| Agent starts a task | Reads goal, starts coding | Restores session state, maps repo/current context, then discusses and plans |
| Agent finishes coding | Says "done" and ships | Runs checks, writes evidence, prepares report artifacts |
| Session ends | Context disappears | Decisions, state, and lessons are preserved |
| Next session | Starts from scratch | Continues from explicit session state |
| PR review | Code only | Plan, rationale, verification evidence, and change summary |
The difference: without the harness, your agent is mostly a code editor. With the harness, it behaves more like an engineer operating inside a process.
Canonical commands
```text
harness-start
harness-map
harness-discuss
harness-plan
harness-run
harness-verify
harness-ship
harness-remember
```

Canonical command IDs use hyphen form only, for example harness-plan.
Claude project commands may expose them as /harness-plan.
Do not use legacy colon-separated or underscore forms.
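Because only the hyphen form is canonical, a lint step can catch legacy IDs before they reach a prompt or rule file. This is a hypothetical helper (`checkCommandId` is not part of the harness): it accepts the bare ID or the Claude `/harness-plan` form and suggests the hyphenated replacement for colon or underscore variants.

```typescript
// Canonical command IDs as listed in this README.
const CANONICAL_COMMANDS = new Set([
  "harness-start", "harness-map", "harness-discuss", "harness-plan",
  "harness-run", "harness-verify", "harness-ship", "harness-remember",
]);

type CommandCheck =
  | { ok: true; id: string }
  | { ok: false; reason: string; suggestion?: string };

// Hypothetical lint helper: accepts "harness-plan" or "/harness-plan" and
// rejects legacy "harness:plan" / "harness_plan" forms with a suggestion.
function checkCommandId(raw: string): CommandCheck {
  const id = raw.startsWith("/") ? raw.slice(1) : raw; // strip the slash-command prefix
  if (CANONICAL_COMMANDS.has(id)) return { ok: true, id };
  const normalized = id.replace(/[:_]/g, "-");
  if (CANONICAL_COMMANDS.has(normalized)) {
    return { ok: false, reason: "legacy separator", suggestion: normalized };
  }
  return { ok: false, reason: "unknown command" };
}

console.log(checkCommandId("/harness-plan"));
console.log(checkCommandId("harness:plan"));
```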
Session Start
Every workflow begins with Session Start.
harness-start restores:
- active session
- current goal and phase
- blocked state
- durable memory and hazards
- tool context
- repository/current context:
  - important paths
  - conventions
  - commands
  - quality gates
  - provider entrypoints
  - harness artifacts
  - constraints
  - likely affected areas when an active goal exists
- next allowed command
No implementation, verification, or shipping should happen before session state is established.
harness-map is kept as a backward-compatible manual context refresh command. It is not part of the primary workflow because harness-start already performs context mapping.
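One way to picture the "session first" rule is a guard that refuses implementation, verification, and shipping commands until a session exists. The `SessionState` fields below are a hypothetical, trimmed-down view of what harness-start restores, and treating a blocked session as a hard stop is an assumption of this sketch.

```typescript
// Hypothetical, trimmed-down view of what harness-start restores.
interface SessionState {
  sessionId: string | null;
  goal: string | null;
  phase: string | null;
  blocked: boolean;
  nextAllowedCommand: string | null;
}

// Commands that must not run before Session Start has established state.
const REQUIRES_SESSION = new Set(["harness-run", "harness-verify", "harness-ship"]);

function guardCommand(
  command: string,
  state: SessionState | null,
): { allowed: boolean; reason?: string } {
  if (!REQUIRES_SESSION.has(command)) return { allowed: true };
  if (state === null || state.sessionId === null) {
    return { allowed: false, reason: "run harness-start first" };
  }
  // Assumption: a blocked session stops these commands until the blocker is resolved.
  if (state.blocked) {
    return { allowed: false, reason: "session is blocked" };
  }
  return { allowed: true };
}

console.log(guardCommand("harness-run", null)); // { allowed: false, reason: 'run harness-start first' }
```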
Harness Storage
Primary workflow storage lives under .harness/:
```text
.harness/
├── STATE.md
├── context.md
├── tasks/
│   └── <task-id>.md
├── history/
│   └── events.jsonl
├── memory/
│   ├── project.md
│   ├── decisions.md
│   ├── conventions.md
│   └── lessons.md
└── archive/
    └── tasks/
```

- .harness/STATE.md is the current active pointer.
- .harness/context.md stores repo/current-goal context produced by harness-start.
- .harness/tasks/*.md stores task-level working context when task tracking is enabled.
- .harness/history/events.jsonl stores append-only event history.
- .harness/memory/ stores durable knowledge extracted by harness-remember.
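An append-only JSON Lines log needs only two operations: append one serialized event per line, and read the lines back in order. The Node.js sketch below shows that pattern for a path like .harness/history/events.jsonl; the `HarnessEvent` shape and the `appendEvent` and `readEvents` helpers are hypothetical, not the harness's own writer.

```typescript
import { appendFileSync, existsSync, mkdirSync, readFileSync } from "node:fs";
import { dirname } from "node:path";

// Hypothetical event record; the harness's actual event schema may differ.
interface HarnessEvent {
  type: string;
  at: string; // ISO-8601 timestamp
  [key: string]: unknown;
}

// Append one event as a single JSON line; earlier lines are never rewritten.
function appendEvent(logPath: string, event: HarnessEvent): void {
  mkdirSync(dirname(logPath), { recursive: true });
  appendFileSync(logPath, JSON.stringify(event) + "\n", "utf8");
}

// Read the log back in order, one parsed object per non-empty line.
function readEvents(logPath: string): HarnessEvent[] {
  if (!existsSync(logPath)) return [];
  return readFileSync(logPath, "utf8")
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as HarnessEvent);
}
```

For example, `appendEvent(".harness/history/events.jsonl", { type: "phase.enter", phase: "plan", at: new Date().toISOString() })` adds one line without touching the ones before it.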
Agent System Prompt
The harness includes a provider-neutral system prompt that pushes agents toward senior-engineering behavior instead of optimistic assistant behavior.
It defines:
- phase discipline
- MUST and MUST NOT rules
- blocked-state behavior
- evidence standards
- response and report expectations
Source: agent-system/SYSTEM_PROMPT.md
Ship means PR-ready
harness-ship does more than say "done".
When verification supports it, it prepares the following from real git changes and verification evidence:

- REPORT.md
- PR_MESSAGE.md
- CHANGE_SUMMARY.md
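The gating rule can be sketched as a pure function over verification evidence and the list of changed files (in practice that list might come from `git diff --name-only`). `CheckResult`, `ShipPlan`, and `planShip` are hypothetical names, and the specific conditions (at least one check ran, every check passed, at least one file changed) are assumptions about what "verification supports it" means.

```typescript
// Hypothetical verification evidence: one entry per check that actually ran.
interface CheckResult {
  command: string; // e.g. "npm test"
  exitCode: number;
}

interface ShipPlan {
  ready: boolean;
  artifacts: string[];
  reason: string;
}

// Ship only with evidence: at least one check, all checks passing, and real changes.
function planShip(checks: CheckResult[], changedFiles: string[]): ShipPlan {
  if (checks.length === 0) {
    return { ready: false, artifacts: [], reason: "no verification evidence" };
  }
  const failed = checks.filter((check) => check.exitCode !== 0);
  if (failed.length > 0) {
    const names = failed.map((check) => check.command).join(", ");
    return { ready: false, artifacts: [], reason: `failing: ${names}` };
  }
  if (changedFiles.length === 0) {
    return { ready: false, artifacts: [], reason: "no changes to ship" };
  }
  return {
    ready: true,
    artifacts: ["REPORT.md", "PR_MESSAGE.md", "CHANGE_SUMMARY.md"],
    reason: `${checks.length} check(s) passed on ${changedFiles.length} changed file(s)`,
  };
}

const plan = planShip([{ command: "npm test", exitCode: 0 }], ["src/app.ts"]);
console.log(plan.artifacts); // [ 'REPORT.md', 'PR_MESSAGE.md', 'CHANGE_SUMMARY.md' ]
```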
Provider support
Support tiers vary significantly. Understand what your provider can do before relying on advanced features.
| Capability | Claude | Cursor | Codex | Gemini |
| --- | --- | --- | --- | --- |
| Slash commands | 8 native | Rules fallback | Rules fallback | Rules fallback |
| Workers/subagents | 4 native | Manual setup | Manual setup | Manual setup |
| Lifecycle hooks | 4 events | Manual setup | Manual setup | Manual setup |
| Grade | ⭐⭐⭐ A | ⭐⭐ C+ | ⭐⭐ C+ | ⭐⭐ C+ |
What this means:
- Claude: strongest path, with native command and worker support
- Cursor, Codex, Gemini: core discipline works, but hooks and advanced behavior need manual setup or fallbacks
Provider-specific setup: docs/provider-rule-configuration.md, docs/adoption-guide.md
The phase discipline itself is provider-agnostic: it applies to every provider listed above, even where commands and hooks fall back to rules or manual setup.
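If you script provider setup, the support table can be encoded as data. This hypothetical TypeScript mapping collapses each cell into one of three support levels (counting Claude's "4 events" hooks as native) and lists what still needs a fallback or manual setup per provider. It is not the harness's declarative provider manifest format.

```typescript
type Provider = "claude" | "cursor" | "codex" | "gemini";
type Capability = "slashCommands" | "workers" | "lifecycleHooks";
type Support = "native" | "rules-fallback" | "manual-setup";

// Values transcribed from the provider support table; the level names are illustrative.
const SUPPORT: Record<Provider, Record<Capability, Support>> = {
  // Claude: 8 native slash commands, 4 native workers, 4 lifecycle hook events.
  claude: { slashCommands: "native", workers: "native", lifecycleHooks: "native" },
  cursor: { slashCommands: "rules-fallback", workers: "manual-setup", lifecycleHooks: "manual-setup" },
  codex: { slashCommands: "rules-fallback", workers: "manual-setup", lifecycleHooks: "manual-setup" },
  gemini: { slashCommands: "rules-fallback", workers: "manual-setup", lifecycleHooks: "manual-setup" },
};

// Capabilities that need a rules fallback or manual setup for the given provider.
function nonNativeCapabilities(provider: Provider): Capability[] {
  const entries = Object.entries(SUPPORT[provider]) as [Capability, Support][];
  return entries
    .filter(([, support]) => support !== "native")
    .map(([capability]) => capability);
}

console.log(nonNativeCapabilities("claude")); // []
console.log(nonNativeCapabilities("gemini")); // [ 'slashCommands', 'workers', 'lifecycleHooks' ]
```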
File layout
```text
.ai-harness/          capability cache (commands, templates, skills, agent-system)
.harness/             project router and durable memory
.harness/sessions/    working artifacts per session
.claude/              Claude provider adapter (when installed)
.cursor/rules/        Cursor provider adapter
AGENTS.md             generic / Codex fallback
```

Details: docs/session-memory.md, docs/private-capability-cache.md
Demo
End-to-end workflow-artifact dogfood: examples/dogfood-tiny-node-api
```bash
cd examples/dogfood-tiny-node-api
npm test
```

The demo shows workflow artifacts and verification evidence in VERIFY.md. It is not a claim that every provider behaves identically.
Transcript: TRANSCRIPT.md
Docs
| Topic | Doc |
| --- | --- |
| Agent system prompt | agent-system/SYSTEM_PROMPT.md |
| Session Start | docs/session-start.md |
| Daily dev report | docs/daily-dev-report.md |
| Provider rules | docs/provider-rule-configuration.md |
| Tool discovery | docs/tool-discovery-and-routing.md |
| Hooks and skills | docs/hooks-and-skills-layer.md |
| Session memory | docs/session-memory.md |
| Command guardrails | docs/command-guardrails.md |
Release notes: docs/v1.2.3-release-notes.md
Limitations
- Provider-native command support differs; Claude is the strongest path.
- Hooks are provider-specific.
- Optional tools such as rg, markitdown, and code-graph integrations are best-effort.
- Human approval is still required for risky or ambiguous decisions.
- This is a guardrail kit, not an autonomous software engineer or orchestration server.
Maintainers
```bash
node bin/validate.js
npm test
cd site && npm run build
```

Publish: docs/npm-publish.md
Status
v1.2.3 (patch release): adds a stack scanner with framework detection and domain inference, plus a new harness scan CLI command. harness domains now auto-scans. Fixes a Codex hook router crash on non-shell tools.
MIT · CONTRIBUTING.md · SECURITY.md
