aih-harness-cli
v0.1.2
AI Software Harness - local multi-CLI orchestration for Codex, Claude Code, and Gemini CLI
AI Software Harness
AI Software Harness (aih) is a Rust-based local software factory that turns one natural-language request into a runnable project.
The first version combines two proven open-source ideas:
- Kim Orchestrator: use a staged multi-CLI workflow where the harness plans the work, Codex builds, and Gemini reviews.
- Superpowers: use a disciplined development workflow with planning, small tasks, verification, review, and completion evidence.
Product Shape
aih --project ./work/stopwatch-demo "make a local stopwatch web app"
-> create a project workspace
-> write a build prompt, questionnaire seed, demo seed, acceptance seed, and plan
-> auto-detect Codex, Claude Code, and Gemini CLI
-> use full Claude/Codex/Gemini orchestration when available, or downgrade to one available CLI
-> optionally ask the user to answer questionnaire items
-> call Codex to build the cheapest useful demo first
-> optionally ask the user to confirm before full implementation
-> call Gemini to review the result against questionnaire, recommended style, demo plan, and acceptance contract
-> run the generated self_check.sh
-> write logs, run metadata, and summary
The goal is not to build a full IDE. The goal is a reliable local workbench for quickly turning small ideas into runnable software.
Upstream References
The project currently vendors only reference source under third_party/:
- third_party/kim-orchestrator: staged Claude/Codex/Gemini CLI orchestration patterns.
- third_party/superpowers: planning, TDD, review, and verification workflow patterns.
These repositories are references, not code we copy wholesale into the harness.
MVP Command
aih [options] "<requirement>"
Supported first-pass options:
--project PATH         required for new projects; exact output directory
--continue PATH        continue iterating an existing generated project
--name NAME            optional run label; does not choose the project path
--plain                disable rich terminal UI
--workflow MODE        default superpower; use single for one-agent mode
--agent AGENT          builder for single mode; supports auto, codex, claude, gemini, noop
--model MODEL          pass model to the builder agent
--no-test              skip self_check.sh execution
--review               run Gemini review in single mode
--qa                   run frontend QA in single mode; superpower does this automatically
--agent-terminals      open per-agent Terminal log windows; default on macOS
--no-agent-terminals   disable per-agent Terminal log windows
--no-interactive       exit immediately after writing the summary
--fix-rounds N         repair rounds, default 1
--help                 show help
Run Locally
From this repository:
./bin/aih --help
./bin/aih --project ./work/stopwatch-demo "build a local stopwatch web app"
./bin/aih --continue ./work/stopwatch-demo "polish the UI and fix any broken buttons"
bin/aih runs the Rust implementation. If target/debug/aih is missing or older than src/main.rs, it compiles the binary with rustc first.
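The rebuild rule above (compile when the binary is missing or older than the source) can be sketched in Rust as follows. This is only an illustration of the comparison, not the launcher's actual code; the function names `is_stale` and `needs_rebuild` are hypothetical.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::time::SystemTime;

/// Decide whether a rebuild is needed from two modification times.
/// `binary` is `None` when the binary does not exist yet.
fn is_stale(binary: Option<SystemTime>, source: SystemTime) -> bool {
    match binary {
        None => true,                       // missing binary: always build
        Some(built) => built < source,      // binary older than source: rebuild
    }
}

/// Read the modification times from disk and apply `is_stale`.
/// Hypothetical helper; the paths are supplied by the caller.
fn needs_rebuild(binary: &Path, source: &Path) -> io::Result<bool> {
    let source_mtime = fs::metadata(source)?.modified()?;
    let binary_mtime = match fs::metadata(binary) {
        Ok(meta) => Some(meta.modified()?),
        Err(err) if err.kind() == io::ErrorKind::NotFound => None,
        Err(err) => return Err(err),
    };
    Ok(is_stale(binary_mtime, source_mtime))
}
```

With this shape, a launcher would call `needs_rebuild(Path::new("target/debug/aih"), Path::new("src/main.rs"))` and invoke rustc only when it returns `true`.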
Install And CLI Detection
AIH is a single Rust binary and is intended to run on macOS, Linux, and Windows. The recommended distribution layer is npm because it gives every platform the same CLI entry:
npm install -g aih-harness-cli
aih --project ./work/stopwatch-demo "build a local stopwatch web app"
The npm package exposes aih through a small Node launcher. In source installs it compiles src/main.rs with rustc on first run and then reuses the platform binary under target/npm/. Release packages can later swap this for prebuilt per-platform binaries. For local repo development, ./bin/aih still works on Unix-like systems.
Requirements for source npm installs:
- Node.js 18+
- Rust toolchain with rustc
- At least one AI CLI on PATH for normal work: codex, claude, or gemini
At startup, before creating a project, AIH displays local install and login readiness for all three AI CLIs:
Codex CLI installed / login status
Claude Code installed / login status
Gemini CLI installed / availability status
At least one of them must be installed and locally usable unless --agent noop is used for smoke tests. If none are available, AIH stops before creating the project and asks the user to install or log in first. Full Superpower orchestration runs when both Claude Code and Codex are ready; Gemini review is used when Gemini is ready. If only one AI CLI is ready, AIH automatically uses that one tool as Planner, Builder, and Reviewer.
By default this runs the best available flow: Claude+Codex enable the staged Superpower workflow, Gemini adds review, and a single available CLI can still build a runnable project. Use --continue to keep iterating an existing project instead of starting from a blank workspace.
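The selection rules above can be written down as a small decision function. The sketch below is an illustration only: the type names (`Readiness`, `Flow`, `Cli`) are hypothetical, and the fallback order when two CLIs are ready but the Claude+Codex pair is incomplete is an assumption, since the text does not specify it.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Cli {
    Codex,
    Claude,
    Gemini,
}

#[derive(Debug, PartialEq, Eq)]
enum Flow {
    /// Claude plans and Codex builds; Gemini reviews only when it is ready.
    Superpower { gemini_review: bool },
    /// One CLI acts as Planner, Builder, and Reviewer.
    Single(Cli),
    /// Nothing usable: stop before creating the project.
    Blocked,
}

/// Readiness flags as detected at startup (hypothetical struct).
#[derive(Debug, Clone, Copy)]
struct Readiness {
    codex: bool,
    claude: bool,
    gemini: bool,
}

fn choose_flow(r: Readiness) -> Flow {
    if r.claude && r.codex {
        return Flow::Superpower { gemini_review: r.gemini };
    }
    // Assumption: without the Claude+Codex pair, fall back to the first
    // ready CLI in this order. The real priority may differ.
    if r.codex {
        Flow::Single(Cli::Codex)
    } else if r.claude {
        Flow::Single(Cli::Claude)
    } else if r.gemini {
        Flow::Single(Cli::Gemini)
    } else {
        Flow::Blocked
    }
}
```

Keeping the decision in a pure function like this makes the downgrade behavior easy to test without any CLI installed.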
For a fast harness-only smoke test that does not call Codex:
./bin/aih --agent noop --project ./work/smoke "make a smoke test project"
Verify
rustc --edition=2021 src/main.rs -o target/debug/aih
./bin/aih --workflow single --agent noop --project ./work/smoke "make a smoke test project"
./bin/aih --workflow single --agent noop --no-agent-terminals --continue ./work/smoke "add one small improvement"
The active implementation is Rust-only. Legacy Python package code and compatibility tests were removed so there is one runtime to reason about.
Continue A Project
Generated projects are meant to be living workspaces. If the result is not good enough, run another iteration against the same directory:
./bin/aih --continue ./work/smoke "make the layout clearer and fix broken buttons"
Continuation runs keep the existing project files, write run-specific prompt/summary/review files, reuse HARNESS_TASKS.tsv as the current iteration ledger, and produce a fresh log under logs/. Claude plans the delta, Codex modifies the existing code, Gemini reviews the new result, and Harness verifies again.
After an interactive run finishes, aih keeps the terminal open with a short prompt. You can try the project, then type a follow-up requirement directly into that prompt to start another iteration against the same project. Press Enter on an empty line to exit. Use --no-interactive for scripts and smoke tests.
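The follow-up prompt described above amounts to a read loop that stops at the first empty line. A minimal sketch, assuming each non-empty line is treated as one follow-up requirement; the function name `follow_ups` is hypothetical and the real prompt likely starts an iteration per line rather than collecting them:

```rust
use std::io::BufRead;

/// Collect follow-up requirements until an empty line or end of input.
fn follow_ups<R: BufRead>(input: R) -> Vec<String> {
    let mut requests = Vec::new();
    for line in input.lines() {
        // Stop on read errors as well as on an empty line.
        let Ok(line) = line else { break };
        let requirement = line.trim();
        if requirement.is_empty() {
            break; // Enter on an empty line exits the prompt
        }
        requests.push(requirement.to_string());
    }
    requests
}
```

Taking any `BufRead` instead of reading stdin directly lets the loop be exercised with an in-memory buffer; in a real run the caller would pass `std::io::stdin().lock()`.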
Live Output
Long-running agent commands stream useful progress to the terminal while preserving full raw output in logs/.
The default terminal view is a pinned orchestration map, not a flat grid of agents:
- Harness is shown first as the Rust runtime that owns workspace setup, prompts, handoffs, verification, interrupt reports, and repair loops.
- The work is then shown as a vertical pipeline: 1 PLAN / Claude Code -> 2 BUILD / Codex -> 3 REVIEW / Gemini CLI -> 4 VERIFY / Harness.
- Each stage shows role, state, input artifact, output artifact, current activity, elapsed time, and recent trace.
- Codex, Claude Code, and Gemini CLI readiness is detected at startup and remains pinned on the main control deck for the whole run, including installed/missing and logged-in/ready state.
- Handoffs are displayed as a trail under the pipeline so the execution order stays clear.
- A quantified task ledger is displayed under the pipeline. Claude writes HARNESS_TASKS.tsv with fine-grained tasks, owners, statuses, titles, and acceptance evidence; agents update the board by printing CLAIM TASK <id> and DONE TASK <id>.
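Recognizing the CLAIM TASK and DONE TASK markers in agent output could look like the sketch below. It assumes a task id is a single token without spaces, which the text does not state; `TaskEvent`, `parse_task_event`, and `task_id` are hypothetical names.

```rust
#[derive(Debug, PartialEq, Eq)]
enum TaskEvent {
    Claim(String),
    Done(String),
}

/// Parse one line of agent output into a task event, if it is a marker.
fn parse_task_event(line: &str) -> Option<TaskEvent> {
    let line = line.trim();
    if let Some(rest) = line.strip_prefix("CLAIM TASK ") {
        return task_id(rest).map(TaskEvent::Claim);
    }
    if let Some(rest) = line.strip_prefix("DONE TASK ") {
        return task_id(rest).map(TaskEvent::Done);
    }
    None
}

/// Assumption: an id is one non-empty token with no inner whitespace.
fn task_id(rest: &str) -> Option<String> {
    let id = rest.trim();
    if id.is_empty() || id.contains(char::is_whitespace) {
        None
    } else {
        Some(id.to_string())
    }
}
```

Lines that are not markers return `None`, so ordinary log output can pass through the same function untouched.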
The main terminal is a pinned control deck. It refreshes in place, shows the hierarchy, current owner, task progress bars, handoffs, and the pixel puppy location, but it does not scroll raw worker logs.
Claude and Codex are also prompted to emit structured progress lines such as PROGRESS: PLAN, PROGRESS: CODE, PROGRESS: TEST, and PROGRESS: FIX; worker terminals render these with distinct symbols so planning, coding, testing, and repair work are easy to scan.
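A worker terminal needs to classify these lines before it can pick a symbol for them. The sketch below parses the four tags named above; treating any text after the tag as a free-form detail is an assumption, and `Phase` and `parse_progress` are hypothetical names.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Phase {
    Plan,
    Code,
    Test,
    Fix,
}

/// Split a `PROGRESS: <TAG> [detail]` line into its phase and detail text.
fn parse_progress(line: &str) -> Option<(Phase, &str)> {
    let rest = line.trim().strip_prefix("PROGRESS:")?.trim_start();
    // Assumption: the tag is the first token; the remainder is optional detail.
    let (tag, detail) = match rest.split_once(char::is_whitespace) {
        Some((tag, detail)) => (tag, detail.trim()),
        None => (rest, ""),
    };
    let phase = match tag {
        "PLAN" => Phase::Plan,
        "CODE" => Phase::Code,
        "TEST" => Phase::Test,
        "FIX" => Phase::Fix,
        _ => return None, // unknown tags are treated as ordinary output
    };
    Some((phase, detail))
}
```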
Execution is task-ledger driven by default: Claude must write HARNESS_TASKS.tsv, Harness normalizes owners, sends Codex only the codex implementation queue, sends Gemini the gemini review queue, and keeps harness verification tasks for self-check/QA/summary. Codex claims and completes quantified tasks one by one, which keeps the build anchored to small verifiable units instead of a single vague request.
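Splitting the ledger into per-owner queues can be sketched as below. The column order (id, owner, status, title, evidence), the absence of a header row, and lowercasing as the owner normalization are all assumptions made for illustration; the actual HARNESS_TASKS.tsv layout is not specified here.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
struct Task {
    id: String,
    owner: String,
    status: String,
    title: String,
    evidence: String,
}

/// Parse tab-separated rows; rows with fewer than five columns are skipped.
fn parse_ledger(tsv: &str) -> Vec<Task> {
    tsv.lines()
        .filter_map(|line| {
            let cols: Vec<&str> = line.split('\t').collect();
            if cols.len() < 5 {
                return None;
            }
            Some(Task {
                id: cols[0].trim().to_string(),
                // Stand-in for owner normalization: trim and lowercase.
                owner: cols[1].trim().to_lowercase(),
                status: cols[2].trim().to_string(),
                title: cols[3].trim().to_string(),
                evidence: cols[4].trim().to_string(),
            })
        })
        .collect()
}

/// Select the tasks that belong to one owner, e.g. "codex" or "gemini".
fn queue_for<'a>(tasks: &'a [Task], owner: &str) -> Vec<&'a Task> {
    tasks.iter().filter(|task| task.owner == owner).collect()
}
```

Under these assumptions, `queue_for(&tasks, "codex")` yields the implementation queue and `queue_for(&tasks, "gemini")` the review queue, while tasks owned by `harness` stay with the runtime.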
Execution is also acceptance-first by default: Claude must write HARNESS_ACCEPTANCE.md with Given/When/Then scenarios, automated checks, manual QA checks, edge cases, and regressions. Codex is instructed to encode tests and self_check.sh before or alongside feature code, and Gemini reviews the result against that acceptance contract.
Execution is demo-first in interactive runs: Claude writes HARNESS_QUESTIONNAIRE.md, HARNESS_QUESTIONNAIRE.tsv, HARNESS_STYLE_OPTIONS.md, and HARNESS_DEMO_PLAN.md; Harness opens browser pages for the questionnaire, demo style selection, demo confirmation, and iteration choice. The pages support Chinese/English UI switching and are choice-based, so the workflow does not require terminal typing. Every demo must expose a working local or LAN URL through HARNESS_DEMO_URL.md; screenshots, source files, or prose alone are not enough.
When output is captured by Codex, CI, or a redirected file, the same board events are printed as compact [aih] status lines so the terminal stays readable.
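Choosing between the pinned board and compact status lines comes down to whether stdout is a terminal. The sketch below uses `std::io::IsTerminal` from the standard library (Rust 1.70 or later); that `--plain` also forces compact output and the exact `[aih] STAGE: state` line format are assumptions, and the names are hypothetical.

```rust
use std::io::IsTerminal;

#[derive(Debug, PartialEq, Eq)]
enum OutputMode {
    /// Pinned control deck that refreshes in place.
    Board,
    /// One `[aih]` status line per board event.
    Compact,
}

/// Pure decision so it can be tested without a real terminal.
fn output_mode(stdout_is_terminal: bool, plain: bool) -> OutputMode {
    if stdout_is_terminal && !plain {
        OutputMode::Board
    } else {
        OutputMode::Compact
    }
}

/// Query the real stdout and apply the decision above.
fn detect_output_mode(plain: bool) -> OutputMode {
    output_mode(std::io::stdout().is_terminal(), plain)
}

/// Assumed line format; the real status lines may carry more fields.
fn status_line(stage: &str, state: &str) -> String {
    format!("[aih] {stage}: {state}")
}
```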
By default on macOS, Harness opens one worker Terminal per active agent. These windows tail files such as logs/<run>.claude.log, logs/<run>.codex.log, logs/<run>.gemini.log, and logs/<run>.harness.log; each worker log is formatted with a colored title, table-like rows, progress bars, heartbeats, task markers, and an end summary. The pixel puppy appears in the active worker terminal and disappears after that worker finishes. Use --no-agent-terminals for quiet runs.
Press Ctrl+C to interrupt. Partial files are kept, and Harness writes an INTERRUPTED summary with the project and log paths.
Frontend QA And Repair Loop
For frontend projects, --qa writes HARNESS_QA.md and checks common generated-app failures:
- missing or broken local asset references
- unlabeled buttons and form controls
- JavaScript syntax errors when node is available
- missing event-handler wiring
- missing responsive CSS signals
- fixed widths that obviously overflow on mobile
- missing run instructions in README
Use --fix-rounds 1 or higher to let Codex repair failures and then rerun self_check.sh, QA, and review.
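As an illustration of the first check in the list (missing or broken local asset references), the sketch below scans HTML for double-quoted src and href attributes and reports the local ones that do not exist on disk. It is a deliberately naive string scan, not an HTML parser and not the harness's actual QA code; all function names are hypothetical.

```rust
use std::path::Path;

/// Collect double-quoted `src` and `href` values that point at local files.
fn local_asset_refs(html: &str) -> Vec<String> {
    let mut refs = Vec::new();
    for attr in ["src=\"", "href=\""] {
        let mut rest = html;
        while let Some(start) = rest.find(attr) {
            let after = &rest[start + attr.len()..];
            let Some(end) = after.find('"') else { break };
            let value = &after[..end];
            if is_local(value) {
                refs.push(value.to_string());
            }
            rest = &after[end + 1..];
        }
    }
    refs
}

/// Skip anchors, absolute URLs, protocol-relative URLs, and inline data.
fn is_local(value: &str) -> bool {
    !(value.is_empty()
        || value.starts_with('#')
        || value.starts_with("//")
        || value.contains("://")
        || value.starts_with("data:")
        || value.starts_with("mailto:"))
}

/// Return the local references that do not exist under `root`.
fn missing_assets(root: &Path, html: &str) -> Vec<String> {
    local_asset_refs(html)
        .into_iter()
        .filter(|reference| {
            // Ignore query strings and fragments when checking the file.
            let path = reference
                .split(|c: char| c == '?' || c == '#')
                .next()
                .unwrap_or("");
            !root.join(path.trim_start_matches('/')).exists()
        })
        .collect()
}
```

A real check would likely also handle single-quoted attributes and CSS url(...) references; this sketch only shows the shape of the test.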
Generated Project Contract
Each generated project should include:
README.md
HARNESS_PROMPT.md
HARNESS_QUESTIONNAIRE.md
HARNESS_QUESTIONNAIRE.tsv
HARNESS_QUESTIONNAIRE_FORM.html
HARNESS_STYLE_OPTIONS.md
HARNESS_STYLE_OPTIONS_FORM.html
HARNESS_DEMO_PLAN.md
HARNESS_DEMO_CONFIRM.html
HARNESS_DEMO_URL.md
HARNESS_ACCEPTANCE.md
HARNESS_SUMMARY.md
HARNESS_RUN.json
self_check.sh
source files
dependency files when needed
Planned Local Layout
bin/
aih
Cargo.toml
src/
main.rs
board.rs
tasks.rs
templates/
build_prompt.md
projects/
logs/
docs/
third_party/
Current Planning Document
See docs/IMPLEMENTATION_PLAN.md for the first implementation plan.
