aih-harness-cli

v0.1.2, published 9 days ago

AI Software Harness - local multi-CLI orchestration for Codex, Claude Code, and Gemini CLI

dinglurong

ai cli codex claude gemini orchestration

AI Software Harness

AI Software Harness (aih) is a Rust-based local software factory that turns one natural-language request into a runnable project.

The first version combines two proven open-source ideas:

Kim Orchestrator: use a staged multi-CLI workflow where Claude Code plans the work, Codex builds, and Gemini reviews.
Superpowers: use a disciplined development workflow with planning, small tasks, verification, review, and completion evidence.

Product Shape

aih --project ./work/stopwatch-demo "make a local stopwatch web app"
  -> create a project workspace
  -> write a build prompt, questionnaire seed, demo seed, acceptance seed, and plan
  -> auto-detect Codex, Claude Code, and Gemini CLI
  -> use full Claude/Codex/Gemini orchestration when available, or fall back to a single available CLI
  -> optionally ask the user to answer questionnaire items
  -> call Codex to build the cheapest useful demo first
  -> optionally ask the user to confirm before full implementation
  -> call Gemini to review the result against questionnaire, recommended style, demo plan, and acceptance contract
  -> run the generated self_check.sh
  -> write logs, run metadata, and summary

The goal is not to build a full IDE. The goal is a reliable local workbench for quickly turning small ideas into runnable software.

Upstream References

The project currently vendors only reference source under third_party/:

third_party/kim-orchestrator: staged Claude/Codex/Gemini CLI orchestration patterns.
third_party/superpowers: planning, TDD, review, and verification workflow patterns.

These repositories are references, not code we copy wholesale into the harness.

MVP Command

aih [options] "<requirement>"

Supported first-pass options:

--project PATH        required for new projects; exact output directory
--continue PATH       continue iterating an existing generated project
--name NAME           optional run label; does not choose the project path
--plain               disable rich terminal UI
--workflow MODE       default superpower; use single for one-agent mode
--agent AGENT         builder for single mode; supports auto, codex, claude, gemini, noop
--model MODEL         pass model to the builder agent
--no-test             skip self_check.sh execution
--review              run Gemini review in single mode
--qa                  run frontend QA in single mode; superpower does this automatically
--agent-terminals     open per-agent Terminal log windows; default on macOS
--no-agent-terminals  disable per-agent Terminal log windows
--no-interactive      exit immediately after writing the summary
--fix-rounds N        repair rounds, default 1
--help                show help
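The option table above maps naturally onto a small hand-rolled parser. This is a minimal sketch covering only a subset of the flags; the `Options` struct, its field names, and the `auto` default for --agent are assumptions for illustration, not aih's actual parser.

```rust
/// Hypothetical subset of the aih options. Field names are illustrative
/// and not taken from the aih source.
#[derive(Debug, PartialEq)]
pub struct Options {
    pub project: Option<String>,
    pub continue_path: Option<String>,
    pub workflow: String,
    pub agent: String,
    pub fix_rounds: u32,
    pub run_tests: bool,
    pub requirement: Option<String>,
}

/// Parses arguments such as ["--project", "./work/x", "make a timer"].
pub fn parse_args(args: &[&str]) -> Result<Options, String> {
    let mut opts = Options {
        project: None,
        continue_path: None,
        workflow: "superpower".to_string(), // documented default
        agent: "auto".to_string(),          // assumed default
        fix_rounds: 1,                      // documented default
        run_tests: true,                    // --no-test turns this off
        requirement: None,
    };
    let mut i = 0;
    while i < args.len() {
        let arg = args[i];
        match arg {
            // Options that take a value consume the next argument.
            "--project" | "--continue" | "--workflow" | "--agent" | "--fix-rounds" => {
                i += 1;
                let value = args
                    .get(i)
                    .ok_or_else(|| format!("{arg} needs a value"))?
                    .to_string();
                match arg {
                    "--project" => opts.project = Some(value),
                    "--continue" => opts.continue_path = Some(value),
                    "--workflow" => opts.workflow = value,
                    "--agent" => opts.agent = value,
                    _ => {
                        opts.fix_rounds = value
                            .parse::<u32>()
                            .map_err(|_| format!("invalid --fix-rounds value: {value}"))?;
                    }
                }
            }
            "--no-test" => opts.run_tests = false,
            flag if flag.starts_with("--") => {
                return Err(format!("option not covered by this sketch: {flag}"))
            }
            // Any bare argument is treated as the natural-language requirement.
            text => opts.requirement = Some(text.to_string()),
        }
        i += 1;
    }
    // --project is required unless an existing project is being continued.
    if opts.project.is_none() && opts.continue_path.is_none() {
        return Err("--project PATH is required for new projects".to_string());
    }
    Ok(opts)
}
```

Keeping the requirement as a single bare argument mirrors the `aih [options] "<requirement>"` form, where the shell quotes deliver the whole request as one string.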

Run Locally

From this repository:

./bin/aih --help
./bin/aih --project ./work/stopwatch-demo "build a local stopwatch web app"
./bin/aih --continue ./work/stopwatch-demo "polish the UI and fix any broken buttons"

bin/aih runs the Rust implementation. If target/debug/aih is missing or older than src/main.rs, it compiles the binary with rustc first.
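The rebuild rule amounts to a modification-time comparison. A sketch of that check in Rust, assuming "older than" means an earlier mtime; `needs_rebuild` is a hypothetical name, and bin/aih itself may implement the check differently.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Returns true when `binary` should be rebuilt from `source`: the binary
/// is missing, or its modification time is earlier than the source's.
pub fn needs_rebuild(binary: &Path, source: &Path) -> io::Result<bool> {
    let bin_meta = match fs::metadata(binary) {
        Ok(meta) => meta,
        // A missing binary always needs a build.
        Err(err) if err.kind() == io::ErrorKind::NotFound => return Ok(true),
        Err(err) => return Err(err),
    };
    let src_modified = fs::metadata(source)?.modified()?;
    Ok(bin_meta.modified()? < src_modified)
}
```

When this returns true, the wrapper would run the same command shown under Verify: rustc --edition=2021 src/main.rs -o target/debug/aih.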

Install And CLI Detection

AIH is a single Rust binary and is intended to run on macOS, Linux, and Windows. The recommended distribution layer is npm because it gives every platform the same CLI entry:

npm install -g aih-harness-cli
aih --project ./work/stopwatch-demo "build a local stopwatch web app"

The npm package exposes aih through a small Node launcher. In source installs it compiles src/main.rs with rustc on first run and then reuses the platform binary under target/npm/. Release packages can later swap this for prebuilt per-platform binaries. For local repo development, ./bin/aih still works on Unix-like systems.

Requirements for source npm installs:

Node.js 18+
Rust toolchain with rustc
At least one AI CLI on PATH for normal work: codex, claude, or gemini

At startup, before creating a project, AIH displays install and login readiness for all three AI CLIs:

Codex CLI     installed / login status
Claude Code   installed / login status
Gemini CLI    installed / availability status
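One plausible way to implement the "installed" half of this check is a PATH lookup for the codex, claude, and gemini executables. A minimal sketch with hypothetical function names; login status is deliberately left out because how each CLI reports it is not described here, and the non-Unix branch ignores the .cmd/.exe suffixes that npm-installed CLIs usually carry on Windows.

```rust
use std::env;
use std::ffi::OsStr;
use std::path::{Path, PathBuf};

/// Returns the first executable file named `name` found in a PATH-style
/// list of directories.
pub fn find_on_path(name: &str, path_var: &OsStr) -> Option<PathBuf> {
    env::split_paths(path_var)
        .map(|dir| dir.join(name))
        .find(|candidate| is_executable(candidate))
}

#[cfg(unix)]
fn is_executable(path: &Path) -> bool {
    use std::os::unix::fs::PermissionsExt;
    // A regular file with at least one execute bit set.
    path.metadata()
        .map(|m| m.is_file() && (m.permissions().mode() & 0o111) != 0)
        .unwrap_or(false)
}

#[cfg(not(unix))]
fn is_executable(path: &Path) -> bool {
    // Assumption: treat any regular file as executable on other platforms.
    path.is_file()
}

/// Reports whether codex, claude, and gemini are installed.
/// Login status is not checked by this sketch.
pub fn detect_installed(path_var: &OsStr) -> Vec<(&'static str, bool)> {
    ["codex", "claude", "gemini"]
        .into_iter()
        .map(|name| (name, find_on_path(name, path_var).is_some()))
        .collect()
}
```

Against the real environment this would be called as `detect_installed(&std::env::var_os("PATH").unwrap_or_default())`.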

At least one of them must be installed and locally usable unless --agent noop is used for smoke tests. If none are available, AIH stops before creating the project and asks the user to install or log in first. Full Superpower orchestration runs when both Claude Code and Codex are ready; Gemini review is used when Gemini is ready. If only one AI CLI is ready, AIH automatically uses that one tool as Planner, Builder, and Reviewer.

By default AIH runs the best available flow: Claude Code and Codex together enable the staged Superpower workflow, Gemini adds review, and a single available CLI can still build a runnable project. Use --continue to keep iterating an existing project instead of starting from a blank workspace.
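The fallback rules above can be written as a small decision function. A sketch with hypothetical type names; the source does not say which CLI is used when two are ready but they are not Claude Code plus Codex, so the preference order Codex, Claude, Gemini below is an assumption.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Cli {
    Claude,
    Codex,
    Gemini,
}

#[derive(Debug, PartialEq)]
pub enum Plan {
    /// Claude plans, Codex builds, Gemini reviews when it is ready.
    Superpower { gemini_review: bool },
    /// One CLI acts as Planner, Builder, and Reviewer.
    Single(Cli),
}

pub fn choose_plan(claude: bool, codex: bool, gemini: bool) -> Result<Plan, String> {
    if claude && codex {
        return Ok(Plan::Superpower { gemini_review: gemini });
    }
    // Assumed preference order for the single-CLI fallback.
    let fallback = [(Cli::Codex, codex), (Cli::Claude, claude), (Cli::Gemini, gemini)];
    fallback
        .iter()
        .find(|(_, ready)| *ready)
        .map(|(cli, _)| Plan::Single(*cli))
        // Nothing ready: stop before creating the project.
        .ok_or_else(|| "no AI CLI is ready: install or log in to codex, claude, or gemini first".to_string())
}
```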

For a fast harness-only smoke test that does not call Codex:

./bin/aih --agent noop --project ./work/smoke "make a smoke test project"

Verify

rustc --edition=2021 src/main.rs -o target/debug/aih
./bin/aih --workflow single --agent noop --project ./work/smoke "make a smoke test project"
./bin/aih --workflow single --agent noop --no-agent-terminals --continue ./work/smoke "add one small improvement"

The active implementation is Rust-only. Legacy Python package code and compatibility tests were removed so there is one runtime to reason about.

Continue A Project

Generated projects are meant to be living workspaces. If the result is not good enough, run another iteration against the same directory:

./bin/aih --continue ./work/smoke "make the layout clearer and fix broken buttons"

Continuation runs keep the existing project files, write run-specific prompt/summary/review files, reuse HARNESS_TASKS.tsv as the current iteration ledger, and produce a fresh log under logs/. Claude plans the delta, Codex modifies the existing code, Gemini reviews the new result, and Harness verifies again.

After an interactive run finishes, aih keeps the terminal open with a short prompt. You can try the project, then type a follow-up requirement directly into that prompt to start another iteration against the same project. Press Enter on an empty line to exit. Use --no-interactive for scripts and smoke tests.
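The follow-up prompt behaves like a read loop that stops at an empty line. A sketch under the assumption that each non-empty line is handed to a continuation run; `follow_up_loop` and the `run_iteration` callback are hypothetical, and printing the prompt itself is omitted.

```rust
use std::io::BufRead;

/// Reads follow-up requirements until an empty line or end of input and
/// returns how many iterations were started.
pub fn follow_up_loop<R: BufRead, F: FnMut(&str)>(input: R, mut run_iteration: F) -> usize {
    let mut iterations = 0;
    for line in input.lines() {
        let Ok(line) = line else { break };
        let requirement = line.trim();
        if requirement.is_empty() {
            break; // an empty line exits
        }
        // Stand-in for a --continue run against the same project.
        run_iteration(requirement);
        iterations += 1;
    }
    iterations
}
```

In an interactive session `input` would be `std::io::stdin().lock()`; taking any `BufRead` keeps the loop testable without a terminal.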

Live Output

Long-running agent commands stream useful progress to the terminal while preserving full raw output in logs/.

The default terminal view is a pinned orchestration map, not a flat grid of agents:

Harness is shown first as the Rust runtime that owns workspace setup, prompts, handoffs, verification, interrupt reports, and repair loops.
The work is then shown as a vertical pipeline: 1 PLAN / Claude Code -> 2 BUILD / Codex -> 3 REVIEW / Gemini CLI -> 4 VERIFY / Harness.
Each stage shows role, state, input artifact, output artifact, current activity, elapsed time, and recent trace.
Codex, Claude Code, and Gemini CLI readiness is detected at startup and remains pinned on the main control deck for the whole run, including installed/missing and logged-in/ready state.
Handoffs are displayed as a trail under the pipeline so the execution order stays clear.
A quantified task ledger is displayed under the pipeline. Claude writes HARNESS_TASKS.tsv with fine-grained tasks, owners, statuses, titles, and acceptance evidence; agents update the board by printing CLAIM TASK <id> and DONE TASK <id>.
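A minimal sketch of the task ledger and the CLAIM TASK / DONE TASK protocol. The column order (id, owner, status, title, evidence), the header row, and the status strings in_progress and done are assumptions; the source names these fields but not their layout or values.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, PartialEq)]
pub struct Task {
    pub id: String,
    pub owner: String,
    pub status: String,
    pub title: String,
    pub evidence: String,
}

/// Parses HARNESS_TASKS.tsv-style text, assuming a header row and the
/// column order id, owner, status, title, evidence.
pub fn parse_ledger(tsv: &str) -> BTreeMap<String, Task> {
    tsv.lines()
        .skip(1) // assumed header row
        .filter_map(|row| {
            let cols: Vec<&str> = row.split('\t').collect();
            if cols.len() < 5 {
                return None; // skip malformed rows
            }
            let task = Task {
                id: cols[0].to_string(),
                owner: cols[1].to_string(),
                status: cols[2].to_string(),
                title: cols[3].to_string(),
                evidence: cols[4].to_string(),
            };
            Some((task.id.clone(), task))
        })
        .collect()
}

/// Applies one agent output line; returns true if it updated a task.
pub fn apply_agent_line(ledger: &mut BTreeMap<String, Task>, line: &str) -> bool {
    let mut words = line.split_whitespace();
    let new_status = match (words.next(), words.next()) {
        (Some("CLAIM"), Some("TASK")) => "in_progress",
        (Some("DONE"), Some("TASK")) => "done",
        _ => return false, // not a board event
    };
    let Some(id) = words.next() else { return false };
    match ledger.get_mut(id) {
        Some(task) => {
            task.status = new_status.to_string();
            true
        }
        None => false, // unknown task id
    }
}
```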

The main terminal is a pinned control deck. It refreshes in place and shows the hierarchy, current owner, task progress bars, handoffs, and the pixel puppy's location, but it does not scroll raw worker logs. Claude and Codex are also prompted to emit structured progress lines such as PROGRESS: PLAN, PROGRESS: CODE, PROGRESS: TEST, and PROGRESS: FIX; worker terminals render these with distinct symbols so planning, coding, testing, and repair work are easy to scan.

Execution is task-ledger driven by default. Claude must write HARNESS_TASKS.tsv; Harness normalizes owners, sends Codex only the codex implementation queue, sends Gemini the gemini review queue, and keeps the harness verification tasks for self-check, QA, and summary. Codex claims and completes quantified tasks one by one, which keeps the build anchored to small verifiable units instead of a single vague request.

Execution is also acceptance-first by default. Claude must write HARNESS_ACCEPTANCE.md with Given/When/Then scenarios, automated checks, manual QA checks, edge cases, and regressions. Codex is instructed to encode tests and self_check.sh before or alongside feature code, and Gemini reviews the result against that acceptance contract.

Interactive runs are demo-first. Claude writes HARNESS_QUESTIONNAIRE.md, HARNESS_QUESTIONNAIRE.tsv, HARNESS_STYLE_OPTIONS.md, and HARNESS_DEMO_PLAN.md, and Harness opens browser pages for the questionnaire, demo style selection, demo confirmation, and iteration choice. The pages support switching between Chinese and English and are choice-based, so the workflow does not require typing in the terminal. Every demo must expose a working local or LAN URL through HARNESS_DEMO_URL.md; screenshots, source files, or prose alone are not enough.

When output is captured by Codex, CI, or a redirected file, the same board events are printed as compact [aih] status lines so the output stays readable.
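The owner routing and the PROGRESS lines described above can be sketched as two small functions. Owner normalization by substring match, the fallback of unrecognized owners to the harness queue, and the exact PROGRESS parsing are assumptions for illustration, not aih's implementation.

```rust
/// Per-agent queues of task ids.
#[derive(Debug, Default, PartialEq)]
pub struct Queues {
    pub codex: Vec<String>,
    pub gemini: Vec<String>,
    pub harness: Vec<String>,
}

/// Normalizes a free-form owner cell. The accepted spellings are assumptions.
pub fn normalize_owner(raw: &str) -> &'static str {
    let owner = raw.trim().to_ascii_lowercase();
    if owner.contains("codex") {
        "codex"
    } else if owner.contains("gemini") {
        "gemini"
    } else {
        "harness" // anything else stays with Harness for verification
    }
}

/// Splits (task id, owner) pairs into the three queues.
pub fn route(tasks: &[(&str, &str)]) -> Queues {
    let mut queues = Queues::default();
    for &(id, owner) in tasks {
        let queue = match normalize_owner(owner) {
            "codex" => &mut queues.codex,
            "gemini" => &mut queues.gemini,
            _ => &mut queues.harness,
        };
        queue.push(id.to_string());
    }
    queues
}

/// Extracts the stage from a line such as "PROGRESS: TEST running checks".
pub fn progress_stage(line: &str) -> Option<&str> {
    let stage = line.trim().strip_prefix("PROGRESS:")?.split_whitespace().next()?;
    matches!(stage, "PLAN" | "CODE" | "TEST" | "FIX").then_some(stage)
}
```

Routing by normalized owner means Codex never sees review or verification tasks, which matches the idea of sending each agent only its own queue.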

By default on macOS, Harness opens one worker Terminal per active agent. These windows tail files such as logs/<run>.claude.log, logs/<run>.codex.log, logs/<run>.gemini.log, and logs/<run>.harness.log; each worker log is formatted with a colored title, table-like rows, progress bars, heartbeats, task markers, and an end summary. The pixel puppy appears in the active worker terminal and disappears after that worker finishes. Use --no-agent-terminals for quiet runs.

Press Ctrl+C to interrupt. Partial files are kept, and Harness writes an INTERRUPTED summary with the project and log paths.

Frontend QA And Repair Loop

For frontend projects, --qa writes HARNESS_QA.md and checks common generated-app failures:

missing or broken local asset references
unlabeled buttons and form controls
JavaScript syntax errors when node is available
missing event-handler wiring
missing responsive CSS signals
fixed widths that obviously overflow on mobile
missing run instructions in README
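As an example of the first QA check, here is a naive scan for local src and href targets that do not exist on disk. It is a string search rather than an HTML parser (it would also match attributes like data-src), `missing_local_assets` is a hypothetical name, and aih's real check may work differently.

```rust
use std::path::Path;

/// Returns local `src="..."` and `href="..."` targets in `html` that do not
/// exist under `root`. A naive string scan, not a real HTML parser.
pub fn missing_local_assets(html: &str, root: &Path) -> Vec<String> {
    let mut missing = Vec::new();
    for attr in ["src=\"", "href=\""] {
        let mut rest = html;
        while let Some(start) = rest.find(attr) {
            let after = &rest[start + attr.len()..];
            let Some(end) = after.find('"') else { break };
            let target = &after[..end];
            rest = &after[end + 1..];
            // Skip anchors, remote URLs, protocol-relative URLs, and inline data.
            let external = target.is_empty()
                || target.starts_with('#')
                || target.starts_with("//")
                || target.contains("://")
                || target.starts_with("data:")
                || target.starts_with("mailto:");
            if external {
                continue;
            }
            // Drop any query string or fragment, and treat "/x" as relative to root.
            let file = target.split(['?', '#']).next().unwrap_or(target);
            if !root.join(file.trim_start_matches('/')).exists() {
                missing.push(target.to_string());
            }
        }
    }
    missing
}
```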

--fix-rounds N (default 1) sets how many times Codex may repair failures; after each repair, self_check.sh, QA, and review run again.
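The repair loop can be read as: verify once, then repair and re-verify until the checks pass or the round budget is spent. A sketch in which `verify` and `repair` are hypothetical closures standing in for self_check.sh/QA/review and for a Codex repair call; stopping as soon as a pass is seen is an assumption about aih's behavior.

```rust
/// Outcome of one verification pass.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Verdict {
    Pass,
    Fail,
}

/// Runs `verify`, then lets `repair` fix failures up to `fix_rounds` times,
/// re-verifying after each repair. Returns the final verdict and the number
/// of repair rounds used.
pub fn verify_with_repairs<V, R>(fix_rounds: u32, mut verify: V, mut repair: R) -> (Verdict, u32)
where
    V: FnMut() -> Verdict,
    R: FnMut(u32),
{
    let mut verdict = verify();
    let mut used = 0;
    while verdict == Verdict::Fail && used < fix_rounds {
        used += 1;
        repair(used); // round number, 1-based
        verdict = verify();
    }
    (verdict, used)
}
```

With the default of one round, a project that still fails after a single repair ends with a Fail verdict instead of looping indefinitely.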

Generated Project Contract

Each generated project should include:

README.md
HARNESS_PROMPT.md
HARNESS_QUESTIONNAIRE.md
HARNESS_QUESTIONNAIRE.tsv
HARNESS_QUESTIONNAIRE_FORM.html
HARNESS_STYLE_OPTIONS.md
HARNESS_STYLE_OPTIONS_FORM.html
HARNESS_DEMO_PLAN.md
HARNESS_DEMO_CONFIRM.html
HARNESS_DEMO_URL.md
HARNESS_ACCEPTANCE.md
HARNESS_SUMMARY.md
HARNESS_RUN.json
self_check.sh
source files
dependency files when needed
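The fixed file names in this contract are easy to check mechanically. A sketch that lists which of them are missing from a project directory; the function and constant names are hypothetical, and source and dependency files are skipped because their names vary per project.

```rust
use std::path::Path;

/// Fixed file names from the generated-project contract.
pub const CONTRACT_FILES: &[&str] = &[
    "README.md",
    "HARNESS_PROMPT.md",
    "HARNESS_QUESTIONNAIRE.md",
    "HARNESS_QUESTIONNAIRE.tsv",
    "HARNESS_QUESTIONNAIRE_FORM.html",
    "HARNESS_STYLE_OPTIONS.md",
    "HARNESS_STYLE_OPTIONS_FORM.html",
    "HARNESS_DEMO_PLAN.md",
    "HARNESS_DEMO_CONFIRM.html",
    "HARNESS_DEMO_URL.md",
    "HARNESS_ACCEPTANCE.md",
    "HARNESS_SUMMARY.md",
    "HARNESS_RUN.json",
    "self_check.sh",
];

/// Returns the contract files that are not regular files under `project`.
pub fn missing_contract_files(project: &Path) -> Vec<&'static str> {
    CONTRACT_FILES
        .iter()
        .copied()
        .filter(|name| !project.join(name).is_file())
        .collect()
}
```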

Planned Local Layout

bin/
  aih
Cargo.toml
src/
  main.rs
  board.rs
  tasks.rs
templates/
  build_prompt.md
projects/
logs/
docs/
third_party/

Current Planning Document

See docs/IMPLEMENTATION_PLAN.md for the first implementation plan.