@shpitdev/codexharness
v0.0.5
Codex conductor, nanny, and TUI runtime harness.
codex-orchestration
Browser-observable multi-agent orchestration on Codex app-server.
What This Repo Does
- codex-conductor: runs multi-agent implementation loops (solutionlead -> engineer -> tester) and captures full run telemetry.
- codex-nanny: a separate thread watcher for human Codex sessions; sends idle nudges and follow-up prompts.
- Browser monitor: live and rewind visualization of run events, handoffs, messages, and checkpoints.
The runtime is app-server only (no SDK backend path).
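As a mental model of the solutionlead -> engineer -> tester loop, one pass can be sketched as a fixed role order where each role's output becomes the next role's input. This is a minimal sketch, not the code in packages/core: the `RunTurn` callback is a hypothetical stand-in for a turn on a Codex app-server thread, and `runLoop`/`TurnResult` are illustrative names.

```typescript
// Role names come from the README; everything else is a hypothetical sketch.
type Role = "solutionlead" | "engineer" | "tester";

interface TurnResult {
  role: Role;
  output: string;
}

// Hypothetical stand-in for executing one role turn on a Codex app-server thread.
type RunTurn = (role: Role, input: string) => Promise<string>;

const ROLE_ORDER: readonly Role[] = ["solutionlead", "engineer", "tester"];

// Runs one pass of the loop: each role receives the previous role's output as its handoff.
async function runLoop(prompt: string, runTurn: RunTurn): Promise<TurnResult[]> {
  const results: TurnResult[] = [];
  let handoff = prompt;
  for (const role of ROLE_ORDER) {
    const output = await runTurn(role, handoff);
    results.push({ role, output });
    handoff = output;
  }
  return results;
}
```

Injecting the turn executor as a callback keeps the ordering logic testable without a live app-server.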
How Conductor Works
- Start a run from a spec file or --prompt.
- The runner executes role turns through Codex app-server threads.
- Every event is persisted to run artifacts (events.jsonl, per-role logs, turn snapshots, reports).
- A local monitor service serves the web viewer from the apps/web build artifacts for live state and rewind.
- The monitor stays alive after completion so runs can be reviewed later.
Getting Started
bun install
bun link
After bun link, the following commands are available globally:
- codex-conductor
- codex-nanny
- codex-tui
Build host-native CLI binaries:
bun run build:cli
Outputs:
- dist/codex-conductor
- dist/codex-nanny
- dist/codex-tui
- dist/monitor-web/ (a staged copy of apps/web/dist used by monitor serving)
Compiled binaries include runner/nanny/TUI internals (no runtime dependency on src/*.ts paths).
Fresh-machine binary flow check (isolated bundle, no source-path dependency):
bun run test:e2e:fresh-binary -- --dist-dir ./dist
npm packaging plan and publish gates:
- docs/npm-packaging-plan.md
- RELEASE.md
Target public package: @shpitdev/codexharness.
Monorepo Scaffold
The repo now includes first-pass app/package boundaries for the long-term split:
- apps/web: Solid web monitor viewer (bun run dev:web)
- apps/tui: OpenTUI run list, status badges, live events tail, turn detail inspector, and final gate panel (bun run dev:tui)
- packages/core: core runner/state/policy/audit runtime
- packages/cli: conductor/nanny/monitor command and daemon surfaces
- packages/monitor-api: monitor run/event discovery and API response shaping
Runtime now lives in packages/core + packages/cli.
Current extracted core modules:
- packages/core/src/state.ts
- packages/core/src/threadTypes.ts
- packages/core/src/audit.ts
- packages/core/src/policy.ts
- packages/core/src/runDirs.ts
- packages/core/src/threadBackend.ts
- packages/core/src/appServerClient.ts
- packages/core/src/threadEvents.ts
- packages/core/src/threadBackendAppServer.ts
- packages/core/src/evidence.ts
- packages/core/src/artifacts.ts
- packages/core/src/io.ts
- packages/core/src/report.ts
- packages/core/src/runner.ts
- packages/core/src/agentDocs.ts
- packages/core/src/chime.ts
- packages/core/src/cliArgs.ts
- packages/core/src/env.ts
- packages/core/src/gitignore.ts
- packages/core/src/schema.ts
- packages/core/src/threadState.ts
- packages/core/src/todos.ts
Current extracted CLI modules:
- packages/cli/src/codexConductor.ts
- packages/cli/src/codexNanny.ts
- packages/cli/src/conductorMonitor.ts
- packages/cli/src/nanny.ts
- packages/cli/src/nannyPolicy.ts
- packages/cli/src/nannyState.ts
- packages/cli/src/thread-cli.ts
- packages/cli/src/report-cli.ts
Current extracted monitor API modules:
packages/monitor-api/src/index.ts
Source-root compatibility shims have been removed; scripts/tests now import package modules directly.
Quickstart
Run the conductor in the current folder:
codex-conductor -p "implement X in this repo"
Run from a spec:
codex-conductor specs/v1/example.md --fresh
Defaults:
- workdir: current directory
- monitor: auto-start detached local daemon process
- model: gpt-5.3-codex-spark
- reasoning effort: xhigh
- verification gate: the tester must provide verification evidence when should_test=true
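The verification gate rule above can be read as a simple predicate: it passes trivially when testing was not requested and otherwise requires at least one non-empty piece of evidence. This is a sketch under assumptions; only `should_test` comes from the README, while `TesterReport`, `verification_evidence`, and `verificationGate` are hypothetical names.

```typescript
// Hypothetical shape of the tester's structured output; only should_test is from the README.
interface TesterReport {
  should_test: boolean;
  verification_evidence: string[]; // hypothetical field name
}

type GateResult = { passed: true } | { passed: false; reason: string };

// Passes when testing was not requested; otherwise requires at least one
// non-blank evidence entry (e.g. a command transcript or test log path).
function verificationGate(report: TesterReport): GateResult {
  if (!report.should_test) {
    return { passed: true };
  }
  const evidence = report.verification_evidence.filter((item) => item.trim().length > 0);
  if (evidence.length === 0) {
    return { passed: false, reason: "should_test=true but no verification evidence was provided" };
  }
  return { passed: true };
}
```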
At run end, conductor prints the monitor URL for that run.
Usage:
codex-conductor <spec-file|-p|--prompt ...> [runner flags]
codex-conductor monitor <start|status|stop|open> [--workdir <path>] [--port <n>]
codex-conductor tui [--workdir <path>] [--monitor-port <n>] [--monitor-base-url <url>]
codex-tui [--workdir <path>] [--monitor-port <n>] [--monitor-base-url <url>]
Model override example:
codex-conductor --prompt "implement X" --model gpt-5.3-codex-spark --model-reasoning-effort xhigh
Reviewing A Run
- Start a run with codex-conductor.
- Open the printed monitor URL (or run codex-conductor monitor open --workdir .).
- Use the rewind slider for event-by-event replay.
- Select stage nodes or trace cards to inspect parsed turn detail (trigger/actions/output/next).
- Switch between Trace and Raw to compare consolidated turn cards with raw event rows.
- Expand Todo Overview to inspect the latest per-role todos and the todo update history.
- Review final artifacts under .runner-state/runs/<runId>/.
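Event-by-event rewind can be understood as folding the ordered event list up to the slider position to rebuild the view state. The sketch below assumes a hypothetical `MonitorEvent` shape with made-up event types ("turn.started", "turn.completed", "handoff") and a `ViewState` that tracks only the active role and completed turn counts; the actual monitor's event schema and state model are not documented here.

```typescript
// Hypothetical event and view-state shapes for illustration only.
interface MonitorEvent {
  seq: number;
  role: string;
  type: "turn.started" | "turn.completed" | "handoff";
  to?: string; // target role for handoff events
}

interface ViewState {
  activeRole: string | null;
  completedTurns: Record<string, number>;
}

// Rebuilds the view state as of `cursor` (an inclusive index into events,
// which are assumed to be sorted by seq) by replaying from the start.
function replay(events: readonly MonitorEvent[], cursor: number): ViewState {
  const state: ViewState = { activeRole: null, completedTurns: {} };
  for (const ev of events.slice(0, cursor + 1)) {
    switch (ev.type) {
      case "turn.started":
        state.activeRole = ev.role;
        break;
      case "turn.completed":
        state.completedTurns[ev.role] = (state.completedTurns[ev.role] ?? 0) + 1;
        state.activeRole = null;
        break;
      case "handoff":
        state.activeRole = ev.to ?? null;
        break;
    }
  }
  return state;
}
```

Replaying from the start costs O(n) per slider move; caching periodic snapshots (similar in spirit to the per-turn snapshots the runner writes) would be the usual optimization.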
Monitor Commands
codex-conductor monitor status
codex-conductor monitor start --workdir . --port 42427
codex-conductor monitor open --workdir .
codex-conductor monitor stop --workdir .
Notes:
- Monitor home lists recent run metadata with stage/status and prompt previews.
TUI Command
codex-conductor tui --workdir .
codex-tui --workdir .
codex-tui --monitor-base-url http://127.0.0.1:42427
Notes:
- codex-conductor tui and codex-tui auto-start the monitor for --workdir unless --monitor-base-url is provided.
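The auto-start rule can be sketched as a small decision: an explicit --monitor-base-url means "attach to an existing monitor", and otherwise the TUI targets a local monitor it starts itself. Treating 42427 (the port used in the examples above) as the default, and binding to 127.0.0.1, are assumptions; `planMonitor`, `TuiOptions`, and `MonitorPlan` are hypothetical names.

```typescript
// Hypothetical option/plan shapes mirroring the --workdir, --monitor-port,
// and --monitor-base-url flags.
interface TuiOptions {
  workdir: string;
  monitorPort?: number;
  monitorBaseUrl?: string;
}

interface MonitorPlan {
  baseUrl: string;
  autoStart: boolean;
}

// Assumption: the port from the README examples is treated as the default.
const ASSUMED_DEFAULT_PORT = 42427;

function planMonitor(opts: TuiOptions): MonitorPlan {
  if (opts.monitorBaseUrl) {
    // Explicit base URL: attach to an already-running monitor, no auto-start.
    return { baseUrl: opts.monitorBaseUrl.replace(/\/+$/, ""), autoStart: false };
  }
  // No base URL: auto-start a local monitor for opts.workdir on the chosen port.
  const port = opts.monitorPort ?? ASSUMED_DEFAULT_PORT;
  return { baseUrl: `http://127.0.0.1:${port}`, autoStart: true };
}
```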
Nanny (Separate Interaction)
Thread watcher examples:
bun run nanny -- --dry-run --once
bun run nanny -- --idle-seconds 240 --cooldown-seconds 900
Tmux launcher examples:
codex-nanny .
codex-nanny --workdir . -- --model gpt-5.3-codex
codex-nanny starts a tmux session with two panes:
- left pane: nanny monitor process
- right pane: Codex interactive session
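The --idle-seconds and --cooldown-seconds flags suggest a nudge policy along these lines: nudge only after the thread has been idle long enough, and never more often than the cooldown allows. The exact semantics in packages/cli (nannyPolicy.ts) are not documented here, so this is a sketch under that assumption; `NannyState` and `shouldNudge` are hypothetical names.

```typescript
// Hypothetical watcher state; timestamps are epoch milliseconds.
interface NannyState {
  lastActivityMs: number;
  lastNudgeMs: number | null;
}

// Nudge when idle for at least idleSeconds AND at least cooldownSeconds
// have passed since the previous nudge (if there was one).
function shouldNudge(
  state: NannyState,
  nowMs: number,
  idleSeconds: number,
  cooldownSeconds: number,
): boolean {
  if (nowMs - state.lastActivityMs < idleSeconds * 1000) {
    return false; // thread is not idle long enough yet
  }
  if (state.lastNudgeMs !== null && nowMs - state.lastNudgeMs < cooldownSeconds * 1000) {
    return false; // still inside the cooldown window
  }
  return true;
}
```

With --idle-seconds 240 --cooldown-seconds 900, a thread idle since t=0 would be nudged at t=300s under this policy, but not again until t=1200s or later.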
Local State And Artifacts
Per target repo:
- <workdir>/.runner-state/
- <workdir>/.runner-state/runs/<runId>/
Per-run artifacts:
- manifest.json
- state.json
- events.jsonl (canonical timeline)
- events-by-role/*.jsonl
- turn-*.json
- report.md / report.json / mermaid outputs
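Because events.jsonl is the canonical timeline, a per-role view like events-by-role/*.jsonl can in principle be rebuilt from it by grouping lines on their role. This sketch assumes only that each line is a JSON object with a string `role` field; `TimelineEvent` and `groupByRole` are hypothetical names, and insertion order within each role is preserved.

```typescript
// Hypothetical minimal event shape: only `role` is relied on here.
interface TimelineEvent {
  role: string;
  [key: string]: unknown;
}

// Parses JSON Lines text and buckets events by role, keeping timeline order.
function groupByRole(jsonl: string): Map<string, TimelineEvent[]> {
  const byRole = new Map<string, TimelineEvent[]>();
  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue; // tolerate blank and trailing lines
    const event = JSON.parse(line) as TimelineEvent;
    const bucket = byRole.get(event.role) ?? [];
    bucket.push(event);
    byRole.set(event.role, bucket);
  }
  return byRole;
}
```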
Web monitor assets are built once under apps/web/dist (and staged to dist/monitor-web by bun run build:cli).
Regenerate report:
bun run report -- --artifacts .runner-state --svg
Validation
Typecheck:
bunx tsc -p tsconfig.json --noEmit
Tests:
bun run test:unit
Unit tests are intentionally limited to unit/component scope; they do not claim full real-run end-to-end verification.
Unit test files are named *.unit.test.ts and live under tests/.
Manual end-to-end scenario stub:
docs/scenarios/real-run-e2e.scenario.stub.md
Real-run e2e harness (executes a real prompt, then asserts high-level scenario checks):
bun run test:e2e:real -- --workdir . --prompt "implement X and verify"
Run the real e2e harness against a compiled conductor binary artifact:
bun run test:e2e:real -- --workdir . --prompt "implement X and verify" --conductor-bin ./dist/codex-conductor
Optional expected output artifact check:
bun run test:e2e:real -- --workdir . --run-id <runId> --expected-output-path output/result.json
PTY-driven TUI e2e harness (real monitor API + real OpenTUI process in a pseudo-terminal):
bun run test:e2e:tui
Keep the seeded workdir for debugging:
bun run test:e2e:tui -- --keep-workdir
Hard real-generation validation:
scripts/validate-real-generations.sh /path/to/target/repo
CI notes:
- CI / build-cli builds dist/codex-conductor, dist/codex-nanny, and dist/codex-tui and uploads them as workflow artifacts.
- CI / build-cli also runs bun run test:e2e:fresh-binary -- --dist-dir ./dist before artifact upload.
- CI / e2e-tui-pty runs the PTY-driven TUI e2e (bun run test:e2e:tui) and uploads .memory/tui-pty-e2e logs/artifacts.
- CI / e2e-real-binary runs on every PR using --conductor-bin ./dist/codex-conductor, cost-tuned model candidates (codex-mini-latest first), low reasoning effort, and bounded retries, and publishes full command output plus run artifacts.
- CI / e2e-real-binary requires the repository/org secret OPENAI_API_KEY so codex app-server can run in CI.
- CodeQL must stay green (no new code scanning alerts on changed code).
- Required check list for branch protection: docs/required-checks.md
CLI productization roadmap:
docs/cli-roadmap.md
Advanced Direct Runner
bun run runner -- specs/v1/example.md --workdir .
Screenshots
[Screenshot: conductor monitor]
[Screenshot: nanny interaction]
Thread Lifecycle CLI
bun run threads -- list --limit 20
bun run threads -- read --thread-id <threadId> --include-turns
bun run threads -- archive --thread-id <threadId>
bun run threads -- unarchive --thread-id <threadId>
bun run threads -- compact --thread-id <threadId>