@malayvuong/conductor-ai

v2026.3.1

Published

8 days ago

Autonomous execution supervisor for AI coding CLIs

0High
0Medium
0Low

malayvuong

ai supervisor cli claude codex automation orchestration

Conductor

Supervisor for AI coding CLIs. Manages sessions, decomposes goals into work packages, runs engines (Claude, Codex) in a loop, and produces structured reports with full execution history.

User
  |
  v
cdx config set engine claude               # one-time setup
cdx doctor                                # check environment
cdx session start my-project
cdx execute plan.md --until-done          # plan mode
cdx execute "fix login bug" --until-done  # no-plan mode
cdx status                                # check progress
cdx inspect                               # deep dive
cdx inspect --goal 1 --insights           # drill into goal
cdx session pause                         # pause current session
cdx session switch other-project          # switch sessions
  |
  v
┌─────────────── Supervisor Layer ───────────────┐
│  Session → Goal → Work Packages → Snapshots    │
│  Compactor (context between runs)              │
│  Scheduler (WP ordering, retries, blockers)    │
│  Progress detection (evidence-based)           │
│  Closeout summary (per goal)                   │
└────────────────────┬───────────────────────────┘
                     |
┌────────────────────v───────────────────────────┐
│  Execution Layer                               │
│  Task → Run → Logs → Report → Heartbeat        │
│  Engine adapters (Claude, Codex)               │
│  Stream parser, log interpreter                │
└────────────────────────────────────────────────┘
                     |
                     v
               SQLite (local)

Architecture

Two-layer design:

Supervisor Layer — Sessions, goals, work packages, snapshots, execution attempts. Manages the "what to do next" loop: parse plan → create WPs → dispatch engine → evaluate result → snapshot state → repeat or finish.
Execution Layer — Tasks, runs, logs, reports, heartbeat. Handles the "how to run" mechanics: classify task → build prompt → spawn engine → stream output → generate report.

Session-first UX: Sessions are the primary user-facing surface. Session names are reusable labels (e.g. solo-defender, ispa-cms) — not permanent unique IDs. When a session completes, it is archived automatically, and the label becomes available for a new run. Goals are internal state managed by the supervisor. Users never need to handle goal IDs or invent -v2 suffixes.

Prerequisites

Node.js 22+
One of the supported AI CLI tools installed:
- Claude Code (claude)
- Codex CLI (codex)

Install

git clone <repo-url> conductor
cd conductor
npm install
npm run build
npm link          # makes `cdx` available globally

Quick Start

# First time: set defaults (once)
cdx config set engine claude
cdx config set path /path/to/project

# Check environment
cdx doctor

# Start a session
cdx session start my-project

# Execute a plan file (decomposes into WPs, runs in loop)
cdx execute plan.md --until-done

# Or execute an ad-hoc task (no plan file needed)
cdx execute "fix the login API 500 error" --until-done

# Check progress
cdx status

# Deep inspection (WPs, attempts, snapshots, closeout)
cdx inspect

# Resume after interruption (Ctrl+C pauses, next execute resumes)
cdx execute --until-done

# View session history
cdx session history
cdx session history my-project            # all runs for a label

# Session management
cdx session current                       # which session is active
cdx session pause                         # pause session + goal
cdx session resume                        # resume most recent paused
cdx session switch other-project          # switch sessions
cdx session close                         # close + archive session

# Label reuse — after completion, same label starts a fresh run
cdx session start my-project              # → creates run #2

# Inspect drill-down
cdx inspect --goal 1                      # single goal detail
cdx inspect --goal 1 --attempts           # attempt timeline
cdx inspect --goal 1 --snapshots          # snapshot chain
cdx inspect --goal 1 --insights           # decisions, assumptions, questions
cdx inspect --label my-project            # all runs for a label (including archived)

Commands

Session Management

`cdx session start <label>`

Start a session under a reusable label.

cdx session start solo-defender                        # uses config defaults
cdx session start solo-defender --engine codex          # override engine
cdx session start solo-defender --path /other/project   # override path

| Flag | Required | Description | |------|----------|-------------| | --engine | No | Override default engine. Uses config/env fallback if omitted. | | --path | No | Override workspace path. Uses config default or cwd if omitted. |

Behavior depends on existing sessions for that label:

| Existing state | Action | |----------------|--------| | No sessions | Create new session (run #1) | | Only archived runs | Create new session (run #N+1), show prior run count | | Active/created session | Focus existing session | | Paused session | Resume existing session |

Labels are reusable — once a session is archived (completed or closed), the same label can be used for a fresh run. No need to invent -v2, -v3 suffixes.

`cdx session list`

List all sessions with status and goal count.

cdx session list
cdx session list --status active

`cdx session current`

Show which session is active.

`cdx session pause`

Pause current session and its active goal.

`cdx session resume [name]`

Resume a paused session. Without name, resumes most recent paused session.

`cdx session switch <name>`

Switch to another session (pauses current, activates target).

`cdx session close`

Close and archive the current session. Completed goals stay completed, unfinished goals are abandoned with closeout summaries. The label becomes available for a new session.

`cdx status`

Show current session status: engine, path, active goal, WP progress, retries. Includes hygiene warnings for stale sessions and too many paused goals.

`cdx inspect`

Detailed inspection of current session or a label's full history.

cdx inspect                          # current active session
cdx inspect --label solo-defender    # all runs for a label (including archived)
cdx inspect --goal <N>               # single goal detail
cdx inspect --goal <N> --attempts    # attempt timeline
cdx inspect --goal <N> --snapshots   # snapshot chain
cdx inspect --goal <N> --insights    # decisions, assumptions, questions, follow-ups, constraints

The --label flag is especially useful after a session has been archived — it lets you inspect completed work without needing an active session.

`cdx session history [label]`

View session goal history. Without a label, shows the current active session. With a label, shows all runs (including archived) grouped by run index.

cdx session history                  # current session
cdx session history solo-defender    # all runs for label

Execution

`cdx execute [source] --until-done`

The main execution command. Behavior depends on input:

| Input | Mode | Behavior | |-------|------|----------| | cdx execute plan.md --until-done | Plan mode | Parse plan → create WPs → run loop | | cdx execute "fix bug" --until-done | No-plan mode | Create single WP → run with evidence-based completion | | cdx execute --until-done | Resume | Continue active unfinished goal |

Resume vs New Goal rules:

Source provided (file or text) → always creates new goal. If an active unfinished goal exists, it is auto-paused.
No source → resume only. Continues the active unfinished goal.

Execution Layer (Low-Level)

`cdx run`

Run a single task directly (bypasses supervisor layer).

cdx run --engine claude --task "fix the login bug"

`cdx resume <taskId>`

Resume a task with curated context from previous runs.

`cdx tasks`

List all tasks with status.

`cdx logs <runId>`

View saved logs for a run.

`cdx report <runId>`

View the structured report for a run.

`cdx runs show <runId>`

Inspect run metadata.

Configuration

`cdx config set <key> <value>`

Set a global config value. Short aliases supported.

cdx config set engine claude        # set default engine
cdx config set path /path/to/project  # set default workspace path
cdx config set heartbeat 30         # heartbeat interval (seconds)
cdx config set stuck-threshold 120  # stall detection threshold (seconds)

`cdx config get <key>`

Get a single config value.

cdx config get engine               # → claude

`cdx config show`

Show all config values. On fresh install, shows quick-start guide.

`cdx config unset <key>`

Remove a config value.

cdx config unset engine

| Key | Alias for | Description | |-----|-----------|-------------| | engine | defaultEngine | Default engine: claude or codex | | path | defaultPath | Default workspace path | | heartbeat | heartbeatIntervalSec | Heartbeat check interval (seconds, default 15) | | stuck-threshold | stuckThresholdSec | Stall detection threshold (seconds, default 60) |

Legacy aliases: cdx set-path, cdx get-path, cdx clear-path still work.

`cdx doctor`

Check environment and configuration. Shows what's set up, what's missing, and what to do next.

cdx doctor

  [OK] Config file          ~/.conductor/config.json
  [OK] Default engine       claude
  [OK] Engine: claude       found in PATH
  [--] Engine: codex        not found in PATH
  [OK] Default path         /Users/me/projects/my-app
  [OK] Active session       my-project [active]

Ready to go.

Configuration

Persistent config at ~/.conductor/config.json:

{
  "defaultPath": "/Users/me/projects/my-app",
  "defaultEngine": "claude",
  "heartbeatIntervalSec": 15,
  "stuckThresholdSec": 60
}

Engine Resolution

Engine is resolved with a priority chain — no need to specify --engine every time:

--engine flag on current command
Engine stored in session
defaultEngine in config
DEFAULT_ENGINE environment variable
Helpful error with onboarding instructions

# One-time setup
cdx config set engine claude

# Then just use sessions
cdx session start my-project

Supervisor Loop

The supervisor loop (cdx execute ... --until-done) works as follows:

Parse input — Plan file → decompose into work packages. Ad-hoc text → single WP.
Schedule — Pick next WP by seq order, skip completed/blocked.
Build prompt — Include goal context, WP description, snapshot from previous run (if any), done criteria.
Dispatch engine — Spawn Claude/Codex with the prompt.
Evaluate — Parse report, detect progress, check WP completion.
Snapshot — Capture state (completed items, in-progress, remaining, decisions, files, blockers) for next run's context.
Advance or retry — Complete WP → next WP. No progress → retry with escalated prompt strategy. Exhausted retries → mark failed.
Finalize check — After each WP result, immediately check if all WPs are done. If yes, finalize atomically (goal + session + closeout in one transaction). This prevents the SIGINT race where Ctrl+C during the last engine run could leave stale state.
Loop until all WPs done or hard-blocked.

Prompt strategy escalation: normal → focused → surgical → recovery (based on retry count).

Evidence-based completion (ad-hoc mode): Completion signal alone is not enough — needs evidence (files_changed, files_inspected, fix_applied, verification, what_implemented, substantial output, or findings).

Goal lifecycle: created → active → paused/completed/failed/hard_blocked/abandoned.

Session lifecycle: created → active → paused → archived. Completed sessions are automatically archived. The session label (e.g. solo-defender) becomes immediately reusable for a new run.

Closeout summary: Generated at every terminal state with objective, files touched, decisions, blockers, and next recommended action.

State Consistency Guarantees

The supervisor loop ensures that DB state is always consistent with execution reality:

Transactional finalization — Goal status + session status + closeout summary are written in a single SQLite transaction. No partial writes possible.
SIGINT-safe completion — After each WP completes, the loop checks for goal completion before checking the interrupted flag. If Ctrl+C fires during the last engine run, the goal still finalizes as completed.
Interrupt safety net — The interrupt path also checks if all WPs are actually done before marking as paused. If the goal completed, it finalizes correctly regardless of interrupt timing.
No silent error swallowing — Closeout generation errors propagate rather than being silently logged.

Live Progress During Execution

While the loop runs, periodic heartbeat events show real-time status:

── session: my-project | goal: Implement CMS ──
[WP 1/5] Scan structure — attempt 1 (normal)
[WP 1/5] heartbeat: alive | files: 3 | idle: 2s | last: Read | strategy: normal
[WP 1/5] heartbeat: alive | files: 8 | idle: 0s | last: Edit | strategy: normal
[WP 1/5] completed
[WP 2/5] Build models — attempt 1 (normal)
[WP 2/5] heartbeat: alive | files: 14 | idle: 5s | last: Write | strategy: normal
[WP 2/5] no output for 65s — possible stall

Each heartbeat shows:

status: alive, idle, suspected_stuck, recovered
files: unique files touched during current run
idle: seconds since last engine output
last: most recent tool used (Read, Edit, Write, etc.)
strategy: current prompt strategy

Stall detection: When no engine output exceeds the stuck threshold (default 60s), a visible warning is emitted. Heartbeat events are also persisted to the database for post-run analysis.

Live Status While Running

cdx status and cdx inspect show live run info when a session is actively executing:

Session:  my-project [Running]
Engine:   claude
Path:     /path/to/project

Goal:     Implement CMS [active]
Progress: 2/5 WPs completed
Current:  Build models

Run:      3m 42s elapsed | strategy: focused
Heartbeat: idle | idle: 35s

Data Storage

SQLite at data/conductor.db with two layers:

Supervisor tables:

sessions — name (reusable label), run_index, title, engine, path, status, active_goal_id, working_summary, decisions
goals — title, description, type, source_type, status, completion_rules, closeout_summary
work_packages — seq, title, status, retry_count/budget, blocker_type/detail, done_criteria
snapshots — trigger, summary, completed/in-progress/remaining items, decisions, files, blockers, next_action, assumptions, unresolved_questions, follow_ups
execution_attempts — attempt_no, status, prompt_strategy, progress_detected, files_changed_count

Execution tables:

tasks — raw_input, workspace, engine, classification, status
runs — command, prompt, PID, exit code, timestamps, resumed_from_run_id, cost_usd
run_logs — stdout/stderr/system lines, sequenced
heartbeat_events — alive/idle/suspected_stuck/recovered
run_reports — structured post-run analysis with task-type-specific fields

Schema migrations run automatically on startup.

Project Structure

conductor/
  src/
    cli/
      index.ts                        # CLI entry + command registration
      commands/
        session.ts                    # Session management + display helpers
        execute.ts                    # Supervisor execution (plan + no-plan)
        goal.ts                       # [Internal] Goal management
        run.ts                        # Single-run orchestration
        resume.ts                     # Resume with curated context
        tasks.ts                      # List tasks
        logs.ts                       # View run logs
        report.ts                     # Task-type-specific report display
        runs.ts                       # Run metadata inspection
        config.ts                     # cdx config set/get/show/unset + legacy aliases
        doctor.ts                     # cdx doctor — environment check
    core/
      config/service.ts               # Config read/write, resolveEngine(), key aliases
      supervisor/
        loop.ts                       # Main supervisor loop (transactional finalization, SIGINT-safe)
        scheduler.ts                  # WP scheduling, status counting
        plan-parser.ts                # Markdown plan → WP decomposition
        prompt-builder.ts             # Supervisor prompt (plan + ad-hoc)
        progress.ts                   # Evidence-based progress detection
        compactor.ts                  # Snapshot builder + decision extraction
        closeout.ts                   # Goal closeout summary generation
        progress-reporter.ts            # Live progress event formatting
        live-tracker.ts                 # Real-time file/tool tracking during runs
        hygiene.ts                      # Session health warnings
      storage/
        schema.ts                     # SQL DDL + migrations
        db.ts                         # SQLite singleton (WAL mode)
        repository.ts                 # Execution layer CRUD
        supervisor-repository.ts      # Supervisor layer CRUD
      task/normalizer.ts              # Keyword-based task classification
      prompt/builder.ts               # Template loading + substitution
      engine/
        types.ts                      # EngineAdapter interface + factory
        claude.ts                     # Claude CLI adapter (stream-json)
        codex.ts                      # Codex CLI adapter
        stream-parser.ts              # JSON event parser
        log-interpreter.ts            # Unified log parsing into typed events
      runner/process.ts               # child_process.spawn wrapper
      heartbeat/monitor.ts            # State-tracked output monitoring
      report/generator.ts             # Task-type-aware report extraction
      resume/
        context.ts                    # Best-run selection + typed context
        prompt.ts                     # Structured resume prompt rendering
    types/
      index.ts                        # Execution layer types
      supervisor.ts                   # Supervisor layer types
    utils/
      logger.ts                       # Timestamped console logger
      lookup.ts                       # Short-ID prefix resolution
  prompts/                            # Prompt templates per engine/task type
  data/                               # SQLite DB (gitignored)
  tests/                              # Vitest test suite (349 tests, 31 files)

Development

npm run dev -- <command>     # Run CLI in dev mode (tsx)
npm test                     # Run all tests (349 tests)
npm run test:watch           # Watch mode
npm run build                # Compile TypeScript to dist/
npm link                     # Link cdx command globally

Tech Stack

Runtime: Node.js 22+, TypeScript
CLI: commander
Validation: zod
Database: better-sqlite3 (WAL mode, foreign keys)
Process: child_process.spawn with stdin pipe
Tests: vitest

License

MIT

Published

Vulnerabilities

Links

Maintainers

Keywords

Readme

Conductor

Architecture

Prerequisites

Install

Quick Start

Commands

Session Management

cdx session start <label>

cdx session list

cdx session current

cdx session pause

cdx session resume [name]

cdx session switch <name>

cdx session close

cdx status

cdx inspect

cdx session history [label]

Execution

cdx execute [source] --until-done

Execution Layer (Low-Level)

cdx run

cdx resume <taskId>

cdx tasks

cdx logs <runId>

cdx report <runId>

cdx runs show <runId>

Configuration

cdx config set <key> <value>

cdx config get <key>

cdx config show

cdx config unset <key>

cdx doctor

Configuration

Engine Resolution

Supervisor Loop

State Consistency Guarantees

Live Progress During Execution

Live Status While Running

Data Storage

Project Structure

Development

Tech Stack

License

`cdx session start <label>`

`cdx session list`

`cdx session current`

`cdx session pause`

`cdx session resume [name]`

`cdx session switch <name>`

`cdx session close`

`cdx status`

`cdx inspect`

`cdx session history [label]`

`cdx execute [source] --until-done`

`cdx run`

`cdx resume <taskId>`

`cdx tasks`

`cdx logs <runId>`

`cdx report <runId>`

`cdx runs show <runId>`

`cdx config set <key> <value>`

`cdx config get <key>`

`cdx config show`

`cdx config unset <key>`

`cdx doctor`