milhouse-cli

v1.0.0

Published

14 days ago

Correctness-first AI coding orchestrator - Evidence-based diagnostics, WBS planning, and iterative execution

Downloads

541

leonidklarov

ai coding automation claude correctness planner executor evidence-based wbs cli

Milhouse CLI

AI coding orchestrator that diagnoses, plans, and executes work, verifying correctness with evidence. Milhouse is neither Bart (Auto Vibe Coder) nor Ralph (Auto Loop Coder): he is a correctness-only QA, planner, and problem solver. Unlike Bart, Milhouse doesn’t invent new architecture; unlike Ralph, he doesn’t just execute and move on. Milhouse verifies with evidence, aligns code with the real environment, and turns issues into safe, one-commit tasks with a clear DoD (Definition of Done) and explicit dependencies.

Installation

Prerequisites

Node.js >= 18.0.0
pnpm >= 9.0.0 (for development)
Bun (for building binaries)

Development Setup

# Install dependencies
pnpm install

# Run in development mode
pnpm dev

# Run tests
pnpm test

# Build binaries
pnpm build

Package Manager

This project uses pnpm for package management and Bun for:

Running TypeScript directly in development
Building cross-platform binaries
Running tests (bun:test)

Building Binaries

# Build all platforms
pnpm build

# Build specific platform
pnpm build:linux
pnpm build:mac-arm
pnpm build:mac-x64
pnpm build:windows

Global Installation

npm install -g milhouse-cli

Three Modes

1. Single Task

Just tell it what to do:

milhouse "add dark mode"
milhouse "fix the auth bug"

2. Task List

Work through a PRD:

milhouse              # uses PRD.md
milhouse --prd tasks.md

3. Investigation Pipeline ⭐ NEW

Multi-agent investigation and execution:

milhouse --scan --scope "frontend zustand"  # Creates isolated run
milhouse --validate                          # Validate issues
milhouse --plan                              # Generate tasks
milhouse --consolidate                       # Merge plans
milhouse --exec --exec-by-issue              # Execute grouped by issue (recommended!)
milhouse --verify                            # Verify results

# Or run full pipeline (uses --exec-by-issue automatically)
milhouse --run

Investigation Pipeline

6-phase pipeline with specialized AI agents:

| Phase | Agent | Description |
|-------|-------|-------------|
| scan | LI (Lead Investigator) | Scans codebase, identifies issues |
| validate | IV (Issue Validators) | Validates with probes |
| plan | PL (Planners) | Generates WBS per issue |
| consolidate | CO (Consolidator) | Merges into unified plan |
| exec | EX (Executors) | Executes tasks |
| verify | VE (Verifiers) | Runs verification gates |
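The phase order in the table above can be sketched as a small lookup. This is an illustration only: the phase names come from the table, but `PHASES` and `nextPhase` are hypothetical names, and how Milhouse actually tracks progress (for example for `--resume`) is not documented here.

```typescript
// Phase names are taken from the table above; the helper itself is hypothetical.
const PHASES = ["scan", "validate", "plan", "consolidate", "exec", "verify"] as const;
type Phase = (typeof PHASES)[number];

// Returns the phase that follows `last`, or null once the pipeline is complete.
// Passing null means no phase has run yet, so the pipeline starts at "scan".
function nextPhase(last: Phase | null): Phase | null {
  if (last === null) return PHASES[0];
  const i = PHASES.indexOf(last);
  return i + 1 < PHASES.length ? PHASES[i + 1] : null;
}
```

Under this sketch, a run whose last completed phase is `plan` would continue with `consolidate`.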

Pipeline Runs

Each scan creates isolated state:

milhouse --scan --scope "frontend"    # Creates run-abc
milhouse --scan --scope "backend"     # Creates run-def

milhouse runs list                    # List all runs
milhouse runs switch run-abc          # Switch active run
milhouse runs info                    # Show current run
milhouse runs delete run-def          # Delete a run

Project Config

Optional. Stores rules the AI must follow.

milhouse --init              # auto-detects project settings
milhouse --config            # view config
milhouse --add-rule "use TypeScript strict mode"

Creates .milhouse/config.yaml:

project:
  name: "my-app"
  language: "TypeScript"
  framework: "Next.js"

commands:
  test: "npm test"
  lint: "npm run lint"
  build: "npm run build"

rules:
  - "use server actions not API routes"
  - "follow error pattern in src/utils/errors.ts"

boundaries:
  never_touch:
    - "src/legacy/**"
    - "*.lock"
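To make the shape of this file concrete, here is a minimal TypeScript sketch of how the config could be typed and how a `never_touch` boundary might be checked. The field names mirror the YAML example; `MilhouseConfig`, `globToRegExp`, and `isProtected` are hypothetical names, and the glob semantics (`**` crosses directories, `*` stays within one path segment) are an assumption rather than Milhouse's documented behavior.

```typescript
// Hypothetical types mirroring the YAML example; not Milhouse's actual schema.
interface MilhouseConfig {
  project: { name: string; language: string; framework: string };
  commands: { test: string; lint: string; build: string };
  rules: string[];
  boundaries: { never_touch: string[] };
}

// Assumed glob semantics: "**" matches across directories, "*" within one segment.
function globToRegExp(glob: string): RegExp {
  // Escape regex metacharacters, leaving "*" untouched for the next step.
  const escaped = glob.replace(/[.+^${}()|[\]\\]/g, "\\$&");
  const pattern = escaped
    .replace(/\*\*/g, "\u0000") // placeholder so "**" is not treated as two "*"
    .replace(/\*/g, "[^/]*")
    .replace(/\u0000/g, ".*");
  return new RegExp("^" + pattern + "$");
}

// True when `path` matches any never_touch pattern in the config.
function isProtected(path: string, config: MilhouseConfig): boolean {
  return config.boundaries.never_touch.some((g) => globToRegExp(g).test(path));
}

const exampleConfig: MilhouseConfig = {
  project: { name: "my-app", language: "TypeScript", framework: "Next.js" },
  commands: { test: "npm test", lint: "npm run lint", build: "npm run build" },
  rules: ["use server actions not API routes"],
  boundaries: { never_touch: ["src/legacy/**", "*.lock"] },
};
```

With `exampleConfig`, a path such as `src/legacy/old/api.ts` would be treated as off-limits, while `src/app/page.tsx` would not.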

AI Engines

milhouse              # Claude Code (default)
milhouse --opencode   # OpenCode
milhouse --cursor     # Cursor
milhouse --codex      # Codex
milhouse --qwen       # Qwen-Code
milhouse --droid      # Factory Droid

Model Override

milhouse --model sonnet "add feature"    # use sonnet with Claude
milhouse --sonnet "add feature"          # shortcut for above
milhouse --opencode --model opencode/glm-4.7-free "task"

Task Sources

Markdown file (default):

milhouse --prd PRD.md

Markdown folder (for large projects):

milhouse --prd ./prd/

Reads all .md files in the folder and aggregates tasks.

YAML:

milhouse --yaml tasks.yaml

GitHub Issues:

milhouse --github owner/repo
milhouse --github owner/repo --github-label "ready"

Parallel Execution

Issue-Based Execution (Recommended)

milhouse --exec --exec-by-issue              # Each issue in its own worktree
milhouse --exec --exec-by-issue --max-parallel 3  # 3 issues in parallel

How it works:

Groups all tasks by their parent issue
Each issue runs in an isolated worktree with a dedicated Claude agent
Agent receives: issue details + validation report + WBS plan + all tasks
Agent completes ALL tasks for that issue in one session
Branches auto-merge back after completion
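The grouping step described above can be sketched as follows. This is an assumption-laden illustration: the `Task` shape, `groupByIssue`, and `toBatches` are hypothetical, and fixed-size batches are a simplification of whatever scheduling Milhouse really uses for `--max-parallel`.

```typescript
// Hypothetical task shape; the real Milhouse data model may differ.
interface Task {
  id: string;
  issueId: string;
  title: string;
}

// Groups tasks by their parent issue, preserving the order in which issues first appear.
function groupByIssue(tasks: Task[]): Record<string, Task[]> {
  const groups: Record<string, Task[]> = {};
  for (const task of tasks) {
    if (!groups[task.issueId]) groups[task.issueId] = [];
    groups[task.issueId].push(task);
  }
  return groups;
}

// Splits issue IDs into batches of at most `maxParallel` (simplified scheduling).
function toBatches(issueIds: string[], maxParallel: number): string[][] {
  const size = Math.max(1, maxParallel); // guard against zero or negative values
  const batches: string[][] = [];
  for (let i = 0; i < issueIds.length; i += size) {
    batches.push(issueIds.slice(i, i + size));
  }
  return batches;
}
```

Each group would then correspond to one worktree and one agent session that handles all of that issue's tasks.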

Benefits:

Better context: the agent has full issue context, not just a single task
Fewer context switches: one agent handles related tasks together
Faster overall: roughly 5 minutes per issue, compared with roughly 5 minutes per task in task-based mode

Task-Based Execution (Legacy)

milhouse --parallel                  # 3 agents default
milhouse --parallel --max-parallel 5 # 5 agents

Each agent gets an isolated worktree and branch. What happens to the branches afterwards depends on the flags:

Without --create-pr: auto-merges back, with AI conflict resolution
With --create-pr: keeps branches and creates PRs
With --no-merge: keeps branches without merging
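The flag behavior described above amounts to a three-way decision, sketched below. `PostRunAction`, `BranchFlags`, and `postRunAction` are hypothetical names. The precedence when both `--create-pr` and `--no-merge` are given is not stated in this README, so the order used here is an assumption.

```typescript
type PostRunAction = "auto-merge" | "create-pr" | "keep-branches";

// Hypothetical flag bag corresponding to --create-pr and --no-merge.
interface BranchFlags {
  createPr?: boolean;
  noMerge?: boolean;
}

// Decides what happens to agent branches after a parallel run.
// Assumption: --create-pr takes precedence if both flags are set.
function postRunAction(flags: BranchFlags): PostRunAction {
  if (flags.createPr) return "create-pr";
  if (flags.noMerge) return "keep-branches";
  return "auto-merge";
}
```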

Branch Workflow

milhouse --branch-per-task                # branch per task
milhouse --branch-per-task --create-pr    # + create PRs
milhouse --branch-per-task --draft-pr     # + draft PRs

Browser Automation

Milhouse supports browser automation via agent-browser for testing web UIs.

milhouse "add login form" --browser    # enable browser automation
milhouse "fix checkout" --no-browser   # disable browser automation

When enabled (and agent-browser is installed), the AI can:

Open URLs and navigate pages
Click elements and fill forms
Take screenshots for verification
Test web UI changes after implementation

Issue Filtering

Milhouse supports filtering issues by ID and severity level at any pipeline stage.

Filter by Issue IDs

# Process only specific issues
milhouse --validate --issues P-xxx,P-yyy,P-zzz

# Exclude specific issues
milhouse --plan --exclude-issues P-xxx

Filter by Severity

# Process only CRITICAL and HIGH severity issues
milhouse --validate --severity CRITICAL,HIGH

# Process issues with severity HIGH or above
milhouse --run --min-severity HIGH

Severity Levels

Severity levels in order of priority:

CRITICAL - Highest priority
HIGH
MEDIUM
LOW - Lowest priority

Combining Filters

Filters can be combined (AND logic):

# Validate specific issues that are also HIGH+ severity
milhouse --validate --issues P-xxx,P-yyy --min-severity HIGH
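As a rough illustration of the AND logic, the sketch below applies all four filters to a list of issues. The severity order comes from the list above; the `Issue` and `IssueFilter` shapes and the `matchesFilter` function are hypothetical and are not taken from Milhouse's source.

```typescript
// Ordered from highest to lowest priority, as listed under Severity Levels.
const SEVERITIES = ["CRITICAL", "HIGH", "MEDIUM", "LOW"] as const;
type Severity = (typeof SEVERITIES)[number];

// Hypothetical shapes for illustration only.
interface Issue {
  id: string;
  severity: Severity;
}

interface IssueFilter {
  issues?: string[];        // --issues
  excludeIssues?: string[]; // --exclude-issues
  severity?: Severity[];    // --severity
  minSeverity?: Severity;   // --min-severity
}

// An issue passes only if it satisfies every filter that is set (AND logic).
function matchesFilter(issue: Issue, f: IssueFilter): boolean {
  if (f.issues && f.issues.indexOf(issue.id) === -1) return false;
  if (f.excludeIssues && f.excludeIssues.indexOf(issue.id) !== -1) return false;
  if (f.severity && f.severity.indexOf(issue.severity) === -1) return false;
  // A lower index means a higher priority, so anything past minSeverity is dropped.
  if (f.minSeverity && SEVERITIES.indexOf(issue.severity) > SEVERITIES.indexOf(f.minSeverity)) {
    return false;
  }
  return true;
}
```

For example, `--issues P-xxx,P-yyy --min-severity HIGH` would keep `P-xxx` only if its severity is HIGH or CRITICAL, and would drop every issue not named in the list.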

Options

| Flag | What it does |
|------|--------------|
| Pipeline | |
| --scan | Run Lead Investigator |
| --scope FOCUS | Focus scan on specific area |
| --validate | Validate issues with probes |
| --plan | Generate WBS |
| --consolidate | Merge into execution plan |
| --exec | Execute tasks |
| --verify | Run verification gates |
| --run | Run full pipeline |
| --resume | Resume from last phase |
| Issue Filtering | |
| --issues IDS | Comma-separated issue IDs to process |
| --exclude-issues IDS | Comma-separated issue IDs to exclude |
| --severity LEVELS | Filter by severity (CRITICAL,HIGH,MEDIUM,LOW) |
| --min-severity LEVEL | Minimum severity level to process |
| Tasks | |
| --prd PATH | task file or folder (auto-detected, default: PRD.md) |
| --yaml FILE | YAML task file |
| --github REPO | use GitHub issues |
| --github-label TAG | filter issues by label |
| Engine | |
| --model NAME | override model for any engine |
| --sonnet | shortcut for --claude --model sonnet |
| Execution | |
| --parallel | run tasks in parallel (legacy, per-task) |
| --exec-by-issue | execute tasks grouped by issue (recommended!) |
| --max-parallel N | max parallel agents/issues (default: 3) |
| --no-merge | skip auto-merge in parallel mode |
| --branch-per-task | branch per task |
| --base-branch BRANCH | base branch for PRs |
| --create-pr | create PRs |
| --draft-pr | draft PRs |
| --worktrees | force worktree isolation |
| --exec-fail-fast | stop on first task failure |
| Testing | |
| --no-tests | skip tests |
| --no-lint | skip lint |
| --fast | skip tests + lint |
| --no-commit | don't auto-commit |
| --browser | enable browser automation |
| --no-browser | disable browser automation |
| General | |
| --max-iterations N | stop after N tasks |
| --max-retries N | retries per task (default: 3) |
| --retry-delay N | delay between retries in seconds (default: 5) |
| --dry-run | preview only |
| -v, --verbose | debug output |
| --init | setup .milhouse/ config |
| --config | show config |
| --add-rule "rule" | add rule to config |

Requirements

Node.js 18+ or Bun
AI CLI: Claude Code, OpenCode, Cursor, Codex, Qwen-Code, or Factory Droid
gh (optional, for GitHub issues / --create-pr)

License

MIT