npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2026 – Pkg Stats / Ryan Hefner

review-orchestra

v0.1.2

Published

Multi-model automated code review orchestration for Claude Code

Readme

review-orchestra

Multi-model code review orchestration for Claude Code. Runs multiple AI reviewers (Claude + Codex by default) in parallel, consolidates findings, and presents them to the user. The orchestrator Claude fixes code directly with user guidance in a supervised loop.

How it works

          +-------------------+
          |  Detect Scope     |  git diff / branch diff / PR diff
          +--------+----------+
                   |
          +--------v----------+
          |  Parallel Review   |  Claude + Codex (headless, concurrent)
          +--------+----------+
                   |
          +--------v----------+
          |  Consolidate       |  Dedup, classify confidence × impact, compute P-level
          +--------+----------+
                   |
          +--------v----------+
          |  Return findings   |  ReviewResult JSON → skill → user
          +-------------------+
  1. Scope detection — auto-detects uncommitted changes, branch diff vs main, or open PR diff.
  2. Parallel review — launches all configured reviewers as headless CLI processes concurrently.
  3. Consolidation — deduplicates findings across reviewers, classifies each on two axes (confidence × impact), computes a P-level (P0–P3), tags pre-existing issues outside the diff, and compares findings against previous rounds.
  4. Return findings — the CLI returns a ReviewResult JSON on stdout. The skill presents findings to the user, who decides what to fix.

After receiving findings, the orchestrator Claude (who wrote the code and has full context) fixes issues directly with user guidance. The user controls what gets fixed and when to re-review.

Installation

Prerequisites

From npm (recommended)

npm install -g review-orchestra
review-orchestra setup

From source (contributors)

git clone https://github.com/ckailash/review-orchestra.git
cd review-orchestra
npm install
npm run build
review-orchestra setup

review-orchestra setup validates your environment (Node version, required CLIs), creates the skill symlink for Claude Code, and configures .gitignore. Run review-orchestra doctor anytime to diagnose issues.

Usage

Invoke via the /review-orchestra skill in Claude Code. Arguments are natural language — no flags.

/review-orchestra                                  # auto-detect scope, all defaults
/review-orchestra src/auth/ src/api/               # only review these paths
/review-orchestra fix quality issues too            # extend threshold to P2
/review-orchestra only use claude                   # single reviewer
/review-orchestra skip codex                       # disable a specific reviewer

Defaults (when no arguments are provided):

  • Auto-detect diff scope
  • Both Claude + Codex reviewers
  • Stop-at threshold: P1 (all critical + functional issues recommended for fixing)

CLI subcommands

review-orchestra review                            # run reviewers + consolidate, return findings (default)
review-orchestra review src/services/              # with scope args
review-orchestra review only claude                # natural language filtering
review-orchestra stale                             # check if worktree changed since last review (exit 0=fresh, 1=stale, 2=no session)
review-orchestra reset                             # clear session state
review-orchestra setup                             # first-time install + repair broken install
review-orchestra doctor                            # diagnose issues without modifying anything

Running with no subcommand (just scope args) is equivalent to review-orchestra review.

Supervised workflow

The CLI runs review + consolidation and returns a ReviewResult JSON. The skill then enters the supervised loop:

  1. Review — run review-orchestra review (spawns reviewers, consolidates, returns findings)
  2. Present — skill shows findings grouped by severity with progressive disclosure
  3. User decides — user selects which findings to fix (by ID, severity band, or "all")
  4. Confirm — skill echoes back planned actions, waits for confirmation
  5. Fix — orchestrator Claude fixes code directly using Edit/Write tools
  6. Re-review — if requested, run review-orchestra review again (back to step 1)
  7. Done — summarize session, suggest next action (commit, push, PR)

The user controls the loop at every step. Escalation is implicit: the user sees all findings and decides.

Session artifacts

All session data lives in .review-orchestra/ in the project root. Reviewer outputs are persisted as flat files (no per-round subdirectories) so debug evidence survives parse failures and crashes:

.review-orchestra/
├── session.json                      # Session state (sessionId, status, scope, rounds[], reviews, consolidated, reviewerErrors)
├── state.lock                        # PID file for concurrent-run prevention
├── progress.json                     # Live reviewer status during a run (deleted on round complete)
├── round-1-claude-raw.txt            # Raw stdout from each reviewer, written before parsing
├── round-1-codex-raw.txt             # On parse failure renamed to *.raw.txt; on codex failure renamed *.failed
└── round-2-claude-raw.txt

Everything is kept. Never auto-deleted. Users can rm -rf .review-orchestra/ or review-orchestra reset when done. setup adds .review-orchestra/ to .gitignore automatically.

Concurrent runs of review-orchestra review in the same project are blocked by state.lock (atomic-rename release protocol with PID re-check) so two terminals don't trample each other's session.

Session state (session.json)

{
  "sessionId": "20260315-143022",
  "status": "active",
  "scope": { "type": "branch", "baseBranch": "main", "description": "branch feat/auth vs main" },
  "currentRound": 2,
  "worktreeHash": "abc123",
  "rounds": [
    {
      "number": 1,
      "phase": "complete",
      "worktreeHash": "def456",
      "reviews": { "claude": { "findings": [], "metadata": { } }, "codex": { } },
      "consolidated": [],
      "reviewerErrors": [],
      "findingsPersisted": true,
      "startedAt": "2026-03-15T14:30:22Z",
      "completedAt": "2026-03-15T14:32:11Z"
    }
  ],
  "startedAt": "2026-03-15T14:30:22Z",
  "completedAt": null
}

Key fields:

  • sessionId — timestamp-based unique ID (e.g. 20260315-143022)
  • statusactive, expired (scope base changed, requires reset), or completed
  • scope — diff scope detected at session creation (with pathFilters if any)
  • currentRound — most recent round number
  • worktreeHash — SHA-256 over HEAD + staged + unstaged + untracked files; per-round snapshot for stale detection
  • rounds[] — per-round artifacts: phase, reviews (per-reviewer findings + metadata), consolidated, reviewerErrors (preserved across crash recovery), findingsPersisted, timestamps
  • startedAt — session creation timestamp

Session lifecycle:

  • review-orchestra review → creates or continues session, runs reviewers + consolidation, returns findings
  • review-orchestra reset → clears the session (equivalent to rm -rf .review-orchestra/)
  • Session auto-expires if the scope base changes (e.g., new commits on main) — stale session error, user must reset and start fresh
  • Detached HEAD: baseBranch becomes detached@<sha7> (not the literal "HEAD") so cross-round comparison stays stable

Crash recovery: if a round was interrupted mid-reviewing or mid-consolidating, the next review invocation resumes from the persisted phase. Saved reviewer findings and reviewer errors carry forward.

Configuration

Configuration lives in config/default.json:

{
  "reviewers": {
    "claude": {
      "enabled": true,
      "command": "claude -p - --allowedTools \"Read,Grep,Glob,Bash\" --output-format json",
      "outputFormat": "json"
    },
    "codex": {
      "enabled": true,
      "command": "codex exec - --output-last-message {outputFile} --json",
      "outputFormat": "json"
    }
  },
  "thresholds": {
    "stopAt": "p1"
  }
}

Thresholds

| Setting | Default | Description | |---|---|---| | stopAt | p1 | Suggests which findings to fix. p0 = critical only, p1 = critical + functional, p2 = + quality, p3 = fix everything. The user controls the loop and decides when to stop. |

Finding comparison

Cross-round finding comparison uses LLM-based semantic matching by default (via Claude Haiku). This handles renamed files, shifted line numbers, and reworded descriptions across review rounds. On LLM failure or timeout, it falls back to heuristic matching (file + title.toLowerCase()). Configure via findingComparison in config/default.json:

| Setting | Default | Description | |---|---|---| | method | "llm" | Comparison method: "llm" for semantic matching, "heuristic" for file+title matching | | model | "claude-haiku-4-5" | Model to use for LLM comparison | | timeoutMs | 60000 | Timeout for LLM comparison call (ms) | | fallback | "heuristic" | Fallback method when LLM fails |

Severity model

Findings are classified on two independent axes, then a P-level is derived:

| | Critical | Functional | Quality | Nitpick | |---|---|---|---|---| | Verified | P0 | P1 | P2 | P3 | | Likely | P0 | P1 | P2 | P3 | | Possible | P1 | P2 | P3 | P3 | | Speculative | P2 | P3 | P3 | P3 |

Pre-existing findings (outside the diff hunks) are tagged and excluded from recommendations. In supervised mode, the user can explicitly request fixing a pre-existing finding.

Component overview

review-orchestra/
├── src/
│   ├── orchestrator.ts               # Main orchestration: preflight → reviewers → consolidate → return ReviewResult
│   ├── reviewers/
│   │   ├── types.ts                  # Reviewer interface (+ ReviewerCallContext)
│   │   ├── claude.ts                 # Claude headless reviewer
│   │   ├── codex.ts                  # Codex headless reviewer
│   │   ├── command.ts                # Command template parsing (handles \" and \\ escapes)
│   │   ├── prompt.ts                 # Review prompt builder
│   │   ├── raw-output.ts             # persistRawOutput helper — saves reviewer stdout before parsing
│   │   └── index.ts                  # Registry / factory (GenericReviewer for custom reviewers)
│   ├── consolidator.ts              # Dedup, classify, merge findings
│   ├── findings-store.ts            # Persistent cross-session finding storage (~/.review-orchestra/findings.jsonl)
│   ├── fuzzy-match.ts               # Fuzzy matching for cross-reviewer dedup (tokenize, Jaccard similarity, isFuzzyMatch)
│   ├── scope.ts                     # Diff scope auto-detection
│   ├── config.ts                    # Configuration loading & defaults
│   ├── types.ts                     # Shared types (Finding, Round, SessionState, ReviewResult, etc.)
│   ├── state.ts                     # SessionManager: session-based state tracking
│   ├── worktree-hash.ts             # Worktree hash computation and stale detection
│   ├── finding-comparison.ts        # Cross-round finding comparison (new/persisting/resolved)
│   ├── reviewer-parser.ts           # Parse/normalize reviewer output
│   ├── parse-args.ts                # Natural language CLI argument parsing
│   ├── process.ts                   # Process spawning with streaming
│   ├── toolchain.ts                 # Project tech stack detection
│   ├── progress.ts                  # Progress file (reviewer status during review, progress.json)
│   ├── preflight.ts                 # Validates required binaries
│   ├── checks.ts                    # Shared check functions for setup/doctor
│   ├── setup.ts                     # Setup command (runs checks + fixes)
│   ├── doctor.ts                    # Doctor command (runs checks + reports)
│   ├── json-utils.ts                # JSON extraction & envelope unwrapping
│   └── log.ts                       # Logging utilities
├── config/
│   └── default.json                  # Default configuration (reviewers, thresholds)
├── skill/
│   └── SKILL.md                      # Claude Code skill entry point (supervised flow)
├── prompts/
│   └── review.md                     # Template for reviewer agents
├── schemas/
│   └── findings.schema.json          # JSON schema for structured findings output
├── test/                             # Unit/integration tests (Vitest)
└── evals/                            # LLM eval harness

The orchestrator (src/orchestrator.ts) runs a single pass: preflight → parallel reviewers → consolidation → finding comparison → return ReviewResult. There is no loop — the user controls re-review decisions via the skill.

State is managed by SessionManager (src/state.ts), which tracks sessions across multiple CLI invocations. Each invocation creates or continues a session, persisting round artifacts and worktree hashes for stale-detection and finding comparison.

Adding custom reviewers

Any CLI tool that accepts a prompt and produces output can be a reviewer. Implement the Reviewer interface:

// src/reviewers/types.ts
interface Reviewer {
  name: string;
  review(
    prompt: string,
    scope: DiffScope,
    context: ReviewerCallContext, // { roundNumber } — used to write round-N-<name>-raw.txt before parsing
  ): Promise<{ findings: Finding[]; rawOutput: string; elapsedMs?: number }>;
}

Or add a generic reviewer via config — any command with a {prompt} placeholder works:

{
  "reviewers": {
    "my-linter": {
      "enabled": true,
      "command": "my-tool review \"{prompt}\"",
      "outputFormat": "json"
    }
  }
}

The consolidator normalizes different output formats into the standard findings schema. Custom reviewers registered in config are handled by the GenericReviewer class, which executes the command and parses the output.

Tests and evals

# Lint (type-check)
npm run lint

# Unit/integration tests (Vitest)
npm test

# Run all evals (LLM-as-judge against synthetic repos)
npm run eval

# Run a single eval fixture
npm run eval -- sql-injection

# Override the judge model
npm run eval -- --judge-model claude-sonnet-4-6

Tests cover deterministic components (scope detection, consolidator, config, session management, reviewer parser, finding comparison, worktree hashing, CLI subcommands) using TDD. LLM-facing components (reviewer adapters, orchestrator wiring) have integration tests. The eval harness uses synthetic repos with planted bugs and LLM-as-judge scoring for precision, recall, and severity accuracy.

License

MIT