codeagora

v2.3.4 · Published 13 days ago

Multi-LLM code review pipeline — parallel reviewers, structured debate, consensus verdict

code-review llm multi-agent cli tui ai static-analysis

Multiple LLMs review your code in parallel, debate conflicting opinions, then a head agent delivers the final verdict. Different models catch different bugs — consensus filters the noise.

Quick Start

npm i -g codeagora
agora init
git diff | agora review

agora init auto-detects your API keys and CLI tools, then generates a config.

Supported Providers (Tier 1)

| Provider | Type | Cost |
|----------|------|------|
| Groq | API | Free |
| Anthropic | API | Paid |
| Claude Code | CLI | Subscription |
| Gemini CLI | CLI | Free |
| Codex CLI | CLI | Subscription |

Full provider list (24+ API, 12 CLI) ->

How It Works

git diff | agora review

  Pre  --- Semantic Diff Classification
       --- TypeScript Diagnostics
       --- Change Impact Analysis
            |
  L1   --- Reviewer A (security) --+
       --- Reviewer B (logic)    --+-- parallel specialist reviews
       --- Reviewer C (general)  --+
            |
  Filter -- Hallucination Check (file/line validation)
       --- Self-contradiction Filter
       --- Evidence Dedup
            |
  L2   --- Adversarial Discussion (supporters must disprove)
       --- Static analysis evidence in debate
            |
  L3   --- Head Agent --> ACCEPT / REJECT / NEEDS_HUMAN
            |
  Output -- Triage: N must-fix / N verify / N ignore
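The "Hallucination Check (file/line validation)" stage can be pictured as dropping any finding that points at a file or line the diff never touches. The sketch below is only an illustration of that idea, not CodeAgora's actual implementation: the `Finding` shape and the `changedLines` and `dropHallucinated` helpers are hypothetical names, and the parser handles plain unified diffs only.

```typescript
// Hypothetical finding shape; the real schema may differ.
interface Finding {
  file: string;
  line: number;
  message: string;
}

// Map each file in a unified diff to the new-side line numbers visible in its hunks.
function changedLines(diff: string): Map<string, Set<number>> {
  const result = new Map<string, Set<number>>();
  let current: Set<number> | undefined;
  let lineNo = 0;
  for (const raw of diff.split("\n")) {
    if (raw.startsWith("+++ ")) {
      // "+++ b/src/app.ts" -> "src/app.ts"
      const path = raw.slice(4).replace(/^b\//, "");
      current = new Set<number>();
      result.set(path, current);
      lineNo = 0; // wait for the next hunk header
    } else if (raw.startsWith("@@")) {
      // "@@ -10,4 +12,6 @@" -> the new side of the hunk starts at line 12
      const m = /^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@/.exec(raw);
      lineNo = m ? Number(m[1]) : 0;
    } else if (current && lineNo > 0) {
      if (raw.startsWith("+") || raw.startsWith(" ")) {
        // Added and context lines both exist on the new side.
        current.add(lineNo);
        lineNo++;
      }
      // "-" lines exist only on the old side, so the counter does not advance.
    }
  }
  return result;
}

// Keep only findings whose file and line actually appear in the diff.
function dropHallucinated(findings: Finding[], diff: string): Finding[] {
  const lines = changedLines(diff);
  return findings.filter((f) => lines.get(f.file)?.has(f.line) ?? false);
}
```

A finding that cites `src/app.ts:40` against a diff whose only hunk covers lines 1 to 3 would be filtered out here, as would any finding against a file that is absent from the diff.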

Web Dashboard

Real-time web UI for monitoring reviews, browsing sessions, and managing configuration.

agora dashboard          # Start on http://localhost:6274
agora dashboard -p 8080  # Custom port

Features:

9 pages — Dashboard, Sessions, Models, Costs, Discussions, Config, Pipeline, Compare, Review Detail
Live pipeline — WebSocket-powered real-time stage progression and discussion updates
Model intelligence — Leaderboard, quality trends, selection frequency charts
httpOnly cookie auth — Secure token exchange via POST /api/auth
Server-side pagination — Filterable by status, search, date range

The dashboard token is printed on startup and persisted to .ca/dashboard-token.

Interactive TUI

Terminal UI for running reviews without leaving the terminal.

agora tui

8 screens: Review Setup, Pipeline Progress, Results, Diff Viewer, Debate, Config, Model Selector, Provider Status. Navigate with arrow keys, Enter to select, q to quit.

MCP Server (Claude Code / Cursor)

9-tool MCP server for AI IDE integration.

// claude_desktop_config.json or .cursor/mcp.json
{
  "mcpServers": {
    "codeagora": {
      "command": "npx",
      "args": ["-y", "@codeagora/mcp"]
    }
  }
}

Tools: review_diff, review_pr, review_staged, session_list, session_detail, explain_session, config_get, config_set, health_check.

Notifications

agora notify 2026-03-27/001  # Send notification for a past session

Supported channels:

Discord — Real-time thread updates + summary (webhook URL in config)
Slack — Summary notification (webhook URL in config)
Generic webhook — HMAC-SHA256 signed payloads over HTTPS

Configure in .ca/config.json under notifications.
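On the receiving end of a generic webhook, an HMAC-SHA256 signature is typically checked by recomputing the digest over the raw request body and comparing it in constant time. The following is a minimal sketch under stated assumptions: it assumes the signature arrives hex-encoded, and it does not know which header CodeAgora uses to carry it, so the caller passes the received value in directly. The `sign` and `verifySignature` names are hypothetical.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Compute the hex HMAC-SHA256 of the raw request body.
// Hex encoding is an assumption; check the payload format before relying on it.
function sign(rawBody: string, secret: string): string {
  return createHmac("sha256", secret).update(rawBody, "utf8").digest("hex");
}

// Compare the expected and received signatures in constant time.
function verifySignature(rawBody: string, secret: string, received: string): boolean {
  const expected = Buffer.from(sign(rawBody, secret), "hex");
  const actual = Buffer.from(received, "hex");
  // timingSafeEqual throws on length mismatch, so check the length first.
  return expected.length === actual.length && timingSafeEqual(expected, actual);
}
```

The comparison must run over the exact bytes that were received; re-serializing parsed JSON before hashing can change whitespace or key order and make a valid signature fail.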

Extensions

All extensions are optional — install only what you need.

| Package | Install | What it does |
|---------|---------|-------------|
| @codeagora/web | npm i -g @codeagora/web | Web dashboard: 9-page SPA with real-time pipeline monitoring, session history, model leaderboard, cost tracking |
| @codeagora/tui | npm i -g @codeagora/tui | Interactive terminal UI: run reviews, browse sessions, edit config, watch debates in real-time |
| @codeagora/mcp | npm i -g @codeagora/mcp | MCP server (9 tools): integrates with Claude Code, Cursor, and any MCP-compatible IDE |
| @codeagora/notifications | npm i -g @codeagora/notifications | Webhooks: Discord (real-time threads + summary), Slack (summary), generic (HMAC-SHA256 signed) |

Each extension works standalone or together. The core codeagora CLI includes everything needed for command-line reviews and GitHub Actions.

Extension guide ->

GitHub Actions

Add CodeAgora to any repo in 3 steps:

1. Create .ca/config.json (or run agora init):

{
  "mode": "pragmatic",
  "reviewers": [
    { "id": "r1", "model": "llama-3.3-70b-versatile", "backend": "api", "provider": "groq", "enabled": true, "timeout": 120 },
    { "id": "r2", "model": "qwen/qwen3-32b", "backend": "api", "provider": "groq", "enabled": true, "timeout": 120 },
    { "id": "r3", "model": "meta-llama/llama-4-scout-17b-16e-instruct", "backend": "api", "provider": "groq", "enabled": true, "timeout": 120 }
  ]
}

2. Add the workflow (.github/workflows/codeagora-review.yml):

name: CodeAgora Review
on:
  pull_request:
    types: [opened, synchronize, reopened]

permissions:
  contents: read
  pull-requests: write
  statuses: write

jobs:
  review:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: bssm-oss/CodeAgora@v2
        with:
          github-token: ${{ secrets.GITHUB_TOKEN }}
        env:
          GROQ_API_KEY: ${{ secrets.GROQ_API_KEY }}

3. Add GROQ_API_KEY as a repository secret under Settings > Secrets and variables > Actions.

Every PR gets inline review comments, a summary verdict, and a commit status check. Add the review:skip label to any PR to bypass the review.
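Before committing a hand-written .ca/config.json, it can help to sanity-check the reviewers array. The sketch below infers the field names and types purely from the step 1 example; treating every field as required and reviewer ids as unique are assumptions, and `checkReviewers` is a hypothetical helper, not part of the CodeAgora CLI.

```typescript
// Shape inferred from the example config in step 1; the real schema may allow more fields.
interface ReviewerConfig {
  id: string;
  model: string;
  backend: string;
  provider: string;
  enabled: boolean;
  timeout: number;
}

// Return human-readable problems; an empty list means the basic shape looks fine.
function checkReviewers(config: unknown): string[] {
  const reviewers = (config as { reviewers?: unknown } | null)?.reviewers;
  if (!Array.isArray(reviewers) || reviewers.length === 0) {
    return ["reviewers must be a non-empty array"];
  }
  const problems: string[] = [];
  const seen = new Set<string>();
  reviewers.forEach((entry: unknown, i: number) => {
    const r = (entry ?? {}) as Partial<ReviewerConfig>;
    for (const key of ["id", "model", "backend", "provider"] as const) {
      if (typeof r[key] !== "string" || r[key] === "") {
        problems.push(`reviewers[${i}].${key} must be a non-empty string`);
      }
    }
    if (typeof r.enabled !== "boolean") {
      problems.push(`reviewers[${i}].enabled must be a boolean`);
    }
    if (typeof r.timeout !== "number" || r.timeout <= 0) {
      problems.push(`reviewers[${i}].timeout must be a positive number`);
    }
    // Unique ids are an assumption made by this sketch.
    if (typeof r.id === "string") {
      if (seen.has(r.id)) problems.push(`duplicate reviewer id "${r.id}"`);
      seen.add(r.id);
    }
  });
  return problems;
}
```

Running `agora init` remains the simpler route, since it generates the config for you; a check like this is mainly useful when the file is edited by hand or templated in CI.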

Documentation

| Doc | Content |
|-----|---------|
| CLI Reference | All commands and options |
| Configuration | Config file guide |
| Providers | Full provider list with tiers |
| Architecture | Pipeline design and project structure |
| Extensions | Web, TUI, MCP, Notifications |
| Troubleshooting | Common errors and fixes, exit codes |
| FAQ | Frequently asked questions |

Development

pnpm install && pnpm build
pnpm test          # 3386 tests
pnpm test:coverage # with coverage report
pnpm typecheck
pnpm cli review path/to/diff.patch

Benchmarks

Golden-bug fixtures under benchmarks/golden-bugs/ drive the false-negative measurement framework (see #472).

Score pre-computed results (fast, no API calls):

pnpm bench:fn -- --validate-only                     # schema-check fixtures
pnpm bench:fn -- --results path/to/results-dir       # score against pre-computed review output
pnpm bench:fn -- --results path/to/results-dir --json  # CI-friendly JSON report

Run the live pipeline against every fixture (produces the results dir above):

export OPENROUTER_API_KEY=...
pnpm bench:fn:run -- --results ./bench-out
pnpm bench:fn     -- --results ./bench-out

The driver uses benchmarks/.ca/config.json, a lean 3-reviewer OpenRouter setup. A full run over the 4 seed fixtures costs roughly $0.04–$0.10 depending on discussion rounds. Add --fixtures id1,id2 to restrict the run to specific fixtures, or --skip-head to skip the L3 verdict stage.

Two fixture kinds live side by side:

Recall cases (expectedFindings non-empty) — review must surface each listed bug. Misses count as FN.
FP regression cases (expectedFindings is []) — review must report nothing. Any finding is a regression.

Current seed fixtures: 3 recall cases (off-by-one, null-deref, SQL injection) + 1 FP regression (PR #490 moderator regex). See benchmarks/golden-bugs/README.md for fixture format.
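The two fixture kinds lead to two different scoring rules, which can be sketched as follows. This is an illustration only: the `Located`, `Fixture`, and `Score` shapes are hypothetical, and matching a finding to an expected bug by exact file and line is an assumption (the real matching rules and fixture format are described in benchmarks/golden-bugs/README.md).

```typescript
// Hypothetical shapes for illustration.
interface Located {
  file: string;
  line: number;
}

interface Fixture {
  id: string;
  expectedFindings: Located[];
}

interface Score {
  kind: "recall" | "fp-regression";
  falseNegatives: number; // expected bugs missing from the top-k findings
  falsePositives: number; // findings reported against an fp-regression fixture
  recallAtK: number | null; // null for fp-regression fixtures
}

// Score one fixture against the ranked findings a review produced for it.
function scoreFixture(fixture: Fixture, ranked: Located[], k: number): Score {
  if (fixture.expectedFindings.length === 0) {
    // FP regression case: any finding at all is a regression.
    return { kind: "fp-regression", falseNegatives: 0, falsePositives: ranked.length, recallAtK: null };
  }
  // Recall case: each expected bug must appear among the top-k findings.
  const topK = ranked.slice(0, k);
  const total = fixture.expectedFindings.length;
  const hits = fixture.expectedFindings.filter((e) =>
    topK.some((f) => f.file === e.file && f.line === e.line),
  ).length;
  // Extra findings on recall fixtures are not counted as FPs in this sketch.
  return { kind: "recall", falseNegatives: total - hits, falsePositives: 0, recallAtK: hits / total };
}
```

With this definition, recall@3 for a single-bug fixture is 1 when the bug is ranked third or better and 0 otherwise, which is why a metric can differ between k=3, k=5, and k=10.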

Baseline (n=3, 2026-04-20)

Three live runs with the default 3-reviewer OpenRouter config (#24666562754, #24667305646, #24667897271):

| Metric | Mean | Min | Max |
|---|---|---|---|
| recall@3 | 100.0% | 100.0% | 100.0% |
| recall@5 | 100.0% | 100.0% | 100.0% |
| recall@10 | 100.0% | 100.0% | 100.0% |
| FPs per fp-regression fixture | 2.3 | 2 | 3 |
| fp-regression triggered | 3/3 runs | | |

Recall is stable: all three recall cases (off-by-one, null-deref, SQL injection) were caught in the top 3 on every run.

The FP regression triggered on every run, but the content of the phantom findings shifted between runs: CRITICAL×3 about unhandled JSON.parse on run 1, WARNING×2 about regex DoS and input size on run 2, and WARNING + CRITICAL about an unbounded string and a missing type import on run 3. Each individual claim is a plausible-sounding, code-level assertion of the kind a review would make against a real diff, which is exactly why the current calibration stack does not filter them. This confirms the "high-confidence corroborated FP" blind spot documented in project_calibration_stack.md. This fixture is the regression gate for future calibration work (see #468).
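The "FPs per fp-regression fixture" row can be reproduced from the per-run descriptions: three phantom findings on run 1, two on run 2, and two on run 3. A small worked example of the aggregation, where `summarize` is a hypothetical helper rather than the benchmark's own code:

```typescript
// Phantom-finding counts per run, read off the run descriptions:
// run 1: CRITICAL×3, run 2: WARNING×2, run 3: WARNING + CRITICAL.
const fpCounts = [3, 2, 2];

// Mean, min, and max over a non-empty list of per-run values.
function summarize(values: number[]): { mean: number; min: number; max: number } {
  const sum = values.reduce((acc, v) => acc + v, 0);
  return { mean: sum / values.length, min: Math.min(...values), max: Math.max(...values) };
}

const summary = summarize(fpCounts);
console.log(summary.mean.toFixed(1), summary.min, summary.max); // prints "2.3 2 3"
```

The mean is 7/3, roughly 2.33, which rounds to the 2.3 reported in the table.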

License

MIT