@lcv-ideas-software/cross-review-v2

v2.4.1

Published

8 hours ago

API-first MCP server for multi-model cross-review with unanimous convergence gates.

0High
0Medium
0Low

lcv-leo

mcp model-context-protocol cross-review multi-agent openai anthropic gemini deepseek api

cross-review-v2

MCP server orchestrating API-first cross-review between Claude, ChatGPT Codex, Gemini, and DeepSeek with unanimous convergence gates.

Install. npm install -g @lcv-ideas-software/cross-review-v2 (npmjs.com) or npm install -g @lcv-ideas-software/cross-review-v2 --registry=https://npm.pkg.github.com (GitHub Packages mirror).

Status. Stable. Current release: v02.04.01 (npm package 2.4.1) paired with an API-first stable public surface. See CHANGELOG.md for the release history. v2.x releases use the organization display-tag standard (v00.00.00) while npm packages keep SemVer (2.x.y). The stable public rename from the temporary development name was completed at v02.01.00 / npm package 2.1.0, and all active docs, package metadata, publishing workflows, and runtime identity now use cross-review-v2.

The version history at a glance:

| Release | Scope | | --------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | v02.04.01 | CI hotfix for the v2.4.0 stub fail-fast gate. The v2.4.0 P1.1 fail-fast threw at module-import time when CI's workflow env already had CROSS_REVIEW_V2_STUB=1 set, because mcp/server.ts ran main() on every import (including the smoke harness's import { SessionIdSchema, ... }). v2.4.1 guards main() with the canonical ESM "is main module" check so importing named exports no longer spins up a server, and adds CROSS_REVIEW_V2_STUB_CONFIRMED: "1" to both ci.yml and publish.yml as defense in depth. | | v02.04.00 | Audit-closure hardening pass. Closes 18 priorities + 5 misc items from the v2.3.3 internal audit. MCP schema caps on task/draft/initial_draft; status parser 64 KiB byte-cap before JSON.parse; per-peer streaming buffer cap (16 MiB) on Anthropic, OpenAI, Gemini, DeepSeek; atomicWriteFile retry+backoff with crypto nonce on the temp filename; orphan .tmp boot sweep + stale meta.in_flight boot sweep; SessionIdSchema lowercase normalization; CROSS_REVIEW_V2_DATA_DIR tilde expansion; retry backoff jitter; Retry-After header extraction in all four providers; 502/503/504 marked retryable; AbortSignal propagation in Gemini; in-memory monotonic event seq counter; double-confirmation for CROSS_REVIEW_V2_STUB=1 (now requires NODE_ENV=test OR CROSS_REVIEW_V2_STUB_CONFIRMED=1); per-session format-recovery quota of 6; cost preflight amplified for retry/fallback chains; redact() extended to env-style assignments (PASSWORD=, API_KEY=, Authorization:); dashboard ships Content-Security-Policy + X-Frame-Options: DENY; SECURITY.md now documents the single-user trusted-host threat model and multi-host caveats; convergence strict equality; model-selection nullish coalescing + env-override validation. | | v02.03.03 | Review focus shielding and FinOps gates. The front-loaded review_focus block is now wrapped in escaped <review_focus>...</review_focus> tags, and paid API calls are blocked until explicit cost ceilings and per-peer rate cards are configured. | | v02.03.02 | README/metadata release hygiene. Reissued the README organizational standardization after Prettier formatting and active-document rename cleanup, keeping the latest release green end-to-end. | | v02.03.01 | README organizational standardization. Harmonized the public README opening with the shared organizational pattern, preserving the API-first operational sections while aligning badges, status framing, and version-history presentation. | | v02.03.00 | Review focus tightening. Added provider-neutral review_focus across the main orchestration tools, front-loaded it in prompts, stripped accidental /focus prefixes, and introduced explicit OUT OF SCOPE guidance so reviewers stay anchored without hiding critical blockers. | | v02.02.00 | Live token streaming. Added real provider token streaming, count-based progress events, CROSS_REVIEW_V2_STREAM_TOKENS, optional redacted streamed text for trusted diagnostics, and a real API streaming smoke. | | v02.01.01 | Hardening and advanced-model enforcement. Closed CodeQL issues in redaction/logging, added decision-retry recovery, standardized CROSS_REVIEW_V2_MAX_OUTPUT_TOKENS, removed weak/deprecated model fallbacks, and enforced advanced thinking-capable peer selection. | | v02.01.00 | First stable release as cross-review-v2. Promoted the API-first implementation to stable, added cancellation, restart recovery, metrics, runtime capabilities, fallback events, and completed the public rename to cross-review-v2. | | v2.0.x | Foundation hardening before stability. The pre-stable v02.00.xx line built the durable session model, dashboard/reporting surface, provider adapters, retry behavior, and release/publishing baseline that enabled the stable cut. |

What It Does

cross-review-v2 is the stable API-first implementation of the cross-review pattern. It does not execute Claude CLI, Codex CLI, Gemini CLI, DeepSeek CLI, PowerShell shells, or terminal sessions. The peers are called through provider APIs and official client libraries:

OpenAI client library for the Codex/OpenAI peer.
Anthropic TypeScript client library for Claude.
Google Gen AI client library for Gemini.
OpenAI-compatible DeepSeek API through the OpenAI client library.

Runtime calls are real provider calls by default. Stubs exist only for smoke tests and CI when CROSS_REVIEW_V2_STUB=1.

Topology

cross-review-v2 is MCP stdio on the outside and provider-API orchestration on the inside. The caller host opens a durable session, the server fans out to the four peers through official APIs, and convergence is granted only when the unanimity gate is satisfied.

Peers and Transport

| Peer | Transport | Notes | | ---------- | -------------------------------------------------------- | --------------------------------------------------------------------------- | | codex | OpenAI API via official client library | Advanced reasoning model selection with explicit reasoning effort controls. | | claude | Anthropic API via official TypeScript SDK | Advanced Opus-class selection with adaptive thinking. | | gemini | Google Gen AI SDK | Advanced Gemini 3.1 Pro preview selection with thinking-capable preference. | | deepseek | DeepSeek OpenAI-compatible API via OpenAI client library | Advanced deepseek-v4-pro selection with reasoning effort controls. |

Rename Notice

Starting with stable version 2.1.0, this project is named cross-review-v2. The previous development name is retained only in historical changelog and memory notes.

Secrets

API keys are read only from Windows environment variables. This project does not save API keys in JSON, .env, logs, session files, or prompts.

PowerShell examples:

[Environment]::SetEnvironmentVariable("OPENAI_API_KEY", "<OPENAI_API_KEY>", "User")
[Environment]::SetEnvironmentVariable("ANTHROPIC_API_KEY", "<ANTHROPIC_API_KEY>", "User")
[Environment]::SetEnvironmentVariable("GEMINI_API_KEY", "<GEMINI_API_KEY>", "User")
[Environment]::SetEnvironmentVariable("DEEPSEEK_API_KEY", "<DEEPSEEK_API_KEY>", "User")

Restart the terminal or application after changing Windows environment variables.

Model Selection

At startup/session initialization, the server queries provider model APIs when keys are present and selects the highest-capability model available to that key according to documented provider priorities.

Current documented priority defaults:

OpenAI/Codex: gpt-5.5 with CROSS_REVIEW_OPENAI_REASONING_EFFORT=xhigh.
Anthropic/Claude: claude-opus-4-7 with adaptive thinking and CROSS_REVIEW_ANTHROPIC_REASONING_EFFORT=xhigh.
Google/Gemini: gemini-3.1-pro-preview.
DeepSeek: deepseek-v4-pro.

Cross-review requires advanced thinking/reasoning-capable models. Model priority lists must not include provider models that are known to lack thinking support, low-capacity models that are unsuitable for peer review, or models marked for deprecation in official provider documentation. If no advanced priority model is available to a key, the runtime keeps the documented advanced fallback so the problem is visible instead of silently downgrading.

Explicit env var overrides always win:

[Environment]::SetEnvironmentVariable("CROSS_REVIEW_OPENAI_MODEL", "gpt-5.5", "User")
[Environment]::SetEnvironmentVariable("CROSS_REVIEW_OPENAI_REASONING_EFFORT", "xhigh", "User")
[Environment]::SetEnvironmentVariable("CROSS_REVIEW_ANTHROPIC_MODEL", "claude-opus-4-7", "User")
[Environment]::SetEnvironmentVariable("CROSS_REVIEW_ANTHROPIC_REASONING_EFFORT", "xhigh", "User")
[Environment]::SetEnvironmentVariable("CROSS_REVIEW_GEMINI_MODEL", "gemini-3.1-pro-preview", "User")
[Environment]::SetEnvironmentVariable("CROSS_REVIEW_DEEPSEEK_MODEL", "deepseek-v4-pro", "User")
[Environment]::SetEnvironmentVariable("CROSS_REVIEW_DEEPSEEK_REASONING_EFFORT", "max", "User")

Optional fallback model lists are comma-separated:

[Environment]::SetEnvironmentVariable("CROSS_REVIEW_OPENAI_FALLBACK_MODELS", "gpt-5.4,gpt-5.3", "User")

Each probe records the selected model, candidates returned by the API, source URL, confidence and selection reason.

Output Token Budget

CROSS_REVIEW_V2_MAX_OUTPUT_TOKENS controls the maximum output budget sent to all peer providers. The same value is applied to OpenAI, Anthropic, Gemini and DeepSeek for review calls and generation calls.

Default: 20000.

Set it in the MCP host configuration when you want the limit to travel with that MCP server entry:

[mcp_servers.cross-review-v2]
tool_timeout_sec = 1800
command = "C:/Users/leona/AppData/Roaming/npm/cross-review-v2.cmd"
args = []
env_vars = ["OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GEMINI_API_KEY", "DEEPSEEK_API_KEY"]
env = { CROSS_REVIEW_V2_MAX_OUTPUT_TOKENS = "20000" }

You can also set it as a Windows environment variable:

[Environment]::SetEnvironmentVariable("CROSS_REVIEW_V2_MAX_OUTPUT_TOKENS", "20000", "User")

Invalid, zero or negative values are ignored and the runtime falls back to 20000.

CROSS_REVIEW_V2_MAX_REVIEW_FOCUS_CHARS controls the maximum length of the optional review_focus prompt block. Default: 2000.

Financial Controls

Cost controls are mandatory for real provider calls. cross-review-v2 does not hard-code provider prices or financial fallback limits, because model pricing changes outside this repository and each operator has a different budget tolerance. If any required cost variable is missing, paid tools such as ask_peers, run_until_unanimous, session_start_round and session_start_unanimous stop before calling provider APIs and return a clear financial_controls_missing diagnostic listing the missing variables.

These settings have two jobs:

Budget ceilings define how much one session or one round is allowed to cost before the run is blocked or stopped.
Rate cards tell the runtime how to estimate provider cost from input/output token usage.

Without both, the server cannot honestly enforce a budget, so it refuses to spend API credits.

Required budget ceilings:

CROSS_REVIEW_V2_MAX_SESSION_COST_USD: maximum estimated cost for one complete session. This is the main safety ceiling.
CROSS_REVIEW_V2_PREFLIGHT_MAX_ROUND_COST_USD: maximum estimated cost for a single round before that round starts. This prevents an unexpectedly large prompt from starting an expensive fan-out.
CROSS_REVIEW_V2_UNTIL_STOPPED_MAX_COST_USD: required ceiling for until_stopped=true runs. This exists because open-ended unanimity loops must still have a financial stop condition.

Required rate cards, in USD per million tokens:

CROSS_REVIEW_OPENAI_INPUT_USD_PER_MILLION
CROSS_REVIEW_OPENAI_OUTPUT_USD_PER_MILLION
CROSS_REVIEW_ANTHROPIC_INPUT_USD_PER_MILLION
CROSS_REVIEW_ANTHROPIC_OUTPUT_USD_PER_MILLION
CROSS_REVIEW_GEMINI_INPUT_USD_PER_MILLION
CROSS_REVIEW_GEMINI_OUTPUT_USD_PER_MILLION
CROSS_REVIEW_DEEPSEEK_INPUT_USD_PER_MILLION
CROSS_REVIEW_DEEPSEEK_OUTPUT_USD_PER_MILLION

Example with the local budget ceiling preferred by this workspace:

[Environment]::SetEnvironmentVariable("CROSS_REVIEW_V2_MAX_SESSION_COST_USD", "20", "User")
[Environment]::SetEnvironmentVariable("CROSS_REVIEW_V2_PREFLIGHT_MAX_ROUND_COST_USD", "20", "User")
[Environment]::SetEnvironmentVariable("CROSS_REVIEW_V2_UNTIL_STOPPED_MAX_COST_USD", "20", "User")

Set the eight provider rate-card variables from current official provider pricing before running paid cross-review sessions. server_info reports financial_controls.paid_calls_ready and the exact missing variable names.

For example, if official pricing says a provider model costs 15.00 USD per million input tokens and 75.00 USD per million output tokens, configure that provider with:

[Environment]::SetEnvironmentVariable("CROSS_REVIEW_OPENAI_INPUT_USD_PER_MILLION", "15", "User")
[Environment]::SetEnvironmentVariable("CROSS_REVIEW_OPENAI_OUTPUT_USD_PER_MILLION", "75", "User")

Repeat the same pattern for Anthropic, Gemini and DeepSeek using their current official prices. Restart the MCP host after changing Windows environment variables.

You can also set the same variables directly in the MCP host configuration. This is useful when you want a specific host entry to carry a fixed budget policy:

[mcp_servers.cross-review-v2]
tool_timeout_sec = 1800
command = "C:/Users/leona/AppData/Roaming/npm/cross-review-v2.cmd"
args = []
env_vars = [
  "OPENAI_API_KEY",
  "ANTHROPIC_API_KEY",
  "GEMINI_API_KEY",
  "DEEPSEEK_API_KEY",
  "CROSS_REVIEW_OPENAI_INPUT_USD_PER_MILLION",
  "CROSS_REVIEW_OPENAI_OUTPUT_USD_PER_MILLION",
  "CROSS_REVIEW_ANTHROPIC_INPUT_USD_PER_MILLION",
  "CROSS_REVIEW_ANTHROPIC_OUTPUT_USD_PER_MILLION",
  "CROSS_REVIEW_GEMINI_INPUT_USD_PER_MILLION",
  "CROSS_REVIEW_GEMINI_OUTPUT_USD_PER_MILLION",
  "CROSS_REVIEW_DEEPSEEK_INPUT_USD_PER_MILLION",
  "CROSS_REVIEW_DEEPSEEK_OUTPUT_USD_PER_MILLION",
]
env = {
  CROSS_REVIEW_V2_MAX_SESSION_COST_USD = "20",
  CROSS_REVIEW_V2_PREFLIGHT_MAX_ROUND_COST_USD = "20",
  CROSS_REVIEW_V2_UNTIL_STOPPED_MAX_COST_USD = "20"
}

If a run is blocked, call server_info and check:

financial_controls.paid_calls_ready: false means the server is intentionally refusing paid calls.
financial_controls.missing_variables: exact variable names that must be configured.
financial_controls.policy: the reason paid calls are blocked.

Token Streaming

Token streaming is enabled by default. Provider progress is written to the session event stream as peer.token.delta events with character counts, followed by one peer.token.completed event per peer call. This lets MCP hosts, dashboards and future UIs show long-running work as it happens instead of waiting for the complete provider response.

[Environment]::SetEnvironmentVariable("CROSS_REVIEW_V2_STREAM_TOKENS", "1", "User")

Disable token streaming only if a host cannot consume frequent session events:

[Environment]::SetEnvironmentVariable("CROSS_REVIEW_V2_STREAM_TOKENS", "0", "User")

For safety, streamed event text is not included by default. Enable it only for trusted local diagnostics:

[Environment]::SetEnvironmentVariable("CROSS_REVIEW_V2_STREAM_TEXT", "1", "User")

When CROSS_REVIEW_V2_STREAM_TEXT=1, emitted text is passed through the same redaction layer used for session artifacts before it is persisted to events.ndjson. Even then, streaming text should be treated as diagnostic data, because providers can split sensitive strings across chunks. The default count-only mode avoids that class of leakage.

server_info exposes the effective streaming configuration, and runtime_capabilities.token_streaming reflects the active CROSS_REVIEW_V2_STREAM_TOKENS setting. Streaming does not change the unanimity gate and does not persist raw chain-of-thought.

Install

npm install
npm run build

Run MCP Server

npm run build
node dist/src/mcp/server.js

Real peer calls can take longer than a generic MCP client's default 60-second request timeout. Hosts and test clients should use at least 300s for MCP tool calls:

[mcp_servers.cross-review-v2]
tool_timeout_sec = 300
command = "node"
args = ["C:/Users/leona/lcv-workspace/cross-review-v2/dist/src/mcp/server.js"]
env_vars = ["OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GEMINI_API_KEY", "DEEPSEEK_API_KEY"]
env = { CROSS_REVIEW_V2_MAX_OUTPUT_TOKENS = "20000" }

Provider HTTP calls use CROSS_REVIEW_V2_TIMEOUT_MS, which defaults to 30 minutes. The 300s setting above is for the MCP client-to-server request.

For local no-cost smoke tests only:

$env:CROSS_REVIEW_V2_STUB="1"
npm test

For a real provider streaming check with the four API keys, run:

npm run api-streaming-smoke

The real smoke prints model names, status, usage and token-event counts only. It does not print prompts, provider text or API keys.

Dashboard

npm run dashboard

Then open http://127.0.0.1:4588.

MCP Tools

server_info
runtime_capabilities
probe_peers
session_init
session_list
session_read
ask_peers
session_start_round
run_until_unanimous
session_start_unanimous
session_cancel_job
session_recover_interrupted
session_poll
session_events
session_metrics
session_report
session_check_convergence
session_attach_evidence
escalate_to_operator
session_sweep
session_finalize

Review Focus

Use optional review_focus when a broad review needs a stable scope anchor, for example services/billing, src/core/session-store.ts, or release automation.

The field is available on session_init, ask_peers, session_start_round, run_until_unanimous and session_start_unanimous. Session-level focus is saved as meta.review_focus; per-call focus overrides it for that round or unanimous run. The runtime injects the value as a bounded/redacted Review Focus block at the start of generation, review, revision and retry prompts, wrapping the operator-provided text in escaped <review_focus>...</review_focus> delimiters. The tagged content is treated as scope data, not as instructions that override the cross-review protocol, response schema, safety rules, or task directives. If an operator accidentally pastes a leading /focus, the prefix is stripped during normalization and only the plain scope text is forwarded.

The injected block also tells reviewers to label possible findings outside that focus as OUT OF SCOPE instead of counting them as blocking issues, unless the issue is a critical cross-cutting blocker that invalidates the result. This keeps broad reviews anchored without hiding genuinely fatal problems.

This is intentionally not Claude Code's /focus slash command. Official Claude Code docs describe /focus as a focus-mode UI toggle; Cross Review uses review_focus so the same instruction works for OpenAI/Codex, Anthropic/Claude, Gemini and DeepSeek.

Observe the Session

Session metadata records in-flight rounds, convergence scope, convergence health, failed attempts, operator escalations, fallback events and attached evidence files. Each session can also produce events.ndjson, aggregate metrics and session-report.md, so long-running runs can be followed without waiting for a synchronous MCP call to return.

session_start_round and session_start_unanimous return immediately with a job_id and session_id. Use session_poll for state, session_events for incremental events, session_metrics for cost/latency/failure summaries, session_cancel_job for cooperative cancellation and session_report for the current Markdown report.

Provider responses that report a different model from the model requested are recorded as silent_model_downgrade failures and block convergence. Responses that cannot be parsed after one automatic format-recovery retry are recorded as unparseable_after_recovery failures.

When a provider rejects a prompt through moderation or safety filtering, the orchestrator records prompt_flagged_by_moderation, retries once with a compact sanitized review prompt, and marks successful retries with decision_quality: recovered. This is designed for verbose peer discussions that should be summarized, not replayed verbatim, in later prompts.

Secret redaction is applied when prompts, responses, evidence and JSON metadata are written. The redactor covers known API-key and token formats; new credential formats should be added before public test fixtures are promoted.

Security

Public-repo ready .gitignore.
No secrets in committed files.
GitHub Pages via Actions artifact deployment.
Dependabot configured.
Dependabot automerge workflow prepared.
Pushes to main auto-create an organization-standard display tag such as v02.01.00 from package.json; the tag then creates a normal GitHub Release and publishes @lcv-ideas-software/cross-review-v2 to npmjs.com and GitHub Packages.
CodeQL must be enabled through GitHub Default Setup after repository creation. Advanced Setup requires prior authorization.

Status

Current version: v02.03.03 (npm package 2.3.3).

Version v02.01.00 (npm package 2.1.0) is the first stable release of cross-review-v2.

License

Apache License 2.0 — see LICENSE and NOTICE.