@mmerterden/multi-agent-pipeline

v8.6.2

Published

a month ago

8-phase AI development pipeline for Claude Code and Copilot CLI. Multi-repo orchestration, platform identity routing, push-must-succeed policy, CLI-aware parallel review with Opus triage (Claude Code: 2-model Opus+Sonnet · Copilot CLI: 3-model GPT-5.4+Opus+Sonnet), commit/PR creation — and, as of 5.0.0, a generic Figma-to-Component pipeline that works on iOS (SwiftUI) and Android (Jetpack Compose) with the same workflow.

Current at-a-glance (live from filesystem)

| Surface | Count |
|---|---|
| Slash commands (colon-form /multi-agent:*) | 29 |
| Copilot skills (dash-form multi-agent-*) | 29 |
| Third-party tool adapters (Tier 2 + Tier 3) | 6 |
| Store-compliance skills (apple-archive-compliance, google-play-compliance) | 2 |
| Figma skills (iOS + Android + Common) | 37 |
| External skill catalog (shared/external/) | 127 |
| Total SKILL.md files across all groups | 196 |
| Smoke suites | 88 |
| Golden-task fixtures (pipeline/eval/golden-tasks/) | 2 |
| Eval-triage fixtures | 11 |
| JSON schemas | 13 |
| Agent personas | 8 |
| Pipeline phases | 8 |

What's new

Latest release: v8.5.6 (review-actions contract: inline comments + approve/needs-work + Bitbucket Server, 2026-05-08). The PR-mode contract of /multi-agent:review was rewritten. The single large advisory comment pattern from v8.5.5 was REMOVED; it is replaced by a contract of per-finding inline comments plus an explicit approve/needs-work state. Decision rule: 0 accepted blocking AND 0 accepted important → APPROVE (no comment). ≥1 → NEEDS_WORK + an inline comment for each accepted blocking/important finding (anchored to file:line). Suggestions never reach the PR (chat-only). Bitbucket Server URLs are now first-class: full URL parser + diff/comment/approve via the REST API. PR comment bodies are always rendered in outputLanguage (v8.5.5 had an EN-only bug). New: lib/post-pr-review.sh (provider-aware orchestrator), refs/channels/pr-review-actions.md (new canonical contract), smoke-pr-review-actions.sh (decision rule + lang + provider switch lint). Deleted: pr-review-comment.md, render-pr-review-body.sh, smoke-pr-review-comment-template.sh; forks should switch to calling post-pr-review.sh "$TASK_ID".

Previous: v8.5.5 (review accepts PR input + posts verdict back, 2026-05-08). /multi-agent:review now takes 5 input shapes (empty / branch / #N / repo#N / full PR URL) and in PR-mode fetches the diff via gh pr diff and prompts to post the parallel reviewer verdict as one canonical comment per run. Renderer reads agent-state.review.* and emits the body for gh pr comment --body-file; new refs/channels/pr-review-comment.md template enforces verdict block + per-severity findings + triage notes + build/test + footer marker, with hard prohibitions on Closes/Fixes/Resolves keywords, em-dashes in prose, and model vendor names. New smoke gate smoke-pr-review-comment-template.sh lints anchors + wiring + input parser. Branch-mode invocations stay chat-only (no auto-post).
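
As a rough illustration of the v8.5.6 decision rule above, the sketch below maps triaged findings to a PR verdict. The finding shape (severity, accepted, file, line, message) and the function name decideReview are assumptions made for this example; the actual contract lives in refs/channels/pr-review-actions.md and may differ.

```javascript
// Hypothetical finding shape: { severity, accepted, file, line, message }.
// Severity names follow the release note: blocking, important, suggestion.
function decideReview(findings) {
  // Only accepted blocking/important findings influence the verdict.
  const actionable = findings.filter(
    (f) => f.accepted && (f.severity === "blocking" || f.severity === "important")
  );

  // Zero actionable findings: approve without posting any comment.
  if (actionable.length === 0) {
    return { state: "APPROVE", inlineComments: [] };
  }

  // Otherwise: needs-work plus one inline comment per actionable finding.
  // Suggestions are never included, so they stay chat-only.
  return {
    state: "NEEDS_WORK",
    inlineComments: actionable.map((f) => ({
      anchor: `${f.file}:${f.line}`,
      body: f.message,
    })),
  };
}
```

Keeping the rule as a pure function over the triaged findings would make it easy to lint in a smoke test, independent of which provider receives the comments.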
Previous: v8.5.4 (drop 4 unused adapters, 2026-05-08). Windsurf, Cline, Zed AI, and Continue.dev knowledge-layer adapters removed because the pipeline owner did not use them; their smoke surface added cost without benefit. Tier 1 (Claude Code + Copilot CLI orchestration) and Tier 2 (Cursor + GitHub Copilot Chat knowledge layer) are unchanged. The adapter base framework (pipeline/adapters/_base.mjs) is preserved; re-adding any dropped tool is a one-file PR.

Previous: v8.5.3 (Defect 1 picker language axis fully closed, 2026-05-08). Phase docs that drive runtime picker rendering had legacy "ask in English (promptLanguage)" wording that overrode the v8.5.1 rules.md matrix; a user with outputLanguage="tr" saw the Phase 6 WIP-checkout picker render in English. The fix narrows phase-6-commit.md + _dev-context.md + _repo-picker.md to the per-field matrix (question + description follow outputLanguage; label + header stay English) and adds smoke-language-axis.sh to prevent regression.

Previous: v8.5.2 (v9 stability defects 5 / 8 / 10 closed, 2026-05-08). Phase 1.5 Existing-Component Discovery gate prevents silent overwrite of components that already have a Code Connect mapping or an existing source file (5 statuses: GREENFIELD_OK / ALREADY_MAPPED / EXISTING_SOURCE_NO_CC / AMBIGUOUS_SOURCE / GAP_ANALYSIS); Phase 2 + Phase 6 ENTRY GATEs now consult discovery.status. Token telemetry block enforced in phases 1 / 2 / 3 / 4: every LLM call must invoke phase-tracker.sh tokens <N> <in> <out> so live cost shows on the active phase tile. New smoke gates: smoke-tracker-tokens-invocation, smoke-worktree-path-convention (forbidden $HOME/.worktrees patterns), smoke-existing-discovery-gate, smoke-issue-comment-template, smoke-no-token-prompt.

Previous: v8.5.1 (stability fixes for 9 GH1241 defects + genericity review, 2026-05-08). Language matrix narrowed (AskUserQuestion.label + header stay English, question + description follow outputLanguage), canonical refs/channels/issue-comment.md template + update-issue-progress.sh script (Phase 7 Step 1.5 wires comment + flag sync), keychain.md Rule 1 forbids mid-run token prompts (routes to Setup Wizard instead), phase-tracker.sh pointer-fallback now warns on concurrent init, dev.md Phase 3 dispatch on taskType==="component" keeps the figma 17-substep orchestrator running even in --dev mode.

Earlier: v8.5.0 added the GitHub Projects v2 board adapter as the 5th channel + figma source incremental sync. v8.4.1 added the provider-aware account picker (Bitbucket / GitHub / GitLab), network reachability gate before the branch picker, Phase 6 PR creation contract with default reviewers, and the explicit promptLanguage=en (spec docs) / outputLanguage=user-selectable (chat + external bodies) split. v8.4.0 retired the standalone ArchiveGuard Swift binary in favour of the ios_app_store_audit MCP tool in @mmerterden/dev-toolkit-mcp ≥ v2.4: same 17 rules, same JSON shape, all four consumer surfaces work without changes.

Cross-platform status: macOS Claude Code path is production-ready; Linux libsecret + Windows PowerShell paths are coded but unverified; see CHANGELOG for honest field-testing notes.

Four orthogonal additions, all advisory by default, all preserving cross-CLI parity:

Per-task Cost Breakdown in agent-log.md — every Phase 7 run now appends a ## Cost Breakdown block with per-phase tokens (in/out) + estimated USD, sourced from phase-tracker.sh tokens accumulators and cost-table.json prices. Rendered by pipeline/scripts/render-agent-log-cost.sh. Independent of the channels-side reportContent.costSummary (PR/Jira gating). Token forwarder LOG_METRIC_FORWARD_TO_TRACKER=1 keeps metrics.jsonl and the tracker in sync from one call site.
Phase 4 Step 1.75 — Diff Risk Scoring (advisory): pipeline/scripts/diff-risk-score.mjs runs before reviewer dispatch and injects a top-N risk-ranked priority list into each reviewer's prompt. Heuristic, deterministic, sub-second, no LLM. Signals: security paths (×3), schema migrations (×4), public API surfaces (×2), no-test-change (×2.5), complexity delta (×1.5), UI-critical paths (×1.5), loc changed (×1). Default ON; flip prefs.global.diffRiskAdvisory = false to opt out.
Phase 5 Step 0 — Test Gap Report (advisory): pipeline/scripts/test-gap-scan.mjs walks the diff for newly added public symbols and reports those with no paired test. Stack-specific rules ship for iOS, Android, Python, and Node.js. iOS Views and Android @Composable symbols default to important; other public API additions to suggestion. Optional gating via prefs.testGap.blockingThreshold.
Phase 4 Triage Memory (advisory): per-repo append-only JSONL corpus at ~/.claude/memory/multi-agent/<repo-slug>/triage-corpus.jsonl records every accepted/deferred/rejected finding. Phase 7 ingests on completion (idempotent); Phase 1 enriches the analysis with similar past tasks; Phase 4 triage attaches prior-art hits to each raw finding. Token-overlap recall, zero deps. /multi-agent:search "<text>" --semantic flag routes the query to the corpus instead of agent-log grep. Default ON; flip prefs.global.priorArtEnrichment.enabled = false to opt out.
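
To make the Diff Risk Scoring weights listed above concrete, here is a minimal sketch of a deterministic, LLM-free ranking. The weights come from the list; how pipeline/scripts/diff-risk-score.mjs actually combines them is not documented here, so a plain weighted sum over per-file signals in the range 0 to 1 is assumed, and the names WEIGHTS, riskScore, and topRisks are hypothetical.

```javascript
// Weights as listed above. The combination rule (weighted sum) is an assumption.
const WEIGHTS = {
  securityPath: 3,
  schemaMigration: 4,
  publicApi: 2,
  noTestChange: 2.5,
  complexityDelta: 1.5,
  uiCritical: 1.5,
  locChanged: 1,
};

// signals: hypothetical per-file values in [0, 1], keyed like WEIGHTS.
// Missing signals count as 0.
function riskScore(signals) {
  return Object.entries(WEIGHTS).reduce(
    (sum, [name, weight]) => sum + weight * (signals[name] ?? 0),
    0
  );
}

// Returns the top-N files by descending score. Array.prototype.sort is
// stable, so ties keep their input order and the result is deterministic.
function topRisks(files, n) {
  return files
    .map((f) => ({ path: f.path, score: riskScore(f.signals) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, n);
}
```

For example, a migration file shipped without test changes would score 4 + 2.5 = 6.5 under these assumptions and outrank a pure UI tweak at 1.5.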

Cross-CLI parity: every change ships byte-identical on Claude Code (colon-form /multi-agent:*) and Copilot CLI (dash-form multi-agent-*). New smokes (5): smoke-agent-log-cost, smoke-diff-risk, smoke-test-gap, smoke-triage-memory plus extensions to smoke-cost-summary. New schemas: diff-risk.schema.json, test-gap.schema.json, triage-corpus.schema.json (3 new).

Full version history lives in CHANGELOG.md. Every release entry is recorded there to avoid drift between the two files.

Security issue? See SECURITY.md. Please do not open public issues for vulnerabilities.

Prerequisites

Node.js 18+
Claude Code (https://claude.ai/code) or GitHub Copilot CLI
macOS, Linux, or Windows (Git Bash / WSL). Native credential storage everywhere:
- macOS → Keychain (security)
- Linux → libsecret (secret-tool — apt install libsecret-tools / dnf install libsecret)
- Windows → Credential Manager (PowerShell CredentialManager module — Install-Module CredentialManager)
Runtime CLIs the pipeline shells out to:
- gh (GitHub CLI) for issue / PR flows
- jq for JSON parsing
- python3 (stdlib only, used by the deterministic keychain helper)
- bash 4+

The package is public on npmjs.org — no auth, no PAT, no ~/.npmrc setup needed:

npm view @mmerterden/multi-agent-pipeline version
# → prints "8.6.1" (or newer)

If npm view shows a stale version, run npm cache clean --force and retry.

New here? Worked end-to-end transcripts live in examples/ — bug fix from Jira, feature in autopilot, --dev fast path, and recovery from a broken run. Read one before you run the pipeline on your own code.
Something broke? docs/recovery-guide.md is the single-page index of every failure mode (triage fallback, worktree collision, state corruption, identity rewind, etc.) and its fix.
Planning breaking changes? ROADMAP.md tracks what's coming and what's been declined.

Quick Start

Option A — Install from source (recommended for development)

Clone the repo and run the installer directly — full read access to source files.

git clone git@github.com:mmerterden/multi-agent-pipeline.git
cd multi-agent-pipeline
npm install
node install.js                         # Claude Code (default)
node install.js --copilot               # Copilot CLI
node install.js --all                   # Both Claude + Copilot

# Third-party AI tool adapters (knowledge layer only)
node install.js --cursor                # Cursor (.cursor/rules/*.mdc + .cursorrules)
node install.js --copilot-chat          # GitHub Copilot Chat (.github/copilot-instructions.md)
node install.js --all-tools             # Everything: 2 orchestration (Claude + Copilot CLI) + 2 adapter targets
node install.js --cursor --target=/path/to/repo   # Adapter target override (default: cwd)

node install.js --link                  # Symlink mode (dev, saves ~10K tokens)

# Token-preserving uninstall (Keychain access tokens NEVER touched)
node install.js                         # ...later...
node pipeline/scripts/uninstall.mjs --dry-run     # preview what would be removed
node pipeline/scripts/uninstall.mjs --yes         # remove from every installed target
node pipeline/scripts/uninstall.mjs --cursor      # selective uninstall

# Optional: expose the CLI globally as 'multi-agent-pipeline'
npm link

IMPORTANT — run setup before your first task:

/multi-agent:setup

This discovers your Keychain tokens (Jira, Bitbucket, GitHub, etc.), sets up your git identity, and maps everything into ~/.claude/multi-agent-preferences.json. Without this step, the pipeline cannot find your tokens and will ask for them repeatedly.

Update later with git pull && npm install inside the clone. Pin to a specific version by checking out the corresponding tag (git tag -l to list).

Option B — npx (public registry, no auth)

npx @mmerterden/multi-agent-pipeline install            # Claude Code (default)
npx @mmerterden/multi-agent-pipeline install --copilot  # Copilot CLI
npx @mmerterden/multi-agent-pipeline install --all      # Both Claude + Copilot
npx @mmerterden/multi-agent-pipeline install --cursor       # Cursor adapter
npx @mmerterden/multi-agent-pipeline install --copilot-chat # GitHub Copilot Chat adapter
npx @mmerterden/multi-agent-pipeline install --all-tools    # Every supported tool
npx @mmerterden/multi-agent-pipeline install --link     # Symlink mode

# Token-preserving uninstall
npx @mmerterden/multi-agent-pipeline uninstall          # interactive, all targets
npx @mmerterden/multi-agent-pipeline uninstall --dry-run # preview only

Option C — Global install (public registry, no auth)

npm install -g @mmerterden/multi-agent-pipeline
multi-agent-pipeline install            # same flags apply

Privacy & Telemetry

Opt-in only. The installer sends nothing by default. If you want to help the project by sharing an anonymous install ping, opt in per-install with MULTI_AGENT_TELEMETRY=1:

MULTI_AGENT_TELEMETRY=1 node install.js
# or for npx/global
MULTI_AGENT_TELEMETRY=1 npx @mmerterden/multi-agent-pipeline install

When opted in, the ping includes:

Package name + version
Install method (source, npx, global, npm-install)
Flags passed (e.g. --copilot, --all)
Your GitHub username (via gh api user if authenticated) and Git email
Hostname, OS, arch, Node version

Nothing else is collected. Ping failures are silent — telemetry never blocks or slows down the install.

Troubleshooting

`sh: multi-agent-pipeline: command not found`

npx couldn't fetch the package. Most common causes:

Network blocked or proxy in front of npmjs.org.
Stale npx cache — npx clear-npx-cache then retry.
Wrong package name (it is @mmerterden/multi-agent-pipeline, with the scope).

`npm error code E404` on `registry.npmjs.org`

The package wasn't reachable. Quick checks:

curl -fsSL "https://registry.npmjs.org/@mmerterden/multi-agent-pipeline" | jq -r '."dist-tags".latest'
# Should print 8.6.1 (or newer)

If this prints a version, your local npm cache is stale; run npm cache clean --force and retry.
If it returns a 404, suspect a registry outage or a typo in the scope or package name. Verify that the URL above returns the package JSON.

Older PAT / GitHub Packages docs

Earlier README revisions referenced GitHub Packages with a Classic PAT and ~/.npmrc setup. The package moved to public npmjs.org in v8.6.1; no token, no ~/.npmrc line is needed. If you set one up previously, you can leave it — it just won't be consulted.

Tool support

TL;DR — Full pipeline orchestration runs on Claude Code + Copilot CLI. Knowledge layer (rules + skills) ports to Cursor and GitHub Copilot Chat via dedicated adapters. Other tools require manual port. (Pre-v8.5.4 the adapter set also covered Windsurf, Cline, Zed, and Continue.dev — those were dropped in v8.5.4 because the pipeline owner did not use them; reintroducing any of them is a one-file add against pipeline/adapters/_base.mjs.)

Tier 1 — Full pipeline (orchestration + knowledge)

| Tool | Install Flag | How It Works |
| --- | --- | --- |
| Claude Code | --claude (default) | Slash commands + shared + figma skills + agents + rules + scripts |
| Copilot CLI | --copilot | Instructions + shared + figma skills + scripts |

Both CLIs install from the same pipeline/skills/ source. Tree organization:

pipeline/skills/shared/core/ — orchestration skills (multi-agent-* dash-form mirrors of colon-form slash commands + compliance skills + orchestrator)
pipeline/skills/shared/external/ — 127 third-party / curated iOS, Android, and generic guidance skills (SwiftUI, Jetpack Compose, testing, performance, security, etc.)
pipeline/skills/figma-ios/ + pipeline/skills/figma-android/ — 5 + 5 platform-specific Phase 3 sub-skills (SwiftUI and Jetpack Compose code generation)
pipeline/skills/figma-common/ — 27 platform-agnostic Figma helpers (iterate, commit, wiki setup, MCP auth, performance harness)

You get identical skill coverage regardless of which CLI you use. Both CLIs also receive the same pipeline/scripts/ tree so multi-CLI-only installs stay self-contained.

Tier 2 — Knowledge layer only

These tools lack subagent dispatch, so the 8-phase orchestration cannot run on them. The rules tree and skills catalog port natively via per-tool adapters.

| Tool | Install Flag | Output | What works | What doesn't |
| --- | --- | --- | --- | --- |
| Cursor | --cursor | .cursor/rules/multi-agent-*.mdc (modern, 2025+) + .cursorrules legacy fallback | 193 SKILL contents, 12 rules, glob-aware activation | Slash commands, parallel review, autopilot, Phase 4 triage |
| GitHub Copilot Chat | --copilot-chat | .github/copilot-instructions.md (marker-wrapped) + .github/instructions/multi-agent-*.instructions.md | Per-skill instructions with applyTo: glob frontmatter; loaded automatically into every Copilot Chat conversation in the repo | Slash commands, parallel review, autopilot, Phase 4 triage |

All adapters default to process.cwd(); override with --target=<path>. Multiple tools at once: --all-tools (covers both Tier 2 surfaces). Filter by stack: --platform=ios|android|all.

Tier 3 — Manual port

Windsurf, Cline, Zed AI, Continue.dev, JetBrains AI, Codeium, Tabnine, Amazon Q Developer — closed/proprietary config, deprecated adapter, or different paradigm. Skill content can be copy-pasted into native rule formats; no automated installer ships today. PRs welcome.

Uninstall (token-preserving, all tiers)

npx @mmerterden/multi-agent-pipeline uninstall            # interactive, all installed targets
npx @mmerterden/multi-agent-pipeline uninstall --dry-run  # preview, zero side effects
npx @mmerterden/multi-agent-pipeline uninstall --cursor   # selective: only this tool

Personal access tokens stored in macOS Keychain / Windows Credential Manager / Linux libsecret are never touched by the uninstaller. Smoke tests enforce this with a static check that fails the build if the script ever references a credential-store deletion API.
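
The static check described above could look roughly like the following. This is only a sketch: the pattern list covers the deletion commands of the three credential stores named in this README, and both the list and the function name findCredentialDeletions are assumptions, not the project's actual smoke test.

```javascript
// Deletion commands for the three credential stores named above.
// Illustrative list only; the real smoke test may check different patterns.
const FORBIDDEN = [
  /security\s+delete-(generic|internet)-password/, // macOS Keychain
  /secret-tool\s+clear/, // Linux libsecret
  /Remove-StoredCredential/, // PowerShell CredentialManager module
];

// Returns the 1-based line numbers of source lines that match any
// forbidden pattern. An empty array means the script is clean.
function findCredentialDeletions(source) {
  return source
    .split("\n")
    .map((text, index) => ({ text, line: index + 1 }))
    .filter(({ text }) => FORBIDDEN.some((pattern) => pattern.test(text)))
    .map(({ line }) => line);
}
```

A build gate would read the uninstaller source, call the function, and fail when the returned array is non-empty.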

Pipeline Phases

Phase 0: Init      — Project selection, branch setup, identity, worktree
Phase 1: Analysis  — Stack detection, codebase exploration
Phase 2: Planning  — Task decomposition, architecture review, user approval
Phase 3: Dev       — TDD cycle: test → code → build
Phase 4: Review    — Deterministic gates + parallel review + Opus triage
                     • Claude Code → Opus + Sonnet (2 in parallel)
                     • Copilot CLI → GPT-5.4 + Opus + Sonnet (3 in parallel)
Phase 5: Test      — Optional manual testing + MCP device audits (on-demand)
Phase 6: Commit    — Pre-commit local checkout prompt, git commit, push, PR creation
Phase 7: Report   — External: Jira comment · Wiki + Figma screenshots · Confluence
                     Internal: Log · Knowledge + memory capture

Full Pipeline Flow

flowchart TD
    INPUT["🎯 <b>User Input</b><br/>Issue # · Jira URL · Free-text · jira · issue"]

    subgraph SETUP ["Setup"]
        P0["<b>Phase 0: Init</b><br/>Project detect · Worktree<br/>Branch · Identity · Task type"]
        P1["<b>Phase 1: Analysis</b><br/>Parallel Explore agents<br/>Stack detection · Guide load"]
        P2["<b>Phase 2: Planning</b><br/>Task decompose<br/>Architect review · User approval"]
    end

    subgraph DEVELOP ["Development"]
        P3["<b>Phase 3: Dev</b><br/>🔴 RED: test<br/>🟢 GREEN: implement<br/>🔵 REFACTOR · Build pass"]
    end

    subgraph REVIEW ["Review"]
        R1["<b>Opus</b><br/>Security<br/>Architecture"]
        R2["<b>GPT-5.4</b><br/>Quality<br/>Edge cases"]
        R3["<b>Sonnet</b><br/>Correctness<br/>Style"]
        TRIAGE["<b>Opus Triage</b><br/>Filter noise · Deduplicate<br/>Forward actionable only"]
    end

    subgraph DELIVER ["Delivery"]
        P5["<b>Phase 5: Test</b><br/>Manual test<br/>Device audits (on-demand)"]
        P6["<b>Phase 6: Commit</b><br/>Secret scan · Commit<br/>Push · PR create"]
        P7["<b>Phase 7: Report</b>"]
        subgraph REPORT ["Phase 7 sub-steps"]
            direction LR
            H1["Jira comment<br/>(analysis + tests)"]
            H2["Wiki + Figma<br/>screenshots"]
            H3["Confluence<br/>(optional)"]
            H4["Report · Log"]
            H5["Knowledge +<br/>memory capture"]
        end
        P7 --> H1 --> H2 --> H3 --> H4 --> H5
    end

    INPUT --> P0
    P0 --> P1
    P1 --> P2
    P2 --> P3
    P3 --> R1 & R2 & R3
    R1 & R2 & R3 --> TRIAGE
    TRIAGE -->|"✅ Approved"| P5
    TRIAGE -->|"🔧 Fix needed (≤3x)"| P3
    P5 --> P6
    P6 --> P7

    style INPUT fill:#818cf8,stroke:#6366f1,color:#fff
    style P3 fill:#fbbf24,stroke:#f59e0b,color:#000
    style TRIAGE fill:#38bdf8,stroke:#0ea5e9,color:#000
    style P7 fill:#4ade80,stroke:#22c55e,color:#000
    style H1 fill:#fde68a,stroke:#f59e0b,color:#000
    style H2 fill:#c084fc,stroke:#a855f7,color:#000
    style H3 fill:#bae6fd,stroke:#0ea5e9,color:#000

Operating Modes

flowchart LR
    subgraph NORMAL ["Normal (Full 8-phase)"]
        direction LR
        N0[Init] --> N1[Analysis] --> N2[Planning] --> N3[Dev] --> N4[Review] --> N5[Test] --> N6[Commit] --> N7[Report]
    end

    subgraph DEV ["--dev (Fast, Opus)"]
        direction LR
        D0[Init] --> D3["Dev<br/>(Opus)"] --> D6[Commit] --> D7[Report]
    end

    subgraph AUTO ["autopilot (No confirmations)"]
        direction LR
        A0[Init] --> A1[Analysis] --> A2[Planning] --> A3[Dev] --> A4[Review] --> A6[Commit] --> A7[Report]
    end

    subgraph FAST ["--dev autopilot (Fastest)"]
        direction LR
        F0[Init] --> F3["Dev<br/>(Opus)"] --> F6["Commit<br/>(auto)"] --> F7[Report]
    end

    style D3 fill:#fbbf24,stroke:#f59e0b,color:#000
    style F3 fill:#fbbf24,stroke:#f59e0b,color:#000
    style F6 fill:#4ade80,stroke:#22c55e,color:#000
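
Read as data, the operating modes in the diagram above differ only in which phases they skip. The sketch below encodes that reading; the mode keys and the helper phasesFor are hypothetical names chosen for illustration.

```javascript
const PHASES = ["Init", "Analysis", "Planning", "Dev", "Review", "Test", "Commit", "Report"];

// Phases skipped per mode, read off the diagram above.
const SKIPPED = {
  normal: [],
  dev: ["Analysis", "Planning", "Review", "Test"],
  autopilot: ["Test"],
  "dev-autopilot": ["Analysis", "Planning", "Review", "Test"],
};

// Returns the ordered list of phases that run in the given mode.
function phasesFor(mode) {
  if (!Object.hasOwn(SKIPPED, mode)) {
    throw new Error(`unknown mode: ${mode}`);
  }
  return PHASES.filter((phase) => !SKIPPED[mode].includes(phase));
}
```

Under this reading, --dev and --dev autopilot run the same four phases; they differ only in whether confirmations are requested, which the phase list alone does not capture.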

Review Architecture (Phase 4)

flowchart TD
    DIFF["📝 Code Diff"]

    DIFF --> OPUS["<b>Opus</b><br/>🔒 Security · Architecture<br/>Data flow · Auth"]
    DIFF --> GPT["<b>GPT-5.4</b><br/>✨ Code quality · Edge cases<br/>Error paths · Logic"]
    DIFF --> SON["<b>Sonnet</b><br/>✅ Correctness · Best practices<br/>Naming · Style"]

    OPUS --> TRIAGE
    GPT --> TRIAGE
    SON --> TRIAGE

    TRIAGE["<b>Opus Triage</b><br/>Deduplicate findings<br/>Filter false-positives<br/>Reject out-of-scope<br/>Classify severity"]

    TRIAGE -->|"✅ PASS"| NEXT["Phase 5: Test"]
    TRIAGE -->|"🔧 FIX_REQUIRED"| BACK["Phase 3: Dev<br/>(retry ≤3x)"]

    style DIFF fill:#818cf8,stroke:#6366f1,color:#fff
    style TRIAGE fill:#38bdf8,stroke:#0ea5e9,color:#000
    style NEXT fill:#4ade80,stroke:#22c55e,color:#000
    style BACK fill:#f87171,stroke:#ef4444,color:#000
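
The review loop in the diagram above (merge reviewer findings, deduplicate, send back to Phase 3 at most three times) can be sketched as follows. The duplicate key, the callbacks review and fix, and the result shape are assumptions for illustration; what the pipeline does once the retries are exhausted is not specified here, so the sketch simply reports that the limit was reached.

```javascript
// Drops duplicate findings. Two findings count as duplicates when file,
// line, and message all match; that key is an assumption.
function dedupe(findings) {
  const seen = new Set();
  return findings.filter((f) => {
    const key = `${f.file}:${f.line}:${f.message}`;
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

// review() returns the actionable findings for the current diff;
// fix(findings) stands in for another Phase 3 pass. Both are hypothetical.
// fix() is called at most maxRetries times.
function reviewLoop(review, fix, maxRetries = 3) {
  for (let retries = 0; ; retries++) {
    const findings = dedupe(review());
    if (findings.length === 0) return { verdict: "PASS", retries };
    if (retries === maxRetries) return { verdict: "FIX_REQUIRED", retries };
    fix(findings);
  }
}
```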

Figma SubPhase Integration (Phase 3)

When figmaConfigPath is set in project preferences, Phase 3 dispatches the Figma-to-SwiftUI pipeline instead of standard TDD:

flowchart TD
    P3["<b>Phase 3: Dev</b>"]

    P3 -->|default| TDD["<b>Standard TDD</b><br/>RED → GREEN → REFACTOR → build"]
    P3 -->|"figmaConfigPath set"| FIG

    subgraph FIG ["Figma-to-SwiftUI Pipeline (17 SubPhases)"]
        direction TB

        subgraph INIT_G ["Init + Gather"]
            S0["3.0 Init<br/>Parse URL · Branch · Assign"]
            S1["3.1 Gather<br/>Fetch design context"]
        end

        subgraph PREP ["Preparation (parallel)"]
            S2A["3.2A TestingIDs"]
            S2B["3.2B Localization"]
            S2C["3.2C Accessibility"]
            S2D["3.2D Analytics"]
        end

        S3["3.3 Token Mapping<br/>Figma values → design tokens"]

        subgraph IMPL ["Implementation (sequential)"]
            S4A["3.4A Config"]
            S4B["3.4B View"]
            S4C["3.4C Docs"]
            S4D["3.4D Preview"]
            S4E["3.4E Modifiers"]
            S4F["3.4F Wiki"]
        end

        subgraph TEST_G ["Testing"]
            S5A["3.5A ViewInspector"]
            S5B["3.5B Snapshot"]
            S5C["3.5C Unit"]
        end

        S6["3.6 CodeConnect<br/>Figma ↔ code link"]

        S0 --> S1
        S1 --> S2A & S2B & S2C & S2D
        S2A & S2B & S2C & S2D --> S3
        S3 --> S4A --> S4B --> S4C --> S4D --> S4E --> S4F
        S4F --> S5A --> S5B --> S5C
        S5C --> S6
    end

    style P3 fill:#fbbf24,stroke:#f59e0b,color:#000
    style TDD fill:#4ade80,stroke:#22c55e,color:#000
    style S3 fill:#818cf8,stroke:#6366f1,color:#fff
    style S6 fill:#c084fc,stroke:#a855f7,color:#000
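
One way to picture the dispatch and ordering in the diagram above is a stage table in which only the Preparation stage fans out. This is a sketch under assumptions: prefs is a hypothetical project-preferences object, runStep is a hypothetical async callback, and the real orchestrator may schedule substeps differently.

```javascript
// Stages of the Figma pipeline as drawn above (17 substeps in total).
// Steps in a parallel stage may run concurrently; others run in order.
const STAGES = [
  { name: "Init + Gather", parallel: false, steps: ["3.0", "3.1"] },
  { name: "Preparation", parallel: true, steps: ["3.2A", "3.2B", "3.2C", "3.2D"] },
  { name: "Token Mapping", parallel: false, steps: ["3.3"] },
  { name: "Implementation", parallel: false, steps: ["3.4A", "3.4B", "3.4C", "3.4D", "3.4E", "3.4F"] },
  { name: "Testing", parallel: false, steps: ["3.5A", "3.5B", "3.5C"] },
  { name: "CodeConnect", parallel: false, steps: ["3.6"] },
];

// Chooses the Phase 3 path: the Figma pipeline when figmaConfigPath is set,
// standard TDD otherwise.
function phase3Path(prefs) {
  return prefs && prefs.figmaConfigPath ? "figma" : "tdd";
}

// Runs every substep through runStep, awaiting a parallel stage as a
// whole before moving on to the next stage.
async function runFigmaPipeline(runStep) {
  for (const stage of STAGES) {
    if (stage.parallel) {
      await Promise.all(stage.steps.map((step) => runStep(step)));
    } else {
      for (const step of stage.steps) {
        await runStep(step);
      }
    }
  }
}
```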

Ecosystem Architecture

flowchart TD
    CC["<b>Claude Code</b><br/>(Source of Truth)<br/><br/>~/.claude/commands/<br/>~/.claude/agents/<br/>~/.claude/scripts/"]

    CC -->|"instructions + 192 skills + scripts"| COP["<b>Copilot CLI</b><br/>~/.copilot/skills/<br/>~/.copilot/scripts/<br/>copilot-instructions.md"]
    CC -->|"genericized pipeline/"| REPO["<b>Pipeline Repo</b><br/>@mmerterden/<br/>multi-agent-pipeline"]
    CC -.->|"optional"| WEB["<b>Website</b><br/>(your docs site)"]
    CC -.->|"optional"| RC["<b>Remote Control</b><br/>(your dashboard)"]

    REPO -->|"npm publish"| NPM["<b>npmjs.org</b><br/>(public npm registry)"]
    WEB -->|"auto-deploy"| VERCEL["Vercel"]

    style CC fill:#818cf8,stroke:#6366f1,color:#fff
    style REPO fill:#fbbf24,stroke:#f59e0b,color:#000
    style NPM fill:#4ade80,stroke:#22c55e,color:#000
    style WEB fill:#38bdf8,stroke:#0ea5e9,color:#000

Claude Code (Full Mode)

All 8 phases with sub-agents, parallel review + Opus triage (2-model Opus+Sonnet on Claude Code, 3-model GPT+Opus+Sonnet on Copilot CLI), TaskCreate visual tracking.

# Pipeline tasks
/multi-agent "MOBILE-12345"                              # Jira issue
/multi-agent "#42"                                        # GitHub issue
/multi-agent "Fix dark mode colors in LoginView"          # Free-text
/multi-agent:dev "MOBILE-12345"                           # Fast mode (Opus)
/multi-agent:autopilot "MOBILE-12345"                     # Skip confirmations
/multi-agent:dev-autopilot "MOBILE-12345"                 # Zero interaction

# Helper commands
/multi-agent:status                                       # List all tasks
/multi-agent:log 1                                        # Show task log
/multi-agent:resume 1                                     # Resume stopped task
/multi-agent:kill 1                                       # Delete task worktree
/multi-agent:review                                       # Review current diff
/multi-agent:setup                                        # Token + identity onboarding (asks promptLanguage + outputLanguage)
/multi-agent:language tr                                  # Toggle pipeline languages (en / tr per axis)
/multi-agent:test                                         # UI Bug Hunter
/multi-agent:channels "PR-url"                            # Post report (Jira / Confluence / Wiki / PR)
/multi-agent:search "query"                               # Full-text log search
/multi-agent:scan                                         # Skill security scan
/multi-agent:refactor                                     # Refactor planner
/multi-agent:update                                       # Update pipeline
/multi-agent:sync                                         # Sync ecosystem
/multi-agent:purge                                        # Full reset (double-confirm)

# Flag syntax (equivalent to dedicated commands above)
/multi-agent "MOBILE-12345" --dev                         # same as :dev
/multi-agent "MOBILE-12345" --dev autopilot               # same as :dev-autopilot

Copilot CLI (Lite Mode)

Same pipeline logic, invoked via dash syntax (multi-agent-*).

# Pipeline tasks
multi-agent "MOBILE-12345"                               # Jira issue
multi-agent "#42"                                         # GitHub issue
multi-agent "Fix dark mode colors in LoginView"           # Free-text
multi-agent-dev "MOBILE-12345"                            # Fast mode (Opus)
multi-agent-autopilot "MOBILE-12345"                      # Skip confirmations
multi-agent-dev-autopilot "MOBILE-12345"                  # Zero interaction

# Helper commands
multi-agent-status                                        # List all tasks
multi-agent-log 1                                         # Show task log
multi-agent-resume 1                                      # Resume stopped task
multi-agent-kill 1                                        # Delete task worktree
multi-agent-review                                        # Review current diff
multi-agent-setup                                         # Token + identity onboarding
multi-agent-test                                          # UI Bug Hunter
multi-agent-channels "PR-url"                             # Post report (Jira / Confluence / Wiki / PR)
multi-agent-search "query"                                # Full-text log search
multi-agent-scan                                          # Skill security scan
multi-agent-refactor                                      # Refactor planner
multi-agent-update                                        # Update pipeline
multi-agent-sync                                          # Sync ecosystem
multi-agent-purge                                         # Full reset (double-confirm)
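
The two command listings differ only in spelling, so a colon-form command can be mapped to its dash form mechanically. The helper below is a hypothetical illustration of that naming convention and is not part of the pipeline.

```javascript
// Converts a Claude Code slash command to the Copilot CLI dash form,
// e.g. '/multi-agent:dev "X"' becomes 'multi-agent-dev "X"'.
// Everything after the command name is passed through unchanged.
function toDashForm(command) {
  const match = /^\/(multi-agent)(?::([a-z-]+))?(?=\s|$)/.exec(command);
  if (!match) {
    throw new Error(`not a multi-agent slash command: ${command}`);
  }
  const [whole, base, sub] = match;
  const name = sub ? `${base}-${sub}` : base;
  return name + command.slice(whole.length);
}
```

For instance, toDashForm('/multi-agent:dev-autopilot "MOBILE-12345"') yields 'multi-agent-dev-autopilot "MOBILE-12345"'.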

Supported Stacks

| Platform | Detection | Guide Loaded |
| --- | --- | --- |
| iOS/Swift | .xcodeproj, Package.swift | SwiftUI Component Guide |
| Android/Kotlin | build.gradle, build.gradle.kts | Jetpack Compose Guide |
| Backend (Python/Node/Go) | requirements.txt, package.json, go.mod | Backend API Guide |
| Frontend (React/Vue/Next) | package.json + framework detection | Frontend Guide |

Stack is auto-detected. Build commands, test runners, lint tools, and review focus areas all adapt automatically.
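
A marker-based detector along the lines of the table above might look like this. It is a minimal sketch: the precedence order (iOS, then Android, then package.json, then other backend markers) and the framework list are assumptions, since the README only says "framework detection", and detectStack is a hypothetical name.

```javascript
// package.json dependencies treated as frontend markers (assumed list).
const FRONTEND_FRAMEWORKS = ["react", "vue", "next"];

// fileNames: names found in the project root.
// packageDeps: dependency names from package.json, if one exists.
function detectStack(fileNames, packageDeps = []) {
  const has = (predicate) => fileNames.some(predicate);

  if (has((f) => f.endsWith(".xcodeproj") || f === "Package.swift")) return "ios";
  if (has((f) => f === "build.gradle" || f === "build.gradle.kts")) return "android";
  if (has((f) => f === "package.json")) {
    // package.json is shared by backend and frontend; a known framework
    // dependency tips the result toward frontend.
    return packageDeps.some((dep) => FRONTEND_FRAMEWORKS.includes(dep))
      ? "frontend"
      : "backend";
  }
  if (has((f) => f === "requirements.txt" || f === "go.mod")) return "backend";
  return "unknown";
}
```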

Modes

| Mode | Claude Code | Copilot CLI | Description |
| ---- | ----------- | ----------- | ----------- |
| Normal | /multi-agent "task" | multi-agent "task" | Full 8 phases with CLI-aware parallel review (Claude: 2-model · Copilot: 3-model) |
| Fast | /multi-agent:dev "task" | multi-agent-dev "task" | Init → Dev(Opus) → Commit → Report |
| Local | /multi-agent "task" --local | multi-agent "task" --local | No worktree; works on local branch |
| Autopilot | /multi-agent:autopilot "task" | multi-agent-autopilot "task" | Skip confirmations, auto commit/PR |
| Fastest | /multi-agent:dev-autopilot "task" | multi-agent-dev-autopilot "task" | Zero interaction |
| Test | /multi-agent:test | multi-agent-test | UI Bug Hunter: visual + accessibility |
| Channels | /multi-agent:channels <target> | multi-agent-channels <target> | Post report to Jira / Confluence / Wiki / PR (multi-select, humanizer pass, reviewer-preserving) |
| Stack | /multi-agent:stack ios | multi-agent-stack ios | Manually swap skill set per platform |
| Language | /multi-agent:language [prompt\|output] <en\|tr> | multi-agent-language [prompt\|output] <en\|tr> | Toggle promptLanguage (interactive prompts) and/or outputLanguage (assistant explanations). External payloads stay English. |

UI Bug Hunter + Audit Tools

Automated visual testing and compliance audits. Requires the mobile MCP server.

# Claude Code                              # Copilot CLI
/multi-agent:test                           # multi-agent-test
/multi-agent:test "dark mode"               # multi-agent-test "dark mode"
/multi-agent:test "accessibility"           # multi-agent-test "accessibility"
/multi-agent:test "dynamic type"            # multi-agent-test "dynamic type"
/multi-agent:test "store-ready"             # multi-agent-test "store-ready"
/multi-agent:test "biometric"               # multi-agent-test "biometric"
/multi-agent:test "performance"             # multi-agent-test "performance"

How Audit Tools Work

All audits run via direct Bash commands, with no MCP server dependency. The pipeline calls xcrun simctl, adb, codesign, aapt2, and similar tools natively.

| Audit | What It Does | Command | When |
| --------------------- | ------------------------------------------------------ | ---------------------------- | ----------------------- |
| iOS Accessibility | Missing labels, small tap targets | swift ui-tree-dumper.swift | Phase 5, on user request |
| Android Accessibility | Missing contentDescription, small touch targets | adb shell uiautomator dump | Phase 5, on user request |
| iOS Biometric | Face ID / Touch ID success/failure | xcrun simctl keychain | Phase 5, auth flow |
| Android Launch Time | Cold start time (ms) | adb shell am start -W | Phase 5, performance |
| iOS Archive | App Store compliance: signing, debug tools, privacy | codesign, plutil, nm | Phase 6, release |
| Android APK | Play Store compliance: target SDK, debuggable, signing | aapt2, apksigner | Phase 6, release |

Important: Audits are on-demand — triggered by user, not automatic. Phase 4 does code-level accessibility review (free, no device needed). Phase 5/6 do device-level audits only when requested.

No external dependencies — only standard Xcode CLI tools (iOS) and Android SDK (Android). Platform guides include compliance rules that map 1:1 to audit checks — follow the guide, pass the audit.
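
As an illustration of the "direct Bash commands" approach, here is a minimal sketch of how the Android launch-time audit could read its number. The helper name `parse_total_time` is hypothetical and not part of the pipeline; it only assumes that `adb shell am start -W` prints a `TotalTime:` line in milliseconds.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with the pipeline): read `am start -W`
# output on stdin and print the TotalTime value in milliseconds.
parse_total_time() {
  # Split on ": ", keep the first TotalTime line, strip non-digits
  # (adb output can carry a trailing carriage return).
  awk -F': *' '/^[ \t]*TotalTime:/ { gsub(/[^0-9]/, "", $2); print $2; exit }'
}

# Usage (needs a connected device or emulator; the component name is a placeholder):
#   adb shell am start -W -n com.example.app/.MainActivity | parse_total_time
```

Parsing stdin rather than calling adb inside the function keeps the helper testable without a device.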

Stack Swap

Stack is auto-detected at session start by pipeline/scripts/stack-swap.sh (SessionStart hook) based on project markers (.xcodeproj, build.gradle, package.json, etc.). To override manually:

# Claude Code                              # Copilot CLI
/multi-agent:stack                          multi-agent-stack                # show current
/multi-agent:stack ios                      multi-agent-stack ios            # SwiftUI + Xcode
/multi-agent:stack android                  multi-agent-stack android        # Compose + Gradle
/multi-agent:stack backend                  multi-agent-stack backend        # Python/Node/Go
/multi-agent:stack frontend                 multi-agent-stack frontend       # React/Vue/Next
/multi-agent:stack mobile                   multi-agent-stack mobile         # iOS + Android
/multi-agent:stack all                      multi-agent-stack all            # load everything

Re-runs stack-swap.sh with a forced mode and loads the matching guide (swiftui-guide.md, android-guide.md, backend-guide.md, frontend-guide.md).
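
Marker-based detection can be sketched roughly as follows. This is not the contents of stack-swap.sh: the function name `detect_stack`, the precedence order, and the choice to map package.json to `frontend` are all assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of marker-based stack detection; the real
# stack-swap.sh may use different markers and a different precedence.
detect_stack() {
  local dir="${1:-.}" ios=0 android=0

  # Any *.xcodeproj bundle in the directory marks an iOS project.
  compgen -G "$dir/*.xcodeproj" > /dev/null && ios=1
  # A Gradle build file marks an Android project.
  [ -f "$dir/build.gradle" ] && android=1

  if [ "$ios" -eq 1 ] && [ "$android" -eq 1 ]; then
    echo mobile
  elif [ "$ios" -eq 1 ]; then
    echo ios
  elif [ "$android" -eq 1 ]; then
    echo android
  elif [ -f "$dir/package.json" ]; then
    # Assumption: package.json alone is treated as a frontend project here.
    echo frontend
  else
    echo unknown
  fi
}

# Usage: detect_stack /path/to/project
```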

Setup

# 1. Install pipeline
npx @mmerterden/multi-agent-pipeline install --all

# 2. Configure
/multi-agent:setup                          # Claude Code
multi-agent-setup                           # Copilot CLI
# -> Jira project key (e.g., MOBILE, APP, ENG)
# -> Git identity (name + email)
# -> Keychain token scan + mapping

# That's it! Pipeline works standalone — no additional dependencies needed.

Hooks & Context Management

Pipeline includes automated safety hooks and session optimization, configured during installation.

Pre-Commit Secret Detection

A PreToolUse hook runs before every git commit, scanning staged files for:

Hardcoded API keys, tokens, secrets
AWS access keys (AKIA...)
Private keys (RSA/EC/DSA/OPENSSH)
.env files and credentials files
Firebase/GCP service account JSON

If secrets are found, the commit is blocked with a clear message.
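
To make the categories above concrete, here is a minimal sketch of two of the checks (AWS access key IDs and private key headers). It is not the shipped pre-commit-check.sh: the function name `scan_file_for_secrets` and the exact regular expressions are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of two secret checks; the patterns used by the real
# pre-commit-check.sh are not shown in this README and may differ.
# Prints the kind of secret found and returns 1; returns 0 when clean.
scan_file_for_secrets() {
  local file="$1"

  # AWS access key IDs: "AKIA" followed by 16 uppercase letters or digits.
  if grep -Eq -e 'AKIA[0-9A-Z]{16}' "$file"; then
    echo "aws-access-key"
    return 1
  fi

  # PEM private key headers (RSA / EC / DSA / OPENSSH, or unlabeled).
  if grep -Eq -e '-----BEGIN (RSA |EC |DSA |OPENSSH )?PRIVATE KEY-----' "$file"; then
    echo "private-key"
    return 1
  fi

  return 0
}

# Usage: scan every staged file and stop at the first hit.
#   git diff --cached --name-only --diff-filter=ACM | while IFS= read -r f; do
#     scan_file_for_secrets "$f" || { echo "Blocked: secret in $f" >&2; exit 1; }
#   done
```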

Context Management

CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=65 triggers context compaction at 65% usage instead of the default ~80%. This prevents performance degradation in long 8-phase pipeline sessions.

Local CI (pre-push)

CI is local-only via the pre-push git hook (no GitHub Actions). Run the full gate manually anytime, or wire the hook to run it on every git push:

# One-off check
bash pipeline/scripts/pre-push-check.sh

# Install as a git hook — runs automatically on `git push`
ln -sf ../../pipeline/scripts/pre-push-check.sh .git/hooks/pre-push
chmod +x .git/hooks/pre-push

Runs unit tests, smoke suites, eval fixtures, schema validation, and lint — the same gate that used to run in CI. Bypass only in emergencies with git push --no-verify.
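
A fail-fast gate of this kind can be sketched as below. The function `run_gate` is hypothetical and only illustrates the "run each check in order, stop on the first failure" idea; the actual step list lives in pre-push-check.sh and is not reproduced here.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a fail-fast gate; not the contents of
# pre-push-check.sh. Each argument is one shell command to run in order.
run_gate() {
  local step
  for step in "$@"; do
    echo "==> $step"
    if ! bash -c "$step"; then
      # A non-zero exit from the hook makes git abort the push.
      echo "FAILED: $step" >&2
      return 1
    fi
  done
  echo "All checks passed"
}

# Usage (the second command is a placeholder for any further check):
#   run_gate "npm test" "echo 'extra check here'"
```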

Scripts

| Script | Purpose |
| ---------------------- | ------------------------------------------------ |
| pre-commit-check.sh | Secret detection before commits |
| stack-swap.sh | Auto-detect project stack on session start |
| keychain-save.sh | Save tokens/JSON to macOS Keychain (interactive) |
| keychain.py | Deterministic Python Keychain helper (get/set/delete/list/doctor); shell driver auto-delegates on macOS/Linux |
| github-ssh-setup.sh | GitHub SSH key generation + config |
| ui-tree-dumper.swift | iOS accessibility tree dumper for audits |

All scripts are installed to ~/.claude/scripts/ during setup.

Platform Support

| Tier | Platform | Requirements |
| --------- | --------------------- | ----------------------------------------------------------- |
| Primary | macOS 13+ | Keychain-backed token storage, xcodebuild, xcrun simctl |
| Primary | Linux (Ubuntu 22.04+) | Token storage via env vars / gh auth; no Xcode features |
| Secondary | Windows (WSL2) | Same as Linux; Xcode features unavailable |

Node.js 18+ required everywhere. Git 2.38+ (for git worktree improvements) recommended.

Testing

npm test

Runs ~1,300 assertions across four layers:

| Layer | Count | What |
| ----------------- | ------------------------------ | -------------------------------------------------------- |
| Node unit tests | 92 tests across 19 suites | CLI routing, install helpers, settings.json hooks, security |
| Smoke suites | 73 scripts, ~1,210 assertions | End-to-end contract tests for every script and phase doc |
| Eval fixtures | 11 adversarial cases + 2 golden tasks | Semantic regression for triage classification + full pipeline replay |
| Schema validation | 13 schemas | Structural integrity of JSON schemas |

CI runs locally via the pre-push git hook (see Local CI (pre-push) above); there are no GitHub Actions. Enable the hook once per clone with ln -sf ../../pipeline/scripts/pre-push-check.sh .git/hooks/pre-push.

Key Features

Ten highlights — see docs/features.md for the full catalog.

8-phase orchestration with lazy-loaded phase specs — only pay token cost for the current phase.
CLI-aware parallel review + Opus triage — Opus (security/architecture) + Sonnet (quality/correctness) run in parallel on Claude Code (2-model); Copilot CLI adds GPT-5.4 (edge cases/different perspective) for a 3-model set. A single Opus triage pass filters false-positives, rejects out-of-scope findings, and forwards only actionable items to Phase 3. Reviewer noise never auto-triggers rework.
Bitbucket / GitHub PR automation with default reviewers auto-injected, draft/ready prompt, body preservation (no literal \n, no HTML entities).
channels command posts task reports to Jira / Confluence / Wiki / PR with multi-select channel + content. Humanizer pass per-channel, reviewer-preserving Bitbucket PR PUT. Phase 7 delegates; also invocable post-hoc for fixes made outside the pipeline.
Deterministic safety gates + pre-commit local-checkout prompt + runtime triage validator: pre-commit secret scan (PreToolUse hook), xcodebuild lock queue, 3-iteration hard-kill on retry loops, "checkout locally and test before commit?" Phase 6 prompt, and a zero-dep Node validator that gates Phase 4 triage output on real exit codes — no longer just markdown guidance. Includes telemetry (metrics.jsonl + aggregator — GPT-5.4 reviewer metric emitted only on Copilot CLI) and a sync-parity script that catches Claude↔Copilot↔repo drift before it ships.
Cross-session learning: per-project knowledge base (architecture, patterns, gotchas) + user-level memory (feedback, project constraints, references).
Schema-validated state + smoke-tested contracts: schemas/*.schema.json for agent-state.json and preferences; scripts/smoke-channels-flow.sh guards body-preservation + Bitbucket PUT contract + multi-channel dispatch.
Task Type Detection — every task classified as component/bugfix/feature/refactor/chore at Phase 0 Step 9, used by every downstream phase for deterministic routing.
SubPhase progress tracking for specialized workflows — component generation (figma-to-swiftui) reports nested SubPhases under the parent main phase instead of inflating the top-level phase count.
Interactive launchers: multi-agent-jira lists your open Jira issues; multi-agent-issue lists unassigned GitHub issues. Pick one → choose branch → choose mode (full/--dev) → autopilot? → pipeline starts. GitHub issues are auto-assigned on selection.
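
The "3-iteration hard-kill on retry loops" gate from the list above can be pictured with a small sketch. The function name `run_with_retry_cap` and its messages are hypothetical; only the cap of three attempts comes from the feature description.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a capped retry loop: run a command up to three
# times, then give up instead of looping forever.
run_with_retry_cap() {
  local max=3 attempt=1
  while [ "$attempt" -le "$max" ]; do
    if "$@"; then
      return 0
    fi
    echo "attempt $attempt/$max failed" >&2
    attempt=$((attempt + 1))
  done
  echo "giving up after $max attempts" >&2
  return 1
}

# Usage: run_with_retry_cap npm test
```

Returning a non-zero status after the cap lets the caller decide what to do next instead of silently retrying.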

What's Included

pipeline/
  commands/
    multi-agent.md              Main orchestrator
    sim-test.md                 Mobile UI Bug Hunter
    figma-to-swiftui.md         Figma -> SwiftUI component generator
    deploy.md                   iOS deployment checklist
    archive-guard.md            .xcarchive App Store compliance scan
    security-review.md          Deep security audit
    multi-agent/
      help.md                   Usage guide
      setup.md                  Token + identity + Jira key onboarding
      status.md                 List all tasks
      log.md                    Show task log
      resume.md                 Resume stopped task
      kill.md                   Delete task worktree
      purge.md                  Full reset (double-confirm)
      review.md                 Review current diff
      channels.md               Multi-channel reporter (Jira/Confluence/Wiki/PR)
      jira.md                   Interactive Jira picker
      issue.md                  Interactive GitHub issue picker
      dev.md                    Fast mode (Opus)
      autopilot.md              Skip confirmations
      dev-autopilot.md          Fastest path (dev + autopilot)
      test.md                   UI Bug Hunter
      search.md                 Full-text log search
      scan.md                   Skill security scanner
      refactor.md               Refactor planner
      update.md                 Update pipeline
      sync.md                   Sync ecosystem (Claude/Copilot/repo)
      refs/
        rules.md                Global non-negotiable rules
        keychain.md             Token registry
        knowledge.md            Project knowledge system
        audit-guide.md          Audit tool integration rules
        swiftui-guide.md        iOS component guide + compliance rules
        android-guide.md        Android component guide + compliance rules
        backend-guide.md        Backend API guide
        frontend-guide.md       Frontend component guide
        phases.md               Phase reference + ASCII flow diagram
        phases/
          phase-0-init.md       Project setup (8-step interactive)
          phase-1-analysis.md   Stack detection + codebase exploration
          phase-2-planning.md   Task decomposition + architecture review
          phase-3-dev.md        TDD development + build queue
          phase-4-review.md     Gates + code review + accessibility check
          phase-5-test.md       User testing + device audits (on-demand)
          phase-6-commit.md     Commit + PR (reviewers + draft prompt)
          phase-7-report.md    External (Jira/Wiki/Confluence) + internal log + knowledge + memory
          modes.md              Autopilot, --dev, --local
          operations.md         Kill, resume, purge
          log-format.md         Log file spec
  skills/
    shared/
      core/                     22 orchestration skills (multi-agent-*) —
                                pipeline-critical; changes here affect pipeline
                                behavior directly.
      external/                 127 iOS/Android/generic guidance skills imported
                                from upstream (mirrors of third-party skill sets
                                — SwiftUI, Compose, Kotlin, Swift, web, backend,
                                CI/CD, HIG, etc.).
    figma-common/               Platform-agnostic Figma pipeline shared steps
    figma-ios/                  Figma → SwiftUI component generator (iOS)
    figma-android/              Figma → Jetpack Compose component generator (Android)
  schemas/                      JSON Schemas
    agent-state.schema.json     Validates $HOME/.claude/logs/.../agent-state.json
    prefs.schema.json           Validates $HOME/.claude/multi-agent-preferences.json
    triage-output.schema.json   Validates Phase 4 triage output
    token-budget.json           Per-phase token limits + warn thresholds
  agents/
    code-reviewer.md            Phase 4 reviewer (CLI-aware: 2-model Claude / 3-model Copilot + Opus triage)
    explorer.md                 Phase 1 codebase scanner
    ios-architect.md            iOS architecture review
    android-architect.md        Android architecture review
    backend-architect.md        Backend/API architecture review
    security-auditor.md         Security audit (OWASP)
  rules/
    code-style.md               Naming, structure, patterns
    git-conventions.md          Commit messages, branching
    testing.md                  Test naming, structure, coverage
    tdd.md                      Red-Green-Refactor, testing pyramid
    code-review.md              Review priority, severity, checklist
    security.md                 Keychain, ATS, credentials, privacy
    performance.md              Bottlenecks, caching strategy
    debugging.md                Scientific debugging method
    app-store-guidelines.md     App Store Review Guidelines
    figma-pipeline.md           Figma -> SwiftUI generation rules
    swiftui-qa.md               3-layer test strategy, 13-item checklist
    kotlin-android.md           Kotlin & Android conventions
  eval/
    triage/
      01-10/                    10 adversarial eval fixtures
  scripts/
    pre-commit-check.sh         Secret detection hook
    stack-swap.sh               Auto stack detection (SessionStart)
    keychain-save.sh            Save tokens/JSON to macOS Keychain
    github-ssh-setup.sh         GitHub SSH key generation + setup
    ui-tree-dumper.swift        iOS accessibility tree dumper
    eval-triage.mjs             Eval runner for triage fixtures
    validate-triage.mjs         Runtime triage validator
    validate-schemas.mjs        Zero-dep shallow validator for *.schema.json
    aggregate-metrics.mjs       Telemetry aggregator
    log-metric.sh               Telemetry metric logger
    phase-banner.sh             Phase banner UI renderer
    phase-tracker.sh            Phase progress tracker
    sync-parity-check.sh        Claude↔Copilot↔repo parity checker
    smoke-*.sh                  Contract smoke tests (10 suites, 115 assertions)
  claude-md-template.md         CLAUDE.md starter template
  preferences-template.json     Empty config template

docs/
  features.md                   Full feature catalog
  adr/                          Architecture Decision Records (0001+)

CHANGELOG.md                    Version history
docs/architecture.md            Mermaid diagrams: pipeline flow, modes, components
docs/best-practices.md          Competitor-informed patterns and pipeline rules
SECURITY.md                     Vulnerability reporting policy

Ecosystem Sync

The pipeline maintains consistency across multiple repositories and CLI targets:

# Claude Code                              # Copilot CLI
/multi-agent:sync                           # multi-agent-sync
/multi-agent:sync status                    # multi-agent-sync status
/multi-agent:sync to-copilot                # multi-agent-sync to-copilot
/multi-agent:sync to-repo                   # multi-agent-sync to-repo
/multi-agent:sync release                   # multi-agent-sync release

| Target | What syncs |
| ------------------ | ---------------------------------------------------- |
| Claude Code | Source of truth: commands, agents, scripts |
| Copilot CLI | Summary mirror + 192 unified skills + scripts |
| Pipeline Repo | Genericized open-source version (no personal data) |
| Website | Version, phase/model counts, feature strings (EN+TR) |
| Remote Control | Pipeline feature references |

License

MIT
