@mmerterden/multi-agent-pipeline
v8.6.2
8-phase AI development pipeline. Full orchestration on Claude Code + Copilot CLI; knowledge layer (rules + skills) ports to Cursor, Windsurf, and Cline. Analysis, planning, TDD, CLI-aware parallel review with Opus triage, wiki generation, commit automatio
8-phase AI development pipeline for Claude Code and Copilot CLI. Multi-repo orchestration, platform identity routing, push-must-succeed policy, CLI-aware parallel review with Opus triage (Claude Code: 2-model Opus+Sonnet · Copilot CLI: 3-model GPT-5.4+Opus+Sonnet), commit/PR creation — and, as of 5.0.0, a generic Figma-to-Component pipeline that works on iOS (SwiftUI) and Android (Jetpack Compose) with the same workflow.
Current at-a-glance (live from filesystem)
| Surface | Count |
|---|---|
| Slash commands (colon-form /multi-agent:*) | 29 |
| Copilot skills (dash-form multi-agent-*) | 29 |
| Third-party tool adapters (Tier 2 + Tier 3) | 6 |
| Store-compliance skills (apple-archive-compliance, google-play-compliance) | 2 |
| Figma skills (iOS + Android + Common) | 37 |
| External skill catalog (shared/external/) | 127 |
| Total SKILL.md files across all groups | 196 |
| Smoke suites | 88 |
| Golden-task fixtures (pipeline/eval/golden-tasks/) | 2 |
| Eval-triage fixtures | 11 |
| JSON schemas | 13 |
| Agent personas | 8 |
| Pipeline phases | 8 |
What's new
Latest release: v8.5.6 (review-actions contract: inline comments + approve/needs-work + Bitbucket Server, 2026-05-08). The PR-mode contract of /multi-agent:review was rewritten. The single large advisory comment pattern from v8.5.5 was REMOVED and replaced by a contract of per-finding inline comments plus an explicit approve/needs-work state. Decision rule: 0 accepted blocking AND 0 accepted important → APPROVE (no comment); ≥1 → NEEDS_WORK plus an inline comment for each accepted blocking/important finding (anchored to file:line). Suggestions never go to the PR (chat-only). Bitbucket Server URLs are now first-class: a full URL parser plus diff/comment/approve through the REST API. PR comment bodies are always rendered in outputLanguage (v8.5.5 had an EN-only bug). New: lib/post-pr-review.sh (provider-aware orchestrator), refs/channels/pr-review-actions.md (new canonical contract), smoke-pr-review-actions.sh (decision rule + lang + provider switch lint). Removed: pr-review-comment.md, render-pr-review-body.sh, smoke-pr-review-comment-template.sh; forks should switch to calling post-pr-review.sh "$TASK_ID".
Previous: v8.5.5 (review accepts PR input + posts verdict back, 2026-05-08). /multi-agent:review now takes 5 input shapes (empty / branch / #N / repo#N / full PR URL) and in PR-mode fetches the diff via gh pr diff and prompts to post the parallel reviewer verdict as one canonical comment per run. The renderer reads agent-state.review.* and emits the body for gh pr comment --body-file; the new refs/channels/pr-review-comment.md template enforces verdict block + per-severity findings + triage notes + build/test + footer marker, with hard prohibitions on Closes/Fixes/Resolves keywords, em-dashes in prose, and model vendor names. The new smoke gate smoke-pr-review-comment-template.sh lints anchors + wiring + input parser. Branch-mode invocations stay chat-only (no auto-post).
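The v8.5.6 decision rule above can be sketched as a small pure function. This is only an illustration under stated assumptions: the finding fields (status, severity, file, line, message) and the function name are hypothetical and are not taken from lib/post-pr-review.sh.

```javascript
// Sketch of the PR-mode decision rule. The finding shape used here
// (status, severity, file, line, message) is an assumption for illustration.
function decideReview(findings) {
  const accepted = findings.filter((f) => f.status === "accepted");
  // Only blocking and important findings count toward the verdict.
  const actionable = accepted.filter(
    (f) => f.severity === "blocking" || f.severity === "important"
  );
  if (actionable.length === 0) {
    // 0 accepted blocking AND 0 accepted important: approve, post no comment.
    return { state: "APPROVE", inlineComments: [] };
  }
  // Otherwise one inline comment per actionable finding, anchored to file:line.
  // Suggestions stay chat-only, so they never appear in the result.
  return {
    state: "NEEDS_WORK",
    inlineComments: actionable.map((f) => ({
      path: f.file,
      line: f.line,
      body: `[${f.severity}] ${f.message}`,
    })),
  };
}

const verdict = decideReview([
  { status: "accepted", severity: "important", file: "src/Login.swift", line: 42, message: "Token is logged in plain text" },
  { status: "accepted", severity: "suggestion", file: "src/Login.swift", line: 7, message: "Rename helper" },
  { status: "rejected", severity: "blocking", file: "src/App.swift", line: 3, message: "Out of scope" },
]);
console.log(verdict.state, verdict.inlineComments.length); // NEEDS_WORK 1
```

Keeping the rule free of I/O makes it easy to lint in a smoke test before any provider-specific posting happens.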
Previous: v8.5.4 (drop 4 unused adapters, 2026-05-08). Windsurf, Cline, Zed AI, and Continue.dev knowledge-layer adapters were removed because the pipeline owner did not use them; their smoke surface added cost without benefit. Tier 1 (Claude Code + Copilot CLI orchestration) and Tier 2 (Cursor + GitHub Copilot Chat knowledge layer) are unchanged. The adapter base framework (pipeline/adapters/_base.mjs) is preserved, so re-adding any dropped tool is a one-file PR.
Previous: v8.5.3 (Defect 1 picker language axis fully closed, 2026-05-08). Phase docs that drive runtime picker rendering had legacy "ask in English (promptLanguage)" wording that overrode the v8.5.1 rules.md matrix; a user with outputLanguage="tr" saw the Phase 6 WIP-checkout picker render in English. The fix narrows phase-6-commit.md + _dev-context.md + _repo-picker.md to the per-field matrix (question + description follow outputLanguage; label + header stay English) and adds smoke-language-axis.sh to prevent regression.
Previous: v8.5.2 (v9 stability defects 5 / 8 / 10 closed, 2026-05-08). The Phase 1.5 Existing-Component Discovery gate prevents silent overwrite of components that already have a Code Connect mapping or an existing source file (5 statuses: GREENFIELD_OK / ALREADY_MAPPED / EXISTING_SOURCE_NO_CC / AMBIGUOUS_SOURCE / GAP_ANALYSIS); Phase 2 + Phase 6 ENTRY GATEs now consult discovery.status. A token telemetry block is enforced in phases 1 / 2 / 3 / 4: every LLM call must invoke phase-tracker.sh tokens <N> <in> <out> so live cost shows on the active phase tile. New smoke gates: smoke-tracker-tokens-invocation, smoke-worktree-path-convention (forbidden $HOME/.worktrees patterns), smoke-existing-discovery-gate, smoke-issue-comment-template, smoke-no-token-prompt.
Previous: v8.5.1 (stability fixes for 9 GH1241 defects + genericity review, 2026-05-08). Language matrix narrowed (AskUserQuestion.label + header stay English, question + description follow outputLanguage), canonical refs/channels/issue-comment.md template + update-issue-progress.sh script (Phase 7 Step 1.5 wires comment + flag sync), keychain.md Rule 1 forbids mid-run token prompts (routes to Setup Wizard instead), phase-tracker.sh pointer-fallback now warns on concurrent init, dev.md Phase 3 dispatch on taskType==="component" keeps the figma 17-substep orchestrator running even in --dev mode.
v8.5.0 added the GitHub Projects v2 board adapter as the 5th channel + figma source incremental sync. v8.4.1 added the provider-aware account picker (Bitbucket / GitHub / GitLab), network reachability gate before the branch picker, Phase 6 PR creation contract with default reviewers, and the explicit promptLanguage=en (spec docs) / outputLanguage=user-selectable (chat + external bodies) split. v8.4.0 retired the standalone ArchiveGuard Swift binary in favour of the ios_app_store_audit MCP tool in @mmerterden/dev-toolkit-mcp ≥ v2.4, with the same 17 rules and the same JSON shape; all four consumer surfaces work without changes.
Cross-platform status: macOS Claude Code path is production-ready; Linux libsecret + Windows PowerShell paths are coded but unverified. See CHANGELOG for honest field-testing notes.
Four orthogonal additions, all advisory by default and all preserving cross-CLI parity:
- Per-task Cost Breakdown in agent-log.md: every Phase 7 run now appends a ## Cost Breakdown block with per-phase tokens (in/out) + estimated USD, sourced from phase-tracker.sh tokens accumulators and cost-table.json prices. Rendered by pipeline/scripts/render-agent-log-cost.sh. Independent of the channels-side reportContent.costSummary (PR/Jira gating). Token forwarder LOG_METRIC_FORWARD_TO_TRACKER=1 keeps metrics.jsonl and the tracker in sync from one call site.
- Phase 4 Step 1.75, Diff Risk Scoring (advisory): pipeline/scripts/diff-risk-score.mjs runs before reviewer dispatch and injects a top-N risk-ranked priority list into each reviewer's prompt. Heuristic, deterministic, sub-second, no LLM. Signals: security paths (×3), schema migrations (×4), public API surfaces (×2), no-test-change (×2.5), complexity delta (×1.5), UI-critical paths (×1.5), loc changed (×1). Default ON; flip prefs.global.diffRiskAdvisory = false to opt out.
- Phase 5 Step 0, Test Gap Report (advisory): pipeline/scripts/test-gap-scan.mjs walks the diff for newly added public symbols and reports those with no paired test. Stack-specific rules ship for iOS, Android, Python, and Node.js. iOS Views and Android @Composable symbols default to important; other public API additions default to suggestion. Optional gating via prefs.testGap.blockingThreshold.
- Phase 4 Triage Memory (advisory): a per-repo append-only JSONL corpus at ~/.claude/memory/multi-agent/<repo-slug>/triage-corpus.jsonl records every accepted/deferred/rejected finding. Phase 7 ingests on completion (idempotent); Phase 1 enriches the analysis with similar past tasks; Phase 4 triage attaches prior-art hits to each raw finding. Token-overlap recall, zero deps. The --semantic flag on /multi-agent:search "<text>" routes the query to the corpus instead of agent-log grep. Default ON; flip prefs.global.priorArtEnrichment.enabled = false to opt out.
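To make the Diff Risk Scoring signals concrete, here is a minimal sketch of a deterministic, LLM-free ranking. The weights are copied from the list above. Everything else is an assumption: the README does not say how diff-risk-score.mjs combines signals, so this sketch uses a plain weighted sum over signals normalised to 0..1, and the names riskScore, topRisks, and the file paths are hypothetical.

```javascript
// Weights copied from the signal list above; the rest is illustrative only.
const WEIGHTS = {
  securityPath: 3,
  schemaMigration: 4,
  publicApi: 2,
  noTestChange: 2.5,
  complexityDelta: 1.5,
  uiCritical: 1.5,
  locChanged: 1,
};

// Assumption: each signal is normalised to 0..1 and the per-file score is a
// weighted sum. Missing signals count as 0.
function riskScore(signals) {
  let score = 0;
  for (const [name, weight] of Object.entries(WEIGHTS)) {
    score += weight * (signals[name] ?? 0);
  }
  return score;
}

// Deterministic top-N: sort by score descending, then by path so that ties
// always come out in the same order.
function topRisks(files, n) {
  return files
    .map((f) => ({ path: f.path, score: riskScore(f.signals) }))
    .sort((a, b) => b.score - a.score || a.path.localeCompare(b.path))
    .slice(0, n);
}

const ranked = topRisks(
  [
    { path: "docs/README.md", signals: { locChanged: 0.2 } },
    { path: "db/migrations/004_add_users.sql", signals: { schemaMigration: 1, noTestChange: 1 } },
    { path: "src/auth/session.js", signals: { securityPath: 1, locChanged: 0.5 } },
  ],
  2
);
console.log(ranked);
// [ { path: 'db/migrations/004_add_users.sql', score: 6.5 },
//   { path: 'src/auth/session.js', score: 3.5 } ]
```

A ranked list of this shape could then be prepended to each reviewer prompt as the priority list.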
Cross-CLI parity: every change ships byte-identical on Claude Code (colon-form /multi-agent:*) and Copilot CLI (dash-form multi-agent-*). New smokes (5): smoke-agent-log-cost, smoke-diff-risk, smoke-test-gap, smoke-triage-memory plus extensions to smoke-cost-summary. New schemas: diff-risk.schema.json, test-gap.schema.json, triage-corpus.schema.json (3 new).
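The "token-overlap recall, zero deps" idea behind Triage Memory can be illustrated with a few lines of plain JavaScript. This is a sketch under assumptions, not the shipped implementation: the corpus entry fields (summary, verdict), the 0.5 threshold, and the function names are hypothetical, and the real triage-corpus.schema.json may differ.

```javascript
// Lowercase, split on non-alphanumerics, drop tokens of 2 characters or fewer.
function tokenize(text) {
  return new Set(
    text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 2)
  );
}

// Overlap score: shared tokens divided by the number of query tokens.
function overlap(queryTokens, entryTokens) {
  if (queryTokens.size === 0) return 0;
  let shared = 0;
  for (const t of queryTokens) if (entryTokens.has(t)) shared += 1;
  return shared / queryTokens.size;
}

// corpusJsonl stands in for the contents of a triage-corpus.jsonl file:
// one JSON object per line. The summary and verdict fields are hypothetical.
function priorArt(query, corpusJsonl, minScore = 0.5) {
  const q = tokenize(query);
  return corpusJsonl
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line))
    .map((entry) => ({ ...entry, score: overlap(q, tokenize(entry.summary)) }))
    .filter((entry) => entry.score >= minScore)
    .sort((a, b) => b.score - a.score);
}

const corpus = [
  '{"summary":"Missing accessibility label on login button","verdict":"accepted"}',
  '{"summary":"Unused import in settings view","verdict":"rejected"}',
].join("\n");

const hits = priorArt("login button has no accessibility label", corpus);
console.log(hits.map((h) => `${h.verdict} (${h.score.toFixed(2)})`)); // [ 'accepted (0.80)' ]
```

Because the corpus is append-only JSONL, a reader like this needs no index and no dependencies; it simply re-scans the file on each query.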
Full version history lives in CHANGELOG.md. Every release entry is recorded there to avoid drift between the two files.
Security issue? See SECURITY.md. Please do not open public issues for vulnerabilities.
Prerequisites
- Node.js 18+
- Claude Code (https://claude.ai/code) or GitHub Copilot CLI
- macOS, Linux, or Windows (Git Bash / WSL). Native credential storage everywhere:
  - macOS → Keychain (security)
  - Linux → libsecret (secret-tool; apt install libsecret-tools / dnf install libsecret)
  - Windows → Credential Manager (PowerShell CredentialManager module; Install-Module CredentialManager)
- Runtime CLIs the pipeline shells out to:
  - gh (GitHub CLI) for issue / PR flows
  - jq for JSON parsing
  - python3 (stdlib only, used by the deterministic keychain helper)
  - bash 4+
The package is public on npmjs.org — no auth, no PAT, no ~/.npmrc setup needed:
npm view @mmerterden/multi-agent-pipeline version
# → prints "8.6.1" (or newer)
If npm view shows a stale version, run npm cache clean --force and retry.
New here? Worked end-to-end transcripts live in examples/: bug fix from Jira, feature in autopilot, --dev fast path, and recovery from a broken run. Read one before you run the pipeline on your own code.
Something broke? docs/recovery-guide.md is the single-page index of every failure mode (triage fallback, worktree collision, state corruption, identity rewind, etc.) and its fix.
Planning breaking changes? ROADMAP.md tracks what's coming and what's been declined.
Quick Start
Option A — Install from source (recommended for development)
Clone the repo and run the installer directly — full read access to source files.
git clone git@github.com:mmerterden/multi-agent-pipeline.git
cd multi-agent-pipeline
npm install
node install.js # Claude Code (default)
node install.js --copilot # Copilot CLI
node install.js --all # Both Claude + Copilot
# Third-party AI tool adapters (knowledge layer only)
node install.js --cursor # Cursor (.cursor/rules/*.mdc + .cursorrules)
node install.js --copilot-chat # GitHub Copilot Chat (.github/copilot-instructions.md)
node install.js --all-tools # Everything: 2 orchestration (Claude + Copilot CLI) + 2 adapter targets
node install.js --cursor --target=/path/to/repo # Adapter target override (default: cwd)
node install.js --link # Symlink mode (dev, saves ~10K tokens)
# Token-preserving uninstall (Keychain access tokens NEVER touched)
node install.js # ...later...
node pipeline/scripts/uninstall.mjs --dry-run # preview what would be removed
node pipeline/scripts/uninstall.mjs --yes # remove from every installed target
node pipeline/scripts/uninstall.mjs --cursor # selective uninstall
# Optional: expose the CLI globally as 'multi-agent-pipeline'
npm link
IMPORTANT: run setup before your first task:
/multi-agent:setup
This discovers your Keychain tokens (Jira, Bitbucket, GitHub, etc.), sets up your git identity, and maps everything into ~/.claude/multi-agent-preferences.json. Without this step, the pipeline cannot find your tokens and will ask for them repeatedly.
Update later with git pull && npm install inside the clone. Pin to a specific version by checking out the corresponding tag (git tag -l to list).
Option B — npx (public registry, no auth)
npx @mmerterden/multi-agent-pipeline install # Claude Code (default)
npx @mmerterden/multi-agent-pipeline install --copilot # Copilot CLI
npx @mmerterden/multi-agent-pipeline install --all # Both Claude + Copilot
npx @mmerterden/multi-agent-pipeline install --cursor # Cursor adapter
npx @mmerterden/multi-agent-pipeline install --copilot-chat # GitHub Copilot Chat adapter
npx @mmerterden/multi-agent-pipeline install --all-tools # Every supported tool
npx @mmerterden/multi-agent-pipeline install --link # Symlink mode
# Token-preserving uninstall
npx @mmerterden/multi-agent-pipeline uninstall # interactive, all targets
npx @mmerterden/multi-agent-pipeline uninstall --dry-run # preview only
Option C — Global install (public registry, no auth)
npm install -g @mmerterden/multi-agent-pipeline
multi-agent-pipeline install # same flags apply
Privacy & Telemetry
Opt-in only. The installer sends nothing by default. If you want to help the
project by sharing an install ping, opt in per-install with
MULTI_AGENT_TELEMETRY=1:
MULTI_AGENT_TELEMETRY=1 node install.js
# or for npx/global
MULTI_AGENT_TELEMETRY=1 npx @mmerterden/multi-agent-pipeline install
When opted in, the ping includes:
- Package name + version
- Install method (source, npx, global, npm-install)
- Flags passed (e.g. --copilot, --all)
- Your GitHub username (via gh api user if authenticated) and Git email
- Hostname, OS, arch, Node version
Nothing else is collected. Ping failures are silent — telemetry never blocks or slows down the install.
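The opt-in and "failures are silent" behaviour described above can be sketched as follows. This is a hedged illustration, not the installer's code: the function names and the send transport are hypothetical, and the sketch assumes that only the exact value "1" opts in.

```javascript
// Returns true only when the user explicitly opted in for this install.
// Assumption: the exact string "1" is the only accepted opt-in value.
function telemetryEnabled(env) {
  return env.MULTI_AGENT_TELEMETRY === "1";
}

// send is a hypothetical transport (for example an HTTPS POST). Any failure
// is swallowed so that a broken ping can never block or fail the install.
async function maybePing(env, payload, send) {
  if (!telemetryEnabled(env)) return "skipped";
  try {
    await send(payload);
    return "sent";
  } catch {
    return "failed-silently";
  }
}

const failingSend = async () => {
  throw new Error("network down");
};
maybePing({}, { method: "npx" }, failingSend).then(console.log); // skipped
maybePing({ MULTI_AGENT_TELEMETRY: "1" }, { method: "npx" }, failingSend).then(console.log); // failed-silently
```

In a real installer the first argument would be process.env, and the call would not be awaited on the critical path.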
Troubleshooting
sh: multi-agent-pipeline: command not found
npx couldn't fetch the package. Most common causes:
- Network blocked or proxy in front of npmjs.org.
- Stale npx cache: run npx clear-npx-cache, then retry.
- Wrong package name (it is @mmerterden/multi-agent-pipeline, with the scope).
npm error code E404 on registry.npmjs.org
The package wasn't reachable. Quick checks:
curl -fsSL "https://registry.npmjs.org/@mmerterden/multi-agent-pipeline" | jq -r '."dist-tags".latest'
# Should print 8.6.1 (or newer)
- If this prints a version → your local npm cache is stale; run npm cache clean --force and retry.
- If it 404s → registry outage or scope/package-name typo. Verify the URL hits the JSON above.
Older PAT / GitHub Packages docs
Earlier README revisions referenced GitHub Packages with a Classic PAT and ~/.npmrc setup. The package moved to public npmjs.org in v8.6.1; no token, no ~/.npmrc line is needed. If you set one up previously, you can leave it — it just won't be consulted.
Tool support
TL;DR: Full pipeline orchestration runs on Claude Code + Copilot CLI. The knowledge layer (rules + skills) ports to Cursor and GitHub Copilot Chat via dedicated adapters. Other tools require a manual port. (Before v8.5.4 the adapter set also covered Windsurf, Cline, Zed, and Continue.dev; those were dropped in v8.5.4 because the pipeline owner did not use them. Reintroducing any of them is a one-file add against pipeline/adapters/_base.mjs.)
Tier 1 — Full pipeline (orchestration + knowledge)
| Tool | Install Flag | How It Works |
| --- | --- | --- |
| Claude Code | --claude (default) | Slash commands + shared + figma skills + agents + rules + scripts |
| Copilot CLI | --copilot | Instructions + shared + figma skills + scripts |
Both CLIs install from the same pipeline/skills/ source. Tree organization:
- pipeline/skills/shared/core/: orchestration skills (multi-agent-* dash-form mirrors of colon-form slash commands + compliance skills + orchestrator)
- pipeline/skills/shared/external/: 127 third-party / curated iOS, Android, and generic guidance skills (SwiftUI, Jetpack Compose, testing, performance, security, etc.)
- pipeline/skills/figma-ios/ + pipeline/skills/figma-android/: 5 + 5 platform-specific Phase 3 sub-skills (SwiftUI and Jetpack Compose code generation)
- pipeline/skills/figma-common/: 27 platform-agnostic Figma helpers (iterate, commit, wiki setup, MCP auth, performance harness)
You get identical skill coverage regardless of which CLI you use. Both CLIs also receive the same pipeline/scripts/ tree so multi-CLI-only installs stay self-contained.
Tier 2 — Knowledge layer only
These tools don't have subagent dispatch, so the 8-phase orchestration can't run there. The rules tree + skills catalog ports natively via per-tool adapters.
| Tool | Install Flag | Output | What works | What doesn't |
| --- | --- | --- | --- | --- |
| Cursor | --cursor | .cursor/rules/multi-agent-*.mdc (modern, 2025+) + .cursorrules legacy fallback | 193 SKILL contents, 12 rules, glob-aware activation | Slash commands, parallel review, autopilot, Phase 4 triage |
| GitHub Copilot Chat | --copilot-chat | .github/copilot-instructions.md (marker-wrapped) + .github/instructions/multi-agent-*.instructions.md | Per-skill instructions with applyTo: glob frontmatter; loaded automatically into every Copilot Chat conversation in the repo | Slash commands, parallel review, autopilot, Phase 4 triage |
All adapters default to process.cwd(); override with --target=<path>. Multiple tools at once: --all-tools (covers both Tier 2 surfaces). Filter by stack: --platform=ios|android|all.
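The target and platform rules above can be illustrated with a small flag resolver. Only the --target=<path> and --platform=ios|android|all flags come from the text. The function name, the returned object shape, and the default platform of "all" are assumptions made for this sketch.

```javascript
// Allowed values for --platform, as listed in the text above.
const PLATFORMS = ["ios", "android", "all"];

// Hypothetical resolver: cwd is passed in (process.cwd() in a real CLI) so
// the function stays pure. Defaulting platform to "all" is an assumption.
function resolveAdapterOptions(argv, cwd) {
  const options = { target: cwd, platform: "all" };
  for (const arg of argv) {
    if (arg.startsWith("--target=")) {
      options.target = arg.slice("--target=".length);
    } else if (arg.startsWith("--platform=")) {
      const value = arg.slice("--platform=".length);
      if (!PLATFORMS.includes(value)) {
        throw new Error(`Unknown platform "${value}"; expected ${PLATFORMS.join("|")}`);
      }
      options.platform = value;
    }
  }
  return options;
}

console.log(resolveAdapterOptions(["--cursor"], "/work/app"));
// { target: '/work/app', platform: 'all' }
console.log(resolveAdapterOptions(["--cursor", "--target=/path/to/repo", "--platform=ios"], "/work/app"));
// { target: '/path/to/repo', platform: 'ios' }
```

Rejecting unknown platform values early keeps a typo from silently installing the full skill set.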
Tier 3 — Manual port
Windsurf, Cline, Zed AI, Continue.dev, JetBrains AI, Codeium, Tabnine, Amazon Q Developer — closed/proprietary config, deprecated adapter, or different paradigm. Skill content can be copy-pasted into native rule formats; no automated installer ships today. PRs welcome.
Uninstall (token-preserving, all tiers)
npx @mmerterden/multi-agent-pipeline uninstall # interactive, all installed targets
npx @mmerterden/multi-agent-pipeline uninstall --dry-run # preview, zero side effects
npx @mmerterden/multi-agent-pipeline uninstall --cursor # selective: only this tool
Personal access tokens stored in macOS Keychain / Windows Credential Manager / Linux libsecret are never touched by the uninstaller. Smoke tests enforce this with a static check that fails the build if the script ever references a credential-store deletion API.
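A static check of this kind could look roughly like the sketch below. It is an illustration only: the pattern list and function name are assumptions, not the rules the shipped smoke test uses. The three patterns name real deletion commands for each credential store.

```javascript
// Commands that delete entries from each platform's credential store.
// This list is illustrative, not the pipeline's actual rule set.
const FORBIDDEN = [
  /security\s+delete-generic-password/, // macOS Keychain
  /secret-tool\s+clear/, // Linux libsecret
  /Remove-StoredCredential/, // PowerShell CredentialManager module
];

// Returns one entry per offending line so a failure message can point at it.
function findCredentialDeletions(source) {
  const offending = [];
  source.split("\n").forEach((text, index) => {
    for (const pattern of FORBIDDEN) {
      if (pattern.test(text)) offending.push({ line: index + 1, text: text.trim() });
    }
  });
  return offending;
}

const safe = 'rm -rf "$TARGET/.cursor/rules"';
const unsafe = 'security delete-generic-password -s "jira-token"';
console.log(findCredentialDeletions(safe).length); // 0
console.log(findCredentialDeletions(safe + "\n" + unsafe));
// [ { line: 2, text: 'security delete-generic-password -s "jira-token"' } ]
```

A smoke test would read the uninstaller source, run a scan like this, and exit non-zero when the result is not empty.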
Pipeline Phases
Phase 0: Init — Project selection, branch setup, identity, worktree
Phase 1: Analysis — Stack detection, codebase exploration
Phase 2: Planning — Task decomposition, architecture review, user approval
Phase 3: Dev — TDD cycle: test → code → build
Phase 4: Review — Deterministic gates + parallel review + Opus triage
• Claude Code → Opus + Sonnet (2 in parallel)
• Copilot CLI → GPT-5.4 + Opus + Sonnet (3 in parallel)
Phase 5: Test — Optional manual testing + MCP device audits (on-demand)
Phase 6: Commit — Pre-commit local checkout prompt, git commit, push, PR creation
Phase 7: Report — External: Jira comment · Wiki + Figma screenshots · Confluence
Internal: Log · Knowledge + memory capture
Full Pipeline Flow
flowchart TD
INPUT["🎯 <b>User Input</b><br/>Issue # · Jira URL · Free-text · jira · issue"]
subgraph SETUP ["Setup"]
P0["<b>Phase 0: Init</b><br/>Project detect · Worktree<br/>Branch · Identity · Task type"]
P1["<b>Phase 1: Analysis</b><br/>Parallel Explore agents<br/>Stack detection · Guide load"]
P2["<b>Phase 2: Planning</b><br/>Task decompose<br/>Architect review · User approval"]
end
subgraph DEVELOP ["Development"]
P3["<b>Phase 3: Dev</b><br/>🔴 RED: test<br/>🟢 GREEN: implement<br/>🔵 REFACTOR · Build pass"]
end
subgraph REVIEW ["Review"]
R1["<b>Opus</b><br/>Security<br/>Architecture"]
R2["<b>GPT-5.4</b><br/>Quality<br/>Edge cases"]
R3["<b>Sonnet</b><br/>Correctness<br/>Style"]
TRIAGE["<b>Opus Triage</b><br/>Filter noise · Deduplicate<br/>Forward actionable only"]
end
subgraph DELIVER ["Delivery"]
P5["<b>Phase 5: Test</b><br/>Manual test<br/>Device audits (on-demand)"]
P6["<b>Phase 6: Commit</b><br/>Secret scan · Commit<br/>Push · PR create"]
P7["<b>Phase 7: Report</b>"]
subgraph REPORT ["Phase 7 sub-steps"]
direction LR
H1["Jira comment<br/>(analysis + tests)"]
H2["Wiki + Figma<br/>screenshots"]
H3["Confluence<br/>(optional)"]
H4["Report · Log"]
H5["Knowledge +<br/>memory capture"]
end
P7 --> H1 --> H2 --> H3 --> H4 --> H5
end
INPUT --> P0
P0 --> P1
P1 --> P2
P2 --> P3
P3 --> R1 & R2 & R3
R1 & R2 & R3 --> TRIAGE
TRIAGE -->|"✅ Approved"| P5
TRIAGE -->|"🔧 Fix needed (≤3x)"| P3
P5 --> P6
P6 --> P7
style INPUT fill:#818cf8,stroke:#6366f1,color:#fff
style P3 fill:#fbbf24,stroke:#f59e0b,color:#000
style TRIAGE fill:#38bdf8,stroke:#0ea5e9,color:#000
style P7 fill:#4ade80,stroke:#22c55e,color:#000
style H1 fill:#fde68a,stroke:#f59e0b,color:#000
style H2 fill:#c084fc,stroke:#a855f7,color:#000
style H3 fill:#bae6fd,stroke:#0ea5e9,color:#000
Operating Modes
flowchart LR
subgraph NORMAL ["Normal (Full 8-phase)"]
direction LR
N0[Init] --> N1[Analysis] --> N2[Planning] --> N3[Dev] --> N4[Review] --> N5[Test] --> N6[Commit] --> N7[Report]
end
subgraph DEV ["--dev (Fast, Opus)"]
direction LR
D0[Init] --> D3["Dev<br/>(Opus)"] --> D6[Commit] --> D7[Report]
end
subgraph AUTO ["autopilot (No confirmations)"]
direction LR
A0[Init] --> A1[Analysis] --> A2[Planning] --> A3[Dev] --> A4[Review] --> A6[Commit] --> A7[Report]
end
subgraph FAST ["--dev autopilot (Fastest)"]
direction LR
F0[Init] --> F3["Dev<br/>(Opus)"] --> F6["Commit<br/>(auto)"] --> F7[Report]
end
style D3 fill:#fbbf24,stroke:#f59e0b,color:#000
style F3 fill:#fbbf24,stroke:#f59e0b,color:#000
style F6 fill:#4ade80,stroke:#22c55e,color:#000
Review Architecture (Phase 4)
flowchart TD
DIFF["📝 Code Diff"]
DIFF --> OPUS["<b>Opus</b><br/>🔒 Security · Architecture<br/>Data flow · Auth"]
DIFF --> GPT["<b>GPT-5.4</b><br/>✨ Code quality · Edge cases<br/>Error paths · Logic"]
DIFF --> SON["<b>Sonnet</b><br/>✅ Correctness · Best practices<br/>Naming · Style"]
OPUS --> TRIAGE
GPT --> TRIAGE
SON --> TRIAGE
TRIAGE["<b>Opus Triage</b><br/>Deduplicate findings<br/>Filter false-positives<br/>Reject out-of-scope<br/>Classify severity"]
TRIAGE -->|"✅ PASS"| NEXT["Phase 5: Test"]
TRIAGE -->|"🔧 FIX_REQUIRED"| BACK["Phase 3: Dev<br/>(retry ≤3x)"]
style DIFF fill:#818cf8,stroke:#6366f1,color:#fff
style TRIAGE fill:#38bdf8,stroke:#0ea5e9,color:#000
style NEXT fill:#4ade80,stroke:#22c55e,color:#000
style BACK fill:#f87171,stroke:#ef4444,color:#000
Figma SubPhase Integration (Phase 3)
When figmaConfigPath is set in project preferences, Phase 3 dispatches the Figma-to-SwiftUI pipeline instead of standard TDD:
flowchart TD
P3["<b>Phase 3: Dev</b>"]
P3 -->|default| TDD["<b>Standard TDD</b><br/>RED → GREEN → REFACTOR → build"]
P3 -->|"figmaConfigPath set"| FIG
subgraph FIG ["Figma-to-SwiftUI Pipeline (17 SubPhases)"]
direction TB
subgraph INIT_G ["Init + Gather"]
S0["3.0 Init<br/>Parse URL · Branch · Assign"]
S1["3.1 Gather<br/>Fetch design context"]
end
subgraph PREP ["Preparation (parallel)"]
S2A["3.2A TestingIDs"]
S2B["3.2B Localization"]
S2C["3.2C Accessibility"]
S2D["3.2D Analytics"]
end
S3["3.3 Token Mapping<br/>Figma values → design tokens"]
subgraph IMPL ["Implementation (sequential)"]
S4A["3.4A Config"]
S4B["3.4B View"]
S4C["3.4C Docs"]
S4D["3.4D Preview"]
S4E["3.4E Modifiers"]
S4F["3.4F Wiki"]
end
subgraph TEST_G ["Testing"]
S5A["3.5A ViewInspector"]
S5B["3.5B Snapshot"]
S5C["3.5C Unit"]
end
S6["3.6 CodeConnect<br/>Figma ↔ code link"]
S0 --> S1
S1 --> S2A & S2B & S2C & S2D
S2A & S2B & S2C & S2D --> S3
S3 --> S4A --> S4B --> S4C --> S4D --> S4E --> S4F
S4F --> S5A --> S5B --> S5C
S5C --> S6
end
style P3 fill:#fbbf24,stroke:#f59e0b,color:#000
style TDD fill:#4ade80,stroke:#22c55e,color:#000
style S3 fill:#818cf8,stroke:#6366f1,color:#fff
style S6 fill:#c084fc,stroke:#a855f7,color:#000
Ecosystem Architecture
flowchart TD
CC["<b>Claude Code</b><br/>(Source of Truth)<br/><br/>~/.claude/commands/<br/>~/.claude/agents/<br/>~/.claude/scripts/"]
CC -->|"instructions + 192 skills + scripts"| COP["<b>Copilot CLI</b><br/>~/.copilot/skills/<br/>~/.copilot/scripts/<br/>copilot-instructions.md"]
CC -->|"genericized pipeline/"| REPO["<b>Pipeline Repo</b><br/>@mmerterden/<br/>multi-agent-pipeline"]
CC -.->|"optional"| WEB["<b>Website</b><br/>(your docs site)"]
CC -.->|"optional"| RC["<b>Remote Control</b><br/>(your dashboard)"]
REPO -->|"npm publish"| NPM["<b>npmjs.org</b><br/>(npm)"]
WEB -->|"auto-deploy"| VERCEL["Vercel"]
style CC fill:#818cf8,stroke:#6366f1,color:#fff
style REPO fill:#fbbf24,stroke:#f59e0b,color:#000
style NPM fill:#4ade80,stroke:#22c55e,color:#000
style WEB fill:#38bdf8,stroke:#0ea5e9,color:#000
Claude Code (Full Mode)
All 8 phases with sub-agents, parallel review + Opus triage (2-model Opus+Sonnet on Claude Code, 3-model GPT+Opus+Sonnet on Copilot CLI), TaskCreate visual tracking.
# Pipeline tasks
/multi-agent "MOBILE-12345" # Jira issue
/multi-agent "#42" # GitHub issue
/multi-agent "Fix dark mode colors in LoginView" # Free-text
/multi-agent:dev "MOBILE-12345" # Fast mode (Opus)
/multi-agent:autopilot "MOBILE-12345" # Skip confirmations
/multi-agent:dev-autopilot "MOBILE-12345" # Zero interaction
# Helper commands
/multi-agent:status # List all tasks
/multi-agent:log 1 # Show task log
/multi-agent:resume 1 # Resume stopped task
/multi-agent:kill 1 # Delete task worktree
/multi-agent:review # Review current diff
/multi-agent:setup # Token + identity onboarding (asks promptLanguage + outputLanguage)
/multi-agent:language tr # Toggle pipeline languages (en / tr per axis)
/multi-agent:test # UI Bug Hunter
/multi-agent:channels "PR-url" # Post report (Jira / Confluence / Wiki / PR)
/multi-agent:search "query" # Full-text log search
/multi-agent:scan # Skill security scan
/multi-agent:refactor # Refactor planner
/multi-agent:update # Update pipeline
/multi-agent:sync # Sync ecosystem
/multi-agent:purge # Full reset (double-confirm)
# Flag syntax (equivalent to dedicated commands above)
/multi-agent "MOBILE-12345" --dev # same as :dev
/multi-agent "MOBILE-12345" --dev autopilot # same as :dev-autopilot
Copilot CLI (Lite Mode)
Same pipeline logic, invoked via dash syntax (multi-agent-*).
# Pipeline tasks
multi-agent "MOBILE-12345" # Jira issue
multi-agent "#42" # GitHub issue
multi-agent "Fix dark mode colors in LoginView" # Free-text
multi-agent-dev "MOBILE-12345" # Fast mode (Opus)
multi-agent-autopilot "MOBILE-12345" # Skip confirmations
multi-agent-dev-autopilot "MOBILE-12345" # Zero interaction
# Helper commands
multi-agent-status # List all tasks
multi-agent-log 1 # Show task log
multi-agent-resume 1 # Resume stopped task
multi-agent-kill 1 # Delete task worktree
multi-agent-review # Review current diff
multi-agent-setup # Token + identity onboarding
multi-agent-test # UI Bug Hunter
multi-agent-channels "PR-url" # Post report (Jira / Confluence / Wiki / PR)
multi-agent-search "query" # Full-text log search
multi-agent-scan # Skill security scan
multi-agent-refactor # Refactor planner
multi-agent-update # Update pipeline
multi-agent-sync # Sync ecosystem
multi-agent-purge # Full reset (double-confirm)
Supported Stacks
| Platform | Detection | Guide Loaded |
| ----------------------------- | -------------------------------------------- | ----------------------- |
| iOS/Swift | .xcodeproj, Package.swift | SwiftUI Component Guide |
| Android/Kotlin | build.gradle, build.gradle.kts | Jetpack Compose Guide |
| Backend (Python/Node/Go) | requirements.txt, package.json, go.mod | Backend API Guide |
| Frontend (React/Vue/Next) | package.json + framework detection | Frontend Guide |
Stack is auto-detected. Build commands, test runners, lint tools, and review focus areas all adapt automatically.
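The marker-file detection in the table above can be sketched as a pure function over the entries of a project root. The marker names come from the table. The check order, the "unknown" fallback, and the way package.json is split between backend and frontend (by looking for a known framework among the dependencies) are assumptions for this sketch; stack-swap.sh may decide differently.

```javascript
// Frameworks named in the table above; used to tell frontend from backend
// when only package.json is present (an assumption of this sketch).
const FRONTEND_FRAMEWORKS = ["react", "vue", "next"];

// rootEntries: file and directory names in the project root.
// dependencies: dependency names from package.json, if any.
function detectStack(rootEntries, dependencies = []) {
  const has = (name) => rootEntries.includes(name);
  if (rootEntries.some((e) => e.endsWith(".xcodeproj")) || has("Package.swift")) return "ios";
  if (has("build.gradle") || has("build.gradle.kts")) return "android";
  if (has("package.json")) {
    return dependencies.some((d) => FRONTEND_FRAMEWORKS.includes(d)) ? "frontend" : "backend";
  }
  if (has("requirements.txt") || has("go.mod")) return "backend";
  return "unknown";
}

console.log(detectStack(["App.xcodeproj", "README.md"])); // ios
console.log(detectStack(["package.json"], ["next", "react"])); // frontend
console.log(detectStack(["go.mod", "main.go"])); // backend
console.log(detectStack(["notes.txt"])); // unknown
```

In practice rootEntries would come from a directory listing, and the result would select which guide to load.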
Modes
| Mode | Claude Code | Copilot CLI | Description |
| ---- | ----------- | ----------- | ----------- |
| Normal | /multi-agent "task" | multi-agent "task" | Full 8 phases with CLI-aware parallel review (Claude: 2-model · Copilot: 3-model) |
| Fast | /multi-agent:dev "task" | multi-agent-dev "task" | Init → Dev(Opus) → Commit → Report |
| Local | /multi-agent "task" --local | multi-agent "task" --local | No worktree — works on local branch |
| Autopilot | /multi-agent:autopilot "task" | multi-agent-autopilot "task" | Skip confirmations, auto commit/PR |
| Fastest | /multi-agent:dev-autopilot "task" | multi-agent-dev-autopilot "task" | Zero interaction |
| Test | /multi-agent:test | multi-agent-test | UI Bug Hunter — visual + accessibility |
| Channels | /multi-agent:channels <target> | multi-agent-channels <target> | Post report to Jira / Confluence / Wiki / PR (multi-select, humanizer pass, reviewer-preserving) |
| Stack | /multi-agent:stack ios | multi-agent-stack ios | Manually swap skill set per platform |
| Language | /multi-agent:language [prompt\|output] <en\|tr> | multi-agent-language [prompt\|output] <en\|tr> | Toggle promptLanguage (interactive prompts) and/or outputLanguage (assistant explanations). External payloads stay English. |
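The relationship between the mode flags and the phases that actually run can be summarised in code. The phase lists are transcribed from the Operating Modes diagram; the mode keys, the flag object, and the helper names are hypothetical and only serve this sketch.

```javascript
const PHASES = ["Init", "Analysis", "Planning", "Dev", "Review", "Test", "Commit", "Report"];

// Phase lists transcribed from the Operating Modes diagram.
const MODE_PHASES = {
  normal: PHASES,
  dev: ["Init", "Dev", "Commit", "Report"],
  autopilot: PHASES.filter((p) => p !== "Test"), // no manual test step
  "dev-autopilot": ["Init", "Dev", "Commit", "Report"],
};

// Hypothetical helper: map the two flags onto one of the four modes.
function resolveMode({ dev = false, autopilot = false } = {}) {
  if (dev && autopilot) return "dev-autopilot";
  if (dev) return "dev";
  if (autopilot) return "autopilot";
  return "normal";
}

function phasesFor(flags) {
  return MODE_PHASES[resolveMode(flags)];
}

console.log(phasesFor({ dev: true }).join(" → ")); // Init → Dev → Commit → Report
console.log(phasesFor({ autopilot: true }).includes("Test")); // false
console.log(phasesFor().length); // 8
```

Note that dev and dev-autopilot run the same phases; per the table, the difference is whether confirmations are skipped.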
UI Bug Hunter + Audit Tools
Automated visual testing and compliance audits. Requires the mobile MCP server.
# Claude Code # Copilot CLI
/multi-agent:test # multi-agent-test
/multi-agent:test "dark mode" # multi-agent-test "dark mode"
/multi-agent:test "accessibility" # multi-agent-test "accessibility"
/multi-agent:test "dynamic type" # multi-agent-test "dynamic type"
/multi-agent:test "store-ready" # multi-agent-test "store-ready"
/multi-agent:test "biometric" # multi-agent-test "biometric"
/multi-agent:test "performance" # multi-agent-test "performance"
How Audit Tools Work
All audits run via direct Bash commands — no MCP server dependency. Pipeline uses xcrun simctl, adb, codesign, aapt2 etc. natively.
| Audit | What It Does | Command | When |
| --------------------- | ------------------------------------------------------ | ---------------------------- | ----------------------- |
| iOS Accessibility | Missing labels, small tap targets | swift ui-tree-dumper.swift | Phase 5 — user requests |
| Android Accessibility | Missing contentDescription, small touch targets | adb shell uiautomator dump | Phase 5 — user requests |
| iOS Biometric | Face ID / Touch ID success/failure | xcrun simctl keychain | Phase 5 — auth flow |
| Android Launch Time | Cold start time (ms) | adb shell am start -W | Phase 5 — performance |
| iOS Archive | App Store compliance: signing, debug tools, privacy | codesign, plutil, nm | Phase 6 — release |
| Android APK | Play Store compliance: target SDK, debuggable, signing | aapt2, apksigner | Phase 6 — release |
Important: Audits are on-demand — triggered by user, not automatic. Phase 4 does code-level accessibility review (free, no device needed). Phase 5/6 do device-level audits only when requested.
No external dependencies — only standard Xcode CLI tools (iOS) and Android SDK (Android). Platform guides include compliance rules that map 1:1 to audit checks — follow the guide, pass the audit.
Stack Swap
Stack is auto-detected at session start by pipeline/scripts/stack-swap.sh (SessionStart hook) based on project markers (.xcodeproj, build.gradle, package.json, etc.). To override manually:
# Claude Code # Copilot CLI
/multi-agent:stack multi-agent-stack # show current
/multi-agent:stack ios multi-agent-stack ios # SwiftUI + Xcode
/multi-agent:stack android multi-agent-stack android # Compose + Gradle
/multi-agent:stack backend multi-agent-stack backend # Python/Node/Go
/multi-agent:stack frontend multi-agent-stack frontend # React/Vue/Next
/multi-agent:stack mobile multi-agent-stack mobile # iOS + Android
/multi-agent:stack all multi-agent-stack all # load everything
Re-runs stack-swap.sh with a forced mode and loads the matching guide (swiftui-guide.md, android-guide.md, backend-guide.md, frontend-guide.md).
Setup
# 1. Install pipeline
npx @mmerterden/multi-agent-pipeline install --all
# 2. Configure
/multi-agent:setup # Claude Code
multi-agent-setup # Copilot CLI
# -> Jira project key (e.g., MOBILE, APP, ENG)
# -> Git identity (name + email)
# -> Keychain token scan + mapping
# That's it! Pipeline works standalone; no additional dependencies needed.
Hooks & Context Management
Pipeline includes automated safety hooks and session optimization, configured during installation.
Pre-Commit Secret Detection
A PreToolUse hook runs before every git commit, scanning staged files for:
- Hardcoded API keys, tokens, secrets
- AWS access keys (AKIA...)
- Private keys (RSA/EC/DSA/OPENSSH)
- .env files and credentials files
- Firebase/GCP service account JSON
If secrets are found, the commit is blocked with a clear message.
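A minimal sketch of this kind of scan, covering only two of the categories listed above (AWS access key IDs and PEM private key headers). The function names `scan_file_for_secrets` and `scan_staged` are hypothetical, and the real pre-commit-check.sh may use different patterns and may inspect staged blobs instead of working-tree files.

```shell
# Hypothetical sketch; pre-commit-check.sh itself may work differently.
scan_file_for_secrets() {
  # Returns 0 when the file looks clean, 1 when a pattern matches.
  file=$1

  # AWS access key IDs: "AKIA" followed by 16 uppercase letters or digits.
  if grep -Eq 'AKIA[0-9A-Z]{16}' "$file"; then
    echo "blocked: possible AWS access key in $file" >&2
    return 1
  fi

  # PEM headers for the private key types named in the list above.
  # -e is required because the pattern itself starts with a dash.
  if grep -Eq -e '-----BEGIN (RSA|EC|DSA|OPENSSH) PRIVATE KEY-----' "$file"; then
    echo "blocked: private key material in $file" >&2
    return 1
  fi

  return 0
}

scan_staged() {
  # Scans every added, copied, or modified file in the index.
  # Assumption: reads the working-tree copy, which can differ from the staged blob.
  git diff --cached --name-only --diff-filter=ACM | (
    status=0
    while IFS= read -r staged_file; do
      scan_file_for_secrets "$staged_file" || status=1
    done
    exit "$status"
  )
}
```

Run inside a repository, `scan_staged` returns non-zero if any staged file matches, which a hook can use to refuse the commit.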
Context Management
CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=65 triggers context compaction at 65% usage instead of the default ~80%. This prevents performance degradation in long 8-phase pipeline sessions.
Local CI (pre-push)
CI is local-only via the pre-push git hook (no GitHub Actions).
Run the full gate manually anytime, or wire the hook to run it on every git push:
# One-off check
bash pipeline/scripts/pre-push-check.sh
# Install as a git hook — runs automatically on `git push`
ln -sf ../../pipeline/scripts/pre-push-check.sh .git/hooks/pre-push
chmod +x .git/hooks/pre-push
The hook runs unit tests, smoke suites, eval fixtures, schema validation, and lint: the same gate that used to run in CI. Bypass only in emergencies with git push --no-verify.
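For readers adapting the gate to their own fork, a fail-fast runner in the same spirit might look like the sketch below. `run_gate` is a hypothetical helper, not part of pre-push-check.sh, and the steps passed to it are whatever commands your fork needs. Git aborts the push whenever the pre-push hook exits non-zero, which is what makes this pattern work as a gate.

```shell
# Hypothetical fail-fast gate runner; pre-push-check.sh may be structured differently.
run_gate() {
  # Each argument is one command line; stop at the first one that fails.
  for step in "$@"; do
    echo "gate: $step"
    if ! sh -c "$step"; then
      echo "gate failed at: $step" >&2
      return 1
    fi
  done
  echo "gate passed"
}
```

For example, `run_gate "npm test"` runs the test suite and returns its status; further steps are appended as extra arguments and are skipped once an earlier step fails.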
Scripts
| Script | Purpose |
| ---------------------- | ------------------------------------------------ |
| pre-commit-check.sh | Secret detection before commits |
| stack-swap.sh | Auto-detect project stack on session start |
| keychain-save.sh | Save tokens/JSON to macOS Keychain (interactive) |
| keychain.py | Deterministic Python Keychain helper (get/set/delete/list/doctor); shell driver auto-delegates on macOS/Linux |
| github-ssh-setup.sh | GitHub SSH key generation + config |
| ui-tree-dumper.swift | iOS accessibility tree dumper for audits |
All scripts are installed to ~/.claude/scripts/ during setup.
Platform Support
| Tier | Platform | Requirements |
| --------- | --------------------- | ----------------------------------------------------------- |
| Primary | macOS 13+ | Keychain-backed token storage, xcodebuild, xcrun simctl |
| Primary | Linux (Ubuntu 22.04+) | Token storage via env vars / gh auth; no Xcode features |
| Secondary | Windows (WSL2) | Same as Linux; Xcode features unavailable |
Node.js 18+ required everywhere. Git 2.38+ (for git worktree improvements) recommended.
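To check these minimums before installing, a small dotted-version comparison is enough. The helper below is a sketch written for this README, not something shipped with the pipeline, and `version_ge` is a hypothetical name.

```shell
# Hypothetical helper: succeeds when dotted version $1 is >= dotted version $2.
version_ge() {
  have=$1
  want=$2
  while [ -n "$have" ] || [ -n "$want" ]; do
    # Take the leading numeric field of each version.
    have_field=${have%%.*}
    want_field=${want%%.*}
    # Drop the field just read; a version with no fields left becomes empty.
    case $have in *.*) have=${have#*.} ;; *) have= ;; esac
    case $want in *.*) want=${want#*.} ;; *) want= ;; esac
    # Missing fields count as zero, so "18" compares like "18.0.0".
    have_field=${have_field:-0}
    want_field=${want_field:-0}
    if [ "$have_field" -gt "$want_field" ]; then return 0; fi
    if [ "$have_field" -lt "$want_field" ]; then return 1; fi
  done
  return 0
}
```

Typical use, assuming node and git are on PATH: `version_ge "$(node --version | sed 's/^v//')" 18` and `version_ge "$(git --version | awk '{print $3}')" 2.38`.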
Testing
npm test
Runs ~1,300 assertions across four layers:
| Layer | Count | What |
| ----------------- | ------------------------------ | -------------------------------------------------------- |
| Node unit tests | 92 tests across 19 suites | CLI routing, install helpers, settings.json hooks, security |
| Smoke suites | 73 scripts, ~1,210 assertions | End-to-end contract tests for every script and phase doc |
| Eval fixtures | 11 adversarial cases + 2 golden tasks | Semantic regression for triage classification + full pipeline replay |
| Schema validation | 13 schemas | Structural integrity of JSON schemas |
CI runs locally via the pre-push git hook (see Local CI (pre-push) above); there are no GitHub Actions. Enable the hook once per clone with ln -sf ../../pipeline/scripts/pre-push-check.sh .git/hooks/pre-push.
Key Features
Ten highlights — see docs/features.md for the full catalog.
- 8-phase orchestration with lazy-loaded phase specs — only pay token cost for the current phase.
- CLI-aware parallel review + Opus triage — Opus (security/architecture) + Sonnet (quality/correctness) run in parallel on Claude Code (2-model); Copilot CLI adds GPT-5.4 (edge cases/different perspective) for a 3-model set. A single Opus triage pass filters false-positives, rejects out-of-scope findings, and forwards only actionable items to Phase 3. Reviewer noise never auto-triggers rework.
- Bitbucket / GitHub PR automation with default reviewers auto-injected, draft/ready prompt, body preservation (no literal \n, no HTML entities).
- channels command posts task reports to Jira / Confluence / Wiki / PR with multi-select channel + content. Humanizer pass per-channel, reviewer-preserving Bitbucket PR PUT. Phase 7 delegates; also invocable post-hoc for fixes made outside the pipeline.
- Deterministic safety gates + pre-commit local-checkout prompt + runtime triage validator: pre-commit secret scan (PreToolUse hook), xcodebuild lock queue, 3-iteration hard-kill on retry loops, "checkout locally and test before commit?" Phase 6 prompt, and a zero-dep Node validator that gates Phase 4 triage output on real exit codes, no longer just markdown guidance. Includes telemetry (metrics.jsonl + aggregator; GPT-5.4 reviewer metric emitted only on Copilot CLI) and a sync-parity script that catches Claude↔Copilot↔repo drift before it ships.
- Cross-session learning: per-project knowledge base (architecture, patterns, gotchas) + user-level memory (feedback, project constraints, references).
- Schema-validated state + smoke-tested contracts: schemas/*.schema.json for agent-state.json and preferences; scripts/smoke-channels-flow.sh guards body-preservation + Bitbucket PUT contract + multi-channel dispatch.
- Task Type Detection: every task is classified as component/bugfix/feature/refactor/chore at Phase 0 Step 9, and every downstream phase uses that classification for deterministic routing.
- SubPhase progress tracking for specialized workflows — component generation (figma-to-swiftui) reports nested SubPhases under the parent main phase instead of inflating the top-level phase count.
- Interactive launchers: multi-agent-jira lists your open Jira issues; multi-agent-issue lists unassigned GitHub issues. Pick one → choose branch → choose mode (full/--dev) → autopilot? → pipeline starts. GitHub issues are auto-assigned on selection.
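The "3-iteration hard-kill on retry loops" gate in the list above can be pictured as a capped retry wrapper. The sketch below is only an illustration of that idea: `retry_capped` is a hypothetical name, and the pipeline's own enforcement may count iterations differently.

```shell
# Hypothetical sketch of a retry loop with a hard cap of three attempts.
retry_capped() {
  max_attempts=3
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    # Run the wrapped command with its arguments; stop on the first success.
    if "$@"; then
      return 0
    fi
    echo "attempt $attempt/$max_attempts failed" >&2
    attempt=$((attempt + 1))
  done
  echo "giving up after $max_attempts attempts" >&2
  return 1
}
```

Wrapping a build or test command, for example `retry_capped npm test`, guarantees it is invoked at most three times before the caller sees a failure.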
What's Included
pipeline/
commands/
multi-agent.md Main orchestrator
sim-test.md Mobile UI Bug Hunter
figma-to-swiftui.md Figma -> SwiftUI component generator
deploy.md iOS deployment checklist
archive-guard.md .xcarchive App Store compliance scan
security-review.md Deep security audit
multi-agent/
help.md Usage guide
setup.md Token + identity + Jira key onboarding
status.md List all tasks
log.md Show task log
resume.md Resume stopped task
kill.md Delete task worktree
purge.md Full reset (double-confirm)
review.md Review current diff
channels.md Multi-channel reporter (Jira/Confluence/Wiki/PR)
jira.md Interactive Jira picker
issue.md Interactive GitHub issue picker
dev.md Fast mode (Opus)
autopilot.md Skip confirmations
dev-autopilot.md Fastest path (dev + autopilot)
test.md UI Bug Hunter
search.md Full-text log search
scan.md Skill security scanner
refactor.md Refactor planner
update.md Update pipeline
sync.md Sync ecosystem (Claude/Copilot/repo)
refs/
rules.md Global non-negotiable rules
keychain.md Token registry
knowledge.md Project knowledge system
audit-guide.md Audit tool integration rules
swiftui-guide.md iOS component guide + compliance rules
android-guide.md Android component guide + compliance rules
backend-guide.md Backend API guide
frontend-guide.md Frontend component guide
phases.md Phase reference + ASCII flow diagram
phases/
phase-0-init.md Project setup (8-step interactive)
phase-1-analysis.md Stack detection + codebase exploration
phase-2-planning.md Task decomposition + architecture review
phase-3-dev.md TDD development + build queue
phase-4-review.md Gates + code review + accessibility check
phase-5-test.md User testing + device audits (on-demand)
phase-6-commit.md Commit + PR (reviewers + draft prompt)
phase-7-report.md External (Jira/Wiki/Confluence) + internal log + knowledge + memory
modes.md Autopilot, --dev, --local
operations.md Kill, resume, purge
log-format.md Log file spec
skills/
shared/
core/ 22 orchestration skills (multi-agent-*) —
pipeline-critical; changes here affect pipeline
behavior directly.
external/ 127 iOS/Android/generic guidance skills imported
from upstream (mirrors of third-party skill sets
— SwiftUI, Compose, Kotlin, Swift, web, backend,
CI/CD, HIG, etc.).
figma-common/ Platform-agnostic Figma pipeline shared steps
figma-ios/ Figma → SwiftUI component generator (iOS)
figma-android/ Figma → Jetpack Compose component generator (Android)
schemas/ JSON Schemas
agent-state.schema.json Validates $HOME/.claude/logs/.../agent-state.json
prefs.schema.json Validates $HOME/.claude/multi-agent-preferences.json
triage-output.schema.json Validates Phase 4 triage output
token-budget.json Per-phase token limits + warn thresholds
agents/
code-reviewer.md Phase 4 reviewer (CLI-aware: 2-model Claude / 3-model Copilot + Opus triage)
explorer.md Phase 1 codebase scanner
ios-architect.md iOS architecture review
android-architect.md Android architecture review
backend-architect.md Backend/API architecture review
security-auditor.md Security audit (OWASP)
rules/
code-style.md Naming, structure, patterns
git-conventions.md Commit messages, branching
testing.md Test naming, structure, coverage
tdd.md Red-Green-Refactor, testing pyramid
code-review.md Review priority, severity, checklist
security.md Keychain, ATS, credentials, privacy
performance.md Bottlenecks, caching strategy
debugging.md Scientific debugging method
app-store-guidelines.md App Store Review Guidelines
figma-pipeline.md Figma -> SwiftUI generation rules
swiftui-qa.md 3-layer test strategy, 13-item checklist
kotlin-android.md Kotlin & Android conventions
eval/
triage/
01-10/ 10 adversarial eval fixtures
scripts/
pre-commit-check.sh Secret detection hook
stack-swap.sh Auto stack detection (SessionStart)
keychain-save.sh Save tokens/JSON to macOS Keychain
github-ssh-setup.sh GitHub SSH key generation + setup
ui-tree-dumper.swift iOS accessibility tree dumper
eval-triage.mjs Eval runner for triage fixtures
validate-triage.mjs Runtime triage validator
validate-schemas.mjs Zero-dep shallow validator for *.schema.json
aggregate-metrics.mjs Telemetry aggregator
log-metric.sh Telemetry metric logger
phase-banner.sh Phase banner UI renderer
phase-tracker.sh Phase progress tracker
sync-parity-check.sh Claude↔Copilot↔repo parity checker
smoke-*.sh Contract smoke tests (10 suites, 115 assertions)
claude-md-template.md CLAUDE.md starter template
preferences-template.json Empty config template
docs/
features.md Full feature catalog
adr/ Architecture Decision Records (0001+)
CHANGELOG.md Version history
docs/architecture.md Mermaid diagrams: pipeline flow, modes, components
docs/best-practices.md Competitor-informed patterns and pipeline rules
SECURITY.md Vulnerability reporting policy
Ecosystem Sync
The pipeline maintains consistency across multiple repositories and CLI targets:
# Claude Code # Copilot CLI
/multi-agent:sync # multi-agent-sync
/multi-agent:sync status # multi-agent-sync status
/multi-agent:sync to-copilot # multi-agent-sync to-copilot
/multi-agent:sync to-repo # multi-agent-sync to-repo
/multi-agent:sync release # multi-agent-sync release
| Target | What syncs |
| ------------------ | ---------------------------------------------------- |
| Claude Code | Source of truth: commands, agents, scripts |
| Copilot CLI | Summary mirror + 192 unified skills + scripts |
| Pipeline Repo | Genericized open-source version (no personal data) |
| Website | Version, phase/model counts, feature strings (EN+TR) |
| Remote Control | Pipeline feature references |
License
MIT
