shipwright-cli
v3.2.0
Orchestrate autonomous Claude Code agent teams in tmux
Table of Contents
- Shipwright Builds Itself
- Code Factory Pattern
- What's New in v3.2.0
- How It Works
- Install
- Quick Start
- Features
- Commands
- Pipeline Templates for Teams
- Configuration
- Prerequisites
- Architecture
- Contributing
- License
Shipwright Builds Itself
This repo uses Shipwright to process its own issues. Label a GitHub issue with shipwright and the autonomous pipeline takes over: semantic triage, plan, design, build, test, review, quality gates, PR. No human in the loop.
See it live | Create an issue and watch it build.
Code Factory Pattern
Shipwright implements the complete Code Factory control-plane pattern — where agents write 100% of the code and the repo enforces deterministic, risk-aware checks before every merge. Every decision is traceable to policy. Every merge is backed by machine-verifiable evidence.
Agent writes code → Risk policy gate → Tier-appropriate CI → Code review agent
→ Findings auto-remediated → SHA-validated evidence → Bot threads cleaned → Merge
→ Incidents feed back into harness coverage
What makes Shipwright best-in-class
| Code Factory Layer | Shipwright Implementation |
| ---------------------- | ------------------------------------------------------------------------------------------------------- |
| Single contract | config/policy.json — risk tiers, merge policy, docs drift, evidence specs, harness SLAs in one file |
| Preflight gate | risk-policy-gate.yml classifies risk from changed files before expensive CI runs |
| SHA discipline | All checks, reviews, and approvals validated against current PR head — stale evidence is never trusted |
| Rerun writer | sw-review-rerun.sh — SHA-deduped, single canonical writer, no duplicate bot comments |
| Remediation loop | review-remediation.yml — agent reads findings, patches code, validates, pushes fix to same branch |
| Bot thread cleanup | auto-resolve-threads.yml — resolves bot-only threads after clean rerun, never touches human threads |
| Evidence framework | sw-evidence.sh — browser, API, database, CLI, webhook, and custom evidence with freshness enforcement |
| Harness-gap loop | shipwright incident gap — every regression creates a test case with SLA tracking |
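The SHA discipline and freshness rows above reduce to two comparisons. The following is a minimal sketch under stated assumptions, not Shipwright's implementation: the function name `evidence_is_trusted` and its four arguments are hypothetical, and the real logic in sw-evidence.sh may differ.

```shell
# Hypothetical helper: decide whether a piece of evidence may be trusted.
# Arguments: evidence SHA, current PR head SHA, evidence age in seconds,
# maximum allowed age in seconds. None of these names come from Shipwright.
evidence_is_trusted() {
  local evidence_sha="$1" head_sha="$2" age_s="$3" max_age_s="$4"
  # Evidence without a recorded SHA can never be validated.
  [ -n "$evidence_sha" ] || return 1
  # Stale evidence: recorded against a commit that is no longer the PR head.
  [ "$evidence_sha" = "$head_sha" ] || return 1
  # Freshness enforcement: evidence older than the limit is rejected.
  [ "$age_s" -le "$max_age_s" ] || return 1
  return 0
}

# In a real checkout the head SHA would come from: git rev-parse HEAD
if evidence_is_trusted "abc123" "abc123" 120 3600; then
  echo "trusted"
else
  echo "stale"
fi
```

Pushing a new commit changes the head SHA, so every earlier piece of evidence fails the second check and has to be recaptured.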
Beyond the baseline
Shipwright extends the Code Factory pattern with capabilities most implementations don't have:
- 12-stage pipeline with self-healing builds, adversarial review, and compound quality gates
- Predictive risk scoring using GitHub signals (security alerts, contributor expertise, file churn)
- Persistent memory — failure patterns, fix effectiveness, and prediction accuracy compound over time
- Auto-learning — self-optimize runs automatically after every pipeline completion, including context efficiency tuning
- Decision engine — tiered autonomous decisions with outcome learning and deduplication
- Unified model routing — single source of truth for model selection across all components
- Evidence-gated merges — SHA discipline ensures all evidence validated against current PR head
- Semantic quality audits — Claude-powered audits with grep fallback when Claude unavailable
- 18 autonomous agents with specialized roles (PM, reviewer, security auditor, test generator, etc.)
- Cross-platform compatibility — portable date helpers, file_mtime, and compat layer for macOS/Linux
- Fleet operations — the Code Factory pattern applied across every repo in your org
- Cost intelligence — per-pipeline cost tracking, budget enforcement, adaptive model routing
- Self-optimization — DORA metrics analysis auto-tunes daemon config and template weights
# Evidence framework — capture and verify all types
npm run harness:evidence:capture # All collectors (browser, API, DB, CLI)
npm run harness:evidence:capture:api # API endpoints only
npm run harness:evidence:capture:cli # CLI commands only
npm run harness:evidence:capture:database # Database checks only
npm run harness:evidence:verify # Verify manifest + freshness
npm run harness:evidence:pre-pr # Capture + verify in one step
# Risk and policy
npm run harness:risk-tier
# Incident-to-harness loop
shipwright incident gap list
shipwright incident gap sla
Full Code Factory documentation
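To illustrate how a preflight gate can classify risk from changed files, here is a rough sketch. The path patterns and the tier names `low`, `medium`, and `high` are assumptions made for this example; the actual tiers and rules are defined in config/policy.json and are not reproduced here.

```shell
# Hypothetical tier rules for illustration only; the real rules live in
# config/policy.json.
classify_path() {
  case "$1" in
    config/policy.json|.github/workflows/*) echo "high" ;;
    scripts/*|dashboard/*)                  echo "medium" ;;
    *.md|docs/*)                            echo "low" ;;
    *)                                      echo "medium" ;;
  esac
}

# Reads changed paths on stdin (one per line) and prints the highest tier seen.
classify_risk_tier() {
  local tier="low" path t
  while IFS= read -r path; do
    [ -n "$path" ] || continue
    t="$(classify_path "$path")"
    case "$t" in
      high)   tier="high" ;;
      medium) [ "$tier" = "high" ] || tier="medium" ;;
    esac
  done
  echo "$tier"
}

printf '%s\n' "README.md" "scripts/sw-pipeline.sh" | classify_risk_tier   # prints: medium
```

In a pull request the input could come from `git diff --name-only origin/main...HEAD`. Because the result depends only on the file list, the gate can run before any expensive CI job starts.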
What's New in v3.2.0
Code Factory pattern — deterministic, risk-aware agent delivery with machine-verifiable evidence:
- Risk policy gate — PR-level preflight classifies risk tier from changed files; blocks before expensive CI
- SHA discipline — All evidence validated against current PR head SHA; stale evidence never trusted
- Evidence framework — 6 collector types (browser, API, database, CLI, webhook, custom) with freshness enforcement
- Review remediation — Agent reads review findings, patches code, validates, pushes fix commit in-branch
- Auto-resolve bot threads — Bot-only PR threads cleaned up after clean rerun; human threads untouched
- Harness-gap loop — Every incident creates a test case requirement with SLA tracking (P0: 24h, P1: 72h)
- Policy contract v2 — Risk tiers, merge policy, docs drift rules, evidence specs, harness SLAs in one file
v2.3.1: Autonomous feedback loops, testing foundation, chaos resilience
v2.3.0: Fleet Command completeness overhaul + autonomous team oversight
v2.0.0: 18 autonomous agents, 100+ CLI commands, intelligence layer, multi-repo fleet, local mode
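The harness-gap SLAs listed above (P0: 24h, P1: 72h) can be turned into deadlines with simple epoch arithmetic. This sketch assumes timestamps are epoch seconds; the function names are hypothetical and are not part of the shipwright incident command.

```shell
# SLA hours per priority. P0 and P1 come from the release notes above;
# any other priority is treated as unknown rather than guessed.
sla_hours() {
  case "$1" in
    P0) echo 24 ;;
    P1) echo 72 ;;
    *)  return 1 ;;
  esac
}

# Prints the epoch-seconds deadline for a gap of priority $1 opened at $2.
sla_deadline() {
  local hours
  hours="$(sla_hours "$1")" || return 1
  echo $(( $2 + hours * 3600 ))
}

# Succeeds when a gap of priority $1 opened at $2 is overdue at time $3.
sla_breached() {
  local deadline
  deadline="$(sla_deadline "$1" "$2")" || return 2
  [ "$3" -gt "$deadline" ]
}

sla_deadline P0 1000000   # prints: 1086400
```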
How It Works
graph LR
A[GitHub Issue] -->|labeled 'shipwright'| B[Daemon]
B --> C[Triage & Score]
C --> D[Select Template]
D --> E[Pipeline]
subgraph Pipeline ["12-Stage Pipeline"]
direction LR
E1[intake] --> E2[plan] --> E3[design] --> E4[build]
E4 --> E5[test] --> E6[review] --> E7[quality]
E7 --> E8[PR] --> E9[merge] --> E10[deploy]
E10 --> E11[validate] --> E12[monitor]
end
E --> E1
E12 --> F[Merged PR]
subgraph Intelligence ["Intelligence Layer"]
I1[Predictive Risk]
I2[Model Routing]
I3[Adversarial Review]
I4[Self-Optimization]
end
Intelligence -.->|enriches| Pipeline
style A fill:#00d4ff,color:#000
style F fill:#4ade80,color:#000
When tests fail, the pipeline re-enters the build loop with the error context, self-healing the way a developer reads failures and fixes them. Convergence detection stops infinite loops, and error classification routes retries intelligently.
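One simple way to picture convergence detection is to fingerprint each failure and stop when two consecutive iterations fail identically. The sketch below is an illustration under that assumption, not the logic in sw-loop.sh; `run_build_loop`, `error_signature`, and the `fix_cmd` argument (a stand-in for handing the error output to a build agent) are hypothetical.

```shell
# Fingerprint failure output so two iterations can be compared cheaply.
# cksum is POSIX, so this works on both macOS and Linux.
error_signature() {
  printf '%s' "$1" | cksum | cut -d ' ' -f 1
}

# Runs test_cmd up to max_iterations times, calling fix_cmd after each
# failure. Stops early when the failure output stops changing.
run_build_loop() {
  local test_cmd="$1" fix_cmd="$2" max_iterations="$3"
  local i=1 output sig last_sig=""
  while [ "$i" -le "$max_iterations" ]; do
    if output="$(eval "$test_cmd" 2>&1)"; then
      echo "passed after $i iteration(s)"
      return 0
    fi
    sig="$(error_signature "$output")"
    if [ "$sig" = "$last_sig" ]; then
      echo "converged on the same failure at iteration $i; stopping"
      return 2
    fi
    last_sig="$sig"
    eval "$fix_cmd"
    i=$((i + 1))
  done
  echo "gave up after $max_iterations iteration(s)"
  return 1
}

# Example (not executed here): run_build_loop "npm test" "./apply-fix.sh" 5
```

The distinct return codes (0 passed, 1 iteration limit reached, 2 converged) let a caller decide whether to retry differently or give up.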
Install
One-command install (recommended):
git clone https://github.com/sethdford/shipwright.git && cd shipwright && ./install.sh
curl:
curl -fsSL https://raw.githubusercontent.com/sethdford/shipwright/main/scripts/install-remote.sh | bash
npm (global):
npm install -g shipwright-cli
Verify:
shipwright doctor
Quick Start
# One-command setup
shipwright init
# See what's running
shipwright status
# Process a GitHub issue end-to-end
shipwright pipeline start --issue 42
# Run daemon 24/7 with agent orchestration
shipwright daemon start --detach
# See live agent activity
shipwright activity
# Spin up agent team for manual work
shipwright session my-feature -t feature-dev
# View DORA metrics and pipeline vitals
shipwright dora
# Continuous build loop with test validation
shipwright loop "Build auth module" --test-cmd "npm test"
# Multi-repo operations
shipwright fleet start
shipwright fix "upgrade deps" --repos ~/a,~/b,~/c
# Release automation
shipwright version bump 2.4.0
shipwright changelog generate
Features
18 Autonomous Agents
Wave 1 (Organizational):
- Swarm Manager — Orchestrates dynamic agent teams with specialization roles
- Autonomous PM — Team leadership, task scheduling, roadmap execution
- Knowledge Guild — Cross-team learning, pattern capture, mentorship
- Recruitment System — Talent acquisition and team composition
- Standup Automaton — Daily standups, progress tracking, blocker detection
Wave 2 (Operational Backbone):
- Quality Oversight — Intelligent audits, zero-defect gates, completeness verification
- Strategic Agent — Long-term planning, goal decomposition, roadmap intelligence
- Code Reviewer — Architecture analysis, clean code standards, best practices
- Security Auditor — Vulnerability detection, threat modeling, compliance
- Test Generator — Coverage analysis, scenario discovery, regression prevention
- Incident Commander — Autonomous triage, root cause analysis, resolution
- Dependency Manager — Semantic versioning, update orchestration, compatibility checking
- Release Manager — Release planning, changelog generation, deployment orchestration
- Adaptive Tuner — DORA metrics analysis, self-optimization, performance tuning
- Strategic Intelligence — Predictive analysis, trend detection, proactive recommendations
Plus 10+ specialized agents for observability, UX, documentation, and more.
12-Stage Delivery Pipeline
intake → plan → design → build → test → review → compound_quality → pr → merge → deploy → validate → monitor
Each stage is configurable with quality gates that auto-proceed or pause for approval. 8 pipeline templates:
| Template | Stages | Use Case |
| ------------ | --------------------------------- | ------------------------- |
| fast | intake → build → test → PR | Quick fixes, score >= 70 |
| standard | + plan, design, review | Normal feature work |
| full | All 12 stages | Production deployment |
| hotfix | Minimal, all auto | Urgent production fixes |
| autonomous | All stages, all auto | Daemon-driven delivery |
| enterprise | All stages, all gated | Maximum safety + rollback |
| cost-aware | All stages + budget checks | Budget-limited delivery |
| deployed | All + deploy + validate + monitor | Full deploy pipeline |
Intelligence Layer
7 modules that make the pipeline smarter over time. Enabled by default: intelligence is on when Claude CLI is available, with optimization and prediction active out of the box. Set intelligence.enabled=false to disable. All modules degrade gracefully.
| Module | What It Does |
| ------------------------ | -------------------------------------------------------------------------------------------------------------------- |
| Semantic Triage | AI-powered issue analysis, complexity scoring, template selection |
| Pipeline Composer | Generates custom pipeline configs from codebase analysis (file churn, test coverage, dependencies) |
| Predictive Risk | Scores issues for risk using GitHub signals (security alerts, similar past issues, contributor expertise) |
| Adversarial Review | Red-team code review: finds security flaws, edge cases, failure modes. Cross-checks against CodeQL/Dependabot alerts |
| Self-Optimization | Reads DORA metrics and auto-tunes daemon config. Includes context efficiency closed loop for token budget tuning |
| Developer Simulation | 3-persona review (security, performance, maintainability) before PR creation |
| Architecture Enforcement | Living architectural model with violation detection and dependency direction rules |
Adaptive everything: thresholds learn from history, model routing uses SPRT evidence-based switching, poll intervals adjust to queue depth, memory timescales tune based on fix effectiveness.
GitHub Deep Integration
Native GitHub API integration enriches every intelligence module:
| API | Integration |
| ----------------- | ---------------------------------------------------------------------------------------- |
| GraphQL | File change frequency, blame data, contributor expertise, similar issues, commit history |
| Checks API | Native check runs per pipeline stage, visible in the PR timeline; blocks merges on failure |
| Deployments API | Tracks deployments per environment (staging/prod), rollback support, deployment history |
| Security | CodeQL + Dependabot alerts feed into risk scoring and adversarial review |
| Contributors | CODEOWNERS-based reviewer routing, top-contributor fallback, auto-approve as last resort |
| Branch Protection | Checks required reviews and status checks before attempting auto-merge |
Decision Engine
The autonomous decision engine (config/policy.json → decision section) handles routine operational decisions with outcome learning. Decisions are tiered by risk, with low-risk actions auto-approved and higher tiers escalated. The engine learns from outcomes to improve future decisions.
Context Engineering
Intelligent context window management for pipeline agents:
- Budget-aware trimming: Configurable character budgets for prompt composition (context_budget_chars)
- Section-level trimming: Independent limits for memory, git history, hotspot files, and test output
- Context efficiency metrics: Tracks budget utilization and trim ratios per iteration
- Self-tuning: The self-optimization loop analyzes context efficiency events and recommends budget adjustments
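Budget-aware trimming and the trim ratio it reports can be sketched in a few lines. This is an assumption-laden illustration rather than Shipwright's code: `trim_to_budget` and `trim_ratio_pct` are hypothetical names, the sketch keeps the tail of the text on the assumption that the most recent output matters most, and `tail -c` counts bytes, which equals characters only for ASCII input.

```shell
# Trim text to a budget, keeping the tail (for test output or git history,
# the most recent lines are assumed to be the most relevant).
trim_to_budget() {
  local text="$1" budget="$2"
  if [ "${#text}" -le "$budget" ]; then
    printf '%s' "$text"
  else
    printf '%s' "$text" | tail -c "$budget"
  fi
}

# Percentage of the original text removed by trimming, as an integer.
trim_ratio_pct() {
  local len="$1" budget="$2"
  if [ "$len" -le "$budget" ]; then
    echo 0
  else
    echo $(( (len - budget) * 100 / len ))
  fi
}

trim_to_budget "abcdefghij" 4; echo   # prints: ghij
trim_ratio_pct 10 4                   # prints: 60
```

Section-level trimming would call `trim_to_budget` once per section (memory, git history, hotspot files, test output), each with its own limit, and log the resulting ratios so a tuning loop can see which budgets are too tight.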
Autonomous Daemon
shipwright daemon start --detach
Watches GitHub for labeled issues and processes them 24/7:
- Auto-scaling: Adjusts worker count based on CPU, memory, budget, and queue depth
- Priority lanes: Reserve a worker slot for urgent/hotfix issues
- Retry with escalation: Failed builds retry with template escalation (fast → standard → full)
- Patrol mode: Proactively scans for security issues, stale deps, dead code, coverage gaps
- Self-optimization: Tunes its own config based on DORA metrics over time
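The retry-with-escalation behaviour (fast → standard → full) can be pictured as a small state machine. In this sketch `next_template` and `run_with_escalation` are hypothetical names, and `run_cmd` stands in for whatever launches a pipeline with a given template; the daemon's real retry logic is not shown in this README.

```shell
# Escalation order taken from the list above: fast -> standard -> full.
next_template() {
  case "$1" in
    fast)     echo "standard" ;;
    standard) echo "full" ;;
    *)        return 1 ;;   # full (or unknown): nothing left to escalate to
  esac
}

# Calls "$run_cmd <template>" and escalates to the next template on failure.
run_with_escalation() {
  local template="$1" run_cmd="$2"
  while :; do
    if "$run_cmd" "$template"; then
      echo "succeeded with $template"
      return 0
    fi
    template="$(next_template "$template")" || {
      echo "exhausted templates"
      return 1
    }
  done
}

# Stand-in runner for the example: only the full template succeeds.
only_full() { [ "$1" = "full" ]; }
run_with_escalation fast only_full   # prints: succeeded with full
```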
Fleet Operations
shipwright fleet start
Orchestrate daemons across multiple repositories with a shared worker pool. Workers rebalance based on queue depth, issue complexity, and repo priority.
Persistent Memory
The pipeline learns from every run:
- Failure patterns: Captured and injected into future builds so agents don't repeat mistakes
- Fix effectiveness: Tracks which fixes actually resolved issues
- Prediction validation: Compares predicted risk against actual outcomes, auto-adjusts thresholds
- False-alarm tracking: Reduces noise by learning which anomalies are real
Cost Intelligence
shipwright cost show
Per-pipeline cost tracking with model pricing, budget enforcement, and ROI analysis. Adaptive model routing picks the cheapest model that meets quality targets.
Real-Time Dashboard
shipwright dashboard start
Web dashboard with live pipeline progress, GitHub context (security alerts, contributors, deployments), DORA metrics, cost tracking, and context efficiency metrics. WebSocket-powered, so it updates in real time.
Webhook Receiver
shipwright webhook listen
Instant issue processing via GitHub webhooks instead of polling. Register the webhook with shipwright webhook register to receive events in real time and process issues without polling delay.
PR Lifecycle Automation
shipwright pr review <pr#>
shipwright pr merge <pr#>
shipwright pr cleanup
Fully automated PR management: review based on predictive risk and coverage, intelligent auto-merge when gates pass, and cleanup of stale branches. Reduces manual PR overhead by 90%.
Fleet Auto-Discovery
shipwright fleet discover --org myorg
Scan a GitHub organization and auto-populate fleet config with all repos matching criteria (language, archived status, team ownership). One command instead of manual registry building.
SQLite Persistence
ACID-safe state management in SQLite replaces the volatile JSON files under .claude/pipeline-artifacts/ with a reliable database schema. Atomic transactions prevent partial states, and crash recovery is automatic.
Issue Decomposition
shipwright decompose analyze 42
shipwright decompose decompose 42
AI-powered issue analysis: analyze scores complexity; decompose creates child issues with inherited labels/assignees and a dependency graph.
Linux systemd Support
Cross-platform process supervision. Use systemd on Linux instead of tmux, same daemon commands:
shipwright launchd install # macOS launchd
# systemd service auto-generated on Linux
Context Engine
shipwright context gather
Rich context injection for pipeline stages. Pulls together contributor history, file hotspots, architecture rules, related issues, and failure patterns, and injects them automatically at each stage for smarter decisions.
Commands
Over 100 commands. Key workflows:
# Autonomous delivery
shipwright pipeline start --issue 42
shipwright daemon start --detach
# Agent teams
shipwright swarm status
shipwright recruit --roles builder,tester
shipwright standup
shipwright guild list
# Quality gates
shipwright code-review
shipwright security-audit
shipwright testgen
shipwright quality validate
# Observability
shipwright vitals
shipwright dora
shipwright stream
shipwright activity
# Multi-repo operations
shipwright fleet start
shipwright fix "feat: add auth" --repos ~/a,~/b,~/c
shipwright fleet-viz
# Release automation
shipwright version bump 2.4.0
shipwright changelog generate
shipwright deploys list
# Setup & maintenance
shipwright init
shipwright prep
shipwright doctor
shipwright upgrade --apply
# See all commands
shipwright --help
See .claude/CLAUDE.md for the complete 100+ command reference organized by workflow. Full documentation: https://sethdford.github.io/shipwright.
Pipeline Templates for Teams
24 team templates covering the full SDLC:
shipwright templates list
Configuration
| File | Purpose |
| ----------------------------- | ------------------------------------------------------------------------------------------- |
| config/policy.json | Central contract — risk tiers, merge policy, docs drift, browser evidence, harness SLAs |
| config/policy.schema.json | JSON Schema validation for the policy contract |
| .claude/daemon-config.json | Daemon settings, intelligence flags, patrol config |
| .claude/pipeline-state.md | Current pipeline state |
| templates/pipelines/*.json | 8 pipeline template definitions |
| tmux/templates/*.json | 24 team composition templates |
| ~/.shipwright/events.jsonl | Event log for metrics |
| ~/.shipwright/costs.json | Cost tracking data |
| ~/.shipwright/budget.json | Budget limits |
| ~/.shipwright/github-cache/ | Cached GitHub API responses |
Prerequisites
| Requirement | Version | Install |
| --------------- | ------- | -------------------------------------- |
| tmux | 3.2+ | brew install tmux |
| jq | any | brew install jq |
| Claude Code CLI | latest | npm i -g @anthropic-ai/claude-code |
| Node.js | 20+ | For hooks and dashboard |
| Git | any | For installation |
| gh CLI | any | brew install gh (GitHub integration) |
Architecture
100+ bash scripts (~100K lines), 125 shell test suites + 16 dashboard test files (141 total), plus an E2E system test that exercises the full daemon→pipeline→loop→PR flow. Dashboard at 98% coverage. Bash 3.2 compatible, so it runs on macOS and Linux out of the box.
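Running the same scripts on macOS and Linux usually means papering over differences between BSD and GNU tools. As an illustration of the kind of helper a compat layer needs, here is one common way to write a portable `file_mtime`. The name appears in the feature list above, but this body is an assumption and not a copy of Shipwright's helper.

```shell
# Print a file's modification time as epoch seconds.
# GNU stat (Linux) uses -c %Y; BSD stat (macOS) uses -f %m.
# Try the GNU form first and fall back to the BSD form.
file_mtime() {
  stat -c %Y "$1" 2>/dev/null || stat -f %m "$1" 2>/dev/null
}

# Example: age of a file in seconds, usable for freshness checks.
file_age_seconds() {
  local mtime now
  mtime="$(file_mtime "$1")" || return 1
  now="$(date +%s)"
  echo $(( now - mtime ))
}
```

Both functions avoid bash 4 features such as associative arrays, so they also run under the bash 3.2 that ships with macOS.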
Core Layers:
Pipeline Layer
sw-pipeline.sh # 12-stage delivery orchestration
sw-daemon.sh # Autonomous GitHub issue watcher
sw-loop.sh # Continuous multi-iteration build loop
Agent Layer (18 agents)
sw-swarm.sh # Dynamic agent team orchestration
sw-pm.sh # Autonomous PM coordination
sw-recruit.sh # Agent recruitment system
sw-standup.sh # Daily team standups
sw-guild.sh # Knowledge guilds
sw-oversight.sh # Quality oversight board
sw-strategic.sh # Strategic intelligence
sw-scale.sh # Dynamic team scaling
... 10 more agent scripts
Intelligence Layer
sw-intelligence.sh # AI analysis engine
sw-predictive.sh # Risk scoring + anomaly detection
sw-adaptive.sh # Data-driven pipeline tuning
sw-security-audit.sh # Security analysis
sw-code-review.sh # Code quality analysis
sw-testgen.sh # Test generation
sw-architecture.sh # Architecture enforcement
Operational Layer
sw-fleet.sh # Multi-repo orchestration
sw-ci.sh # CI/CD orchestration
sw-webhook.sh # GitHub webhooks
sw-incident.sh # Incident response
sw-release-manager.sh # Release automation
... 20+ operational scripts
Observability Layer
sw-vitals.sh # Pipeline health scoring
sw-dora.sh # DORA metrics dashboard
sw-activity.sh # Live activity streams
sw-replay.sh # Pipeline playback
sw-trace.sh # E2E traceability
sw-otel.sh # OpenTelemetry integration
... observability services
Infrastructure
sw-github-graphql.sh # GitHub GraphQL API client
sw-github-checks.sh # Native GitHub check runs
sw-github-deploy.sh # Deployment tracking
sw-memory.sh # Persistent learning system
sw-cost.sh # Cost intelligence
sw-db.sh # SQLite persistence
sw-eventbus.sh # Async event bus
Tools & UX
dashboard/server.ts # Real-time dashboard
sw-session.sh # tmux agent sessions
sw-status.sh # Team dashboard
sw-docs.sh # Documentation sync
sw-tmux.sh # tmux health management
Contributing
Let Shipwright build it: Create an issue using the Shipwright template and label it shipwright. The autonomous pipeline will triage, plan, build, test, review, and create a PR.
Manual development: Fork, branch, then:
npm test # 125 shell suites + 16 dashboard test files (141 total), E2E system test
License
MIT — Seth Ford, 2026.
