@jterrats/open-orchestra

v1.2.0

Published

3 days ago

Local control plane for AI-assisted development orchestration, evidence gates, and agent workflows.

jterrats

ai-agents agent-orchestration workflow-gates playwright developer-tools

Open Orchestra

You already have Claude Code, Codex, or Cursor. The problem is everything around them: prompts drift, context is assembled by hand, handoffs get lost, and "done" means different things every time.

Open Orchestra is the governance layer that runs on top of the agents you already use. It sequences PM → PO → Architect → Developer → QA → Release, injects the right context at each phase, enforces human gates where it matters, and leaves a verifiable audit trail. The public CLI is orchestra.

Unlike autonomous agents like Devin, Orchestra doesn't replace your tools — it governs them. Unlike agent frameworks like CrewAI, you don't write code to define agents — you run a workflow against your existing AI runtime.

Real result from a governed story — negative means time saved vs. that mode:

Actual: 1.4d  |  vs Solo: -72%  |  vs AI-unguided: -53%  |  Cost: $0.0257

-72% means the task took 72% less time than the solo baseline declared at story start. Orchestra measures this automatically from the run log — no manual input after the initial estimate.

Package naming: install @jterrats/open-orchestra; run the CLI as orchestra. The shorter @jterrats/orchestra package is not the canonical published install target.

Quick Start

Individual Mode: First Value In Minutes

npm install -g @jterrats/open-orchestra
orchestra init
orchestra task add --id DEMO-001 --title "Ship a governed README update" --owner developer --paths "README.md"
orchestra workflow run --task DEMO-001 --gates none
orchestra status

What happens automatically:

Orchestra scores and selects a workflow template for the task.
The active agent receives task-scoped context: role, paths, acceptance criteria, risks, evidence expectations, and relevant skills.
The run creates durable phase artifacts, handoffs, reviews, and evidence in .agent-workflow/.
Benchmark data can compare solo, AI-unguided, and AI-guided effort after the task is complete.

Use this path when you are one developer working quickly and still want a repeatable trail instead of one-off prompts.

Team Mode: Human Gates And Release Readiness

orchestra task add --id STORY-001 --title "Ship a governed change" --owner product_owner --paths "README.md"
orchestra estimate --task STORY-001 --sizing s --solo-days 1 --ai-unguided-days 0.5 --ai-guided-days 0.25
orchestra decision add --task STORY-001 --owner architect --title "Story sizing" --decision "s 2 points" --context "Production story" --consequences "Developer phase can start"
orchestra workflow run --task STORY-001 --gates phase
orchestra evidence add --task STORY-001 --role developer --type command --summary "Validation passed" --command "npm run precommit" --exit-code 0
orchestra review --task STORY-001 --role qa --result approve --findings "Acceptance criteria covered" --recommendation "Ready for release review"
orchestra release check --json

Use --gates phase when PO, architecture, QA, or release decisions need human review. Use --gates all for regulated or high-risk work that must pause at every phase transition.
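Command evidence is most useful when the recorded exit code comes from the command itself rather than being typed by hand. The following is a minimal sketch of that idea, not part of the CLI: the function name `record_command_evidence`, its argument order, and the fixed `developer` role are illustrative assumptions. It uses only the `orchestra evidence add` flags shown in the example above.

```shell
# Run a validation command and record its real exit code as evidence.
# Hypothetical helper: the name, argument order, and fixed "developer" role
# are assumptions for illustration. Only documented flags are passed through.
record_command_evidence() (
  task_id=$1
  summary=$2
  validation_cmd=$3

  # Capture the exit code without aborting when the command fails.
  exit_code=0
  sh -c "$validation_cmd" || exit_code=$?

  orchestra evidence add \
    --task "$task_id" \
    --role developer \
    --type command \
    --summary "$summary" \
    --command "$validation_cmd" \
    --exit-code "$exit_code"
)

# Example (requires the orchestra CLI on PATH):
# record_command_evidence STORY-001 "Precommit gate" "npm run precommit"
```

Because the helper records failures as well as successes, a non-zero exit code lands in the audit trail instead of being silently dropped.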

Prefer a visual workflow? Start the local console with:

orchestra web

Stable installs use npm install -g @jterrats/open-orchestra@latest. Beta dogfooding uses npm install -g @jterrats/open-orchestra@beta, followed by:

orchestra upgrade --smoke --json

For discovery or audits before adopting project-owned instruction files, start in advisory mode:

orchestra init --advisory
orchestra advisory convert --json

When developing Open Orchestra from this repository, use npm install, npm run build, and node bin/orchestra.js ... to exercise the local source.

The short path is: initialize workflow files, create a task, run the local workflow, inspect status, and preview release readiness. The production path adds explicit estimates, architecture sizing, human gates, command evidence, QA review, and release checks. For a disposable fake-provider walkthrough, see docs/end-to-end-demo.md.

Core Product Surface

Start with the small command set that supports production delivery:

Workspace: orchestra init, orchestra health --json, orchestra status.
Work intake: orchestra task add, orchestra task list, orchestra task show.
Governed execution: orchestra workflow run, orchestra decision add, orchestra evidence add, orchestra review.
Tracker and release: orchestra github sync --issue <number>, orchestra release check --json, orchestra upgrade --smoke --json.
Workflow tooling: orchestra workflow phase-plan --task <id>, orchestra refresh --check --json, orchestra cursor canvas status --json.

Use orchestra -h for the human quickstart and orchestra help commands for the full command catalog. Use orchestra commands manifest --json for the complete machine-readable reference. See docs/core-command-surface.md and docs/command-contracts.md for the stable/advanced split and automation contract.

Further documentation:

For Claude, Codex, Cursor, VS Code, Windsurf, and generic LLM usage, see docs/runtime-llm-flow.md and docs/runtime-adapters.md.
For the system map, see docs/architecture.md.
To start by job-to-be-done, use the persona workflows guide for PO refinement, developer execution, QA validation, tech lead oversight, and release management.
For package naming rules, see docs/package-naming.md.
For tracker fallback behavior when gh is unavailable, see docs/tracker-adapter-contract.md.
For the local, SaaS, and orchestrator threat model, see docs/security-saas-orchestrator.md.
For the 1.0.0 documentation path across install, workflow, providers, trackers, web console, release operations, troubleshooting, and migration, see docs/adoption-guide.md.
For the public site content workflow, generated site manifest, Mermaid alignment, and Technical Writer review expectations, see docs/site-content-workflow.md.

1.0.0 Workflow Tooling

The 1.0.0 workflow surface keeps the CLI as the source of truth while making generated files and runtime views easier to audit:

orchestra workflow phase-plan --task <id> --json recommends optional UX, docs, release, and risk-owner phases from task scope and project signals.
Phase playbooks live in .agent-workflow/playbooks/ and are loaded only for the active phase in provider prompts, workflow render, runtime briefs, and runtime delegation packets.
orchestra refresh --check --json detects stale generated instruction files; orchestra refresh --dry-run previews changes; orchestra refresh --force updates managed blocks without overwriting user-authored content.
Provider progress is visible in workflow run output and the local web console through /api/workflow/progress.
orchestra cursor canvas status|sync|clean --json is a Cursor-specific local bridge. It is optional and separate from portable runtime bootstrap files.

Supported Platforms

Open Orchestra supports current Node.js LTS/runtime releases on macOS, Linux, and Windows. Repository scripts avoid Unix-only shell requirements for the core quality and workflow gates; npm run validate:workflow runs through a Node helper so it works in Windows shells as well as POSIX shells. The CI dogfood path validates installed-package init and workflow validation on ubuntu-latest, macos-latest, and windows-latest.

Known limitations: publish/release automation may still use GitHub-hosted Linux shell steps, but local development, validation, and installed CLI dogfooding are expected to be cross-platform.

Release readiness should be judged against the latest pushed commit. A production release is ready only after the local precommit gate and CI installed-package dogfood pass on Ubuntu, macOS, and Windows for the current HEAD.
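One way to make sure a release check is run against the latest pushed commit is to compare the local HEAD with its upstream tracking branch first. This is a sketch under assumptions: the helper names are hypothetical, the current branch is assumed to have an upstream configured, and the git calls are standard `git rev-parse` usage rather than anything Orchestra provides.

```shell
# Succeeds only when both SHAs are non-empty and identical.
same_commit() {
  [ -n "$1" ] && [ "$1" = "$2" ]
}

# Hypothetical wrapper: run the release check only when HEAD is already pushed.
# Assumes the current branch tracks an upstream branch.
release_check_if_pushed() (
  local_sha=$(git rev-parse HEAD) || exit 1
  upstream_sha=$(git rev-parse '@{u}') || exit 1

  if same_commit "$local_sha" "$upstream_sha"; then
    orchestra release check --json
  else
    echo "HEAD ($local_sha) differs from upstream ($upstream_sha); sync first." >&2
    exit 1
  fi
)

# Example (requires git and the orchestra CLI on PATH):
# release_check_if_pushed
```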

For the 1.0.0 milestone, orchestra release check --json also reports a gaReadiness go/no-go section. GA blockers include uncovered acceptance criteria, blocking reviews, active locks, missing smoke or rollback evidence, missing documentation or observability evidence, missing security and package provenance evidence, missing public CLI contract evidence, missing migration readiness evidence, missing release test matrix evidence, and accepted risks without a follow-up or expiry.

The release test matrix is available with:

npm run release:matrix -- --json

See docs/release-test-matrix.md.

Autonomous Workflow

The workflow run command executes a full story lifecycle as a governed multi-phase sequence without manual step-by-step commands. Each phase creates a sub-task, generates handoff artifacts, and persists state in an append-only run log.

Use orchestra workflow run when you want Orchestra to drive the complete delivery lifecycle: PM, PO, Architect, Developer, QA, and Release, including phase gates and resumable run state. Use orchestra run only when you want to execute the already-computed local task plan; it does not run the full multi-phase lifecycle and should not be treated as a replacement for workflow gates or release evidence.

PM → PO [gate] → Architect [sizing gate] → Developer → QA [gate] → Release

# Dry run — inspect the phase graph and gate annotations without persisting state
orchestra workflow run --task FEAT-001 --dry-run --gates phase

# Full autonomous run, no human gates
orchestra workflow run --task FEAT-001 --gates none

# Run with human approval gates at po→architect and qa→release
orchestra workflow run --task FEAT-001 --gates phase

# Resume a paused or clarification-suspended run
orchestra workflow run --task FEAT-001 --resume <run-id>

# List all runs with status and phase trace
orchestra workflow runs

Gate modes:

| Mode | Gates |
|------|-------|
| none | Fully autonomous; no human approval required |
| phase | Pauses at po→architect and qa→release |
| all | Pauses at every phase transition |

The architect sizing gate is always enforced, regardless of the --gates mode. The architect must record a sizing decision (xs/s/m/l/xl) before the developer phase starts. If the decision is missing, the run fails and prints the exact command needed to resolve it.
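The dry-run and gate-mode options above combine naturally into a "preview, then run" routine. The sketch below is illustrative only: `is_valid_gate_mode` and `preview_then_run` are hypothetical helper names, and it assumes the dry run exits with status 0 when the phase graph is valid, which this README does not state.

```shell
# Accept only the three gate modes listed above.
is_valid_gate_mode() {
  case "$1" in
    none|phase|all) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical wrapper: preview the phase graph, then start the real run.
# Assumes a successful dry run exits with status 0.
preview_then_run() (
  task_id=$1
  gate_mode=$2

  if ! is_valid_gate_mode "$gate_mode"; then
    echo "Unknown gate mode: $gate_mode (expected none, phase, or all)" >&2
    exit 2
  fi

  # Stop here if the preview fails, so no run state is persisted.
  orchestra workflow run --task "$task_id" --dry-run --gates "$gate_mode" || exit 1
  orchestra workflow run --task "$task_id" --gates "$gate_mode"
)

# Example (requires the orchestra CLI on PATH):
# preview_then_run FEAT-001 phase
```

Rejecting an unknown mode before calling the CLI keeps typos such as `--gates phases` from reaching a real run.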

Clarification Loop

Developers or QA engineers can surface blocking questions to the PO or architect mid-phase without stopping the workflow or making unvalidated assumptions.

# Developer asks PO a question (suspends the current phase)
orchestra workflow clarify --run <run-id> --from developer --to po \
  --question "Should empty input return null or throw?"

# PO answers (resumes the phase)
orchestra workflow clarify-respond --run <run-id> --clarification <id> \
  --answer "Return null — downstream code handles it."

# Resume execution after the answer is recorded
orchestra workflow run --task FEAT-001 --resume <run-id>

# Inspect all clarifications for a run
orchestra workflow clarify-list --run <run-id>

Clarifications are persisted in .agent-workflow/clarifications.jsonl and visible in task context.

Benchmark & Sprint Burndown

Open Orchestra measures effectiveness across three development modes and generates a sprint burndown from story point estimates.

Estimate (declare baselines once, at story start)

orchestra estimate \
  --task FEAT-001 \
  --sizing m \
  --solo-days 5 \
  --ai-unguided-days 3 \
  --ai-guided-days 2 \
  --confidence high

Benchmark (auto-computed after run completes)

# Per-story report: cycle time, savings %, quality signals
orchestra benchmark --task FEAT-001

# Sprint summary table
orchestra benchmark --summary

Example output:

Benchmark: FEAT-001  [complete]
  Sizing:      m
  Solo:        5d  (declared)
  AI-unguided: 3d  (declared)
  AI-guided:   2d  (declared)
  Actual:      1.4d
  vs Solo:     -72%
  vs AI-U:     -53%
  vs AI-G:     -30%
  QA loops:    1
  Reviews:     3 (0 blocking)
  Evidence:    5 artifacts
  Lessons:     2
  Tokens:      17500in / 5000out
  Cost:        $0.0257

Quality signals (reviews, evidence, lessons, gate blocks, token usage, cost) are read automatically from the event log — no manual input after the initial estimate.
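The percentages in the example output can be reproduced by hand. The sketch below assumes the savings figure is (actual - baseline) / baseline, rounded to the nearest whole percent. That formula matches the three figures shown above, but it is inferred from them rather than taken from the Orchestra source, and `savings_pct` is a hypothetical helper name.

```shell
# Percentage difference between the actual duration and a declared baseline,
# rounded to the nearest whole percent. Negative means time saved.
# Assumed formula: (actual - baseline) / baseline * 100.
savings_pct() {
  awk -v actual="$1" -v baseline="$2" \
    'BEGIN { printf "%.0f\n", (actual - baseline) / baseline * 100 }'
}

savings_pct 1.4 5   # vs Solo:        prints -72
savings_pct 1.4 3   # vs AI-unguided: prints -53
savings_pct 1.4 2   # vs AI-guided:   prints -30
```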

Sprint Burndown

Developer story points take priority over architect sizing. If the developer has not estimated yet, the burndown falls back to the architect's sizing.

# ASCII chart + task breakdown
orchestra burndown --sprint FEAT-001,FEAT-002,FEAT-003

# JSON series for dashboards
orchestra burndown --sprint FEAT-001,FEAT-002,FEAT-003 --json

The developer records their own estimate with:

orchestra decision add \
  --task FEAT-001 \
  --owner developer \
  --title "Dev estimate" \
  --decision "M / 8 points" \
  --context "..." --consequences "..." --status accepted
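The priority rule for burndown points (developer estimate first, architect sizing as the fallback) can be pictured as a small selection function. This is only a sketch of the rule as described above, not Orchestra's implementation: `effective_points` is a hypothetical name, and an empty string stands in for "not estimated yet".

```shell
# Pick the points a burndown would use for one task.
# Hypothetical helper: an empty first argument means the developer
# has not estimated yet, so the architect's points are used instead.
effective_points() (
  developer_points=$1
  architect_points=$2

  if [ -n "$developer_points" ]; then
    echo "$developer_points"
  else
    echo "$architect_points"
  fi
)

effective_points 8 5    # developer estimate exists: prints 8
effective_points "" 5   # no developer estimate:     prints 5
```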

See docs/benchmark.md for the full reference.

Role Catalog

Open Orchestra treats roles as capabilities and governance responsibilities, not only human job titles. Projects can keep roles inactive until risk, scope, impact area, or a workflow gate requires them.

Core delivery roles: Product Manager, Product Owner, Business Analyst, Architect, Developer, QA, Security, DevOps, SRE, DBA, Data Engineer, UX/UI Designer, Accessibility Reviewer, Release Manager, Compliance/Privacy, Technical Writer, Tech Lead, SDET, Platform Engineer, Frontend Specialist, Backend Specialist, Mobile Specialist, AI Evaluation Engineer, and Support/Customer Operations.

Orchestration roles for modern multi-agent systems:

Parent Agent / Orchestrator — sequencing, handoffs, locks, escalation, integration.
Planner — work breakdown, dependency mapping, role activation rationale.
Reviewer / Critic — independent review before gates or handoffs.
Toolsmith / Integration Engineer — tools, MCPs, providers, adapters, automation contracts.
Context Curator / Memory Manager — decisions, assumptions, stale context, shared memory hygiene.
Policy / Governance Agent — approvals, budgets, workflow rules, compliance gates.
Observability / Incident Response — telemetry, alerts, runbooks, incident readiness.
Data / Privacy Officer — PII, retention, encryption, access, data compliance.
Domain Expert — project-specific business or industry judgment.
Performance Engineer — load, latency, scalability, caching, concurrency, graceful degradation.
Game Designer — gameplay loops, tutorialization, player feedback, balance risk.

Each default role declares activation criteria, expected evidence, and gate participation so a parent agent can select only the roles needed for a task. See docs/dev-team-specialist-role-profiles.md for specialist profiles.

Workflow Files

.agent-workflow/
  config.json
  roles.json
  tasks.json
  locks.json
  events.jsonl
  workflow-runs.jsonl       ← autonomous run state (append-only)
  clarifications.jsonl      ← clarification loop records (append-only)
  estimates.jsonl           ← declared effort baselines (append-only)
  approvals/
  decisions/
  handoffs/
  evidence/
  reviews/
  runs/
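Because the .jsonl logs are line-oriented, a quick health check is to count how many records each one holds. The sketch below assumes the usual JSONL convention of one JSON object per non-empty line; `count_jsonl_records` is a hypothetical helper, and it deliberately does not parse record fields, since their schema is not described here.

```shell
# Count records in a JSONL log: one record per non-empty line.
# Prints 0 when the file does not exist yet.
count_jsonl_records() {
  if [ -f "$1" ]; then
    # grep -c exits non-zero when the count is 0, so ignore its status.
    grep -c . "$1" || true
  else
    echo 0
  fi
}

# Summarize the four logs from the tree above.
for log in events workflow-runs clarifications estimates; do
  printf '%s: %s\n' "$log" "$(count_jsonl_records ".agent-workflow/$log.jsonl")"
done
```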

Skills and Context Loading

Primary instruction files should stay short. Detailed procedures live in task-scoped skills loaded only when needed. See docs/skill-loading-strategy.md for the manifest, loading flow, and built-in skill candidates.

Prompt Registry

Open Orchestra scaffolds a stack-agnostic .generated-prompts/ registry during orchestra init. Agents use it to preserve prompt intent and generation conventions without bloating main instruction files. The registry is split by artifact type: code, UI, services, tests, CI/CD, docs, diagrams, and evals.

VS Code Control Center

The VS Code extension scaffold lives in extensions/vscode-open-orchestra. It consumes stable CLI JSON contracts for status, validation, graph plan, roles, approvals, evidence, and config inspection. The CLI remains the source of truth.

npm run build
code extensions/vscode-open-orchestra

Compatibility

Existing .agent-workflow/ data remains valid.
Existing AGENTS.md, CLAUDE.md, Cursor rules, and generated instruction files remain supported.
ORCHESTRA.md is the intended future primary guide name and can coexist with current agent instruction files.

See docs/orchestra-mvp.md for the full command reference.