@adnova-group/muster

v0.2.8

Published

4 days ago

Glass-box agentic orchestrator: detects context, assembles the right crew, and shows its reasoning.

Downloads

1,477

0High
0Medium
0Low

adnova-group

orchestrator agentic ai claude-code cli multi-agent pipelines glass-box

Muster

Glass-box, multi-domain agentic orchestrator for Claude Code. Give it an outcome; it assembles the right crew and shows its reasoning before it acts.

📖 Documentation: adnova-group.github.io/muster

What it is

Muster turns an outcome into finished work. It detects your project, discovers the capabilities you already have installed, picks the best tool for each piece of the job, and runs a crew of specialists toward your success criteria. Every decision is inspectable: which role resolved to which provider, on which model, and why.

It runs on bare Claude Code with no extra services and no separate model API, and it gets better as you install more tools. The work is not limited to code. Product, business, content, and operations are first-class.

Quickstart

npx -y @adnova-group/muster install

install mutates nothing in your ~/.claude. It just prints the steps it cannot do for you, because registering a plugin is a Claude Code action:

/plugin marketplace add Adnova-Group/muster  # register the marketplace
/plugin install muster@muster                # install the plugin

Muster's glass-box output style ships inside the plugin and applies automatically when the plugin is enabled (force-for-plugin), so there is no command to run. Plugin install is a Claude Code action, so the running session picks Muster up only after you (re)install it through /plugin. Restart or /clear, then run your first outcome:

/muster:run Add rate limiting to the public API with tests

The four modes

| Mode | Command | What it does | | --- | --- | --- | | Run | /muster:run <outcome> | Plans, shows the crew manifest and plan, then stops for your approval. Does not execute. | | Autopilot | /muster:autopilot <outcome> | Hands-off lifecycle: branch, route, run waves, commit per wave, present the merge. Stops only for the merge decision or an escalation. | | Diagnose | /muster:diagnose <symptom> | Failure-first bug fix: reproduce, find root cause, fix, add a regression test, verify. No symptom-patching. | | Audit | /muster:audit [path] | Breadth-first whole-codebase review and fix across six dimensions, then fixes everything with tests and verifies. |

Run and Autopilot accept a GitHub issue reference (a bare number, #123, or an issues URL) as the outcome. A thin outcome gets refined first: muster assess does a deterministic gap-check, and if the outcome is vague, an interview skill asks one question at a time behind an approval gate before any crew is assembled.

How it works

The novel core is a capability and domain router. Muster names a fixed vocabulary of roles (the kinds of work a crew might need), and each role resolves through a ladder, best available first:

An installed external provider (a plugin, agent, or MCP server you already have)
A Muster built-in agent
A Muster built-in skill
Inline (the model does it directly)

muster capabilities walks this ladder for every role and reports the winner, the full fallback chain, installable recommendations, and the chosen model. Because the chain always ends at inline, every role resolves to something, so Muster works on bare Claude Code and improves as you add tools.

The role set is fixed but the provider set is not. When an outcome does not fit a named role, description-search bridges the gap: muster match "<task>" ranks every catalog provider by deterministic token overlap (no model call), so "audit this code for security vulnerabilities" surfaces the security specialist even though it never names a role.

Each role also carries a model picked to fit the work: mechanical roles run on Haiku, the default is Sonnet, and heavy judgment runs on Fable (degrades to Opus when unavailable on the plan). Muster composes the tools you already have and falls back to its own. For the full design, see the architecture reference (or docs/architecture.md in-repo).

Always-on guidance

Muster ships three plugin-native hooks that work together to keep the session anchored:

SessionStart: prepends muster's working principles, the four verbs, a one-line project detect, and the default-routing directive at the start of every session, so the session opens with full context and a clear posture for routing actionable prompts through muster.
UserPromptSubmit: fires before each user turn and re-injects a short drift-reinforcement nudge every N turns (default 3) and the full principles payload every N*K turns (default 9), so the session does not revert to default inline Claude behavior over time.
PreToolUse: wave-guard. While a wave is active (.muster/wave-active present), denies main-loop Edit/Write/NotebookEdit and Bash file-write patterns so the orchestrator's iron rule (dispatch via Agent tool, never edit inline) is enforced at the harness level.

All three hooks live inside the plugin, so they activate when muster is enabled and go away when muster is disabled. They do not write to your ~/.claude/CLAUDE.md or settings.json and create no global files. Each hook is fail-safe: any error falls back to an empty result and never blocks a session from starting.

Pipelines

A pipeline is a phased, gated recipe for producing one kind of artifact. Each declares a domain, an ordered list of phases, and a gate. Gating uses a floor principle: the weakest dimension must clear the floor and the total must clear a pass threshold, so a strong average cannot rescue one weak dimension.

The set spans software and knowledge work. A few examples: PRD, business-case, launch-plan, executive-summary, OKRs, AI implementation spec, competitive-battlecard, blog-post, case-study, runbook, and book (fiction and non-fiction). Roadmap prioritization is one to call out: goals go in, and a RICE-ranked now/next/later roadmap comes out, with the model estimating the factors and the CLI doing the arithmetic. Human-facing pipelines end with a humanize phase that strips em-dashes, AI-tell words, and robotic cadence.

Prompt evaluation

Muster can lint, eval, and optimize prompts, including the prompts an application generates at runtime to build agents and agentic workflows, and prompts found in a codebase Muster is working in. The deterministic core runs offline; a built-in skill (muster-prompt-smith, the prompt-quality role) supplies the model calls for empirical eval.

Lint (muster prompt lint <file|->) is a no-LLM structural check that enforces Anthropic's best practices (role, XML tags, multishot examples, explicit output format, positive framing) and the agent/guardrail rules (imperative tool framing, stop conditions, "I don't know" allowance, citations, input separation). Every finding cites the doc rule it comes from and suggests a fix; the rubric is gated by the same floor principle as pipelines. Pass --agent --tools for runtime agent prompts.
Eval (muster prompt eval <suite.json>) grades outputs against a test dataset with code graders (json/regex/python) plus an LLM-judge, and reports accuracy.
Optimize (muster prompt variations then muster prompt optimize) generates technique-driven variations, re-scores them, and keeps the winner via the tournament floor, flagging a regression when nothing beats the baseline.

# lint a runtime agent prompt piped straight from your app
your-app --print-agent-prompt | npx @adnova-group/muster prompt lint - --agent --tools

See the commands reference for the full surface.

Configuration

Muster's runtime behavior can be tuned with environment variables:

| Variable | Default | Semantics | | --- | --- | --- | | MUSTER_NUDGE_EVERY | 3 | Inject a short drift-reinforcement nudge every N turns. | | MUSTER_PRINCIPLES_EVERY | 3 | Inject the full principles + verbs every N*K turns (K = this value; at defaults: nudge every 3, full every 9). | | MUSTER_WAVE_GUARD | deny | PreToolUse hook enforcement while a wave is active: deny blocks inline edits, warn allows with a reminder, off disables the guard. | | MUSTER_MAX_TIER | (unset) | Caps the model tier policy (e.g. opus disables Fable, sonnet for budget mode); unset = no cap. Note: static agent frontmatter pins (e.g. muster-strategist) are not affected on direct invocation; in muster runs the dispatch override honors the cap. |

Built on

Muster's design was inspired by atomic-claude, superpowers, and gsd-core. It vendors a curated set of MIT-licensed skills and agents, with every source and item recorded for attribution:

| Source | License | Provides | | --- | --- | --- | | obra/superpowers | MIT | Brainstorming, planning, TDD, code-review, debugging, verification skills | | wshobson/agents | MIT | Software and knowledge-work agents across many specialties | | open-gsd/gsd-core | MIT | Plan, execute, and verify workflow phases |

Alongside the vendored material, Muster ships its own clean-room specialists, authored fresh from the role concept. Full provenance lives in NOTICE.

Contributing and license

Contributions are welcome. See CONTRIBUTING.md to get started.

Muster is licensed under Apache-2.0. See LICENSE and NOTICE.