swarm-engine
v1.54.0
Self-aware multi-agent orchestration engine with knowledge graph, causal inference, GNN failure prediction, and self-evolving rules — pure TypeScript
Swarm Engine
Multi-agent orchestration for AI coding tools. Coordinates Claude, Codex, and Gemini through research, implementation, and review phases. 20-35% token reduction on parallel workflows (measured per-run), cost transparency after every orchestration, and a knowledge graph that sharpens context routing with each run. 1,848 tests. 14-finding security audit. MIT licensed.
26 agents, 7 composable patterns, pure TypeScript, zero external ML dependencies. Works with Claude Code, OpenAI Codex, Google Gemini CLI, and Vercel AI SDK. Mix models across agents in the same orchestration.
What It Looks Like
⚡ Swarm Engine — hybrid pattern
Phase: research ━━━━━━━━━━━━━━━━━━━━ done
✓ researcher-code sonnet-4-6 3.2K tok 14s
✓ researcher-context sonnet-4-6 1.8K tok 9s
Phase: implement ━━━━━━━━━━━━━━━━━━━ 1m 12s
● implementer opus-4-6 8.4K tok 1m 12s src/auth/rate-limit.ts
Phase: review ────────────────────── pending
○ reviewer-security opus-4-6
○ reviewer-perf sonnet-4-6
○ reviewer-convention sonnet-4-6
Timeline: ━━──── (1/3 phases)
Recent findings:
○ researcher-code: express-rate-limit already in package.json
○ researcher-context: vault says rate limiter goes before auth middleware
12.8K tokens │ $0.24 │ 1m 36s ~2m remaining
When it finishes, you see exactly what the engine did for you:
┌─ Engine Benefits ──────────────────────────────────────────────────────┐
│ Cost: $2.8400 (avg: $3.6200, saved $0.7800) | 28% tokens saved │
├────────────────────────────────────────────────────────────────────────┤
│ TOKEN EFFICIENCY │
│ ├─ Smart routing -18,400 tok filtered phase outputs │
│ ├─ Verbatim compaction -12,800 tok replaced with refs │
│ ├─ Context decay -6,200 tok older phases summarized │
│ ├─ Prompt diet (balanced) -4,100 tok trimmed agent prompts │
│ ├─ Tool search deferred -14,000 tok ENABLE_TOOL_SEARCH=true │
│ └─ TOTAL SAVED -55,500 tok 28% reduction │
│ │
│ CACHE │
│ ├─ Cache read tokens 142,800 reused from prior turns │
│ ├─ Cache creation 28,400 new cache entries │
│ └─ Cache hit rate 83.4% no cliff events detected │
│ │
│ KNOWLEDGE GRAPH │
│ ├─ Context routing 8 graph-optimized per agent │
│ ├─ Confidence gates 2 eval all passed │
│ └─ Pattern history 91% success 14 runs │
│ │
│ ML / GNN │
│ └─ Predictive dropout 1 agents -5,000 tok saved │
│ │
│ ADAPTIVE │
│ ├─ Model downgrades 2 cheaper where safe │
│ └─ Living spec 3 updates refined during run │
└────────────────────────────────────────────────────────────────────────┘
The more you use it, the richer this gets -- the knowledge graph, GNN predictions, cache baselines, and historical cost averages build over time.
Install
npm (recommended):
npm install -g swarm-engine
swarm install # set up Claude Code integration (agents, commands, hooks)
swarm doctor # verify everything works
Homebrew:
brew tap simoncoombes/swarm
brew install swarm-engine
swarm install
npx (try without installing):
npx swarm-engine doctor
From source:
git clone https://github.com/simoncoombes/swarm-engine.git ~/dev/swarm-engine
cd ~/dev/swarm-engine && npm install && npm run build && npm link
swarm install
Requires Node.js 20+, jq, and at least one of Claude Code, Codex, or Gemini CLI.
Quick Start
In Claude Code:
/swarm "add rate limiting to the API"
That's it. Agents spawn as teammates, research the codebase, implement the changes, and review the result. You see their work in split panes and get a summary when they're done.
Other slash commands for specific patterns:
/research "how does the auth system work?"
/tdd "add input validation to user endpoints"
/red-team "harden the payment flow"
/review-cycle "refactor the database layer"
Standalone CLI
You can also run orchestrations directly from any terminal, outside of Claude Code:
swarm orchestrate "add rate limiting" # inline progress
swarm orchestrate "add rate limiting" --panes # tmux split panes
swarm orchestrate "add rate limiting" --tui # full-screen dashboard
swarm plan "add rate limiting" # preview plan (free)
The --panes flag uses tmux to show each agent in its own split pane. Install with brew install tmux (macOS) or sudo apt install tmux (Linux).
VS Code and Cursor
Swarm Engine ships with a VS Code extension that works in both VS Code and Cursor.
Install the extension:
cd ~/dev/swarm-engine/vscode-extension
npm install && npm run build
npx @vscode/vsce package --allow-missing-repository
Then: Cmd+Shift+P > "Extensions: Install from VSIX" > select the .vsix file.
What you get:
- @swarm in Copilot Chat - type @swarm add auth middleware and it orchestrates the task
- Sidebar panel with quick actions, pattern browser, and agent list
- Command palette (Cmd+Shift+P > "Swarm") for all commands
- Status bar shortcut
Copilot Chat examples:
@swarm add rate limiting to the API
@swarm plan add auth middleware
@swarm template bug-fix
@swarm status
Use as a Library
Swarm Engine can be imported directly into Node.js applications:
npm install swarm-engine
import { SwarmEngine } from 'swarm-engine';
const engine = new SwarmEngine({ mock: true });
const result = await engine.orchestrate({
task: 'Build a REST API',
pattern: 'hybrid',
});
console.log(result.status);
Key exports:
import {
SwarmEngine, // Main orchestration engine
AgentRegistry, // Load and manage agent definitions
EventBus, // Typed event system for monitoring
PatternRegistry, // Composable orchestration patterns
BackendRegistry, // Multi-backend (Claude, Codex, Gemini, Vercel AI)
CostModel, // Estimate token costs before running
ModelRouter, // UCB1-based model selection
TemplateRegistry, // Save and replay successful workflows
} from 'swarm-engine';
// Tier 1: Core Graph
import {
ExecutionGraph, // Persistent execution knowledge graph
GraphLearner, // Cross-run pattern learning
GraphContextRouter, // Relevance-scored context assembly
GraphAnalyzer, // Topology analysis and failure prediction
ReviewFeedbackRecorder, // Review findings into graph
} from 'swarm-engine';
// Tier 2: Advanced ML
import {
CausalGraphEngine, // Do-calculus causal inference
FailurePropagationPredictor, // 3-layer GNN failure prediction
AdversarialEvolver, // Thompson sampling red-team
MetaPatternSelector, // TF-IDF + logistic pattern recommendation
PredictiveDropout, // Active learning agent dropout
} from 'swarm-engine';
// Tier 3: Self-Aware Engine
import {
PatternSynthesizer, // Topology diff → novel patterns
TrajectoryPredictor, // Mid-run success prediction
MetaAdversarialTester, // Red-teams the engine's own ML
RuleEvolver, // Self-evolving replanning rules
TaskDiscovery, // Mines failure patterns for tasks
OrchestrationEmbedder, // Topology embeddings for transfer learning
} from 'swarm-engine';
// Token Compression & Benefits
import {
VerbatimCompactor, // Replace file reads, diffs, stack traces with refs
ContextDecayManager, // Hierarchical time-decay summarization
ACONOptimizer, // Failure-driven compression guidelines
PromptCompressor, // Strip markdown boilerplate from prompts
getOutputSchema, // Structured JSON schemas per agent type
BenefitsCollector, // Aggregate optimization metrics
formatBenefitsTable, // Render styled benefits summary
createOutputSummarizerHook, // PostToolUse hook for Bash output reduction
} from 'swarm-engine';
See src/index.ts for the full export surface.
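The export list describes ModelRouter as UCB1-based. The package's actual implementation is not shown here, so the following is only a minimal sketch of the standard UCB1 rule, assuming each model is an arm with a pull count and a cumulative reward in [0, 1]. The names ArmStats and ucb1Select are hypothetical and are not part of the swarm-engine API.

```typescript
// Hypothetical sketch of UCB1 arm selection; not the package's ModelRouter.
interface ArmStats {
  model: string;       // e.g. "sonnet" or "opus"
  pulls: number;       // how many times this model has been chosen
  totalReward: number; // sum of rewards in [0, 1], e.g. 1 for a successful run
}

function ucb1Select(arms: ArmStats[]): string {
  if (arms.length === 0) throw new Error("no models to choose from");

  // Try every model once before trusting any estimate.
  const untried = arms.find((a) => a.pulls === 0);
  if (untried) return untried.model;

  const totalPulls = arms.reduce((sum, a) => sum + a.pulls, 0);
  let best = arms[0];
  let bestScore = -Infinity;
  for (const arm of arms) {
    const mean = arm.totalReward / arm.pulls;
    // Exploration bonus shrinks as an arm accumulates pulls.
    const bonus = Math.sqrt((2 * Math.log(totalPulls)) / arm.pulls);
    const score = mean + bonus;
    if (score > bestScore) {
      bestScore = score;
      best = arm;
    }
  }
  return best.model;
}
```

The bonus term is what lets a rarely used model get another chance even when its observed average is lower.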
Why Swarm Engine
Tools like Claude Code already let you spawn parallel agents with teams. Swarm Engine adds the orchestration layer: which agents to run, in what order, with what context, on which models, and how to learn from the results.
- 20-35% token reduction — measured per-run on typical parallel orchestrations. Cross-phase context filtering, verbatim compaction, context decay, prompt diet (3 modes), and tool schema deferral (~14K saved per session). No configuration needed. Cache-aware: tracks hit rates and detects cliff events.
- Knowledge graph improves with every run — first run uses heuristics, tenth run uses data. Context routing scores prior outputs by file overlap and recency. Pattern learning tracks success rates per orchestration type. Failure prediction estimates risk from historical topology before agents run. Cost baselines compare each run to your historical average.
- Cost transparency after every run — every orchestration ends with a benefits summary: actual cost, historical comparison, tokens saved (with compounding math), cache hit rates, and what each optimization contributed. swarm plan previews estimated cost before execution.
- Failure prediction — 3-layer GNN propagation model predicts which agents are likely to fail based on historical topology. Causal inference (do-calculus) explains why. Pure TypeScript, no external ML deps.
- 7 composable patterns — hybrid, TDD, red-team, spike, discover, review-cycle, research. Compose them: --pattern "tdd | red-team". Plus 12 slash commands including postmortem, diff-review, and fix-pr.
- Mix backends per agent — Claude for implementation, Codex for review, Gemini for research. Assign models at the agent level within one orchestration.
- 26 specialized agents — 16 core roles plus 10 focused reviewers (security, performance, data integrity, API contracts, testing, accessibility, dependencies, error handling, concurrency, documentation).
- 14-finding security audit — 3 recon agents + 3 adversarial breakers, all findings hardened. Plugin trust model, MCP command allowlists, prompt injection defense, path traversal guards, secrets redaction (16 pattern categories), file permissions hardened to 0o600/0o700.
- 1,848 tests across 104 files — reusable templates let you save and replay successful orchestrations (swarm template run bug-fix).
Templates
Save successful orchestrations as reusable templates:
swarm template list
add-endpoint - REST API endpoint with tests
bug-fix - Reproduce, diagnose, fix, verify
security-audit - OWASP security review
refactor - Safe refactoring with verification gates
migration - Schema migration with rollback
swarm template run add-endpoint -i
Explain Plan
See exactly what will happen before it runs:
swarm plan "add auth middleware" --pattern hybrid
Phase 1: research [parallel]
researcher-code sonnet ~40K tokens
researcher-context sonnet ~25K tokens
Phase 2: implement [sequential]
implementer opus ~100K tokens
Phase 3: review [parallel]
reviewer-correct opus ~60K tokens
reviewer-security opus ~60K tokens
Est. cost: $3.20 | Est. duration: 95s
Optimizations: model downgrade for research phase (sonnet vs opus),
cross-phase filtering (reviewers skip research output)
Commands
Daily use
| Command | What it does |
|---------|-------------|
| /swarm <task> | Full orchestration - research, implement, review |
| /review-cycle <task> | Implement with iterative quality gate |
| /diff-review [base] | Review branch diff before PR |
| /research <question> | Parallel research across multiple angles |
| /tdd <feature> | Test-driven: write tests first, then implement |
Advanced
| Command | What it does |
|---------|-------------|
| /spike <problem> | Two approaches compete, judge picks winner |
| /red-team <task> | Adversarial build and break |
| /discover <problem> | Hypothesize, experiment, implement winner |
| /dynamic <task> | Planner decomposes into custom agent workflow |
| /postmortem <error> | Root cause analysis, fix, and prevention |
| /fix-pr <PR#> | Fix PR review comments |
| /resume | Resume from checkpoint |
Patterns
7 composable patterns. Use them individually or combine them.
| Pattern | Flow | Phases |
|---------|------|--------|
| hybrid | Research, Implement, Review | 3 |
| research | Parallel fan-out research | 1 |
| review-cycle | Implement, Challenge, Review (iterative) | 4 |
| tdd | Test-first, Implement, Verify, Review | 5 |
| spike | Two approaches compete, judge decides | 4 |
| red-team | Build, Break, Harden | 4 |
| discover | Hypothesize, Experiment, Implement winner | 5 |
Compose patterns: swarm orchestrate "task" --pattern "tdd | red-team"
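How the engine interprets a composed --pattern value is not documented in this README. As a rough illustration only, the sketch below assumes the string is split on | and each segment must name one of the seven patterns in the table. KNOWN_PATTERNS and parsePatternExpression are hypothetical names.

```typescript
// Hypothetical sketch: split a composed pattern string and validate each name.
const KNOWN_PATTERNS = [
  "hybrid",
  "research",
  "review-cycle",
  "tdd",
  "spike",
  "red-team",
  "discover",
] as const;

type PatternName = (typeof KNOWN_PATTERNS)[number];

function parsePatternExpression(expr: string): PatternName[] {
  const names = expr.split("|").map((segment) => segment.trim());
  for (const name of names) {
    // Empty segments (e.g. a trailing "|") are rejected here as well.
    if (!(KNOWN_PATTERNS as readonly string[]).includes(name)) {
      throw new Error(`unknown pattern: "${name}"`);
    }
  }
  return names as PatternName[];
}

// Example: the composition shown above.
console.log(parsePatternExpression("tdd | red-team"));
```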
Agents
16 core agents plus 10 specialized reviewers.
| Agent | Role |
|-------|------|
| researcher | Explores code, finds patterns, traces dependencies |
| implementer | Writes clean, tested code following conventions |
| reviewer | Finds bugs, security issues, convention violations |
| tester | Writes and runs tests, reports coverage |
| debugger | Reproduces and fixes bugs systematically |
| planner | Designs architecture before code is written |
| refactorer | Safe incremental refactoring with test gates |
| integrator | Verifies cross-module contracts after parallel work |
| devils-advocate | Challenges every assumption before review |
| grounding | Verifies the implementation solves the actual problem |
| orchestrator | Coordinates everything |
| judge | Evaluates competing implementations and picks the winner |
| sentinel | Background: watches git activity |
| guardian | Background: runs affected tests |
| librarian | Background: maintains knowledge quality |
| documenter | Writes technical documentation |
Specialized reviewers: security, performance, data integrity, API contracts, testing, accessibility, dependencies, error handling, concurrency, documentation.
Cross-Tool Support
Convert your agents to work natively in other tools:
swarm convert --to copilot # GitHub Copilot .agent.md format
swarm convert --to cursor # Cursor .mdc rules
swarm convert --to codex # OpenAI Codex prompts
swarm convert --to gemini # Gemini CLI skills
swarm convert --to opencode # OpenCode agents
swarm convert --to windsurf # Windsurf skills
Token Efficiency
Saves 20-35% of tokens on typical parallel orchestrations (measured and shown after every run). Multi-agent context grows linearly without optimization; Swarm Engine applies these compression techniques automatically -- no configuration needed.
| Technique | What it does | Typical savings |
|-----------|-------------|-----------------|
| Cross-phase filtering | Implement phases only get research/plan outputs. Review phases only get implement/test outputs. File-scope filtering removes irrelevant files. | 10-30% per phase |
| Verbatim compaction | Replaces file reads, git diffs, test output, and stack traces with compact references. Runs on shared-context files, not just agent output. | 50-70% on tool-heavy outputs |
| Context decay | Recent phases: full fidelity. Older phases: heuristic summary. Oldest: one-line reference. | 5-20x on deep pipelines |
| Prompt compression | Strips YAML frontmatter, markdown headers, duplicate prefixes, and excessive indentation from agent prompts. | 10-25% per prompt |
| Output schemas | Structured JSON contracts per agent type force compact output instead of verbose prose. | 3-5x on agent outputs |
| Tool search deferral | ENABLE_TOOL_SEARCH=true auto-set on Claude backend. Defers tool schema loading until needed. | ~14K tokens per agent session |
| Output summarizer | PostToolUse hook injects compact digest after verbose Bash output (>50 lines). Classifies output type, extracts errors and test results. | Reduces model reasoning load on verbose commands |
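The context decay row describes three fidelity tiers by phase age. The cut-off points are not stated, so the sketch below assumes hypothetical thresholds (the two most recent phases stay full, the next two are summarized, anything older becomes a one-line reference) and takes the summarizer as a caller-supplied function. None of these names come from the package.

```typescript
// Hypothetical sketch of tiered context decay; thresholds are assumptions.
type Fidelity = "full" | "summary" | "reference";

interface PhaseOutput {
  phase: string;
  text: string;
}

// age 0 is the most recent phase.
function fidelityForAge(age: number, fullWindow = 1, summaryWindow = 3): Fidelity {
  if (age <= fullWindow) return "full";
  if (age <= summaryWindow) return "summary";
  return "reference";
}

// history is ordered oldest to newest.
function decayContext(
  history: PhaseOutput[],
  summarize: (text: string) => string,
): string[] {
  return history.map((output, index) => {
    const age = history.length - 1 - index;
    const tier = fidelityForAge(age);
    if (tier === "full") return `[${output.phase}] ${output.text}`;
    if (tier === "summary") return `[${output.phase}] ${summarize(output.text)}`;
    return `[${output.phase}] (see stored output)`;
  });
}
```

A first-line heuristic such as (text) => text.split("\n")[0] is enough to try it out; a real summarizer would be more selective.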
ACON (Agent Context Optimization) goes further: it records trajectory pairs (full context vs compressed), analyzes failures where compression caused worse outcomes, and iteratively refines compression guidelines. Ships with 8 built-in guidelines covering error messages, file paths, decisions, API contracts, and boilerplate. Gradient-free -- works with any model.
Cache tracking. The runtime captures cacheReadInputTokens and cacheCreationInputTokens end-to-end from the SDK through SQLite. Per-agent-type rolling baselines detect cache cliff events (>50% hit rate drop) and cold-wake gaps (>5min idle). These feed into the benefits table and cost model.
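To make the thresholds concrete, here is a minimal sketch of the two checks. It assumes the hit rate is cache reads divided by reads plus creations, which matches the sample benefits table (142,800 / (142,800 + 28,400) ≈ 83.4%), and it assumes the ">50% drop" is relative to the rolling baseline. The function names are hypothetical; only the two token field names come from the text above.

```typescript
// Hypothetical sketch of cache metrics; formulas are assumptions, not the package's code.
interface CacheUsage {
  cacheReadInputTokens: number;
  cacheCreationInputTokens: number;
}

function cacheHitRate(usage: CacheUsage): number {
  const total = usage.cacheReadInputTokens + usage.cacheCreationInputTokens;
  return total === 0 ? 0 : usage.cacheReadInputTokens / total;
}

// Cliff: the hit rate fell by more than half relative to the baseline.
function isCacheCliff(baselineRate: number, currentRate: number): boolean {
  if (baselineRate <= 0) return false;
  return (baselineRate - currentRate) / baselineRate > 0.5;
}

// Cold wake: more than five minutes since the agent's last activity.
function isColdWake(lastActivityMs: number, nowMs: number): boolean {
  return nowMs - lastActivityMs > 5 * 60 * 1000;
}

const sample = { cacheReadInputTokens: 142_800, cacheCreationInputTokens: 28_400 };
console.log(`${(cacheHitRate(sample) * 100).toFixed(1)}%`);
```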
The benefits summary at the end of each orchestration shows exactly what was saved.
Prompt Diet
Control how aggressively the runtime trims agent system prompts (Before-You-Act / Self-Check / Debt / Meta sections) per turn.
# OrchestrationConfig
promptDiet:
  default: balanced # aggressive | balanced | conservative
  overrides:
    orchestrator: full # keep full prompt for this agent type
    my-custom-agent: lite
CLI: swarm run --prompt-diet aggressive or swarm orchestrate --prompt-diet aggressive.
Env: SWARM_PROMPT_DIET=aggressive swarm ....
Modes:
- aggressive: trims all agents (including orchestrator). Biggest savings, highest risk of behavior drift.
- balanced (default): trims reviewers, implementers, researchers, testers, debuggers, planners, integrators, and other workers. Keeps orchestrator, grounding, devils-advocate, judge, and refactorer at full.
- conservative: no automatic trimming. Only explicit promptTier on individual agents applies (legacy behavior).
Per-agent precedence (strongest first, applied after dietConfig is resolved):
- Explicit config.promptTier on the AgentConfig — always wins
- dietConfig.overrides[key] where the agent type contains key (case-insensitive substring; longest matching key wins, so security-implementer beats implementer)
- mode === 'conservative' → no automatic tiering
- maxTurns ≤ 3 → minimal (in balanced and aggressive modes)
- mode === 'aggressive' → lite for everyone (including the protected roles)
- mode === 'balanced' (default) → protected roles (orchestrator, grounding, devils-advocate, judge, refactorer) stay full; lite-eligible roles → lite; unknown roles → full
- Default → full
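The per-agent precedence can be read as a chain of early returns. The sketch below is an illustration of that ordering, not the runtime's code: the type names and resolvePromptTier are hypothetical, "no automatic tiering" is assumed to mean full, and the lite-eligible list is an assumption drawn from the roles named under the balanced mode (the "other workers" it mentions are not enumerated, so they fall through to full here).

```typescript
// Hypothetical sketch of the precedence chain; not the package's implementation.
type PromptTier = "full" | "lite" | "minimal";
type DietMode = "aggressive" | "balanced" | "conservative";

interface DietConfig {
  default: DietMode;
  overrides?: Record<string, PromptTier>;
}

interface AgentConfig {
  type: string;
  promptTier?: PromptTier;
  maxTurns?: number;
}

const PROTECTED_ROLES = ["orchestrator", "grounding", "devils-advocate", "judge", "refactorer"];
// Assumed from the roles listed for balanced mode.
const LITE_ELIGIBLE = ["reviewer", "implementer", "researcher", "tester", "debugger", "planner", "integrator"];

function resolvePromptTier(agent: AgentConfig, diet: DietConfig): PromptTier {
  // 1. An explicit tier on the agent always wins.
  if (agent.promptTier) return agent.promptTier;

  // 2. Overrides: case-insensitive substring match, longest key wins.
  const type = agent.type.toLowerCase();
  const overrides = diet.overrides ?? {};
  const match = Object.keys(overrides)
    .filter((key) => key.length > 0 && type.includes(key.toLowerCase()))
    .sort((a, b) => b.length - a.length)[0];
  if (match !== undefined) return overrides[match];

  // 3. Conservative mode does no automatic tiering (assumed to mean full).
  if (diet.default === "conservative") return "full";
  // 4. Short-lived agents get the minimal prompt.
  if (agent.maxTurns !== undefined && agent.maxTurns <= 3) return "minimal";
  // 5. Aggressive mode trims everyone, protected roles included.
  if (diet.default === "aggressive") return "lite";
  // 6. Balanced mode: protected roles stay full, lite-eligible roles are trimmed.
  if (PROTECTED_ROLES.some((role) => type.includes(role))) return "full";
  if (LITE_ELIGIBLE.some((role) => type.includes(role))) return "lite";
  // 7. Unknown roles default to full.
  return "full";
}
```

Note that rule 4 sits above the protected-role check, so under this reading an orchestrator with maxTurns of 3 or fewer would still get the minimal prompt in balanced mode.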
Config-source precedence (which dietConfig the runtime sees): the CLI flag (--prompt-diet) populates orchestrationConfig.promptDiet directly, and the runtime then resolves orchestrationConfig.promptDiet ?? resolveDietFromEnv(). In other words, CLI/orchestration config wins, then the SWARM_PROMPT_DIET env var, then the balanced default.
Security
v1.50 addressed 14 findings from an internal red-team audit. Key hardening:
- Plugin trust model — repo-local plugins no longer auto-load. Explicit opt-in required via config. npm-installed plugins load normally.
- MCP/LSP command allowlist — only allowlisted binaries can be launched. Blocks arbitrary command execution through tool server configs.
- Prompt injection defenses — cross-phase context uses boundary markers, agent definitions are positioned before user context, and inline redaction strips injection attempts.
- Path traversal guard — swarm agents install validates paths to prevent writing outside the agents directory.
- Secrets redaction — Anthropic API keys, Slack tokens, npm tokens, and Vercel tokens are pattern-matched and stripped from logs, JSON reports, and vault files.
- Backend opt-in — Codex and Gemini backends require explicit --unsafe-backend flag since they execute in less-sandboxed environments.
- Config validation — YAML config files are validated on load; dangerous fields (shell, exec, command) are stripped.
- Error sanitization — error messages are cleaned before being injected into retry prompts to prevent reflection-based injection.
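For readers unfamiliar with path traversal guards, the usual approach is to resolve the requested path against the allowed base directory and reject anything that lands outside it. The sketch below shows that general technique with Node's path module; resolveInside is a hypothetical helper, and how swarm agents install actually validates paths is not shown in this README.

```typescript
import * as path from "node:path";

// Hypothetical sketch of a traversal guard; not the package's implementation.
function resolveInside(baseDir: string, requested: string): string {
  const base = path.resolve(baseDir);
  const target = path.resolve(base, requested);
  const relative = path.relative(base, target);

  const escapes =
    relative === "" ||                       // the base directory itself
    relative === ".." ||                     // the parent directory
    relative.startsWith(".." + path.sep) ||  // anything above the base
    path.isAbsolute(relative);               // e.g. a different drive on Windows
  if (escapes) {
    throw new Error(`refusing to write outside ${base}: ${requested}`);
  }
  return target;
}
```

Checking the relative path rather than a string prefix avoids false matches such as /tmp/agents-evil passing a startsWith("/tmp/agents") test.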
Memory and Knowledge Graph Intelligence
SQLite-backed knowledge base with full-text search, plus a 3-tier Execution Knowledge Graph that records every orchestration as a persistent, queryable topology. Syncs to Obsidian vault for cross-machine access.
Tier 1 -- Core Graph. ExecutionGraph records orchestration topology. GraphLearner extracts cross-run patterns. GraphContextRouter replaces dump-everything context with relevance-scored assembly. GraphAnalyzer detects god nodes, bottlenecks, and topology risks. Review findings feed back into the graph to refine future context routing.
Tier 2 -- Advanced ML. CausalGraphEngine applies do-calculus to estimate treatment effects and suggest interventions. FailurePropagationPredictor uses a 3-layer GNN to predict which nodes are at risk before execution starts. MetaPatternSelector recommends orchestration patterns via TF-IDF + logistic regression. PredictiveDropout uses active learning to skip redundant agents (saving ~5K tokens per dropped agent).
Tier 3 -- Self-Aware Engine. PatternSynthesizer generates novel orchestration patterns from topology diffs. TrajectoryPredictor forecasts orchestration success mid-run. RuleEvolver proposes and backtests its own replanning rules from historical failures. TaskDiscovery mines the graph for actionable tasks. OrchestrationEmbedder produces topology-based embeddings for similarity search and transfer learning. MetaAdversarialTester red-teams the engine's own ML subsystems.
All ML is implemented in pure TypeScript with zero external ML dependencies. Every optimization is tracked by the BenefitsCollector and surfaced in the post-orchestration summary -- the engine shows its work.
Knowledge Graph vs Claude Memory
Claude Code memory gives every session the same static context. Swarm's execution graph learns from every run:
- Context routing — agents get prior-phase outputs scored by relevance (file overlap, recency), not a dump of everything
- Pattern learning — tracks success rates per orchestration pattern and recommends what works for your codebase
- Failure prediction — estimates which agents are likely to fail based on historical topology, before they run
- Cost baselines — each run is compared to historical average so you see if this orchestration was cheap or expensive relative to your norm
First run uses heuristics. Tenth run uses data.
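As an illustration of scoring by file overlap and recency, the sketch below multiplies a Jaccard overlap of touched files by an exponential recency decay and keeps the top-scoring outputs. The weighting, the half-life of five runs, and all names are assumptions made for the example; the README does not specify the actual formula.

```typescript
// Hypothetical sketch of relevance-scored context routing; the formula is an assumption.
interface PriorOutput {
  id: string;
  files: string[];   // files the prior output touched or discussed
  ageInRuns: number; // 0 = produced in the current run
}

// Jaccard similarity between two file lists.
function fileOverlap(a: string[], b: string[]): number {
  const setA = new Set(a);
  const setB = new Set(b);
  const shared = Array.from(setA).filter((file) => setB.has(file)).length;
  const union = setA.size + setB.size - shared;
  return union === 0 ? 0 : shared / union;
}

function relevance(output: PriorOutput, taskFiles: string[], halfLifeRuns = 5): number {
  const recency = Math.pow(0.5, output.ageInRuns / halfLifeRuns);
  return fileOverlap(output.files, taskFiles) * recency;
}

// Keep only outputs with some overlap, best first, capped at `limit`.
function routeContext(outputs: PriorOutput[], taskFiles: string[], limit: number): PriorOutput[] {
  return outputs
    .map((output) => ({ output, score: relevance(output, taskFiles) }))
    .filter((entry) => entry.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit)
    .map((entry) => entry.output);
}
```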
swarm memory search "authentication"
swarm compound stats
Requirements
- Node.js 20+
- jq (brew install jq)
- At least one AI backend:
  - Claude Code (recommended)
  - Codex CLI (npm i -g @openai/codex)
  - Gemini CLI (npm i -g @google/gemini-cli)
Contributing
See CONTRIBUTING.md. Share agents, templates, patterns, and plugins.
Cost and Usage Disclaimer
Swarm Engine orchestrates AI coding tools that consume API tokens from third-party providers (Anthropic, OpenAI, Google, etc.). Each orchestration incurs costs billed directly to your accounts with those providers. You are solely responsible for monitoring and managing your API usage and costs.
Use the --budget flag to set cost caps, --dry-run to preview estimated costs before running, and swarm plan to inspect token estimates. The built-in cost tracking is an estimate and may not reflect exact provider billing.
Swarm Engine is not affiliated with Anthropic, OpenAI, or Google. Ensure your use of each provider's API complies with their respective terms of service.
License
MIT
