swarm-engine

v1.54.0

Published a month ago

Self-aware multi-agent orchestration engine with knowledge graph, causal inference, GNN failure prediction, and self-evolving rules — pure TypeScript


simoncoombes

ai agent multi-agent orchestration claude codex gemini swarm coding developer-tools

Swarm Engine

Multi-agent orchestration for AI coding tools. Coordinates Claude, Codex, and Gemini through research, implementation, and review phases. 20-35% token reduction on parallel workflows (measured per-run), cost transparency after every orchestration, and a knowledge graph that sharpens context routing with each run. 1,848 tests. 14-finding security audit. MIT licensed.

26 agents, 7 composable patterns, pure TypeScript, zero external ML dependencies. Works with Claude Code, OpenAI Codex, Google Gemini CLI, and Vercel AI SDK. Mix models across agents in the same orchestration.

What It Looks Like

  ⚡ Swarm Engine — hybrid pattern

  Phase: research ━━━━━━━━━━━━━━━━━━━━ done
    ✓ researcher-code       sonnet-4-6     3.2K tok   14s
    ✓ researcher-context    sonnet-4-6     1.8K tok    9s

  Phase: implement ━━━━━━━━━━━━━━━━━━━ 1m 12s
    ● implementer           opus-4-6       8.4K tok   1m 12s  src/auth/rate-limit.ts

  Phase: review ────────────────────── pending
    ○ reviewer-security     opus-4-6
    ○ reviewer-perf         sonnet-4-6
    ○ reviewer-convention   sonnet-4-6

  Timeline: ━━──── (1/3 phases)

  Recent findings:
    ○ researcher-code: express-rate-limit already in package.json
    ○ researcher-context: vault says rate limiter goes before auth middleware

  12.8K tokens │ $0.24 │ 1m 36s  ~2m remaining

When it finishes, you see exactly what the engine did for you:

┌─ Engine Benefits ──────────────────────────────────────────────────────┐
│  Cost: $2.8400  (avg: $3.6200, saved $0.7800)  |  28% tokens saved   │
├────────────────────────────────────────────────────────────────────────┤
│  TOKEN EFFICIENCY                                                      │
│  ├─ Smart routing         -18,400 tok     filtered phase outputs       │
│  ├─ Verbatim compaction   -12,800 tok     replaced with refs           │
│  ├─ Context decay          -6,200 tok     older phases summarized      │
│  ├─ Prompt diet (balanced) -4,100 tok     trimmed agent prompts        │
│  ├─ Tool search deferred  -14,000 tok     ENABLE_TOOL_SEARCH=true      │
│  └─ TOTAL SAVED           -55,500 tok     28% reduction                │
│                                                                        │
│  CACHE                                                                 │
│  ├─ Cache read tokens     142,800         reused from prior turns      │
│  ├─ Cache creation         28,400         new cache entries            │
│  └─ Cache hit rate         83.4%          no cliff events detected     │
│                                                                        │
│  KNOWLEDGE GRAPH                                                       │
│  ├─ Context routing       8               graph-optimized per agent    │
│  ├─ Confidence gates      2 eval          all passed                   │
│  └─ Pattern history       91% success     14 runs                      │
│                                                                        │
│  ML / GNN                                                              │
│  └─ Predictive dropout    1 agent         -5,000 tok saved             │
│                                                                        │
│  ADAPTIVE                                                              │
│  ├─ Model downgrades      2               cheaper where safe           │
│  └─ Living spec           3 updates       refined during run           │
└────────────────────────────────────────────────────────────────────────┘

The more you use it, the richer this gets -- the knowledge graph, GNN predictions, cache baselines, and historical cost averages build over time.
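The headline numbers in the box are simple aggregates of the rows beneath them. Below is a minimal sketch of that arithmetic. It assumes, as the numbers in the box imply, that TOTAL SAVED sums only the five TOKEN EFFICIENCY rows (predictive dropout is reported separately). It also assumes that "saved" cost is the historical average minus this run's cost, and that cache hit rate is cache reads divided by cache reads plus cache creation. The interface and function names are illustrative, not BenefitsCollector's API. The 28% figure also depends on the unoptimized token baseline, which the box does not show, so it is not reproduced here.

```typescript
// Hypothetical shape of the per-run numbers shown in the Engine Benefits box.
interface RunBenefits {
  tokenSavings: Record<string, number>; // tokens saved per technique
  actualCostUsd: number;
  historicalAvgCostUsd: number;
  cacheReadTokens: number;
  cacheCreationTokens: number;
}

// The TOTAL SAVED line: sum of the per-technique rows.
function totalTokensSaved(b: RunBenefits): number {
  return Object.values(b.tokenSavings).reduce((sum, n) => sum + n, 0);
}

// Dollars saved relative to the historical average, rounded to cents.
function costSavedUsd(b: RunBenefits): number {
  return Math.round((b.historicalAvgCostUsd - b.actualCostUsd) * 100) / 100;
}

// Share of cached input tokens that were reads rather than new cache writes.
function cacheHitRate(b: RunBenefits): number {
  const cached = b.cacheReadTokens + b.cacheCreationTokens;
  return cached === 0 ? 0 : b.cacheReadTokens / cached;
}

const example: RunBenefits = {
  tokenSavings: {
    smartRouting: 18_400,
    verbatimCompaction: 12_800,
    contextDecay: 6_200,
    promptDiet: 4_100,
    toolSearchDeferred: 14_000,
  },
  actualCostUsd: 2.84,
  historicalAvgCostUsd: 3.62,
  cacheReadTokens: 142_800,
  cacheCreationTokens: 28_400,
};

console.log(totalTokensSaved(example));                // 55500
console.log(costSavedUsd(example).toFixed(2));         // 0.78
console.log((cacheHitRate(example) * 100).toFixed(1)); // 83.4
```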

Install

npm (recommended):

npm install -g swarm-engine
swarm install   # set up Claude Code integration (agents, commands, hooks)
swarm doctor    # verify everything works

Homebrew:

brew tap simoncoombes/swarm
brew install swarm-engine
swarm install

npx (try without installing):

npx swarm-engine doctor

From source:

git clone https://github.com/simoncoombes/swarm-engine.git ~/dev/swarm-engine
cd ~/dev/swarm-engine && npm install && npm run build && npm link
swarm install

Requires Node.js 20+, jq, and at least one of Claude Code, Codex, or Gemini CLI.

Quick Start

In Claude Code:

/swarm "add rate limiting to the API"

That's it. Agents spawn as teammates, research the codebase, implement the changes, and review the result. You see their work in split panes and get a summary when they're done.

Other slash commands for specific patterns:

/research "how does the auth system work?"
/tdd "add input validation to user endpoints"
/red-team "harden the payment flow"
/review-cycle "refactor the database layer"

Standalone CLI

You can also run orchestrations directly from any terminal, outside of Claude Code:

swarm orchestrate "add rate limiting"          # inline progress
swarm orchestrate "add rate limiting" --panes  # tmux split panes
swarm orchestrate "add rate limiting" --tui    # full-screen dashboard
swarm plan "add rate limiting"                 # preview plan (free)

The --panes flag uses tmux to show each agent in its own split pane. Install with brew install tmux (macOS) or sudo apt install tmux (Linux).

VS Code and Cursor

Swarm Engine ships with a VS Code extension that works in both VS Code and Cursor.

Install the extension:

cd ~/dev/swarm-engine/vscode-extension
npm install && npm run build
npx @vscode/vsce package --allow-missing-repository

Then: Cmd+Shift+P > "Extensions: Install from VSIX" > select the .vsix file.

What you get:

@swarm in Copilot Chat - type @swarm add auth middleware and it orchestrates the task
Sidebar panel with quick actions, pattern browser, and agent list
Command palette (Cmd+Shift+P > "Swarm") for all commands
Status bar shortcut

Copilot Chat examples:

@swarm add rate limiting to the API
@swarm plan add auth middleware
@swarm template bug-fix
@swarm status

Use as a Library

Swarm Engine can be imported directly into Node.js applications:

npm install swarm-engine

import { SwarmEngine } from 'swarm-engine';

const engine = new SwarmEngine({ mock: true });
const result = await engine.orchestrate({
  task: 'Build a REST API',
  pattern: 'hybrid',
});
console.log(result.status);

Key exports:

import {
  SwarmEngine,       // Main orchestration engine
  AgentRegistry,     // Load and manage agent definitions
  EventBus,          // Typed event system for monitoring
  PatternRegistry,   // Composable orchestration patterns
  BackendRegistry,   // Multi-backend (Claude, Codex, Gemini, Vercel AI)
  CostModel,         // Estimate token costs before running
  ModelRouter,       // UCB1-based model selection
  TemplateRegistry,  // Save and replay successful workflows
} from 'swarm-engine';

// Tier 1: Core Graph
import {
  ExecutionGraph,          // Persistent execution knowledge graph
  GraphLearner,            // Cross-run pattern learning
  GraphContextRouter,      // Relevance-scored context assembly
  GraphAnalyzer,           // Topology analysis and failure prediction
  ReviewFeedbackRecorder,  // Review findings into graph
} from 'swarm-engine';

// Tier 2: Advanced ML
import {
  CausalGraphEngine,              // Do-calculus causal inference
  FailurePropagationPredictor,    // 3-layer GNN failure prediction
  AdversarialEvolver,             // Thompson sampling red-team
  MetaPatternSelector,            // TF-IDF + logistic pattern recommendation
  PredictiveDropout,              // Active learning agent dropout
} from 'swarm-engine';

// Tier 3: Self-Aware Engine
import {
  PatternSynthesizer,     // Topology diff → novel patterns
  TrajectoryPredictor,    // Mid-run success prediction
  MetaAdversarialTester,  // Red-teams the engine's own ML
  RuleEvolver,            // Self-evolving replanning rules
  TaskDiscovery,          // Mines failure patterns for tasks
  OrchestrationEmbedder,  // Topology embeddings for transfer learning
} from 'swarm-engine';

// Token Compression & Benefits
import {
  VerbatimCompactor,       // Replace file reads, diffs, stack traces with refs
  ContextDecayManager,     // Hierarchical time-decay summarization
  ACONOptimizer,           // Failure-driven compression guidelines
  PromptCompressor,        // Strip markdown boilerplate from prompts
  getOutputSchema,         // Structured JSON schemas per agent type
  BenefitsCollector,       // Aggregate optimization metrics
  formatBenefitsTable,     // Render styled benefits summary
  createOutputSummarizerHook,  // PostToolUse hook for Bash output reduction
} from 'swarm-engine';

See src/index.ts for the full export surface.
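ModelRouter is described above as UCB1-based. For readers unfamiliar with the algorithm, here is a minimal textbook UCB1 selector over candidate models. It is a sketch, not the library's implementation. It assumes each run yields a reward in [0, 1] (for example, 1 for a successful orchestration). ArmStats, selectUcb1, and recordReward are hypothetical names.

```typescript
// Running totals for one candidate model (an "arm" in bandit terms).
interface ArmStats {
  pulls: number;       // how many runs used this model
  totalReward: number; // sum of per-run rewards, each in [0, 1]
}

// Textbook UCB1: play each arm once, then choose the arm that maximizes
// mean reward + sqrt(2 * ln(totalPulls) / pulls).
function selectUcb1(arms: Map<string, ArmStats>): string | undefined {
  let totalPulls = 0;
  for (const s of arms.values()) totalPulls += s.pulls;

  let best: string | undefined;
  let bestScore = -Infinity;
  for (const [model, s] of arms) {
    if (s.pulls === 0) return model; // unexplored models are tried first
    const mean = s.totalReward / s.pulls;
    const bonus = Math.sqrt((2 * Math.log(totalPulls)) / s.pulls);
    if (mean + bonus > bestScore) {
      bestScore = mean + bonus;
      best = model;
    }
  }
  return best; // undefined only when no models are registered
}

// Fold one run's outcome back into the chosen model's statistics.
function recordReward(arms: Map<string, ArmStats>, model: string, reward: number): void {
  const s = arms.get(model) ?? { pulls: 0, totalReward: 0 };
  s.pulls += 1;
  s.totalReward += reward;
  arms.set(model, s);
}

const arms = new Map<string, ArmStats>([
  ["sonnet", { pulls: 20, totalReward: 16 }], // mean 0.80, well explored
  ["opus", { pulls: 2, totalReward: 1 }],     // mean 0.50, barely explored
]);
const choice = selectUcb1(arms);
console.log(choice); // opus (the exploration bonus outweighs the lower mean)
if (choice) recordReward(arms, choice, 1);
```

The exploration bonus shrinks as a model accumulates runs, so a rarely tried model gets picked occasionally until its estimated success rate firms up.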

Why Swarm Engine

Tools like Claude Code already let you spawn parallel agents with teams. Swarm Engine adds the orchestration layer: which agents to run, in what order, with what context, on which models, and how to learn from the results.

20-35% token reduction — measured per-run on typical parallel orchestrations. Cross-phase context filtering, verbatim compaction, context decay, prompt diet (3 modes), and tool schema deferral (~14K saved per session). No configuration needed. Cache-aware: tracks hit rates and detects cliff events.
Knowledge graph improves with every run — first run uses heuristics, tenth run uses data. Context routing scores prior outputs by file overlap and recency. Pattern learning tracks success rates per orchestration type. Failure prediction estimates risk from historical topology before agents run. Cost baselines compare each run to your historical average.
Cost transparency after every run — every orchestration ends with a benefits summary: actual cost, historical comparison, tokens saved (with compounding math), cache hit rates, and what each optimization contributed. swarm plan previews estimated cost before execution.
Failure prediction — 3-layer GNN propagation model predicts which agents are likely to fail based on historical topology. Causal inference (do-calculus) explains why. Pure TypeScript, no external ML deps.
7 composable patterns — hybrid, TDD, red-team, spike, discover, review-cycle, research. Compose them: --pattern "tdd | red-team". Plus 12 slash commands including postmortem, diff-review, and fix-pr.
Mix backends per agent — Claude for implementation, Codex for review, Gemini for research. Assign models at the agent level within one orchestration.
26 specialized agents — 16 core roles plus 10 focused reviewers (security, performance, data integrity, API contracts, testing, accessibility, dependencies, error handling, concurrency, documentation).
14-finding security audit — 3 recon agents + 3 adversarial breakers, all findings hardened. Plugin trust model, MCP command allowlists, prompt injection defense, path traversal guards, secrets redaction (16 pattern categories), file permissions hardened to 0o600/0o700.
1,848 tests across 104 files.
Reusable templates let you save and replay successful orchestrations (swarm template run bug-fix).

Templates

Save successful orchestrations as reusable templates:

swarm template list

  add-endpoint     - REST API endpoint with tests
  bug-fix          - Reproduce, diagnose, fix, verify
  security-audit   - OWASP security review
  refactor         - Safe refactoring with verification gates
  migration        - Schema migration with rollback

swarm template run add-endpoint -i

Explain Plan

See exactly what will happen before it runs:

swarm plan "add auth middleware" --pattern hybrid

  Phase 1: research [parallel]
    researcher-code    sonnet   ~40K tokens
    researcher-context sonnet   ~25K tokens
  Phase 2: implement [sequential]
    implementer        opus     ~100K tokens
  Phase 3: review [parallel]
    reviewer-correct   opus     ~60K tokens
    reviewer-security  opus     ~60K tokens

  Est. cost: $3.20 | Est. duration: 95s
  Optimizations: model downgrade for research phase (sonnet vs opus),
                 cross-phase filtering (reviewers skip research output)
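One way to read a plan like this is as per-agent token estimates multiplied by a per-model rate and summed. Wall-clock time would then be the slowest agent in each [parallel] phase and the sum of agents in each [sequential] phase. The sketch below makes those assumptions explicit. The blended rates (6 and 20 USD per million tokens) are placeholders, not real provider prices. The per-agent seconds are made up. So the output is not expected to reproduce the $3.20 / 95s shown above. All type and function names are hypothetical.

```typescript
type Mode = "parallel" | "sequential";

interface PlannedAgent {
  name: string;
  model: string;
  estTokens: number;
  estSeconds: number; // hypothetical per-agent duration estimate
}

interface PlannedPhase {
  name: string;
  mode: Mode;
  agents: PlannedAgent[];
}

// Placeholder blended USD rates per million tokens; NOT real prices.
const RATE_PER_MTOK: Record<string, number> = { sonnet: 6, opus: 20 };

// Cost is additive across every agent, regardless of phase mode.
function estimateCost(phases: PlannedPhase[]): number {
  let usd = 0;
  for (const phase of phases) {
    for (const a of phase.agents) {
      const rate = RATE_PER_MTOK[a.model];
      if (rate === undefined) throw new Error(`no rate for model ${a.model}`);
      usd += (a.estTokens / 1_000_000) * rate;
    }
  }
  return usd;
}

// Parallel phases take as long as their slowest agent; sequential phases
// take the sum. Phases themselves run one after another.
function estimateSeconds(phases: PlannedPhase[]): number {
  return phases.reduce((total, phase) => {
    const times = phase.agents.map((a) => a.estSeconds);
    const phaseTime =
      phase.mode === "parallel" ? Math.max(0, ...times) : times.reduce((s, t) => s + t, 0);
    return total + phaseTime;
  }, 0);
}

const plan: PlannedPhase[] = [
  { name: "research", mode: "parallel", agents: [
    { name: "researcher-code", model: "sonnet", estTokens: 40_000, estSeconds: 14 },
    { name: "researcher-context", model: "sonnet", estTokens: 25_000, estSeconds: 9 },
  ] },
  { name: "implement", mode: "sequential", agents: [
    { name: "implementer", model: "opus", estTokens: 100_000, estSeconds: 60 },
  ] },
  { name: "review", mode: "parallel", agents: [
    { name: "reviewer-correct", model: "opus", estTokens: 60_000, estSeconds: 10 },
    { name: "reviewer-security", model: "opus", estTokens: 60_000, estSeconds: 12 },
  ] },
];

console.log(estimateCost(plan).toFixed(2)); // 4.79 with the placeholder rates
console.log(estimateSeconds(plan));         // 86
```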

Commands

Daily use

| Command | What it does |
|---------|-------------|
| /swarm <task> | Full orchestration - research, implement, review |
| /review-cycle <task> | Implement with iterative quality gate |
| /diff-review [base] | Review branch diff before PR |
| /research <question> | Parallel research across multiple angles |
| /tdd <feature> | Test-driven: write tests first, then implement |

Advanced

| Command | What it does |
|---------|-------------|
| /spike <problem> | Two approaches compete, judge picks winner |
| /red-team <task> | Adversarial build and break |
| /discover <problem> | Hypothesize, experiment, implement winner |
| /dynamic <task> | Planner decomposes into custom agent workflow |
| /postmortem <error> | Root cause analysis, fix, and prevention |
| /fix-pr <PR#> | Fix PR review comments |
| /resume | Resume from checkpoint |

Patterns

7 composable patterns. Use them individually or combine them.

| Pattern | Flow | Phases |
|---------|------|--------|
| hybrid | Research, Implement, Review | 3 |
| research | Parallel fan-out research | 1 |
| review-cycle | Implement, Challenge, Review (iterative) | 4 |
| tdd | Test-first, Implement, Verify, Review | 5 |
| spike | Two approaches compete, judge decides | 4 |
| red-team | Build, Break, Harden | 4 |
| discover | Hypothesize, Experiment, Implement winner | 5 |

Compose patterns: swarm orchestrate "task" --pattern "tdd | red-team"
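A composition string like "tdd | red-team" can be read as an ordered list of pattern names. The sketch below parses and validates one. It assumes `|` means "run left to right" and that only the seven built-in names are valid (custom or plugin patterns would need a registry lookup instead). parseComposition is a hypothetical helper, not the CLI's parser.

```typescript
// The seven built-in patterns listed in the table above.
const BUILTIN_PATTERNS = [
  "hybrid", "research", "review-cycle", "tdd", "spike", "red-team", "discover",
] as const;

type PatternName = (typeof BUILTIN_PATTERNS)[number];

function isPatternName(s: string): s is PatternName {
  return (BUILTIN_PATTERNS as readonly string[]).includes(s);
}

// Split "tdd | red-team" into ["tdd", "red-team"], rejecting empty or unknown stages.
function parseComposition(spec: string): PatternName[] {
  const stages = spec.split("|").map((s) => s.trim().toLowerCase());
  if (stages.some((s) => s.length === 0)) {
    throw new Error(`empty stage in pattern spec: "${spec}"`);
  }
  return stages.map((s) => {
    if (!isPatternName(s)) throw new Error(`unknown pattern: ${s}`);
    return s;
  });
}

console.log(parseComposition("tdd | red-team")); // [ 'tdd', 'red-team' ]
```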

Agents

16 core agents plus 10 specialized reviewers.

| Agent | Role |
|-------|------|
| researcher | Explores code, finds patterns, traces dependencies |
| implementer | Writes clean, tested code following conventions |
| reviewer | Finds bugs, security issues, convention violations |
| tester | Writes and runs tests, reports coverage |
| debugger | Reproduces and fixes bugs systematically |
| planner | Designs architecture before code is written |
| refactorer | Safe incremental refactoring with test gates |
| integrator | Verifies cross-module contracts after parallel work |
| devils-advocate | Challenges every assumption before review |
| grounding | Verifies the implementation solves the actual problem |
| orchestrator | Coordinates everything |
| judge | Evaluates competing implementations and picks the winner |
| sentinel | Background: watches git activity |
| guardian | Background: runs affected tests |
| librarian | Background: maintains knowledge quality |
| documenter | Writes technical documentation |

Specialized reviewers: security, performance, data integrity, API contracts, testing, accessibility, dependencies, error handling, concurrency, documentation.

Cross-Tool Support

Convert your agents to work natively in other tools:

swarm convert --to copilot    # GitHub Copilot .agent.md format
swarm convert --to cursor     # Cursor .mdc rules
swarm convert --to codex      # OpenAI Codex prompts
swarm convert --to gemini     # Gemini CLI skills
swarm convert --to opencode   # OpenCode agents
swarm convert --to windsurf   # Windsurf skills

Token Efficiency

Saves 20-35% of tokens on typical parallel orchestrations (measured and shown after every run). Multi-agent context grows linearly without optimization; Swarm Engine applies these compression techniques automatically -- no configuration needed.

| Technique | What it does | Typical savings |
|-----------|-------------|-----------------|
| Cross-phase filtering | Implement phases only get research/plan outputs. Review phases only get implement/test outputs. File-scope filtering removes irrelevant files. | 10-30% per phase |
| Verbatim compaction | Replaces file reads, git diffs, test output, and stack traces with compact references. Runs on shared-context files, not just agent output. | 50-70% on tool-heavy outputs |
| Context decay | Recent phases: full fidelity. Older phases: heuristic summary. Oldest: one-line reference. | 5-20x on deep pipelines |
| Prompt compression | Strips YAML frontmatter, markdown headers, duplicate prefixes, and excessive indentation from agent prompts. | 10-25% per prompt |
| Output schemas | Structured JSON contracts per agent type force compact output instead of verbose prose. | 3-5x on agent outputs |
| Tool search deferral | ENABLE_TOOL_SEARCH=true is set automatically on the Claude backend. Defers tool schema loading until needed. | ~14K tokens per agent session |
| Output summarizer | PostToolUse hook injects a compact digest after verbose Bash output (>50 lines). Classifies output type, extracts errors and test results. | Reduces model reasoning load on verbose commands |
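Two rows of the table, cross-phase filtering and context decay, describe selection rules simple enough to sketch. The code below assumes phase kinds named after the text (research, plan, implement, test, review). It reads "recent", "older", and "oldest" as the last two phases, the two before those, and everything earlier. Those cutoffs and all names are hypothetical, not the engine's settings. The code only decides fidelity; it does not produce the heuristic summaries themselves.

```typescript
type PhaseKind = "research" | "plan" | "implement" | "test" | "review";

interface PhaseOutput {
  kind: PhaseKind;
  text: string;
}

// Cross-phase filtering: which earlier outputs a phase kind may see.
// Only the implement and review rules come from the description; other kinds see everything here.
const VISIBLE_INPUTS: Partial<Record<PhaseKind, PhaseKind[]>> = {
  implement: ["research", "plan"],
  review: ["implement", "test"],
};

function filterForPhase(target: PhaseKind, prior: PhaseOutput[]): PhaseOutput[] {
  const allowed = VISIBLE_INPUTS[target];
  return allowed ? prior.filter((p) => allowed.includes(p.kind)) : prior;
}

type Fidelity = "full" | "summary" | "reference";

// Context decay: newest phases keep full text, older ones are summarized,
// the oldest collapse to a one-line reference. Cutoffs are hypothetical.
function decayFidelity(ageFromNewest: number, fullCount = 2, summaryCount = 2): Fidelity {
  if (ageFromNewest < fullCount) return "full";
  if (ageFromNewest < fullCount + summaryCount) return "summary";
  return "reference";
}

// Assign a fidelity to each output in an oldest-first history.
function applyDecay(outputs: PhaseOutput[]): { kind: PhaseKind; fidelity: Fidelity }[] {
  return outputs.map((o, i) => ({ kind: o.kind, fidelity: decayFidelity(outputs.length - 1 - i) }));
}

const history: PhaseOutput[] = [
  { kind: "research", text: "research notes" },
  { kind: "plan", text: "plan" },
  { kind: "implement", text: "diff summary" },
  { kind: "test", text: "test results" },
  { kind: "review", text: "review findings" },
];

console.log(filterForPhase("review", history).map((p) => p.kind)); // [ 'implement', 'test' ]
console.log(applyDecay(history).map((d) => d.fidelity));
// [ 'reference', 'summary', 'summary', 'full', 'full' ]
```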

ACON (Agent Context Optimization) goes further: it records trajectory pairs (full context vs compressed), analyzes failures where compression caused worse outcomes, and iteratively refines compression guidelines. Ships with 8 built-in guidelines covering error messages, file paths, decisions, API contracts, and boilerplate. Gradient-free -- works with any model.

Cache tracking. The runtime captures cacheReadInputTokens and cacheCreationInputTokens end-to-end from the SDK through SQLite. Per-agent-type rolling baselines detect cache cliff events (>50% hit rate drop) and cold-wake gaps (>5min idle). These feed into the benefits table and cost model.
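The two detectors can be sketched as follows. The sketch assumes a "cliff" means the current hit rate falling more than 50% below the rolling mean for that agent type (a relative drop). It treats a cold wake as more than five minutes since that agent type's previous call, and computes hit rate as cache reads over cache reads plus cache creation. The window length and the CacheMonitor class are hypothetical.

```typescript
interface TurnUsage {
  agentType: string;
  cacheReadInputTokens: number;
  cacheCreationInputTokens: number;
  timestampMs: number;
}

const CLIFF_DROP = 0.5;             // >50% relative drop vs baseline
const COLD_WAKE_MS = 5 * 60 * 1000; // >5 minutes idle
const WINDOW = 10;                  // hypothetical rolling-window length

function hitRate(u: TurnUsage): number {
  const cached = u.cacheReadInputTokens + u.cacheCreationInputTokens;
  return cached === 0 ? 0 : u.cacheReadInputTokens / cached;
}

class CacheMonitor {
  private history = new Map<string, number[]>(); // recent hit rates per agent type
  private lastSeen = new Map<string, number>();   // last timestamp per agent type

  observe(u: TurnUsage): { cliff: boolean; coldWake: boolean } {
    const rate = hitRate(u);
    const rates = this.history.get(u.agentType) ?? [];

    // Compare against the rolling baseline before adding the new sample.
    const baseline = rates.length ? rates.reduce((s, r) => s + r, 0) / rates.length : undefined;
    const cliff = baseline !== undefined && baseline > 0 && rate < baseline * (1 - CLIFF_DROP);

    const prev = this.lastSeen.get(u.agentType);
    const coldWake = prev !== undefined && u.timestampMs - prev > COLD_WAKE_MS;

    rates.push(rate);
    if (rates.length > WINDOW) rates.shift();
    this.history.set(u.agentType, rates);
    this.lastSeen.set(u.agentType, u.timestampMs);
    return { cliff, coldWake };
  }
}

const mon = new CacheMonitor();
// First turn: 80% hit rate, no baseline yet.
const first = mon.observe({ agentType: "reviewer", cacheReadInputTokens: 8000, cacheCreationInputTokens: 2000, timestampMs: 0 });
// Six minutes later the hit rate collapses to 10%.
const second = mon.observe({ agentType: "reviewer", cacheReadInputTokens: 1000, cacheCreationInputTokens: 9000, timestampMs: 6 * 60 * 1000 });
console.log(second); // { cliff: true, coldWake: true }
```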

The benefits summary at the end of each orchestration shows exactly what was saved.

Prompt Diet

Control how aggressively the runtime trims agent system prompts (Before-You-Act / Self-Check / Debt / Meta sections) per turn.

# OrchestrationConfig
promptDiet:
  default: balanced            # aggressive | balanced | conservative
  overrides:
    orchestrator: full         # keep full prompt for this agent type
    my-custom-agent: lite

CLI: swarm run --prompt-diet aggressive or swarm orchestrate --prompt-diet aggressive. Env: SWARM_PROMPT_DIET=aggressive swarm ....

Modes:

aggressive: trims all agents (including orchestrator). Biggest savings, highest risk of behavior drift.
balanced (default): trims reviewers, implementers, researchers, testers, debuggers, planners, integrators, and other workers. Keeps orchestrator, grounding, devils-advocate, judge, and refactorer at full.
conservative: no automatic trimming. Only explicit promptTier on individual agents applies (legacy behavior).

Per-agent precedence (strongest first, applied after dietConfig is resolved):

Explicit config.promptTier on the AgentConfig — always wins
dietConfig.overrides[key] where the agent type contains key (case-insensitive substring; longest matching key wins, so security-implementer beats implementer)
mode === 'conservative' → no automatic tiering
maxTurns ≤ 3 → minimal (in balanced and aggressive modes)
mode === 'aggressive' → lite for everyone (including the protected roles)
mode === 'balanced' (default) → protected roles (orchestrator, grounding, devils-advocate, judge, refactorer) stay full; lite-eligible roles → lite; unknown roles → full
Default → full
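The precedence list above maps directly onto a resolution function. This is a minimal sketch under several assumptions. Tiers are named full, lite, and minimal, and "no automatic tiering" in conservative mode means the full prompt. Protected roles are matched by case-insensitive substring, like overrides. The lite-eligible set is only the roles named under balanced; the real set also includes unenumerated "other workers". resolvePromptTier and the type shapes are hypothetical.

```typescript
type PromptTier = "full" | "lite" | "minimal";
type DietMode = "aggressive" | "balanced" | "conservative";

interface DietConfig {
  default: DietMode;
  overrides?: Record<string, PromptTier>;
}

interface AgentConfig {
  type: string;
  maxTurns?: number;
  promptTier?: PromptTier;
}

const PROTECTED = ["orchestrator", "grounding", "devils-advocate", "judge", "refactorer"];
// Roles named in the balanced description; "other workers" are not enumerated.
const LITE_ELIGIBLE = ["reviewer", "implementer", "researcher", "tester", "debugger", "planner", "integrator"];

const matches = (agentType: string, key: string): boolean =>
  agentType.toLowerCase().includes(key.toLowerCase());

function resolvePromptTier(agent: AgentConfig, diet: DietConfig): PromptTier {
  // 1. Explicit promptTier always wins.
  if (agent.promptTier) return agent.promptTier;

  // 2. Overrides: case-insensitive substring match, longest matching key wins.
  const overrides = diet.overrides ?? {};
  const keys = Object.keys(overrides)
    .filter((k) => matches(agent.type, k))
    .sort((a, b) => b.length - a.length);
  if (keys.length > 0) return overrides[keys[0]];

  const mode = diet.default;
  // 3. Conservative: no automatic tiering.
  if (mode === "conservative") return "full";
  // 4. Short-lived agents get the minimal prompt (balanced and aggressive).
  if (agent.maxTurns !== undefined && agent.maxTurns <= 3) return "minimal";
  // 5. Aggressive: lite for everyone, protected roles included.
  if (mode === "aggressive") return "lite";
  // 6. Balanced: protected roles stay full, known workers go lite, unknown roles stay full.
  if (PROTECTED.some((r) => matches(agent.type, r))) return "full";
  if (LITE_ELIGIBLE.some((r) => matches(agent.type, r))) return "lite";
  // 7. Default.
  return "full";
}

const diet: DietConfig = {
  default: "balanced",
  overrides: { implementer: "full", "security-implementer": "minimal" },
};
console.log(resolvePromptTier({ type: "security-implementer" }, diet)); // minimal
console.log(resolvePromptTier({ type: "researcher-code" }, diet));      // lite
```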

Config-source precedence (which dietConfig the runtime sees): the CLI flag (--prompt-diet) populates orchestrationConfig.promptDiet directly, and the runtime evaluates orchestrationConfig.promptDiet ?? resolveDietFromEnv(). In practice, the CLI flag or orchestration config wins, then the SWARM_PROMPT_DIET environment variable, then the balanced default.
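The same fallback chain as a sketch. It assumes SWARM_PROMPT_DIET holds one of the three mode names and that unrecognized values are ignored. resolveDietFromEnv is the name used in the text, but the body here is a guess: this version returns undefined when the variable is unset and applies the balanced default separately, whereas the real function may fold the default in. resolveDietSource is hypothetical.

```typescript
type DietMode = "aggressive" | "balanced" | "conservative";

interface DietConfig {
  default: DietMode;
  overrides?: Record<string, "full" | "lite" | "minimal">;
}

const MODES: readonly string[] = ["aggressive", "balanced", "conservative"];

// Guessed body: read SWARM_PROMPT_DIET, ignoring unknown values.
function resolveDietFromEnv(env: Record<string, string | undefined>): DietConfig | undefined {
  const raw = env.SWARM_PROMPT_DIET?.trim().toLowerCase();
  return raw && MODES.includes(raw) ? { default: raw as DietMode } : undefined;
}

// CLI flag / orchestration config, then environment, then the balanced default.
function resolveDietSource(
  orchestrationDiet: DietConfig | undefined,
  env: Record<string, string | undefined>,
): DietConfig {
  return orchestrationDiet ?? resolveDietFromEnv(env) ?? { default: "balanced" };
}

console.log(resolveDietSource(undefined, { SWARM_PROMPT_DIET: "aggressive" }).default);                // aggressive
console.log(resolveDietSource({ default: "conservative" }, { SWARM_PROMPT_DIET: "aggressive" }).default); // conservative
console.log(resolveDietSource(undefined, {}).default);                                                 // balanced
```

In a Node process you would pass process.env as the second argument.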

Security

v1.50 addressed 14 findings from an internal red-team audit. Key hardening:

Plugin trust model — repo-local plugins no longer auto-load. Explicit opt-in required via config. npm-installed plugins load normally.
MCP/LSP command allowlist — only allowlisted binaries can be launched. Blocks arbitrary command execution through tool server configs.
Prompt injection defenses — cross-phase context uses boundary markers, agent definitions are positioned before user context, and inline redaction strips injection attempts.
Path traversal guard — swarm agents install validates paths to prevent writing outside the agents directory.
Secrets redaction — Anthropic API keys, Slack tokens, npm tokens, and Vercel tokens are pattern-matched and stripped from logs, JSON reports, and vault files.
Backend opt-in — Codex and Gemini backends require explicit --unsafe-backend flag since they execute in less-sandboxed environments.
Config validation — YAML config files are validated on load; dangerous fields (shell, exec, command) are stripped.
Error sanitization — error messages are cleaned before being injected into retry prompts to prevent reflection-based injection.
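Two of these defenses, the path traversal guard and secrets redaction, are standard enough to sketch. The path check resolves the requested file against the agents directory and rejects anything that lands outside it. The redaction patterns rest on assumptions about token formats: Anthropic keys beginning `sk-ant-`, Slack tokens beginning `xox`, and npm tokens beginning `npm_`. They are not the engine's 16 pattern categories. Vercel tokens are omitted because their format is not described here. safeJoin and redactSecrets are hypothetical names.

```typescript
import * as path from "node:path";

// Resolve `relative` inside `baseDir`; throw if the result escapes baseDir.
function safeJoin(baseDir: string, relative: string): string {
  const base = path.resolve(baseDir);
  const target = path.resolve(base, relative);
  const rel = path.relative(base, target);
  // "" means the base dir itself; ".." prefixes or an absolute result mean escape.
  if (rel === "" || rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel)) {
    throw new Error(`refusing to write outside ${base}: ${relative}`);
  }
  return target;
}

// Illustrative patterns only; real token formats may differ.
const SECRET_PATTERNS: Array<[string, RegExp]> = [
  ["anthropic", /sk-ant-[A-Za-z0-9_-]{20,}/g],
  ["slack", /xox[abposr]-[A-Za-z0-9-]{10,}/g],
  ["npm", /npm_[A-Za-z0-9]{36}/g],
];

// Replace every match with a labeled placeholder before logging or persisting.
function redactSecrets(text: string): string {
  let out = text;
  for (const [label, pattern] of SECRET_PATTERNS) {
    out = out.replace(pattern, `[REDACTED:${label}]`);
  }
  return out;
}

console.log(redactSecrets("token npm_" + "x".repeat(36))); // token [REDACTED:npm]
```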

Memory and Knowledge Graph Intelligence

SQLite-backed knowledge base with full-text search, plus a 3-tier Execution Knowledge Graph that records every orchestration as a persistent, queryable topology. Syncs to Obsidian vault for cross-machine access.

Tier 1 -- Core Graph. ExecutionGraph records orchestration topology. GraphLearner extracts cross-run patterns. GraphContextRouter replaces dump-everything context with relevance-scored assembly. GraphAnalyzer detects god nodes, bottlenecks, and topology risks. Review findings feed back into the graph to refine future context routing.
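GraphContextRouter is described as scoring prior outputs by relevance, specifically file overlap and recency. The sketch below shows one plausible way to combine the two. It assumes Jaccard similarity for overlap and exponential decay with a hypothetical half-life for recency, multiplied together and cut to the top k. It is not the router's actual formula, and all names and constants are illustrative. Multiplying means an output with no file overlap scores zero however recent it is. A weighted sum would keep such outputs in play.

```typescript
interface PriorOutput {
  id: string;
  files: string[]; // files the output touched or discussed
  age: number;     // phases since it was produced (0 = newest)
}

// Jaccard similarity between two file sets: |A ∩ B| / |A ∪ B|.
function jaccard(a: string[], b: string[]): number {
  const sa = new Set(a);
  const sb = new Set(b);
  let inter = 0;
  for (const f of sa) if (sb.has(f)) inter++;
  const union = sa.size + sb.size - inter;
  return union === 0 ? 0 : inter / union;
}

// Score = file overlap weighted by exponential recency decay.
function relevance(out: PriorOutput, taskFiles: string[], halfLife = 2): number {
  const recency = Math.pow(0.5, out.age / halfLife);
  return jaccard(out.files, taskFiles) * recency;
}

// Keep the top-k outputs with a non-zero score instead of passing everything.
function routeContext(outputs: PriorOutput[], taskFiles: string[], k = 3): string[] {
  return outputs
    .map((o) => ({ id: o.id, score: relevance(o, taskFiles) }))
    .filter((s) => s.score > 0)
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((s) => s.id);
}

const outputs: PriorOutput[] = [
  { id: "research-auth", files: ["src/auth/index.ts", "src/auth/rate-limit.ts"], age: 1 },
  { id: "research-db", files: ["src/db/pool.ts"], age: 0 },
  { id: "old-auth-notes", files: ["src/auth/rate-limit.ts"], age: 6 },
];
console.log(routeContext(outputs, ["src/auth/rate-limit.ts"])); // [ 'research-auth', 'old-auth-notes' ]
```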

Tier 2 -- Advanced ML. CausalGraphEngine applies do-calculus to estimate treatment effects and suggest interventions. FailurePropagationPredictor uses a 3-layer GNN to predict which nodes are at risk before execution starts. MetaPatternSelector recommends orchestration patterns via TF-IDF + logistic regression. PredictiveDropout uses active learning to skip redundant agents (saving ~5K tokens per dropped agent).
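FailurePropagationPredictor is described as a 3-layer GNN. The sketch below shows only the message-passing shape of such a model over an agent dependency graph. Each layer mixes a node's own risk with the mean risk of its upstream nodes and squashes the result with a sigmoid. The weights are hand-picked placeholders rather than learned parameters, and the node names are hypothetical. The outputs therefore only illustrate how upstream risk flows downstream.

```typescript
// Each agent maps to the agents whose output it consumes (its upstream nodes).
type Graph = Map<string, string[]>;

const sigmoid = (x: number): number => 1 / (1 + Math.exp(-x));

// One message-passing layer:
// h'(v) = sigmoid(wSelf * h(v) + wNbr * mean of h(u) over upstream u + bias)
function propagate(
  graph: Graph,
  h: Map<string, number>,
  wSelf: number,
  wNbr: number,
  bias: number,
): Map<string, number> {
  const next = new Map<string, number>();
  for (const [node, upstream] of graph) {
    const self = h.get(node) ?? 0;
    const msg = upstream.length
      ? upstream.reduce((s, u) => s + (h.get(u) ?? 0), 0) / upstream.length
      : 0;
    next.set(node, sigmoid(wSelf * self + wNbr * msg + bias));
  }
  return next;
}

// Three stacked layers with placeholder weights [wSelf, wNbr, bias]; NOT learned.
const LAYERS: Array<[number, number, number]> = [
  [2, 2, -2],
  [2, 2, -2],
  [2, 2, -2],
];

function predictRisk(graph: Graph, priorRisk: Map<string, number>): Map<string, number> {
  let h = priorRisk;
  for (const [wSelf, wNbr, bias] of LAYERS) h = propagate(graph, h, wSelf, wNbr, bias);
  return h;
}

const graph: Graph = new Map<string, string[]>([
  ["researcher", []],
  ["implementer", ["researcher"]],
  ["reviewer", ["implementer"]],
]);

const risky = predictRisk(graph, new Map([["researcher", 0.9], ["implementer", 0.1], ["reviewer", 0.1]]));
const calm = predictRisk(graph, new Map([["researcher", 0.1], ["implementer", 0.1], ["reviewer", 0.1]]));
console.log(risky.get("reviewer")! > calm.get("reviewer")!); // true
```

Each layer lets risk travel one more hop, so with three layers a risky researcher raises the predicted risk of the reviewer two hops downstream even though the reviewer's own prior is unchanged.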

Tier 3 -- Self-Aware Engine. PatternSynthesizer generates novel orchestration patterns from topology diffs. TrajectoryPredictor forecasts orchestration success mid-run. RuleEvolver proposes and backtests its own replanning rules from historical failures. TaskDiscovery mines the graph for actionable tasks. OrchestrationEmbedder produces topology-based embeddings for similarity search and transfer learning. MetaAdversarialTester red-teams the engine's own ML subsystems.

All ML is implemented in pure TypeScript with zero external ML dependencies. Every optimization is tracked by the BenefitsCollector and surfaced in the post-orchestration summary -- the engine shows its work.

Knowledge Graph vs Claude Memory

Claude Code memory gives every session the same static context. Swarm's execution graph learns from every run:

Context routing — agents get prior-phase outputs scored by relevance (file overlap, recency), not a dump of everything
Pattern learning — tracks success rates per orchestration pattern and recommends what works for your codebase
Failure prediction — estimates which agents are likely to fail based on historical topology, before they run
Cost baselines — each run is compared to historical average so you see if this orchestration was cheap or expensive relative to your norm

First run uses heuristics. Tenth run uses data.

swarm memory search "authentication"
swarm compound stats

Requirements

Node.js 20+
jq (brew install jq)
At least one AI backend:
- Claude Code (recommended)
- Codex CLI (npm i -g @openai/codex)
- Gemini CLI (npm i -g @google/gemini-cli)

Contributing

See CONTRIBUTING.md. Share agents, templates, patterns, and plugins.

Cost and Usage Disclaimer

Swarm Engine orchestrates AI coding tools that consume API tokens from third-party providers (Anthropic, OpenAI, Google, etc.). Each orchestration incurs costs billed directly to your accounts with those providers. You are solely responsible for monitoring and managing your API usage and costs.

Use the --budget flag to set cost caps, --dry-run to preview estimated costs before running, and swarm plan to inspect token estimates. The built-in cost tracking is an estimate and may not reflect exact provider billing.

Swarm Engine is not affiliated with Anthropic, OpenAI, or Google. Ensure your use of each provider's API complies with their respective terms of service.

License

MIT