@lilsaints/foreman

v0.1.15

Published

a day ago

Local control plane for fleets of Claude Code agents: plan DAGs, spawn governed workers, review and merge — from one board.

Downloads

2,347

0High
0Medium
0Low

lilsaints

claude claude-code agents orchestration mcp

Foreman

A local control plane for fleets of Claude Code agents.

Foreman looks like a miniature GitHub — Projects → Milestones → Issues — but the issues are executed by headless claude CLI instances, and the project plan is a dependency graph plus a project charter that together are the living spec of the finished product. The graph and charter are exposed over MCP, so an external "architect" agent (Claude Desktop, another Claude Code session, anything MCP-capable) can create and reshape the plan — including while a run is in progress — and the configuration around it: agent profiles (upsert_profile), project settings + verify commands (update_settings), and pause/resume (set_paused). Changes that would loosen a safety or money rail park on a human approval card first. Internally, a reconciler loop watches state and spawns worker runs (one per ready issue) or orchestrator runs (review/replan after a worker finishes, or when a milestone completes).

Foreman is a control plane, not an agent runtime. It never implements its own LLM loop. All intelligence is delegated to claude CLI processes; Foreman owns durable state, git, scheduling, verification gates, enforcement (hooks + permission brokering), institutional memory, and the UI. V1 is strictly sequential: at most one agent process alive at a time.

Architecture

                                  ┌─────────────────────────────┐
   architect agent               │   foreman serve (one proc)   │           target repo
   (Claude Desktop /            │                              │
    claude code session)         │  ┌────────┐   ┌──────────┐  │   spawn   ┌──────────────┐
        │        MCP (HTTP/stdio)│  │  MCP   │   │ scheduler│──┼──────────▶│ claude -p     │
        ├────────────────────────┼─▶│ server │   │ (tick)   │  │  stdin:   │  worker run   │
        │  get_graph             │  └───┬────┘   └────┬─────┘  │  prompt   │  on issue/N   │
        │  apply_plan_patch      │      │             │        │           └──────┬───────┘
        │  get_changes_since     │      ▼             ▼        │                  │
        │                        │  ┌─────────────────────┐    │   stream-json    │
   you (browser UI)              │  │   service layer +   │◀───┼──────────────────┘
        │        REST/WS         │  │ SQLite + event log  │    │   hooks: guard.js (PreToolUse)
        └────────────────────────┼─▶│  (source of truth)  │    │          stopcheck.js (Stop)
                                  │  └─────────┬───────────┘    │   MCP:   permission_decision
                                  │            │ verify gate,   │          (per-run token)
                                  │            ▼ squash merge   │
                                  │        git (Foreman-owned)  │
                                  └─────────────────────────────┘

One MCP surface, three clients: the external architect (project token), the internal orchestrator runs (scoped per-run tokens), and the UI (same service functions over HTTP). No privileged side-channels for mutating the graph.
Single source of truth = SQLite + an append-only event log. Every mutation goes through one service layer, emits an event, and bumps graph_version (optimistic concurrency for apply_plan_patch). The system survives kill -9 and resumes (WIP commits + crash recovery).
Deterministic gates: an issue is done only after Foreman itself ran the configured verify commands (project-level + per-node verify_extra, with a flake-guard re-run) and squash-merged the issue branch. Agent self-reports are advisory.
Hard rules are hooks, not prompts: no pushes / branch switching / merges, write confinement to the repo (minus .git/ and .foreman/), and the WorkerReport final-message contract are enforced by PreToolUse/Stop hooks delivered via a per-run --settings file — never by editing the target repo's .claude/.
Permission brokering: workers run acceptEdits with a seeded allowlist; anything else routes through --permission-prompt-tool → Foreman policy → an approval card in the UI (Allow / Deny / Allow + add to allowlist), with a configurable timeout action.

Quickstart (Claude Code plugin)

From any Claude Code session — no clone, no build:

/plugin marketplace add LilSaints/Foreman
/plugin install foreman@foreman

To move to a newer release, refresh the cached marketplace clone first — Claude Code does not re-pull it on restart, so the "update" button stays disabled until you do:

/plugin marketplace update foreman
/plugin update foreman@foreman

If the desktop Update button fails with "Failed to update the plugin", check claude plugin list for the install's scope: the button only updates user-scope installs, so a plugin you installed inside a project (local scope) won't update from it — either claude plugin install foreman@foreman --scope user, or update in-project with claude plugin update foreman@foreman --scope local. On Windows, also stop any running foreman up server (and close sessions where the plugin's MCP is active) first — an install directory still in use blocks the swap. Restart foreman up afterward; it relaunches from the new version.

Then, inside the repo you want to run fleets on:

/foreman:init        # conversational registration: proposes foreman.json from the
                     # repo (verify commands, allowlist, areas), commits only with
                     # your OK, brings the server + board UI up
/foreman:plan        # the architect: charter, milestones, a dependency DAG with
                     # per-node spec/acceptance, right-sized agent profiles

The plugin bundles the architect MCP surface (stdio, runs the pinned @lilsaints/foreman via npx — Node.js >= 20 required). Projects register paused: plan first, then resume and watch workers execute ready issues in priority/topo order while the orchestrator reviews, triages discoveries, curates lessons, and advances milestones; you get banners + approval toasts when a human is needed.

The CLI is also installable directly — npm i -g @lilsaints/foreman, then foreman up from inside any git repo: registers it as a project if needed, reuses (or starts) the server, and opens that project's board (http://127.0.0.1:4640/#/p/<project>/board). Run it in a second repo and it joins the same running server — up health-probes the port (GET /api/health: service signature + build version) and refuses to reuse anything stale or foreign instead of clashing. The pieces are still available separately — init [repo-path], serve (--fake drives the UI without burning tokens), and mcp print-config (architect MCP config for claude mcp add); init and mcp default their repo to CLAUDE_PROJECT_DIR, else cwd.

Project identity is repo-anchored: a committed foreman.json carries bootstrap config (name, verify commands, allowlist seeds, areas, profile defaults — seeds at first contact; the DB owns config after that) and .foreman/project-id binds the clone to its project, so a moved repo rebinds on the next up and a fresh clone gets an explicit --adopt <id> | --create-new choice instead of a silent duplicate. Init never commits to your branch silently — it shows what it wrote and asks (--yes / --no-commit to skip).

# the full golden-path demo (§16): scripted architect + fake workers + every rail
foreman demo                                      # free + deterministic (~3 min);
                                                  # --real = actual claude workers (1-2h, real usage)

Contributing / building from source

pnpm install && pnpm -r build && pnpm test        # workspace dev loop
node apps/foreman/bin/foreman.js up               # run the CLI from the workspace
node scripts/probe-hooks.mjs                      # hook-enforcement probe (real claude, 2 short runs)

Releases: bump versions with node scripts/version-lockstep.mjs --set X.Y.Z, tag vX.Y.Z, push — CI verifies lockstep, builds, tests, and publishes @lilsaints/foreman (the plugin pin rides the same tag). Then confirm it landed: npm view @lilsaints/foreman version dist-tags shows the new version as latest. To validate the pipeline without publishing, run the release workflow via workflow_dispatch — publish is guarded to v* tags.

The model

| Concept | What it is | |---|---| | Charter | Cross-cutting invariants (architecture rules, module boundaries, conventions) injected into every worker and orchestrator prompt. | | Issue (node) | Spec + acceptance criteria + priority + declared paths + optional per-node verify_extra. Executed on branch issue/<ref>-<slug>, squash-merged as #<ref>: <title>. | | Blocks edge | Dependency (cycle-checked DAG). Issues schedule only when all blockers are done and their milestone is active. | | Area | Subsystem taxonomy: UI color, default lesson scope, optional default agent profile, and the V2 disjointness predicate. | | Milestone | Ordered phase with an auto or human gate and an optional milestone check command that must pass before the orchestrator replans and activates the next one. | | Lessons | Curated institutional memory. Workers propose candidates in reports; the orchestrator curates (dedupe/generalize/scope/retire); Foreman injects matching lessons into worker prompts and mirrors them into CLAUDE.md via a managed block. Complements Claude Code's native auto memory (left enabled). | | Agent profile | Per-role model/effort/permission settings. New projects seed both roles with the strongest catalog model (claude-opus-4-8 today, after Fable 5's withdrawal) at effort max. The catalog = pinned lineup merged with strictly-newer ids discovered in the installed CLI at serve startup (model dropdown in Settings; GET /api/models). |

Issue lifecycle

draft → ready → queued → running → verifying → needs_review → done (= squash merge)
                           │           │             ├→ changes_requested → queued
                           ├→ failed ←─┘             └→ blocked
                           └→ cancelled

Retries fork the failed session (--resume --fork-session) with the failing verify tail and orchestrator notes appended; maxAttemptsPerNode caps total runs (rate-limit failures refund their attempt and soft-pause the project). Under reviewPolicy: "gated", a green verify with a clean report and an in-scope diff merges without an orchestrator run — the cost lever for trivial chores.

Why workers can't edit the plan (Decision 11)

Workers observe, the orchestrator triages, the architect/human decides. All mid-task discovery — missing prerequisites, obsolete nodes, wrong specs, decisions needed — flows through the structured WorkerReport.plan_feedback; workers have no graph-mutation tools at all (their MCP token only exposes get_node for their own issue plus the permission tool).

The rationale: a worker sees one node's context; granting it graph writes invites self-justified scope changes, duplicate nodes, and (in V2) version churn. The tiny worker MCP scope also doubles as the prompt-injection blast radius: nothing a worker reads from the repo can rewrite the plan. End-of-run reporting loses nothing while V1 is sequential, because the orchestrator only acts between runs anyway; mid-run feedback over MCP is a V2 consideration once parallel lanes could pick up discovered prerequisites early.

The same boundary extends from the graph to config (#19): the settings tools (upsert_profile, update_settings, set_paused) register for architect and local tokens only — never run-scoped ones. A run raising its own cost caps or swapping itself to a pricier model is the same self-justified-scope-change failure. And because the architect is itself a model that reads worker reports, mutations that loosen a rail (cost caps raised/removed; runTimeoutMinutes/approvalTimeoutSec/maxAttemptsPerNode flipped to unlimited; requireSubscriptionAuth off, bypassPermissions, deny-pattern removals, allowlist additions, approval-timeout deny→pause) park on the standard approval card and apply only on Allow; tightening and neutral changes (model/effort swaps, verify edits, finite timeout/attempt edits) apply immediately. Bounded numeric caps accept null = unlimited — a visible first-class state (the Settings UI shows a per-field ∞ toggle), not an implicit unset or a sentinel (#54).

Repository layout

packages/core     domain model, SQLite + migrations, services, state machine, DAG math,
                  plan patches, lessons, prompts, verify gate, GitOps, scheduler
packages/mcp      MCP server: graph tools + permission_decision, scoped tokens, stdio + HTTP
packages/runner   AgentRunner seam, ClaudeCliRunner, FakeRunner, per-run settings generation,
                  hook scripts (guard.js, stopcheck.js), approval broker, RunManager
packages/server   Hono REST + WS, MCP mount at /mcp, static UI hosting
packages/web      React + Vite + Tailwind dark UI (Board, Graph, Runs, Settings, Events)
apps/foreman      CLI: up | init | serve | mcp | demo

pnpm test runs 188 tests; no test ever spawns the real CLI (FakeRunner replays scripted stream-json, including permission and steering round-trips).

Real-CLI smoke checklist

The runner is wired against a verified CLI contract, not docs-from-memory. Re-run this when bumping the pinned Claude Code version (runs set DISABLE_AUTOUPDATER=1; the version is recorded on every run row):

[x] claude --version → 2.1.170 (contract verified against this version)
[x] --help lists: -p, --output-format stream-json, --input-format stream-json, --verbose, --model, --session-id, --resume, --fork-session, --permission-mode (incl. plan), --settings, --allowedTools, --disallowedTools, --append-system-prompt, --mcp-config, --strict-mcp-config, --json-schema, --bare, --effort low|medium|high|xhigh|max, --max-budget-usd
[x] --permission-prompt-tool and --max-turns are hidden but accepted (probed: a real run honored both)
[x] Result envelope carries session_id, num_turns, total_cost_usd, usage.input_tokens/output_tokens, result
[x] node scripts/probe-hooks.mjs — real haiku runs with Foreman-generated per-run settings, one leg per command tool (Bash + PowerShell on Windows): PreToolUse guard denies git push with the reason fed back to the model; Stop hook yields a parseable WorkerReport final message (~$0.06 for both legs)
[x] PowerShell tool guard parity (Windows): the headless CLI exposes a PowerShell tool (confirmed on 2.1.170 — the model invoked it when asked); the guard matcher covers it, the shared deny patterns fire on it, and native PowerShell(git push:*)-style deny mirrors are in the per-run settings
[x] Auto memory engages in -p mode (verified 2026-06-12 on 2.1.170 with CLAUDE_CODE_DISABLE_AUTO_MEMORY=0): turn 1 wrote memory/MEMORY.md + a topic file under the ~/.claude project dir at Stop time, and a separate second -p session recalled the fact. Note: the memory write lands at session end — don't read the dir mid-run.
[x] --model claude-fable-5 --effort max (the new-project default) accepted headless on the pinned CLI under subscription auth — re-probed 2026-06-12 for the readiness scenario (issue #11): a one-turn probe plus every demo/e2e run row (25+ real runs) carried claude-fable-5/max with usage tokens and CLI version 2.1.170 recorded
[x] Real readiness scenario green (issue #11, 2026-06-12): foreman demo --real golden path with real workers, plus node scripts/e2e-real.mjs — a scripted 3-issue mini-project covering induced verify-failure → forked-session retry, a PowerShell-tool push denial ingested as hook_denial, needs_decision → decision-mode orchestrator, per-node verify_override, and a milestone check with verify_timeout_s. Findings and fixes in DECISIONS.md (M-RT)
[ ] --permission-mode plan does not interfere with MCP tool calls for orchestrator runs (orchestrators currently run acceptEdits + an edit-free guard; revisit if plan mode proves compatible)
[ ] Native sandbox settings shape (sandbox: {enabled: true}) on macOS/Linux when settings.sandbox=true (not applicable on Windows, where hooks + brokering are the posture)

Onboarding smoke — plugin channel gate (issue #17)

The whole plugin path on a machine that never saw the monorepo. Endjinn does not onboard until this is green on a clean Windows profile/VM (it needs the plugin on main and the package on npm, so it runs post-merge/post-publish):

Install Claude Code → /plugin marketplace add LilSaints/Foreman → /plugin install foreman@foreman (expect: marketplace + plugin install in well under 5 minutes; no clone, no pnpm).
Sandbox repo: /foreman:init (manifest proposed from the repo, artifacts committed only on your OK) → /foreman:plan a small milestone → resume → at least one real worker run completes through verify + squash merge (#<ref>: <title> on the integration branch); run rows carry the foreman + claude versions.

Every leg of that scenario has been verified on this machine from the installed package (clean npm prefix + isolated CLAUDE_CONFIG_DIR, claude 2.1.170, Windows — 2026-06-12): global install → foreman up with web UI and live WS; marketplace add + plugin install + ✔ Connected plugin MCP (the node launcher wraps cmd /c npx on Windows); /foreman:init and /foreman:plan E2E on real sonnet sessions; foreman probe-hooks PASS both legs; real worker run through verify + squash merge via the installed serve. The clean-VM pass is the remaining checkbox.

Safety rails (V1)

Localhost binding · bearer + per-run scoped MCP tokens · guarded status transitions · spec-lock on running nodes · hook-enforced no-push/no-switch/write-confinement (covering both command tools — Bash and the PowerShell tool on Windows) · permission brokering with timeout policy · wall-clock run timeouts · attempt caps · per-run and per-day cost caps (plus the CLI's own --max-budget-usd) · subscription-only auth (requireSubscriptionAuth, default on: ANTHROPIC_API_KEY is stripped from runs — Foreman never silently bills the API; the UI reports usage in tokens and marks $ as notional) · dirty-integration refusal + WIP preservation · pinned CLI version recorded per run · transcript rotation caps · transcripts never leave disk (the UI marks them local-only).

PowerShell-first repos (Windows)

Guard parity is on by default: the deny patterns apply to the PowerShell tool exactly as to Bash, orchestrator runs additionally lose the write-cmdlet surface (Set-Content, Out-File, file redirection), and new projects seed PowerShell(git …) allowlist mirrors. Build/test commands are repo-specific, so before the first unattended batch either pre-seed them on the worker profile (Settings → allowed tools, e.g. PowerShell(.\scripts\check.ps1:*)) or do one supervised run and use Allow + add to allowlist — it derives narrow PowerShell(<launcher> <subcommand>:*) patterns, never the whole tool.

Two matcher facts verified on 2.1.170 (issue #11 readiness probes): an allowlist pattern must span the entire first command token — PowerShell(.\scripts\check.ps1:*) works, PowerShell(.\scripts:*) never matches — and a powershell -File … wrapper inside the PowerShell tool is a nested interpreter that the CLI always routes to the permission prompt regardless of allowlist. So seed per-script patterns and have the charter mandate bare first-token invocation (.\scripts\check.ps1 [args], no $x = … prefixes or statement chains before the script); models otherwise vary the command shape and re-trigger approvals.

V2 seams (deliberately not built)

Parallel workers via worktrees with path-disjointness scheduling (the area taxonomy is the first-cut predicate) and a merge queue; mid-run plan_feedback over MCP; multiple repos per project; non-Claude executors behind the AgentRunner seam; PR-based flows; packaging the architect side as a Claude Code plugin (bundled MCP config + a planning skill) as the intended distribution channel. Known V2 items: auto memory fragments across worktrees (it is keyed to the working directory), and Claude Code's native agent-teams features may replace parts of the fan-out — evaluate as an alternative AgentRunner backend rather than fighting them. See DECISIONS.md for every build-time decision.