@lilsaints/foreman
v0.1.15
Local control plane for fleets of Claude Code agents: plan DAGs, spawn governed workers, review and merge — from one board.
Foreman
A local control plane for fleets of Claude Code agents.
Foreman looks like a miniature GitHub — Projects → Milestones → Issues — but the issues are
executed by headless claude CLI instances, and the project plan is a dependency graph plus
a project charter that together are the living spec of the finished product. The graph and
charter are exposed over MCP, so an external "architect" agent (Claude Desktop, another
Claude Code session, anything MCP-capable) can create and reshape the plan — including while a
run is in progress — and the configuration around it: agent profiles (upsert_profile),
project settings + verify commands (update_settings), and pause/resume (set_paused).
Changes that would loosen a safety or money rail park on a human approval card first. Internally, a reconciler loop watches state and spawns worker runs
(one per ready issue) or orchestrator runs (review/replan after a worker finishes, or when
a milestone completes).
Foreman is a control plane, not an agent runtime. It never implements its own LLM loop.
All intelligence is delegated to claude CLI processes; Foreman owns durable state, git,
scheduling, verification gates, enforcement (hooks + permission brokering), institutional
memory, and the UI. V1 is strictly sequential: at most one agent process alive at a time.
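The reconciler's choice on each tick can be pictured as a pure function over a state snapshot. The following is a minimal sketch under stated assumptions: the `Snapshot` and `Action` shapes, the `nextAction` name, and the exact rule order are hypothetical and are not taken from Foreman's code.

```typescript
// Hypothetical shapes for illustration; Foreman's real types are not shown in this README.
type Action =
  | { kind: "none" }
  | { kind: "spawn_worker"; issueRef: number }
  | { kind: "spawn_orchestrator"; reason: "review" | "milestone_complete" };

interface Snapshot {
  paused: boolean;
  agentAlive: boolean; // V1 is sequential: at most one agent process at a time
  pendingReview: boolean; // a worker run finished and awaits review/replan
  milestoneJustCompleted: boolean;
  readyIssues: number[]; // assumed to be sorted in priority/topo order already
}

function nextAction(s: Snapshot): Action {
  // Never start a second process while one is alive, and do nothing while paused.
  if (s.paused || s.agentAlive) return { kind: "none" };
  // Orchestrator runs take precedence (an assumption of this sketch).
  if (s.pendingReview) return { kind: "spawn_orchestrator", reason: "review" };
  if (s.milestoneJustCompleted) {
    return { kind: "spawn_orchestrator", reason: "milestone_complete" };
  }
  // Otherwise run one worker for the first ready issue, if there is one.
  const first: number | undefined = s.readyIssues[0];
  return first === undefined
    ? { kind: "none" }
    : { kind: "spawn_worker", issueRef: first };
}
```

Keeping the decision free of side effects makes the "at most one agent alive" rule easy to test without spawning anything.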
Architecture

                              ┌─────────────────────────────┐
     architect agent          │  foreman serve (one proc)   │            target repo
     (Claude Desktop /        │                             │
      claude code session)    │  ┌────────┐   ┌──────────┐  │   spawn   ┌──────────────┐
       │  MCP (HTTP/stdio)    │  │  MCP   │   │ scheduler│──┼──────────▶│  claude -p   │
       ├──────────────────────┼─▶│ server │   │  (tick)  │  │  stdin:   │  worker run  │
       │  get_graph           │  └───┬────┘   └────┬─────┘  │  prompt   │  on issue/N  │
       │  apply_plan_patch    │      │             │        │           └──────┬───────┘
       │  get_changes_since   │      ▼             ▼        │                  │
       │                      │  ┌─────────────────────┐    │   stream-json    │
     you (browser UI)         │  │  service layer +    │◀───┼──────────────────┘
       │  REST/WS             │  │  SQLite + event log │    │  hooks: guard.js (PreToolUse)
       └──────────────────────┼─▶│  (source of truth)  │    │         stopcheck.js (Stop)
                              │  └─────────┬───────────┘    │  MCP: permission_decision
                              │            │ verify gate,   │       (per-run token)
                              │            ▼ squash merge   │
                              │     git (Foreman-owned)     │
                              └─────────────────────────────┘

- One MCP surface, three clients: the external architect (project token), the internal orchestrator runs (scoped per-run tokens), and the UI (same service functions over HTTP). No privileged side-channels for mutating the graph.
- Single source of truth = SQLite + an append-only event log. Every mutation goes through one service layer, emits an event, and bumps graph_version (optimistic concurrency for apply_plan_patch). The system survives kill -9 and resumes (WIP commits + crash recovery).
- Deterministic gates: an issue is done only after Foreman itself ran the configured verify commands (project-level + per-node verify_extra, with a flake-guard re-run) and squash-merged the issue branch. Agent self-reports are advisory.
- Hard rules are hooks, not prompts: no pushes / branch switching / merges, write confinement to the repo (minus .git/ and .foreman/), and the WorkerReport final-message contract are enforced by PreToolUse/Stop hooks delivered via a per-run --settings file, never by editing the target repo's .claude/.
- Permission brokering: workers run acceptEdits with a seeded allowlist; anything else routes through --permission-prompt-tool → Foreman policy → an approval card in the UI (Allow / Deny / Allow + add to allowlist), with a configurable timeout action.
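The graph_version rule in the list above is ordinary optimistic concurrency: a patch carries the version its author last saw and is rejected if the graph has moved on. Here is a minimal in-memory sketch, assuming hypothetical names (`GraphStore`, `applyPlanPatch`, `PatchResult`); the real store is SQLite plus the event log, and its API is not shown in this README.

```typescript
// Hypothetical in-memory stand-in for the SQLite-backed service layer.
interface GraphEvent {
  version: number;
  type: string;
}

type PatchResult =
  | { ok: true; graphVersion: number }
  | { ok: false; reason: "version_conflict"; current: number };

class GraphStore {
  private version = 0;
  private readonly events: GraphEvent[] = []; // append-only: entries are only ever pushed

  get eventCount(): number {
    return this.events.length;
  }

  // Apply `mutate` only if the caller saw the latest version.
  applyPlanPatch(expectedVersion: number, mutate: () => void): PatchResult {
    if (expectedVersion !== this.version) {
      return { ok: false, reason: "version_conflict", current: this.version };
    }
    mutate();
    this.version += 1; // every successful mutation bumps the version...
    this.events.push({ version: this.version, type: "plan_patch_applied" }); // ...and emits an event
    return { ok: true, graphVersion: this.version };
  }
}
```

A caller that receives a conflict would presumably re-read the graph and retry against the new version instead of overwriting someone else's change.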
Quickstart (Claude Code plugin)
From any Claude Code session — no clone, no build:

    /plugin marketplace add LilSaints/Foreman
    /plugin install foreman@foreman

To move to a newer release, refresh the cached marketplace clone first. Claude Code does not re-pull it on restart, so the "update" button stays disabled until you do:

    /plugin marketplace update foreman
    /plugin update foreman@foreman

If the desktop Update button fails with "Failed to update the plugin", check
claude plugin list for the install's scope: the button only updates user-scope
installs, so a plugin you installed inside a project (local scope) won't update from it —
either claude plugin install foreman@foreman --scope user, or update in-project with
claude plugin update foreman@foreman --scope local. On Windows, also stop any running
foreman up server (and close sessions where the plugin's MCP is active) first — an install
directory still in use blocks the swap. Restart foreman up afterward; it relaunches from the
new version.
Then, inside the repo you want to run fleets on:

    /foreman:init   # conversational registration: proposes foreman.json from the
                    # repo (verify commands, allowlist, areas), commits only with
                    # your OK, brings the server + board UI up
    /foreman:plan   # the architect: charter, milestones, a dependency DAG with
                    # per-node spec/acceptance, right-sized agent profiles

The plugin bundles the architect MCP surface (stdio, runs the pinned
@lilsaints/foreman via npx —
Node.js >= 20 required). Projects register paused: plan first, then resume and watch
workers execute ready issues in priority/topo order while the orchestrator reviews, triages
discoveries, curates lessons, and advances milestones; you get banners + approval toasts
when a human is needed.
The CLI is also installable directly — npm i -g @lilsaints/foreman, then foreman up
from inside any git repo: registers it as a project if needed, reuses (or starts) the
server, and opens that project's board (http://127.0.0.1:4640/#/p/<project>/board). Run
it in a second repo and it joins the same running server — up health-probes the port
(GET /api/health: service signature + build version) and refuses to reuse anything stale
or foreign instead of clashing. The pieces are still available separately — init
[repo-path], serve (--fake drives the UI without burning tokens), and mcp
print-config (architect MCP config for claude mcp add); init and mcp default their
repo to CLAUDE_PROJECT_DIR, else cwd.
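The reuse-or-refuse decision that up makes after probing the port can be sketched as a small classifier. This is an illustration under stated assumptions: the field names `service` and `version`, the `ProbeVerdict` values, and the function name are hypothetical. The README only says the health response carries a service signature and a build version.

```typescript
// Assumed response shape; only "service signature + build version" is documented.
interface HealthBody {
  service?: unknown;
  version?: unknown;
}

type ProbeVerdict = "start" | "reuse" | "stale" | "foreign";

function classifyPort(
  body: HealthBody | null, // null = nothing answered on the port
  expected: { service: string; version: string },
): ProbeVerdict {
  if (body === null) return "start"; // port is free: start a server
  if (body.service !== expected.service) return "foreign"; // some other program owns the port
  if (body.version !== expected.version) return "stale"; // Foreman, but a different build
  return "reuse"; // same service, same build: join the running server
}
```

Only the "reuse" verdict joins an existing server; "stale" and "foreign" correspond to the cases the text says up refuses.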
Project identity is repo-anchored: a committed foreman.json carries bootstrap config
(name, verify commands, allowlist seeds, areas, profile defaults — seeds at first contact;
the DB owns config after that) and .foreman/project-id binds the clone to its project, so
a moved repo rebinds on the next up and a fresh clone gets an explicit
--adopt <id> | --create-new choice instead of a silent duplicate. Init never commits to
your branch silently — it shows what it wrote and asks (--yes / --no-commit to skip).

    # the full golden-path demo (§16): scripted architect + fake workers + every rail
    foreman demo    # free + deterministic (~3 min);
                    # --real = actual claude workers (1-2h, real usage)

Contributing / building from source

    pnpm install && pnpm -r build && pnpm test    # workspace dev loop
    node apps/foreman/bin/foreman.js up           # run the CLI from the workspace
    node scripts/probe-hooks.mjs                  # hook-enforcement probe (real claude, 2 short runs)

Releases: bump versions with node scripts/version-lockstep.mjs --set X.Y.Z, tag vX.Y.Z,
push — CI verifies lockstep, builds, tests, and publishes @lilsaints/foreman (the plugin
pin rides the same tag). Then confirm it landed: npm view @lilsaints/foreman version
dist-tags shows the new version as latest. To validate the pipeline without publishing,
run the release workflow via workflow_dispatch — publish is guarded to v* tags.
The model
| Concept | What it is |
|---|---|
| Charter | Cross-cutting invariants (architecture rules, module boundaries, conventions) injected into every worker and orchestrator prompt. |
| Issue (node) | Spec + acceptance criteria + priority + declared paths + optional per-node verify_extra. Executed on branch issue/<ref>-<slug>, squash-merged as #<ref>: <title>. |
| Blocks edge | Dependency (cycle-checked DAG). Issues schedule only when all blockers are done and their milestone is active. |
| Area | Subsystem taxonomy: UI color, default lesson scope, optional default agent profile, and the V2 disjointness predicate. |
| Milestone | Ordered phase with an auto or human gate and an optional milestone check command that must pass before the orchestrator replans and activates the next one. |
| Lessons | Curated institutional memory. Workers propose candidates in reports; the orchestrator curates (dedupe/generalize/scope/retire); Foreman injects matching lessons into worker prompts and mirrors them into CLAUDE.md via a managed block. Complements Claude Code's native auto memory (left enabled). |
| Agent profile | Per-role model/effort/permission settings. New projects seed both roles with the strongest catalog model (claude-opus-4-8 today, after Fable 5's withdrawal) at effort max. The catalog = pinned lineup merged with strictly-newer ids discovered in the installed CLI at serve startup (model dropdown in Settings; GET /api/models). |
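Two rules from the table lend themselves to a short sketch: blocks edges must keep the graph acyclic, and an issue schedules only when every blocker is done and its milestone is active. The code below is a minimal illustration, not Foreman's implementation. The `Issue` and `BlocksEdge` shapes are hypothetical, and treating a lower `priority` number as more urgent is an assumption.

```typescript
// Hypothetical minimal model of the plan graph.
interface Issue {
  ref: number;
  status: string;
  milestone: string;
  priority: number; // assumption: lower number = scheduled first
}

interface BlocksEdge {
  blocker: number; // must be done first
  blocked: number;
}

// True if adding `edge` would close a cycle, i.e. `edge.blocker` is already
// reachable from `edge.blocked` through existing edges.
function wouldCycle(edges: BlocksEdge[], edge: BlocksEdge): boolean {
  const seen = new Set<number>();
  const stack: number[] = [edge.blocked];
  while (stack.length > 0) {
    const node = stack.pop() as number;
    if (node === edge.blocker) return true;
    if (seen.has(node)) continue;
    seen.add(node);
    for (const e of edges) {
      if (e.blocker === node) stack.push(e.blocked);
    }
  }
  return false;
}

// Issues that may schedule: ready, in the active milestone, every blocker done.
function schedulable(issues: Issue[], edges: BlocksEdge[], activeMilestone: string): Issue[] {
  const done = new Set(issues.filter((i) => i.status === "done").map((i) => i.ref));
  return issues
    .filter((i) => i.status === "ready" && i.milestone === activeMilestone)
    .filter((i) => edges.every((e) => e.blocked !== i.ref || done.has(e.blocker)))
    .sort((a, b) => a.priority - b.priority);
}
```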
Issue lifecycle

    draft → ready → queued → running → verifying → needs_review → done (= squash merge)
                               │           │         ├→ changes_requested → queued
                               ├→ failed ←─┘         └→ blocked
                               └→ cancelled

Retries fork the failed session (--resume --fork-session) with the failing verify tail and
orchestrator notes appended; maxAttemptsPerNode caps total runs (rate-limit failures refund
their attempt and soft-pause the project). Under reviewPolicy: "gated", a green verify with
a clean report and an in-scope diff merges without an orchestrator run — the cost lever for
trivial chores.
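The attempt cap and the gated review policy can be sketched as two small predicates. This is an illustration under stated assumptions: the `RunOutcome` fields, the failure labels, and the function names are hypothetical, and "gated" is the only reviewPolicy value this README names.

```typescript
// Illustrative shapes only; none of these names come from Foreman's schema.
interface RunOutcome {
  verifyGreen: boolean;
  reportClean: boolean; // the worker report raised nothing that needs triage
  diffInScope: boolean; // the diff stays inside the issue's declared paths
}

type AfterVerify = "merge" | "orchestrator_review" | "failed";

function afterVerify(reviewPolicy: string, o: RunOutcome): AfterVerify {
  if (!o.verifyGreen) return "failed";
  // Under "gated", a clean, in-scope, green run merges without an orchestrator run.
  if (reviewPolicy === "gated" && o.reportClean && o.diffInScope) return "merge";
  return "orchestrator_review";
}

type Failure = "rate_limit" | "verify" | "other";

// Rate-limit failures are refunded, so they do not count toward the cap.
function attemptsUsed(runs: { failure?: Failure }[]): number {
  return runs.filter((r) => r.failure !== "rate_limit").length;
}

function mayRetry(runs: { failure?: Failure }[], maxAttemptsPerNode: number | null): boolean {
  // null means unlimited
  return maxAttemptsPerNode === null || attemptsUsed(runs) < maxAttemptsPerNode;
}
```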
Why workers can't edit the plan (Decision 11)
Workers observe, the orchestrator triages, the architect/human decides. All
mid-task discovery — missing prerequisites, obsolete nodes, wrong specs, decisions needed —
flows through the structured WorkerReport.plan_feedback; workers have no graph-mutation
tools at all (their MCP token only exposes get_node for their own issue plus the permission
tool).
The rationale: a worker sees one node's context; granting it graph writes invites self-justified scope changes, duplicate nodes, and (in V2) version churn. The tiny worker MCP scope also doubles as the prompt-injection blast radius: nothing a worker reads from the repo can rewrite the plan. End-of-run reporting loses nothing while V1 is sequential, because the orchestrator only acts between runs anyway; mid-run feedback over MCP is a V2 consideration once parallel lanes could pick up discovered prerequisites early.
The same boundary extends from the graph to config (#19): the settings tools
(upsert_profile, update_settings, set_paused) register for architect and local tokens
only — never run-scoped ones. A run raising its own cost caps or swapping itself to a pricier
model is the same self-justified-scope-change failure. And because the architect is itself a
model that reads worker reports, mutations that loosen a rail (cost caps raised/removed;
runTimeoutMinutes/approvalTimeoutSec/maxAttemptsPerNode flipped to unlimited;
requireSubscriptionAuth off, bypassPermissions, deny-pattern removals, allowlist additions,
approval-timeout deny→pause) park on the standard approval card and apply only on Allow;
tightening and neutral changes (model/effort swaps, verify edits, finite timeout/attempt edits)
apply immediately. Bounded numeric caps accept null = unlimited — a visible first-class
state (the Settings UI shows a per-field ∞ toggle), not an implicit unset or a sentinel (#54).
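The loosen-versus-tighten rule for numeric caps, with null as a first-class unlimited value, can be sketched as one predicate. The names `Cap`, `CapKind`, and `needsApproval` are hypothetical, and the sketch covers only the numeric rails described above, not the boolean or list-valued ones.

```typescript
// null = unlimited, as described above.
type Cap = number | null;

// "cost"  = per-run / per-day cost caps (raising or removing parks on approval)
// "limit" = runTimeoutMinutes, approvalTimeoutSec, maxAttemptsPerNode
//           (only a flip to unlimited parks; finite edits apply immediately)
type CapKind = "cost" | "limit";

function needsApproval(kind: CapKind, oldCap: Cap, newCap: Cap): boolean {
  if (oldCap === null) return false; // already unlimited: no change can loosen it
  if (newCap === null) return true; // finite -> unlimited always parks
  return kind === "cost" && newCap > oldCap; // raising a cost cap parks
}
```

For example, `needsApproval("limit", 30, 60)` is false because a finite timeout edit applies immediately, while `needsApproval("limit", 30, null)` is true.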
Repository layout

    packages/core    domain model, SQLite + migrations, services, state machine, DAG math,
                     plan patches, lessons, prompts, verify gate, GitOps, scheduler
    packages/mcp     MCP server: graph tools + permission_decision, scoped tokens, stdio + HTTP
    packages/runner  AgentRunner seam, ClaudeCliRunner, FakeRunner, per-run settings generation,
                     hook scripts (guard.js, stopcheck.js), approval broker, RunManager
    packages/server  Hono REST + WS, MCP mount at /mcp, static UI hosting
    packages/web     React + Vite + Tailwind dark UI (Board, Graph, Runs, Settings, Events)
    apps/foreman     CLI: up | init | serve | mcp | demo

pnpm test runs 188 tests; no test ever spawns the real CLI (FakeRunner replays scripted
stream-json, including permission and steering round-trips).
Real-CLI smoke checklist
The runner is wired against a verified CLI contract, not docs-from-memory. Re-run this
when bumping the pinned Claude Code version (runs set DISABLE_AUTOUPDATER=1; the version is
recorded on every run row):
- [x] claude --version → 2.1.170 (contract verified against this version)
- [x] --help lists: -p, --output-format stream-json, --input-format stream-json, --verbose, --model, --session-id, --resume, --fork-session, --permission-mode (incl. plan), --settings, --allowedTools, --disallowedTools, --append-system-prompt, --mcp-config, --strict-mcp-config, --json-schema, --bare, --effort low|medium|high|xhigh|max, --max-budget-usd
- [x] --permission-prompt-tool and --max-turns are hidden but accepted (probed: a real run honored both)
- [x] Result envelope carries session_id, num_turns, total_cost_usd, usage.input_tokens/output_tokens, result
- [x] node scripts/probe-hooks.mjs runs real haiku sessions with Foreman-generated per-run settings, one leg per command tool (Bash + PowerShell on Windows): the PreToolUse guard denies git push with the reason fed back to the model; the Stop hook yields a parseable WorkerReport final message (~$0.06 for both legs)
- [x] PowerShell tool guard parity (Windows): the headless CLI exposes a PowerShell tool (confirmed on 2.1.170; the model invoked it when asked); the guard matcher covers it, the shared deny patterns fire on it, and native PowerShell(git push:*)-style deny mirrors are in the per-run settings
- [x] Auto memory engages in -p mode (verified 2026-06-12 on 2.1.170 with CLAUDE_CODE_DISABLE_AUTO_MEMORY=0): turn 1 wrote memory/MEMORY.md + a topic file under the ~/.claude project dir at Stop time, and a separate second -p session recalled the fact. Note: the memory write lands at session end, so don't read the dir mid-run.
- [x] --model claude-fable-5 --effort max (the new-project default) accepted headless on the pinned CLI under subscription auth. Re-probed 2026-06-12 for the readiness scenario (issue #11): a one-turn probe plus every demo/e2e run row (25+ real runs) carried claude-fable-5/max with usage tokens and CLI version 2.1.170 recorded
- [x] Real readiness scenario green (issue #11, 2026-06-12): foreman demo --real golden path with real workers, plus node scripts/e2e-real.mjs, a scripted 3-issue mini-project covering induced verify-failure → forked-session retry, a PowerShell-tool push denial ingested as hook_denial, needs_decision → decision-mode orchestrator, per-node verify_override, and a milestone check with verify_timeout_s. Findings and fixes in DECISIONS.md (M-RT)
- [ ] --permission-mode plan does not interfere with MCP tool calls for orchestrator runs (orchestrators currently run acceptEdits + an edit-free guard; revisit if plan mode proves compatible)
- [ ] Native sandbox settings shape (sandbox: {enabled: true}) on macOS/Linux when settings.sandbox=true (not applicable on Windows, where hooks + brokering are the posture)
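The result-envelope item in the checklist above suggests how a runner might pull usage data out of a stream-json transcript. Below is a minimal sketch: the field names are the ones listed in the checklist, while the `"type": "result"` discriminator, the `findResult` name, and the newline-delimited framing are assumptions of this example.

```typescript
// Fields named in the checklist; the rest of the envelope is not modelled here.
interface ResultEnvelope {
  session_id: string;
  num_turns: number;
  total_cost_usd: number;
  usage: { input_tokens: number; output_tokens: number };
  result: string;
}

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null;
}

// Scan newline-delimited JSON and return the last well-formed result envelope, if any.
function findResult(streamJson: string): ResultEnvelope | null {
  let found: ResultEnvelope | null = null;
  for (const line of streamJson.split("\n")) {
    if (line.trim() === "") continue;
    let msg: unknown;
    try {
      msg = JSON.parse(line);
    } catch {
      continue; // skip partial or non-JSON lines
    }
    if (!isRecord(msg) || msg["type"] !== "result") continue; // assumed discriminator
    const sessionId = msg["session_id"];
    const numTurns = msg["num_turns"];
    const cost = msg["total_cost_usd"];
    const result = msg["result"];
    const usage = msg["usage"];
    if (!isRecord(usage)) continue;
    const inTok = usage["input_tokens"];
    const outTok = usage["output_tokens"];
    if (
      typeof sessionId === "string" && typeof numTurns === "number" &&
      typeof cost === "number" && typeof result === "string" &&
      typeof inTok === "number" && typeof outTok === "number"
    ) {
      found = {
        session_id: sessionId,
        num_turns: numTurns,
        total_cost_usd: cost,
        usage: { input_tokens: inTok, output_tokens: outTok },
        result,
      };
    }
  }
  return found;
}
```

Validating every field before use keeps a CLI contract change visible as a missing envelope instead of a silently wrong usage number.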
Onboarding smoke — plugin channel gate (issue #17)
The whole plugin path on a machine that never saw the monorepo. Endjinn does not onboard
until this is green on a clean Windows profile/VM (it needs the plugin on main and the
package on npm, so it runs post-merge/post-publish):
- Install Claude Code → /plugin marketplace add LilSaints/Foreman → /plugin install foreman@foreman (expect: marketplace + plugin install in well under 5 minutes; no clone, no pnpm).
- Sandbox repo: /foreman:init (manifest proposed from the repo, artifacts committed only on your OK) → /foreman:plan a small milestone → resume → at least one real worker run completes through verify + squash merge (#<ref>: <title> on the integration branch); run rows carry the foreman + claude versions.

Every leg of that scenario has been verified on this machine from the installed package
(clean npm prefix + isolated CLAUDE_CONFIG_DIR, claude 2.1.170, Windows — 2026-06-12):
global install → foreman up with web UI and live WS; marketplace add + plugin install +
✔ Connected plugin MCP (the node launcher wraps cmd /c npx on Windows); /foreman:init
and /foreman:plan E2E on real sonnet sessions; foreman probe-hooks PASS both legs;
real worker run through verify + squash merge via the installed serve. The clean-VM pass
is the remaining checkbox.
Safety rails (V1)
Localhost binding · bearer + per-run scoped MCP tokens · guarded status transitions ·
spec-lock on running nodes · hook-enforced no-push/no-switch/write-confinement (covering both
command tools — Bash and the PowerShell tool on Windows) · permission brokering with timeout
policy · wall-clock run timeouts · attempt caps · per-run and per-day cost caps (plus the
CLI's own --max-budget-usd) · subscription-only auth (requireSubscriptionAuth, default
on: ANTHROPIC_API_KEY is stripped from runs — Foreman never silently bills the API; the UI
reports usage in tokens and marks $ as notional) · dirty-integration refusal + WIP
preservation · pinned CLI version recorded per run · transcript rotation caps · transcripts
never leave disk (the UI marks them local-only).
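The subscription-only rail comes down to how the child process environment is built. Here is a minimal sketch, assuming a hypothetical `buildRunEnv` helper and options shape; only the ANTHROPIC_API_KEY stripping and the DISABLE_AUTOUPDATER=1 setting are taken from this README.

```typescript
// Hypothetical helper: builds the environment for a spawned claude process.
function buildRunEnv(
  parent: Record<string, string | undefined>,
  opts: { requireSubscriptionAuth: boolean },
): Record<string, string> {
  const env: Record<string, string> = {};
  for (const key of Object.keys(parent)) {
    const value = parent[key];
    if (value === undefined) continue;
    // With requireSubscriptionAuth on, the API key never reaches the run,
    // so the run cannot silently bill the API.
    if (opts.requireSubscriptionAuth && key === "ANTHROPIC_API_KEY") continue;
    env[key] = value;
  }
  env["DISABLE_AUTOUPDATER"] = "1"; // runs keep the pinned CLI version
  return env;
}
```

In a Node process this would typically be called with `process.env` as the parent and the result passed as the `env` option when spawning the worker.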
PowerShell-first repos (Windows)
Guard parity is on by default: the deny patterns apply to the PowerShell tool exactly as to
Bash, orchestrator runs additionally lose the write-cmdlet surface (Set-Content, Out-File,
file redirection), and new projects seed PowerShell(git …) allowlist mirrors. Build/test
commands are repo-specific, so before the first unattended batch either pre-seed them on
the worker profile (Settings → allowed tools, e.g. PowerShell(.\scripts\check.ps1:*)) or do
one supervised run and use Allow + add to allowlist — it derives narrow
PowerShell(<launcher> <subcommand>:*) patterns, never the whole tool.
Two matcher facts verified on 2.1.170 (issue #11 readiness probes): an allowlist pattern must
span the entire first command token — PowerShell(.\scripts\check.ps1:*) works,
PowerShell(.\scripts:*) never matches — and a powershell -File … wrapper inside the
PowerShell tool is a nested interpreter that the CLI always routes to the permission prompt
regardless of allowlist. So seed per-script patterns and have the charter mandate bare
first-token invocation (.\scripts\check.ps1 [args], no $x = … prefixes or statement
chains before the script); models otherwise vary the command shape and re-trigger approvals.
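The two matcher facts can be modelled as a whole-token prefix comparison. The sketch below reproduces only the behaviour observed above and is not the CLI's actual matcher; `parsePattern` and `allowlistMatches` are hypothetical names, and whitespace tokenisation is a simplification.

```typescript
// Splits "Tool(prefix:*)" into its tool name and command prefix.
function parsePattern(pattern: string): { tool: string; prefix: string } | null {
  const m = /^(\w+)\((.*):\*\)$/.exec(pattern);
  if (m === null) return null;
  return { tool: m[1] ?? "", prefix: m[2] ?? "" };
}

function allowlistMatches(pattern: string, tool: string, command: string): boolean {
  const parsed = parsePattern(pattern);
  if (parsed === null || parsed.tool !== tool) return false;
  const want = parsed.prefix.split(/\s+/).filter((t) => t !== "");
  const got = command.trim().split(/\s+/);
  if (want.length === 0 || want.length > got.length) return false;
  // Observed: a nested "powershell -File ..." wrapper always goes to the prompt.
  if ((got[0] ?? "").toLowerCase() === "powershell") return false;
  // Each pattern token must equal the whole command token at the same position,
  // so ".\scripts" never matches ".\scripts\check.ps1".
  return want.every((t, i) => t === got[i]);
}
```

Under this model a leading `$x = …` assignment changes the first token, which is why the charter advice above asks for bare first-token invocation.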
V2 seams (deliberately not built)
Parallel workers via worktrees with path-disjointness scheduling (the area taxonomy is the
first-cut predicate) and a merge queue; mid-run plan_feedback over MCP; multiple repos per
project; non-Claude executors behind the AgentRunner seam; PR-based flows; packaging the
architect side as a Claude Code plugin (bundled MCP config + a planning skill) as the
intended distribution channel. Known V2 items: auto memory fragments across worktrees (it is
keyed to the working directory), and Claude Code's native agent-teams features may replace
parts of the fan-out — evaluate as an alternative AgentRunner backend rather than fighting
them. See DECISIONS.md for every build-time decision.
