agenticloom

v0.3.0

Published

7 days ago

Meta-harness compiler — composes Claude Code and Copilot CLI into reproducible multi-step workflows with verification, recovery, and human-in-the-loop as first-class primitives.

Agentic Loom

You design the pattern. The loom weaves it.

Hand an agent a feature and you get code back. What you don't get is trust — so you babysit every step, or you skim a huge diff and hope. Loom lets you engineer a multi-agent orchestration flow instead: you design the process — which agents run, who reviews whom, what must pass, what happens on failure — and loom compiles it into a deterministic script you own. No token tax just to decide what runs next. No giant diff to take on faith — review is built in. You step in only where it matters.

New to Loom? Read the getting-started guide — a ~30-minute walkthrough that builds up a real multi-agent pipeline one primitive at a time. Want to run as you read? Use the starter pack.

Why Loom

Push past a one-shot task and you hit a wall. People try three ways around it — all three break:

Drive every step yourself — you're babysitting an agent whose work you still can't trust.
Put an agent in charge of the others — it drifts off your plan, bills you tokens just to route, and never runs the same way twice.
Hand-script the orchestration — you're left maintaining brittle glue only you can run.

Three walls, one root: the process — who does what, who reviews it, what happens on failure — is never engineered. You improvise it, an LLM improvises it, or you hard-code it and it rots.

Loom makes that process the artifact. You design it once in YAML, and loom compiles it to a deterministic script that drives your agents — clean, scoped context per step, independent review, human gates, and bounded retries, built in. Nothing to babysit, no LLM in the orchestrator's chair, no glue to maintain — one approach that works at all three walls. It's how loom ships its own features: every release is built by a loom pipeline, PRD to PR.

What you get

Every capability is either context engineering — what each agent sees and produces — or harness engineering — the deterministic process around them.

Verification, built in. Independent review is a primitive, not glue: review_loop runs a reviewer that sees only the artifact (never the writer's reasoning, so it can't rubber-stamp); parallel + aggregate split review across specialist lenses and combine their verdicts deterministically. Quality is engineered into the process.
Deterministic orchestration. Routing, gating, and retries compile to plain code — the same flow every run, and zero tokens spent deciding what runs next.
Doomed runs halt early. Bounded loops (max_iters, max_retries) keep nothing running away; run: gates the flow on a real command's exit code and bounces the work back with the actual output as evidence; on_fail / retry_from re-run only the broken segment; --resume-from picks up at the last good output. A run that's gone wrong stops before it burns more money — and work that already passed is never redone.
Humans at the gates you choose. human_gate pauses for a y/N, or drops you into an interactive agent to refine the artifact in place — at the checkpoints you placed, not every step.
The right model for each step. Pin a model per step — a cheap one for boilerplate, a frontier one where it counts — and run the whole pipeline on claude or copilot.
Observability. --save-logs captures each agent's full stdio for inspection, and a rolling window shows the live agent as it works.
A diagram for free. Every compile emits a Mermaid view of the pipeline; reviewers follow the shape in a PR without reading a line of TypeScript.
An artifact you own. The compiled TypeScript is readable, versioned, and portable — it runs anywhere agenticloom is installed, and keeps running if loom ever disappears.

Loom drives the Claude Code (claude) and GitHub Copilot (copilot) CLIs today — more to come.

Install

Prerequisites:

Node.js 20+
claude (Claude Code) and/or copilot (GitHub Copilot CLI) on your PATH, authenticated. Loom shells out to whichever CLI your pipeline declares and uses your existing subscription auth — it makes no API calls of its own, so there are no API keys to manage.

Install in a project:

npm install -D agenticloom
# then: npx loom run <pipeline-name> ...

Or globally:

npm install -g agenticloom
# then: loom run <pipeline-name> ...

The CLI is available as either loom or agenticloom — they're aliases. Most examples in this README use loom run ....

DSL

Eight primitives cover ~95% of pipelines:

| Primitive | What it does | | ------------- | --------------------------------------------------------------------------------- | | step | Run one agent | | run | Run a command (non-agent); gate the flow on its exit code | | review_loop | writer ↔ reviewer cycle with max_iters + JSON verdict | | human_gate | Plain y/N prompt, or spawn an interactive agent REPL | | parallel | Fan out via Promise.all | | branch | If/else on any JS expression | | aggregate | Deterministically combine labeled inputs into one verdict | | foreach | Iterate a runtime-produced JSONL list; per-iteration body in a sealed scratch dir |

Each primitive optionally takes a bind: <name> field (not a primitive itself — a field on most primitives) that saves the output for later steps. Reference a bound variable with $LOOM_<name> in input: fields, or as values in the labeled inputs: map. (As of 0.3.0 every reference carries the $LOOM_ prefix — a bare $x became $LOOM_x, while declarations like bind: stay bare; a stale bare reference now fails compile with a respell hint.)

Agents communicate via files, never via stdout-piped-into-prompts. produces: (and writer_produces: / reviewer_produces: on review_loop) names a workspace-relative path the agent must write. Downstream consumers receive that path as their input and read the file with their own Read tool. A $LOOM_<name> reference resolves to the producer's path.

review_loop writes a JSON verdict file (reviewer_produces:) with shape {status, findings: [{severity, summary, details_md}]}; Loom extracts verdict_field (typically status) and approves when it equals approve_when: (default pass). On fail, the writer is re-invoked with a revise prompt that points at both the previous draft's path and the reviewer's verdict file; the writer reads both and addresses blocker/major findings. aggregate uses the same JSON contract across labeled inputs and returns the overall verdict as an in-memory string (no derived consolidated document — downstream consumers read the per-input files directly).

Loose-pattern retry gates. Both step (via on_fail:) and aggregate (via top-level retry_from: / max_retries: / on_max_exceeded: / revise_with: fields) can act as a retry gate that re-runs an upstream zone. revise_with: is required when retry_from: is set — it tells Loom what feedback to surface to the retry target on revise. Two shapes:

inputs: [$LOOM_<bind>, ...] — Loom auto-builds a revise prompt naming the target's prior draft path plus each referenced file (each must be a file-bound bind). Use this for the canonical "reviewers wrote feedback files, surface them to the writer" shape.
prompt: <string> — Loom uses your prompt verbatim with no auto-scaffolding. Use when you want exact control over the revise message.
Both — your prompt is the leading message; Loom appends the feedback-file list after it.

The loose pattern (step writer + parallel reviewers + aggregate gate with retry_from) is a complete substitute for compound review_loop, with the bonus that per-reviewer binds are naturally accessible to downstream steps (parallel children hoist).

Commands as deterministic actors (run:). Every primitive above drives a stochastic agent. run: drives a command — Loom's first non-agent node and its first deterministic actor. It invokes a shell-string or argv command, streams the output into the same bounded window as a step, and gates the flow on the real exit code: 0 continues, non-zero halts by default or — with on_fail: — bounces a retry zone. A command is CLI-agnostic: it spawns no agent, so it runs identically whether the pipeline's cli: is claude or copilot.

The payoff is gating an agent loop on ground truth instead of an agent's self-report — the exit code is the verdict, the produced file is the evidence fed back on the bounce:

flow:
  - step: implementer
    produces: impl.md # the file the retry overwrites — required on a step retry_from target
    bind: impl
  - run: './run-tests.sh > "$LOOM_PRODUCES" 2>&1' # exit status = verdict; $LOOM_PRODUCES = the evidence path
    produces: test-out.txt
    bind: test_out
    on_fail:
      retry_from: impl # re-run the implementer on a real test failure
      revise_with: { inputs: [$LOOM_test_out] } # hand it the actual test output
      max_retries: 3

Loom never parses or rewrites the command's text: a file-bound reference like $LOOM_test_out arrives as a per-spawn environment variable (the shell expands it — quoting is yours and shellcheck-lintable), and the reserved $LOOM_PRODUCES is the command's own output path. See PRIMITIVES.md for both value shapes, the full $LOOM_ reference contract, and the default-halt / gate / retry-anchor semantics.

Authoring with the `loom-author` skill

The repo ships a skill at skills/loom-author/SKILL.md that knows the YAML grammar, primitive shapes, and common patterns. Same file works for both supported CLIs. Two ways to use it:

As a Claude Code or Copilot CLI skill — copy the skill directory from your agenticloom install to the CLI's skills directory. The skill must live at <skills-dir>/loom-author/SKILL.md for the CLI to discover it:

# Pick the destination for your CLI (change once, applies below)
SKILLS_DIR=~/.claude/skills    # or ~/.copilot/skills for Copilot CLI
mkdir -p "$SKILLS_DIR"

# Globally-installed agenticloom (npm install -g agenticloom):
cp -r "$(npm root -g)/agenticloom/skills/loom-author" "$SKILLS_DIR/"

# Project-local agenticloom (npm install -D agenticloom):
cp -r ./node_modules/agenticloom/skills/loom-author "$SKILLS_DIR/"

Once installed, your CLI will invoke the skill automatically on requests like "write me a Loom pipeline that…" or "add a reviewer to my pipeline".

As a reference — open the file directly. It's a compact field-level cheat sheet plus seven pattern recipes (single-reviewer convergence, multi-reviewer with retry, foreach, fork-rejoin branch, etc.) — enough to bootstrap a new pipeline in one read.

For the canonical field-by-field schema (every primitive, every cross-field rule, every error model), see PRIMITIVES.md — bundled with the package at $(npm root -g)/agenticloom/PRIMITIVES.md for global installs, or ./node_modules/agenticloom/PRIMITIVES.md for project-local installs.

The loom metaphor

A loom is the device that turns separate threads into woven fabric. The metaphor maps onto this tool almost line-for-line:

The warp = your agents. Warp threads run lengthwise on the loom, held under tension. That's your .claude/agents/<name>.md personas — defined once on disk, referenced by pipelines that point at them. The pipeline header tells loom which CLI and which directory to find them in. (A future workspace-level config can declare cross-pipeline defaults; today each pipeline declares its own.)
The weft = your pipeline. The thread that crosses through the warp, over and under, defining the pattern. Each YAML pipeline is a weft passing through your agents in a specific order. Same warp, different weft = different fabric.
The shed = the review loop. When weavers raise some warp threads and lower others to let the weft pass through, that gap is called the shed. review_loop opens a shed between writer and reviewer, passes work back and forth through it until the pattern is right (or max_iters hits), then closes.
The shuttle = the runtime. The shuttle physically carries the weft thread across the loom. That's src/runtime/ — runAgent (agent.ts), reviewLoop (review-loop.ts), humanGate (human-gate.ts). What actually moves work between agents.
The pattern card = the compiled TypeScript. Old Jacquard looms read punched cards encoding which warp threads to raise for each pass. The compiler emits that card — readable TypeScript you can debug, version, or hand-edit if the YAML can't express something.
The finished cloth = the Mermaid view. When you step back from a loom, you see the pattern as a whole. The Mermaid diagram emitted alongside the compiled .ts is that view: same source, just looked at from above.

The metaphor pays off in a few concrete ways:

It resists scope creep. "Does this feature belong on the loom?" is a real test. If the answer is "no, it's an opinion the supervisor agent would hold" — it doesn't belong. A loom follows the pattern card; it doesn't decide.
It gives natural names for future concepts. Reusable pipelines are patterns. A library of pipelines is a tapestry. A primitive type is a weave structure.

Example: AC → spec → multi-reviewer pipeline

A pipeline you might author at <your-repo>/loom/pipelines/multi-review.yaml:

pipeline: multi-review
cli: claude
default_extra_args: ['--model', 'sonnet']
default_timeout: 20
inputs: [ticket]

flow:
  # Stage 1: Acceptance criteria — single-reviewer review_loop.
  - review_loop:
      writer: ac-writer
      reviewer: ac-reviewer
      input: $LOOM_ticket
      max_iters: 2
      writer_produces: ACS.md
      reviewer_produces: ac-review.json
      verdict_field: status
      approve_when: pass
      bind: ac_final

  - human_gate:
      interactive: true
      agent: ac-writer
      input: $LOOM_ac_final
      prompt: |
        ACS.md has passed automated review. Iterate with the user
        — answer open questions, refine wording.

  # Stage 2: Technical spec — compound review_loop. The reviewer
  # subflow runs three reviewers in parallel and aggregates their
  # verdicts. On fail, spec-writer is re-invoked with all three
  # reviewer paths in its revise prompt.
  - review_loop:
      writer: spec-writer
      input: $LOOM_ac_final
      max_iters: 1
      writer_produces: SPEC.md
      approve_when: pass
      bind: spec
      reviewer:
        - parallel:
            - step: security-reviewer
              input: $LOOM_spec
              produces: security-review.json
              bind: sec
            - step: api-design-reviewer
              input: $LOOM_spec
              produces: api-review.json
              bind: api
            - step: edge-case-reviewer
              input: $LOOM_spec
              produces: edge-review.json
              bind: edge
        - aggregate:
            inputs:
              security: $LOOM_sec
              api: $LOOM_api
              edge: $LOOM_edge
            verdict_field: status
            approve_when: pass
            require: all_approved
            bind: spec_verdict

  - human_gate:
      interactive: true
      agent: spec-writer
      input: $LOOM_spec
      prompt: |
        SPEC.md has passed multi-reviewer review. Iterate with the
        user on it before implementation.

Pipeline configuration + agent discovery

The pipeline YAML header declares the cli + optional default CLI flags. Persona files (e.g. .claude/agents/<name>.md) hold the agent's system prompt + frontmatter:

pipeline: multi-review
cli: claude # required: 'claude' or 'copilot'
default_extra_args: ['--model', 'sonnet'] # optional; applied to every agent
default_timeout: 20 # optional; minutes; default kill-after for every spawn
inputs: [ticket]
flow:
  - step: ac-writer # resolves to .claude/agents/ac-writer.md
    extra_args: ['--model', 'haiku'] # optional; replaces default for this step
    input: $LOOM_ticket
    produces: ACS.md

Agent references: persona name or inline agent. Anywhere an agent is named — step:, review_loop.writer, review_loop.reviewer — the value is either a persona name or an inline agent:

A persona name (string) — Loom delegates to the CLI's native --agent <name>, so the CLI loads the agent file and enforces its tools:; Loom no longer inlines the persona body. The persona file must be discoverable via the layered lookup below — missing files and files the CLI would refuse to register are caught at compile time. A bare-cli agent with no body still works as a frontmatter-only file: claude requires frontmatter name: matching the reference plus a non-empty description: (claude refuses to register an agent without a description); copilot requires a string description:, with name: optional (it defaults to the filename stem) and a mismatched name: still loading by filename stem — live-verified on copilot v1.0.61. On claude the persona's tools: bind even under --dangerously-skip-permissions (real least privilege). On copilot the same --agent delegation applies, though copilot's CLI-side enforcement of agent tools: is version-sensitive and not yet in effect as of copilot v1.0.60 — Loom delegates and adds no workaround. The agent-file leaf is cli-aware: .claude/agents/<name>.md for claude, .github/agents/<name>.agent.md for copilot.
An inline agent ({ prompt, name }) — a one-off general agent with no persona file, spawned with all tools. prompt: is the task (required; static text — no $LOOM_ interpolation, so per-invocation data flows via input: / inputs: unchanged). name is required — the agent's identity in logs, window titles, error messages, and mermaid nodes.

A general human_gate agent is expressed by omitting agent: on an interactive gate — the gate's already-required prompt: is the task, run with all tools. See PRIMITIVES.md for the full agent-reference grammar.

Layered agent discovery. The agent's persona directory is driven by the pipeline's cli: field. For each agent name, Loom probes the project layer first, then the user-global layer; project wins on collision:

| Pipeline cli: | Project layer | Global layer | | --------------- | -------------------------------------- | ----------------------------------- | | claude | <cwd>/.claude/agents/<name>.md | ~/.claude/agents/<name>.md | | copilot | <cwd>/.github/agents/<name>.agent.md | ~/.copilot/agents/<name>.agent.md |

(claude itself resolves --agent by walking up from the invocation cwd to the enclosing git root, root-most winning — so prefer running loom from the repo root, where claude's view matches the two layers Loom validates; see PRIMITIVES.md "Agent references" for the full caveat.)

Pipelines are discovered the same way regardless of cli:: loom run <name> looks in <cwd>/loom/pipelines/<name>.yaml first, then ~/.loom/pipelines/<name>.yaml.

Run

Reference pipelines by name — Loom resolves <name> to loom/pipelines/<name>.yaml in the current directory:

# Compile to a JS module you can read/edit/version
loom compile multi-review run.mjs

# Or compile + run in one step
loom run multi-review "JIRA-1234: Add 2FA"

For a stable, predictable workspace location, pass --id:

loom run multi-review ./tickets/RATE-1.md --id RATE-1
# → loom/runs/RATE-1/

When no --id is given, Loom infers one from the first existing-file argument or falls back to <pipeline>-<timestamp>. See "Workspace + outputs" for the resolution rules.

Pass a path (anything ending in .yaml/.yml or containing /) to bypass name resolution — useful when your pipelines live somewhere other than loom/pipelines/:

loom run ./workflows/ship.yaml "JIRA-1234"

The emitted module imports from the agenticloom/runtime package, so it's portable — version it, hand-edit it, or run it on another machine that has agenticloom installed.

Mermaid diagram

Every loom compile writes two files side-by-side: the TypeScript script and a <output>.mermaid diagram of the pipeline shape. Containers (review_loop, parallel, branch, foreach) render as labeled subgraphs; bind: names appear as edge labels.

Here's the diagram loom compile emits for the multi-reviewer example pipeline above:

flowchart TD
    n1[/"ticket"/]
    subgraph n2["review_loop (max_iters: 2, approve_when: pass)"]
        n3(["ac-writer"])
        n4(["ac-reviewer"])
        n3 -->|"writer_produces"| n4
        n4 -.->|"on fail"| n3
    end
    n5{{"human_gate (interactive): ac-writer"}}
    n3 --> n5
    subgraph n6["review_loop (max_iters: 1, approve_when: pass)"]
        n7(["spec-writer"])
        subgraph n8["parallel"]
            n9(["security-reviewer"])
            n10(["api-design-reviewer"])
            n11(["edge-case-reviewer"])
        end
        n12[/"aggregate: security, api, edge"\]
        n9 --> n12
        n10 --> n12
        n11 --> n12
        n7 -->|"writer_produces"| n9
        n7 -->|"writer_produces"| n10
        n7 -->|"writer_produces"| n11
        n12 -.->|"on fail"| n7
    end
    n5 --> n7
    n13{{"human_gate (interactive): spec-writer"}}
    n7 --> n13
    n1 --> n3

GitHub renders Mermaid inline in markdown files and pull requests, so committing the .mermaid next to the .ts gives PR reviewers a visual of the pipeline they don't have to read TypeScript to follow. It's the "finished cloth" view of the Loom metaphor — same source, seen from above.

Workspace + outputs

loom run creates a workspace directory at loom/runs/<id>/ under your current directory and runs the compiled pipeline from there. Agent-produced files (produces: paths, --save-logs output at logs/<agent>.log) land in the workspace; your invocation cwd stays clean. Pipelines themselves live alongside their outputs under the same loom/ parent (loom/pipelines/ for sources, loom/runs/ for runs).

Pass --save-logs to capture each agent's full stdio under loom/runs/<id>/logs/<agent>.log — useful for debugging when an agent's output isn't what you expected.

The workspace <id> is resolved in this order (first match wins):

--id <name> — explicit flag, always wins. loom run multi-review ./smoke_test/ticket-bug.md --id RATE-1 → loom/runs/RATE-1/
Filename basename of the first existing-file arg. loom run multi-review ./tickets/RATE-1.md → loom/runs/RATE-1/ (no flag needed when the ticket file is named after the ticket ID)
<pipeline>-<timestamp> fallback. Safety net so Loom never silently writes into your project root. loom run multi-review some-literal-string → loom/runs/multi-review-1700000000000/

The compiled pipeline's temp .mjs lives in os.tmpdir()/loom-*/ (not your project root) and is removed on exit. A best-effort startup sweep removes orphan loom-* temp dirs older than 24h — mainly a Windows safety net; macOS and Linux periodic cleaners handle this anyway.

Rerunning against the same <id> reuses the existing workspace directory; agents overwrite prior outputs. Rename the dir (mv loom/runs/RATE-1 loom/runs/RATE-1.attempt-1) to stash a prior run.

Resume from a named bind with --resume-from <bind> (requires --id <name> and an existing workspace). Items before the cursor are skipped — their bind values become path string literals to the prior run's produces: files. Items from the cursor onward run normally. The cursor must name a top-level bind; nested binds (inside parallel, branch, or review_loop subflows) are rejected at the CLI. Missing input files fail loud at the first consumer (no upfront validation walk).

loom run multi-review ./tickets/RATE-1.md --id RATE-1 --resume-from spec
# pre-cursor binds (e.g. $LOOM_ac_final) resolve to existing files;
# the `spec` step and everything after run normally.

Project layout

your-project/
├── loom/
│   ├── pipelines/      # YAML pipeline definitions
│   │   └── my-pipeline.yaml
│   └── runs/           # gitignored — workspace dirs created by `loom run`
└── .claude/agents/     # project-layer persona files; per-cli convention
    └── writer.md       # see "Pipeline configuration + agent discovery" above

What a compiled pipeline looks like

The compiler emits readable TS — full output is longer, this shows the structural shape:

// AUTO-GENERATED from loom/pipelines/multi-review.yaml
import { runAgent, reviewLoop, humanGate, parallel, aggregate } from 'agenticloom/runtime';

// Pipeline-header config baked in as module-level constants and threaded
// through every runtime call.
const CLI = 'claude';
const AGENT_DIRS = ['.claude/agents/', '~/.claude/agents/'];
const DEFAULT_EXTRA_ARGS = ['--model', 'sonnet'];

async function main(ticket) {
  const ac_final = await reviewLoop({
    kind: 'single',
    cli: CLI,
    agentDirs: AGENT_DIRS,
    defaultExtraArgs: DEFAULT_EXTRA_ARGS,
    writer: 'ac-writer',
    reviewer: 'ac-reviewer',
    input: ticket,
    maxIters: 2,
    writerProduces: 'ACS.md',
    reviewerProduces: 'ac-review.json',
    verdictField: 'status',
    approveWhen: 'pass',
  });
  await humanGate({
    interactive: true,
    agent: 'ac-writer',
    cli: CLI,
    agentDirs: AGENT_DIRS,
    extraArgs: DEFAULT_EXTRA_ARGS,
    input: ac_final,
    prompt: 'ACS.md has passed automated review. Iterate with the user...',
  });
  const spec = await reviewLoop({
    kind: 'compound',
    cli: CLI,
    agentDirs: AGENT_DIRS,
    defaultExtraArgs: DEFAULT_EXTRA_ARGS,
    writer: 'spec-writer',
    reviewerSubflow: async (spec) => {
      // reviewerSubflow: parallel reviewers + aggregate (see PRIMITIVES.md for the compound review_loop shape)
    },
    input: ac_final,
    maxIters: 1,
    writerProduces: 'SPEC.md',
    approveWhen: 'pass',
  });
  // ... interactive spec gate follows the same humanGate shape as above
}

Published

Vulnerabilities

Links

Maintainers

Keywords

Readme