physarum

v1.1.1

Published

21 days ago

MCP server that reads a git repo as a pheromone field for AI coding agents


premaanshvyas

mcp ai-agent claude-code pheromone stigmergy codebase-analysis signal-field pruning competitive-evaluation

physarum

Everyone is building a smarter ant. Nobody is building better ground.

The experiment that started this

In 2010, researchers placed Physarum polycephalum — a single-celled organism with no brain, no neurons, no central controller — on a map of Japan with oat flakes at the locations of major cities around Tokyo.

Twenty-six hours later, it had built the Tokyo rail network.

Not a loose resemblance. A network comparable in efficiency, fault tolerance, and cost to the one that took human engineers decades to design. Produced by an organism that follows one simple rule: grow toward food, reinforce what worked, starve what didn't.

The intelligence wasn't in the organism. It was in the ground.

The problem with AI coding agents

| Tool | What it does |
|---|---|
| Caveman | Compresses agent output — fewer tokens |
| Superpowers | Structures agent decision-making — better skills |
| ECC | Adds memory and 246 skills to the agent |
| context-optimizer | Controls what enters the context window |
| Every new tool shipped this week | Makes the individual agent smarter |

Every single one of these invests in the ant.

The slime mold has no instructions. No plan file. No system prompt. The Tokyo rail network didn't emerge from a smarter slime mold — it emerged from a simpler one, following a simpler rule, with the environment doing all the memory work.

Ant colonies have no general giving orders. Individual ants are nearly blind stimulus-response machines. The intelligence isn't in any ant — it's in the pheromone on the ground. You can replace every single ant and the colony continues optimizing, because the substrate carries the state.

Current AI coding agents write to the codebase. No agent reads the codebase as communication from other agents.

That's the missing layer. Physarum builds the ground.

The codebase is already a pheromone field

Every signal is already in your repo. Nobody reads it that way.

| Signal already in your codebase | What it actually means |
|---|---|
| Failing test | Negative pheromone. Don't reinforce this path. |
| High test coverage | Strong positive trail. This ground is trusted. |
| TypeScript type error | Wall in the maze. Don't go here. |
| Same file touched 40 times in 90 days | Contested territory. Approach with awareness. |
| File untouched for 18 months | Dying tube. Low flow, low value. |
| TODO / FIXME comment | Weak trail marker. Path exists but isn't trusted yet. |
| High cyclomatic complexity + zero coverage | Metabolically expensive path. Candidate for pruning. |
| Heavily imported module | High-flow tube. The colony depends on this. Don't prune it. |

Physarum reads all of this, computes a structured pheromone field, and exposes it as MCP tools any AI agent can call before it touches anything.

Does it actually work? (a real test on a real app)

The hardest question for an idea like this: when it judges a real codebase, is it right?

We pointed physarum at satlas — a shipped, 131-file production app — and let it read the field cold, with no human input. Two of its calls:

| File | What physarum saw | Its verdict |
|---|---|---|
| globe/Globe.ts | 12 modules import it; high flow (6.35) | Protected — load-bearing, don't prune |
| api/chat.ts | nothing imports it; complex; churns constantly | Flagged — expensive, isolated, clean-up candidate |

Without the flow signal, the field had Globe.ts ranked as a pruning candidate — it would have flagged the single most-imported file in the entire app for cleanup. Flow corrected that false positive: it recognized the artery and protected it, while leaving the genuinely isolated chat.ts flagged.

Then we asked a different AI agent — one with full knowledge of the satlas codebase, and none of how physarum works — whether the verdict was right. Its answer:

"The flow signal protected every genuine artery and left the expensive orphan correctly flagged. Not a case where the algorithm got lucky — the specific architectural decisions (a serverless-isolation rule making chat.ts a true static orphan, a refactor consolidating search logic giving another file high flow) are exactly what produced the signal. The ground is reading the codebase correctly."

The slime mold found the Tokyo rail map on its own. Physarum found the load-bearing structure of a real app on its own — and an expert agreed.

The gap this exposed, and how it closed: physarum reads static structure, meaning who imports what. chat.ts is actually the busiest file at runtime (every user request hits it), but it has zero static importers by design, so static flow saw it as isolated. The runtime_hits signal closes this gap, and it is now proven, automatic, and zero-code (see The ground breathes, below). On a real app the signal went from dark (−1) to live, then learned to update itself under traffic with no manual step, then to read call frequency straight from standard OpenTelemetry with no tracking code at all. Full writeup: docs/experiments/2026-05-29-flow-rerank-satlas.md.

Three principles

I. The Signal Field

Before an agent writes a single line, it reads the ground.

── FIELD SUMMARY ──────────────────────────────────────
  Total tracked files  : 46
  Avg trust score      : 0.951
  Avg danger score     : 0.050
  Avg metabolic cost   : 0.236
  Hot files (churn>0.7): 1
  Dead/unused files    : 0
  Total type errors    : 0

── TOP DANGER FILES ────────────────────────────────────
  path                               danger  complexity  churn
  src/app/page.tsx                   0.460   1.000       0.800
  src/app/api/translate/route.ts     0.325   0.750       0.500
  src/lib/db.ts                      0.160   0.400       0.200

── MOST TRUSTED FILES (safe ground) ────────────────────
  path                               trust   coverage
  src/lib/email.ts                   0.983   —
  src/app/api/auth/status/route.ts   0.950   —
  src/lib/auth.ts                    0.908   —

Seven field-reading tools. Call them before you touch anything.

| Tool | Returns | When |
|---|---|---|
| field_summary | Repo-level health: avg scores, hot/dead counts | First call of every session |
| field_read(path?) | FileSignal[] sorted by danger | Before touching any file |
| field_hot(limit?) | Top N by churn | Entering contested territory |
| field_danger(limit?) | Top N by danger score | Before modifying risky files |
| field_trust(limit?) | Top N by trust score | Finding safe ground to build on |
| field_dead(limit?) | Top N by dead score | Looking for cleanup opportunities |
| field_flow(limit?) | Top N by flow (thick tubes) | Finding load-bearing code |

II. The Pruning Mechanism

The slime mold's real genius isn't the growth phase. It's the pruning phase.

After filling the entire space (Phase 1 — exploration, metabolically expensive), it identifies which paths carry low flow relative to their cost. It starves them. What remains is the optimal network — not by design, but because the pruning criterion is exactly right: metabolic cost versus value delivered.

Vibe coding is Phase 1. Fast, indiscriminate, generative. That's correct — that's what Phase 1 should look like. The industry is panicking about vibe-coded codebases, but that misunderstands the biology.

The problem isn't Phase 1. The problem is there is no Phase 2.

# which files are costing more than they're delivering?
prune_candidates()

[
  {
    "path": "src/app/api/translate/route.ts",
    "metabolic_cost": 0.612,
    "complexity": 0.750,
    "churn": 0.500,
    "import_refs": 0,
    "reason": "High complexity + high churn + zero dependents. Expensive to maintain, nothing depends on it."
  },
  {
    "path": "src/lib/db.ts",
    "metabolic_cost": 0.346,
    "complexity": 0.400,
    "import_refs": 5,
    "reason": "Complex but 5 modules depend on it — value signal discounts cost by 50%."
  }
]

The metabolic cost formula is multiplicative: files that the rest of the codebase depends on get a value discount. A core module imported by 20 files has its cost halved — because starving it would collapse the colony. A file nobody imports gets no discount. This is the slime mold. This is Phase 2.
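As a rough illustration of the multiplicative shape described above, here is a minimal TypeScript sketch. Only two behaviours are taken from this README: a file nobody imports gets no discount, and the discount tops out at halving the cost. The raw-cost average, the 0.1-per-importer ramp, and every name in the block are assumptions made for illustration, so it will not reproduce the exact numbers in the sample output.

```typescript
// Hypothetical inputs: complexity and churn are the 0-1 signals, importRefs is a raw count.
interface CostInputs {
  complexity: number;
  churn: number;
  importRefs: number;
}

// From the README: dependents can at most halve the cost.
const MAX_DISCOUNT = 0.5;
// Assumption: each importer adds 10% discount until the cap is reached.
const DISCOUNT_PER_IMPORTER = 0.1;

function valueDiscount(importRefs: number): number {
  return Math.min(MAX_DISCOUNT, Math.max(0, importRefs) * DISCOUNT_PER_IMPORTER);
}

function metabolicCost(file: CostInputs): number {
  // Assumption: raw cost is the plain average of complexity and churn.
  const rawCost = (file.complexity + file.churn) / 2;
  // Multiplicative form: the value signal scales the cost instead of subtracting from it.
  return rawCost * (1 - valueDiscount(file.importRefs));
}

const orphan = metabolicCost({ complexity: 0.8, churn: 0.4, importRefs: 0 });
const artery = metabolicCost({ complexity: 0.8, churn: 0.4, importRefs: 20 });
```

With identical complexity and churn, the heavily imported file ends up at half the cost of the orphan, so it sorts lower in a pruning list.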

| Tool | Returns |
|---|---|
| prune_candidates(limit?) | Top N files by metabolic cost, with reasoning |
| prune_report() | Full analysis: estimated complexity reduction, coverage gap |
| metabolic_cost(path) | Cost breakdown for a specific file |

III. The Competitive Evaluator — the waggle dance

When honeybees need to choose a new nest site, no queen decides. No orchestrator picks an approach. Scout bees go out independently, assess candidate sites, and return to perform a waggle dance whose duration and vigor encode confidence. Other scouts evaluate, visit, counter-dance. The colony commits when a quorum of scouts converges on the same site simultaneously.

The colony almost never commits to the wrong choice, because weak options are eliminated by competitive evaluation before resources are committed.

Current multi-agent systems use an orchestrator: one agent picks an approach, distributes subtasks, everyone executes. Fast, but brittle — if the orchestrator's initial read is wrong, the whole team goes the wrong direction.

Physarum implements the waggle dance. Multiple approaches compete through confidence signaling. The field resolves. No orchestrator.

── COMPETE RESULT ─────────────────────────────────────────
  competition : rate-limiting-strategy
  resolved    : ✓

  A-token-bucket   votes: 0.82   ████████░░
  B-sqlite-backed  votes: 0.30   ███░░░░░░░

  winner  : A-token-bucket
  margin  : 0.52
  reason  : "O(1) per request, zero new deps, continuous refill
             at 1.67 tokens/sec. SQLite approach adds write
             contention on every request — wrong tradeoff here."

This result is from a real session. An agent evaluated two approaches to rate limiting, deposited confidence votes, and the field resolved without any human or orchestrator making the call.

| Tool | What it does |
|---|---|
| compete_start(id, question, approaches[]) | Open a competition |
| compete_vote(id, approach, confidence, reason) | Deposit a self-reported confidence signal |
| compete_evidence(id, approach, agent_id, paths[], tests_passed, tests_total) | Deposit a grounded vote — confidence computed by the field, not the agent |
| compete_status(id) | Read current vote state |
| compete_resolve(id) | Resolve when quorum (≥0.6) is reached |
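To make the resolution step concrete, here is a small TypeScript sketch of how votes could be averaged and checked against the 0.6 quorum. The README only states that the winner is chosen by average confidence and that a quorum of at least 0.6 is required; reading the quorum as "the leading approach's average reaches 0.6" is an assumption, as are all type and function names.

```typescript
interface Vote {
  approach: string;
  confidence: number; // 0-1
}

interface Resolution {
  resolved: boolean;
  winner: string | null;
  margin: number;
  averages: Record<string, number>;
}

const QUORUM = 0.6; // threshold quoted in the tools table

function resolveCompetition(votes: Vote[]): Resolution {
  // Accumulate total confidence and vote count per approach.
  const sums = new Map<string, { total: number; count: number }>();
  for (const v of votes) {
    const s = sums.get(v.approach) ?? { total: 0, count: 0 };
    s.total += v.confidence;
    s.count += 1;
    sums.set(v.approach, s);
  }

  // Rank approaches by average confidence, highest first.
  const ranked = Array.from(sums.entries())
    .map(([approach, s]) => ({ approach, avg: s.total / s.count }))
    .sort((a, b) => b.avg - a.avg);

  const averages: Record<string, number> = {};
  for (const r of ranked) averages[r.approach] = r.avg;

  const [top, second] = ranked;
  if (!top) return { resolved: false, winner: null, margin: 0, averages };

  return {
    // Assumption: quorum means the leader's average confidence is at least 0.6.
    resolved: top.avg >= QUORUM,
    winner: top.approach,
    margin: top.avg - (second?.avg ?? 0),
    averages,
  };
}

// Votes matching the rate-limiting example shown earlier.
const rateLimiting = resolveCompetition([
  { approach: "A-token-bucket", confidence: 0.82 },
  { approach: "B-sqlite-backed", confidence: 0.3 },
]);
```

Fed the two votes from the rate-limiting session, the sketch picks A-token-bucket with a margin of roughly 0.52, in line with the result panel.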

The waggle dance, earned not invented

A bee's waggle intensity is proportional to the food quality it actually measured. So is a grounded vote. compete_evidence doesn't accept a confidence number — it takes the files an approach touched and its test results, then computes confidence from the field's own measurements of that code:

grounded_confidence =
    test_pass_ratio      × 0.40    // does it actually work?
  + (1 − avg_danger)     × 0.25    // did it leave low-danger code?
  + (1 − avg_metabolic)  × 0.20    // cheap to maintain?
  + avg_trust            × 0.15    // built on trusted ground?

The danger / metabolic / trust terms come from the server's analyzers on the real repo. An agent cannot inflate them. The colony cannot lie, because the ground keeps score.
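The weighted sum above translates almost directly into code. The following TypeScript sketch uses the four weights from the formula; the input shape, the clamping, and the treatment of an approach with zero tests are assumptions added for illustration, not a description of the server's actual implementation.

```typescript
// Assumed input shape: the three averages are the field's 0-1 scores
// over the files an approach touched.
interface GroundedInputs {
  testsPassed: number;
  testsTotal: number;
  avgDanger: number;
  avgMetabolic: number;
  avgTrust: number;
}

// Clamp to [0, 1] so a malformed signal cannot push the result out of range.
const clamp01 = (x: number): number => Math.min(1, Math.max(0, x));

function groundedConfidence(g: GroundedInputs): number {
  // Assumption: an approach with no tests earns nothing from the test term.
  const testPassRatio = g.testsTotal > 0 ? g.testsPassed / g.testsTotal : 0;
  return (
    clamp01(testPassRatio) * 0.4 + // does it actually work?
    (1 - clamp01(g.avgDanger)) * 0.25 + // did it leave low-danger code?
    (1 - clamp01(g.avgMetabolic)) * 0.2 + // cheap to maintain?
    clamp01(g.avgTrust) * 0.15 // built on trusted ground?
  );
}

// Illustrative inputs, not measurements from any real run.
const sample = groundedConfidence({
  testsPassed: 9,
  testsTotal: 10,
  avgDanger: 0.2,
  avgMetabolic: 0.5,
  avgTrust: 0.8,
});
```

For these illustrative inputs the terms are 0.36 + 0.20 + 0.10 + 0.12, so the sample scores about 0.78. Because the weights sum to 1, a perfect approach scores 1 and the result stays within the same 0 to 1 range as the other signals.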

The ground has decided a real question

On 2026-05-29, three coding agents on three different models (Opus, Sonnet, Haiku) — each in its own isolated worktree, each shown only its own approach, none allowed to report a confidence number — independently implemented competing solutions to a real architectural decision in a sandbox API: how should note full-text search be built? (SQLite FTS5 vs optimized LIKE vs in-memory inverted index).

No orchestrator picked the winner. The codebase did. Physarum's analyzers measured the health of the code each agent produced, ran their tests, and computed a grounded confidence for each. compete_resolve returned the verdict.

── EMERGENCE EXPERIMENT — note search ──────────────────────
  fts5    (opus)    grounded: 0.9283   ██████████   tests 9/9
  like    (sonnet)  grounded: 0.9283   ██████████   tests 9/9
  memory  (haiku)   grounded: 0.8485   ████████░░   tests 10/10

  field rejected the in-memory index (complexity 0.525 vs 0.275)
  — consistent with what a senior engineer would defend.

The field correctly ranked the in-memory index last, and held the two SQL approaches in a tie — a verdict consistent with expert judgment, reached with no human and no central planner in the decision loop. It also surfaced an honest limitation (with no coverage data, cyclomatic complexity was the only discriminator) that names the next research step.

→ Full writeup, signals, and analysis: docs/experiments/2026-05-29-emergence-search.md

Not a smarter ant — a better ground that keeps score so the colony cannot lie.

The colony: division of labor with no orchestrator

Reading the ground is one agent smelling the trail. The real thesis is a colony — many agents, no boss, coordinating only through the environment. So we ran it: three agents on three models (Opus, Sonnet, Haiku), each given an identical, taskless prompt — no assignments, no plan. The only thing each could do was read the field, claim the highest-need unclaimed file (an atomic pheromone mark), improve it, and deposit the result.

── COLONY RUN — 2 rounds × 3 scouts, no orchestrator ───────
  round 1   opus   → services/notes.ts      (#1 need)   claim OK
            sonnet → services/notes.ts  TAKEN → routes/notes.ts (#2)
            haiku  → middleware/validate.ts (#3)
  round 2   opus → index.ts   sonnet → db/client.ts   haiku → mycelium.ts

  collisions     : 0        — no two agents ever held one file
  field-directed : yes      — every claim was a top field need
  orchestrator   : none     — byte-identical taskless prompts
  avg danger     : 0.0323 → 0.0058   (−82%)

Sonnet wanted the #1 file; the ground told it TAKEN; it moved to #2 — separation, with no message ever passing between agents. Labor self-partitioned across six files with zero collisions because the environment carried the coordination state, exactly as it does for ants. The 2026 research wall — "scaling team size is net negative due to coordination overhead" — is the ant's solved problem: push coordination into the ground.
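The claim-or-move-on behaviour can be sketched in a few lines of TypeScript. This is an in-memory illustration only: the ClaimBoard class, the need values, and the function names are hypothetical, and a real multi-process server would need an atomic store (for example a uniqueness constraint in its database) rather than a Map.

```typescript
interface Need {
  path: string;
  need: number; // higher means the field wants attention here sooner
}

// Hypothetical in-memory claim registry standing in for the shared ground.
class ClaimBoard {
  private owners = new Map<string, string>();

  // Check-and-set in one synchronous step: returns false if the file is TAKEN.
  claim(path: string, agentId: string): boolean {
    if (this.owners.has(path)) return false;
    this.owners.set(path, agentId);
    return true;
  }

  ownerOf(path: string): string | undefined {
    return this.owners.get(path);
  }
}

// Walk the field from highest need down and take the first unclaimed file.
function claimHighestNeed(needs: Need[], board: ClaimBoard, agentId: string): string | null {
  const ranked = [...needs].sort((a, b) => b.need - a.need);
  for (const n of ranked) {
    if (board.claim(n.path, agentId)) return n.path;
  }
  return null; // everything is already claimed
}

// Illustrative need scores; the real values come from the field.
const needs: Need[] = [
  { path: "routes/notes.ts", need: 0.7 },
  { path: "services/notes.ts", need: 0.9 },
  { path: "middleware/validate.ts", need: 0.5 },
];
const board = new ClaimBoard();
const opus = claimHighestNeed(needs, board, "opus");
const sonnet = claimHighestNeed(needs, board, "sonnet");
const haiku = claimHighestNeed(needs, board, "haiku");
```

Each agent runs the same function with no knowledge of the others, yet the three end up on three different files because the board carries the coordination state.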

→ Full writeup: docs/experiments/2026-05-30-colony-demo-notes.md

The ground breathes: runtime flow, automatic and zero-code

Static structure can't see that chat.ts is the busiest file at runtime. So the ground learned to read runtime — and then to do it on its own, in three steps:

1. It lit up. runtime_hits was a dark signal (−1 everywhere) until a real app emitted call counts. routes/notes.ts went −1 → 11,001, and its metabolic cost dropped as the field recognized the load-bearing front door it had been blind to.
2. It became automatic. No manual step: the server flushes counts on a timer; physarum watches and refreshes. Under live traffic, runtime_hits climbed −1 → 26 → 90 → 150 with nobody running a single command. The slime mold's flow redistributes on its own.
3. It became zero-code. We deleted the hand-written emitter entirely. The app now emits standard OpenTelemetry; physarum ingests the spans and maps them to source files. Same climb — −1 → 150 under traffic — with zero tracking code in the app. Any OTel-instrumented service feeds the ground for free.
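One way to picture the span-to-file step is the TypeScript sketch below. How physarum actually maps spans to source files is not described here, so this assumes each span carries its source path in a `code.filepath` attribute (a name used in OpenTelemetry's semantic conventions); the SpanLike shape and function name are hypothetical. Files with no matching spans keep the −1 "no data" value.

```typescript
// Minimal span shape for this sketch; real OpenTelemetry spans carry many more fields.
interface SpanLike {
  name: string;
  attributes: Record<string, unknown>;
}

// Assumption: the source path is read from this span attribute.
const FILE_ATTRIBUTE = "code.filepath";

function runtimeHits(spans: SpanLike[], trackedFiles: string[]): Record<string, number> {
  const tracked = new Set(trackedFiles);
  const hits: Record<string, number> = {};
  // Every tracked file starts dark: -1 means "no data", not "zero calls".
  for (const file of trackedFiles) hits[file] = -1;

  for (const span of spans) {
    const file = span.attributes[FILE_ATTRIBUTE];
    // Skip spans without a path, or with a path the field does not track.
    if (typeof file !== "string" || !tracked.has(file)) continue;
    const current = hits[file] ?? -1;
    hits[file] = current < 0 ? 1 : current + 1;
  }
  return hits;
}

const observed = runtimeHits(
  [
    { name: "GET /notes", attributes: { "code.filepath": "routes/notes.ts" } },
    { name: "POST /notes", attributes: { "code.filepath": "routes/notes.ts" } },
    { name: "db.query", attributes: {} },
    { name: "vendor", attributes: { "code.filepath": "node_modules/x/index.js" } },
  ],
  ["routes/notes.ts", "api/chat.ts"],
);
```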

→ Writeups: runtime lit up · self-updating · OpenTelemetry

Quick start

# 1. initialize in your repo
npx physarum init

# 2. register with Claude Code (user-wide, works in any project)
claude mcp add --scope user my-project -- npx physarum start /path/to/your/repo

# 3. install the skill so agents know the session protocol
npx physarum install

That's it. Restart Claude Code. The ground exists.

The session protocol

Every agent session should follow this pattern. The physarum skill (installed via npx physarum install) wires this in automatically.

SESSION START
─────────────
session_start(task, agent_id)      → get session_id
sessions_recent(3)                 → read last 3 sessions in full
field_summary()                    → current repo health
decisions_recent(20)               → what past agents reasoned

← Agent now has: what was built, what was decided, what was deferred, what to avoid.

DURING SESSION
──────────────
field_read(path)                   → before touching any file
sessions_on_path(path)             → before modifying a file someone else touched
session_mark(path, 'working')      → when starting on a file
session_decide(decision, reason)   → for EVERY significant choice
session_mark(path, 'completed')    → when done with a file

SESSION END
───────────
session_end(outcome, next_session_notes)

next_session_notes is the pheromone trail for the next agent. What was built. What's fragile. Where to start. What must not be changed. Write it like you're leaving breadcrumbs for someone who just walked into the room cold — because you are.
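For agents that do not use the installed skill, the protocol can be written down as an ordered list of tool calls. The TypeScript sketch below builds those lists as plain data in the name/arguments shape of an MCP tool call; the tool and argument names follow the tables in this README, while the helper names and the idea of building the plan up front are illustrative assumptions. Dispatching the calls through an MCP client is left out.

```typescript
// One MCP tool invocation: a tool name plus its arguments object.
interface ToolCall {
  name: string;
  arguments: Record<string, unknown>;
}

// SESSION START: register, then read recent history and the current field.
function sessionStartCalls(task: string, agentId: string): ToolCall[] {
  return [
    { name: "session_start", arguments: { task, agent_id: agentId } },
    { name: "sessions_recent", arguments: { limit: 3 } },
    { name: "field_summary", arguments: {} },
    { name: "decisions_recent", arguments: { limit: 20 } },
  ];
}

// DURING SESSION: the read-then-mark bracket around work on a single file.
function fileEditCalls(sessionId: string, path: string): ToolCall[] {
  return [
    { name: "field_read", arguments: { path } },
    { name: "sessions_on_path", arguments: { path } },
    { name: "session_mark", arguments: { session_id: sessionId, path, type: "working" } },
    // ...the edit itself and any session_decide calls happen here...
    { name: "session_mark", arguments: { session_id: sessionId, path, type: "completed" } },
  ];
}

// SESSION END: leave the trail for the next agent.
function sessionEndCall(sessionId: string, outcome: string, nextSessionNotes: string): ToolCall {
  return {
    name: "session_end",
    arguments: { session_id: sessionId, outcome, next_session_notes: nextSessionNotes },
  };
}

// "s-1" is a placeholder; the real id comes back from session_start.
const startPlan = sessionStartCalls("add note search", "agent-1");
const editPlan = fileEditCalls("s-1", "src/lib/db.ts");
const endCall = sessionEndCall("s-1", "search shipped", "db.ts is fragile; start in routes/");
```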

All MCP tools

| Tool | Input | Returns |
|---|---|---|
| field_summary | — | Repo-level rollup |
| field_read | path? | FileSignal[] by danger |
| field_hot | limit? | Top N by churn |
| field_dead | limit? | Top N by dead score |
| field_danger | limit? | Top N by danger score |
| field_trust | limit? | Top N by trust score |
| field_flow | limit? | Top N by flow (thick tubes) |

| Tool | Input | Returns |
|---|---|---|
| session_start | task, agent_id | session_id |
| session_mark | session_id, path, type, reason?, confidence? | ok |
| session_decide | session_id, decision, reason, alternatives?, confidence? | ok |
| session_end | session_id, outcome, next_session_notes | ok |

| Tool | Input | Returns |
|---|---|---|
| sessions_recent | limit? | Last N full session logs |
| sessions_on_path | path | All sessions that touched this file |
| decisions_recent | limit? | Last N decisions across all sessions |

| Tool | Input | Returns |
|---|---|---|
| prune_candidates | limit? | Top N files by metabolic cost |
| prune_report | — | Full repo pruning analysis |
| metabolic_cost | path | Cost breakdown for one file |

| Tool | Input | Returns |
|---|---|---|
| compete_start | id, question, approaches[] | ok |
| compete_vote | id, approach, confidence, reason | ok |
| compete_evidence | id, approach, agent_id, paths[], tests_passed, tests_total | grounded_confidence computed from the field |
| compete_status | id | Current vote state |
| compete_resolve | id | Winner (avg confidence), margin, quorum state, full vote log |

The signals

Each tracked file gets a FileSignal:

interface FileSignal {
  path: string;
  signals: {
    churn: number;              // 0–1: commit frequency last 90d
    coverage: number;           // 0–1: line coverage (-1 = no data)
    complexity: number;         // 0–1: cyclomatic complexity, normalized
    last_touched_days: number;  // days since last commit
    type_errors: number;        // tsc error count
    todos: number;              // TODO / FIXME / HACK count
    dead_score: number;         // 0–1: knip confidence this is unused
    import_refs: number;        // raw count of files that import this one
    flow: number;               // churn-weighted import in-degree = Σ churn(importer); flow ≤ import_refs
    runtime_hits: number;       // observed call count — hand-emitter or OpenTelemetry (-1 = no data)

    trust_score: number;        // high coverage + no errors + low complexity + stable
    danger_score: number;       // type errors + high complexity + high churn
    metabolic_cost: number;     // cost-to-maintain vs value-delivered, discounted by import_refs + flow + runtime
  };
  agent_marks: AgentMark[];     // pheromone deposits from every session
  updated_at: number;
}
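The flow definition in the comment above (Σ churn(importer)) can be sketched as follows. The import-graph shape and the function name are hypothetical, and physarum's own analyzers may compute these values differently; the point is only to show why flow can never exceed import_refs when churn is normalized to 0–1.

```typescript
// Hypothetical graph shape: each key is a file, each value lists the files it imports.
type ImportGraph = Record<string, string[]>;

interface FlowEntry {
  import_refs: number; // raw count of distinct importers
  flow: number; // sum of the importers' churn
}

function computeFlow(graph: ImportGraph, churn: Record<string, number>): Record<string, FlowEntry> {
  const out: Record<string, FlowEntry> = {};
  for (const importer of Object.keys(graph)) {
    const targets = graph[importer] ?? [];
    // De-duplicate so a file importing the same module twice counts once.
    for (const target of Array.from(new Set(targets))) {
      const entry = out[target] ?? (out[target] = { import_refs: 0, flow: 0 });
      entry.import_refs += 1;
      // Each importer contributes its own churn (0-1), so flow <= import_refs.
      entry.flow += churn[importer] ?? 0;
    }
  }
  return out;
}

// Illustrative graph and churn values.
const flows = computeFlow(
  { "a.ts": ["core.ts"], "b.ts": ["core.ts"], "c.ts": ["core.ts", "util.ts"] },
  { "a.ts": 0.5, "b.ts": 0.25, "c.ts": 1.0 },
);
```

Here core.ts has three importers and a flow of 1.75: imports from busy files count for more than imports from files nobody touches.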

The field updates automatically — a chokidar watcher detects file changes and refreshes signals within 2 seconds. It also watches runtime data (emitter flush or OpenTelemetry spans) and folds live call counts into the field with no manual step. The ground breathes.

Compatible with

| Agent | How |
|---|---|
| Claude Code | claude mcp add + physarum install |
| Codex CLI | physarum install writes skill to ~/.codex/skills/ |
| Antigravity | physarum install writes skill to ~/.antigravity/skills/ |

Any MCP-compatible agent can connect. The protocol is agent-agnostic by design — the ground doesn't care which ant is reading it.

Why "physarum"

Physarum polycephalum is the slime mold that built the Tokyo rail network. It is the organism that proved optimal global behavior can emerge from a single local rule applied by a brainless agent reading a shared environment.

The tool is named after it because that's what it tries to be: not a smarter agent, but a better ground.

Built with

Node.js · TypeScript · SQLite · @modelcontextprotocol/sdk · simple-git · ts-morph · chokidar · knip
