@mwxb/graveyard

v0.3.3

Published

a month ago

Your projects folder, audited overnight. Kill, revive, refactor, ship, or extract — with receipts. Ships both an autonomous CLI and an MCP server for any AI client.

mwxb

repo-triage code-audit claude mcp model-context-protocol anthropic

Graveyard

A typed verdict for every git repository under a directory — SHIP, REVIVE, REFACTOR, EXTRACT, KILL, INSUFFICIENT_DATA — backed by file:line and commit-sha evidence.

Rendered morning-brief.md showing the portfolio summary table and a sample SHIP verdict card with rationale and evidence.

Output from the canonical 3-repo example — full text in docs/sample-output.md.

Built for / Not for

Built for — solo indie devs auditing a directory of stalled side projects. People who want a fast, reproducible, evidence-linked decision per repo before they'll trust it enough to show a friend or an employer.

Not for — team workflows with multiple contributors, repos with mature CI/CD already in place, or deep code reviews. Graveyard triages upstream of human review; it doesn't replace it.

Batch, reproducible, cost-bounded

Batch — point Graveyard at a directory of 5–30 repositories, launch, and walk away. The audit completes unattended and lands a finished morning-brief.md on disk; you don't babysit a chat. Useful when you'd rather sleep through the run than supervise two hours of token streaming.

Reproducible — same repo state in, same evidence packet out. The deterministic preprocessing (commit-shas, file:line locations, test counts) gives the LLM agents identical context on every run. Verdicts can drift between runs because LLMs aren't perfectly deterministic — but the evidence underpinning each verdict is reproducible to the byte, so you can re-audit in three months and tell whether it's the project that changed or the model that wobbled.

Cost-bounded — the per-repo cost is known before you launch. With default Sonnet+Opus mix, expect around $1–$2 per repo audited, billed against your own Anthropic API key with the total reported in each morning brief. No subscription, no monthly minimum, no surprise overage.

Graveyard isn't a replacement for Claude Code's interactive use — it's the tool when you need the audit run in batch, reproduced months later, and budgeted in advance.

What the verdicts mean

| Verdict | Meaning |
|---|---|
| SHIP | Ready or near-ready for release |
| REVIVE | Was alive, now dormant; worth restarting |
| REFACTOR | Working but architecturally bleeding |
| EXTRACT | A reusable piece is hiding inside |
| KILL | Archive and move on |
| INSUFFICIENT_DATA | Not enough signal for a confident call |

Quickstart

Graveyard ships as two bins from one package — graveyard for direct CLI runs, graveyard-mcp for use as a Claude Code or Claude Desktop tool. The CLI section below shows the canonical batch workflow; for MCP setup, see MCP integration.

# Run once with no install (CLI form, requires Anthropic API key)
export GRAVEYARD_API_KEY="sk-ant-..."     # or ANTHROPIC_API_KEY
npx @mwxb/graveyard scan ~/projects

# Or install globally
npm install -g @mwxb/graveyard
graveyard scan ~/projects

Requires Node.js 20+. Results land in ~/.graveyard/runs/<iso-timestamp>/.

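Expect ~$1–$2 per repo with the default Sonnet+Opus mix, billed against your Anthropic key. At the roughly 45–50 seconds per repo seen in the example workflow below, a 30-repo run takes around 25 minutes if repos are processed one at a time, which is fine to launch before bed.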
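
As a rough pre-launch budget check, the per-repo range above can simply be multiplied out. The sketch below is illustrative only: `estimateRunCost` is a hypothetical helper, not part of Graveyard, and its bounds are just the $1 to $2 range this README quotes for the default model mix.

```typescript
// Hypothetical helper, not part of Graveyard's API.
// Bounds are the README's stated ~$1–$2 per repo for the default model mix.
const COST_PER_REPO_USD = { low: 1, high: 2 };

function estimateRunCost(repoCount: number): { low: number; high: number } {
  if (!Number.isInteger(repoCount) || repoCount < 0) {
    throw new RangeError("repoCount must be a non-negative integer");
  }
  return {
    low: repoCount * COST_PER_REPO_USD.low,
    high: repoCount * COST_PER_REPO_USD.high,
  };
}

console.log(estimateRunCost(30)); // { low: 30, high: 60 }
```

For comparison, the three-repo example run in this README reports an estimated cost of $3.61, which sits inside the $3 to $6 this gives for three repos.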

Example workflow

Scan three repos. Two are alive, one was abandoned in 2024.

$ graveyard scan ~/projects/sandbox

⚰️  Graveyard v0.3.2
Scanning: /Users/me/projects/sandbox

▶ Discovering repositories...
  Found 3 repositories

▶ Profiling repositories...
  Profiled 3 repositories

▶ Extracting evidence...
  arbitra...
  designscout...
  prototype-v1...
  Evidence extracted for 3 repositories

▶ Preflight API check...
  API connection OK

▶ Running agent analysis...
  arbitra...
    🚢 SHIP (high) — 47s
  designscout...
    🔁 REVIVE (medium) — 53s
  prototype-v1...
    ⚰️  KILL (high) — 41s

  Verdicts complete for 3 repositories
  Tokens: 184,231 (in: 142,019, out: 42,212)
  Estimated cost: $3.61
  Duration: 142s

Verdict Summary:
  🚢 arbitra — SHIP (high, verified)
    Phases 4-5 complete per STATE.md, matched in code. Phase 6 declared pending,…
  🔁 designscout — REVIVE (medium, verified)
    Active until Aug 2025 then 8 months silent. Architecture sound, scraping…
  ⚰️  prototype-v1 — KILL (high, verified)
    No commits in 18 months. README contradicts every file. Two competing auth…

Results saved to /Users/me/.graveyard/runs/2026-04-27T08-00-00.000Z/
Morning brief: /Users/me/.graveyard/runs/2026-04-27T08-00-00.000Z/morning-brief.md

A full sample of the generated morning-brief.md and verdicts.json lives in docs/sample-output.md.

CLI commands

graveyard scan <path>                          # Full pipeline (discovery → agents → verdict → report)
graveyard scan <path> --preprocess-only        # Only deterministic layers; no LLM calls, no API key needed
graveyard scan <path> --skip-verify            # Skip the adversarial self-verification step (faster, less safe)
graveyard scan <path> --json                   # Machine-readable JSON to stdout (still persists artifacts)
graveyard scan <path> --verbose                # Per-repo timing, signal counts
graveyard scan <path> --max-repos <n>          # Limit to first N discovered repos (alphabetical order)
graveyard scan <path> --include <pattern>      # Only audit repos whose basename contains <pattern>; repeatable
graveyard scan <path> --exclude <pattern>      # Skip repos whose basename contains <pattern>; repeatable
graveyard scan <path> --output-dir <dir>       # Write run artifacts to <dir> instead of ~/.graveyard
graveyard scan <path> --no-persist             # Skip writing artifacts; print results to terminal only
graveyard --version
graveyard --help

--include and --exclude use case-insensitive substring matching against each repo's directory basename — no globs, no regex. Both flags are repeatable: multiple --include values union (a repo passes if it matches any include pattern), multiple --exclude values also union. With no --include flag every discovered repo is initially in scope; --exclude then removes matches. Filters are applied before --max-repos, so --include react --include server --max-repos 1 first narrows to repos matching either pattern and then keeps the first alphabetically. If filters wipe out every repo, the run exits non-zero. Substring matching means --include test matches both test-server and latest-thing; pass narrower patterns or combine with --exclude to disambiguate.
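
The filter semantics described above can be sketched in a few lines. This is a minimal illustration, not Graveyard's source: `selectRepos` and `FilterOptions` are hypothetical names, and the use of `localeCompare` for "alphabetical order" is an assumption.

```typescript
// Hypothetical helper illustrating the documented rules; not Graveyard's code.
interface FilterOptions {
  include: string[];  // values of repeated --include flags
  exclude: string[];  // values of repeated --exclude flags
  maxRepos?: number;  // value of --max-repos, if given
}

function selectRepos(basenames: string[], opts: FilterOptions): string[] {
  // Case-insensitive substring match: no globs, no regex.
  const matches = (name: string, pattern: string): boolean =>
    name.toLowerCase().includes(pattern.toLowerCase());

  const selected = basenames
    // With no --include, every repo starts in scope; otherwise any match passes.
    .filter((n) => opts.include.length === 0 || opts.include.some((p) => matches(n, p)))
    // Any matching --exclude removes the repo.
    .filter((n) => !opts.exclude.some((p) => matches(n, p)))
    // Sort before truncating so --max-repos keeps the first alphabetically.
    .sort((a, b) => a.localeCompare(b));

  return opts.maxRepos === undefined ? selected : selected.slice(0, opts.maxRepos);
}

const repos = ["latest-thing", "react-app", "test-server", "api-server"];

// --include react --include server --max-repos 1
console.log(selectRepos(repos, { include: ["react", "server"], exclude: [], maxRepos: 1 }));
// [ 'api-server' ]

// --include test --exclude latest
console.log(selectRepos(repos, { include: ["test"], exclude: ["latest"] }));
// [ 'test-server' ]
```

The second call shows the substring caveat: `test` alone would also pick up `latest-thing`, so the extra `--exclude latest` narrows it.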

About the API key

GRAVEYARD_API_KEY is a Graveyard-specific alias for an Anthropic API key. It is not a separate Graveyard cloud API key — Graveyard has no cloud service of its own. The alias exists so you can keep an Anthropic key dedicated to Graveyard runs without colliding with Claude Code's OAuth or other tools that read ANTHROPIC_API_KEY.

If GRAVEYARD_API_KEY is unset or blank, Graveyard falls back to ANTHROPIC_API_KEY. If neither is set, the CLI's preflight check fails with a clear message.
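
The fallback order can be pictured as follows. This is a sketch under stated assumptions: `resolveApiKey` is a hypothetical name, the error text is illustrative, and trimming the returned value is an assumption rather than documented behaviour.

```typescript
// Hypothetical sketch of the documented fallback order; not Graveyard's code.
type Env = Record<string, string | undefined>;

function resolveApiKey(env: Env): string {
  // GRAVEYARD_API_KEY wins; ANTHROPIC_API_KEY is the fallback.
  const candidates = [env["GRAVEYARD_API_KEY"], env["ANTHROPIC_API_KEY"]];
  for (const value of candidates) {
    // Unset and blank values are both treated as "not provided".
    if (value !== undefined && value.trim() !== "") return value.trim();
  }
  // Illustrative message only; the CLI's real wording may differ.
  throw new Error("No API key: set GRAVEYARD_API_KEY or ANTHROPIC_API_KEY to an Anthropic API key.");
}

// A dedicated key takes precedence over the shared one.
console.log(resolveApiKey({ GRAVEYARD_API_KEY: "sk-ant-dedicated", ANTHROPIC_API_KEY: "sk-ant-shared" }));
// sk-ant-dedicated

// A blank alias falls through to ANTHROPIC_API_KEY.
console.log(resolveApiKey({ GRAVEYARD_API_KEY: "  ", ANTHROPIC_API_KEY: "sk-ant-shared" }));
// sk-ant-shared
```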

Model selection (advanced)

Graveyard uses two model roles: Sonnet (fast agents: Git Forensics, Code Health) and Opus (synthesis agents: Intent Drift, Architecture, Fusion, Verification). Defaults track the latest stable family aliases — pin a dated snapshot via env if you need exact reproducibility:

| Env var | Default |
|---|---|
| GRAVEYARD_MODEL_SONNET | claude-sonnet-4-6 |
| GRAVEYARD_MODEL_OPUS | claude-opus-4-7 |

Set either variable to override; blank or whitespace values fall back to the default. If you don't have Opus access, collapse both roles to Sonnet:

export GRAVEYARD_MODEL_OPUS=claude-sonnet-4-6

The preflight check tests both resolved model IDs before any batch run — a clear error names the role and the ID if one is unavailable.
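
The override rule for the two model roles could look roughly like this. The environment variable names and defaults come from the table above; `resolveModels` itself is a hypothetical name and the code is a sketch, not the package's implementation.

```typescript
// Defaults as listed in this README; the helper itself is hypothetical.
const DEFAULT_MODELS = { sonnet: "claude-sonnet-4-6", opus: "claude-opus-4-7" };

function resolveModels(env: Record<string, string | undefined>): { sonnet: string; opus: string } {
  // Blank or whitespace-only overrides fall back to the default.
  const pick = (value: string | undefined, fallback: string): string =>
    value !== undefined && value.trim() !== "" ? value.trim() : fallback;

  return {
    sonnet: pick(env["GRAVEYARD_MODEL_SONNET"], DEFAULT_MODELS.sonnet),
    opus: pick(env["GRAVEYARD_MODEL_OPUS"], DEFAULT_MODELS.opus),
  };
}

// Collapsing both roles to Sonnet, as in the export line above.
console.log(resolveModels({ GRAVEYARD_MODEL_OPUS: "claude-sonnet-4-6" }));
// { sonnet: 'claude-sonnet-4-6', opus: 'claude-sonnet-4-6' }
```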

MCP integration

The MCP server exposes the deterministic operations as tools and the agent system prompts + data schemas as resources. Your AI client orchestrates the pipeline; the server does no LLM work and needs no API key.

Claude Code (`.mcp.json`)

The package ships two bins (graveyard for the CLI and graveyard-mcp for the MCP server), so npx needs the explicit bin name:

{
  "mcpServers": {
    "graveyard": {
      "command": "npx",
      "args": ["-y", "-p", "@mwxb/graveyard", "graveyard-mcp"]
    }
  }
}

(Don't write "args": ["-y", "@mwxb/graveyard"] — that runs the CLI bin, which then waits on stdin and confuses the MCP client.)

Once attached, the server announces a usage guide via the MCP instructions field — clients that surface this (e.g. Claude Desktop, Cursor) will display it inline before any tool calls.

Cursor / Cline / Windsurf

Same command / args — drop into the client's MCP config (location varies by client).

Asking the client to run a full audit

Once attached, ask Claude (or any MCP-capable assistant):

Use the graveyard MCP to audit ~/projects. For each repo:
Call graveyard_discover on the root path.
For each profiled repo, call graveyard_extract_evidence.
Read the four sub-agent prompt resources (graveyard://prompts/{git-forensics,code-health,intent-drift,architecture}) and play those four roles to produce findings matching graveyard://schemas/findings.
Call graveyard_assess_completeness on the findings to get a deterministic confidence floor.
Read graveyard://prompts/fusion to synthesize a verdict.
Read graveyard://prompts/verify for an adversarial self-check.
Call graveyard_generate_report with the resulting verdicts to produce the morning brief.

Minimal first call

The simplest useful trip through the MCP server — discover what's there, then format a brief from verdicts you already have (e.g. from a previous CLI run):

// Step 1: discover all repos under a path
graveyard_discover({ path: "~/projects" })
→ structuredContent: {
    scannedPath: "/Users/me/projects",
    count: 3,
    profiles: [
      { name: "arbitra",      path: "...", languages: ["TypeScript"], hasTests: true, hasCi: true, ... },
      { name: "designscout",  path: "...", languages: ["Python"],     hasTests: true, hasCi: false, ... },
      { name: "prototype-v1", path: "...", languages: ["Python"],     hasTests: false, hasCi: false, ... }
    ]
  }

// Step 2: format verdicts into morning-brief.md (outputDir:null = return markdown, no disk write)
graveyard_generate_report({
  verdicts: [ /* Verdict[] — see graveyard://schemas/verdict */ ],
  scannedPath: "~/projects",
  startedAt: "2026-04-27T08:00:00Z",
  completedAt: "2026-04-27T08:02:22Z",
  outputDir: null
})
→ structuredContent: {
    markdown: "# Graveyard — Morning Brief\n...",
    filePath: null
  }

For the full evidence-extraction + LLM pipeline see docs/sample-output.md §5.

Tools

| Tool | Title | Reads disk | Writes disk | Idempotent | Purpose |
|---|---|---|---|---|---|
| graveyard_discover | Discover git repositories | yes | no | yes | Walk a path; return profiled list of git repos |
| graveyard_extract_evidence | Extract preprocessed evidence | yes | no | yes | For one repo, run all 4 deterministic extractors (git, health, intent, arch) |
| graveyard_assess_completeness | Assess evidence completeness | no | no | yes | Score finding completeness; derive a deterministic confidence (no LLM) |
| graveyard_generate_report | Generate morning-brief.md | no | yes (default ~/.graveyard/runs/<iso>/) | no | Format Verdict[] as markdown morning brief; optional disk write |

Each tool declares an outputSchema and returns structuredContent per MCP spec 2025-06-18.

Resources

| URI | MIME | Role |
|---|---|---|
| graveyard://prompts/git-forensics | text/markdown | Sub-agent: interpret deterministic git activity signals into cadence + hygiene findings |
| graveyard://prompts/code-health | text/markdown | Sub-agent: assess test/dep/secret signals and produce a health score |
| graveyard://prompts/intent-drift | text/markdown | Sub-agent: compare CLAUDE.md / STATE.md / ROADMAP claims against the actual code |
| graveyard://prompts/architecture | text/markdown | Sub-agent: detect god files, AI-generated code smells, and security smells |
| graveyard://prompts/fusion | text/markdown | Synthesise the 4 sub-agent outputs into one Verdict + confidence |
| graveyard://prompts/verify | text/markdown | Adversarial self-check; can override the fused verdict to INSUFFICIENT_DATA when evidence is too thin. Note: the orchestrator may also emit INSUFFICIENT_DATA upstream when sub-agents fail; see Failure handling. |
| graveyard://schemas/evidence | application/schema+json | JSON Schema for PreprocessedEvidence (deterministic extractor output) |
| graveyard://schemas/findings | application/schema+json | JSON Schema for RepoFindings (4-agent bundle; input to fusion) |
| graveyard://schemas/verdict | application/schema+json | JSON Schema for { verdict, runResult } (canonical persisted shape) |

Most MCP clients require resources to be explicitly attached by the user; the prompts above will not flow into the LLM context automatically.

Safety model

Read-only by design. All shell commands route through a strict allowlist — git is the only permitted binary, and it runs in argv form (no shell). Destructive git subcommands (push, pull, commit, merge, rebase, reset, checkout, switch, restore, stash, clean, rm, mv, add, init, clone) are additionally blocked at the runner.
No secret values are captured. The health extractor records only file:line (pattern) for detected secret patterns; raw values never leave the repo.
MCP write surface is minimal and bounded. Only graveyard_generate_report writes, and only under $HOME, $TMPDIR, or /tmp (resolved via realpath() so symlink escapes are blocked). All other tools are read-only.
Repo paths are validated server-side. Non-existent paths, files-not-directories, and (for extract_evidence) non-git directories are rejected before any subprocess runs.
stdout is the JSON-RPC wire. All MCP server diagnostics go through stderr; console.log is forbidden in src/mcp/.
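
The first point in the list above (git only, argv form, destructive subcommands blocked) could be sketched like this. It is an illustration, not the code under test in the repo: `assertAllowed` and `runGit` are hypothetical names, and requiring the subcommand to be the first argument is a simplifying assumption.

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const execFileAsync = promisify(execFile);

// Subcommands this README lists as blocked at the runner.
const BLOCKED_GIT_SUBCOMMANDS = new Set([
  "push", "pull", "commit", "merge", "rebase", "reset", "checkout", "switch",
  "restore", "stash", "clean", "rm", "mv", "add", "init", "clone",
]);

// Hypothetical guard: throws unless the call is a permitted git invocation.
function assertAllowed(binary: string, args: readonly string[]): void {
  if (binary !== "git") throw new Error(`binary not on allowlist: ${binary}`);
  const subcommand = args[0];
  // Simplifying assumption: the subcommand is always the first argument.
  if (subcommand === undefined || subcommand.startsWith("-")) {
    throw new Error("git subcommand must be the first argument");
  }
  if (BLOCKED_GIT_SUBCOMMANDS.has(subcommand)) {
    throw new Error(`git ${subcommand} is blocked`);
  }
}

// Hypothetical runner: argv form via execFile, so no shell is ever spawned.
async function runGit(repoPath: string, args: readonly string[]): Promise<string> {
  assertAllowed("git", args);
  const { stdout } = await execFileAsync("git", [...args], { cwd: repoPath, encoding: "utf8" });
  return stdout;
}

// Example of a read-only call (not executed here):
//   await runGit("/path/to/repo", ["log", "-n", "20", "--format=%h %cI %s"]);
```

Passing arguments as an array to `execFile` means a repo name or path containing shell metacharacters is never interpreted by a shell.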

Direct test coverage for the allowlist lives in tests/preprocessing/runner.test.ts; symlink-escape coverage lives in tests/mcp/util/paths.test.ts.
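
For the write-surface rule, a realpath-based containment check might look like the sketch below. The names `isInside` and `resolveWritableDir` are hypothetical, and the sketch assumes the target directory already exists (realpath fails on missing paths); how the actual server handles not-yet-created directories is not covered here.

```typescript
import { realpathSync } from "node:fs";
import { homedir, tmpdir } from "node:os";
import * as path from "node:path";

// True when `candidate` is `root` itself or lies beneath it.
// Both arguments are expected to be already-resolved absolute paths.
function isInside(root: string, candidate: string): boolean {
  const rel = path.relative(root, candidate);
  const escapes = rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel);
  return !escapes;
}

// Hypothetical helper: returns the resolved directory or throws.
function resolveWritableDir(outputDir: string): string {
  // realpath follows symlinks, so a link pointing outside the roots is caught.
  const resolved = realpathSync(outputDir);
  // os.homedir() and os.tmpdir() stand in for $HOME and $TMPDIR here.
  const roots = [homedir(), tmpdir(), "/tmp"].flatMap((root) => {
    try {
      return [realpathSync(root)];
    } catch {
      return []; // root does not exist on this machine
    }
  });
  if (!roots.some((root) => isInside(root, resolved))) {
    throw new Error(`refusing to write outside allowed roots: ${resolved}`);
  }
  return resolved;
}

console.log(isInside("/tmp", "/tmp/graveyard/run-1")); // true
console.log(isInside("/tmp", "/tmpx/run-1"));          // false
```

Comparing with `path.relative` rather than a plain string prefix avoids treating `/tmpx` as if it were inside `/tmp`.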

Data sent to Anthropic

The CLI's full pipeline transmits deterministic extracted evidence to your configured Anthropic API endpoint over TLS — bring-your-own-key, no third-party intermediary. Per scanned repo, the payload includes:

repo metadata: name, languages, package managers, hasTests / hasCi / hasDeployConfig flags
absolute local paths (repo root and per-file paths)
the last 20 commit subjects, each with ISO date and short SHA (subject text only; commit bodies are not transmitted)
intent doc excerpts truncated at 3000 characters per doc — looked up at CLAUDE.md, STATE.md, ROADMAP.md, TODO.md, docs/STATE.md, docs/ROADMAP.md, .planning/STATE.md, .planning/ROADMAP.md, and ROADMAP_*.md variants in the repo root
file metrics: total source files, estimated total LOC, the top 10 largest source files by line count (path + LOC only)
secret-pattern locations (file:line (pattern)); the matched secret value is never read out of the file and never transmitted
architecture summary: top-level directory structure and detected smells

Raw source file contents are not transmitted. Branch names are not transmitted. The MCP server makes no LLM calls and transmits nothing to Anthropic — only the CLI does.

If a secret accidentally appears in a commit subject or in an intent doc within the 3000-char excerpt window, it will be in the data sent to Anthropic — the secret-pattern scanner only suppresses values found inside source files.
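
To make the location-only reporting concrete, here is a minimal sketch of a scanner that emits `file:line (pattern)` entries and discards the matched text. The pattern names and regular expressions are invented for illustration, and `locateSecrets` is a hypothetical name; Graveyard's real pattern set and scanning method are not described in this README.

```typescript
// Hypothetical pattern names and regexes, for illustration only.
const SECRET_PATTERNS: ReadonlyArray<{ name: string; regex: RegExp }> = [
  { name: "anthropic-api-key", regex: /sk-ant-[A-Za-z0-9_-]{8,}/ },
  { name: "aws-access-key-id", regex: /AKIA[0-9A-Z]{16}/ },
];

// Returns entries shaped like "file:line (pattern)"; the value is never stored.
function locateSecrets(file: string, content: string): string[] {
  const hits: string[] = [];
  content.split("\n").forEach((line, index) => {
    for (const { name, regex } of SECRET_PATTERNS) {
      // Only the location and pattern name are recorded; the match is discarded.
      if (regex.test(line)) hits.push(`${file}:${index + 1} (${name})`);
    }
  });
  return hits;
}

const sample = 'const port = 3000;\nconst key = "sk-ant-EXAMPLEEXAMPLE";\n';
console.log(locateSecrets("src/config.ts", sample));
// [ 'src/config.ts:2 (anthropic-api-key)' ]
```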

Dependency advisories

A fresh install will surface one moderate npm audit advisory: GHSA-p7fg-763f-g4gf on @anthropic-ai/sdk (Insecure Default File Permissions in the Local Filesystem Memory Tool). Graveyard uses only messages.create() and never instantiates the Memory Tool feature, so this advisory is not reachable from this package. The fix is a semver-major SDK bump that will land in a later release. Other transitive advisories from the MCP SDK's HTTP-transport dependency tree (hono, fast-uri, ip-address, express-rate-limit) are kept current via npm audit fix at each release; graveyard's MCP server is stdio-only and does not exercise those code paths.

Failure handling

Sub-agent calls go through a two-step recovery model so a single flaky agent doesn't sink a whole overnight batch:

Single retry on recoverable errors. If a sub-agent call fails with a recoverable class — rate limit (HTTP 429), transient status (408 / 500 / 502 / 503 / 504 / 529), connection timeout, or an output-shape mismatch — the agent is retried once.
Degraded mode if the retry is exhausted. Rather than aborting the run, the orchestrator skips fusion and verify for that repo only and emits a verdict of INSUFFICIENT_DATA with confidence: low, evidenceCompleteness: thin, selfVerifyState: skipped, and the failure surfaced in selfVerifyNotes as [Agent failure] <agent>(<tag>): <message>. The next-action is Re-run after agent failures resolve. Other repos in the same batch are unaffected.

Non-recoverable failures (authentication, missing model access, malformed prompts, schema-incompatible API responses) are not degraded — they still abort the run with a clear error so misconfiguration is loud, not silent.
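
The recovery model above can be sketched as a small wrapper. Only the status codes and the degraded-verdict fields come from this README; `AgentCallError`, `httpError`, `runAgent`, and the `http-<status>` tag format are hypothetical and chosen for illustration.

```typescript
// Status codes this README lists as recoverable.
const RETRYABLE_STATUS = new Set([408, 429, 500, 502, 503, 504, 529]);

// Hypothetical error type carrying a short tag and a recoverability flag.
class AgentCallError extends Error {
  tag: string;
  recoverable: boolean;
  constructor(message: string, tag: string, recoverable: boolean) {
    super(message);
    Object.setPrototypeOf(this, new.target.prototype); // keeps instanceof working on older targets
    this.tag = tag;
    this.recoverable = recoverable;
  }
}

function httpError(status: number, message: string): AgentCallError {
  return new AgentCallError(message, `http-${status}`, RETRYABLE_STATUS.has(status));
}

type Outcome<T> =
  | { kind: "findings"; findings: T }
  | {
      kind: "degraded";
      verdict: "INSUFFICIENT_DATA";
      confidence: "low";
      evidenceCompleteness: "thin";
      selfVerifyState: "skipped";
      selfVerifyNotes: string;
    };

async function runAgent<T>(agent: string, call: () => Promise<T>): Promise<Outcome<T>> {
  let last: AgentCallError | undefined;
  for (let attempt = 1; attempt <= 2; attempt++) { // first try plus a single retry
    try {
      return { kind: "findings", findings: await call() };
    } catch (err) {
      // Non-recoverable failures propagate and abort the whole run.
      if (!(err instanceof AgentCallError) || !err.recoverable) throw err;
      last = err;
    }
  }
  // Retry exhausted: degrade this repo only instead of failing the batch.
  return {
    kind: "degraded",
    verdict: "INSUFFICIENT_DATA",
    confidence: "low",
    evidenceCompleteness: "thin",
    selfVerifyState: "skipped",
    selfVerifyNotes: `[Agent failure] ${agent}(${last?.tag ?? "unknown"}): ${last?.message ?? "unknown"}`,
  };
}

let calls = 0;
runAgent("code-health", async () => {
  calls += 1;
  throw httpError(503, "service unavailable");
}).then((outcome) => console.log(calls, outcome.kind)); // prints: 2 degraded
```

A 401 passed through `httpError` would be flagged non-recoverable and rethrown on the first attempt, matching the "loud, not silent" rule for misconfiguration.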

See docs/sample-output.md §"Scenario: sub-agent failure (degraded mode)" for the exact format users see in the brief.

Output artifacts

After a full CLI run, ~/.graveyard/runs/<iso-timestamp>/ contains (use --output-dir <dir> to redirect to a custom root, or --no-persist to skip disk writes entirely — neither flag affects --preprocess-only):

| File | Type | Purpose |
|---|---|---|
| morning-brief.md | Markdown | Human-readable portfolio summary + per-repo verdict cards |
| verdicts.json | JSON | Typed Verdict[] for tooling / piping |
| latest (symlink) | Pointer | Always points at the newest run, useful for demo scripts |

A complete realistic example is in docs/sample-output.md.

Breaking change (0.3.0): verdicts.json replaces selfVerifyPassed: boolean with selfVerifyState: "verified" | "challenged" | "skipped" and adds originalVerdict (the pre-override fusion verdict, or null). Pre-0.3.0 run files will not parse against the current schema.

Migration from pre-0.3.0 artifacts

Old verdicts.json files written by Graveyard 0.2.x cannot be passed back to graveyard_generate_report — the tool will return an actionable error explaining the schema change. The simplest fix is to re-run graveyard scan to regenerate. If you choose to hand-migrate, the closest mechanical mapping is: drop selfVerifyPassed, add selfVerifyState: "verified" (if it was true) or "challenged" (if false), and add originalVerdict: null. selfVerifyNotes is unchanged. This mapping is approximate — Graveyard 0.2.x did not track override history, so originalVerdict is genuinely lost.
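
If you do hand-migrate, the mapping described above amounts to something like the following. This is a sketch, not a supported tool: `migrateLegacyVerdict` is a hypothetical name, the `repo` and `verdict` fields in the example object are placeholders, and the result has not been validated against graveyard://schemas/verdict.

```typescript
type JsonObject = Record<string, unknown>;

// Hypothetical helper applying the approximate 0.2.x -> 0.3.0 mapping to one verdict.
function migrateLegacyVerdict(legacy: JsonObject): JsonObject {
  // Drop selfVerifyPassed; every other field (including selfVerifyNotes) is kept as-is.
  const { selfVerifyPassed, ...rest } = legacy;
  if (typeof selfVerifyPassed !== "boolean") {
    throw new TypeError("expected a pre-0.3.0 verdict with a boolean selfVerifyPassed");
  }
  return {
    ...rest,
    selfVerifyState: selfVerifyPassed ? "verified" : "challenged",
    originalVerdict: null, // override history was never recorded in 0.2.x
  };
}

// Placeholder input; real 0.2.x verdicts carry more fields than shown here.
const legacyVerdict = { repo: "prototype-v1", verdict: "KILL", selfVerifyPassed: false, selfVerifyNotes: "" };
console.log(migrateLegacyVerdict(legacyVerdict));
```

Because `originalVerdict` is always set to `null`, a migrated file cannot distinguish a verdict that was overridden from one that was not; re-running the scan remains the more reliable route.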

Development

npm install
npm run build         # tsc → dist/
npm test              # vitest run (one-shot)
npm run test:watch    # vitest TUI
npm run typecheck     # tsc --noEmit
npm run dev           # CLI in dev mode (tsx)
npm run dev:mcp       # MCP server in dev mode (tsx; blocks on stdio)

Tests cover discovery, preprocessing, runner allowlist, path validation, MCP server registration, mocked-orchestrator pipeline, and end-to-end CLI smoke. See DESIGN.md for architecture details.

Scope

Designed for

Local-only execution. Reads your local clone only — no GitHub/GitLab API calls, no remote branch inspection.
English-only reasoning. Prompts assume English repo content; verdicts on non-English text may be lower quality.
Single-user audits. No multi-user state or shared run history.
One-shot by default. Each run is independent; comparison across runs is on the roadmap, not built-in today.
MCP delegates orchestration. The CLI guarantees the full pipeline; the MCP server exposes deterministic tools and lets the client LLM run the recipe.

Not on the roadmap

Hosted/multi-tenant deployment
Real-time collaboration features
Cross-repo dependency analysis

On the roadmap

Snapshot tests for the morning brief format
Static HTML dashboard alongside the markdown brief
Run-over-run drift detection (compare against a previous audit)
Optional GitHub integration to surface stale PRs and abandoned issues per repo

Releases

See CHANGELOG.md for a per-release breakdown.

License

MIT — Moran W.