vigiles
v5.1.0
Published
Lint & test the harness your AI agent runs on — verify the references in your CLAUDE.md / AGENTS.md and test that your hooks and skills actually work.
Maintainers
Readme
Agent = Model + Harness. You'd never ship an app without a linter and a test
suite — yet the harness steering your agent runs on vibes. vigiles[^name] is the
deterministic layer for it, and does two independent things — adopt either, or both:
| | | | ----------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | | 🔎 Lint | Every file path, script, code symbol, and linter rule your CLAUDE.md cites is checked against reality — so a renamed file or a disabled rule can't silently mislead the agent. → | | 🧪 Test | Hooks, skills, and subagents are code. vigiles tests they do their job — and almost all of it is deterministic, no API key; the real-model evals run on your Claude subscription, not metered tokens. → |
Pick the one that hurts today. Works with Claude Code and Codex
(vigiles/codex), and you can
teach it your own harness.
Quick start
Paste into Claude Code or Codex:
Set up vigiles in this repo with good defaults (lint + test, non-interactive).
Verify my CLAUDE.md / AGENTS.md references and show me what's stale, then write
and run a harness test for one of my hooks or skills. Ask me first before gating
it in CI, adding a real-model eval, or enforcing strictly (--strict).Or do it yourself:
npx vigiles init # sets up lint + test: spec + harness test + CI + pluginIt's interactive in a terminal and non-interactive for agents/CI (or with
--yes), so "set up vigiles" from a Claude Code / Codex prompt Just Works — and
it installs a model-invocable test-harness skill, so afterward you can just
tell your agent "test my skills" and it picks the tier and writes the test.
Both lint and test by default; scope with
--lint/--test(one or both).Adds
vigilesto yourdevDependencies.Installs the Claude Code plugin (skills + hooks) via the marketplace — globally, never vendored into your repo.
Wires CI as a
zernie/vigiles@v1workflow (a composite over the same CLI):- uses: actions/checkout@v4 - uses: zernie/vigiles@v1 # lints by default; posts a sticky PR comment + a `valid` output
Prefer to write tests yourself? They can be JS or TS
(*.harness.{mjs,ts}) — run them with npx vigiles test.
① Lint — your CLAUDE.md lies to your agent
Your CLAUDE.md points the agent at src/auth/login.ts and tells it to run
npm run check. But the file moved to src/auth/session.ts six commits ago, and
the script was renamed. The agent trusts the stale claim and acts on fiction.
npx vigiles lint resolves every reference against reality:
CLAUDE.md:
✗ src/auth/login.ts — no such file (renamed or moved?)
✗ npm run check — not in package.json. Did you mean: "check:types"?
✓ @typescript-eslint/no-floating-promises — exists and enabled in eslint configFile paths, scripts, and code symbols — plus linter rules across 7 catalogs
(the rule exists and is enabled). Start with one inline comment, no new files;
step up to a typed .spec.ts (compiled to CLAUDE.md, compiler-grade) when you want
it. Full guide →
Same cross-reference, any plugin. npx vigiles scan checks a plugin's
contracts — every subagent tool, mcp__server__tool, mcp_tool hook, hook
event, and script path actually exists and resolves, not just parses (valid
YAML ≠ a tool that's real). A superset of Anthropic's claude plugin validate,
no key. Audit any plugin →
② Test — does your harness do its job?
A hook can be wired wrong. A skill's description can fail to trigger — or hijack unrelated prompts. Injected context can never reach the model. All of it passes a naive "did it run?" check. vigiles tests the assembled harness for real:
import { runHook } from "vigiles/testing";
const r = runHook(guard, {
hook_event_name: "PreToolUse",
tool_name: "Bash",
tool_input: { command: "git commit --no-verify" },
});
assert(r.blocked); // a red ✗ means your guard silently lets it throughIt goes well past "did it fire?":
- Hooks block what they must —
runHook, or the real agent CLI viarunHarnessTest. - Skills trigger on the right prompts and stay quiet on the wrong ones — recall and precision (
measureTriggerRate). - Behaviour is good — score a skill's output directly, or A/B it on-vs-off for the real lift over no-skill (
measure/runEval, with significance testing). - Safety holds — the agent didn't push to the wrong branch or hit a paid API;
interceptToolscatches the attempt so the side effect never happens.
The eval you can actually afford. Almost every tier runs with no model and
no API key — milliseconds, on every commit. The rest drive your own claude CLI:
| | Runs on | Cost | | ---------------------- | ----------------------- | ------------------------------------------- | | promptfoo, DeepEval, … | metered API SDK | billed per token, every run | | vigiles | your Claude Pro/Max sub | $0 extra — and most tiers need no model |
That's why you can eval your harness on every change, not just once. How it works → · Why it's affordable → · Safety model →
More
- Plugin health leaderboard → — point
scanat a marketplace (e.g.wshobson/agents) and it ranks every plugin by structural health (0–100, A–F), worst issues first — still no key. Add--triggerfor the model-gated column: do the skills actually fire? - CLI & GitHub Action → — every command, the Action (inputs / output / versioning), the Claude Code plugin, and the
lintrules. - Skills → — consumer skills installed as a Claude Code plugin:
/plugin marketplace add zernie/vigilesthen/plugin install vigiles@vigiles(or letvigiles initdo it). The model-invocable ones (test-harness,strengthen,edit-spec) fire on their own — ask "test my skills", "strengthen my rules", or "add a rule to CLAUDE.md" and the agent reaches for them;migrate-to-specandlinter-docsare user-invoked. - Docs index → · Research → · Related tools → (ast-grep, Dependency Cruiser, Ruler, rulesync).
- Companion to Feedback Loop Is All You Need.
License
[^name]: vigiles — the watchmen of ancient Rome, who guarded the city (and fought its fires) by night. Quis custodiet ipsos custodes? — "who watches the watchmen?" (Juvenal, Satire VI).
