ruleprobe-ai
v1.7.0
Test whether your repo AI instructions actually survive execution.
RuleProbe
AI coding rules are documentation until you test them.
RuleProbe is a CLI that turns AI instruction files (CLAUDE.md, AGENTS.md, .cursor/rules, Copilot instructions) into executable compliance tests. It extracts rules, generates disposable sandbox scenarios, runs an AI provider against each one, and produces a scored JSON/Markdown/HTML report.
What it tests
| Signal | How |
|---|---|
| Package manager compliance | Detects npm/yarn when pnpm is required, etc. |
| Forbidden commands | Checks that blocked commands (git commit, pnpm test) are not invoked |
| Required commands | Verifies that required validation steps run before the final response |
| Protected file changes | Catches writes to src/generated/**, package.json, etc. |
| Forbidden/required code patterns | Inspects changed file content for any, class, Uint8Array, etc. |
| Final-answer phrasing | Checks that response text contains/excludes required phrases |
Does not measure: full multi-turn workflow replay, subjective code quality, or "is this a good rule".
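As a rough illustration of the package-manager signal in the table above, the sketch below flags commands that invoke a package manager other than the required one. The helper name `findPackageManagerViolations` and the idea of matching on the first token of each command are assumptions made for illustration; RuleProbe's actual detection logic may differ.

```typescript
// Hypothetical helper: not part of the ruleprobe-ai API.
type PackageManager = "npm" | "yarn" | "pnpm";

const KNOWN_MANAGERS: readonly PackageManager[] = ["npm", "yarn", "pnpm"];

// Returns the commands whose first token is a known package manager
// other than the one the instruction file requires.
function findPackageManagerViolations(
  commands: string[],
  required: PackageManager,
): string[] {
  return commands.filter((command) => {
    const firstToken = command.trim().split(/\s+/)[0] ?? "";
    const isManager = KNOWN_MANAGERS.some((manager) => manager === firstToken);
    return isManager && firstToken !== required;
  });
}

const violations = findPackageManagerViolations(
  ["pnpm install", "npm run build", "git status", "yarn add zod"],
  "pnpm",
);
console.log(violations); // [ 'npm run build', 'yarn add zod' ]
```

Matching only the first token keeps the check simple, but it would miss wrapped invocations such as `npx` or commands chained with `&&`.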
Quick start
# Zero install — try it now (no API key needed)
npx ruleprobe-ai run examples/strict --demo
# Or install globally
npm install -g ruleprobe-ai
# pnpm add -g ruleprobe-ai
# Realistic demo: PASS/FAIL mix, no API key
ruleprobe run examples/strict --demo
# Real provider (Gemini)
GEMINI_API_KEY=... ruleprobe run . --provider gemini --extractor hybrid --fail-below 70
From source:
git clone https://github.com/canblmz1/ruleProb
cd ruleProb
pnpm install && pnpm build
pnpm dev run examples/strict --provider mock
Commands
| Command | Description |
|---|---|
| ruleprobe run [dir] | Run all compliance tests and write reports |
| ruleprobe list-rules [dir] | Preview extracted rules (no sandbox); use --show-scenarios to preview generated test scenarios |
| ruleprobe analyze [dir] | AI extraction only — emit JSON candidates, no evaluation |
| ruleprobe compare [dir] | Deterministic vs hybrid extraction diff, or branch vs base ref |
| ruleprobe doctor | Local diagnostics: Node, pnpm, git, claude, dist, env keys |
| ruleprobe providers | Show provider capability matrix |
| ruleprobe clear-cache | Wipe AI extraction cache at .ruleprobe/cache/ |
| ruleprobe init [dir] | Write a starter ruleprobe.config.json; use --from-claude to auto-detect instruction files |
| ruleprobe report | Show latest report path |
| ruleprobe badge | Generate score and trend SVG badges |
Common flags
--provider <name> mock | dry-run | openrouter | gemini | claude-code | opencode-go
--providers <list> Comma-separated providers for side-by-side comparison
--extractor <type> deterministic | ai-assisted | hybrid
--model <model> Override model for the extraction provider
--fail-below <score> Exit 1 if score < N (default: off)
--debug-extractor Print per-file extraction diagnostics
--no-cache Disable AI extraction cache for this run
--provider-timeout-ms <ms> Override the default provider timeout
--keep-sandbox Do not delete sandbox after run
--watch Watch instruction files and re-run on changes
--badge Generate SVG score and trend badges after run
Examples
| Example | Description | Rules |
|---|---|---|
| examples/basic | Minimal starter — package manager + one forbidden file | 6 |
| examples/minimal | 3-rule zero-friction intro (package manager, forbidden command, required command) | 3 |
| examples/strict | Full-coverage showcase — all rule categories, deliberate failures | 17 |
# Try the strict example (no API key)
npx ruleprobe-ai list-rules examples/strict
npx ruleprobe-ai run examples/strict --provider mock --fail-below 0
Report output
Every run writes .ruleprobe/report.{json,md,html}. The Markdown report opens with a shareable proof block:
RuleProbe Compliance Report
Provider: gemini Extractor: hybrid
Score: 85/100 (severity-weighted: 78/100)
Rules tested: 12 PASS=9 PARTIAL=1 FAIL=2 SKIPPED=0
Instruction files: CLAUDE.md, AGENTS.md
Top issues:
- [FAIL] (high/forbidden_command) Forbidden command boundary: git commit
- [FAIL] (medium/required_command) Required validation command: pnpm typecheck
Known limitations:
- Results are based on generated sandbox scenarios, not a replay of the full repository workflow.
Report: .ruleprobe/report.md
The severity-weighted score weights each rule by its severity: high=3, medium=2, low=1.
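To make the severity weighting concrete, here is a minimal sketch of how such a score could be computed. Only the 3/2/1 weights come from this README. The `RuleResult` shape, the half credit for PARTIAL, and the exclusion of SKIPPED rules are assumptions, so the real calculation may differ.

```typescript
// Weights come from the README; everything else here is an assumption.
type Severity = "high" | "medium" | "low";
type Status = "PASS" | "PARTIAL" | "FAIL" | "SKIPPED";

interface RuleResult {
  severity: Severity;
  status: Status;
}

const SEVERITY_WEIGHT: Record<Severity, number> = { high: 3, medium: 2, low: 1 };

// Assumed credit per status: PARTIAL earns half credit, SKIPPED is ignored.
const STATUS_CREDIT: Record<Exclude<Status, "SKIPPED">, number> = {
  PASS: 1,
  PARTIAL: 0.5,
  FAIL: 0,
};

function severityWeightedScore(results: RuleResult[]): number {
  let earned = 0;
  let possible = 0;
  for (const result of results) {
    if (result.status === "SKIPPED") continue;
    const weight = SEVERITY_WEIGHT[result.severity];
    earned += weight * STATUS_CREDIT[result.status];
    possible += weight;
  }
  // With nothing scored, report 0 rather than dividing by zero.
  return possible === 0 ? 0 : Math.round((earned / possible) * 100);
}

const example: RuleResult[] = [
  { severity: "high", status: "FAIL" },
  { severity: "medium", status: "PASS" },
  { severity: "low", status: "PASS" },
];
console.log(severityWeightedScore(example)); // 50
```

In this example two of three rules pass, so an unweighted score would be 67, but the single high-severity failure pulls the weighted score down to 50.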
Interactive HTML dashboard
The HTML report is now a fully interactive dashboard powered by Chart.js:
- Doughnut chart — overall pass/partial/fail/skipped distribution
- Stacked bar chart — results broken down by category
- Search & filter — filter results by keyword, status, or severity
- Expand/collapse all — quickly navigate large result sets
- Score trend line — when history is available, shows score evolution over time
Open .ruleprobe/report.html in your browser after any run.
Providers
ruleprobe providers
| Provider | Extraction | Runtime | Notes |
|---|---|---|---|
| mock | — | Simulated (mixed PASS/FAIL/SKIPPED) | CI smoke; not real model behavior |
| dry-run | — | None | Inspects flow only |
| openrouter | Yes | Sandboxed action bridge | Quality depends on model and quota |
| gemini | Yes | Sandboxed action bridge | JSON-mode extraction + runtime |
| opencode-go | Yes | Experimental action bridge | Requires OPENCODE_GO_API_KEY + OPENCODE_GO_MODEL |
| claude-code | — | Real local CLI | Inferred from transcript; not comparable with action-bridge providers |
Full capability matrix: docs/provider-capabilities.md
Environment variables
Copy .env.example and fill in the keys you need:
cp .env.example .env
OPENROUTER_API_KEY=
OPENROUTER_MODEL=openrouter/free
GEMINI_API_KEY=
GEMINI_MODEL=gemini-2.5-flash
OPENCODE_GO_API_KEY=
OPENCODE_GO_MODEL=opencode-go/kimi-k2.6
OPENCODE_GO_AUTH_HEADER_MODE=bearer # or x-api-key
Custom Providers
Implement the Provider interface to connect any AI model:
import type { Provider, ProviderInput, ProviderResult } from 'ruleprobe-ai';
export class MyProvider implements Provider {
  name = 'my-provider';

  async run(input: ProviderInput): Promise<ProviderResult> {
    // Call your model here and return a structured ProviderResult.
    throw new Error('MyProvider.run is not implemented yet');
  }
}
Full guide: docs/custom-providers.md
Extraction modes
deterministic — regex/heuristic extraction from instruction text. Fast, no API key needed, works well for common patterns.
ai-assisted — sends instruction files to the configured provider and asks it to classify rules as structured JSON. Requires an API-capable provider (gemini, openrouter, opencode-go).
hybrid — runs both and merges, deduplicating by normalized signature. Recommended when you have an API key.
AI extraction results are hash-keyed and cached at .ruleprobe/cache/. Use --no-cache or ruleprobe clear-cache to bust it.
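The hybrid merge described above can be pictured with the sketch below, which deduplicates two rule lists by a normalized signature. The `ExtractedRule` shape and the normalization steps (lowercasing, collapsing whitespace) are assumptions for illustration, not the extractor's actual types or rules.

```typescript
// Hypothetical rule shape; the real extractor output may differ.
interface ExtractedRule {
  category: string;
  value: string;
  source: "deterministic" | "ai-assisted";
}

// Assumed normalization: lowercase, collapse whitespace, trim.
function signatureOf(rule: ExtractedRule): string {
  const normalizedValue = rule.value.toLowerCase().replace(/\s+/g, " ").trim();
  return `${rule.category}:${normalizedValue}`;
}

// Merge both rule lists, keeping the first rule seen for each signature.
function mergeRules(
  deterministic: ExtractedRule[],
  aiAssisted: ExtractedRule[],
): ExtractedRule[] {
  const bySignature = new Map<string, ExtractedRule>();
  for (const rule of [...deterministic, ...aiAssisted]) {
    const signature = signatureOf(rule);
    if (!bySignature.has(signature)) {
      bySignature.set(signature, rule);
    }
  }
  return Array.from(bySignature.values());
}

const merged = mergeRules(
  [{ category: "forbidden_command", value: "git commit", source: "deterministic" }],
  [
    { category: "forbidden_command", value: "Git  Commit ", source: "ai-assisted" },
    { category: "required_command", value: "pnpm typecheck", source: "ai-assisted" },
  ],
);
console.log(merged.length); // 2
```

Listing the deterministic rules first means they win when both extractors find the same rule, which is one possible tie-breaking choice among several.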
Comparing extraction modes / branches
# Deterministic vs hybrid for the same file
ruleprobe compare . --provider gemini
# Branch vs base ref (useful in CI to detect rule regressions)
ruleprobe compare . --base origin/main --extractor hybrid
Multi-provider comparison
Compare how different AI providers perform against the same rule set in a single run:
ruleprobe run . --providers mock,gemini --report-dir .ruleprobe-compare
This generates a side-by-side Markdown comparison report (e.g., .ruleprobe-compare/comparison-{id}.md) showing which scenarios each provider passes or fails.
Watch mode
Automatically re-run tests when instruction files change:
ruleprobe run . --provider gemini --watch
RuleProbe watches the directories containing your instructionFiles and triggers a full re-run on any change.
Score history & trends
RuleProbe automatically tracks scores across runs in .ruleprobe/history.json. The HTML report renders a trend line chart when history exists, and the CLI prints a summary of best, worst, and average scores.
History entries include:
- timestamp, score, weighted score
- provider and extractor used
- git branch and commit (when available)
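The best, worst, and average summary mentioned above could be derived from history entries roughly as follows. The `HistoryEntry` field names are guesses based on the list above, not the actual schema of .ruleprobe/history.json.

```typescript
// Hypothetical entry shape based on the fields listed above.
interface HistoryEntry {
  timestamp: string;
  score: number;
  weightedScore: number;
  provider: string;
  extractor: string;
  branch?: string;
  commit?: string;
}

interface HistorySummary {
  best: number;
  worst: number;
  average: number;
}

// Returns null when there is no history to summarize.
function summarizeHistory(entries: HistoryEntry[]): HistorySummary | null {
  if (entries.length === 0) return null;
  const scores = entries.map((entry) => entry.score);
  const total = scores.reduce((sum, score) => sum + score, 0);
  return {
    best: Math.max(...scores),
    worst: Math.min(...scores),
    // Round the average to one decimal place.
    average: Math.round((total / scores.length) * 10) / 10,
  };
}

const history: HistoryEntry[] = [
  { timestamp: "2025-01-01T00:00:00Z", score: 70, weightedScore: 64, provider: "gemini", extractor: "hybrid" },
  { timestamp: "2025-01-02T00:00:00Z", score: 85, weightedScore: 78, provider: "gemini", extractor: "hybrid" },
  { timestamp: "2025-01-03T00:00:00Z", score: 78, weightedScore: 71, provider: "gemini", extractor: "hybrid" },
];
console.log(summarizeHistory(history)); // { best: 85, worst: 70, average: 77.7 }
```

The sample entries are invented purely to exercise the function.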
Badge generation
Generate SVG badges for your README or CI dashboards:
# Auto-generate after a run
ruleprobe run . --provider gemini --badge
# Or generate manually
ruleprobe badge --score 85 --weighted-score 78
Outputs:
- .ruleprobe/badge-score.svg: current score badge
- .ruleprobe/badge-trend.svg: trend direction badge (up/down/stable)
- .ruleprobe/badge.json: shields.io endpoint JSON (auto-generated on every run/badge)
Use them in your README:
Shields.io dynamic badge
Host .ruleprobe/badge.json at a public URL (e.g. commit it, or serve via GitHub Pages), then use the shields.io endpoint:
[](https://github.com/YOUR/REPO)
The JSON format follows the shields.io endpoint spec. Fields: schemaVersion, label, message, color, style.
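For reference, a payload with those fields could be built as in the sketch below. The field names match the shields.io endpoint schema, but the label text, the color thresholds, and the `buildBadge` helper are assumptions and may not match what RuleProbe actually writes.

```typescript
// Field names follow the shields.io endpoint schema; values are assumed.
interface ShieldsEndpoint {
  schemaVersion: 1;
  label: string;
  message: string;
  color: string;
  style: string;
}

// Hypothetical helper: maps a 0-100 score to an endpoint payload.
function buildBadge(score: number): ShieldsEndpoint {
  // Assumed thresholds: 80+ is green, 60-79 is yellow, below 60 is red.
  const color = score >= 80 ? "brightgreen" : score >= 60 ? "yellow" : "red";
  return {
    schemaVersion: 1,
    label: "ruleprobe",
    message: `${score}/100`,
    color,
    style: "flat",
  };
}

console.log(JSON.stringify(buildBadge(85), null, 2));
```

Shields.io fetches this JSON from the public URL on each badge request, so the badge follows whatever score was last written to the hosted file.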
CI integration
Official GitHub Action (zero-config)
- uses: canblmz1/[email protected]
  with:
    provider: mock      # no API key needed
    fail-below: '70'    # block PR if score drops below 70
With Gemini for real evaluation:
- uses: canblmz1/[email protected]
  with:
    provider: gemini
    extractor: hybrid
    fail-below: '70'
  env:
    GEMINI_API_KEY: ${{ secrets.GEMINI_API_KEY }}
Full reference: docs/github-actions.md
Pre-commit hook
Block commits that would drop compliance below your threshold.
Husky:
# .husky/pre-commit
npx ruleprobe-ai run . --provider dry-run --extractor deterministic --fail-below 0
lefthook (lefthook.yml):
pre-commit:
  commands:
    ruleprobe:
      run: npx ruleprobe-ai run . --provider dry-run --extractor deterministic --fail-below 0
Full examples: examples/hooks/
VS Code Integration
RuleProbe writes .ruleprobe/report.sarif after each run. Open it with the SARIF Viewer extension to see compliance failures as inline squiggles on your instruction files.
# Install recommended extension (prompted automatically if you clone this repo)
# Or search VS Code extensions: MS-SarifVSCode.sarif-viewer
Run via the built-in VS Code task (Ctrl+Shift+P → Tasks: Run Task → RuleProbe: Run (mock)), or open the SARIF file manually:
Command Palette → SARIF: Open SARIF file → .ruleprobe/report.sarif
See docs/extensions.md for full setup.
Configuration
ruleprobe.config.json (auto-generated by ruleprobe init):
{
  "provider": "mock",
  "extractor": "deterministic",
  "instructionFiles": [
    "CLAUDE.md",
    "AGENTS.md",
    ".cursor/rules/*.mdc",
    ".github/copilot-instructions.md"
  ],
  "reportDir": ".ruleprobe",
  "failBelow": 70,
  "keepSandbox": false
}
Safety model
RuleProbe creates disposable sandboxes and blocks:
- Path traversal and absolute-path writes
- Writes to .git, .ruleprobe, node_modules
- Destructive shell commands (rm, git reset, git push, package publishes)
- Long-running commands via action timeouts
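The path-related checks in the list above might look something like the following sketch. The protected directory names come from this README. The `isWriteAllowed` helper and its exact rules are an assumed approach, not RuleProbe's implementation.

```typescript
// Directory names come from the list above; the checks are an assumed approach.
const PROTECTED_DIRS: readonly string[] = [".git", ".ruleprobe", "node_modules"];

// Hypothetical guard: decides whether a sandbox-relative path may be written.
function isWriteAllowed(relativePath: string): boolean {
  const unified = relativePath.replace(/\\/g, "/");
  // Reject POSIX absolute paths and Windows drive paths such as C:/...
  if (/^(\/|[A-Za-z]:)/.test(unified)) return false;

  const resolved: string[] = [];
  for (const segment of unified.split("/")) {
    if (segment === "" || segment === ".") continue;
    if (segment === "..") {
      // Stepping above the sandbox root is path traversal.
      if (resolved.length === 0) return false;
      resolved.pop();
      continue;
    }
    resolved.push(segment);
  }
  // An empty result refers to the sandbox root itself, not a file.
  if (resolved.length === 0) return false;
  // Block any path that passes through a protected directory.
  return !resolved.some((segment) =>
    PROTECTED_DIRS.some((dir) => dir === segment),
  );
}

console.log(isWriteAllowed("src/index.ts")); // true
console.log(isWriteAllowed("../outside.txt")); // false
console.log(isWriteAllowed("src/../node_modules/pkg/index.js")); // false
```

Resolving `..` segments before checking protected directories matters: a path like `src/../node_modules/...` would slip past a check that only looked at the first segment.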
API key and data privacy: When using real providers (gemini, openrouter, opencode-go), your instruction file contents are sent to the provider API for extraction and/or scenario evaluation. Do not include secrets, personal data, or proprietary information in your instruction files when using third-party providers.
Recommended: add .ruleprobe/ to your .gitignore to avoid committing reports, cache, badges, and history files that may contain sensitive rule details:
echo '.ruleprobe/' >> .gitignore
Use real providers only with repositories and credentials you are comfortable testing.
Troubleshooting
API key not found: Ensure you copied .env.example to .env and filled in the required keys. Run ruleprobe doctor to verify key presence.
Provider returns no rules: Try --extractor deterministic first to verify extraction works, then add --debug-extractor for verbose output.
Typecheck fails after install: Ensure you are using pnpm (not npm or yarn). Run pnpm install --frozen-lockfile.
Windows path issues: RuleProbe normalizes paths internally. If you see path separator issues in sandbox output, report them with the full error message and your OS/Node version.
Score below threshold / exit code 1: Use --fail-below 0 to disable the threshold check while debugging.
Development
pnpm install
pnpm build # tsup ESM + DTS
pnpm test # vitest (175 tests)
pnpm typecheck # tsc --noEmit
pnpm dev doctor # local diagnostics
# Benchmark extraction corpus
pnpm dev benchmark --fixtures-only
See CONTRIBUTING.md for contribution guidelines and ROADMAP.md for planned work.
License
MIT — see LICENSE
