ruleprobe-ai

v2.12.0

Published

2 months ago

Test whether your repo AI instructions actually survive execution.

RuleProbe

Your AI coding rules are documentation until you test them.

RuleProbe is a CLI that turns AI instruction files (CLAUDE.md, AGENTS.md, .cursor/rules, Copilot instructions) into executable compliance tests. It extracts rules, generates disposable sandbox scenarios, runs an AI provider against each one, and produces a scored JSON/Markdown/HTML report.

What it tests

| Signal | How |
|---|---|
| Package manager compliance | Detects npm/yarn when pnpm is required, etc. |
| Forbidden commands | Checks that blocked commands (git commit, pnpm test) are not invoked |
| Required commands | Verifies that required validation steps run before the final response |
| Protected file changes | Catches writes to src/generated/**, package.json, etc. |
| Forbidden/required code patterns | Inspects changed file content for any, class, Uint8Array, etc. |
| Final-answer phrasing | Checks that response text contains/excludes required phrases |

Does not measure: full multi-turn workflow replay, subjective code quality, or "is this a good rule".
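
The first two signals in the table can be pictured as simple checks over the list of commands an agent invoked in the sandbox. The sketch below is illustrative only: the `Finding` shape and both function names are hypothetical, and RuleProbe's real checks are not shown in this README.

```typescript
// Hypothetical shape; RuleProbe's internal types are not documented here.
type Finding = { rule: string; command: string };

const PACKAGE_MANAGERS = ["npm", "yarn", "pnpm", "bun"];

// Flags any invoked command whose first word is a package manager other than the required one.
function checkPackageManager(required: string, invoked: string[]): Finding[] {
  return invoked
    .filter((cmd) => {
      const tool = cmd.trim().split(/\s+/)[0];
      return PACKAGE_MANAGERS.includes(tool) && tool !== required;
    })
    .map((command) => ({ rule: `use ${required}`, command }));
}

// Flags any invoked command that equals, or starts with, a forbidden command.
function checkForbidden(forbidden: string[], invoked: string[]): Finding[] {
  const findings: Finding[] = [];
  for (const command of invoked) {
    const normalized = command.trim().replace(/\s+/g, " ");
    for (const blocked of forbidden) {
      if (normalized === blocked || normalized.startsWith(blocked + " ")) {
        findings.push({ rule: `never run ${blocked}`, command });
      }
    }
  }
  return findings;
}

const invoked = ["npm install", "pnpm typecheck", "git commit -m wip"];
console.log(checkPackageManager("pnpm", invoked));
console.log(checkForbidden(["git commit", "pnpm test"], invoked));
```

Here `npm install` is flagged because pnpm is required, and `git commit -m wip` is flagged as a forbidden command, while `pnpm typecheck` passes both checks.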

Quick start

# Zero install — try it now (no API key needed)
npx ruleprobe-ai run examples/strict --demo

# Or install globally
npm install -g ruleprobe-ai
# pnpm add -g ruleprobe-ai

# Realistic demo: PASS/FAIL mix, no API key
ruleprobe run examples/strict --demo

# Real provider (Gemini)
GEMINI_API_KEY=... ruleprobe run . --provider gemini --extractor hybrid --fail-below 70

From source:

git clone https://github.com/canblmz1/ruleProb
cd ruleProb
pnpm install && pnpm build
pnpm dev run examples/strict --provider mock

Commands

| Command | Description |
|---|---|
| ruleprobe run [dir] | Run all compliance tests and write reports |
| ruleprobe list-rules [dir] | Preview extracted rules (no sandbox); use --show-scenarios to preview generated test scenarios |
| ruleprobe analyze [dir] | AI extraction only: emit JSON candidates, no evaluation |
| ruleprobe compare [dir] | Deterministic vs hybrid extraction diff, or branch vs base ref |
| ruleprobe doctor | Local diagnostics: Node, pnpm, git, claude, dist, env keys |
| ruleprobe providers | Show provider capability matrix |
| ruleprobe clear-cache | Wipe AI extraction cache at .ruleprobe/cache/ |
| ruleprobe init [dir] | Write a starter ruleprobe.config.json; use --from-claude to auto-detect instruction files |
| ruleprobe report | Show latest report path |
| ruleprobe badge | Generate score and trend SVG badges |

Common flags

--provider <name>           mock | dry-run | openrouter | gemini | claude-code | opencode-go
--providers <list>          Comma-separated providers for side-by-side comparison
--extractor <type>          deterministic | ai-assisted | hybrid
--model <model>             Override model for the extraction provider
--fail-below <score>        Exit with code 1 if the score is below <score> (default: off)
--debug-extractor           Print per-file extraction diagnostics
--no-cache                  Disable AI extraction cache for this run
--provider-timeout-ms <ms>  Override the default provider timeout
--keep-sandbox              Do not delete sandbox after run
--watch                     Watch instruction files and re-run on changes
--badge                     Generate SVG score and trend badges after run
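
The --fail-below behavior described in the flag list comes down to a strict less-than comparison. This is a minimal sketch of that rule, not RuleProbe's actual code; `exitCodeFor` is a hypothetical name.

```typescript
// Assumption: when the flag is omitted, no threshold is applied ("default: off").
function exitCodeFor(score: number, failBelow?: number): 0 | 1 {
  if (failBelow === undefined) return 0;
  // Strict "<": a score equal to the threshold still passes.
  return score < failBelow ? 1 : 0;
}

console.log(exitCodeFor(85, 70)); // 0
console.log(exitCodeFor(69, 70)); // 1
console.log(exitCodeFor(70, 70)); // 0
console.log(exitCodeFor(0, 0));   // 0
```

Because the comparison is strict, a threshold of 0 can never trip, which is why --fail-below 0 is a convenient way to disable the check.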

Examples

| Example | Description | Rules |
|---|---|---|
| examples/basic | Minimal starter: package manager + one forbidden file | 6 |
| examples/minimal | 3-rule zero-friction intro (package manager, forbidden command, required command) | 3 |
| examples/strict | Full-coverage showcase: all rule categories, deliberate failures | 17 |
| examples/nextjs-app | Realistic Next.js project: pnpm, typecheck, file protection, code patterns | 14 |
| examples/rust-project | Rust/cargo rules: clippy, fmt, file protection, no unwrap | 11 |
| examples/security-focused | Security-heavy enforcement: audit, no eval, no hardcoded secrets | 10 |
| examples/unverifiable | Shows unverifiable rule detection | 3 testable + 5 unverifiable |

# Try the strict example (no API key)
npx ruleprobe-ai list-rules examples/strict
npx ruleprobe-ai run examples/strict --provider mock --fail-below 0

Report output

Every run writes .ruleprobe/report.{json,md,html}. The Markdown report opens with a shareable proof block:

RuleProbe Compliance Report
Provider: gemini  Extractor: hybrid
Score: 85/100  (severity-weighted: 78/100)
Rules tested: 12  PASS=9  PARTIAL=1  FAIL=2  SKIPPED=0
Instruction files: CLAUDE.md, AGENTS.md
Top issues:
- [FAIL] (high/forbidden_command) Forbidden command boundary: git commit
- [FAIL] (medium/required_command) Required validation command: pnpm typecheck
Known limitations:
- Results are based on generated sandbox scenarios, not a replay of the full repository workflow.
Report: .ruleprobe/report.md

The severity-weighted score uses high=3 / medium=2 / low=1.
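
To make the weighting concrete, here is a rough sketch of how a severity-weighted score could be computed. Only the 3/2/1 weights come from this README. The credit given to each status (PASS = 1, PARTIAL = 0.5, FAIL = 0, SKIPPED excluded) is an assumption, so this sketch is not expected to reproduce the numbers in the sample report above.

```typescript
type Severity = "high" | "medium" | "low";
type Status = "PASS" | "PARTIAL" | "FAIL" | "SKIPPED";
interface RuleResult { severity: Severity; status: Status }

// Weights as stated in the README.
const WEIGHT: Record<Severity, number> = { high: 3, medium: 2, low: 1 };
// ASSUMPTION: per-status credit is not documented; these values are illustrative.
const CREDIT: Record<Status, number> = { PASS: 1, PARTIAL: 0.5, FAIL: 0, SKIPPED: 0 };

function weightedScore(results: RuleResult[]): number {
  // ASSUMPTION: skipped rules do not count toward the total.
  const scored = results.filter((r) => r.status !== "SKIPPED");
  const total = scored.reduce((sum, r) => sum + WEIGHT[r.severity], 0);
  if (total === 0) return 0;
  const earned = scored.reduce((sum, r) => sum + WEIGHT[r.severity] * CREDIT[r.status], 0);
  return Math.round((earned / total) * 100);
}

const sample: RuleResult[] = [
  { severity: "high", status: "FAIL" },
  { severity: "medium", status: "PASS" },
  { severity: "low", status: "PASS" },
];
console.log(weightedScore(sample)); // 50
```

Two of three rules pass, but the single failure is high severity and carries half of the total weight, so the weighted score lands at 50 rather than 67.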

Interactive HTML dashboard

The HTML report is now a fully interactive dashboard powered by Chart.js:

- Doughnut chart: overall pass/partial/fail/skipped distribution
- Stacked bar chart: results broken down by category
- Search & filter: filter results by keyword, status, or severity
- Expand/collapse all: quickly navigate large result sets
- Score trend line: when history is available, shows score evolution over time

Open .ruleprobe/report.html in your browser after any run.

Language Bindings

Thin HTTP clients built on ruleprobe serve:

# Python
pip install ruleprobe-client

from ruleprobe_client import RuleProbeClient
client = RuleProbeClient("http://localhost:3000")
print(f"Score: {client.score('.')}/100")

// Go
import "github.com/canblmz1/ruleprobe-go/ruleprobe"
client := ruleprobe.New("")
score, _ := client.Score(".", "mock")

See clients/python/ and clients/go/ for full docs.

Providers

ruleprobe providers

| Provider | Extraction | Runtime | Notes |
|---|---|---|---|
| mock | - | Simulated (mixed PASS/FAIL/SKIPPED) | CI smoke; not real model behavior |
| dry-run | - | None | Inspects flow only |
| openrouter | Yes | Sandboxed action bridge | Quality depends on model and quota |
| gemini | Yes | Sandboxed action bridge | JSON-mode extraction + runtime |
| opencode-go | Yes | Experimental action bridge | Requires OPENCODE_GO_API_KEY + OPENCODE_GO_MODEL |
| claude-code | - | Real local CLI | Inferred from transcript; not comparable with action-bridge providers |

Full capability matrix: docs/provider-capabilities.md

Environment variables

Copy .env.example and fill in the keys you need:

cp .env.example .env

OPENROUTER_API_KEY=
OPENROUTER_MODEL=openrouter/free

GEMINI_API_KEY=
GEMINI_MODEL=gemini-2.5-flash

OPENCODE_GO_API_KEY=
OPENCODE_GO_MODEL=opencode-go/kimi-k2.6
OPENCODE_GO_AUTH_HEADER_MODE=bearer   # or x-api-key

Custom Providers

Implement the Provider interface to connect any AI model:

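import type { Provider, ProviderInput, ProviderResult } from 'ruleprobe-ai';

export class MyProvider implements Provider {
  name = 'my-provider';

  async run(input: ProviderInput): Promise<ProviderResult> {
    // Call your model here and map its output to a ProviderResult.
    // Placeholder so the class compiles until the call is implemented.
    throw new Error(`${this.name}: run() is not implemented yet`);
  }
}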

Full guide: docs/custom-providers.md

Extraction modes

deterministic — regex/heuristic extraction from instruction text. Fast, no API key needed, works well for common patterns.

ai-assisted — sends instruction files to the configured provider and asks it to classify rules as structured JSON. Requires an API-capable provider (gemini, openrouter, opencode-go).

hybrid — runs both and merges, deduplicating by normalized signature. Recommended when you have an API key.

AI extraction results are hash-keyed and cached at .ruleprobe/cache/. Use --no-cache or ruleprobe clear-cache to bust it.
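
The hybrid merge described above can be sketched as a keyed deduplication. Everything in this example is an assumption for illustration: the `ExtractedRule` shape, the normalization steps, and the choice to let deterministic rules win on conflict are not documented in this README.

```typescript
// Hypothetical rule shape; the real extractor output may differ.
interface ExtractedRule {
  category: string;
  text: string;
  source: "deterministic" | "ai";
}

// ASSUMED normalization: lowercase, strip punctuation, collapse whitespace.
function signature(rule: ExtractedRule): string {
  const text = rule.text
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, " ")
    .replace(/\s+/g, " ")
    .trim();
  return `${rule.category}:${text}`;
}

// Deterministic rules are inserted first, so they win when both extractors find the same rule.
function mergeRules(deterministic: ExtractedRule[], ai: ExtractedRule[]): ExtractedRule[] {
  const bySignature = new Map<string, ExtractedRule>();
  for (const rule of [...deterministic, ...ai]) {
    const key = signature(rule);
    if (!bySignature.has(key)) bySignature.set(key, rule);
  }
  return Array.from(bySignature.values());
}

const merged = mergeRules(
  [{ category: "forbidden_command", text: "Never run `git commit`.", source: "deterministic" }],
  [
    { category: "forbidden_command", text: "never run git commit", source: "ai" },
    { category: "required_command", text: "Run pnpm typecheck before finishing", source: "ai" },
  ],
);
console.log(merged.length); // 2
```

The two phrasings of the git commit rule collapse to one signature, so the merged set holds two rules instead of three.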

Comparing extraction modes / branches

# Deterministic vs hybrid for the same file
ruleprobe compare . --provider gemini

# Branch vs base ref (useful in CI to detect rule regressions)
ruleprobe compare . --base origin/main --extractor hybrid
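
Conceptually, a branch-vs-base comparison is a set difference over extracted rules. The sketch below assumes each rule is reduced to a plain string signature; `diffRules` and `RuleDiff` are hypothetical names, and the actual output format of ruleprobe compare is not described in this README.

```typescript
interface RuleDiff {
  added: string[];   // rules present on the branch but not on the base ref
  removed: string[]; // rules present on the base ref but missing on the branch
}

function diffRules(base: string[], head: string[]): RuleDiff {
  const baseSet = new Set(base);
  const headSet = new Set(head);
  return {
    added: head.filter((rule) => !baseSet.has(rule)),
    removed: base.filter((rule) => !headSet.has(rule)),
  };
}

const diff = diffRules(
  ["use pnpm", "never run git commit"],
  ["use pnpm", "run pnpm typecheck"],
);
console.log(diff);
```

In CI, a non-empty `removed` list is the interesting case: it means an edit to the instruction files dropped a rule that the base branch still enforces.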

Multi-provider comparison

Compare how different AI providers perform against the same rule set in a single run:

ruleprobe run . --providers mock,gemini --report-dir .ruleprobe-compare

This generates a side-by-side Markdown comparison report (e.g., .ruleprobe-compare/comparison-{id}.md) showing which scenarios each provider passes or fails.

Watch mode

Automatically re-run tests when instruction files change:

ruleprobe run . --provider gemini --watch

RuleProbe watches the directories containing your instructionFiles and triggers a full re-run on any change.

Score history & trends

RuleProbe automatically tracks scores across runs in .ruleprobe/history.json. The HTML report renders a trend line chart when history exists, and the CLI prints a summary of best, worst, and average scores.

History entries include:

- timestamp, score, weighted score
- provider and extractor used
- git branch and commit (when available)
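
The best/worst/average summary can be derived directly from those entries. In this sketch the field names are guesses based on the list above; the real .ruleprobe/history.json schema may differ.

```typescript
// ASSUMED field names; check your own history.json before relying on them.
interface HistoryEntry {
  timestamp: string;
  score: number;
  weightedScore: number;
  provider: string;
  extractor: string;
  branch?: string; // only when git metadata is available
  commit?: string;
}

interface HistorySummary { best: number; worst: number; average: number }

// Returns undefined when there is no history to summarize.
function summarize(history: HistoryEntry[]): HistorySummary | undefined {
  if (history.length === 0) return undefined;
  const scores = history.map((entry) => entry.score);
  const sum = scores.reduce((a, b) => a + b, 0);
  return {
    best: Math.max(...scores),
    worst: Math.min(...scores),
    average: Math.round((sum / scores.length) * 10) / 10, // one decimal place
  };
}

const history: HistoryEntry[] = [
  { timestamp: "2025-01-01T00:00:00Z", score: 70, weightedScore: 64, provider: "mock", extractor: "deterministic" },
  { timestamp: "2025-01-02T00:00:00Z", score: 85, weightedScore: 78, provider: "gemini", extractor: "hybrid" },
  { timestamp: "2025-01-03T00:00:00Z", score: 78, weightedScore: 71, provider: "gemini", extractor: "hybrid" },
];
console.log(summarize(history)); // { best: 85, worst: 70, average: 77.7 }
```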

Badge generation

Generate SVG badges for your README or CI dashboards:

# Auto-generate after a run
ruleprobe run . --provider gemini --badge

# Or generate manually
ruleprobe badge --score 85 --weighted-score 78

Outputs:

- .ruleprobe/badge-score.svg: current score badge
- .ruleprobe/badge-trend.svg: trend direction badge (up/down/stable)
- .ruleprobe/badge.json: shields.io endpoint JSON (auto-generated on every run/badge)

Use them in your README:

![RuleProbe Score](.ruleprobe/badge-score.svg)

Shields.io dynamic badge

Host .ruleprobe/badge.json at a public URL (e.g. commit it, or serve via GitHub Pages), then use the shields.io endpoint:

[![RuleProbe](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/YOUR/REPO/main/.ruleprobe/badge.json)](https://github.com/YOUR/REPO)

The JSON format follows the shields.io endpoint spec. Fields: schemaVersion, label, message, color, style.
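
For orientation, this is roughly what producing such an endpoint document looks like. The field names match the list above and the shields.io endpoint schema; the label text and the color thresholds are illustrative assumptions, not RuleProbe's actual values.

```typescript
interface ShieldsEndpoint {
  schemaVersion: 1; // the endpoint schema requires the literal value 1
  label: string;
  message: string;
  color: string;
  style?: string;
}

// ASSUMED thresholds and label; adjust to match your own badge.json.
function badgeFor(score: number): ShieldsEndpoint {
  const color = score >= 90 ? "brightgreen" : score >= 70 ? "yellow" : "red";
  return {
    schemaVersion: 1,
    label: "ruleprobe",
    message: `${score}/100`,
    color,
    style: "flat",
  };
}

console.log(JSON.stringify(badgeFor(85), null, 2));
```

Comparing this output with the generated .ruleprobe/badge.json is a quick way to confirm that the hosted file is valid before pointing shields.io at it.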

CI integration

Official GitHub Action (zero-config)

- uses: canblmz1/[email protected]
  with:
    provider: mock      # no API key needed
    fail-below: '70'    # block PR if score drops below 70

With Gemini for real evaluation:

- uses: canblmz1/[email protected]
  with:
    provider: gemini
    extractor: hybrid
    fail-below: '70'
  env:
    GEMINI_API_KEY: ${{ secrets.GEMINI_API_KEY }}

Full reference: docs/github-actions.md

Pre-commit hook

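Block commits that would drop compliance below your threshold. The examples below pass --fail-below 0, which never fails on score; raise the value to enforce a threshold.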

Husky:

# .husky/pre-commit
npx ruleprobe-ai run . --provider dry-run --extractor deterministic --fail-below 0

lefthook (lefthook.yml):

pre-commit:
  commands:
    ruleprobe:
      run: npx ruleprobe-ai run . --provider dry-run --extractor deterministic --fail-below 0

Full examples: examples/hooks/

VS Code Integration

RuleProbe writes .ruleprobe/report.sarif after each run. Open it with the SARIF Viewer extension to see compliance failures as inline squiggles on your instruction files.

# Install recommended extension (prompted automatically if you clone this repo)
# Or search VS Code extensions: MS-SarifVSCode.sarif-viewer

Run via the built-in VS Code task (Ctrl+Shift+P → Tasks: Run Task → RuleProbe: Run (mock)), or open the SARIF file manually:

Command Palette → SARIF: Open SARIF file → .ruleprobe/report.sarif

See docs/extensions.md for full setup.

Configuration

ruleprobe.config.json (auto-generated by ruleprobe init):

{
  "provider": "mock",
  "extractor": "deterministic",
  "instructionFiles": [
    "CLAUDE.md",
    "AGENTS.md",
    ".cursor/rules/*.mdc",
    ".github/copilot-instructions.md"
  ],
  "reportDir": ".ruleprobe",
  "failBelow": 70,
  "keepSandbox": false
}

Safety model

RuleProbe creates disposable sandboxes and blocks:

- Path traversal and absolute-path writes
- Writes to .git, .ruleprobe, node_modules
- Destructive shell commands (rm, git reset, git push, package publishes)
- Long-running commands via action timeouts
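
The first two protections amount to a path guard applied before any sandbox write. The following is a sketch of one way to write such a guard, not RuleProbe's implementation: `isWriteAllowed` is a hypothetical name, and blocking a protected directory at any depth (rather than only at the sandbox root) is an assumption.

```typescript
// Protected directory names as listed in the README.
const PROTECTED_DIRS = [".git", ".ruleprobe", "node_modules"];

// Returns true only for relative paths that stay inside the sandbox
// and never pass through a protected directory.
function isWriteAllowed(requested: string): boolean {
  // Reject absolute paths: POSIX roots, backslash/UNC roots, Windows drive letters.
  if (/^(?:[\\\/]|[A-Za-z]:)/.test(requested)) return false;

  const resolved: string[] = [];
  for (const segment of requested.split(/[\\\/]+/)) {
    if (segment === "" || segment === ".") continue;
    if (segment === "..") {
      if (resolved.length === 0) return false; // would escape the sandbox root
      resolved.pop();
    } else {
      resolved.push(segment);
    }
  }
  if (resolved.length === 0) return false; // resolves to the sandbox root itself

  // ASSUMPTION: a protected name is blocked at any depth.
  return !resolved.some((segment) => PROTECTED_DIRS.includes(segment));
}

console.log(isWriteAllowed("src/index.ts"));       // true
console.log(isWriteAllowed("../outside.txt"));     // false
console.log(isWriteAllowed("/etc/passwd"));        // false
console.log(isWriteAllowed("src/../.git/config")); // false
```

Resolving `..` segments before checking directory names matters: a naive prefix check would let `src/../.git/config` through.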

API key and data privacy: When using real providers (gemini, openrouter, opencode-go), your instruction file contents are sent to the provider API for extraction and/or scenario evaluation. Do not include secrets, personal data, or proprietary information in your instruction files when using third-party providers.

Recommended: add .ruleprobe/ to your .gitignore to avoid committing reports, cache, badges, and history files that may contain sensitive rule details:

echo '.ruleprobe/' >> .gitignore

Use real providers only with repositories and credentials you are comfortable testing.

Troubleshooting

API key not found: Ensure you copied .env.example to .env and filled in the required keys. Run ruleprobe doctor to verify key presence.

Provider returns no rules: Try --extractor deterministic first to verify extraction works, then add --debug-extractor for verbose output.

Typecheck fails after install: Ensure you are using pnpm (not npm or yarn). Run pnpm install --frozen-lockfile.

Windows path issues: RuleProbe normalizes paths internally. If you see path separator issues in sandbox output, report them with the full error message and your OS/Node version.

Score below threshold / exit code 1: Use --fail-below 0 to disable the threshold check while debugging.

Development

pnpm install
pnpm build        # tsup ESM + DTS
pnpm test         # vitest (175 tests)
pnpm typecheck    # tsc --noEmit
pnpm dev doctor   # local diagnostics

# Benchmark extraction corpus
pnpm dev benchmark --fixtures-only

See CONTRIBUTING.md for contribution guidelines and ROADMAP.md for planned work.

License

MIT — see LICENSE