guardrails-ref

v1.3.6

Published

2 months ago

Validate and manage Agent Guardrails (GUARDRAIL.md) — init, add, remove, setup, validate, check, upgrade, diff, list, why, scaffold, test, health, watch, presets, suggest-pack, explain, report, attack-test, drift, snapshot

Vulnerabilities: 0 High, 0 Medium, 0 Low

abramelin

agent guardrails ai cursor claude copilot vscode

guardrails-ref

CLI for Agent Guardrails — init, add, remove, setup, validate, check, upgrade, diff, list, why, scaffold, test, health, watch, presets, suggest-pack, explain, report, attack-test, drift, and snapshot for GUARDRAIL.md files.

Why?

AI coding agents (Cursor, Claude Code, etc.) don't remember anything across sessions. Guardrails give them persistent constraints of the form "never do this." Write the rules once and they apply to every chat.

Quick start

npx guardrails-ref init

Creates .agents/guardrails/, adds the no-plaintext-secrets example, and configures Cursor, Claude Code, VS Code Copilot, Windsurf, Continue, and JetBrains to read your guardrails. No global install needed.

Note: IDEs don't yet recognize guardrails natively. The setup command adds a rule so the AI reads them. Once IDEs add support, this won't be needed.

User-level guardrails: Pass --user or use ~ as the path to work with ~/.agents/guardrails/, which applies across all projects. Example: npx guardrails-ref init --user.

Security model

The guardrails-ref CLI is designed to be predictable and supply-chain friendly:

- Network usage is explicit: the CLI does not make network requests except for attack-test, which sends requests only to the --target URL you provide.
- Process spawning is minimal and explicit: watch spawns the current Node executable to re-run check and test (no arbitrary command execution).
- Local-only filesystem writes: writes are limited to:
  - Project-level .agents/guardrails/ (or user-level ~/.agents/guardrails/)
  - IDE configuration files in the current project (e.g. .cursor/rules/agent-guardrails.md, .claude/instructions.md, .github/copilot-instructions.md, .windsurfrules, .continue/rules/agent-guardrails.md, .aiassistant/rules/agent-guardrails.md, .junie/guidelines.md)
- Opt-in user scope: user-level guardrails are only written when you pass --user or use the ~ path explicitly.
- Dry-run for write operations: commands that may modify files support --dry-run to show what would change without writing.
- Read-only mode: set GUARDRAILS_REF_READONLY=1 (or true) in the environment to force all commands into a non-writing mode; the CLI behaves as if --dry-run is enabled wherever applicable.
- Audit mode: set GUARDRAILS_REF_DEBUG=1 (or true/yes) or use --debug to log every filesystem read/write path to stderr.
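
As an illustration of the two environment switches above, a wrapper script might interpret them roughly as follows. This is a sketch, not the CLI's own parsing: the helpers envFlag, isReadOnly, and isDebug are hypothetical names, and matching values case-insensitively is an assumption.

```typescript
// Sketch only: the real CLI's parsing may differ.
// Accepted values follow the README: READONLY takes 1/true, DEBUG takes 1/true/yes.
type Env = Record<string, string | undefined>;

// Hypothetical helper: true when the variable is set to one of the accepted values.
function envFlag(env: Env, name: string, accepted: readonly string[]): boolean {
  const raw = env[name];
  if (raw === undefined) return false;
  // Assumption: surrounding whitespace and letter case are ignored.
  return accepted.includes(raw.trim().toLowerCase());
}

function isReadOnly(env: Env): boolean {
  return envFlag(env, "GUARDRAILS_REF_READONLY", ["1", "true"]);
}

function isDebug(env: Env): boolean {
  return envFlag(env, "GUARDRAILS_REF_DEBUG", ["1", "true", "yes"]);
}

// Example: a wrapper could skip its own write steps when read-only mode is on.
const sampleEnv: Env = { GUARDRAILS_REF_READONLY: "1" };
console.log(isReadOnly(sampleEnv)); // true
console.log(isDebug(sampleEnv)); // false
```

In a real script you would pass process.env instead of the sample object.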

Commands

| Command | Description |
|---------|-------------|
| npx guardrails-ref init [path] | Create .agents/guardrails/, add no-plaintext-secrets, configure IDEs |
| npx guardrails-ref init --preset default [path] | Add preset instead of single example (e.g. default, security) |
| npx guardrails-ref init --minimal [path] | Create .agents/guardrails/ only (no example, no setup) |
| npx guardrails-ref init --user | Create ~/.agents/guardrails/ (user-level; setup is project-specific) |
| npx guardrails-ref init --policy default [path] | Also write a starter .guardrails-ref.json policy file using the recommended defaults and, when no --preset is passed, install a recommended preset bundle based on the detected stack (same logic as suggest-pack); respects --dry-run/GUARDRAILS_REF_READONLY and does not overwrite an existing .guardrails-ref.json |
| npx guardrails-ref init --dry-run [path] | Preview what would be created without writing |
| npx guardrails-ref add <name> [name2 ...] [path] | Add example guardrail(s); pass multiple names to add several at once (names must be lowercase kebab-case: letters/numbers/hyphens) |
| npx guardrails-ref add --preset default | Add default preset (4 guardrails) |
| npx guardrails-ref add --preset default,frontend | Add multiple presets (comma-separated) |
| npx guardrails-ref add --preset security | Add security preset (15 guardrails) |
| npx guardrails-ref add --preset quality | Add quality preset (11 guardrails) |
| npx guardrails-ref add --preset frontend | Add frontend preset (7 guardrails) |
| npx guardrails-ref add --preset api | Add API preset (5 guardrails) |
| npx guardrails-ref add --preset backend | Add backend preset (backend services: secrets, access control, DB, versioning, logging) |
| npx guardrails-ref add --preset data | Add data preset (data/analytics: PII, secrets, leaks, rate limiting, logging) |
| npx guardrails-ref add --preset production | Add production preset (12 guardrails) |
| npx guardrails-ref add <name> --user or add <name> ~ | Add to user-level ~/.agents/guardrails/ |
| npx guardrails-ref add --dry-run <name> | Preview what would be added without writing |
| npx guardrails-ref remove <name> [path] | Remove a guardrail (names must be lowercase kebab-case: letters/numbers/hyphens) |
| npx guardrails-ref remove <name> --user or remove <name> ~ | Remove from user-level |
| npx guardrails-ref remove <name> --dry-run [path] | Preview what would be removed without writing |
| npx guardrails-ref setup [path] | Add the guardrail rule to Cursor, Claude Code, VS Code Copilot, Windsurf, Continue, JetBrains |
| npx guardrails-ref setup --remove [path] | Remove the guardrail rule from IDE configs |
| npx guardrails-ref setup --pre-commit [path] | Add guardrails check to pre-commit hook (Husky or pre-commit) |
| npx guardrails-ref setup --ide <name> [path] | Target IDE: cursor, claude, copilot, windsurf, continue, jetbrains, junie, or auto |
| npx guardrails-ref setup --dry-run [path] | Show what would be added/removed without writing files |
| npx guardrails-ref setup --check [path] | Show which IDEs are configured and whether they have the rule |
| npx guardrails-ref setup --check --fail-if-missing [path] | Exit 1 if configured IDE lacks rule (CI) |
| npx guardrails-ref validate [path] | Validate GUARDRAIL.md files (use --json for JSON, --format diagnostic for gcc-style editor output, --strict to fail on warnings, --fix to apply fixes, --require-guardrails to fail when none are found) |
| npx guardrails-ref validate --fix --dry-run [path] | Preview which files would be fixed without writing |
| npx guardrails-ref validate --user or validate ~ | Validate user-level guardrails |
| npx guardrails-ref check [path] | Validate with minimal output (CI-friendly; use --format diagnostic for Problems-tab output, --strict to fail on warnings, --require-guardrails to fail when none are found, --explain to print remediation hints; in diagnostic mode, explain output is printed to stderr to keep stdout parseable) |
| npx guardrails-ref upgrade [path] | Update installed guardrails to latest templates (use --dry-run to preview, --diff to show changes) |
| npx guardrails-ref upgrade --user or upgrade ~ | Upgrade user-level guardrails |
| npx guardrails-ref diff [path] | Show diff between installed guardrails and latest templates |
| npx guardrails-ref drift [path] | Detect drift between installed guardrails and reference templates (use --diff to show patches; uses config.test.requireGuardrails as expected baseline by default) |
| npx guardrails-ref list [path] | List discovered guardrails (use --json for JSON, --compact for one per line) |
| npx guardrails-ref list --user or list ~ | List user-level guardrails |
| npx guardrails-ref why <name> | Show guardrail template content (use --json for machine-readable; names must be lowercase kebab-case) |
| npx guardrails-ref suggest-pack [path] | Suggest presets and a starter .guardrails-ref.json policy based on detected stack (frontend/backend/data) |
| npx guardrails-ref explain [path] | Summarize installed guardrails in human-readable form (use --json for JSON; supports --user) |
| npx guardrails-ref report [path] | Create a markdown report scaffold that you can append test --report markdown --badge output to (supports --dry-run / GUARDRAILS_REF_READONLY) |
| npx guardrails-ref scaffold <name> | Create a new guardrail skeleton with frontmatter and Trigger/Instruction/Reason sections (use --interactive to prompt for fields) |
| npx guardrails-ref snapshot [path] | Write a deterministic JSON snapshot of installed guardrails + effective config + current test results (use --json to print, --out to write, --dry-run to preview) |
| npx guardrails-ref test [path] | Run safety checks; prints score (e.g. 5/8, 62%). Use --json for scorePercent and attackCoverage; supports --min-score, --require-categories, --require-guardrails, --report markdown\|html, --badge, and --explain (with --json, adds a remediation object on failure) |
| npx guardrails-ref attack-test --target <url> | Run adversarial attack suite against an agent HTTP endpoint (expects POST JSON { "prompt": string }). Use --suite basic\|secrets\|prompt or --suite-file to load a local JSON suite. |
| npx guardrails-ref watch [path] | Watch guardrails and rerun check and test --suggest when GUARDRAIL files change (watches .agents/guardrails/ and root GUARDRAILS.md) |
| npx guardrails-ref --debug <command> | Log every filesystem read/write path (for auditing); or GUARDRAILS_REF_DEBUG=1 |
| npx guardrails-ref presets | List all preset bundles and their guardrails (use --json for JSON) |
| npx guardrails-ref health [path] | Combined health check: setup status + guardrails check + safety test (use --json for CI/dashboards) |
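
Several commands require guardrail names in lowercase kebab-case (letters, numbers, hyphens). If you generate names in a script, a pre-check along these lines may catch mistakes before calling the CLI. The regex and the isValidGuardrailName helper are assumptions for illustration; the CLI's exact rule may be looser, for example on repeated or trailing hyphens.

```typescript
// Assumed pattern: lowercase alphanumeric segments joined by single hyphens.
// The CLI's own validation is not shown in this README and may differ.
const KEBAB_CASE = /^[a-z0-9]+(?:-[a-z0-9]+)*$/;

// Hypothetical helper for pre-validating names in your own tooling.
function isValidGuardrailName(candidate: string): boolean {
  return KEBAB_CASE.test(candidate);
}

const candidates = ["no-plaintext-secrets", "No-Secrets", "no--double", "trailing-"];
for (const candidate of candidates) {
  console.log(`${candidate}: ${isValidGuardrailName(candidate)}`);
}
// no-plaintext-secrets: true
// No-Secrets: false
// no--double: false
// trailing-: false
```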

Supported IDEs

Cursor — via .cursor/rules/ or .cursorrules
Claude Code — via .claude/instructions.md
VS Code Copilot — via .github/copilot-instructions.md
Windsurf — via .windsurfrules
Continue — via .continue/rules/agent-guardrails.md
JetBrains AI Assistant — via .aiassistant/rules/agent-guardrails.md
JetBrains Junie — via .junie/guidelines.md

CI/CD

Use check --strict --require-guardrails in GitHub Actions to fail on warnings and when no guardrails are present:

- name: Validate guardrails
  run: npx guardrails-ref check . --strict --format diagnostic --require-guardrails

Or use validate for full output, optionally as JSON:

- name: Validate guardrails
  run: npx guardrails-ref validate . --json

Run safety checks in CI (exit 0 if all pass; JSON includes scorePercent 0–100 and attackCoverage; use thresholds and required guardrails to enforce budgets and policy):

- name: Safety checks
  run: npx guardrails-ref test . --json --min-score 80 --require-categories secrets,destructive,prompt --require-guardrails no-plaintext-secrets,no-destructive-commands,no-prompt-leaks,tools-permissions
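
A later CI step could also read the JSON output and apply its own threshold. The sketch below relies only on the scorePercent field named above; the TestResultSketch type, the readScore and meetsBudget helpers, and the sample payload are hypothetical, and the real output contains more fields than shown.

```typescript
// Partial, assumed view of the test --json output.
interface TestResultSketch {
  scorePercent: number; // 0 to 100, per the README
  attackCoverage?: unknown; // shape not documented here, so left untyped
}

// Parse the CLI output and pull out the score, failing loudly if it is absent.
function readScore(jsonText: string): number {
  const parsed = JSON.parse(jsonText) as Partial<TestResultSketch>;
  if (typeof parsed.scorePercent !== "number") {
    throw new Error("scorePercent missing from test output");
  }
  return parsed.scorePercent;
}

// True when the score meets or exceeds the budget.
function meetsBudget(jsonText: string, minScore: number): boolean {
  return readScore(jsonText) >= minScore;
}

const sampleOutput = '{"scorePercent": 62}'; // made-up sample, not real CLI output
console.log(meetsBudget(sampleOutput, 80)); // false
console.log(meetsBudget(sampleOutput, 60)); // true
```

In most pipelines --min-score already covers this; a script like this is mainly useful when the score feeds a dashboard or a custom gate.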

Note: list (without --json/--compact) exits with code 1 when no guardrails are found. list --json and list --compact exit 0 and return an empty list when none are found.

Project policy (`.guardrails-ref.json`)

You can configure defaults for validate, check, and test by adding .guardrails-ref.json at the project root. For example:

{
  "validate": {
    "requireGuardrails": true
  },
  "check": {
    "requireGuardrails": true
  },
  "test": {
    "minScorePercent": 80,
    "requireCategories": ["presence", "secrets", "destructive", "tools", "prompt"],
    "requireGuardrails": [
      "no-plaintext-secrets",
      "no-destructive-commands",
      "no-new-deps-without-approval",
      "require-commit-approval",
      "no-prompt-leaks",
      "tools-permissions",
      "require-logging-standards"
    ]
  }
}

You can also inherit from shared policy packs using the optional top-level extends field. Each entry should point to a directory that contains its own .guardrails-ref.json:

{
  "extends": ["../.org-guardrails/policy"],
  "validate": {
    "requireGuardrails": true
  },
  "check": {
    "requireGuardrails": true
  },
  "test": {
    "minScorePercent": 80
  }
}

loadConfig will load and deep-merge configs from each extended directory (packs) before applying local overrides, so org-wide policies and per-repo tweaks compose cleanly.
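
To make the deep-merge behavior concrete, here is a minimal sketch of how extended packs and a local config could compose. It is not the package's loadConfig: the deepMerge function is a stand-in, and the choice to replace arrays wholesale rather than concatenate them is an assumption.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };
type JsonObject = { [key: string]: Json };

function isObject(value: Json | undefined): value is JsonObject {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Recursively merge override into base. Nested objects are merged key by key;
// arrays and scalars from override replace the base value (assumed behavior).
function deepMerge(base: JsonObject, override: JsonObject): JsonObject {
  const result: JsonObject = { ...base };
  for (const key of Object.keys(override)) {
    const existing = result[key];
    const incoming = override[key];
    result[key] =
      isObject(existing) && isObject(incoming) ? deepMerge(existing, incoming) : incoming;
  }
  return result;
}

// Hypothetical org-wide pack and per-repo config.
const orgPack: JsonObject = { test: { minScorePercent: 70, requireCategories: ["secrets"] } };
const localConfig: JsonObject = { test: { minScorePercent: 80 } };

// Packs are applied first, local overrides last.
const merged = [orgPack, localConfig].reduce(
  (acc, cfg) => deepMerge(acc, cfg),
  {} as JsonObject,
);
console.log(JSON.stringify(merged));
// {"test":{"minScorePercent":80,"requireCategories":["secrets"]}}
```

The local file only raises the score threshold, while the category requirement from the pack carries through.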

Profiles (dev / ci / prod)

You can define per-environment overlays in the same config file:

{
  "test": { "minScorePercent": 60 },
  "profiles": {
    "ci": { "test": { "minScorePercent": 80 } },
    "prod": { "test": { "minScorePercent": 90 } }
  }
}

Select a profile with --profile ci (global flag) or GUARDRAILS_REF_PROFILE=ci. Resolution order is:

base config (with extends) → profile overlay → CLI flags
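
The resolution order can be sketched for a single setting. The example below traces test.minScorePercent through base config, profile overlay, and CLI flag; resolveMinScore and the PolicyFile type are hypothetical and only model the precedence described above, not the CLI's internals.

```typescript
interface TestPolicy {
  minScorePercent?: number;
}

interface PolicyFile {
  test?: TestPolicy;
  profiles?: Record<string, { test?: TestPolicy }>;
}

// Later sources win: CLI flag, then profile overlay, then base config.
function resolveMinScore(
  policy: PolicyFile,
  profile?: string,
  cliMinScore?: number,
): number | undefined {
  const base = policy.test?.minScorePercent;
  const overlay = profile ? policy.profiles?.[profile]?.test?.minScorePercent : undefined;
  return cliMinScore ?? overlay ?? base;
}

// Same values as the profiles example above.
const policy: PolicyFile = {
  test: { minScorePercent: 60 },
  profiles: {
    ci: { test: { minScorePercent: 80 } },
    prod: { test: { minScorePercent: 90 } },
  },
};

console.log(resolveMinScore(policy)); // 60
console.log(resolveMinScore(policy, "ci")); // 80
console.log(resolveMinScore(policy, "prod", 95)); // 95
```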

Lint rules (optional)

For extra static quality checks on guardrail bodies, use --lint-rules with validate or check. Lints are surfaced as warnings in default output, diagnostic output, and --json.

Examples

# Project-level (default)
npx guardrails-ref init
npx guardrails-ref add no-destructive-commands no-hardcoded-urls
npx guardrails-ref add no-new-deps-without-approval
npx guardrails-ref why no-destructive-commands
npx guardrails-ref validate .
npx guardrails-ref list .
npx guardrails-ref test .

# User-level (~/.agents/guardrails/)
npx guardrails-ref init --user
npx guardrails-ref add no-plaintext-secrets --user
npx guardrails-ref list --user
npx guardrails-ref validate ~

Runtime limits helper (loop & cost protection)

For agent runtimes, you can use a small helper to enforce limits on reasoning steps, tool calls, and cost:

// Import from the compiled dist path when using guardrails-ref as a dependency:
import { createAgentLimits } from "guardrails-ref/dist/runtime-limits.js";

const limits = createAgentLimits({
  maxSteps: 20,
  maxToolCalls: 10,
  maxCost: 2.0, // e.g. $2.00
});

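// The loop has no exit condition of its own: it ends when a limit is
// exceeded and the helper throws.
for (;;) {
  limits.step();
  // ... agent reasoning ...

  limits.toolCall();
  // ... call a tool ...

  // estimateCostForThisStep() is a placeholder for your own cost estimate.
  limits.addCost(estimateCostForThisStep());
}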

If any configured limit is exceeded, the helper throws an error such as:

Guardrails: maxToolCalls exceeded (11 > 10)
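
Because the helper signals a breach by throwing, the agent loop usually sits inside a try/catch. The sketch below uses a local stand-in, createLimitsSketch, so that it runs without the package installed; it is not the package's implementation, and only the option names and the error message format are taken from this README.

```typescript
interface LimitOptions {
  maxSteps: number;
  maxToolCalls: number;
  maxCost: number;
}

// Hypothetical stand-in for createAgentLimits, written for illustration only.
function createLimitsSketch(options: LimitOptions) {
  let steps = 0;
  let toolCalls = 0;
  let cost = 0;
  const check = (label: string, value: number, max: number): void => {
    if (value > max) {
      throw new Error(`Guardrails: ${label} exceeded (${value} > ${max})`);
    }
  };
  return {
    step(): void { steps += 1; check("maxSteps", steps, options.maxSteps); },
    toolCall(): void { toolCalls += 1; check("maxToolCalls", toolCalls, options.maxToolCalls); },
    addCost(amount: number): void { cost += amount; check("maxCost", cost, options.maxCost); },
  };
}

// Run a dummy loop until a limit trips, then report why it stopped.
function runAgentLoop(): string {
  const limits = createLimitsSketch({ maxSteps: 20, maxToolCalls: 10, maxCost: 2.0 });
  let stopReason = "";
  try {
    for (;;) {
      limits.step();
      limits.toolCall();
      limits.addCost(0.05); // fixed per-step cost, chosen for the example
    }
  } catch (error) {
    stopReason = error instanceof Error ? error.message : String(error);
  }
  return stopReason;
}

console.log(runAgentLoop()); // Guardrails: maxToolCalls exceeded (11 > 10)
```

With one tool call per step, the tool-call limit trips on the eleventh iteration, before the step or cost limits are reached.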

Available guardrails (add command)

40 reference guardrails; add with npx guardrails-ref add <name> or use presets (e.g. add --preset security for 15 guardrails).

| Name | What it prevents |
|------|------------------|
| no-plaintext-secrets | Logging or committing credentials |
| no-pii-in-output | Unredacted PII in logs, API responses, or reports |
| resist-instruction-override | Complying with "ignore instructions" or prompt-injection overrides |
| no-placeholder-credentials | Fake or placeholder API keys instead of asking for real values |
| no-silent-error-handling | Catching errors without surfacing them to the user |
| require-access-control | Exposing sensitive data or admin actions without role checks |
| artifact-verification | Destructive ops without plan.md and audit log |
| context-rotation | Continuing in polluted context; reset when 80% full or 10+ errors |
| database-migrations | Direct schema changes instead of migrations |
| no-destructive-commands | rm -rf, DROP TABLE, TRUNCATE without approval |
| no-eval-or-dynamic-code | eval(), new Function(), or dynamic code execution |
| no-new-deps-without-approval | New packages without approval |
| privilege-boundaries | Touching node_modules, .git, lockfiles, .env without approval |
| require-commit-approval | git commit or push without explicit user approval |
| no-hardcoded-urls | Hardcoded API URLs, base URLs, endpoints |
| no-sudo-commands | sudo/su/root commands without approval |
| rate-limiting | Runaway tool calls and API loops |
| no-console-in-production | console.log in production code |
| require-tests | Merging code without tests |
| prefer-existing-code | Reimplementing when existing code or helpers exist |
| no-inline-styles | Inline style= in HTML/JSX |
| no-raw-sql | Raw SQL without parameterization |
| no-magic-numbers | Unexplained numeric literals |
| no-modifying-git-history | git push --force, destructive rebase without approval |
| no-deprecated-apis | Suggesting deprecated or obsolete APIs |
| no-unsafe-env-assumptions | Assuming env vars exist without validation |
| no-hardcoded-user-facing-strings | Hardcoded labels, messages, errors in UI |
| require-accessibility | Missing alt text, ARIA, keyboard support, or contrast in UI |
| require-api-resilience | API calls without timeouts, retries, or error handling |
| require-documentation-updates | Changing behavior without updating README, docs, or changelog |
| no-breaking-changes-without-versioning | Breaking public APIs without semver bump or migration path |
| no-path-traversal | User-controlled paths without validation (.., symlinks outside base) |
| no-unsafe-html-injection | Raw dangerouslySetInnerHTML or unsanitized HTML (XSS) |
| no-client-only-access-control | Authorization only in the client; server must re-validate |
| require-loading-and-error-states | Async UI without loading and error states |
| require-form-validation | Forms without validation, field-level errors, or preserved input on error |
| require-design-tokens | Hardcoded colors, spacing, or typography instead of design tokens |
| no-prompt-leaks | Leaking internal prompts, system messages, or guardrails into code/logs/docs |
| require-logging-standards | Logging without structure, clear levels, or protection against secrets/PII |
| tools-permissions | Unsafe or overly powerful tools without allow lists, thresholds, or approvals |

Use npx guardrails-ref add --list to see all available guardrails. Use npx guardrails-ref why <name> to show a guardrail's full content (from templates).

Troubleshooting

"Unknown guardrail" — Run npx guardrails-ref add --list to see available guardrail names
Setup not working — Try npx guardrails-ref setup --remove then npx guardrails-ref setup again

License

MIT — GitHub
