@fail-kit/cli
v2.0.0
Forensic Audit of Intelligent Logic - CLI for auditing AI agents
F.A.I.L. Kit CLI
Forensic Audit of Intelligent Logic
The F.A.I.L. Kit CLI is a developer-first tool for running forensic audits on AI agents. It tests whether your agent actually does what it claims — no trace, no ship.
Installation
# Install globally via npm
npm install -g @fail-kit/cli
# Verify installation
fail-audit --version
Quick Start
# Initialize configuration (interactive)
fail-audit init
# Auto-generate test cases from your codebase
fail-audit scan
# Run the audit with interactive dashboard
fail-audit run --format dashboard
Zero manual test writing required. The scan command automatically analyzes your codebase and generates test cases.
Middleware Packages
For easy integration with your framework:
# Next.js
npm install @fail-kit/middleware-nextjs
# Express
npm install @fail-kit/middleware-express
See Easy Integration Guide for setup instructions.
Commands
fail-audit init
Initialize a new audit configuration with an interactive wizard.
fail-audit init # Interactive setup
fail-audit init --yes # Use defaults (CI mode)
fail-audit init --framework express # Specify framework
fail-audit init --install # Auto-install middleware
fail-audit init --endpoint <url> # Set endpoint directly
fail-audit init --test # Test endpoint after setup
fail-audit scan
NEW: Scan your codebase and auto-generate test cases.
fail-audit scan # Scan current directory
fail-audit scan --path ./src # Scan specific directory
fail-audit scan --dry-run # Preview without saving
fail-audit scan --verbose # Show detailed results
fail-audit scan --run # Scan and immediately run audit
The scanner detects:
- API endpoints (Next.js, Express, FastAPI)
- Agent functions (query, generate, process, estimate)
- Tool calls (database, HTTP, file, email)
- LLM invocations (OpenAI, Anthropic, etc.)
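To make the detection categories above more concrete, here is a deliberately naive sketch of pattern-based source scanning. It is not the scanner that ships with @fail-kit/cli (this README does not describe its internals); the `scanSource` function, the `Finding` shape, and every regular expression are hypothetical, and only three of the four categories are illustrated.

```typescript
// HYPOTHETICAL illustration of pattern-based detection.
// This is NOT the @fail-kit/cli scanner; the patterns are illustrative only.
type FindingKind = "endpoint" | "agent_function" | "llm_call";

interface Finding {
  kind: FindingKind;
  line: number; // 1-based line number in the scanned source
  match: string; // captured route, function name, or SDK call
}

const PATTERNS: Array<[FindingKind, RegExp]> = [
  // Express-style route registration, e.g. app.post("/api/agent", handler)
  ["endpoint", /\b(?:app|router)\.(?:get|post|put|patch|delete)\(\s*["'`]([^"'`]+)["'`]/],
  // Functions whose names start with the verbs listed above
  ["agent_function", /\bfunction\s+((?:query|generate|process|estimate)\w*)\s*\(/],
  // Common OpenAI / Anthropic SDK call sites
  ["llm_call", /\b(chat\.completions\.create|messages\.create)\s*\(/],
];

// Scan source text line by line and record every pattern that matches.
function scanSource(source: string): Finding[] {
  const findings: Finding[] = [];
  source.split("\n").forEach((text, i) => {
    for (const [kind, re] of PATTERNS) {
      const m = re.exec(text);
      if (m) findings.push({ kind, line: i + 1, match: m[1] });
    }
  });
  return findings;
}
```

Feeding it a three-line Express handler that registers a route, declares `generateQuote`, and calls `client.messages.create` yields one finding per category. A real scanner would parse the AST rather than use line regexes, which miss multi-line calls and aliased imports.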
fail-audit run
Run the forensic audit against your agent.
fail-audit run # Run all tests
fail-audit run --level smoke # Run smoke tests only
fail-audit run --level interrogation # Run behavioral tests
fail-audit run --level red-team # Run adversarial tests
fail-audit run --case CONTRACT_0001 # Run specific test
fail-audit run --format dashboard # Decision-grade interactive report (NEW)
fail-audit run --format html # Detailed HTML report
fail-audit run --format junit # JUnit XML for CI/CD
fail-audit run --ci # CI mode (no colors)
Smart defaults: If no test cases exist, run automatically invokes scan first.
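The smart-default rule can be sketched as a small check on the cases directory. This is an assumption-laden illustration, not the CLI's code: `shouldScanFirst` is a hypothetical name, and it treats any regular file inside `cases_dir` as a test case because the README does not specify the case file format.

```typescript
import { existsSync, mkdtempSync, readdirSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// HYPOTHETICAL helper: decide whether `run` should invoke `scan` first.
// Assumption: any regular file in cases_dir counts as a test case.
function shouldScanFirst(casesDir: string): boolean {
  if (!existsSync(casesDir)) return true; // no cases directory at all
  const files = readdirSync(casesDir, { withFileTypes: true }).filter((e) => e.isFile());
  return files.length === 0; // directory exists but holds no cases
}

// Usage: an empty temporary directory triggers a scan; adding a file does not.
const dir = mkdtempSync(join(tmpdir(), "fail-cases-"));
console.log(shouldScanFirst(dir)); // true
writeFileSync(join(dir, "example-case"), "placeholder\n"); // hypothetical case file
console.log(shouldScanFirst(dir)); // false
```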
fail-audit report
Generate reports from audit results.
fail-audit report results.json # Generate HTML report
fail-audit report results.json --format dashboard # Generate interactive dashboard
fail-audit report results.json --format markdown # Generate Markdown
fail-audit report results.json --format junit # Generate JUnit XML
fail-audit report results.json --output report.html # Write the report to a specific file
fail-audit doctor
Diagnose common setup issues.
fail-audit doctor # Run all checks
fail-audit doctor --skip-network # Skip connectivity check
fail-audit generate
Generate custom test cases from your tool definitions.
fail-audit generate --tools tools.json # Generate cases from tool definitions
fail-audit generate --tools tools.json --output ./my-cases # Write cases to a custom directory
Configuration
The CLI uses fail-audit.config.json in your project root:
{
  "endpoint": "http://localhost:8000/eval/run",
  "timeout": 30000,
  "cases_dir": "./cases",
  "output_dir": "./audit-results",
  "levels": {
    "smoke_test": true,
    "interrogation": true,
    "red_team": true
  }
}
Environment Variables
Override config values with environment variables:
| Variable | Config Key | Description |
|----------|-----------|-------------|
| FAIL_AUDIT_ENDPOINT | endpoint | Agent endpoint URL |
| FAIL_AUDIT_TIMEOUT | timeout | Request timeout (ms) |
| FAIL_AUDIT_CASES_DIR | cases_dir | Test cases directory |
| FAIL_AUDIT_OUTPUT_DIR | output_dir | Results output directory |
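The override behavior in the table above can be sketched as a merge in which a set environment variable wins over the file value. This is a hypothetical illustration, not the CLI's loader: `resolveConfig` and `loadConfig` are invented names, and parsing `FAIL_AUDIT_TIMEOUT` as an integer number of milliseconds (rejecting anything else) is an assumption.

```typescript
import { readFileSync } from "node:fs";

// Shape of fail-audit.config.json as shown in the Configuration example.
interface AuditConfig {
  endpoint: string;
  timeout: number; // milliseconds
  cases_dir: string;
  output_dir: string;
  levels: { smoke_test: boolean; interrogation: boolean; red_team: boolean };
}

// Environment variable -> config key, per the table above.
const ENV_OVERRIDES = {
  FAIL_AUDIT_ENDPOINT: "endpoint",
  FAIL_AUDIT_TIMEOUT: "timeout",
  FAIL_AUDIT_CASES_DIR: "cases_dir",
  FAIL_AUDIT_OUTPUT_DIR: "output_dir",
} as const;

// HYPOTHETICAL: copy the file config, then apply any non-empty env overrides.
function resolveConfig(fileConfig: AuditConfig, env: Record<string, string | undefined>): AuditConfig {
  const resolved: AuditConfig = { ...fileConfig, levels: { ...fileConfig.levels } };
  for (const [envVar, key] of Object.entries(ENV_OVERRIDES)) {
    const value = env[envVar];
    if (value === undefined || value === "") continue;
    if (key === "timeout") {
      // Assumption: the timeout must be an integer number of milliseconds.
      const ms = Number.parseInt(value, 10);
      if (Number.isNaN(ms)) throw new Error(`${envVar} must be an integer (ms), got "${value}"`);
      resolved.timeout = ms;
    } else {
      resolved[key] = value;
    }
  }
  return resolved;
}

// Usage: read the project config and apply overrides from process.env.
function loadConfig(path = "fail-audit.config.json"): AuditConfig {
  const fileConfig = JSON.parse(readFileSync(path, "utf8")) as AuditConfig;
  return resolveConfig(fileConfig, process.env);
}
```

Copying the file config (including the nested `levels` object) keeps the parsed file values untouched, so the same base can be resolved against different environments, for example local and CI.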
CI/CD Integration
Report Formats
F.A.I.L. Kit supports multiple output formats:
| Format | Description | Best For |
|--------|-------------|----------|
| dashboard | NEW v1.5.0: Decision-grade interactive report with ship decision, failure buckets, root causes, provenance, and keyboard navigation | Development, fixing failures, stakeholder decisions |
| html | Detailed HTML report with error explanations and source locations | Development, debugging, documentation |
| json | Raw results data | Programmatic analysis, custom tooling |
| junit | JUnit XML format | CI/CD integration, test runners |
| markdown | Markdown report | GitHub/GitLab comments, documentation |
Dashboard Report Features (v1.5.0):
- Ship Decision Block (BLOCK/NEEDS REVIEW/SHIP) with reason and next action
- Failure Buckets (receipt/evidence/policy/tool/validation) for 5-second triage
- Top 3 Root Causes auto-generated from failure patterns
- Interactive timeline with hover tooltips and failure clustering
- Enhanced forensic details: assertion, diff, fix hint, doc link
- Run context & provenance (git hash, versions, receipt verification)
- Keyboard navigation (j/k), copy buttons, VS Code deep links
- Deterministic severity: Critical blocks ship, High needs review, Medium acceptable, Low deferrable
See Severity Guide for detailed severity explanations.
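The deterministic severity rule maps directly onto the Ship Decision Block: any Critical failure means BLOCK, otherwise any High failure means NEEDS REVIEW, otherwise SHIP. The sketch below shows that mapping together with per-bucket counts and a naive frequency-based stand-in for "Top 3 Root Causes". The `Failure` shape and its `pattern` field are hypothetical; the real results schema and root-cause analysis are not documented here.

```typescript
type Severity = "critical" | "high" | "medium" | "low";
type Bucket = "receipt" | "evidence" | "policy" | "tool" | "validation";
type Decision = "BLOCK" | "NEEDS REVIEW" | "SHIP";

// HYPOTHETICAL failure record; the actual results.json schema may differ.
interface Failure {
  caseId: string;
  severity: Severity;
  bucket: Bucket;
  pattern: string; // normalized description of what went wrong
}

// Severity rule: Critical blocks ship, High needs review, Medium/Low do not block.
function shipDecision(failures: Failure[]): Decision {
  if (failures.some((f) => f.severity === "critical")) return "BLOCK";
  if (failures.some((f) => f.severity === "high")) return "NEEDS REVIEW";
  return "SHIP";
}

// Count failures per bucket for quick triage.
function bucketCounts(failures: Failure[]): Record<Bucket, number> {
  const counts: Record<Bucket, number> = { receipt: 0, evidence: 0, policy: 0, tool: 0, validation: 0 };
  for (const f of failures) counts[f.bucket] += 1;
  return counts;
}

// Naive stand-in for root-cause ranking: the n most frequent patterns.
function topRootCauses(failures: Failure[], n = 3): Array<[string, number]> {
  const freq = new Map<string, number>();
  for (const f of failures) freq.set(f.pattern, (freq.get(f.pattern) ?? 0) + 1);
  return Array.from(freq.entries())
    .sort((a, b) => b[1] - a[1])
    .slice(0, n);
}
```

Because the decision depends only on the highest severity present, the same set of failures always produces the same verdict, which is what makes the ship decision reproducible across runs.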
Examples:
# Interactive dashboard (recommended for development)
fail-audit run --format dashboard
# Detailed HTML report with debugging info
fail-audit run --format html
# CI/CD with JUnit XML
fail-audit run --format junit --ci
GitHub Actions
- name: Install F.A.I.L. Kit
  run: npm install -g @fail-kit/cli
- name: Run audit
  run: fail-audit scan && fail-audit run --ci --format junit
See CI/CD Guide for complete examples.
GitLab CI
fail-audit:
  script:
    - npm install -g @fail-kit/cli
    - fail-audit scan
    - fail-audit run --ci --format junit
  artifacts:
    when: always # upload the JUnit report even when the audit fails (exit code 1)
    reports:
      junit: audit-results/*.xml
Output Formats
| Format | Extension | Use Case |
|--------|----------|----------|
| json | .json | Raw data, further processing |
| html | .html | Shareable visual reports |
| junit | .xml | CI/CD test reporting |
| markdown | .md | PR comments, documentation |
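JUnit XML is the format that CI systems such as GitLab's `artifacts:reports:junit` ingest. As a rough sketch of what such a file contains, the function below turns a list of per-case results into a minimal `<testsuite>` document. The `AuditResult` shape and `toJUnit` name are hypothetical; the CLI's actual results.json schema and XML output may include more fields.

```typescript
// HYPOTHETICAL per-case result; the real results.json schema is not documented here.
interface AuditResult {
  caseId: string;
  passed: boolean;
  durationMs: number;
  message?: string; // failure explanation, if any
}

// Escape the five XML special characters for attribute and text content.
function xmlEscape(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Build a minimal JUnit XML document: one <testcase> per result,
// with a <failure> child for each case that did not pass.
function toJUnit(suiteName: string, results: AuditResult[]): string {
  const failures = results.filter((r) => !r.passed).length;
  const totalSec = results.reduce((sum, r) => sum + r.durationMs, 0) / 1000;
  const cases = results.map((r) => {
    const open = `  <testcase name="${xmlEscape(r.caseId)}" classname="${xmlEscape(suiteName)}" time="${(r.durationMs / 1000).toFixed(3)}"`;
    if (r.passed) return `${open}/>`;
    const msg = xmlEscape(r.message ?? "assertion failed");
    return `${open}>\n    <failure message="${msg}">${msg}</failure>\n  </testcase>`;
  });
  return [
    `<?xml version="1.0" encoding="UTF-8"?>`,
    `<testsuite name="${xmlEscape(suiteName)}" tests="${results.length}" failures="${failures}" time="${totalSec.toFixed(3)}">`,
    ...cases,
    `</testsuite>`,
  ].join("\n");
}
```

Escaping matters here because failure messages from agents often quote tool names or markup; an unescaped `<` or `"` would make the whole report unparseable for the CI runner.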
Exit Codes
| Code | Meaning |
|------|---------|
| 0 | All tests passed |
| 1 | One or more tests failed |
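A CI script can gate on these exit codes directly. The sketch below runs the audit with Node's `child_process.spawnSync` and maps the documented codes to an outcome. `runAudit`, `interpretExitCode`, and the `Outcome` type are hypothetical; treating any other code, or a failure to launch the binary, as an infrastructure error is an assumption, since the table documents only 0 and 1.

```typescript
import { spawnSync } from "node:child_process";

type Outcome = "passed" | "failed" | "error";

// Map the documented exit codes: 0 = all tests passed, 1 = one or more failed.
// Assumption: any other code (or a null status from a signal) is an infrastructure error.
function interpretExitCode(status: number | null): Outcome {
  if (status === 0) return "passed";
  if (status === 1) return "failed";
  return "error";
}

// HYPOTHETICAL wrapper: run the audit in CI mode and interpret the result.
// On Windows, globally installed npm binaries may need { shell: true }.
function runAudit(args: string[] = ["run", "--ci", "--format", "junit"]): Outcome {
  const result = spawnSync("fail-audit", args, { stdio: "inherit" });
  if (result.error) return "error"; // e.g. fail-audit is not on PATH
  return interpretExitCode(result.status);
}
```

Separating "failed" from "error" lets a pipeline distinguish an agent that did not pass the audit from an audit that never ran.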
Audit Levels
- Smoke Test: Basic contract validation, benign inputs
- Interrogation: Behavioral testing, edge cases, action verification
- Red Team: Adversarial attacks, injection attempts, policy bypass
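The three levels appear both as `--level` flag values (smoke, interrogation, red-team) and as `levels` keys in fail-audit.config.json (smoke_test, interrogation, red_team). The sketch below shows one plausible way to combine them; it is hypothetical. The flag-to-key mapping, the rule that an explicit `--level` runs only that level, and the escalating run order are all assumptions, not documented CLI behavior.

```typescript
type LevelKey = "smoke_test" | "interrogation" | "red_team";

// Assumed mapping from --level flag values to config "levels" keys.
const FLAG_TO_KEY = new Map<string, LevelKey>([
  ["smoke", "smoke_test"],
  ["interrogation", "interrogation"],
  ["red-team", "red_team"],
]);

// HYPOTHETICAL: an explicit --level runs only that level; otherwise run every
// level enabled in the config, from least to most adversarial.
function levelsToRun(enabled: Record<LevelKey, boolean>, levelFlag?: string): LevelKey[] {
  if (levelFlag !== undefined) {
    const key = FLAG_TO_KEY.get(levelFlag);
    if (key === undefined) throw new Error(`unknown --level "${levelFlag}"`);
    return [key];
  }
  const order: LevelKey[] = ["smoke_test", "interrogation", "red_team"];
  return order.filter((k) => enabled[k]);
}
```

Running smoke tests first is a cheap way to fail fast: if basic contract validation breaks, the adversarial red-team cases are unlikely to produce useful signal.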
Examples
Complete working examples are available:
Links
No trace, no ship.
