agent-gauntlet
v0.2.0
A CLI tool for testing AI coding agents
Agent Gauntlet
Don't just review the agent's code — put it through the gauntlet.
Agent Gauntlet is a configurable “feedback loop” runner for AI-assisted development workflows.
You configure which paths in your repo should trigger which validations — shell commands like tests and linters, plus AI-powered code reviews. When files change, Gauntlet automatically runs the relevant validations and reports results.
For AI reviews, it uses the CLI tool of your choice: Gemini, Codex, Claude Code, GitHub Copilot, or Cursor.
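The path-to-validation idea described above can be sketched in a few lines. This is only an illustration: the `RULES` mapping, the validation names, and the `validations_for` helper are hypothetical and do not reflect Agent Gauntlet's actual configuration schema (see the Configuration Reference for that).

```python
from fnmatch import fnmatchcase

# Hypothetical mapping from path patterns to validation names.
# NOT Agent Gauntlet's real config format; for illustration only.
RULES = {
    "src/*": ["lint", "unit-tests", "ai-review"],
    "docs/*": ["ai-review"],
}


def validations_for(changed_files, rules=RULES):
    """Return the validations triggered by the changed files, in first-seen order."""
    selected = []
    for path in changed_files:
        for pattern, checks in rules.items():
            # fnmatchcase's "*" also matches "/" so "src/*" covers nested files.
            if fnmatchcase(path, pattern):
                for check in checks:
                    if check not in selected:
                        selected.append(check)
    return selected
```

Under these assumed rules, a change confined to `docs/` would trigger only the AI review, while a change under `src/` would also trigger linting and tests.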
Features
- Agent validation loop: Keep your coding agent on track with automated feedback loops. Detect problems deterministically, non-deterministically, or both, then let your agent fix them while Gauntlet verifies the fixes.
- Multi-agent collaboration: Enable one AI agent to automatically request code reviews from another. For example, if Claude made changes, Gauntlet can request a review from Codex or Gemini — spreading token usage across your subscriptions instead of burning through one.
- Leverage existing subscriptions: Agent Gauntlet is free and tool-agnostic, leveraging the AI CLI tools you already have installed.
- Easy CI setup: Define your checks once, run them locally and in GitHub.
Usage Patterns
Agent Gauntlet supports three primary usage patterns, each suited for different development workflows:
- Run CLI: `agent-gauntlet run`
- Run agent command: `/gauntlet`
- Automatically run after agent completes task
The use cases below illustrate when each of these patterns may be used.
1. Planning Mode
Use case: Generate and review high-level implementation plans before coding.
Problem Gauntlet solves: Catch architectural issues and requirement misunderstandings before coding to avoid costly rework.
Workflow:
- Create a plan document in your project directory
- Run `agent-gauntlet run` from the terminal
- Gauntlet detects the new or modified plan and invokes configured AI CLIs to review it
- (Optional) Ask your assistant to refine the plan based on review feedback
Note: Review configuration and prompts are fully customizable. Example prompt: "Review this plan for completeness and potential issues."
2. AI-Assisted Development
Use case: Pair with an AI coding assistant to implement features with continuous quality checks.
Problem Gauntlet solves: Catch AI-introduced bugs and quality issues through automated checks and multi-LLM review.
Workflow:
- Collaborate with your assistant to implement code changes
- Run `/gauntlet` from chat
- Gauntlet detects changed files and runs configured checks (linter, tests, type checking, etc.)
- Simultaneously, Gauntlet invokes AI CLIs for code review
- Assistant reviews results, fixes identified issues, and runs `agent-gauntlet rerun`
- Gauntlet verifies fixes and checks for new issues
- Process repeats automatically (up to 3 reruns) until all gates pass
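The fix-and-rerun cycle in the workflow above can be pictured as a bounded loop. The sketch below is a conceptual model only, not Gauntlet's implementation: `run_gates` and `fix_issues` are hypothetical callables standing in for Gauntlet's checks and the assistant's fixes, and the only detail taken from the workflow is the limit of 3 reruns.

```python
MAX_RERUNS = 3  # the rerun limit stated in the workflow above


def run_until_green(run_gates, fix_issues, max_reruns=MAX_RERUNS):
    """Run the gates, then alternate fixing and rerunning until they pass.

    run_gates() returns a list of issue strings (empty means all gates pass).
    fix_issues(issues) stands in for the assistant applying fixes.
    Both callables are hypothetical placeholders.
    Returns a (passed, reruns_used) tuple.
    """
    issues = run_gates()  # initial run
    reruns = 0
    while issues and reruns < max_reruns:
        fix_issues(issues)
        issues = run_gates()  # rerun: verify fixes and catch new issues
        reruns += 1
    return (not issues, reruns)
```

Capping the reruns keeps the loop from spinning indefinitely when an issue cannot be fixed automatically; in that case the function reports failure so a human can step in.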
3. Agentic Implementation
Use case: Delegate well-defined tasks to a coding agent for autonomous implementation.
Problem Gauntlet solves: Enable autonomous agent development with built-in quality gates, eliminating the validation gap when humans aren't in the loop.
Workflow:
- Configure your agent to automatically run `/gauntlet` after completing implementation:
  - Rules files: Add to `.cursorrules`, `AGENT.md`, or similar
  - Custom commands: Create a `/my-dev-workflow` that includes gauntlet
  - Git hooks: Use pre-commit hooks to trigger gauntlet
  - Agent hooks: Leverage platform features (e.g., Claude's Stop event)
- Assign the task to your agent and step away
- When you return: the task is complete, reviewed by a different LLM, all issues fixed, and CI checks passing
Benefit: Fully autonomous quality assurance without manual intervention.
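Of the trigger options listed in the workflow above, the Git pre-commit hook is the easiest to sketch. The example below is a minimal illustration, not an official hook: it assumes `agent-gauntlet run` exits with a non-zero status when a gate fails, which this README does not confirm, and the `gauntlet_gate` and `main` names are hypothetical.

```python
#!/usr/bin/env python3
import subprocess
import sys


def gauntlet_gate(runner=subprocess.run):
    """Run `agent-gauntlet run` and return its exit code.

    Assumption (not confirmed by the README): the command exits
    non-zero when any gate fails. `runner` is injectable for testing.
    """
    result = runner(["agent-gauntlet", "run"])
    return result.returncode


def main():
    # Git aborts the commit when a pre-commit hook exits non-zero.
    sys.exit(gauntlet_gate())
```

To try it, the script could be saved as `.git/hooks/pre-commit`, made executable, and ended with a call to `main()`. Check the User Guide for the exit-code behavior before relying on it.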
Quick Start
- Install: `bun add -g agent-gauntlet`
- Initialize: `agent-gauntlet init`
- Run: `agent-gauntlet run`
For basic usage and a configuration guide, see the Quick Start Guide.
Documentation
- Quick Start Guide — installation, basic usage, and config layout
- User Guide — full usage details
- Configuration Reference — all configuration fields + defaults
- CLI Invocation Details — how we securely invoke AI CLIs
- Development Guide — how to build and develop this project
