agentwatch-ai

v1.8.0

Published 17 days ago

Detect failures, investigate root causes, and hill-climb fixes for AI agents. Uses your Claude Code subscription.

Vulnerabilities: 0 high, 0 medium, 0 low

Maintainers: samuelchien821

Keywords: claude-code, ai-agent, observability, eval, hill-climbing, prompt-engineering, agent-reliability

AgentWatch

Detect failures, investigate root causes, and hill-climb fixes for AI agents.

All AI operations use your Claude Code subscription. No API keys needed.

Install

npm install -g agentwatch-ai

Quick Start

# Scan your Claude Code sessions for issues
agentwatch quickstart

Pipeline

# 1. Scan sessions, detect issues, get health score
agentwatch scan

# 2. Root-cause investigation (uses claude CLI)
agentwatch investigate <scanId>

# 3. Generate eval test cases
agentwatch eval generate <scanId>

# 4. Run baseline eval
agentwatch eval run <scanId> --baseline

# 5. Generate a fix (CLAUDE.md rules or skill files)
agentwatch revise <scanId>

# 6. Run eval with fix applied
agentwatch eval run <scanId> --revision <revisionId>

# 7. Multi-round hill-climb optimization
agentwatch hillclimb <scanId> --rounds 3 --variants 3

# 8. Publish results to dashboard
agentwatch publish <scanId>
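
For scripting, steps 2 through 5 need only the scan ID, so they can be chained. The following is a minimal Python sketch, not part of AgentWatch. The helper names are hypothetical, the scan ID is passed in by hand because the README does not describe how `agentwatch scan` prints it, and the sketch assumes the CLI exits non-zero on failure.

```python
import subprocess


def scan_followup_commands(scan_id: str) -> list[list[str]]:
    """Steps 2-5 of the pipeline above, which only need the scan ID."""
    return [
        ["agentwatch", "investigate", scan_id],
        ["agentwatch", "eval", "generate", scan_id],
        ["agentwatch", "eval", "run", scan_id, "--baseline"],
        ["agentwatch", "revise", scan_id],
    ]


def revision_eval_command(scan_id: str, revision_id: str) -> list[str]:
    """Step 6: re-run the eval with a specific revision applied."""
    return ["agentwatch", "eval", "run", scan_id, "--revision", revision_id]


def run_all(commands: list[list[str]]) -> None:
    """Run each command in order; check=True raises on a non-zero exit
    (assumes the CLI signals failure that way)."""
    for argv in commands:
        subprocess.run(argv, check=True)
```

Calling `run_all(scan_followup_commands(scan_id))` with an ID taken from a previous scan would run the investigation through the revision step. Step 6 stays separate because it needs the revision ID produced by step 5.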

What It Detects

Bash exploration waste - using Bash to read files instead of the Read tool
Doom loops - consecutive identical tool calls, polling loops
Edit-before-Read failures - editing files without reading them first
High exploration ratio - too many Bash calls relative to actual edits
Polling waste - sleep loops instead of background execution
Low completion rate - sessions that never commit
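
To make a few of these patterns concrete, here is a minimal Python sketch of how such rules might be expressed over a list of tool calls. It is an illustration only, not AgentWatch's implementation: the `ToolCall` shape, the function names, and the repeat threshold of 3 are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolCall:
    tool: str    # hypothetical field, e.g. "Bash", "Read", "Edit"
    target: str  # hypothetical field: command string or file path


def find_doom_loops(calls: list[ToolCall], min_repeats: int = 3) -> list[tuple[int, int]]:
    """Return (start_index, run_length) for each run of identical consecutive calls.

    The threshold of 3 is an assumed value, not one documented by AgentWatch.
    """
    loops = []
    i = 0
    while i < len(calls):
        j = i
        while j + 1 < len(calls) and calls[j + 1] == calls[i]:
            j += 1
        run = j - i + 1
        if run >= min_repeats:
            loops.append((i, run))
        i = j + 1
    return loops


def find_edit_before_read(calls: list[ToolCall]) -> list[int]:
    """Return indices of Edit calls on files that had no earlier Read."""
    seen: set[str] = set()
    flagged = []
    for idx, call in enumerate(calls):
        if call.tool == "Read":
            seen.add(call.target)
        elif call.tool == "Edit" and call.target not in seen:
            flagged.append(idx)
    return flagged


def exploration_ratio(calls: list[ToolCall]) -> float:
    """Bash calls per Edit call; infinity when there are Bash calls but no edits."""
    bash = sum(1 for c in calls if c.tool == "Bash")
    edits = sum(1 for c in calls if c.tool == "Edit")
    if edits == 0:
        return float("inf") if bash else 0.0
    return bash / edits
```

A session that runs `sleep 5` three times in a row and then edits a file it never read would be flagged by both `find_doom_loops` and `find_edit_before_read`.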

How It Works

Scan: reads your local Claude Code sessions (~/.claude/projects/)
Detect: rules identify behavioral anti-patterns from session metrics
Investigate: uses Claude to analyze root causes of each issue
Eval: auto-generates test cases that measure agent behavior
Revise: generates CLAUDE.md rules or skill files to fix issues
Hill-climb: runs multiple rounds of revision + eval to optimize
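
The hill-climb step can be pictured as a greedy search: each round proposes several variants of the current best revision, scores them with the eval, and keeps a variant only if it beats the current best. The Python sketch below shows that general technique. It is not AgentWatch's code: `propose` and `score` are hypothetical stand-ins for the revise and eval steps, and the README does not state how AgentWatch scores or selects candidates.

```python
from typing import Callable


def hill_climb(
    baseline: str,
    propose: Callable[[str, int], list[str]],  # hypothetical: (current, n) -> n variants
    score: Callable[[str], float],             # hypothetical: higher is better
    rounds: int = 3,
    variants: int = 3,
) -> tuple[str, float]:
    """Greedy hill climbing: keep the best-scoring candidate seen so far."""
    best, best_score = baseline, score(baseline)
    for _ in range(rounds):
        candidates = propose(best, variants)
        if not candidates:
            break
        # Score every variant once, then take the top one for this round.
        scored = [(score(c), c) for c in candidates]
        top_score, top = max(scored, key=lambda pair: pair[0])
        # Only move when the round's best variant improves on the current best.
        if top_score > best_score:
            best, best_score = top, top_score
    return best, best_score
```

The defaults mirror the `--rounds 3 --variants 3` flags from the pipeline example. Because a round that fails to improve leaves the current best in place, the returned score is never lower than the baseline's.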

Requirements

Node.js 20+
Claude Code CLI (claude command)
Claude Code subscription (for investigation, eval, and revision steps)
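
A quick preflight check for the first two requirements might look like the Python sketch below. The function names are hypothetical and this check is not part of AgentWatch. It only looks for `node` and `claude` on PATH and parses `node --version`; it cannot verify a Claude Code subscription.

```python
import re
import shutil
import subprocess


def node_major(version_output: str) -> int:
    """Parse the major version from `node --version` output such as 'v20.11.1'."""
    match = re.match(r"v?(\d+)\.", version_output.strip())
    if match is None:
        raise ValueError(f"unrecognized Node.js version: {version_output!r}")
    return int(match.group(1))


def missing_requirements(min_node_major: int = 20) -> list[str]:
    """Return human-readable problems; an empty list means both tools were found."""
    problems = []
    if shutil.which("node") is None:
        problems.append("node is not on PATH")
    else:
        result = subprocess.run(["node", "--version"], capture_output=True, text=True)
        try:
            if node_major(result.stdout) < min_node_major:
                problems.append(f"Node.js {min_node_major}+ required, found {result.stdout.strip()}")
        except ValueError as exc:
            problems.append(str(exc))
    if shutil.which("claude") is None:
        problems.append("claude CLI is not on PATH")
    return problems
```

Printing the result of `missing_requirements()` before installing gives a quick view of what still needs to be set up.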
