autoai-agentwatch v1.1.0
AgentWatch
AI Agent Observability & Security -- MCP server for tracing reasoning chains, tracking costs, detecting hallucinations, and monitoring agent behavior.
88% of organizations deploying AI agents have experienced security incidents, yet no tool monitors what agents are actually thinking. AgentWatch fills that gap.
Features
- Reasoning chain tracing -- Capture every step of agent thinking (thoughts, tool calls, decisions, outputs)
- Token cost tracking -- Per-agent, per-model, per-task cost tracking with daily/weekly trends
- Hallucination detection -- Compare agent outputs against source data to verify groundedness
- Behavioral anomaly detection -- Z-score analysis detects when agents deviate from learned baselines
- Performance metrics -- Latency p50/p95/p99, success rate, error classification, cost efficiency
- Alerting -- Configure thresholds for cost spikes, error rates, hallucination rates, latency
- Dashboard -- Aggregated observability view across all agents
- Baseline learning -- Gets smarter over time as behavioral patterns accumulate
- 100% local -- All data stored in SQLite on your machine. Zero external dependencies.
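The z-score analysis named above can be sketched in a few lines. This is a minimal illustration of the technique, not AgentWatch's internal code; the function names are hypothetical:

```typescript
// Z-score: how many standard deviations an observation sits from the baseline mean.
function zScore(value: number, mean: number, stdDev: number): number {
  if (stdDev === 0) return 0; // no variance in the baseline: treat as not anomalous
  return (value - mean) / stdDev;
}

// Flag a metric as anomalous when it drifts more than `threshold` standard
// deviations from the learned baseline (3 is a common default cutoff).
function isAnomalous(value: number, mean: number, stdDev: number, threshold = 3): boolean {
  return Math.abs(zScore(value, mean, stdDev)) > threshold;
}
```

The same comparison applies to any per-agent metric with a learned mean and standard deviation: latency, tokens per request, cost per task.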
Install
```shell
npm install @autoailabs/agentwatch
```

Or clone and build:

```shell
git clone https://github.com/autoailabadmin/agentwatch.git
cd agentwatch
npm install
npm run build
```

Quick Start -- MCP Server
Add to your Claude Code or Cursor MCP config:
```json
{
  "mcpServers": {
    "agentwatch": {
      "command": "npx",
      "args": ["-y", "@autoailabs/agentwatch"],
      "description": "AgentWatch — AI agent observability with reasoning traces, cost tracking, and hallucination detection"
    }
  }
}
```

That's it. No signup. No API key. No data leaves your machine.
Or if installed locally from source:
```json
{
  "mcpServers": {
    "agentwatch": {
      "command": "node",
      "args": ["path/to/agentwatch/dist/index.js"]
    }
  }
}
```

Environment Variables
| Variable | Default | Description |
|----------|---------|-------------|
| AGENTWATCH_DB_PATH | ~/.agentwatch/agentwatch.db | Custom SQLite database path |
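If your MCP client supports per-server environment variables (Claude Code and Cursor configs accept an `env` field), the database path can be overridden in the server entry. The path below is only an example:

```json
{
  "mcpServers": {
    "agentwatch": {
      "command": "npx",
      "args": ["-y", "@autoailabs/agentwatch"],
      "env": { "AGENTWATCH_DB_PATH": "/data/agentwatch/agentwatch.db" }
    }
  }
}
```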
MCP Tools
watch_trace
Trace agent reasoning chains. Start a trace, append steps, complete, or query history.
Action: start | append | complete | get | list

Example -- trace an agent task:

```shell
watch_trace action=start agentId=my-agent taskId=task-123 model=claude-sonnet-4
watch_trace action=append traceId=tr_xxx stepType=thought stepContent="Analyzing input..." stepTokens=50
watch_trace action=append traceId=tr_xxx stepType=tool_call stepContent="search_docs(query='observability')"
watch_trace action=complete traceId=tr_xxx status=completed totalCostUsd=0.05
```

watch_costs
Track token costs per agent/model/task with trend analysis.
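The arithmetic behind a cost record is token counts times per-model rates. A minimal sketch with illustrative rates (the rate table and function below are assumptions for the example, not AgentWatch's actual pricing data):

```typescript
// Example USD rates per million tokens -- illustrative values only.
const RATES: Record<string, { inPerM: number; outPerM: number }> = {
  "claude-sonnet-4": { inPerM: 3, outPerM: 15 },
};

// Cost of one request: input and output tokens priced separately.
function costUsd(model: string, inputTokens: number, outputTokens: number): number {
  const r = RATES[model];
  if (!r) throw new Error(`no rate configured for model: ${model}`);
  return (inputTokens / 1_000_000) * r.inPerM + (outputTokens / 1_000_000) * r.outPerM;
}
```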
Action: record | summary | timeline

Example -- view cost breakdown:

```shell
watch_costs action=record traceId=tr_xxx agentId=my-agent model=claude-sonnet-4 inputTokens=5000 outputTokens=2000 taskId=task-123
watch_costs action=summary period=7d
watch_costs action=timeline days=30
```

watch_hallucination_check
Verify agent outputs against source data. Returns groundedness score and flagged claims.
Action: check | stats

Example -- verify an output:

```shell
watch_hallucination_check action=check agentId=my-agent agentOutput="Revenue was $10M in 2024" sourceData=["Annual report shows $10M revenue for fiscal year 2024"]
watch_hallucination_check action=stats agentId=my-agent period=7d
```

watch_performance
Agent performance metrics: latency percentiles, success rate, cost efficiency.
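Latency percentiles such as p95 can be computed with the nearest-rank method; a minimal sketch, and not necessarily the estimator AgentWatch itself uses:

```typescript
// Nearest-rank percentile: sort the samples, take the value at rank ceil(p/100 * n).
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.min(rank, sorted.length) - 1];
}
```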
Required: agentId
Optional: period (1h | 24h | 7d | 30d)

watch_anomaly
Check for behavioral anomalies vs learned baseline. Uses z-score analysis.
Required: agentId
Optional: windowMinutes (default: 60)

Example -- check for anomalies:

```shell
watch_anomaly agentId=my-agent windowMinutes=120
```

watch_alert
Configure alerting thresholds for key metrics.
Action: create | list | check | delete | toggle
Metrics: cost_per_hour | error_rate | latency_p95 | hallucination_rate | token_usage_per_request | success_rate

Example -- set up cost alerting:

```shell
watch_alert action=create name="Cost Spike" metric=cost_per_hour condition=above threshold=5.0 windowMinutes=60
watch_alert action=check
```

watch_dashboard
Generate aggregated observability dashboard data.
Optional: period (1h | 24h | 7d | 30d)

watch_baseline
View or update agent behavior baselines. Baselines track normal patterns and improve over time.
Action: get | compute | list

Example -- build a baseline:

```shell
watch_baseline action=compute agentId=my-agent windowDays=14
watch_baseline action=get agentId=my-agent
```

Example Scenarios
Scenario 1: Monitor a multi-agent pipeline
You have 5 agents processing customer support tickets. Use AgentWatch to:
- Trace each agent's reasoning to debug unexpected outputs
- Track per-agent costs to identify which agent is most expensive
- Set alerts for cost spikes (e.g., if an agent enters a loop)
- Check hallucination rates on the summarization agent
Scenario 2: Detect a misbehaving agent
An agent starts producing longer, more expensive outputs. AgentWatch will:
- Detect the token usage anomaly via baseline comparison
- Flag the cost spike through configured alerts
- Show the behavioral shift in the dashboard
- Let you drill into specific traces to understand why
Scenario 3: Verify output quality
Before sending agent outputs to users, verify groundedness:
- Run hallucination checks against source documents
- Track groundedness scores over time
- Alert if hallucination rate exceeds threshold
- Use stats to compare agents and models
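A groundedness score can be approximated as the fraction of output tokens that also appear in the source data. This is a deliberately simplistic sketch of the idea; AgentWatch's actual check compares claims, not individual words:

```typescript
// Fraction of output words that also occur somewhere in the sources.
// Word overlap only illustrates the shape of a grounding check.
function groundedness(output: string, sources: string[]): number {
  const tokenize = (s: string) => s.toLowerCase().match(/[a-z0-9$]+/g) ?? [];
  const sourceVocab = new Set(sources.flatMap(tokenize));
  const words = tokenize(output);
  if (words.length === 0) return 1; // empty output makes no claims
  const grounded = words.filter((w) => sourceVocab.has(w)).length;
  return grounded / words.length;
}
```

A low score means the output asserts terms the sources never mention, which is the signal a hallucination-rate alert would key off.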
Data Storage
All data is stored in a SQLite database at ~/.agentwatch/agentwatch.db (configurable via AGENTWATCH_DB_PATH). The database uses WAL mode for concurrent read performance.
Tables:
- traces -- Reasoning chain traces with step-by-step data
- cost_entries -- Token cost records per request
- hallucination_checks -- Verification results
- performance_snapshots -- Latency, success, token usage per request
- alert_configs -- Alert threshold configurations
- baselines -- Learned behavioral baselines per agent
Development
```shell
npm install
npm run dev    # Run with tsx (hot reload)
npm run build  # Compile TypeScript
npm test       # Run test suite
npm run lint   # Type check
```

License
Apache 2.0 -- see LICENSE.
