
autoai-agentwatch

v1.1.0


AgentWatch

AI Agent Observability & Security -- MCP server for tracing reasoning chains, tracking costs, detecting hallucinations, and monitoring agent behavior.

88% of organizations deploying AI agents have experienced security incidents, yet no tool monitors what agents are actually thinking. AgentWatch fills that gap.

Features

  • Reasoning chain tracing -- Capture every step of agent thinking (thoughts, tool calls, decisions, outputs)
  • Token cost tracking -- Per-agent, per-model, per-task cost tracking with daily/weekly trends
  • Hallucination detection -- Compare agent outputs against source data to verify groundedness
  • Behavioral anomaly detection -- Z-score analysis detects when agents deviate from learned baselines
  • Performance metrics -- Latency p50/p95/p99, success rate, error classification, cost efficiency
  • Alerting -- Configure thresholds for cost spikes, error rates, hallucination rates, latency
  • Dashboard -- Aggregated observability view across all agents
  • Baseline learning -- Gets smarter over time as behavioral patterns accumulate
  • 100% local -- All data stored in SQLite on your machine. Zero external dependencies.

Install

npm install @autoailabs/agentwatch

Or clone and build:

git clone https://github.com/autoailabadmin/agentwatch.git
cd agentwatch
npm install
npm run build

Quick Start -- MCP Server

Add to your Claude Code or Cursor MCP config:

{
  "mcpServers": {
    "agentwatch": {
      "command": "npx",
      "args": ["-y", "@autoailabs/agentwatch"],
      "description": "AgentWatch — AI agent observability with reasoning traces, cost tracking, and hallucination detection"
    }
  }
}

That's it. No signup. No API key. No data leaves your machine.

Or if installed locally from source:

{
  "mcpServers": {
    "agentwatch": {
      "command": "node",
      "args": ["path/to/agentwatch/dist/index.js"]
    }
  }
}

Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| AGENTWATCH_DB_PATH | ~/.agentwatch/agentwatch.db | Custom SQLite database path |
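As a sketch of how this variable might be honored (the exact resolution logic inside AgentWatch may differ), the server would prefer AGENTWATCH_DB_PATH when set and otherwise fall back to the default under the home directory:

```typescript
import { homedir } from "node:os";
import { join } from "node:path";

// Hypothetical helper: resolve the SQLite database path, preferring the
// AGENTWATCH_DB_PATH environment variable over the default location.
function resolveDbPath(
  env: Record<string, string | undefined> = process.env
): string {
  return env.AGENTWATCH_DB_PATH ?? join(homedir(), ".agentwatch", "agentwatch.db");
}
```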

MCP Tools

watch_trace

Trace agent reasoning chains. Start a trace, append steps, complete, or query history.

Action: start | append | complete | get | list

Example -- trace an agent task:

watch_trace action=start agentId=my-agent taskId=task-123 model=claude-sonnet-4
watch_trace action=append traceId=tr_xxx stepType=thought stepContent="Analyzing input..." stepTokens=50
watch_trace action=append traceId=tr_xxx stepType=tool_call stepContent="search_docs(query='observability')"
watch_trace action=complete traceId=tr_xxx status=completed totalCostUsd=0.05
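Conceptually, a trace is an ordered list of steps accumulated between start and complete. The sketch below models that lifecycle with hypothetical names (the actual internal types are not part of the published API):

```typescript
// Hypothetical model of a reasoning-chain trace: a trace collects ordered
// steps (thoughts, tool calls, decisions, outputs) until it is completed.
type StepType = "thought" | "tool_call" | "decision" | "output";

interface TraceStep {
  stepType: StepType;
  content: string;
  tokens: number;
}

interface Trace {
  traceId: string;
  agentId: string;
  status: "running" | "completed" | "failed";
  steps: TraceStep[];
}

function startTrace(traceId: string, agentId: string): Trace {
  return { traceId, agentId, status: "running", steps: [] };
}

function appendStep(trace: Trace, step: TraceStep): void {
  if (trace.status !== "running") throw new Error("trace already finished");
  trace.steps.push(step);
}

// Completing a trace freezes it and returns the total tokens across steps.
function completeTrace(trace: Trace, status: "completed" | "failed"): number {
  trace.status = status;
  return trace.steps.reduce((sum, s) => sum + s.tokens, 0);
}
```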

watch_costs

Track token costs per agent/model/task with trend analysis.

Action: record | summary | timeline

Example -- view cost breakdown:

watch_costs action=record traceId=tr_xxx agentId=my-agent model=claude-sonnet-4 inputTokens=5000 outputTokens=2000 taskId=task-123
watch_costs action=summary period=7d
watch_costs action=timeline days=30
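Per-request cost follows directly from token counts and per-model prices. The sketch below shows the arithmetic; the price table is a placeholder, not AgentWatch's actual pricing data:

```typescript
// Illustrative cost calculation from token counts. The per-million-token
// prices here are placeholder values, not AgentWatch's real pricing table.
const PRICES_PER_MTOK: Record<string, { input: number; output: number }> = {
  "example-model": { input: 3.0, output: 15.0 }, // USD per 1M tokens (assumed)
};

function costUsd(model: string, inputTokens: number, outputTokens: number): number {
  const p = PRICES_PER_MTOK[model];
  if (!p) throw new Error(`unknown model: ${model}`);
  return (inputTokens * p.input + outputTokens * p.output) / 1_000_000;
}
```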

watch_hallucination_check

Verify agent outputs against source data. Returns groundedness score and flagged claims.

Action: check | stats

Example -- verify an output:

watch_hallucination_check action=check agentId=my-agent agentOutput="Revenue was $10M in 2024" sourceData=["Annual report shows $10M revenue for fiscal year 2024"]
watch_hallucination_check action=stats agentId=my-agent period=7d
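One naive way to approximate a groundedness score is the fraction of content words in the output that also appear in the source data. Real hallucination detection is more sophisticated than this word-overlap sketch, but it illustrates the idea:

```typescript
// Toy groundedness score: fraction of output words found in the sources.
// This is an illustration only, not AgentWatch's actual algorithm.
function groundedness(output: string, sources: string[]): number {
  const tokenize = (s: string) => s.toLowerCase().match(/[a-z0-9$%.]+/g) ?? [];
  const sourceVocab = new Set(sources.flatMap(tokenize));
  const words = tokenize(output);
  if (words.length === 0) return 1; // empty output makes no claims
  const grounded = words.filter((w) => sourceVocab.has(w)).length;
  return grounded / words.length;
}
```

On the README's example, "Revenue was $10M in 2024" against "Annual report shows $10M revenue for fiscal year 2024", this toy scorer grounds "revenue", "$10m", and "2024" but not "was" or "in".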

watch_performance

Agent performance metrics: latency percentiles, success rate, cost efficiency.

Required: agentId
Optional: period (1h | 24h | 7d | 30d)
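Latency percentiles like p50/p95/p99 can be computed with a nearest-rank selection over a sorted sample; a minimal sketch (not necessarily the estimator AgentWatch uses internally):

```typescript
// Nearest-rank percentile over a sample of latencies (ms).
// rank = ceil(p/100 * n), 1-indexed into the sorted sample.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("empty sample");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}
```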

watch_anomaly

Check for behavioral anomalies vs learned baseline. Uses z-score analysis.

Required: agentId
Optional: windowMinutes (default: 60)

Example -- check for anomalies:

watch_anomaly agentId=my-agent windowMinutes=120
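The z-score measures how many standard deviations the current window's metric sits from the learned baseline mean. A minimal sketch of that check (the 3-sigma threshold is a common convention, assumed here rather than taken from AgentWatch's source):

```typescript
// Z-score of the current metric value relative to a learned baseline.
function zScore(current: number, baselineMean: number, baselineStd: number): number {
  if (baselineStd === 0) return current === baselineMean ? 0 : Infinity;
  return (current - baselineMean) / baselineStd;
}

// Flag values more than `threshold` standard deviations from the mean.
function isAnomalous(current: number, mean: number, std: number, threshold = 3): boolean {
  return Math.abs(zScore(current, mean, std)) > threshold;
}
```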

watch_alert

Configure alerting thresholds for key metrics.

Action: create | list | check | delete | toggle
Metrics: cost_per_hour | error_rate | latency_p95 | hallucination_rate | token_usage_per_request | success_rate

Example -- set up cost alerting:

watch_alert action=create name="Cost Spike" metric=cost_per_hour condition=above threshold=5.0 windowMinutes=60
watch_alert action=check
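Evaluating an alert reduces to comparing the current metric value against the configured threshold under the above/below condition. A sketch with hypothetical types:

```typescript
// Hypothetical alert config mirroring the create parameters above.
interface AlertConfig {
  name: string;
  metric: string;
  condition: "above" | "below";
  threshold: number;
  enabled: boolean;
}

// An alert fires only while enabled and while the condition holds.
function alertFires(cfg: AlertConfig, value: number): boolean {
  if (!cfg.enabled) return false;
  return cfg.condition === "above" ? value > cfg.threshold : value < cfg.threshold;
}
```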

watch_dashboard

Generate aggregated observability dashboard data.

Optional: period (1h | 24h | 7d | 30d)

watch_baseline

View or update agent behavior baselines. Baselines track normal patterns and improve over time.

Action: get | compute | list

Example -- build a baseline:

watch_baseline action=compute agentId=my-agent windowDays=14
watch_baseline action=get agentId=my-agent
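A baseline that improves as observations accumulate can be maintained incrementally, e.g. with Welford's online algorithm, which updates the mean and variance without storing the full history. This is a sketch of the idea, not necessarily how AgentWatch computes its baselines:

```typescript
// Welford's online algorithm: incrementally maintain count, mean, and the
// sum of squared deviations (m2), from which the sample variance follows.
interface Baseline {
  count: number;
  mean: number;
  m2: number;
}

function updateBaseline(b: Baseline, x: number): Baseline {
  const count = b.count + 1;
  const delta = x - b.mean;
  const mean = b.mean + delta / count;
  const m2 = b.m2 + delta * (x - mean);
  return { count, mean, m2 };
}

// Sample standard deviation (n - 1 denominator); 0 until two observations.
function baselineStd(b: Baseline): number {
  return b.count > 1 ? Math.sqrt(b.m2 / (b.count - 1)) : 0;
}
```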

Example Scenarios

Scenario 1: Monitor a multi-agent pipeline

You have 5 agents processing customer support tickets. Use AgentWatch to:

  1. Trace each agent's reasoning to debug unexpected outputs
  2. Track per-agent costs to identify which agent is most expensive
  3. Set alerts for cost spikes (e.g., if an agent enters a loop)
  4. Check hallucination rates on the summarization agent

Scenario 2: Detect a misbehaving agent

An agent starts producing longer, more expensive outputs. AgentWatch will:

  1. Detect the token usage anomaly via baseline comparison
  2. Flag the cost spike through configured alerts
  3. Show the behavioral shift in the dashboard
  4. Let you drill into specific traces to understand why

Scenario 3: Verify output quality

Before sending agent outputs to users, verify groundedness:

  1. Run hallucination checks against source documents
  2. Track groundedness scores over time
  3. Alert if hallucination rate exceeds threshold
  4. Use stats to compare agents and models

Data Storage

All data is stored in a SQLite database at ~/.agentwatch/agentwatch.db (configurable via AGENTWATCH_DB_PATH). The database uses WAL mode for concurrent read performance.

Tables:

  • traces -- Reasoning chain traces with step-by-step data
  • cost_entries -- Token cost records per request
  • hallucination_checks -- Verification results
  • performance_snapshots -- Latency, success, token usage per request
  • alert_configs -- Alert threshold configurations
  • baselines -- Learned behavioral baselines per agent

Development

npm install
npm run dev       # Run with tsx (hot reload)
npm run build     # Compile TypeScript
npm test          # Run test suite
npm run lint      # Type check

License

Apache 2.0 -- see LICENSE.