© 2026 – Pkg Stats / Ryan Hefner

@nicofains1/agentwatch

v0.3.1

Multi-agent observability: cascade failure detection, heartbeats, and forensic replay

Observability for multi-agent systems. Track heartbeats, trace cross-agent actions, detect cascade failures, and replay what went wrong.

Built for teams running fleets of AI agents (CrewAI, AutoGen, LangGraph, PocketFlow, custom) who need to understand why Agent B failed after Agent A timed out.

Try it in 30 seconds

No install needed. Run this to load a sample five-agent fleet and see a cascade failure traced back to its root cause:

npx @nicofains1/agentwatch demo

Output:

AgentWatch Fleet Dashboard
============================================================
Agents: 5 total | 3 healthy | 1 degraded | 1 error | 0 offline

Cascade Failure (4 steps, root cause: scheduler/dispatch-batch)
============================================================
[ROOT] scheduler/dispatch-batch [ok] 15ms
       |
[  1 ] fetcher/call-api [error] 30000ms
       TIMEOUT after 30000ms
       |
[  2 ] processor/transform [error] 120ms
       Error: input is null - expected array from fetcher
       |
[FAIL] notifier/send-alert [error] 8ms
       Error: no processed data to report

Install

npm install @nicofains1/agentwatch

Quick Start

import { AgentWatch } from '@nicofains1/agentwatch';

const aw = new AgentWatch(); // creates agentwatch.db

// 1. Report heartbeats from your agents
aw.report('agent-a', 'healthy');
aw.report('agent-b', 'healthy');

// 2. Trace actions across agents
const traceId = aw.createTraceId();

const e1 = aw.trace(traceId, 'agent-a', 'fetch-data',
  'url=https://api.example.com', 'rows=150');

const e2 = aw.trace(traceId, 'agent-b', 'process',
  JSON.stringify({ rows: 150 }), 'Error: out of memory', {
    parentEventId: e1.id,
    status: 'error',
    durationMs: 4200,
  });

// 3. Find the root cause
const chain = aw.correlate(e2.id);
console.log(chain?.root_cause);
// -> { agent: 'agent-a', action: 'fetch-data', ... }

// 4. Fleet dashboard
console.log(aw.dashboardText());

Features

Heartbeat registration - Track agent health status over time. Detect stale or offline agents based on configurable thresholds.

Cross-agent tracing - Link actions across agents with trace IDs and parent event references. When agent-c fails because agent-b sent bad data that it got from agent-a, the trace shows the full chain.

Cascade failure detection - Walk backward from any failure to find the root cause across your agent fleet. correlate(failureEventId) returns the full chain from root cause to final failure.

Alert de-duplication - Repeated alerts of the same type from the same agent within the time window are collapsed into a single alert with an incrementing count. Severity escalates automatically: info (1x) -> warning (3x) -> critical (10x).

Fleet dashboard - An at-a-glance summary of your entire fleet: which agents are healthy, degraded, erroring, or offline, plus uptime percentages and active alert counts per agent.

Forensic replay - Given a trace ID, replay all cascade chains to understand the full failure sequence.

OpenTelemetry export - Export traces as OTEL spans with GenAI semantic conventions. Plug into Jaeger, Grafana, or any OTEL-compatible backend.

MCP Server

AgentWatch works as an MCP server, so any MCP-compatible client (Claude Code, Cursor, etc.) can use it as a tool. Add it to your MCP config:

{
  "mcpServers": {
    "agentwatch": {
      "command": "npx",
      "args": ["@nicofains1/agentwatch", "mcp"],
      "env": {
        "AGENTWATCH_DB": "/path/to/agentwatch.db"
      }
    }
  }
}

This exposes 13 tools: agentwatch_dashboard, agentwatch_report_heartbeat, agentwatch_trace, agentwatch_cascade, agentwatch_replay, agentwatch_get_alerts, agentwatch_get_failures, agentwatch_get_trace, agentwatch_fleet_health, agentwatch_create_trace_id, agentwatch_alert, agentwatch_resolve_alert, and agentwatch_dashboard_text.

CLI

npx @nicofains1/agentwatch demo                   # See it in action with sample data
npx @nicofains1/agentwatch dashboard              # Fleet health overview
npx @nicofains1/agentwatch cascade <event-id>     # Trace cascade from a failure
npx @nicofains1/agentwatch failures [agent]       # List recent failures
npx @nicofains1/agentwatch alerts [agent]         # List active alerts
npx @nicofains1/agentwatch replay <trace-id>      # Replay all cascades in a trace
npx @nicofains1/agentwatch mcp                    # Start MCP server (stdio)

Set AGENTWATCH_DB to point to your database file (default: agentwatch.db).

API

new AgentWatch(config?)

const aw = new AgentWatch({
  db_path: 'agentwatch.db',       // SQLite file path
  alert_window_minutes: 30,        // De-dup window for alerts
  heartbeat_stale_minutes: 30,     // When to mark agents as offline
});

Heartbeats

aw.report(agent, status, context?)     // status: 'healthy' | 'degraded' | 'error' | 'offline'
aw.getLatestHeartbeat(agent)           // -> Heartbeat | undefined
aw.getFleetHealth()                    // -> AgentHealth[]
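Detecting stale or offline agents can be pictured as a simple age check on each agent's newest heartbeat. The sketch below is not AgentWatch's implementation: `HeartbeatLike` and `effectiveStatus` are hypothetical names, and it assumes each heartbeat carries a millisecond timestamp that is compared against a threshold like `heartbeat_stale_minutes`.

```typescript
type HeartbeatStatus = "healthy" | "degraded" | "error" | "offline";

// Hypothetical heartbeat shape; AgentWatch's real Heartbeat type may differ.
interface HeartbeatLike {
  agent: string;
  status: HeartbeatStatus;
  timestamp: number; // milliseconds since the Unix epoch
}

// An agent whose newest heartbeat is older than the threshold (or that has
// never reported) is treated as offline; otherwise its last reported status stands.
function effectiveStatus(
  latest: HeartbeatLike | undefined,
  now: number,
  staleMinutes: number,
): HeartbeatStatus {
  if (latest === undefined) return "offline";
  const ageMs = now - latest.timestamp;
  return ageMs > staleMinutes * 60_000 ? "offline" : latest.status;
}

const now = Date.UTC(2025, 0, 1, 12, 0, 0);
const recent: HeartbeatLike = { agent: "agent-a", status: "degraded", timestamp: now - 5 * 60_000 };
const old: HeartbeatLike = { agent: "agent-b", status: "healthy", timestamp: now - 45 * 60_000 };

console.log(effectiveStatus(recent, now, 30)); // degraded
console.log(effectiveStatus(old, now, 30)); // offline
```

Deriving "offline" at read time, rather than requiring agents to report it, means a crashed agent that stops sending heartbeats still shows up correctly.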

Tracing

aw.createTraceId()                                // -> string (UUID)
aw.trace(traceId, agent, action, input, output, {
  parentEventId?: number,                         // link to parent event
  status?: 'ok' | 'error',                        // default: 'ok'
  durationMs?: number,                            // execution time
})                                                // -> TraceEvent
aw.getTraceEvents(traceId)                        // -> TraceEvent[]
aw.getRecentFailures(agent?, limit?)              // -> TraceEvent[]

Cascade Detection

aw.correlate(failureEventId)    // -> CascadeChain | null (walk back to root cause)
aw.replay(traceId)              // -> CascadeChain[] (all cascades in a trace)
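Both calls can be thought of as walks over `parentEventId` links. A minimal sketch, assuming a simplified event shape (`EventLike`) and hypothetical helpers `walkToRoot` and `replayChains`: it treats the parentless ancestor of a failure as the root cause, which matches the demo output where the root step itself is `[ok]`. For replay it returns one chain per "terminal" failure (an error event that no other error event points to as its parent); that selection rule is an illustrative assumption, not documented library behavior.

```typescript
// Hypothetical event shape, loosely modeled on the trace() options above;
// AgentWatch's real TraceEvent type may carry more fields.
interface EventLike {
  id: number;
  agent: string;
  action: string;
  status: "ok" | "error";
  parentEventId?: number;
}

// Walk parentEventId links from one event back to its parentless ancestor.
// Returns the chain ordered root -> failure, or null if the id is unknown.
function walkToRoot(byId: Map<number, EventLike>, failureId: number): EventLike[] | null {
  const chain: EventLike[] = [];
  const seen = new Set<number>(); // guards against accidental parent cycles
  let current = byId.get(failureId);
  while (current !== undefined && !seen.has(current.id)) {
    seen.add(current.id);
    chain.unshift(current); // prepend so the chain reads root -> failure
    current = current.parentEventId === undefined ? undefined : byId.get(current.parentEventId);
  }
  return chain.length === 0 ? null : chain;
}

// Replay: one chain per terminal failure in the trace (illustrative rule).
function replayChains(events: EventLike[]): EventLike[][] {
  const byId = new Map<number, EventLike>();
  for (const e of events) byId.set(e.id, e);
  // Ids of events that some failing event names as its parent.
  const failingParents = new Set<number>();
  for (const e of events) {
    if (e.status === "error" && e.parentEventId !== undefined) failingParents.add(e.parentEventId);
  }
  const chains: EventLike[][] = [];
  for (const e of events) {
    if (e.status !== "error" || failingParents.has(e.id)) continue;
    const chain = walkToRoot(byId, e.id);
    if (chain !== null) chains.push(chain);
  }
  return chains;
}

const events: EventLike[] = [
  { id: 1, agent: "agent-a", action: "fetch-data", status: "ok" },
  { id: 2, agent: "agent-b", action: "process", status: "error", parentEventId: 1 },
  { id: 3, agent: "agent-c", action: "report", status: "error", parentEventId: 2 },
  { id: 4, agent: "agent-d", action: "archive", status: "error", parentEventId: 1 },
];

for (const chain of replayChains(events)) {
  console.log(chain.map((e) => `${e.agent}/${e.action}`).join(" -> "));
}
// agent-a/fetch-data -> agent-b/process -> agent-c/report
// agent-a/fetch-data -> agent-d/archive
```

Walking backward from a failure, rather than forward from every root, keeps each lookup proportional to the depth of a single chain.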

Alerts

aw.alert(agent, alertType, message)    // auto-deduplicates within window
aw.resolveAlert(alertId)
aw.activeAlerts(agent?)                // -> Alert[]
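De-duplication with escalation boils down to keying alerts by agent and alert type. This is a hypothetical in-memory sketch (`AlertDeduper` and `severityFor` are illustrative names, not AgentWatch exports) using the 1x/3x/10x thresholds from the feature list. It assumes the window is measured from the most recent occurrence, a sliding window; the library may measure it differently, and it persists alerts in SQLite rather than in memory.

```typescript
type Severity = "info" | "warning" | "critical";

interface DedupedAlert {
  agent: string;
  alertType: string;
  message: string;
  count: number;
  severity: Severity;
  lastSeen: number; // milliseconds since the Unix epoch
}

// Escalation thresholds from the feature list: 1x info, 3x warning, 10x critical.
function severityFor(count: number): Severity {
  if (count >= 10) return "critical";
  if (count >= 3) return "warning";
  return "info";
}

// Hypothetical in-memory de-duplicator keyed by (agent, alertType).
class AlertDeduper {
  private readonly windowMs: number;
  private readonly byKey = new Map<string, DedupedAlert>();

  constructor(windowMinutes: number) {
    this.windowMs = windowMinutes * 60_000;
  }

  alert(agent: string, alertType: string, message: string, now: number): DedupedAlert {
    const key = JSON.stringify([agent, alertType]);
    const existing = this.byKey.get(key);
    // Same agent + type seen within the window: bump the count instead of adding a new alert.
    if (existing !== undefined && now - existing.lastSeen <= this.windowMs) {
      existing.count += 1;
      existing.lastSeen = now;
      existing.message = message;
      existing.severity = severityFor(existing.count);
      return existing;
    }
    // First occurrence, or the previous one fell outside the window: start over at count 1.
    const fresh: DedupedAlert = { agent, alertType, message, count: 1, severity: "info", lastSeen: now };
    this.byKey.set(key, fresh);
    return fresh;
  }
}

const dedup = new AlertDeduper(30);
const t0 = Date.UTC(2025, 0, 1);
let latest = dedup.alert("fetcher", "timeout", "call-api timed out", t0);
for (let i = 1; i < 10; i++) {
  latest = dedup.alert("fetcher", "timeout", "call-api timed out", t0 + i * 60_000);
}
console.log(latest.count, latest.severity); // 10 critical
```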

Dashboard

aw.dashboard()      // -> DashboardOutput (structured)
aw.dashboardText()  // -> string (formatted for terminal)
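The "Agents:" row of the text dashboard is essentially a count per status. A sketch with a hypothetical `fleetSummary` helper and made-up agent names, reproducing only that one line of the format shown in the demo output; uptime percentages and alert counts are left out, and this is not how `dashboardText()` is necessarily built.

```typescript
type FleetStatus = "healthy" | "degraded" | "error" | "offline";

// Count agents per status and render a line shaped like the demo dashboard's "Agents:" row.
function fleetSummary(statuses: Record<string, FleetStatus>): string {
  const counts: Record<FleetStatus, number> = { healthy: 0, degraded: 0, error: 0, offline: 0 };
  const values = Object.values(statuses);
  for (const s of values) counts[s] += 1;
  return (
    `Agents: ${values.length} total | ${counts.healthy} healthy | ${counts.degraded} degraded` +
    ` | ${counts.error} error | ${counts.offline} offline`
  );
}

console.log(fleetSummary({
  "agent-a": "healthy",
  "agent-b": "healthy",
  "agent-c": "healthy",
  "agent-d": "degraded",
  "agent-e": "error",
}));
// Agents: 5 total | 3 healthy | 1 degraded | 1 error | 0 offline
```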

OpenTelemetry Export

// Requires optional peer deps: @opentelemetry/api, @opentelemetry/sdk-trace-base
await aw.exportTraceToOtel(traceId, { serviceName: 'my-agents' });
await aw.exportRecentToOtel(1); // last 1 hour

Storage

Uses SQLite via better-sqlite3. The database file is created automatically on first use. WAL mode is enabled for concurrent reads.

Tables: heartbeats, trace_events, alerts - all with proper indexes.

License

MIT