node-eventloop-watchdog

v1.1.1

Production watchdog that detects Node.js event loop stalls and can trigger recovery actions before your app silently freezes

Why This Exists

Most Node monitoring tells you the event loop is slow. That is useful, but it does not answer the production question:

If the event loop is blocked, then what happens?

node-eventloop-watchdog is a small production safety layer for that exact moment. It can log, emit events, call your handler, post a webhook, exit, or terminate a stuck process so a supervisor such as Kubernetes, systemd, PM2, Docker, or a platform runtime can restart it.

What Makes It Different

| Tool category | What it usually does | Limitation |
|---|---|---|
| Event loop metrics | Tracks lag, averages, percentiles | Tells you something is wrong, but does not act |
| Native watchdogs | Kill or supervise the process | Often require native dependencies or separate setup |
| Simple timers | Detect lag after the loop resumes | Cannot handle a loop that never comes back |
| node-eventloop-watchdog | Detects stalls, adds context, and can act | Zero runtime dependencies, opt-in recovery |

Ecosystem

node-eventloop-watchdog is part of a small Node.js observability ecosystem you can adopt independently or together:

  • node-actuator-lite — Spring Boot-style /actuator/health, /info, /metrics, /env, /threaddump, /heapdump, and /prometheus endpoints.
  • node-eventloop-watchdog — Detects event-loop stalls, captures stack traces and hotspots, and triggers recovery.
  • node-request-trace — Per-request timelines, browser dashboard, and CLI without OpenTelemetry.

When all three are installed:

  • This watchdog automatically registers /actuator/eventloop, /actuator/eventloop/history, /actuator/eventloop/hotspots, and /actuator/eventloop/metrics under node-actuator-lite.
  • Block events include the active request id, route, and method captured by node-request-trace.

Runnable example: node-actuator-lite/examples/ecosystem.

Quickest setup: use node-observability-lite to wire the three packages together with production-safe presets in a single call.

const express = require('express');
const observability = require('node-observability-lite');

const app = express();

observability.express(app, {
  preset: 'production',
  auth: req => req.get('authorization') === `Bearer ${process.env.OPS_TOKEN}`,
});

Install

npm install node-eventloop-watchdog

CommonJS and bundled TypeScript declarations are included.

const watchdog = require('node-eventloop-watchdog');

Quick Start: Observe Mode

Use start() when you want safe, backwards-compatible monitoring. It logs event loop blocks and keeps their history, metrics, hotspots, and request context.

const watchdog = require('node-eventloop-watchdog');

watchdog.start();

When a block crosses the threshold, you get a structured event:

[node-eventloop-watchdog] [WARN] Event Loop Blocked
  Duration: 142ms
  Severity: warning
  Threshold: 50ms
  Action: log
  Route: POST /checkout

  Suspected Blocking Operation
  JSON.stringify

  Location
  checkoutService.js:84

Production Mode: Protect

Use protect() when you want opinionated production behavior. It enables recovery defaults designed for apps already managed by a process supervisor.

const watchdog = require('node-eventloop-watchdog');

watchdog.protect();

Default protection behavior:

| Trigger | Default action |
|---|---|
| Event loop lag >= 100ms | Log warning, record metrics, emit block event |
| Event loop lag >= 500ms | Mark event critical and terminate with SIGTERM |
| Main event loop stays blocked for 1000ms without resuming | Worker-backed hard watchdog terminates with SIGTERM |

The intended production pattern is simple: the watchdog terminates the unhealthy process, and your supervisor restarts it.

watchdog.protect({
  recovery: {
    action: 'kill',
    signal: 'SIGTERM',
    hardTimeout: 1000
  }
});
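The two lag rows in the default protection table amount to a simple classification of each measured lag. Below is a minimal sketch. It assumes that a lag exactly equal to a threshold counts as crossing it, which matches the `>=` in the table. `classifyLag` is a hypothetical helper for illustration, not part of the package's API.

```javascript
// Hypothetical helper, not part of node-eventloop-watchdog's API.
// Maps a measured lag in ms to a severity, using protect()'s default thresholds.
function classifyLag(lagMs, { warningThreshold = 100, criticalThreshold = 500 } = {}) {
  if (lagMs >= criticalThreshold) return 'critical'; // default protect(): terminate with SIGTERM
  if (lagMs >= warningThreshold) return 'warning';   // default protect(): log, record, emit
  return null;                                       // below both thresholds: not a block
}

console.log(classifyLag(142)); // warning
console.log(classifyLag(650)); // critical
console.log(classifyLag(40));  // null
```

The third row works differently. A loop that never resumes never produces a lag value to classify, so it needs a separate mechanism.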

Brutal Demo

This demo intentionally freezes the main event loop forever. A normal timer-based monitor cannot recover from this because the timer callback never runs. protect() also starts a worker-backed hard watchdog, so the process can still be terminated.

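Run it with:

node examples/brutal-demo.js

The script (examples/brutal-demo.js):

const watchdog = require('node-eventloop-watchdog');

// Arm protect mode with a short hard timeout so the demo ends quickly.
watchdog.protect({
  criticalThreshold: 100,
  recovery: {
    enabled: true,
    action: 'kill',
    hardTimeout: 500, // worker-backed limit in ms for a loop that never resumes
    signal: 'SIGTERM'
  }
});

console.log('Watchdog armed. This process will freeze in 2 seconds.');
console.log('Expected result: the hard watchdog logs the stall and terminates the process.');

// After 2 seconds, spin forever: no timer callback on the main thread can run again.
setTimeout(() => {
  while (true) {}
}, 2000);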

Expected output:

Watchdog armed. This process will freeze in 2 seconds.
Expected result: the hard watchdog logs the stall and terminates the process.
[node-eventloop-watchdog] [ERROR] Event loop hard-stalled for 500ms. Action: kill
Terminated: 15

Trigger To Action

You can choose the action that matches your runtime:

| Action | What happens | Good for |
|---|---|---|
| log | Record and log the event only | Local dev, dashboards, low-risk rollout |
| callback | Call recovery.handler(event) | Custom alerting or diagnostics |
| webhook | POST the event as JSON | Alertmanager, incident bots, automation |
| exit | Stop the monitor and call process.exit(exitCode) | Graceful process-manager restart |
| kill | Send a signal to the process | Kubernetes, systemd, PM2, Docker restart |
| abort | Hard watchdog aborts the process | Core dumps and severe failure analysis |

// Option 1: POST critical blocks to an alerting webhook.
watchdog.start({
  warningThreshold: 100,
  criticalThreshold: 500,
  recovery: {
    enabled: true,
    minSeverity: 'critical',
    action: 'webhook',
    webhookUrl: 'https://alerts.example.com/event-loop-block'
  }
});

// Option 2: call your own handler (minSeverity defaults to 'critical').
// `pagerDuty` is a placeholder for your own alerting client.
watchdog.start({
  recovery: {
    enabled: true,
    action: 'callback',
    handler(event) {
      pagerDuty.alert({
        summary: `Event loop blocked for ${event.duration}ms`,
        route: event.request?.route,
        location: event.location
      });
    }
  }
});
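`recovery.minSeverity` decides which events reach the configured action, and `recovery.enabled` keeps the whole feature opt-in. Below is a minimal sketch of that gate. It assumes the two severities this README mentions are ranked `warning` below `critical`. `shouldRecover` is a hypothetical name, not the package's internal function.

```javascript
// Hypothetical sketch: decide whether a block event should trigger the recovery action.
// Assumes 'warning' ranks below 'critical', the two severities used in this README.
const SEVERITY_RANK = { warning: 1, critical: 2 };

function shouldRecover(event, recovery) {
  if (!recovery || !recovery.enabled) return false; // recovery is opt-in
  const min = SEVERITY_RANK[recovery.minSeverity ?? 'critical'];
  const actual = SEVERITY_RANK[event.severity] ?? 0; // unknown severity never qualifies
  return actual >= min;
}

const recovery = { enabled: true, minSeverity: 'critical', action: 'webhook' };
console.log(shouldRecover({ severity: 'warning', duration: 142 }, recovery));  // false
console.log(shouldRecover({ severity: 'critical', duration: 640 }, recovery)); // true
console.log(shouldRecover({ severity: 'critical', duration: 640 }, false));    // false
```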

Real Problems This Solves

  • Infinite loops that leave a Node process alive but useless.
  • CPU-heavy synchronous code blocking requests.
  • Large JSON serialization or parsing on hot paths.
  • Synchronous filesystem, crypto, compression, or child-process calls in request handlers.
  • Stuck production servers that pass process liveness checks but stop serving traffic.
  • Incidents where you need recent block history, request correlation, and likely hotspots after recovery.

API

watchdog.start(config?)

Starts observe mode. This is the safest default for adding visibility without changing process lifecycle behavior.

watchdog.start({
  warningThreshold: 50,
  criticalThreshold: 100,
  captureStackTrace: true,
  historySize: 50,
  enableMetrics: true,
  detectBlockingPatterns: true,
  checkInterval: 20,
  logLevel: 'warn',
  jsonLogs: false,
  onBlock: null,
  recovery: false
});

watchdog.protect(config?)

Starts protect mode with opinionated recovery defaults.

watchdog.protect({
  warningThreshold: 100,
  criticalThreshold: 500,
  recovery: {
    action: 'kill',
    hardTimeout: 1000,
    signal: 'SIGTERM'
  }
});

watchdog.stop()

Stops monitoring and disables the hard watchdog worker.

watchdog.on('block', listener)

Subscribe to block events.

watchdog.on('block', (event) => {
  console.log(event.duration, event.severity, event.action.type);
});

watchdog.getStats()

Returns runtime state, lag metrics, memory snapshot, and active mode.

watchdog.getStats();
// {
//   avgLag: 12,
//   maxLag: 121,
//   minLag: 1,
//   totalBlocks: 14,
//   blocksLastMinute: 6,
//   running: true,
//   config: { mode: 'protect', warningThreshold: 100, criticalThreshold: 500, recoveryAction: 'kill' },
//   memory: { heapUsed: 42, heapTotal: 64, rss: 91, external: 2, arrayBuffers: 1 }
// }
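The lag fields returned by `getStats()` are ordinary summary statistics over recorded samples. Below is a sketch of how such numbers could be derived. It assumes lag samples are in milliseconds and that each block's `Date.now()` timestamp is kept. `summarizeLag` is hypothetical, and the package may compute or round these values differently.

```javascript
// Hypothetical sketch: summarize lag samples (ms) and block timestamps (ms since epoch).
function summarizeLag(lagSamples, blockTimestamps, now = Date.now()) {
  const count = lagSamples.length;
  const total = lagSamples.reduce((sum, lag) => sum + lag, 0);
  return {
    avgLag: count ? Math.round(total / count) : 0,
    // Spread is fine for a bounded sample window; use a loop for very large arrays
    maxLag: count ? Math.max(...lagSamples) : 0,
    minLag: count ? Math.min(...lagSamples) : 0,
    totalBlocks: blockTimestamps.length,
    // Blocks recorded within the last 60 seconds
    blocksLastMinute: blockTimestamps.filter((t) => now - t <= 60_000).length
  };
}

const now = 1_000_000;
const stats = summarizeLag([2, 4, 121, 1], [now - 90_000, now - 30_000, now - 5_000], now);
console.log(JSON.stringify(stats));
// {"avgLag":32,"maxLag":121,"minLag":1,"totalBlocks":3,"blocksLastMinute":2}
```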

watchdog.getRecentBlocks(count?)

Returns the most recent blocking events.

watchdog.getBlockingHotspots(limit?)

Returns best-effort user-code locations captured when blocks were detected.

watchdog.getBlockingHotspots();
// [
//   { file: 'reportService.js', line: 142, blocks: 18, maxLag: 221, avgLag: 145 },
//   { file: 'orderController.js', line: 51, blocks: 7, maxLag: 94, avgLag: 62 }
// ]
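A hotspot ranking like the one above can be read as a group-by over recorded block events. Below is a minimal sketch. It assumes each event carries a best-effort `location` of `{ file, line }` and a `duration` in ms. Both `rankHotspots` and that event shape are assumptions, not the package's documented internals.

```javascript
// Hypothetical sketch: group block events by file:line and rank the locations.
function rankHotspots(events, limit = 10) {
  const byLocation = new Map();
  for (const { location, duration } of events) {
    if (!location) continue; // attribution is best-effort; some events have no location
    const key = `${location.file}:${location.line}`;
    const entry = byLocation.get(key) ??
      { file: location.file, line: location.line, blocks: 0, maxLag: 0, totalLag: 0 };
    entry.blocks += 1;
    entry.maxLag = Math.max(entry.maxLag, duration);
    entry.totalLag += duration;
    byLocation.set(key, entry);
  }
  return [...byLocation.values()]
    // Replace the running total with a rounded average
    .map(({ totalLag, ...rest }) => ({ ...rest, avgLag: Math.round(totalLag / rest.blocks) }))
    // Most frequent first; ties broken by the worst single block
    .sort((a, b) => b.blocks - a.blocks || b.maxLag - a.maxLag)
    .slice(0, limit);
}

const events = [
  { duration: 221, location: { file: 'reportService.js', line: 142 } },
  { duration: 69, location: { file: 'reportService.js', line: 142 } },
  { duration: 94, location: { file: 'orderController.js', line: 51 } },
  { duration: 300 } // no location captured
];
console.log(JSON.stringify(rankHotspots(events)));
// [{"file":"reportService.js","line":142,"blocks":2,"maxLag":221,"avgLag":145},{"file":"orderController.js","line":51,"blocks":1,"maxLag":94,"avgLag":94}]
```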

watchdog.middleware()

Returns Connect / Express-style middleware for request correlation.

const express = require('express');
const watchdog = require('node-eventloop-watchdog');

const app = express();

watchdog.start();
app.use(watchdog.middleware());

app.post('/checkout', (req, res) => {
  res.json({ ok: true });
});
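Request correlation needs some record of which requests are in flight when a block is reported. A synchronous block runs inside some handler, so the blocking request is still in flight at that moment. Other concurrent requests will be in flight too. One general way to keep that record is a Connect-style middleware that tracks each request until its response finishes. This is a sketch of that technique under those assumptions, not how `watchdog.middleware()` is implemented. `createRequestTracker` and the `{ id, method, url }` record shape are hypothetical.

```javascript
// Hypothetical sketch: track in-flight requests so a block report can list them.
const { EventEmitter } = require('node:events');

function createRequestTracker() {
  const inFlight = new Map();
  let nextId = 1;

  // Connect/Express-style middleware: (req, res, next)
  function middleware(req, res, next) {
    const id = nextId++;
    inFlight.set(id, { id, method: req.method, url: req.originalUrl || req.url, startedAt: Date.now() });
    const done = () => inFlight.delete(id);
    res.once('finish', done); // response fully handed off
    res.once('close', done);  // connection closed, possibly before finishing
    next();
  }

  // Snapshot that a block handler could attach to its event
  const activeRequests = () => [...inFlight.values()];

  return { middleware, activeRequests };
}

// Usage with stand-in req/res objects (a real res is also an EventEmitter):
const tracker = createRequestTracker();
const res = new EventEmitter();
tracker.middleware({ method: 'POST', url: '/checkout' }, res, () => {});
console.log(tracker.activeRequests().map((r) => `${r.method} ${r.url}`)); // [ 'POST /checkout' ]
res.emit('finish');
console.log(tracker.activeRequests().length); // 0
```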

Configuration

| Option | Type | Default | Description |
|---|---|---|---|
| mode | 'observe' \| 'protect' | 'observe' | Runtime posture |
| warningThreshold | number | 50 | Lag in ms before warning |
| criticalThreshold | number | 100 | Lag in ms before critical event |
| captureStackTrace | boolean | true | Capture best-effort stack context |
| historySize | number | 50 | Max blocking events retained |
| enableMetrics | boolean | true | Collect lag and memory metrics |
| detectBlockingPatterns | boolean | true | Identify likely sync blocking patterns |
| checkInterval | number | 20 | Poll interval in ms |
| logLevel | string | 'warn' | debug, info, warn, error, or silent |
| jsonLogs | boolean | false | Emit JSON logs |
| onBlock | function | null | Callback for every block |
| recovery.enabled | boolean | false | Enable recovery actions |
| recovery.action | string | 'log' | log, callback, webhook, exit, kill, or abort |
| recovery.minSeverity | string | 'critical' | Minimum severity before action runs |
| recovery.hardTimeout | number | 0 | Worker-backed timeout in ms for never-returning stalls |
| recovery.signal | string | 'SIGTERM' | Signal for kill action |
| recovery.exitCode | number | 1 | Exit code for exit action |
| recovery.webhookUrl | string | null | URL for webhook action |
| recovery.handler | function | null | Function for callback action |
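`recovery` is a nested object, so partial user config has to be merged with the defaults one level down. Otherwise, passing only `recovery.action` would drop `recovery.signal`. Below is a sketch that uses the observe-mode defaults from the table above. It assumes that a shallow merge at each level is enough, and that a non-object `recovery` value such as `false` (as in the `start()` example) leaves the recovery defaults in place. `resolveConfig` is hypothetical.

```javascript
// Hypothetical sketch: merge user options over the observe-mode defaults from the table.
const DEFAULTS = {
  mode: 'observe',
  warningThreshold: 50,
  criticalThreshold: 100,
  captureStackTrace: true,
  historySize: 50,
  enableMetrics: true,
  detectBlockingPatterns: true,
  checkInterval: 20,
  logLevel: 'warn',
  jsonLogs: false,
  onBlock: null,
  recovery: {
    enabled: false,
    action: 'log',
    minSeverity: 'critical',
    hardTimeout: 0,
    signal: 'SIGTERM',
    exitCode: 1,
    webhookUrl: null,
    handler: null
  }
};

function resolveConfig(user = {}) {
  // Assumption: a non-object recovery value (e.g. false) keeps the recovery defaults.
  const userRecovery = user.recovery && typeof user.recovery === 'object' ? user.recovery : {};
  return {
    ...DEFAULTS,
    ...user,
    recovery: { ...DEFAULTS.recovery, ...userRecovery } // merge one level down
  };
}

const config = resolveConfig({ criticalThreshold: 250, recovery: { enabled: true, action: 'kill' } });
console.log(config.warningThreshold, config.criticalThreshold); // 50 250
console.log(config.recovery.action, config.recovery.signal);    // kill SIGTERM
```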

Blocking Pattern Hints

The watchdog looks for common synchronous patterns in captured stack context:

| Pattern | Category |
|---|---|
| JSON.stringify / JSON.parse | Serialization |
| fs.readFileSync, fs.writeFileSync, etc. | Sync filesystem |
| crypto.pbkdf2Sync, crypto.scryptSync, crypto.createHash | Sync crypto |
| zlib.*Sync | Sync compression |
| child_process.execSync, spawnSync | Sync child process |
| RegExp.exec | Regex backtracking |
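Pattern hints of this kind come down to matching known synchronous API names in captured stack text. Below is a sketch that takes plain V8 stack-trace strings as input. The regexes and the `detectPattern` name are illustrative and loosely follow the table. They are not the package's actual rules.

```javascript
// Hypothetical sketch: map captured stack text to a blocking-pattern category.
// Patterns loosely follow the table; the first matching rule wins.
const PATTERNS = [
  { category: 'Serialization', re: /\bJSON\.(stringify|parse)\b/ },
  { category: 'Sync filesystem', re: /\b(readFileSync|writeFileSync|readdirSync|statSync|existsSync)\b/ },
  { category: 'Sync crypto', re: /\b(pbkdf2Sync|scryptSync|createHash)\b/ },
  { category: 'Sync compression', re: /\b(gzip|gunzip|deflate|inflate|brotliCompress|brotliDecompress)Sync\b/ },
  { category: 'Sync child process', re: /\b(execSync|execFileSync|spawnSync)\b/ },
  { category: 'Regex backtracking', re: /\bRegExp\.exec\b/ }
];

function detectPattern(stackText) {
  for (const { category, re } of PATTERNS) {
    if (re.test(stackText)) return category;
  }
  return null; // no known synchronous pattern in the captured context
}

const stack = [
  'Error',
  '    at JSON.stringify (<anonymous>)',
  '    at buildReceipt (/app/checkoutService.js:84:17)'
].join('\n');
console.log(detectPattern(stack)); // Serialization
```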

Important Attribution Note

Timer-based lag detection runs after the event loop resumes. Stack traces, location, userFrame, and hotspots are therefore best-effort context captured around detection time, not guaranteed blame for the exact blocking line.
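The limitation follows from how timer-based detection works. A repeating timer compares when its callback actually ran with when it was scheduled, and the difference is the lag. Below is a minimal sketch of that technique, using `setInterval` and the monotonic `performance.now()` clock. `createLagMonitor` and its `tick()` method are hypothetical. `tick()` is exposed so the measurement can be exercised with fake timestamps.

```javascript
// Hypothetical sketch of timer-drift lag detection, not the package's internals.
const { EventEmitter } = require('node:events');
const { performance } = require('node:perf_hooks');

function createLagMonitor({
  checkInterval = 20,
  warningThreshold = 50,
  now = () => performance.now() // monotonic clock in ms; injectable for testing
} = {}) {
  const monitor = new EventEmitter();
  let expected = now() + checkInterval; // when the next tick should run
  let timer = null;

  // Called by the timer. It can only run once the event loop is free again,
  // so a block is measured after it has already ended.
  monitor.tick = (current = now()) => {
    const lag = Math.max(0, current - expected);
    expected = current + checkInterval;
    if (lag >= warningThreshold) monitor.emit('block', { duration: Math.round(lag) });
    return lag;
  };

  monitor.start = () => {
    expected = now() + checkInterval;
    timer = setInterval(() => monitor.tick(), checkInterval);
    timer.unref(); // monitoring alone should not keep the process alive
    return monitor;
  };

  monitor.stop = () => clearInterval(timer);
  return monitor;
}

// Usage with a fake clock, so the lag is deterministic:
let fakeNow = 0;
const monitor = createLagMonitor({ now: () => fakeNow });
monitor.on('block', (event) => console.log(`blocked for ${event.duration}ms`));
fakeNow = 20;
monitor.tick(); // on schedule: lag 0, nothing emitted
fakeNow = 182;
monitor.tick(); // expected at 40, ran at 182: prints "blocked for 142ms"
```

If the loop never comes back, `tick()` never runs, so this design can never report that stall.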

For a loop that never resumes, enable recovery.hardTimeout through protect() or explicit recovery config. The hard watchdog runs in a worker thread and can terminate the process even when the main event loop is permanently stuck.
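A worker can act when the main loop cannot because a worker thread has its own event loop and keeps running while the main thread is stuck. Below is a minimal sketch of a heartbeat-based hard watchdog built on `SharedArrayBuffer` and `Atomics`. It is not necessarily how `recovery.hardTimeout` is implemented, and `startHardWatchdog` is hypothetical. The main thread bumps a shared counter on a timer. The worker wakes every `timeoutMs` and signals the process if the counter has not moved.

```javascript
// Hypothetical sketch of a worker-backed hard watchdog, not the package's internals.
const { Worker } = require('node:worker_threads');

const WORKER_SOURCE = `
const { workerData } = require('node:worker_threads');
const state = new Int32Array(workerData.shared); // [0] heartbeat counter, [1] stop flag
let last = Atomics.load(state, 0);
for (;;) {
  // Sleep up to timeoutMs; stop() wakes the worker early by setting the flag and notifying.
  Atomics.wait(state, 1, 0, workerData.timeoutMs);
  if (Atomics.load(state, 1) === 1) break;
  const current = Atomics.load(state, 0);
  if (current === last) {
    // No heartbeat for a whole window: the main event loop is stuck.
    process.kill(process.pid, workerData.signal);
    break;
  }
  last = current;
}
`;

function startHardWatchdog({ timeoutMs = 1000, signal = 'SIGTERM' } = {}) {
  const shared = new SharedArrayBuffer(2 * Int32Array.BYTES_PER_ELEMENT);
  const state = new Int32Array(shared);
  const worker = new Worker(WORKER_SOURCE, { eval: true, workerData: { shared, timeoutMs, signal } });
  worker.unref(); // the watchdog alone should not keep the process alive

  // Beat several times per window; the beats stop as soon as the main thread blocks.
  const beatMs = Math.max(1, Math.floor(timeoutMs / 4));
  const heartbeat = setInterval(() => Atomics.add(state, 0, 1), beatMs);
  heartbeat.unref();

  return {
    stop() {
      clearInterval(heartbeat);
      Atomics.store(state, 1, 1);
      Atomics.notify(state, 1); // wake the sleeping worker so it exits
    }
  };
}
```

In this sketch, a stall is caught one to two windows after it begins, so `timeoutMs` should stay well above normal CPU spikes. If your app registers its own `process.on('SIGTERM')` listener, that listener replaces the default termination and can only run on the stuck main loop. In that case, `SIGKILL` is the reliable choice here.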

Integrations

JSON Logs

watchdog.start({ jsonLogs: true });

node-request-trace

If node-request-trace is installed, active request data is automatically attached to block events.

node-actuator-lite

If node-actuator-lite is installed, these endpoints are registered automatically:

| Endpoint | Description |
|---|---|
| GET /actuator/eventloop | Status, metrics, top hotspots |
| GET /actuator/eventloop/history | Recent blocking events |
| GET /actuator/eventloop/hotspots | Hotspot ranking |
| GET /actuator/eventloop/metrics | Lag and memory metrics |

Operational Guidance

  • Use start() first when rolling out to an existing app.
  • Use protect() when the app runs under a supervisor that restarts failed processes.
  • Keep hardTimeout comfortably above normal CPU spikes to avoid killing legitimate long work.
  • Prefer SIGTERM for graceful runtime restarts; use abort only when you need crash diagnostics.
  • Run npm run bench in your own workload if overhead matters.

Development

npm ci
npm run lint
npm run typecheck
npm test
npm run test:coverage:check

The CI gate requires at least 90% coverage across statements, branches, functions, and lines.

License

MIT