node-runtime-guardian

v0.2.0

Published

3 months ago

Production-grade runtime health and protection engine for Node.js applications

zeeshan2k1

nodejs runtime monitoring performance health event-loop memory gc thread-pool

Node Runtime Guardian is a lightweight, dependency-free runtime diagnostics and protection layer for Node.js. It runs inside your process to observe event loop health, memory behavior, garbage collection pressure, and thread pool saturation — and can optionally shed load when your application is under stress.

This project is intentionally calm, explicit, and production-focused. It is built for backend developers who want to understand how the Node.js runtime behaves under real load, not just what dashboards report after the fact.

Why This Exists

Most Node.js production issues don't show up as errors.

They show up as:

Latency spikes with no obvious cause
Memory usage slowly creeping upward
GC pauses causing jittery response times
Pods restarting due to OOM without clear signals
Services degrading under burst traffic

By the time traditional monitoring alerts fire, the system is often already unhealthy.

Node Runtime Guardian was built to observe these problems from inside the Node.js runtime itself — where the event loop, garbage collector, and thread pool actually live.

When Should You Use This?

Node Runtime Guardian is useful if:

You suspect event loop blocking or synchronous CPU work
Your service slows down under load without throwing errors
Memory usage grows steadily and heap snapshots don't reveal an obvious cause
You want runtime insight without a full APM stack
You want a last line of defense against cascading failures

It is especially well-suited for:

High-traffic API servers
Background workers and job processors
Memory-constrained containers
High-throughput Node.js services

Features

Event Loop Monitoring: Track event loop delay using perf_hooks
Memory Monitoring: Detect memory leaks and drift patterns
GC Pressure Estimation: Monitor garbage collection behavior
Thread Pool Saturation Detection: Infer thread pool queue buildup
Load Shedding: Automatic request rejection when thresholds are exceeded
Metrics Endpoint: HTTP endpoint for runtime metrics
Alerting: Push warnings and anomalies to Slack, PagerDuty, or any webhook
Plugin System: Extensible architecture for custom integrations
Worker Thread Pool: Managed worker pool utility
Zero Dependencies: Uses only Node.js built-in modules
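
To make the event loop feature more concrete, here is a minimal sketch of how delay readings from a perf_hooks histogram could be turned into millisecond figures and compared against a threshold. It is not the library's implementation. The names `DelayHistogramLike`, `summarizeDelay`, and `isBlocked` are hypothetical, and using p99 as the trigger is an assumption made for illustration.

```typescript
// Fields read from a perf_hooks delay histogram; Node reports them in nanoseconds.
interface DelayHistogramLike {
  mean: number;
  max: number;
  percentile(p: number): number;
}

interface DelaySummary {
  meanMs: number;
  maxMs: number;
  p99Ms: number;
}

const NS_PER_MS = 1e6;

// Convert nanosecond histogram readings into milliseconds.
function summarizeDelay(histogram: DelayHistogramLike): DelaySummary {
  return {
    meanMs: histogram.mean / NS_PER_MS,
    maxMs: histogram.max / NS_PER_MS,
    p99Ms: histogram.percentile(99) / NS_PER_MS,
  };
}

// Hypothetical rule: treat the loop as blocked when p99 delay exceeds the threshold.
function isBlocked(summary: DelaySummary, thresholdMs: number): boolean {
  return summary.p99Ms > thresholdMs;
}
```

In a Node.js process, the histogram returned by `monitorEventLoopDelay({ resolution: 10 })` from `node:perf_hooks` fits `DelayHistogramLike` once `enable()` has been called and some samples have been collected.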

Installation

npm install node-runtime-guardian

Quick Start

import { Guardian } from 'node-runtime-guardian';

// Initialize Guardian
const guardian = new Guardian({
  eventLoop: {
    thresholdMs: 100,
  },
  memory: {
    rssLimit: 512 * 1024 * 1024, // 512MB
    driftDetectionEnabled: true,
  },
  protection: {
    enabled: true,
    loadShedding: {
      enabled: true,
      eventLoopThreshold: 200,
    },
  },
  metricsServer: {
    enabled: true,
    port: 9090,
  },
});

// Listen for warnings (registered before start() so early warnings are not missed)
guardian.on('warning', (warning) => {
  console.warn('Guardian warning:', warning);
});

// Start monitoring
guardian.start();

// Get current metrics and the overall health score
const metrics = guardian.getMetrics();
console.log('Heap used (bytes):', metrics.memory.heapUsed);
console.log('Health score:', guardian.getHealthScore());
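
The health score is documented as a number from 0 to 1, where 1 is healthy. One way to act on it is to map it onto a small set of states, for example for a readiness probe. The sketch below assumes cut-offs of 0.8 and 0.5, which are hypothetical values chosen for illustration and not defaults of the library.

```typescript
type HealthStatus = 'healthy' | 'degraded' | 'unhealthy';

// Hypothetical cut-offs; tune them against your own service.
const DEGRADED_BELOW = 0.8;
const UNHEALTHY_BELOW = 0.5;

// Classify a 0-1 health score; non-finite input is treated as unhealthy.
function classifyHealth(score: number): HealthStatus {
  if (!Number.isFinite(score) || score < UNHEALTHY_BELOW) return 'unhealthy';
  if (score < DEGRADED_BELOW) return 'degraded';
  return 'healthy';
}

// Map the classification to an HTTP status suitable for a readiness probe.
function probeStatusCode(score: number): number {
  return classifyHealth(score) === 'unhealthy' ? 503 : 200;
}
```

A probe handler could then respond with `probeStatusCode(guardian.getHealthScore())`.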

💡 Tip: For more detailed examples and configuration options, see the Usage Documentation.

Documentation

📖 For comprehensive usage guides, configuration options, integration patterns, and best practices, see the Usage Documentation.

The Usage Documentation provides detailed information on:

Configuration Examples: Development, production, memory-sensitive, and CPU-intensive setups
Integration Patterns: Express.js, Fastify, Koa.js middleware integration
Usage Patterns: Background job monitoring, microservices health reporting
Best Practices: Configuration tuning, event handling, request tracking
Troubleshooting: Common issues and solutions
Advanced Topics: Custom health scoring, APM integration, metrics export

API Documentation

Guardian Class

Main class that orchestrates all monitoring and protection.

Constructor

new Guardian(config?: GuardianConfig)

Methods

init(config?: GuardianConfig): void - Initialize and start monitoring
start(): void - Start all monitors
stop(): void - Stop all monitors
getMetrics(): RuntimeMetrics - Get current aggregated metrics
getHealthScore(): number - Get health score (0-1, where 1 is healthy)
isProtectionActive(): boolean - Check if protection is active
shouldRejectRequest(): boolean - Check if a request should be rejected
getRejectionResponse(): { statusCode: number; message: string } - Get rejection response
trackRequest(): void - Track incoming request
trackRequestComplete(): void - Track completed request
getAlertManager(): AlertManager | null - Get the alert manager instance
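
The request-tracking methods are only useful if every `trackRequest()` is eventually paired with a `trackRequestComplete()`. The following sketch shows one framework-independent way to guarantee that pairing with `try`/`finally`. `RequestGuard` is a hypothetical interface covering just the four documented methods used here, and `runGuarded` and `Outcome` are illustrative names, not part of the package.

```typescript
// Minimal subset of the documented Guardian methods used by this sketch.
interface RequestGuard {
  trackRequest(): void;
  trackRequestComplete(): void;
  shouldRejectRequest(): boolean;
  getRejectionResponse(): { statusCode: number; message: string };
}

type Outcome<T> =
  | { accepted: true; value: T }
  | { accepted: false; statusCode: number; message: string };

// Run `work` under the guard. The finally block pairs every trackRequest()
// with exactly one trackRequestComplete(), even if `work` throws or rejects.
async function runGuarded<T>(
  guard: RequestGuard,
  work: () => Promise<T>,
): Promise<Outcome<T>> {
  guard.trackRequest();
  try {
    if (guard.shouldRejectRequest()) {
      const { statusCode, message } = guard.getRejectionResponse();
      return { accepted: false, statusCode, message };
    }
    return { accepted: true, value: await work() };
  } finally {
    guard.trackRequestComplete();
  }
}
```

This shape suits background workers and job processors, where there is no HTTP response object to hook a completion event onto.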

Events

metric - Emitted when metrics are collected
warning - Emitted when thresholds are exceeded
eventLoopBlocked - Emitted when event loop delay exceeds threshold
memoryDrift - Emitted when memory drift is detected
gcPressure - Emitted when GC pressure is high
threadPoolSaturated - Emitted when thread pool is saturated
protectionActivated - Emitted when protection is activated
protectionDeactivated - Emitted when protection is deactivated
alertSent - Emitted when an alert is delivered successfully
alertFailed - Emitted when an alert delivery fails
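
As a rough sketch of how these events might be consumed, the snippet below attaches one listener per event name and keeps a running count, which can be handy for a periodic log line. It relies only on the `on(event, listener)` call shown in the Quick Start. `Subscribable` and `countEvents` are hypothetical names, and event payloads are typed as `unknown` because their shapes are not described here.

```typescript
// Minimal listener surface, matching the guardian.on(...) call used in this README.
interface Subscribable {
  on(event: string, listener: (payload: unknown) => void): unknown;
}

// Event names taken from the list above (metric and alert events omitted).
const EVENT_NAMES = [
  'warning',
  'eventLoopBlocked',
  'memoryDrift',
  'gcPressure',
  'threadPoolSaturated',
  'protectionActivated',
  'protectionDeactivated',
] as const;

type GuardianEventName = (typeof EVENT_NAMES)[number];

// Attach one counting listener per event and return the live counter map.
function countEvents(source: Subscribable): Map<GuardianEventName, number> {
  const counts = new Map<GuardianEventName, number>();
  for (const name of EVENT_NAMES) {
    counts.set(name, 0);
    source.on(name, () => {
      counts.set(name, (counts.get(name) ?? 0) + 1);
    });
  }
  return counts;
}
```

Calling `countEvents(guardian)` before `guardian.start()` would keep the counts complete from the first collection cycle.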

Configuration

interface GuardianConfig {
  eventLoop?: {
    enabled?: boolean;
    thresholdMs?: number; // Default: 100
    intervalMs?: number; // Default: 1000
  };
  memory?: {
    enabled?: boolean;
    rssLimit?: number;
    heapLimit?: number;
    externalLimit?: number;
    driftDetectionEnabled?: boolean; // Default: true
    driftThreshold?: number; // Default: 0.1
  };
  gc?: {
    enabled?: boolean;
    pressureThreshold?: number; // Default: 0.7
  };
  threadPool?: {
    enabled?: boolean;
    saturationThreshold?: number; // Default: 0.8
  };
  protection?: {
    enabled?: boolean;
    loadShedding?: {
      enabled?: boolean;
      eventLoopThreshold?: number; // Default: 200
      memoryThreshold?: number; // Default: 0.9
      responseCode?: number; // Default: 503
      responseMessage?: string;
    };
  };
  metricsServer?: {
    enabled?: boolean;
    port?: number; // Default: 9090
    path?: string; // Default: '/guardian/metrics'
  };
  workerPool?: {
    enabled?: boolean;
    poolSize?: number; // Default: 4
    maxQueueSize?: number; // Default: 100
  };
  alerting?: {
    enabled?: boolean;
    cooldownMs?: number; // Default: 60000
    channels: AlertChannelConfig[];
  };
}
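
For memory-constrained containers, the `memory` section is often derived from the container limit instead of being hard-coded. The sketch below computes an `rssLimit` in bytes (the unit used in the Quick Start) that leaves some headroom below the container limit. `memoryConfigForContainer` and the 20% default headroom are hypothetical choices, and `MemoryConfig` repeats only a subset of the fields documented above.

```typescript
// Subset of the documented `memory` section of GuardianConfig.
interface MemoryConfig {
  rssLimit?: number;
  driftDetectionEnabled?: boolean;
  driftThreshold?: number;
}

const BYTES_PER_MIB = 1024 * 1024;

// Derive an RSS limit from the container memory limit, leaving `headroom`
// (a fraction between 0 and 1) free so the guardian reacts before the OOM killer.
function memoryConfigForContainer(
  containerLimitMiB: number,
  headroom: number = 0.2,
): MemoryConfig {
  if (containerLimitMiB <= 0) {
    throw new RangeError('containerLimitMiB must be positive');
  }
  if (headroom < 0 || headroom >= 1) {
    throw new RangeError('headroom must be in [0, 1)');
  }
  return {
    rssLimit: Math.floor(containerLimitMiB * BYTES_PER_MIB * (1 - headroom)),
    driftDetectionEnabled: true, // documented default
    driftThreshold: 0.1, // documented default
  };
}
```

The result can be passed as `new Guardian({ memory: memoryConfigForContainer(512) })`.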

Usage Examples

📚 See Usage Documentation for more comprehensive examples including Express, Fastify, Koa.js integrations, and advanced patterns.

Basic Monitoring

import { Guardian } from 'node-runtime-guardian';

const guardian = new Guardian({
  eventLoop: { thresholdMs: 100 },
  memory: { driftDetectionEnabled: true },
});

// Register the listener before starting so early warnings are not missed
guardian.on('warning', (warning) => {
  console.error('Warning:', warning.message);
});

guardian.start();
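
To give a feel for what memory drift detection can mean in practice, here is a simplified sketch that compares the average of the older half of a sample window with the average of the newer half. This is not the library's algorithm. `relativeDrift` and `isDrifting` are hypothetical names, and reading `driftThreshold` (default 0.1) as a relative growth ratio is an assumption.

```typescript
// Relative growth between the average of the first half and the second half
// of a window of memory samples (for example, heapUsed readings in bytes).
function relativeDrift(samples: number[]): number {
  if (samples.length < 4) return 0; // too few points to say anything
  const mid = Math.floor(samples.length / 2);
  const average = (xs: number[]): number =>
    xs.reduce((sum, x) => sum + x, 0) / xs.length;
  const older = average(samples.slice(0, mid));
  const newer = average(samples.slice(mid));
  return older > 0 ? (newer - older) / older : 0;
}

// Assumed interpretation: drift is flagged when growth exceeds the threshold.
function isDrifting(samples: number[], threshold: number = 0.1): boolean {
  return relativeDrift(samples) > threshold;
}
```

Comparing window halves smooths out the sawtooth pattern that normal garbage collection produces, at the cost of reacting more slowly than a point-to-point comparison.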

Load Shedding Integration

import express from 'express';
import { Guardian } from 'node-runtime-guardian';

const app = express();
const guardian = new Guardian({
  protection: {
    enabled: true,
    loadShedding: {
      enabled: true,
      eventLoopThreshold: 200,
    },
  },
});

guardian.start();

// Middleware to check protection
app.use((req, res, next) => {
  guardian.trackRequest();

  if (guardian.shouldRejectRequest()) {
    const response = guardian.getRejectionResponse();
    guardian.trackRequestComplete();
    return res.status(response.statusCode).json({ error: response.message });
  }

  // 'close' also fires when the client disconnects before the response finishes
  res.on('close', () => {
    guardian.trackRequestComplete();
  });

  next();
});

app.get('/api/data', (req, res) => {
  res.json({ data: 'Hello World' });
});

app.listen(3000); // Port is an arbitrary choice for this example
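
The decision behind `shouldRejectRequest()` is internal to the library, but the kind of check it implies can be sketched with the two documented load-shedding thresholds. The code below is an illustration under assumptions: `eventLoopThreshold` is taken to be in milliseconds, `memoryThreshold` is taken to be a used-to-limit ratio, and `LoadSample`, `ShedThresholds`, and `shedReason` are hypothetical names.

```typescript
interface ShedThresholds {
  eventLoopThreshold: number; // assumed milliseconds; documented default 200
  memoryThreshold: number; // assumed usage ratio; documented default 0.9
}

interface LoadSample {
  eventLoopDelayMs: number;
  heapUsed: number; // bytes
  heapLimit: number; // bytes
}

const DEFAULT_THRESHOLDS: ShedThresholds = {
  eventLoopThreshold: 200,
  memoryThreshold: 0.9,
};

// Return why a request would be shed, or null when the sample is within limits.
function shedReason(
  sample: LoadSample,
  thresholds: ShedThresholds = DEFAULT_THRESHOLDS,
): 'event-loop' | 'memory' | null {
  if (sample.eventLoopDelayMs > thresholds.eventLoopThreshold) {
    return 'event-loop';
  }
  const heapRatio = sample.heapLimit > 0 ? sample.heapUsed / sample.heapLimit : 0;
  if (heapRatio > thresholds.memoryThreshold) {
    return 'memory';
  }
  return null;
}
```

Returning a reason instead of a bare boolean makes it easier to log why requests were rejected during an incident.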

Metrics Endpoint

import { Guardian, MetricsServer } from 'node-runtime-guardian';

const guardian = new Guardian();
guardian.start();

const metricsServer = new MetricsServer(guardian, {
  port: 9090,
  path: '/guardian/metrics',
});

metricsServer.start();

// Access metrics at http://localhost:9090/guardian/metrics

Plugin System

import {
  Guardian,
  GuardianPlugin,
  RuntimeMetrics,
} from 'node-runtime-guardian';

class CustomLoggerPlugin implements GuardianPlugin {
  name = 'custom-logger';

  onMetricUpdate(data: RuntimeMetrics): void {
    console.log('Metrics:', {
      eventLoopDelay: data.eventLoopDelay.mean,
      memory: data.memory.heapUsed,
      health: data.healthScore,
    });
  }
}

const guardian = new Guardian();
guardian.start();

// Conceptual example: no plugin registration call is shown in this README,
// so the plugin is defined above but not yet attached to the guardian.
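
Until a registration API is documented, one possible way to drive a plugin is to forward the `metric` event to it. The sketch below does that and isolates plugin failures from each other. It assumes the `metric` event carries the same object that `getMetrics()` returns, which is not confirmed here. `MetricsLike`, `MetricPlugin`, `MetricSource`, and `attachPlugins` are hypothetical names.

```typescript
// Assumed payload: only the field this sketch needs for demonstration.
interface MetricsLike {
  healthScore: number;
}

interface MetricPlugin {
  name: string;
  onMetricUpdate(data: MetricsLike): void;
}

interface MetricSource {
  on(event: 'metric', listener: (data: MetricsLike) => void): unknown;
}

// Forward every metric event to each plugin. A plugin that throws is reported
// through onError and does not prevent later plugins from running.
function attachPlugins(
  source: MetricSource,
  plugins: MetricPlugin[],
  onError: (pluginName: string, error: unknown) => void = () => {},
): void {
  source.on('metric', (data) => {
    for (const plugin of plugins) {
      try {
        plugin.onMetricUpdate(data);
      } catch (error) {
        onError(plugin.name, error);
      }
    }
  });
}
```

With the class above, this would be wired as `attachPlugins(guardian, [new CustomLoggerPlugin()])`, provided the event payload assumption holds.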

Alerting

import { Guardian } from 'node-runtime-guardian';

const guardian = new Guardian({
  eventLoop: { thresholdMs: 100 },
  memory: { driftDetectionEnabled: true },
  alerting: {
    enabled: true,
    cooldownMs: 60000,
    channels: [
      {
        type: 'slack',
        webhookUrl: 'https://hooks.slack.com/services/T.../B.../xxx',
        severityFilter: ['high', 'critical'],
      },
      {
        type: 'pagerduty',
        routingKey: 'your-integration-key',
        severityFilter: ['critical'],
      },
      {
        type: 'webhook',
        url: 'https://your-api.com/alerts',
      },
    ],
  },
});

guardian.start();
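
Two settings in this example shape how often alerts go out: `severityFilter` per channel and `cooldownMs` overall. The sketch below illustrates both ideas in isolation. It is not the library's alert manager. `channelAccepts` and `CooldownGate` are hypothetical names, and treating a missing filter as "accept everything" is an assumption based on the webhook channel above having no filter.

```typescript
interface ChannelRule {
  severityFilter?: string[];
}

// Assumption: a channel without a filter accepts every severity.
function channelAccepts(rule: ChannelRule, severity: string): boolean {
  return !rule.severityFilter || rule.severityFilter.includes(severity);
}

// Suppresses repeats of the same alert key inside a cooldown window.
class CooldownGate {
  private readonly cooldownMs: number;
  private readonly lastSent = new Map<string, number>();

  constructor(cooldownMs: number) {
    this.cooldownMs = cooldownMs;
  }

  // Returns true and records the send time if `key` is outside its cooldown.
  tryPass(key: string, now: number = Date.now()): boolean {
    const last = this.lastSent.get(key);
    if (last !== undefined && now - last < this.cooldownMs) {
      return false;
    }
    this.lastSent.set(key, now);
    return true;
  }
}
```

Keying the gate by alert type (for example `memoryDrift`) keeps a noisy condition from hiding a different one that starts during the same window.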

Worker Pool

import { WorkerPool } from 'node-runtime-guardian';

const pool = new WorkerPool('./worker.js', {
  poolSize: 4,
  maxQueueSize: 100,
});

// Run a task
const result = await pool.run({ task: 'process-data' });

// Get pool statistics
const stats = pool.getStats();
console.log('Active tasks:', stats.activeTasks);
console.log('Queue size:', stats.queueSize);
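
The `poolSize` and `maxQueueSize` options suggest a bounded-queue model: up to `poolSize` tasks run at once, further tasks wait, and tasks beyond `maxQueueSize` are refused. That reading is an assumption, and the sketch below illustrates it with plain promises instead of worker threads. `BoundedRunner` is a hypothetical class, not the package's `WorkerPool`.

```typescript
type Task<T> = () => Promise<T>;

class BoundedRunner {
  private active = 0;
  private readonly queue: Array<() => void> = [];
  private readonly poolSize: number;
  private readonly maxQueueSize: number;

  constructor(poolSize: number, maxQueueSize: number) {
    this.poolSize = poolSize;
    this.maxQueueSize = maxQueueSize;
  }

  // Start the task now, queue it, or reject it when the queue is full.
  run<T>(task: Task<T>): Promise<T> {
    return new Promise<T>((resolve, reject) => {
      const release = (): void => {
        this.active -= 1;
        const next = this.queue.shift();
        if (next) next();
      };
      const start = (): void => {
        this.active += 1;
        // Wrapping in a Promise turns a synchronous throw into a rejection.
        new Promise<T>((settle) => settle(task())).then(
          (value) => { release(); resolve(value); },
          (error) => { release(); reject(error); },
        );
      };
      if (this.active < this.poolSize) start();
      else if (this.queue.length < this.maxQueueSize) this.queue.push(start);
      else reject(new Error('Queue is full'));
    });
  }

  getStats(): { activeTasks: number; queueSize: number } {
    return { activeTasks: this.active, queueSize: this.queue.length };
  }
}
```

Refusing work when the queue is full applies back-pressure to callers, which keeps latency bounded instead of letting the backlog grow without limit.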

Metrics

The metrics endpoint returns a JSON document with the following shape (the values shown are illustrative):

{
  "eventLoopDelay": {
    "mean": 2.5,
    "max": 10.2,
    "min": 0.1,
    "p50": 2.0,
    "p95": 8.5,
    "p99": 9.8
  },
  "memory": {
    "heapUsed": 52428800,
    "heapTotal": 67108864,
    "rss": 134217728,
    "external": 1024000,
    "arrayBuffers": 512000
  },
  "gc": {
    "estimatedCycles": 15,
    "minorGCCount": 12,
    "majorGCCount": 3,
    "gcPressure": 0.45
  },
  "threadPool": {
    "saturationLevel": 0.2,
    "estimatedQueueSize": 0.8,
    "avgLatency": 0.5
  },
  "cpu": {
    "user": 125.5,
    "system": 45.2
  },
  "activeRequests": 5,
  "healthScore": 0.85,
  "isProtectionActive": false,
  "timestamp": 1704067200000
}
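
If the JSON above needs to be scraped by a Prometheus-style collector, it can be flattened into `name value` lines. The sketch below is one way to do that and is not a feature of the package. `toPrometheusLines`, `toSnakeCase`, and the `guardian` prefix are hypothetical, and every field is emitted as an untyped sample without HELP or TYPE lines.

```typescript
type MetricsTree = { [key: string]: number | boolean | MetricsTree };

// Convert camelCase keys such as heapUsed into snake_case (heap_used).
function toSnakeCase(key: string): string {
  return key.replace(/([a-z0-9])([A-Z])/g, '$1_$2').toLowerCase();
}

// Flatten nested fields into "prefix_path value" lines; booleans become 0 or 1.
function toPrometheusLines(tree: MetricsTree, prefix: string = 'guardian'): string[] {
  const lines: string[] = [];
  for (const key of Object.keys(tree)) {
    const value = tree[key];
    const name = `${prefix}_${toSnakeCase(key)}`;
    if (typeof value === 'number') {
      lines.push(`${name} ${value}`);
    } else if (typeof value === 'boolean') {
      lines.push(`${name} ${value ? 1 : 0}`);
    } else if (typeof value === 'object') {
      lines.push(...toPrometheusLines(value, name));
    }
  }
  return lines;
}
```

Joining the result with newlines gives a body that can be served as plain text alongside the JSON endpoint.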

Development

# Install dependencies
npm install

# Build
npm run build

# Run tests
npm test

# Lint
npm run lint

# Format code
npm run format

Contributing

Please read CONTRIBUTING.md for details on our code of conduct and the process for submitting pull requests.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

Built with TypeScript
Uses Node.js built-in modules only
Inspired by production monitoring needs

If this project helped you or sparked an idea, consider dropping a star or a kind word. It quietly keeps me motivated.