@techalmondsai/nodejs-monitoring

v1.1.0

A comprehensive monitoring service for Node.js applications with built-in health probes and metrics collection

techalmond

monitoring health-check metrics nodejs express apm observability

Node.js Monitoring Service

A comprehensive, container-aware monitoring service for Node.js applications with built-in health probes, metrics collection, Kubernetes-native endpoints, and graceful shutdown support. A lightweight alternative to New Relic.

Features

Easy Integration - One-line setup for Express applications
Container-Aware - Reads cgroup v1/v2 for accurate memory and CPU metrics in Docker/Kubernetes
Kubernetes-Native - Separate /healthz (liveness) and /readyz (readiness) endpoints
Built-in Metrics - Memory, CPU, heap, uptime, request tracking
Health Probes - Memory, CPU, response time, error rate, disk space, uptime
Graceful Shutdown - SIGTERM handling, interval cleanup, shutdown-aware probes
Performance Tracking - Request count and response time monitoring; only 5xx responses count as errors
Custom Probes - Add your own health checks with configurable timeouts
REST API - Built-in endpoints for health, metrics, liveness, and readiness

Installation

npm install @techalmondsai/nodejs-monitoring

Quick Start

Basic Setup

import express from "express";
import { setupMonitoring } from "@techalmondsai/nodejs-monitoring";

const app = express();

// Setup monitoring with one line
const monitoring = setupMonitoring(app);

// Your routes
app.get("/api/users", (req, res) => {
  res.json({ users: [] });
});

app.listen(3000, () => {
  console.log("Server running on port 3000");
  console.log("Health check: http://localhost:3000/health");
  console.log("Liveness: http://localhost:3000/healthz");
  console.log("Readiness: http://localhost:3000/readyz");
});

Advanced Configuration

import express from "express";
import { setupMonitoring } from "@techalmondsai/nodejs-monitoring";

const app = express();

const monitoring = setupMonitoring(app, {
  healthRoutePath: "/api/health",
  metricsInterval: 15000,
  enableRequestTracking: true,
  enableErrorTracking: true,
  livenessPath: "/api/liveness",
  readinessPath: "/api/readiness",
  probeTimeout: 5000,
  alertThresholds: {
    memoryUsage: 85,
    cpuUsage: 90,
    responseTime: 2000,
    errorRate: 5,
  },
});

// Add a custom health probe (`database` stands in for your own database client)
monitoring.addProbe({
  name: "database_connection",
  check: async () => {
    try {
      await database.ping();
      return {
        status: "healthy",
        message: "Database connection successful",
      };
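    } catch (error) {
      // Narrow first: `error` is `unknown` under strict TypeScript
      const reason = error instanceof Error ? error.message : String(error);
      return {
        status: "critical",
        message: `Database connection failed: ${reason}`,
      };
    }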
  },
  interval: 30000,
});

app.listen(3000);

API Endpoints

Once integrated, your application will have the following endpoints:

Health Check (Full Report)

GET /health

Returns comprehensive health information, including all metrics and probe results. Responds with 200 when every probe is healthy and with 503 when any probe reports warning or critical.

{
  "status": "healthy",
  "timestamp": 1640995200000,
  "uptime": 3600,
  "hostname": "my-pod-abc123",
  "metrics": {
    "timestamp": 1640995200000,
    "uptime": 3600,
    "memory": {
      "used": 256,
      "total": 2048,
      "percentage": 12,
      "heap": {
        "used": 80,
        "total": 1584,
        "percentage": 5
      },
      "isContainerAware": true
    },
    "cpu": {
      "usage": 15.5,
      "loadAverage": [0.5, 0.3, 0.2],
      "effectiveCpus": 0.5,
      "isContainerAware": true
    },
    "process": {
      "pid": 1,
      "version": "v18.17.0",
      "platform": "linux",
      "arch": "x64",
      "hostname": "my-pod-abc123"
    },
    "requests": {
      "total": 150,
      "active": 2,
      "averageResponseTime": 120
    },
    "errors": {
      "total": 5,
      "rate": 0.2
    }
  },
  "probes": {
    "memory_usage": {
      "status": "healthy",
      "message": "Memory usage is 12%",
      "value": 12
    },
    "cpu_usage": {
      "status": "healthy",
      "message": "CPU usage is 15.5%",
      "value": 15.5
    },
    "response_time": {
      "status": "healthy",
      "message": "Avg response time is 120ms",
      "value": 120
    },
    "error_rate": {
      "status": "healthy",
      "message": "Error rate is 0.2/min",
      "value": 0.2
    },
    "uptime": {
      "status": "healthy",
      "message": "Application has been running for 1h 0m",
      "value": 3600
    },
    "disk_space": {
      "status": "healthy",
      "message": "Disk usage is 45%",
      "value": 45
    }
  },
  "version": "1.0.0"
}
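The requests and errors figures in a report like the one above could come from a small in-memory tracker. The following is a minimal sketch under stated assumptions, not the library's code: RollingRequestTracker, record and stats are hypothetical names, the 100-sample window and the 5xx-only rule come from the Built-in Probes description, and the short-uptime window adjustment mentioned there is not modeled.

```typescript
// Hypothetical sketch of request tracking; names are illustrative only.
interface RequestStats {
  total: number;
  averageResponseTime: number; // ms, averaged over the most recent samples
  errorsTotal: number;
  errorRatePerMinute: number; // 5xx responses seen in the last window
}

class RollingRequestTracker {
  private durations: number[] = [];
  private errorTimes: number[] = [];
  private total = 0;
  private errorsTotal = 0;
  private readonly maxSamples: number;
  private readonly windowMs: number;

  constructor(maxSamples = 100, windowMs = 60_000) {
    this.maxSamples = maxSamples;
    this.windowMs = windowMs;
  }

  // Record one finished request; `now` is injectable to keep the class testable.
  record(statusCode: number, durationMs: number, now: number = Date.now()): void {
    this.total += 1;
    this.durations.push(durationMs);
    if (this.durations.length > this.maxSamples) this.durations.shift();
    // Only server errors count; 4xx responses are the client's problem.
    if (statusCode >= 500) {
      this.errorsTotal += 1;
      this.errorTimes.push(now);
    }
  }

  stats(now: number = Date.now()): RequestStats {
    const sum = this.durations.reduce((a, b) => a + b, 0);
    // Drop error timestamps that have fallen out of the rate window.
    this.errorTimes = this.errorTimes.filter((t) => now - t <= this.windowMs);
    return {
      total: this.total,
      averageResponseTime: this.durations.length ? sum / this.durations.length : 0,
      errorsTotal: this.errorsTotal,
      errorRatePerMinute: this.errorTimes.length,
    };
  }
}
```

In an Express app, record would typically be called from a middleware once the response has finished, passing the status code and the elapsed time.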

Kubernetes Liveness Probe

GET /healthz

Returns 200 if the process is alive and not shutting down. It does not check downstream dependencies, so a failing dependency does not trigger unnecessary pod restarts.

{
  "status": "alive",
  "uptime": 3600,
  "hostname": "my-pod-abc123"
}

Kubernetes Readiness Probe

GET /readyz

Returns 200 if the service is ready to accept traffic. Returns 503 when a probe is critical or the service is shutting down; warnings still count as ready.

{
  "status": "ready",
  "hostname": "my-pod-abc123"
}
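The three endpoints differ only in how they map probe state to an HTTP status. The sketch below restates those documented rules as pure functions. The function names and the "warning" string literal are assumptions made for illustration; the library's internals may be organized differently.

```typescript
// Hypothetical helpers that restate the documented status rules.
type ProbeStatus = "healthy" | "warning" | "critical";

// Reduce a set of probe results to the most severe status.
function worstStatus(statuses: ProbeStatus[]): ProbeStatus {
  if (statuses.includes("critical")) return "critical";
  if (statuses.includes("warning")) return "warning";
  return "healthy";
}

// /health: anything other than fully healthy is reported as 503.
function healthHttpStatus(statuses: ProbeStatus[]): number {
  return worstStatus(statuses) === "healthy" ? 200 : 503;
}

// /readyz: only critical probes (or shutdown) take the pod out of rotation.
function readinessHttpStatus(statuses: ProbeStatus[], isShuttingDown = false): number {
  if (isShuttingDown) return 503;
  return worstStatus(statuses) === "critical" ? 503 : 200;
}

// /healthz: ignores probes entirely, so dependencies cannot cause restarts.
function livenessHttpStatus(isShuttingDown = false): number {
  return isShuttingDown ? 503 : 200;
}
```

With one probe in the warning state, /health would answer 503 while /readyz still answers 200, which is why the full report is better suited to dashboards than to Kubernetes probes.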

Metrics History

GET /health/metrics?limit=50

Returns historical metrics data. The limit query parameter is clamped to a maximum of 100 entries:

{
  "metrics": [],
  "count": 50,
  "latest": { }
}
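Clamping a query parameter like limit usually takes only a few lines. This sketch shows one way to do it. The 100-entry ceiling is from the description above; the function name, the lower bound of 1, and falling back to the maximum for missing or invalid input are assumptions.

```typescript
// Hypothetical helper: parse and clamp a `limit` query parameter.
function clampLimit(raw: string | undefined, max = 100, fallback = max): number {
  const parsed = Number.parseInt(raw ?? "", 10);
  // Missing, non-numeric, or non-positive input falls back to the default.
  if (!Number.isFinite(parsed) || parsed < 1) return fallback;
  return Math.min(parsed, max);
}
```

A request for ?limit=500 would therefore be served as if it had asked for 100 entries.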

Kubernetes Configuration

livenessProbe:
  httpGet:
    path: /healthz
    port: 3000
  initialDelaySeconds: 10
  periodSeconds: 15
  failureThreshold: 3
  timeoutSeconds: 5

readinessProbe:
  httpGet:
    path: /readyz
    port: 3000
  initialDelaySeconds: 5
  periodSeconds: 10
  failureThreshold: 3
  timeoutSeconds: 3

The paths are configurable via livenessPath and readinessPath in the config.

Built-in Probes

| Probe | Default Threshold | Description |
|-------|------------------|-------------|
| Memory Usage | 80% | Container-aware (cgroup v1/v2). Warning at 80% of threshold, critical above. |
| CPU Usage | 80% | Normalized against cgroup CPU quota. Warning at 80% of threshold, critical above. |
| Response Time | 5000ms | Average response time across last 100 requests. |
| Error Rate | 10/min | 5xx errors per minute (4xx excluded). Adjusts window for short uptimes. |
| Uptime | - | Application uptime. Always healthy. |
| Disk Space | 85% | Actual disk usage via statfsSync or async df. Warning at 80% of threshold. |
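The "warning at 80% of threshold, critical above" rule can be read as a two-step comparison. The sketch below is one interpretation, assuming that "critical above" means above the threshold itself and that the warning boundary is inclusive; classify is a hypothetical name, and the library may treat the exact boundaries differently.

```typescript
// Hypothetical classifier for threshold-based probes.
type ProbeStatus = "healthy" | "warning" | "critical";

function classify(value: number, threshold: number, warningRatio = 0.8): ProbeStatus {
  if (value > threshold) return "critical"; // above the configured threshold
  if (value >= threshold * warningRatio) return "warning"; // within the top 20% below it
  return "healthy";
}
```

With the default memory threshold of 80%, this reading puts the warning band at roughly 64% to 80% usage.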

Container-Aware Metrics

On Kubernetes/Docker, the library automatically detects the container environment and reads real resource limits:

Memory: Reads from /sys/fs/cgroup/memory.max (v2) or /sys/fs/cgroup/memory/memory.limit_in_bytes (v1) instead of os.totalmem()
CPU: Normalizes usage against cgroup CPU quota (cpu.max / cpu.cfs_quota_us) instead of host CPU count
Heap: Uses v8.getHeapStatistics().heap_size_limit for the real V8 heap ceiling

On bare metal/EC2 without containers, it falls back to standard OS metrics.
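To make the detection above concrete, here is a minimal sketch of how cgroup limits might be read with Node's standard library. It is an illustration rather than the package's ContainerDetector: all function names are hypothetical, only the cgroup v2 cpu.max format is parsed, and treating a v1 limit at or above host memory as "unlimited" is an assumption.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as v8 from "node:v8";

// Parse cgroup v2 memory.max or v1 memory.limit_in_bytes; null means "no limit".
function parseMemoryLimit(raw: string, hostTotalBytes: number): number | null {
  const text = raw.trim();
  if (text === "max") return null; // cgroup v2 spelling of "unlimited"
  const bytes = Number(text);
  // cgroup v1 reports a huge sentinel when unlimited, so cap at host memory.
  if (!Number.isFinite(bytes) || bytes <= 0 || bytes >= hostTotalBytes) return null;
  return bytes;
}

// Parse cgroup v2 cpu.max ("<quota> <period>" or "max <period>") into CPU cores.
function parseCpuMax(raw: string): number | null {
  const [quota, period] = raw.trim().split(/\s+/);
  if (quota === "max") return null;
  const q = Number(quota);
  const p = Number(period);
  if (!Number.isFinite(q) || !Number.isFinite(p) || q <= 0 || p <= 0) return null;
  return q / p;
}

function readFileOrNull(path: string): string | null {
  try {
    return fs.readFileSync(path, "utf8");
  } catch {
    return null; // file is absent outside containers or on non-Linux hosts
  }
}

// Try the v2 path, then the v1 path, then fall back to os.totalmem().
function effectiveMemoryLimit(): { bytes: number; isContainerAware: boolean } {
  const hostTotal = os.totalmem();
  const paths = ["/sys/fs/cgroup/memory.max", "/sys/fs/cgroup/memory/memory.limit_in_bytes"];
  for (const path of paths) {
    const raw = readFileOrNull(path);
    const limit = raw === null ? null : parseMemoryLimit(raw, hostTotal);
    if (limit !== null) return { bytes: limit, isContainerAware: true };
  }
  return { bytes: hostTotal, isContainerAware: false };
}

// Heap usage relative to the real V8 ceiling rather than the current heap size.
function heapPercentage(): number {
  const stats = v8.getHeapStatistics();
  return (stats.used_heap_size / stats.heap_size_limit) * 100;
}
```

A cpu.max value of "50000 100000" parses to 0.5 cores, which corresponds to the effectiveCpus field in the sample /health response.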

Configuration Options

interface MonitoringConfig {
  enableHealthRoute?: boolean;     // Enable /health endpoint (default: true)
  healthRoutePath?: string;        // Health route path (default: '/health')
  enableMetricsCollection?: boolean; // Enable metrics collection (default: true)
  metricsInterval?: number;        // Collection interval in ms (default: 30000)
  enableRequestTracking?: boolean; // Track HTTP requests (default: true)
  enableErrorTracking?: boolean;   // Track 5xx errors (default: true)
  customProbes?: CustomProbe[];    // Additional custom probes
  alertThresholds?: AlertThresholds; // Custom alert thresholds
  livenessPath?: string;           // K8s liveness path (default: '/healthz')
  readinessPath?: string;          // K8s readiness path (default: '/readyz')
  probeTimeout?: number;           // Probe execution timeout in ms (default: 10000)
}

interface AlertThresholds {
  memoryUsage?: number;  // Memory usage threshold percentage
  cpuUsage?: number;     // CPU usage threshold percentage
  responseTime?: number; // Response time threshold in ms
  errorRate?: number;    // Error rate threshold per minute
}
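The defaults documented above can be summarized as a merge of user options over a default object. The sketch below is a hypothetical resolveConfig helper, not a function exported by the package. The interfaces are re-declared locally (without customProbes) so the example stands alone, and the threshold defaults are the ones listed under Built-in Probes.

```typescript
// Local copies of the documented interfaces, trimmed for the example.
interface AlertThresholds {
  memoryUsage?: number;
  cpuUsage?: number;
  responseTime?: number;
  errorRate?: number;
}

interface MonitoringConfig {
  enableHealthRoute?: boolean;
  healthRoutePath?: string;
  enableMetricsCollection?: boolean;
  metricsInterval?: number;
  enableRequestTracking?: boolean;
  enableErrorTracking?: boolean;
  alertThresholds?: AlertThresholds;
  livenessPath?: string;
  readinessPath?: string;
  probeTimeout?: number;
}

type ResolvedConfig = Required<Omit<MonitoringConfig, "alertThresholds">> & {
  alertThresholds: Required<AlertThresholds>;
};

// Hypothetical helper: fill in every documented default.
function resolveConfig(user: MonitoringConfig = {}): ResolvedConfig {
  const t = user.alertThresholds ?? {};
  return {
    enableHealthRoute: user.enableHealthRoute ?? true,
    healthRoutePath: user.healthRoutePath ?? "/health",
    enableMetricsCollection: user.enableMetricsCollection ?? true,
    metricsInterval: user.metricsInterval ?? 30000,
    enableRequestTracking: user.enableRequestTracking ?? true,
    enableErrorTracking: user.enableErrorTracking ?? true,
    livenessPath: user.livenessPath ?? "/healthz",
    readinessPath: user.readinessPath ?? "/readyz",
    probeTimeout: user.probeTimeout ?? 10000,
    // Thresholds merge field by field, so overriding one keeps the others.
    alertThresholds: {
      memoryUsage: t.memoryUsage ?? 80,
      cpuUsage: t.cpuUsage ?? 80,
      responseTime: t.responseTime ?? 5000,
      errorRate: t.errorRate ?? 10,
    },
  };
}
```

Using ?? rather than || keeps explicit false and 0 values intact, which matters for flags such as enableHealthRoute: false.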

Graceful Shutdown

The service automatically handles SIGTERM signals (sent by Kubernetes during pod termination):

Sets the isShuttingDown flag, so liveness and readiness probes immediately return 503
Clears all metric collection and probe execution intervals
Cleans up system metrics and request tracker resources

You can also trigger shutdown manually:

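import { MonitoringService } from "@techalmondsai/nodejs-monitoring";

const monitoring = MonitoringService.getInstance();

// `server` is the http.Server returned by app.listen(...)
process.on("SIGTERM", async () => {
  await monitoring.shutdown(); // stop intervals and flip the probes to 503
  server.close(); // then stop accepting new connections
});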
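The shutdown sequence described in this section (set a flag, clear intervals, fail the probes) follows a common pattern. The sketch below shows that pattern in isolation. ShutdownAwareMonitor and its methods are hypothetical and do not mirror the package's internals.

```typescript
// Hypothetical sketch of interval cleanup with shutdown-aware probes.
class ShutdownAwareMonitor {
  private timers: ReturnType<typeof setInterval>[] = [];
  private shuttingDown = false;

  // Register a periodic task, such as metric collection or a probe run.
  every(ms: number, task: () => void): void {
    if (this.shuttingDown) return; // refuse new work once shutdown has begun
    this.timers.push(setInterval(task, ms));
  }

  shutdown(): void {
    if (this.shuttingDown) return; // safe to call more than once
    this.shuttingDown = true;
    for (const timer of this.timers) clearInterval(timer);
    this.timers = [];
  }

  // Status a liveness or readiness handler would send.
  probeStatus(): number {
    return this.shuttingDown ? 503 : 200;
  }

  activeTimers(): number {
    return this.timers.length;
  }
}
```

Clearing every interval matters for more than tidiness: a live setInterval keeps the Node.js event loop running, so a process with leftover timers would not exit on its own after the server closes.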

Custom Probes

import { CustomProbe, ProbeResult } from "@techalmondsai/nodejs-monitoring";

const customProbe: CustomProbe = {
  name: "external_api_check",
  check: async (): Promise<ProbeResult> => {
    const start = Date.now();
    try {
      const response = await fetch("https://api.example.com/health");
      return {
        status: response.ok ? "healthy" : "critical",
        message: `External API responded with ${response.status}`,
        value: response.status,
        metadata: {
          responseTime: Date.now() - start,
          url: "https://api.example.com/health",
        },
      };
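    } catch (error) {
      // Narrow first: `error` is `unknown` under strict TypeScript
      const reason = error instanceof Error ? error.message : String(error);
      return {
        status: "critical",
        message: `External API unreachable: ${reason}`,
      };
    }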
  },
  interval: 60000,
};

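// `monitoring` is the instance returned by setupMonitoring(app) or MonitoringService.getInstance()
monitoring.addProbe(customProbe);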

Probes that hang beyond the configured probeTimeout (default 10s) are automatically marked as critical.
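A timeout like this is commonly built by racing the probe against a timer. The following sketch shows that general technique. It is not taken from the package: runWithTimeout is a hypothetical name, the local ProbeResult type is a trimmed copy, and the timeout message wording is invented.

```typescript
// Trimmed local copy of the result shape, for a self-contained example.
interface ProbeResult {
  status: "healthy" | "warning" | "critical";
  message: string;
}

// Hypothetical wrapper: resolve with a critical result if the check hangs or throws.
function runWithTimeout(
  name: string,
  check: () => Promise<ProbeResult>,
  timeoutMs: number,
): Promise<ProbeResult> {
  let timer: ReturnType<typeof setTimeout> | undefined;

  const timeout = new Promise<ProbeResult>((resolve) => {
    timer = setTimeout(
      () => resolve({ status: "critical", message: `Probe ${name} timed out after ${timeoutMs}ms` }),
      timeoutMs,
    );
  });

  // Starting from a resolved promise also catches synchronous throws in `check`.
  const guarded = Promise.resolve()
    .then(check)
    .catch((error: unknown): ProbeResult => ({
      status: "critical",
      message: error instanceof Error ? error.message : String(error),
    }));

  // Whichever settles first wins; the timer is always cleared afterwards.
  return Promise.race([guarded, timeout]).finally(() => clearTimeout(timer));
}
```

Note that racing does not cancel the underlying work: a hung check keeps running in the background. For network calls, passing an AbortSignal to fetch is one way to stop the request itself.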

Integration Examples

Express.js with TypeScript

import express from "express";
import { setupMonitoring } from "@techalmondsai/nodejs-monitoring";

const app = express();

const monitoring = setupMonitoring(app, {
  healthRoutePath: "/api/health",
  alertThresholds: {
    memoryUsage: 85,
    cpuUsage: 90,
  },
});

// `pool` is your own connection pool, for example a node-postgres Pool
monitoring.addProbe({
  name: "postgres_connection",
  check: async () => {
    try {
      const client = await pool.connect();
      await client.query("SELECT NOW()");
      client.release();
      return { status: "healthy", message: "PostgreSQL connected" };
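    } catch (error) {
      // Narrow first: `error` is `unknown` under strict TypeScript
      const reason = error instanceof Error ? error.message : String(error);
      return {
        status: "critical",
        message: `PostgreSQL error: ${reason}`,
      };
    }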
  },
});

export default app;

NestJS Integration

import { NestFactory } from "@nestjs/core";
import { AppModule } from "./app.module";
import { setupMonitoring } from "@techalmondsai/nodejs-monitoring";

async function bootstrap() {
  const app = await NestFactory.create(AppModule);

  const monitoring = setupMonitoring(app.getHttpAdapter().getInstance(), {
    healthRoutePath: "/health",
    metricsInterval: 20000,
  });

  await app.listen(3000);
}

bootstrap();

Manual Usage (Without Express)

import { MonitoringService } from "@techalmondsai/nodejs-monitoring";

const monitoring = MonitoringService.getInstance({
  enableHealthRoute: false,
  metricsInterval: 15000,
});

// Get current metrics
const metrics = monitoring.getCurrentMetrics();

// Get probe results
const probes = monitoring.getProbeResults();

// Add custom probe
monitoring.addProbe({
  name: "custom_check",
  check: async () => {
    return { status: "healthy", message: "All good!" };
  },
});

API Reference

MonitoringService

| Method | Description |
|--------|-------------|
| getInstance(config?) | Get or create the singleton instance |
| addProbe(probe) | Add a custom health probe |
| getCurrentMetrics() | Get current system metrics |
| getProbeResults() | Get all probe results |
| requestTrackingMiddleware() | Express middleware for request tracking |
| healthCheckHandler() | Express handler for /health |
| livenessHandler() | Express handler for /healthz |
| readinessHandler() | Express handler for /readyz |
| metricsHistoryHandler() | Express handler for /health/metrics |
| shutdown() | Graceful shutdown (clears intervals, sets shutting down flag) |
| static reset() | Reset all singletons (for testing) |
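The getInstance(config?) and static reset() pair suggests a conventional singleton with a test hook. The sketch below shows that general pattern only. MonitorSingleton is a hypothetical class, and in particular whether the real service ignores a config passed on later getInstance calls is an assumption, not something the documentation states.

```typescript
// Hypothetical illustration of a resettable singleton.
interface Config {
  metricsInterval: number;
}

class MonitorSingleton {
  private static instance: MonitorSingleton | undefined;
  readonly config: Config;

  private constructor(config: Config) {
    this.config = config;
  }

  // The first call creates the instance; in this sketch later configs are ignored.
  static getInstance(config?: Partial<Config>): MonitorSingleton {
    if (!MonitorSingleton.instance) {
      MonitorSingleton.instance = new MonitorSingleton({
        metricsInterval: config?.metricsInterval ?? 30000,
      });
    }
    return MonitorSingleton.instance;
  }

  // Test hook: drop the instance so the next getInstance starts fresh.
  static reset(): void {
    MonitorSingleton.instance = undefined;
  }
}
```

In a test suite, calling reset in a beforeEach hook keeps one test's configuration from leaking into the next.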

Helper Functions

| Function | Description |
|----------|-------------|
| setupMonitoring(app, config?) | Quick setup for Express apps. Registers all endpoints and middleware. |

Exports

| Export | Description |
|--------|-------------|
| MonitoringService | Core monitoring singleton |
| BuiltInProbes | Factory methods for built-in probes |
| ContainerDetector | Container/cgroup detection utility |
| HealthMetrics | TypeScript interface for metrics |
| MonitoringConfig | TypeScript interface for config |
| CustomProbe | TypeScript interface for custom probes |
| ProbeResult | TypeScript interface for probe results |
| AlertThresholds | TypeScript interface for thresholds |

Testing

npm test
npm run test:watch
npm test -- --coverage

License

MIT License - see LICENSE file for details.

Support

GitHub Issues: Report bugs and request features
Documentation: Full API documentation