@cobui/node-monitoring

v1.1.2

Published

5 days ago

A lightweight monitoring library for Node.js

cobui

monitoring metrics influxdb counter gauge histogram nodejs

node-monitoring

Lightweight monitoring for Node.js. Define metrics in YAML, drop sensors in your code, data goes to InfluxDB.

Quick start

Parse your config file and pass the result directly to monitoring.add():

import { Monitoring, loadConfig, Counter, Gauge, Histogram } from "@cobui/node-monitoring";

const config = loadConfig("monitoring.yml") as any;
const monitoring = new Monitoring();

monitoring.add(config);

// Sensors can be created anywhere — no reference to `monitoring` needed
const requests = Counter.create("http.requests", "app");
const latency = Histogram.create("http.latency", "app");
const memory = Gauge.create("process.mem", "app");

// In your request handler:
requests.increment(1, { route: "/api/users", method: "GET" });
latency.record(42, { route: "/api/users" });
memory.set(process.memoryUsage().heapUsed / 1024 / 1024);

// Before process exit: flush(), then destroy().
// flush() resolves once all queued data has been sent (or has exhausted its retries);
// destroy() clears the remaining handles.
await monitoring.flush();
monitoring.destroy();

See config.example.yml for a full annotated config file.

Sensors

Sensors are the primary recording API.

import { Counter, Gauge, Histogram } from "@cobui/node-monitoring";

const hits = Counter.create("http.requests", "app");
const memory = Gauge.create("process.mem", "app");
const latency = Histogram.create("http.latency", "app");

| Sensor    | Metric type | Method                   | Use for                                           |
| --------- | ----------- | ------------------------ | ------------------------------------------------- |
| Counter   | counter     | increment(delta?, tags?) | Events (requests, errors, cache hits)             |
| Gauge     | gauge       | set(value, tags?)        | Current values (memory, queue depth, connections) |
| Histogram | histogram   | record(value, tags?)     | Distributions (latencies, sizes, durations)       |

hits.increment(); // +1
hits.increment(5); // +5
hits.increment({ route: "/api" }); // +1 with tags (shorthand)

memory.set(process.memoryUsage().heapUsed);

latency.record(42, { route: "/api", status: "200" });

If a sensor fires before its namespace is active, or the URI doesn't match a registered metric, a warning is emitted once per sensor. See Warnings below.

Metric types

| Type      | reset default                        |
| --------- | ------------------------------------ |
| counter   | true (per-interval rate)             |
| histogram | true (per-interval distribution)     |
| gauge     | false (current value, never cleared) |

Counters and histograms reset after each collection cycle because you typically want per-interval rates and distributions, not cumulative totals. A gauge never resets because it represents an instantaneous value, and clearing it would produce a gap until the next set() call.

Set reset: false on a counter to get a monotonic total, and compute the diff or rate at query time.
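
To make the effect of reset concrete, here is a small self-contained model of a counter that is read once per collection cycle. It only illustrates the semantics described above and is not the library's implementation; `ModelCounter` and `collect()` are hypothetical names.

```typescript
// Illustrative model only: ModelCounter and collect() are hypothetical names,
// not part of @cobui/node-monitoring.
class ModelCounter {
  private total = 0;
  private reset: boolean;

  constructor(reset: boolean) {
    this.reset = reset;
  }

  increment(delta = 1): void {
    this.total += delta;
  }

  // Called once per collection interval; returns the value that would be reported.
  collect(): number {
    const reported = this.total;
    if (this.reset) this.total = 0; // per-interval value: next cycle starts from zero
    return reported;
  }
}

const perInterval = new ModelCounter(true);
const monotonic = new ModelCounter(false);

const reportedPerInterval: number[] = [];
const reportedMonotonic: number[] = [];

// Three collection cycles with 3, 5, and 2 events respectively.
for (const events of [3, 5, 2]) {
  for (let i = 0; i < events; i++) {
    perInterval.increment();
    monotonic.increment();
  }
  reportedPerInterval.push(perInterval.collect());
  reportedMonotonic.push(monotonic.collect());
}

console.log(reportedPerInterval); // [ 3, 5, 2 ]
console.log(reportedMonotonic); // [ 3, 8, 10 ]
```

With reset: true each reported value is already a per-interval count. With reset: false the series only grows, so the per-interval count has to be derived by differencing at query time.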

MetricConfig

{
  uri:      string;          // identifier unique to namespace, e.g. "http.requests"
  type:     "counter" | "gauge" | "histogram";
  interval: number;          // collection interval in milliseconds

  // Optional
  reset?:    boolean;        // defaults: counter/histogram=true, gauge=false
  enabled?:  boolean;        // default: true
  tags?:     Record<string, string | number | boolean>;  // added to every data point
  exclude?:  string[];       // strip these tag keys before hashing to reduce cardinality
  cache?:    { max?: number }; // max distinct tag combinations to track (default: 1000)
}
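
The exclude and cache.max options both bound how many distinct tag combinations a metric tracks. The sketch below shows one way a per-series key could be derived: drop the excluded tag keys, sort the rest, and join them. `seriesKey` is a hypothetical helper, and the library's actual hashing scheme may differ.

```typescript
type TagValue = string | number | boolean;
type Tags = Record<string, TagValue>;

// Hypothetical helper, not the library's API: builds a stable key for a tag set.
function seriesKey(tags: Tags, exclude: string[] = []): string {
  const excluded = new Set(exclude);
  return Object.keys(tags)
    .filter((key) => !excluded.has(key)) // strip high-cardinality keys first
    .sort() // same key regardless of insertion order
    .map((key) => `${key}=${String(tags[key])}`)
    .join(",");
}

// Without exclude, every user id produces a new series.
const a = seriesKey({ route: "/api/users", userId: 1 });
const b = seriesKey({ route: "/api/users", userId: 2 });

// With exclude: ["userId"], both calls collapse into one series.
const c = seriesKey({ userId: 1, route: "/api/users" }, ["userId"]);
const d = seriesKey({ route: "/api/users", userId: 2 }, ["userId"]);

console.log(a === b); // false
console.log(c === d); // true
console.log(c); // route=/api/users
```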

Lifecycle

const monitoring = new Monitoring();

// Add a namespace — starts immediately when enabled: true (default)
monitoring.add([{ namespace: "app", transporter: ..., metrics: [...] }]);

// Start / pause all enabled namespaces
monitoring.start();
monitoring.stop();

// Per-namespace control
monitoring.setNamespaceEnabled("app", false);  // pause
monitoring.setNamespaceEnabled("app", true);   // resume
monitoring.isEnabled("app");          // → boolean

// Per-metric control
monitoring.setMetricEnabled("http.requests", false);
monitoring.reschedule("http.requests", 30_000);

// Before process exit: flush buffered data, then destroy.
// flush() waits until every queued item has been sent (or exhausted retries),
// but it does NOT close all handles — the loss-reporting timer inside the
// transport queue stays alive until destroy() is called. Always pair them.
await monitoring.flush();
monitoring.destroy();
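
Because flush() and destroy() should always be paired, it can help to wrap them in a single shutdown helper. The sketch below is written against a minimal structural interface so that it runs on its own; `Flushable` and `shutdown` are hypothetical names, and a Monitoring instance should fit the interface as long as flush() and destroy() have the signatures used above.

```typescript
// Hypothetical structural type covering only the two methods used here.
interface Flushable {
  flush(): Promise<void>;
  destroy(): void;
}

// Flush buffered data, then always release handles, even if flush() rejects.
async function shutdown(target: Flushable): Promise<void> {
  try {
    await target.flush();
  } finally {
    target.destroy();
  }
}

// Stand-in object used to demonstrate the call order.
const calls: string[] = [];
const fake: Flushable = {
  flush: async () => {
    calls.push("flush");
  },
  destroy: () => {
    calls.push("destroy");
  },
};

const done = shutdown(fake).then(() => {
  console.log(calls); // [ 'flush', 'destroy' ]
});
```

The try/finally is a design choice rather than something the library requires: it ensures the loss-reporting timer is cleared even when the final flush fails, so the process can still exit.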

Transporter config — InfluxDB

InfluxDB v2 (InfluxDB Cloud / OSS 2.x)

transporter:
  type: influx
  version: 2
  # default: "influx". Namespaces sharing the same key share one queue
  # and rate limit. Give each transporter its own key when you want
  # independent queues (e.g. different InfluxDB hosts or rate limits).
  key: influx
  host: influxdb.example.com
  port: 8086 # default: 8086
  protocol: https # default: https
  org: my-org
  bucket: app-metrics
  token: "YOUR_TOKEN"

  rateLimit: 20 # requests per second (default: 10)
  measurementStrategy: uri # "uri" (default) or "namespace" (see below)

  retry:
    retries: 3 # attempts after the first (default: 3)
    minTimeout: 1000 # ms before first retry (default: 1000)
    maxTimeout: 30000 # upper bound on backoff (default: 30000)
    factor: 2 # exponential multiplier (default: 2)

  queue:
    maxSize: 10000 # drop incoming when queue exceeds this (default: unlimited)
    lossInterval: 300000 # ms between loss-record flushes (default: 5 min)
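
The retry settings above describe an exponential backoff. Assuming the conventional formula min(minTimeout × factor^n, maxTimeout) with no jitter (an assumption; the library may compute delays differently), the defaults produce the waits shown below. `backoffDelays` is a hypothetical helper.

```typescript
interface RetryOptions {
  retries: number; // attempts after the first
  minTimeout: number; // ms before the first retry
  maxTimeout: number; // upper bound on any single delay, in ms
  factor: number; // exponential multiplier
}

// Hypothetical helper: delay before each retry, assuming no jitter.
function backoffDelays(opts: RetryOptions): number[] {
  const delays: number[] = [];
  for (let attempt = 0; attempt < opts.retries; attempt++) {
    const raw = opts.minTimeout * Math.pow(opts.factor, attempt);
    delays.push(Math.min(raw, opts.maxTimeout));
  }
  return delays;
}

const defaults = backoffDelays({ retries: 3, minTimeout: 1000, maxTimeout: 30000, factor: 2 });
console.log(defaults); // [ 1000, 2000, 4000 ]

// With more retries the cap takes effect: 1000 * 2^5 = 32000 is clamped to 30000.
const capped = backoffDelays({ retries: 6, minTimeout: 1000, maxTimeout: 30000, factor: 2 });
console.log(capped[5]); // 30000
```

Under these assumptions, a write that fails every attempt with the default settings spends about 7 seconds in backoff before it is counted as lost.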

InfluxDB v1 (InfluxDB OSS 1.x)

transporter:
  type: influx
  version: 1
  key: influx # default: "influx" — see note in v2 section above
  host: influx-legacy.internal
  port: 8086
  protocol: http # default: http
  database: app-metrics
  retentionPolicy: 90d # optional
  username: monitor # optional -> omit both username and password for unauthenticated instances
  password: "YOUR_PASSWORD" # optional -> must be provided together with username

Measurement strategy

| Strategy      | Measurement name                    | When to use                                     |
| ------------- | ----------------------------------- | ----------------------------------------------- |
| uri (default) | The metric URI (e.g. http.requests) | Each metric has its own schema                  |
| namespace     | The namespace (e.g. app)            | All metrics in one place; URI kept as a uri tag |
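
As a rough illustration of the two strategies, the sketch below maps the same data point to a measurement name and tag set under each one. `toPoint` and the `Point` shape are hypothetical; only the naming rules are taken from the table above.

```typescript
type Strategy = "uri" | "namespace";
type Tags = Record<string, string>;

interface Point {
  measurement: string;
  tags: Tags;
}

// Hypothetical helper that applies the naming rules described above.
function toPoint(strategy: Strategy, namespace: string, uri: string, tags: Tags): Point {
  if (strategy === "namespace") {
    // One measurement per namespace; the metric URI is kept as a tag.
    return { measurement: namespace, tags: { ...tags, uri } };
  }
  // Default: one measurement per metric URI.
  return { measurement: uri, tags: { ...tags } };
}

const byUri = toPoint("uri", "app", "http.requests", { route: "/api" });
const byNamespace = toPoint("namespace", "app", "http.requests", { route: "/api" });

console.log(byUri.measurement); // http.requests
console.log(byNamespace.measurement); // app
console.log(byNamespace.tags.uri); // http.requests
```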

Namespace tag (`includeNamespaceTag`)

By default, the namespace is not added as a tag to data points. For most setups this is the right choice: with measurementStrategy: namespace the namespace is already the measurement name, and with measurementStrategy: uri and a single namespace it would be the same constant on every row.

Set includeNamespaceTag: true on a namespace config when you use measurementStrategy: uri and have multiple namespaces writing to the same transporter. This lets you filter by namespace in InfluxDB without it being the measurement name.

- namespace: app
  includeNamespaceTag: true   # stamps namespace: "app" on every aggregate
  transporter:
    type: influx
    measurementStrategy: uri
    ...

Cluster mode

No setup required. On worker processes, aggregates are forwarded to the primary via IPC. The primary consolidates them in a shared rate-limited queue. There is always only one queue per transporter key, regardless of how many workers or namespaces share it.

Warnings

Sensors emit a typed warning event the first time an issue is detected. By default warnings fall through to console.warn. Subscribe to any category to route them to your own logger or suppress them entirely:

import { warnings } from "@cobui/node-monitoring";

warnings.on("sensor:inactive", ({ uri, namespace }) => {
  /* namespace not started */
});
warnings.on("sensor:not-found", ({ uri, namespace, type }) => {
  /* URI not registered */
});
warnings.on("sensor:ambiguous", ({ uri, namespaces }) => {
  /* multiple active namespaces, no explicit ns */
});
warnings.on("transport:loss", ({ uri, namespaces }) => {
  /* queue dropped measurements after last failed retry */
});

Each sensor warning fires at most once per sensor instance.
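
The once-per-instance behaviour with a console.warn fallback can be modelled roughly as follows. This is a sketch of the pattern, not the library's code; `WarningHub` and `ModelSensor` are hypothetical names, and the model sensor always behaves as if its namespace had not been started.

```typescript
type Payload = { uri: string; namespace: string };
type Listener = (payload: Payload) => void;

// Hypothetical stand-in for the exported `warnings` emitter.
class WarningHub {
  private listeners = new Map<string, Listener[]>();
  fallbackCount = 0; // how often the console.warn fallback was used

  on(category: string, listener: Listener): void {
    const list = this.listeners.get(category) ?? [];
    list.push(listener);
    this.listeners.set(category, list);
  }

  emit(category: string, payload: Payload): void {
    const list = this.listeners.get(category) ?? [];
    if (list.length === 0) {
      // No subscriber for this category: fall through to console.warn.
      this.fallbackCount++;
      console.warn(`[${category}] ${payload.namespace}/${payload.uri}`);
      return;
    }
    list.forEach((listener) => listener(payload));
  }
}

// Hypothetical sensor whose namespace is never started, so every call is a misuse.
class ModelSensor {
  private warned = false;
  private hub: WarningHub;
  private payload: Payload;

  constructor(hub: WarningHub, uri: string, namespace: string) {
    this.hub = hub;
    this.payload = { uri, namespace };
  }

  record(): void {
    if (this.warned) return; // at most one warning per sensor instance
    this.warned = true;
    this.hub.emit("sensor:inactive", this.payload);
  }
}

const hub = new WarningHub();
const first = new ModelSensor(hub, "http.latency", "app");
first.record(); // no subscriber yet, so this goes to console.warn
first.record(); // silent: this instance has already warned

const received: Payload[] = [];
hub.on("sensor:inactive", (payload) => received.push(payload));

const second = new ModelSensor(hub, "http.requests", "app");
second.record(); // routed to the subscriber instead of console.warn
second.record(); // silent again

console.log(hub.fallbackCount); // 1
console.log(received.length); // 1
```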

Design notes

No silent failures. Sensors emit a one-time warning if the namespace is not active or the URI is not found, so misconfigurations surface immediately without spamming logs.

No TTL on the metric cache. A TTL would cause implicit counter resets and break rate/diff queries. Metrics accumulate until a collection cycle runs; with reset: true they are cleared afterwards.

Tag ordering. Tags are sorted by key before being sent to the backend, which InfluxDB recommends for better write performance.

Loss records. When the queue is full or retries are exhausted, lost items are counted per namespace and flushed as a monitoring.loss aggregate at a configurable interval. This lets you track dropped metrics without flooding the queue.
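
A minimal model of the loss-record idea: a bounded queue that counts drops per namespace instead of enqueueing something for each lost item, then emits one aggregate per namespace when flushed. `BoundedQueue` and `flushLoss` are hypothetical names, the model covers only the queue-full case, and the library's internals may differ.

```typescript
interface Item {
  namespace: string;
  uri: string;
  value: number;
}

// Hypothetical model, not the library's queue.
class BoundedQueue {
  private items: Item[] = [];
  private lost = new Map<string, number>();
  private maxSize: number;

  constructor(maxSize: number) {
    this.maxSize = maxSize;
  }

  // Returns false when the item was dropped because the queue is full.
  push(item: Item): boolean {
    if (this.items.length >= this.maxSize) {
      // Assumption: the incoming item is dropped and counted per namespace.
      this.lost.set(item.namespace, (this.lost.get(item.namespace) ?? 0) + 1);
      return false;
    }
    this.items.push(item);
    return true;
  }

  // Would run once per lossInterval: one aggregate per namespace, then reset the counts.
  flushLoss(): Item[] {
    const records: Item[] = [];
    this.lost.forEach((count, namespace) => {
      records.push({ namespace, uri: "monitoring.loss", value: count });
    });
    this.lost.clear();
    return records;
  }

  get size(): number {
    return this.items.length;
  }
}

const queue = new BoundedQueue(2);
for (let i = 0; i < 5; i++) {
  queue.push({ namespace: "app", uri: "http.requests", value: 1 });
}

const loss = queue.flushLoss();
console.log(queue.size); // 2
console.log(loss); // [ { namespace: 'app', uri: 'monitoring.loss', value: 3 } ]
console.log(queue.flushLoss().length); // 0
```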