npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2026 – Pkg Stats / Ryan Hefner

@unrdf/observability

v26.4.8

Published

Innovative Prometheus/Grafana observability dashboard for UNRDF distributed workflows

Downloads

572

Readme

@unrdf/observability

Innovative Prometheus/Grafana observability dashboard for UNRDF distributed workflows

Real-time monitoring, alerting, and visualization for workflow execution, resource utilization, and business metrics.

Features

Metrics Collection

  • Workflow Execution Metrics: Total executions, duration, active workflows
  • Task Performance: Execution time, success rate, queue depth
  • Resource Utilization: CPU, memory, disk monitoring
  • Event Sourcing: Events appended, store size tracking
  • Business Metrics: Policy evaluations, cryptographic receipts
  • Latency Percentiles: p50, p90, p95, p99 tracking

Alerting System

  • Threshold-Based Alerts: Configurable rules with hysteresis
  • Anomaly Detection: Statistical z-score analysis
  • Webhook Notifications: HTTP callbacks for alert events
  • Alert Deduplication: Smart grouping and correlation
  • Severity Levels: INFO, WARNING, CRITICAL

Grafana Dashboards

  • Pre-built Dashboards: Ready-to-use visualizations
  • Real-time Updates: 5-second refresh intervals
  • Custom Variables: Filter by workflow ID, pattern
  • Alert Annotations: Visual markers for events
  • Exportable JSON: Import directly into Grafana

Installation

pnpm add @unrdf/observability

Quick Start

Basic Usage

import { createWorkflowMetrics } from '@unrdf/observability';

const metrics = createWorkflowMetrics({
  enableDefaultMetrics: true,
  prefix: 'unrdf_workflow_',
  labels: { environment: 'production' },
});

// Record workflow execution
metrics.recordWorkflowStart('wf-123', 'SEQUENCE');
// ... execute workflow ...
metrics.recordWorkflowComplete('wf-123', 'completed', 2.5, 'SEQUENCE');

// Export metrics
const prometheusMetrics = await metrics.getMetrics();

Complete Observability Stack

import { createObservabilityStack } from '@unrdf/observability';

const { metrics, grafana, alerts } = await createObservabilityStack({
  metrics: {
    enableDefaultMetrics: true,
    prefix: 'unrdf_workflow_',
  },
  alerts: {
    rules: [
      {
        id: 'high-latency',
        name: 'High Workflow Latency',
        metric: 'workflow_duration',
        threshold: 10,
        operator: 'gt',
        severity: 'warning',
      },
    ],
    enableAnomalyDetection: true,
  },
  grafana: {
    title: 'Production Workflows',
    datasource: 'Prometheus',
  },
});

// Listen for alerts
alerts.on('alert', alert => {
  console.log(`Alert: ${alert.name} - ${alert.severity}`);
});

Live Demo

Run the included demo to see metrics in action:

cd packages/observability
pnpm install
pnpm demo

Then visit:

  • Prometheus Metrics: http://localhost:9090/metrics
  • Metrics JSON: http://localhost:9090/metrics/json
  • Grafana Dashboard: http://localhost:9090/dashboard
  • Active Alerts: http://localhost:9090/alerts
  • Statistics: http://localhost:9090/stats

API Reference

WorkflowMetrics

Constructor

new WorkflowMetrics(config);

Config Options:

  • enableDefaultMetrics (boolean): Enable Node.js default metrics (default: true)
  • prefix (string): Metric name prefix (default: 'unrdfworkflow')
  • labels (object): Global labels for all metrics
  • collectInterval (number): Collection interval in ms (default: 10000)

Methods

recordWorkflowStart(workflowId, pattern)

Record workflow execution start.

recordWorkflowComplete(workflowId, status, durationSeconds, pattern)

Record workflow completion with duration.

recordTaskExecution(workflowId, taskId, taskType, status, durationSeconds)

Record task execution metrics.

updateTaskQueueDepth(workflowId, queueName, depth)

Update task queue depth gauge.

recordResourceUtilization(resourceType, resourceId, utilizationPercent)

Record resource utilization (0-100%).

recordEventAppended(eventType, workflowId)

Record event appended to event store.

recordPolicyEvaluation(policyName, result)

Record policy evaluation result.

recordCryptoReceipt(workflowId, algorithm)

Record cryptographic receipt generation.

recordError(errorType, workflowId, severity)

Record error occurrence.

getMetrics()

Get metrics in Prometheus text format.

getMetricsJSON()

Get metrics in JSON format.

AlertManager

Constructor

new AlertManager(config);

Config Options:

  • rules (array): Initial alert rules
  • webhooks (array): Webhook endpoints
  • checkInterval (number): Rule check interval (default: 10000)
  • enableAnomalyDetection (boolean): Enable anomaly detection (default: true)

Methods

addRule(rule)

Add alert rule.

Rule Schema:

{
  id: 'unique-rule-id',
  name: 'Human Readable Name',
  metric: 'metric_name',
  threshold: 100,
  operator: 'gt', // gt, lt, gte, lte, eq
  severity: 'warning', // info, warning, critical
  duration: 60000, // ms
  enabled: true
}
evaluateMetric(metricName, value, labels)

Evaluate metric against all rules.

getActiveAlerts()

Get currently active alerts.

getAlertHistory(filters)

Get alert history with optional filters.

getStatistics()

Get alert statistics.

Events

  • alert: Fired when alert triggers
  • alert:resolved: Fired when alert resolves
  • webhook:error: Fired on webhook delivery failure

GrafanaExporter

Constructor

new GrafanaExporter(config);

Config Options:

  • title (string): Dashboard title
  • datasource (string): Prometheus datasource name (default: 'Prometheus')
  • refreshInterval (string): Dashboard refresh interval (default: '5s')
  • tags (array): Dashboard tags

Methods

generateDashboard()

Generate complete Grafana dashboard configuration.

exportJSON(pretty)

Export dashboard as JSON string.

generateAlertDashboard()

Generate alert-focused dashboard.

Metrics Collected

Workflow Metrics

  • unrdf_workflow_executions_total (Counter): Total workflow executions
  • unrdf_workflow_execution_duration_seconds (Histogram): Workflow duration
  • unrdf_workflow_active_workflows (Gauge): Active workflows

Task Metrics

  • unrdf_workflow_task_executions_total (Counter): Total task executions
  • unrdf_workflow_task_duration_seconds (Histogram): Task duration
  • unrdf_workflow_task_queue_depth (Gauge): Queue depth

Resource Metrics

  • unrdf_workflow_resource_utilization (Gauge): Resource utilization %
  • unrdf_workflow_resource_allocations_total (Counter): Resource allocations

Event Sourcing Metrics

  • unrdf_workflow_events_appended_total (Counter): Events appended
  • unrdf_workflow_event_store_size_bytes (Gauge): Event store size

Business Metrics

  • unrdf_workflow_policy_evaluations_total (Counter): Policy evaluations
  • unrdf_workflow_crypto_receipts_total (Counter): Crypto receipts
  • unrdf_workflow_errors_total (Counter): Errors

Performance Metrics

  • unrdf_workflow_latency_percentiles (Summary): Latency percentiles

Integration with Prometheus

Prometheus Configuration

scrape_configs:
  - job_name: 'unrdf-workflows'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9090']

Alert Rules (prometheus.yml)

groups:
  - name: unrdf_workflow_alerts
    interval: 30s
    rules:
      - alert: HighWorkflowErrorRate
        expr: rate(unrdf_workflow_errors_total[5m]) > 1
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: 'High workflow error rate detected'
          description: 'Error rate is {{ $value }} errors/sec'

      - alert: HighResourceUtilization
        expr: unrdf_workflow_resource_utilization > 90
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: 'Resource utilization above 90%'
          description: '{{ $labels.resource_type }} on {{ $labels.resource_id }}'

Grafana Dashboard Import

  1. Export dashboard JSON:
curl http://localhost:9090/dashboard/export > dashboard.json
  1. Import in Grafana:
    • Navigate to Dashboards → Import
    • Upload dashboard.json
    • Select Prometheus datasource
    • Click Import

Architecture

┌─────────────────────────────────────────────┐
│          Workflow Application               │
│  (Records metrics via WorkflowMetrics)      │
└─────────────────┬───────────────────────────┘
                  │
                  ↓
┌─────────────────────────────────────────────┐
│        Prometheus Metrics Endpoint          │
│            (Express Server)                 │
│         http://localhost:9090/metrics       │
└─────────────────┬───────────────────────────┘
                  │
                  ↓
┌─────────────────────────────────────────────┐
│         Prometheus Server (Scraper)         │
│     - Scrapes metrics every 5s              │
│     - Stores time-series data               │
│     - Evaluates alert rules                 │
└─────────────────┬───────────────────────────┘
                  │
                  ↓
┌─────────────────────────────────────────────┐
│           Grafana Dashboard                 │
│     - Visualizes metrics                    │
│     - Real-time graphs                      │
│     - Alert annotations                     │
└─────────────────────────────────────────────┘

Alert Flow

Metric Value
    ↓
AlertManager.evaluateMetric()
    ↓
Check Threshold Rules
    ↓
Check Anomaly Detection (z-score)
    ↓
Alert Triggered?
    ↓
Emit 'alert' Event
    ↓
Send Webhook Notifications
    ↓
Store in Alert History

Performance

  • Overhead: <1ms per metric recording
  • Memory: ~50MB for 1000 workflows
  • Throughput: 10,000+ metrics/sec
  • Alert Latency: <100ms detection to notification

Best Practices

1. Metric Cardinality

Keep label cardinality low to avoid memory issues:

// ✅ Good - bounded cardinality
metrics.recordWorkflowComplete(workflowId, 'completed', duration, 'SEQUENCE');

// ❌ Bad - unbounded cardinality
// metrics.recordWorkflow(userId, timestamp, randomValue);

2. Alert Thresholds

Set thresholds based on baseline + 2σ:

const baseline = 5; // seconds
const stdDev = 1.5;
const threshold = baseline + 2 * stdDev; // 8 seconds

alerts.addRule({
  id: 'high-latency',
  metric: 'workflow_duration',
  threshold,
  operator: 'gt',
  severity: 'warning',
});

3. Dashboard Organization

  • Group related panels
  • Use template variables for filtering
  • Set appropriate time ranges
  • Enable auto-refresh (5-10s)

4. Webhook Reliability

  • Use exponential backoff
  • Implement idempotency
  • Monitor webhook failures
  • Set reasonable timeouts

Troubleshooting

Metrics Not Appearing

  1. Check metrics endpoint: curl http://localhost:9090/metrics
  2. Verify Prometheus scraping: Prometheus UI → Targets
  3. Check for metric name typos

Alerts Not Firing

  1. Verify rule configuration: alerts.rules
  2. Check metric values: alerts.metricHistory
  3. Enable debug logging: alerts.on('alert', console.log)

High Memory Usage

  1. Reduce metric history: limit to 100-500 samples
  2. Lower cardinality: fewer unique label combinations
  3. Increase scrape interval: 10-30s instead of 5s

License

MIT

Contributing

See main UNRDF repository for contribution guidelines.