@timemacro/service-guardian

v1.0.15

Enterprise Linux service monitor with auto-restart, crash recovery, OOM detection, email alerts, health checks. Alternative to PM2, Supervisor, Monit for systemd services. Monitor MySQL, Nginx, Apache, PostgreSQL, Redis. Zero-downtime production monitorin

Downloads

35

Maintainers

derricksiawor

Keywords

systemd, monitoring, linux, service-monitor, auto-restart, service, monitor, restart, alert, email, daemon, watchdog, health-check, devops, sysadmin, uptime, service-monitoring, process-monitor, server-monitoring, automation, systemctl, process-manager, service-manager, linux-monitoring, ubuntu-monitoring, debian-monitoring, centos-monitoring, rhel-monitoring, server-management, automatic-restart, service-recovery, crash-recovery, failure-detection, oom-killer, memory-monitoring, cpu-monitoring, disk-monitoring, resource-monitoring, email-alerts, smtp-alerts, notification, alerting, mysql-monitor, nginx-monitor, apache-monitor, postgresql-monitor, redis-monitor, mongodb-monitor, docker-monitor, pm2-alternative, supervisor-alternative, monit-alternative, nagios-alternative, zabbix-alternative, production-monitoring, enterprise-monitoring, 24x7-monitoring, 247-monitoring, always-on, high-availability, fault-tolerance, self-healing, auto-recovery, incident-response, downtime-prevention, uptime-monitoring, service-health, health-monitoring, tcp-check, http-check, port-monitoring, endpoint-monitoring, api-monitoring, website-monitoring, server-watchdog, process-watchdog, service-watchdog, linux-service, linux-daemon, background-service, cron-monitoring, scheduled-checks, maintenance-window, dependency-management, service-dependencies, batch-alerts, alert-aggregation, smart-alerts, intelligent-monitoring, predictive-monitoring, proactive-monitoring, zero-downtime, mission-critical, business-continuity, disaster-recovery, service-resilience, service-reliability, infrastructure-monitoring, ops, operations, site-reliability, sre, cli-tool, command-line, terminal, bash, shell, systemd-service, systemd-manager, service-orchestration, service-automation

Readme

Service Guardian

Enterprise-grade automatic service monitoring, recovery, and alerting system for Linux servers

📦 Installation

⚠️ IMPORTANT: This is a global CLI tool. Always install with the -g flag:

npm install -g @timemacro/service-guardian

Or with sudo if needed:

sudo npm install -g @timemacro/service-guardian

Service Guardian is a production-ready Node.js daemon that monitors your Linux services, automatically recovers from failures, and sends intelligent alerts. Built for system administrators and DevOps teams who need reliable service uptime without manual intervention.

The Problem It Solves

Ever had MySQL killed by the OOM killer at 3 AM? Or Apache go down during peak traffic? Service Guardian helps keep your critical services running by:

  • Detecting failures quickly - Not just checking whether the process exists, but verifying that the service actually works (checks run every 30 seconds by default)
  • Smart auto-recovery - Distinguishes between crashes and manual stops, only restarts genuine failures
  • Intelligent alerting - Batched, actionable alerts with system context, not spam

Key Features

🛡️ Core Monitoring

  • Systemd Integration - Deep integration with systemd for accurate service state detection
  • Intelligent Failure Analysis - Differentiates between:
    • OOM (Out of Memory) kills
    • Service crashes
    • Manual stops (won't restart these)
    • Dependency failures
  • Parallel Monitoring - Efficiently monitors multiple services simultaneously
  • Resource-Aware - Checks CPU, memory, and disk usage before taking action

🔄 Advanced Recovery

  • Smart Auto-Restart - With exponential backoff to prevent restart loops
  • Dependency Management - Restarts services in dependency order and detects circular dependencies
  • Recovery Actions - Beyond just restart:
    • Clear system cache
    • Kill memory-intensive processes
    • Reload configurations
    • Clean zombie processes
    • Repair databases
  • Maintenance Windows - Pause monitoring during planned maintenance

🏥 Health Checks

  • Beyond Process Monitoring - Tests if services actually work:
    • TCP port checks (is MySQL accepting connections?)
    • HTTP endpoint checks (is the API returning 200?)
    • Custom script checks (complex business logic)
    • Command checks (simple shell commands)
  • Failure Thresholds - Only alerts after a set number of consecutive failures, so one-off blips don't raise false alarms
  • User-Friendly Messages - Clear explanations of what's wrong and how to fix it
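A minimal sketch of how a consecutive-failure threshold can filter out one-off blips. The `FailureThreshold` class and its default of 3 are hypothetical, not Service Guardian's actual API: it returns `true` exactly once, when a service's failure streak reaches the threshold, and any healthy result resets the streak.

```javascript
// Hypothetical helper, not part of Service Guardian's public API.
class FailureThreshold {
  constructor(threshold = 3) {
    this.threshold = threshold; // consecutive failures required before alerting
    this.streaks = new Map();   // service name -> current failure streak
  }

  // Record one health-check result; returns true when an alert should fire.
  record(service, healthy) {
    if (healthy) {
      this.streaks.set(service, 0); // any success resets the streak
      return false;
    }
    const streak = (this.streaks.get(service) ?? 0) + 1;
    this.streaks.set(service, streak);
    // Fire once, at the moment the streak first reaches the threshold
    return streak === this.threshold;
  }
}
```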

📧 Intelligent Alerting

  • Beautiful HTML Emails - Professional, readable alert emails with system context
  • Alert Aggregation - Batches multiple alerts to reduce email spam
  • Rate Limiting - Prevents alert storms during major incidents
  • Cooldown Periods - Won't repeatedly alert for the same issue
  • Contextual Information - Includes failure analysis, resource usage, recent logs

📊 Metrics & Reporting

  • Service Metrics - Track uptime, restart counts, failure patterns
  • Resource Metrics - Monitor CPU, memory, disk usage over time
  • Daily Aggregation - Historical data for trend analysis
  • Health Reports - Summary of all monitored services
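As an illustration of daily aggregation, the sketch below rolls raw check results up into per-day uptime figures. The sample shape `{ timestamp, healthy }` and the function name are assumptions, and "uptime" is approximated here as the share of successful checks, grouped by UTC date.

```javascript
// Hypothetical sketch: groups check results by UTC day and computes uptime %.
// Each sample is assumed to look like { timestamp: <ms since epoch>, healthy: <boolean> }.
function dailyUptime(samples) {
  const days = new Map(); // "YYYY-MM-DD" -> { checks, healthy }
  for (const { timestamp, healthy } of samples) {
    const day = new Date(timestamp).toISOString().slice(0, 10);
    const totals = days.get(day) ?? { checks: 0, healthy: 0 };
    totals.checks += 1;
    if (healthy) totals.healthy += 1;
    days.set(day, totals);
  }
  return [...days].map(([day, { checks, healthy }]) => ({
    day,
    checks,
    uptimePercent: (healthy / checks) * 100,
  }));
}
```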

🔒 Security

  • Command Injection Protection - All inputs sanitized and validated
  • Whitelisted Commands - Only approved system commands can be executed
  • Path Traversal Prevention - Secure file operations
  • No Hardcoded Credentials - Everything configurable via environment variables

Installation

Prerequisites

  • Node.js >= 16.0.0
  • Linux with systemd (Debian, Ubuntu, RHEL, etc.)
  • Root or sudo access (for systemctl commands)

Install via npm

# Install globally
npm install -g @timemacro/service-guardian

# Or with sudo if needed
sudo npm install -g @timemacro/service-guardian

Install from source

# Clone the repository
git clone https://github.com/derricksiawor/service-guardian.git
cd service-guardian
npm install
npm link

Quick Start

1. Install and Check Version

# Install globally
npm install -g @timemacro/service-guardian

# Verify installation
sg --version
sg --help                     # See all available commands

2. Configure Email Alerts (Optional but Recommended)

sg config email               # Interactive email setup

You'll be prompted for SMTP settings:

  • SMTP Host (e.g., smtp.gmail.com)
  • SMTP Port (e.g., 587)
  • Username
  • Password
  • From address
  • To address

3. Add Services to Monitor

# Add a service (auto-restart and alerts are enabled by default)
sg add mysql

# Add multiple services
sg add nginx
sg add postgresql
sg add redis

# Add with custom settings
sg add apache2 --max-restarts 10

# List all monitored services
sg list

4. Monitor Your Services

# The daemon auto-starts when you add services
sg status                     # Check daemon and all services status

# View logs
sg logs                       # Recent logs
sg logs --follow              # Live logs (like tail -f)
sg logs --tail 100            # Last 100 lines

# Manual operations
sg check mysql                # Check specific service
sg restart                    # Restart the daemon
sg test                       # Test all services

Usage

Command Reference

Service Guardian can be invoked using either service-guardian or sg (shorthand). We recommend using sg for convenience.

Quick Information Commands

# Get started quickly
sg                            # Show help and available commands
sg --help                     # Show detailed help
sg --version                  # Show version

# View current state
sg status                     # Show daemon status and all monitored services
sg list                       # List all monitored services
sg info                       # Show system information and configuration

Core Commands

# Daemon Control (auto-starts if not running)
sg start                      # Start monitoring daemon (auto-starts on first command)
sg stop                       # Stop monitoring daemon
sg restart                    # Restart daemon
sg status                     # Show daemon and services status

# Service Management
sg add <service> [options]    # Add service to monitoring
sg remove <service>           # Remove service from monitoring
sg list                       # List all monitored services
sg enable <service>           # Enable monitoring for service
sg disable <service>          # Disable monitoring for service

# Monitoring & Logs
sg logs                       # View recent daemon logs
sg logs --follow              # View logs in real-time (like tail -f)
sg logs --tail 50             # View last 50 log lines
sg check <service>            # Manually check service status
sg test                       # Run a test check on all monitored services

Advanced Features

# Health Checks
sg health add <service> [options]     # Add health check
sg health list                         # List all health checks
sg health remove <service>             # Remove health check
sg health test <service>               # Test health check

# Dependencies
sg deps add <service> <deps...>       # Add service dependencies
sg deps remove <service> <deps...>    # Remove dependencies
sg deps list [service]                # List dependencies
sg deps check                          # Check for circular dependencies

# Maintenance Windows
sg maintenance add [options]          # Schedule maintenance
sg maintenance list                    # List maintenance windows
sg maintenance remove <name>           # Remove maintenance window

# Groups & Tags
sg group create <name>                 # Create service group
sg group add <group> <services...>    # Add services to group
sg group list                          # List all groups
sg tag add <service> <tags...>        # Add tags to service
sg tag list [service]                  # List tags

# Metrics & Reports
sg metrics [service] [options]         # View service metrics
sg report [options]                    # Generate health report

# Configuration
sg config email                        # Configure email settings
sg config show                         # Show configuration
sg config set <key> <value>           # Set config value
sg export [file]                       # Export configuration
sg import <file>                       # Import configuration

Configuration Options

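Configuration is stored in /etc/service-guardian/config.json (or ~/.service-guardian/config.json for non-root users). The // comments in the example below are annotations only: standard JSON does not allow comments, so leave them out of the real file.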

{
  // Monitoring
  "CHECK_INTERVAL": 30,              // Seconds between checks
  "HEALTH_CHECK_INTERVAL": 60,       // Seconds between health checks
  
  // Restart Settings
  "MAX_RESTARTS": 5,                 // Max restart attempts
  "RESTART_DELAY": 10,               // Initial delay (seconds)
  "RESTART_BACKOFF_MULTIPLIER": 2,   // Exponential backoff
  "MAX_RESTART_DELAY": 300,          // Max delay (seconds)
  
  // Alerts
  "ALERT_COOLDOWN": 600,             // Seconds before re-alerting on the same issue
  "ALERT_BATCH_INTERVAL": 60,        // Batch window (seconds)
  "MAX_ALERTS_PER_HOUR": 10,         // Rate limiting
  
  // Email Settings (set via sg config email)
  "SMTP_HOST": "smtp.gmail.com",
  "SMTP_PORT": 587,
  "SMTP_USER": "alerts@example.com",
  "SMTP_PASS": "your-app-password",
  "EMAIL_FROM": "alerts@example.com",
  "EMAIL_TO": "admin@example.com"
}
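The restart settings above describe exponential backoff. Assuming the delay starts at `RESTART_DELAY`, is multiplied by `RESTART_BACKOFF_MULTIPLIER` after each attempt, and is capped at `MAX_RESTART_DELAY`, the resulting schedule can be sketched like this (the function name is hypothetical, and Service Guardian may count or reset attempts differently):

```javascript
// Hypothetical helper: the wait (in seconds) before each restart attempt,
// using the same setting names as config.json.
function restartDelays({
  RESTART_DELAY = 10,
  RESTART_BACKOFF_MULTIPLIER = 2,
  MAX_RESTART_DELAY = 300,
  MAX_RESTARTS = 5,
} = {}) {
  const delays = [];
  for (let attempt = 0; attempt < MAX_RESTARTS; attempt++) {
    // Grows geometrically (10, 20, 40, ...) but never exceeds the cap
    const delay = RESTART_DELAY * RESTART_BACKOFF_MULTIPLIER ** attempt;
    delays.push(Math.min(delay, MAX_RESTART_DELAY));
  }
  return delays;
}
```

With the defaults this gives waits of 10, 20, 40, 80 and 160 seconds before giving up after five attempts; the 300-second cap only matters if `MAX_RESTARTS` is raised above 5.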

How It Works

1. Service Monitoring Flow

┌─────────────────┐
│ Cron Scheduler  │ Every 30 seconds
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ Check Services  │ Parallel checks
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ Analyze Status  │ Is service healthy?
└────────┬────────┘
         │
         ├── Healthy ──▶ wait for next check
         │ Not healthy
         ▼
┌─────────────────┐
│ Failure Analysis│ Why did it fail?
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ Recovery Actions│ Try to fix
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ Auto-Restart?   │ If enabled
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ Send Alert?     │ If enabled & not in cooldown
└─────────────────┘

2. Failure Detection

Service Guardian performs intelligent failure analysis:

// Not just "is process running?"
if (!service.isActive) {
  // Analyze WHY it's not running
  const analysis = await analyzeFailure(service);
  
  if (analysis.type === 'MANUAL_STOP') {
    // User stopped it, don't restart
    return;
  }
  
  if (analysis.type === 'OOM_KILL') {
    // Killed by the OOM killer: check memory before restarting
    if (memory.usage > 90) { // memory.usage is a percentage
      // Clean up memory first
      await clearSystemCache();
    }
  }
  
  // Smart restart with backoff
  await attemptRestart(service);
}
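One way to tell these cases apart is to read the unit's state from systemd. The sketch below assumes you already have the output of `systemctl show <unit> --property=ActiveState,Result` and classifies it; it illustrates the idea and is not Service Guardian's actual `analyzeFailure`. It only distinguishes OOM kills, crashes and clean stops: a clean stop usually, but not always, means someone ran `systemctl stop`, and dependency failures need extra information such as the state of the units a service depends on.

```javascript
// Illustrative classifier, not Service Guardian's implementation.
// Parses KEY=VALUE lines as printed by `systemctl show`.
function parseShowOutput(text) {
  const props = {};
  for (const line of text.trim().split("\n")) {
    const eq = line.indexOf("=");
    if (eq > 0) props[line.slice(0, eq)] = line.slice(eq + 1);
  }
  return props;
}

function classifyFailure(text) {
  const { ActiveState, Result } = parseShowOutput(text);
  if (ActiveState === "active") return "RUNNING";
  if (Result === "oom-kill") return "OOM_KILL"; // killed by the kernel OOM killer
  if (ActiveState === "inactive" && Result === "success") {
    return "MANUAL_STOP"; // clean stop, typically `systemctl stop`
  }
  return "CRASH"; // e.g. Result=exit-code, signal, core-dump, timeout
}
```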

3. Health Checks

Beyond process monitoring, health checks verify services actually work:

// TCP Health Check Example
const mysql_health = {
  type: 'tcp',
  host: 'localhost',
  port: 3306,
  timeout: 10,
  interval: 60
};

// Results in user-friendly messages:
// ✅ "mysql is responding on localhost:3306"
// ❌ "mysql is not accepting connections on localhost:3306. 
//     The service may be down or not listening on this port.
//     Suggestion: Verify mysql is running with: systemctl status mysql"
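A TCP check like the one configured above can be built on Node's built-in `net` module. This is a sketch that assumes `timeout` is in seconds; `tcpCheck` and its `{ ok, message }` result are hypothetical names chosen for illustration.

```javascript
const net = require("node:net");

// Hypothetical TCP health check: resolves { ok, message } and never rejects.
// `timeout` is assumed to be in seconds.
function tcpCheck({ host, port, timeout = 10 }) {
  return new Promise((resolve) => {
    const socket = net.createConnection({ host, port });
    const finish = (ok, message) => {
      socket.destroy();         // always release the socket
      resolve({ ok, message }); // extra calls are harmless: a promise settles once
    };
    socket.setTimeout(timeout * 1000); // fires if the connection attempt stalls
    socket.once("connect", () => finish(true, `responding on ${host}:${port}`));
    socket.once("timeout", () => finish(false, `no response within ${timeout}s on ${host}:${port}`));
    socket.once("error", (err) =>
      finish(false, `not accepting connections on ${host}:${port} (${err.code})`));
  });
}
```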

4. Alert Aggregation

Intelligent batching reduces email spam:

// Instead of one email per failure:
// "nginx failed"
// "mysql failed"
// "redis failed"
// ...

// You get 1 comprehensive email:
// "3 services need attention:
//  - nginx: Connection refused on port 80
//  - mysql: OOM killed (memory: 95%)
//  - redis: Dependency postgres is down"
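A minimal sketch of batching combined with the cooldown and hourly rate limit from the configuration (`ALERT_COOLDOWN`, `MAX_ALERTS_PER_HOUR`). The class and method names are hypothetical; `flush()` is assumed to be called once per `ALERT_BATCH_INTERVAL`, and timestamps are passed in explicitly so the logic is easy to test.

```javascript
// Hypothetical alert aggregator: queues failures and emits one summary per batch.
class AlertAggregator {
  constructor({ cooldownSec = 600, maxAlertsPerHour = 10 } = {}) {
    this.cooldownMs = cooldownSec * 1000;
    this.maxAlertsPerHour = maxAlertsPerHour;
    this.pending = new Map();     // service -> latest message in the current batch
    this.lastAlerted = new Map(); // service -> time it was last included in an email
    this.sentTimes = [];          // send times of recent emails (for the hourly limit)
  }

  // Queue an alert; returns false if the service is still in its cooldown period.
  add(service, message, now = Date.now()) {
    const last = this.lastAlerted.get(service);
    if (last !== undefined && now - last < this.cooldownMs) return false;
    this.pending.set(service, message);
    return true;
  }

  // Build one email body for everything queued, or null if nothing can be sent yet.
  flush(now = Date.now()) {
    if (this.pending.size === 0) return null;
    this.sentTimes = this.sentTimes.filter((t) => now - t < 3600000);
    if (this.sentTimes.length >= this.maxAlertsPerHour) return null; // rate limited: keep queue
    const count = this.pending.size;
    const lines = [...this.pending].map(([service, message]) => ` - ${service}: ${message}`);
    const header = `${count} ${count === 1 ? "service needs" : "services need"} attention:`;
    for (const service of this.pending.keys()) this.lastAlerted.set(service, now);
    this.pending.clear();
    this.sentTimes.push(now);
    return [header, ...lines].join("\n");
  }
}
```

Calling `flush()` on a timer turns a burst of failures into a single email, while the cooldown keeps a service that stays down from being reported in every batch.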

Real-World Examples

Example 1: MySQL OOM Protection

# Add MySQL with OOM recovery (auto-restart and alerts enabled by default)
sg add mysql --max-restarts 5

# Add health check to verify it's accepting connections
sg health add mysql --type tcp --port 3306

# Add recovery action to clear cache when memory is high
sg recovery add mysql --type clear-cache --threshold 90

When MySQL gets OOM-killed:

  1. Service Guardian detects the OOM kill (not just "service down")
  2. Checks system memory usage
  3. If memory > 90%, clears system cache first
  4. Restarts MySQL with exponential backoff
  5. Verifies it's accepting connections
  6. Sends detailed alert with memory stats and suggestions
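Step 2 above depends on a memory reading. On Linux one common source is `/proc/meminfo`, treating `MemAvailable` as the memory still usable by new processes. The sketch below is an assumption about how such a check could work, not Service Guardian's code; the helper names are hypothetical and the default of 90 mirrors `--threshold 90` in the example.

```javascript
const fs = require("node:fs");

// Parse "/proc/meminfo"-style text into { Key: valueInKiB }.
function parseMeminfo(text) {
  const values = {};
  for (const line of text.split("\n")) {
    const match = /^(\w+):\s+(\d+) kB$/.exec(line.trim());
    if (match) values[match[1]] = Number(match[2]);
  }
  return values;
}

// Percentage of memory in use, from MemTotal and MemAvailable (Linux 3.14+).
function memoryUsagePercent(text = fs.readFileSync("/proc/meminfo", "utf8")) {
  const { MemTotal, MemAvailable } = parseMeminfo(text);
  return ((MemTotal - MemAvailable) / MemTotal) * 100;
}

// Hypothetical threshold check matching `--threshold 90` above.
const shouldClearCache = (usage, threshold = 90) => usage > threshold;
```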

Example 2: Dependent Services

# Setup WordPress stack with dependencies
sg add nginx
sg add php-fpm
sg add mysql

# Define dependencies
sg deps add nginx php-fpm
sg deps add php-fpm mysql

# If MySQL fails, Service Guardian will:
# 1. Restart MySQL first
# 2. Then restart php-fpm (depends on MySQL)
# 3. Then restart nginx (depends on php-fpm)
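The restart order described above is a topological ordering of the dependency graph, and `sg deps check` looks for cycles in that same graph. The sketch below shows both ideas using a plain object that maps each service to the services it depends on; the data shape and the function names are assumptions, not Service Guardian's internals.

```javascript
// Returns one cycle as a list of service names (e.g. ["a", "b", "a"]), or null.
function findCycle(deps) {
  const state = new Map(); // service -> "visiting" | "done"
  const path = [];
  const visit = (service) => {
    if (state.get(service) === "done") return null;
    if (state.get(service) === "visiting") {
      return [...path.slice(path.indexOf(service)), service];
    }
    state.set(service, "visiting");
    path.push(service);
    for (const dep of deps[service] ?? []) {
      const cycle = visit(dep);
      if (cycle) return cycle;
    }
    path.pop();
    state.set(service, "done");
    return null;
  };
  for (const service of Object.keys(deps)) {
    const cycle = visit(service);
    if (cycle) return cycle;
  }
  return null;
}

// Restart order after `failed` goes down: the failed service first, then every
// service that depends on it (directly or indirectly), each after its own
// dependencies. Assumes findCycle(deps) === null.
function restartOrder(deps, failed) {
  // Breadth-first search for everything that transitively depends on `failed`
  const affected = new Set([failed]);
  const queue = [failed];
  while (queue.length > 0) {
    const current = queue.shift();
    for (const [service, needs] of Object.entries(deps)) {
      if (needs.includes(current) && !affected.has(service)) {
        affected.add(service);
        queue.push(service);
      }
    }
  }
  // Depth-first post-order: a service is placed only after its affected dependencies
  const order = [];
  const placed = new Set();
  const place = (service) => {
    if (placed.has(service)) return;
    placed.add(service);
    for (const dep of deps[service] ?? []) if (affected.has(dep)) place(dep);
    order.push(service);
  };
  for (const service of affected) place(service);
  return order;
}

// Example: the WordPress stack above
const wordpressStack = { nginx: ["php-fpm"], "php-fpm": ["mysql"], mysql: [] };
console.log(restartOrder(wordpressStack, "mysql")); // [ 'mysql', 'php-fpm', 'nginx' ]
```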

Example 3: Maintenance Windows

# Schedule maintenance window for updates
sg maintenance add "Weekly Updates" \
  --days sunday \
  --start 02:00 \
  --duration 2 \
  --services nginx,mysql,redis

# During maintenance:
# - No auto-restarts
# - No alerts
# - Services can be safely updated
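Deciding whether a service is inside a maintenance window is a small date calculation. This sketch assumes `--duration` is in hours and that times are in the server's local time zone; `isInMaintenance` and the shape of the schedule object are hypothetical. It also checks a window that started the previous day, so windows that cross midnight are handled.

```javascript
const DAYS = ["sunday", "monday", "tuesday", "wednesday", "thursday", "friday", "saturday"];

// Hypothetical shape: schedule = { days: [...], start: "HH:MM", durationHours, services: [...] }
function isInMaintenance(schedule, service, date = new Date()) {
  if (!schedule.services.includes(service)) return false;
  const [hours, minutes] = schedule.start.split(":").map(Number);
  const durationMs = schedule.durationHours * 3600000;
  // Check a window starting today and one that started yesterday (may cross midnight)
  for (const dayOffset of [0, -1]) {
    const start = new Date(date);
    start.setDate(start.getDate() + dayOffset);
    start.setHours(hours, minutes, 0, 0);
    if (!schedule.days.includes(DAYS[start.getDay()])) continue;
    const elapsed = date.getTime() - start.getTime();
    if (elapsed >= 0 && elapsed < durationMs) return true;
  }
  return false;
}
```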

Example 4: Custom Health Checks

# Create custom health check script
cat > /etc/service-guardian/health-checks/api-check.sh << 'EOF'
#!/bin/bash
RESPONSE=$(curl -s -o /dev/null -w "%{http_code}" http://localhost/api/health)
if [ "$RESPONSE" = "200" ]; then
  echo "API is healthy"
  exit 0
else
  echo "API returned status code: $RESPONSE"
  exit 1
fi
EOF

chmod +x /etc/service-guardian/health-checks/api-check.sh

# Add the health check
sg health add api --type script --script api-check.sh
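The script above follows the usual convention: exit code 0 means healthy, anything else means unhealthy, and the printed output explains why. A runner for such scripts can be sketched with Node's `child_process.execFile`; the function name, the `{ ok, message }` result, and the 10-second default timeout are assumptions for illustration.

```javascript
const { execFile } = require("node:child_process");

// Hypothetical runner for custom health-check scripts.
// Resolves { ok, message }; a non-zero exit or a timeout counts as unhealthy.
function runScriptCheck(file, args = [], timeoutSec = 10) {
  return new Promise((resolve) => {
    execFile(file, args, { timeout: timeoutSec * 1000 }, (error, stdout, stderr) => {
      const output = (stdout || stderr).trim();
      if (!error) {
        resolve({ ok: true, message: output });
      } else if (error.killed) {
        resolve({ ok: false, message: `timed out after ${timeoutSec}s` });
      } else {
        // error.code is the exit status, or e.g. "ENOENT" if the script is missing
        resolve({ ok: false, message: output || `failed (${error.code})` });
      }
    });
  });
}
```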

Architecture

Security Features

  1. Input Validation - All inputs validated with JSON schemas
  2. Command Whitelisting - Only approved system commands
  3. Shell Escape - Prevents command injection
  4. Path Validation - Prevents directory traversal
  5. Secure Execution - Isolated command execution
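Command whitelisting and shell escaping can be combined by never going through a shell at all. In this sketch (the allowed-action list, the regular expression, and the function names are assumptions, not Service Guardian's code), service names are checked against the characters systemd allows in unit names, and `execFile` passes arguments straight to `systemctl`, so shell metacharacters are never interpreted.

```javascript
const { execFile } = require("node:child_process");

// Hypothetical allow-list of systemctl sub-commands the daemon may run.
const ALLOWED_ACTIONS = new Set(["is-active", "status", "show", "start", "stop", "restart"]);

// Characters systemd permits in unit names (plus "@" for template instances),
// at most 255 characters, and no leading "-" so a name can't be read as an option.
const UNIT_NAME = /^[A-Za-z0-9_.:@\\][A-Za-z0-9_.:@\\-]{0,254}$/;

function validateUnitName(name) {
  if (typeof name !== "string" || !UNIT_NAME.test(name)) {
    throw new Error(`Invalid service name: ${JSON.stringify(name)}`);
  }
  return name;
}

// Runs `systemctl <action> <unit>` without a shell. Rejects on a non-zero exit,
// which includes `is-active` reporting a stopped unit.
function systemctl(action, unit) {
  if (!ALLOWED_ACTIONS.has(action)) throw new Error(`Action not allowed: ${action}`);
  validateUnitName(unit);
  return new Promise((resolve, reject) => {
    execFile("systemctl", [action, unit], (error, stdout) => (error ? reject(error) : resolve(stdout)));
  });
}
```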

Performance

  • Parallel Monitoring - Check multiple services simultaneously
  • Efficient Resource Usage - Minimal CPU and memory footprint
  • Optimized Queries - Batch operations where possible
  • Caching - Reduces repeated system calls

Reliability

  • Crash Recovery - Daemon automatically recovers from crashes
  • Data Persistence - Configuration and metrics survive restarts
  • Atomic Operations - Prevents partial updates
  • Graceful Shutdown - Cleanly stops all operations

Troubleshooting

Service Guardian won't start

# Check if already running
sg status

# Check logs for errors
sg logs --tail 50

# Verify Node.js version
node --version  # Should be >= 16.0.0

# Check permissions
ls -la /etc/service-guardian/

Services not being monitored

# Verify service is added
sg list

# Check if service exists
systemctl status <service-name>

# Test monitoring manually
sg check <service-name>

# Check dependencies
sg deps check

Not receiving alerts

# Test email configuration
sg config email --test

# Check alert settings
sg config show | grep ALERT

# View recent alerts
sg logs | grep "Alert sent"

# Check cooldown status
sg status --verbose

High memory usage

# Check metrics history
sg metrics --days 7

# Clear old metrics
sg metrics --cleanup

# Reduce check frequency
sg config set CHECK_INTERVAL 60

Development

Running Tests

npm test                 # Run all tests
npm run test:watch      # Watch mode
npm run test:coverage   # Coverage report

Contributing

For contributions, please see CONTRIBUTING.md or open an issue at https://github.com/derricksiawor/service-guardian/issues

License

This project is licensed under the MIT License - see the LICENSE file for details.

Copyright (c) 2025 Derrick S. K. Siawor

Author

Derrick S. K. Siawor
Website: https://derricksiawor.com

Acknowledgments

Built with enterprise-grade libraries:

  • Commander.js - CLI interface
  • Nodemailer - Email alerts
  • node-cron - Scheduling
  • Winston - Logging
  • Chalk - Terminal styling

Stop losing sleep over crashed services. Let Service Guardian keep watch.