@orkify/cli
v1.0.0-beta.12
Modern JS process orchestration and deployment for your own infrastructure.
Table of Contents
- Features
- Installation
- Quick Start
- Commands
- Options for up and run
- Cluster Mode
- Zero-Downtime Reload
- Worker Readiness
- Graceful Shutdown
- Environment Variables
- Worker IPC (Broadcasting)
- Shared Cluster Cache
- Next.js
- Socket.IO / WebSocket Support
- Log Rotation
- Environment Files
- Snapshot File
- Boot Persistence
- Container Mode
- Deployment
- SaaS Platform
- Source Map Support
- Cron Scheduler
- MCP Integration
- Architecture
- Requirements
- License
Features
- Cluster Mode - Run multiple workers sharing the same port using Node's cluster module
- Cross-Platform Load Balancing - True round-robin distribution across all workers on Linux, macOS, and Windows
- Zero-Downtime Reload - Rolling restarts that replace workers one-by-one with no dropped requests
- WebSocket Sticky Sessions - Built-in session affinity for Socket.IO and WebSocket connections
- Process Persistence - Save running processes and restore them after reboot
- Auto-Restart - Automatically restart crashed processes with configurable limits
- File Watching - Reload on file changes during development
- Log Rotation - Automatic log rotation with gzip compression and configurable retention
- Deployment - Local and remote deploy with automatic rollback
- Cron Scheduler - Built-in cron that dispatches HTTP requests to managed processes on a schedule
- Native TypeScript - Run .ts files directly with no build step (Node.js 22.18+)
- Shared Cluster Cache - Built-in in-memory cache with zero-config cross-worker sync via IPC
- Next.js - Auto-detection, 'use cache' and ISR cache handlers, Server Actions encryption key, security header stripping (CVE-2025-29927), version skew protection, ISR request coalescing
- Modern Stack - Pure ESM, TypeScript, Node.js 22.18+
- MCP Integration - Built-in Model Context Protocol server for AI tool integration
- orkify.com - Optional SaaS dashboard with deploy management, real-time metrics, log streaming, and remote control
Installation
npm install -g @orkify/cli
Or run directly with npx:
npx @orkify/cli up app.js
Quick Start
# Start a single process (daemon mode)
orkify up app.js
# TypeScript works out of the box — no build step
orkify up app.ts
# Start with one worker per CPU core
orkify up app.js -w 0
# Start with 4 clustered workers
orkify up app.js -w 4
# Start with a custom name
orkify up app.js -n my-api -w 4
# Enable file watching for development
orkify up app.js --watch
# Enable sticky sessions for Socket.IO
orkify up server.js -w 4 --sticky --port 3000
# Run in foreground (for containers like Docker/Kubernetes)
orkify run app.js -w 4
Commands
| Command | Description |
| -------------------------------- | ---------------------------------------------------------------- |
| orkify up <script> | Start a process (daemon mode) |
| orkify down <name\|id\|all> | Stop process(es) |
| orkify run <script> | Run in foreground (for containers) |
| orkify restart <name\|id\|all> | Hard restart (stop + start) |
| orkify reload <name\|id\|all> | Zero-downtime rolling reload |
| orkify list | List all processes with status |
| orkify list -f, --follow | Live monitoring — auto-refreshing process table (Ctrl+C to stop) |
| orkify list -v, --verbose | Verbose list (includes PIDs) |
| orkify list --all-users | List processes from all users (requires sudo) |
| orkify logs [name] | View logs (-f to follow, -n lines, --err/--out) |
| orkify delete <name\|id\|all> | Stop and remove from process list |
| orkify flush [name\|id\|all] | Truncate logs and remove rotated archives |
| orkify snap [file] [--no-env] | Snapshot current process list |
| orkify restore [file] | Restore previously saved processes (--no-remote) |
| orkify kill [--force] | Stop the daemon (--force skips graceful shutdown) |
| orkify daemon-reload | Reload daemon code (snap → kill → restore) |
| orkify autostart | Set up boot persistence |
| orkify deploy pack [dir] | Create a deploy tarball |
| orkify deploy local <tarball> | Deploy from a local tarball |
| orkify deploy upload [dir] | Upload a build artifact for deployment |
| orkify mcp | Start MCP server for AI tools (stdio) |
| orkify mcp --simple-http | Start MCP HTTP server (runs inside daemon) |
| orkify mcp stop | Stop the MCP HTTP server |
| orkify mcp status | Show MCP HTTP server status |
| orkify mcp keygen | Generate a new MCP API key |
Options for up and run
-n, --name <name> Process name
-w, --workers <number> Number of workers (0 = CPU cores, -1 = CPUs-1)
--watch Watch for file changes and reload (up only)
--watch-paths <paths...> Specific paths to watch (up only)
--cwd <path> Working directory
--node-args="<args>" Arguments passed to Node.js (quoted)
--args="<args>" Arguments passed to your script (quoted)
--kill-timeout <ms> Graceful shutdown timeout (default: 5000)
--max-restarts <count> Max restart attempts (default: 10)
--min-uptime <ms> Min uptime before restart counts (default: 1000)
--restart-delay <ms> Delay between restarts (default: 100)
--sticky Enable sticky sessions for WebSocket/Socket.IO
--port <port> Port for sticky routing (defaults to PORT env)
--reload-retries <count> Retries per worker slot during reload (0-3, default: 3)
--health-check <path> Health check endpoint (e.g. /health, requires --port)
--log-max-size <size> Max log file size before rotation (default: 100M)
--log-max-files <count> Rotated log files to keep (default: 90, 0 = no rotation)
--log-max-age <days> Delete rotated logs older than N days (default: 90, 0 = no limit)
--cron <spec>             Cron job (repeatable) — see Cron Scheduler
--restart-on-mem <size>   Restart when a worker's RSS exceeds threshold (e.g. 512M, 1G)
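Size flags such as --log-max-size and --restart-on-mem accept K/M/G suffixes. A minimal sketch of how such a value could map to bytes — parseSize is an illustrative helper, not part of the orkify CLI; binary multiples are assumed, matching the 104857600-byte default listed for 100M in the snapshot field table:

```javascript
// Illustrative size-suffix parser (assumption: K/M/G are binary
// multiples; a bare number is already bytes).
function parseSize(input) {
  const match = /^(\d+)([KMG])?$/i.exec(input.trim());
  if (!match) throw new Error(`Invalid size: ${input}`);
  const multipliers = { K: 1024, M: 1024 ** 2, G: 1024 ** 3 };
  const [, digits, suffix] = match;
  return Number(digits) * (suffix ? multipliers[suffix.toUpperCase()] : 1);
}
```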
Cluster Mode
When you specify -w <workers> with more than 1 worker, ORKIFY runs your app in cluster mode:
orkify up server.js -w 4
This spawns a primary process that manages 4 worker processes. All workers share the same port - Node's cluster module handles the load balancing automatically.
┌──────┬──────────┬─────────┬───┬───┬────────┬──────┬──────────┬────────┐
│ id │ name │ mode │ ↺ │ ✘ │ status │ cpu │ mem │ uptime │
├──────┼──────────┼─────────┼───┼───┼────────┼──────┼──────────┼────────┤
│ 0 │ server │ cluster │ 0 │ 0 │ online │ 0.0% │ 192.1 MB │ - │
│ │ ├──────────┼─────────┼───┼───┼────────┼──────┼──────────┼────────┤
│ ├─ 0 │ worker 0 │ │ 0 │ 0 │ online │ 0.0% │ 48.2 MB │ 5m │
│ │ ├──────────┼─────────┼───┼───┼────────┼──────┼──────────┼────────┤
│ ├─ 1 │ worker 1 │ │ 0 │ 0 │ online │ 0.0% │ 47.9 MB │ 5m │
│ │ ├──────────┼─────────┼───┼───┼────────┼──────┼──────────┼────────┤
│ ├─ 2 │ worker 2 │ │ 0 │ 0 │ online │ 0.0% │ 48.1 MB │ 5m │
│ │ ├──────────┼─────────┼───┼───┼────────┼──────┼──────────┼────────┤
│ └─ 3 │ worker 3 │ │ 0 │ 0 │ online │ 0.0% │ 48.0 MB │ 5m │
└──────┴──────────┴─────────┴───┴───┴────────┴──────┴──────────┴────────┘
Zero-Downtime Reload
The reload command performs a rolling restart:
- Spawn a new worker
- Wait for it to signal ready
- Gracefully stop the old worker
- Repeat for each worker
orkify reload my-api
During reload, there's always at least one worker handling requests - no downtime.
Reload Failure Handling
Each worker slot gets up to N retries during reload (default 3, max 3, configurable with --reload-retries):
# Disable retries (immediate failure on first timeout)
orkify up app.js -w 4 --reload-retries 0
# Use 1 retry per slot
orkify up app.js -w 4 --reload-retries 1
If a new worker fails to become ready after all retries:
- The old worker is kept alive (no process loss)
- The worker is marked as stale — shown as online (stale) in orkify list
- Remaining worker slots are aborted to prevent cascading failures
Fix the issue and reload again — a successful reload clears all stale flags.
Memory Threshold Restart
Automatically restart workers when their RSS memory exceeds a threshold — a safety net for memory-leaking apps:
orkify up server.js -w 4 --restart-on-mem 512M
How it works:
- Checked every 1 second (piggybacks on the existing stats collection interval)
- Per-worker: each worker is checked individually against the threshold, not the aggregate cluster total
- Cluster mode: zero-downtime — a replacement worker is spawned and must become ready before the old one is stopped
- Fork mode: the process is stopped then restarted (brief downtime is unavoidable with a single process)
- 30-second cooldown per worker after each memory restart to let the new process stabilize
- Counts as a restart (visible in orkify list) but not a crash — does not count toward --max-restarts
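The per-worker check above can be sketched as a small predicate — shouldRestart is a hypothetical helper, assuming the 30-second cooldown and individual RSS comparison just described:

```javascript
// Illustrative sketch: restart a worker only when its own RSS exceeds
// the threshold AND it is past the post-restart cooldown.
function shouldRestart(worker, thresholdBytes, now = Date.now()) {
  const COOLDOWN_MS = 30_000; // let the replacement process stabilize
  if (now - (worker.lastMemRestart ?? 0) < COOLDOWN_MS) return false;
  return worker.rss > thresholdBytes;
}
```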
Worker Readiness
orkify auto-detects when your app starts listening on a port — no extra code needed. If your app calls server.listen(), workers are automatically marked as online. This works in both fork mode and cluster mode.
For background workers or queue consumers that don't bind a port, signal ready manually:
// Only needed for apps that don't call server.listen()
if (process.send) {
  process.send('ready');
}
Both signals are equivalent — whichever arrives first marks the worker as online. If neither arrives within 30 seconds, the worker is marked as errored.
Health Check Readiness
When --health-check is set (e.g. --health-check /health), orkify performs an HTTP readiness check after a worker signals ready but before declaring it online:
orkify up server.js -w 4 --port 3000 --health-check /health
The flow:
- Worker signals ready (listening event or process.send('ready'))
- orkify hits http://localhost:{port}{healthCheck} — retries up to 3 times with 1s delay
- If 2xx response → worker is declared online (old worker can be stopped during reload)
- If all retries fail → worker is treated as failed
This applies to all reloads, not just deploys. If --health-check is set but --port is not, the health check is skipped.
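The readiness flow can be sketched as a small probe. probe is a hypothetical helper, assuming the behavior described above (retries with a fixed delay, any 2xx response means ready):

```javascript
import http from 'node:http';

// Illustrative readiness probe: try the health endpoint, retrying on
// failure, and report whether the worker ever answered with 2xx.
async function probe(port, path, retries = 3, delayMs = 1000) {
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      const res = await fetch(`http://localhost:${port}${path}`);
      if (res.ok) return true; // 2xx → worker is ready
    } catch {
      // connection refused — worker not listening yet
    }
    if (attempt < retries) await new Promise((r) => setTimeout(r, delayMs));
  }
  return false;
}
```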
Graceful Shutdown
Handle SIGTERM to gracefully drain connections:
process.on('SIGTERM', () => {
  server.close(() => {
    process.exit(0);
  });
});
Environment Variables
orkify sets these environment variables on every managed process:
| Variable | Description |
| --------------------- | ------------------------------------------------------------ |
| ORKIFY_PROCESS_ID | Process ID in orkify |
| ORKIFY_PROCESS_NAME | Process name (from -n flag) |
| ORKIFY_WORKER_ID | Worker index: 0 in fork mode, 0 to N-1 in cluster mode |
| ORKIFY_CLUSTER_MODE | "true" in cluster mode, unset in fork mode |
| ORKIFY_WORKERS | Total number of workers |
| ORKIFY_EXEC_MODE | "fork" or "cluster" |
Detecting the Primary Worker
Worker IDs are stable — ORKIFY_WORKER_ID=0 survives crashes, restarts, and zero-downtime reloads. Use it to elect a primary worker for singletons (database connections, WebSocket clients, cron-like tasks):
const isPrimary = !process.env.ORKIFY_WORKER_ID || process.env.ORKIFY_WORKER_ID === '0';
if (isPrimary) {
  // Only worker 0 connects to Discord, runs scheduled jobs, etc.
  startSingletonService();
}
Worker IPC (Broadcasting)
In cluster mode, workers can send messages to all other workers via the primary process. Send a message with type: 'broadcast' and orkify relays it to every sibling:
// Worker 1: send a cache-invalidation signal
process.send?.({
  __orkify: true,
  type: 'broadcast',
  channel: 'cache:invalidate',
  data: { key: 'users:123' },
});

// All other workers receive it:
process.on('message', (msg) => {
  if (msg?.__orkify && msg.type === 'broadcast' && msg.channel === 'cache:invalidate') {
    cache.delete(msg.data.key);
  }
});
Messages must have __orkify: true and type: 'broadcast'. The channel and data fields are yours to define. The sending worker does not receive its own broadcast — only siblings do.
Request/response pattern: To route a request to a specific worker (e.g., worker 0 for singletons), broadcast the request. Worker 0 picks it up via isPrimary, processes it, and broadcasts the response. Other workers ignore both messages since they don't match any pending request ID.
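A minimal sketch of that request/response pattern on the requester side. The rpc:* channel names, the request/handleMessage helpers, and the pending-request map are all illustrative — only the __orkify/type: 'broadcast' envelope comes from the docs above:

```javascript
import { randomUUID } from 'node:crypto';

// Pending requests keyed by a unique ID, resolved when a matching
// response broadcast arrives.
const pending = new Map();

function request(channel, data) {
  const id = randomUUID();
  const msg = { __orkify: true, type: 'broadcast', channel, data: { id, ...data } };
  const promise = new Promise((resolve) => pending.set(id, resolve));
  // In a real worker: process.send?.(msg)
  return { msg, promise };
}

// Attach via process.on('message', handleMessage) in a real worker.
function handleMessage(msg) {
  if (!msg?.__orkify || msg.type !== 'broadcast' || msg.channel !== 'rpc:response') return;
  const resolve = pending.get(msg.data.id);
  if (!resolve) return; // not our request — other workers ignore it the same way
  pending.delete(msg.data.id);
  resolve(msg.data.result);
}
```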
Shared Cluster Cache
orkify ships a built-in shared cache that works across cluster workers with zero external dependencies. Reads are synchronous Map lookups — faster than localhost Redis with no network round trip, no serialization, and no async overhead.
import { cache } from '@orkify/cache';
cache.set('user:123', userData, { ttl: 300 }); // write + broadcast
cache.set('key', value, { ttl: 300, tags: ['group'] }); // with tags
cache.get<User>('user:123'); // sync — in-memory only
await cache.getAsync<User>('user:123'); // async — memory first, then disk
cache.invalidateTag('group'); // bulk invalidation across workers
cache.stats(); // { size, hits, misses, hitRate, totalBytes, diskSize }
- LRU eviction (entry-count and byte-based) with two-tier architecture (memory + disk)
- TTL expiration and tag-based group invalidation with timestamps
- Cluster-safe: automatic IPC synchronization, snapshots for new workers
- V8 serialization (Map, Set, Date, RegExp, Error, ArrayBuffer, TypedArray)
- File-backed persistence — evicted entries spill to disk, survive restarts
Works standalone (npm run dev), in fork mode, and in cluster mode with zero code changes. The API is identical in every mode.
Next.js
orkify auto-detects Next.js apps (via package.json or next.config) and applies production hardening automatically: encryption key management, security header stripping, and version skew protection.
For the full setup — standalone builds, static file handling, cache handlers, and error tracking — install the companion package and follow the build & deploy guide:
npm install @orkify/next
- Build & deploy — requires output: 'standalone' in your Next.js config and copying public/ + .next/static/ into the standalone directory after building.
- Cache handlers — drop-in replacements for Next.js 16 'use cache' and ISR cache, backed by @orkify/cache. Tag invalidation propagates across all cluster workers. ISR request coalescing prevents duplicate revalidations.
- Browser error tracking — captures window.onerror and unhandled rejections, normalizes cross-browser stacks to V8 format, and relays them through the regular telemetry pipeline. Zero additional API calls.
- Server Actions encryption — auto-generates a stable NEXT_SERVER_ACTIONS_ENCRYPTION_KEY consistent across workers and reloads.
- Security header stripping — blocks CVE-2025-29927 (middleware bypass) and CVE-2024-46982 (cache poisoning) by stripping dangerous headers from external requests.
- Version skew protection — auto-sets NEXT_DEPLOYMENT_ID during deploys so old/new workers coexist safely.
Full documentation | Example project
Socket.IO / WebSocket Support
For WebSocket applications, use the --sticky flag to ensure connections from the same client always route to the same worker:
orkify up socket-server.js -w 4 --sticky --port 3000
This extracts session IDs from Socket.IO handshakes and consistently routes connections to the same worker based on a hash of the session ID.
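Session-to-worker affinity of this kind can be sketched as a stable hash. workerFor is illustrative, not orkify's actual routing code — the key property is that the same session ID always yields the same worker index:

```javascript
import { createHash } from 'node:crypto';

// Illustrative sticky-routing sketch: hash the session ID and map it
// to a stable worker index in [0, workerCount).
function workerFor(sessionId, workerCount) {
  const digest = createHash('sha1').update(sessionId).digest();
  return digest.readUInt32BE(0) % workerCount;
}
```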
Log Rotation
orkify automatically rotates process logs to prevent unbounded disk growth. Logs are written to ~/.orkify/logs/ and rotated when a file exceeds the size threshold or on the first write of a new day.
How It Works
- When a log file exceeds --log-max-size (default: 100 MB) or a new calendar day starts, orkify rotates the file
- The rotated file is compressed with gzip in the background (typically ~90% compression)
- Archives older than --log-max-age days are deleted
- If the archive count still exceeds --log-max-files, the oldest are pruned
Defaults
| Setting | Default | Description |
| ----------------- | ------- | ----------------------------------------- |
| --log-max-size | 100M | Rotate when file exceeds 100 MB |
| --log-max-files | 90 | Keep up to 90 rotated archives per stream |
| --log-max-age | 90 | Delete archives older than 90 days |
With defaults, each process uses at most ~1 GB of log storage: one 100 MB active file + up to 90 compressed archives (~10 MB each at ~90% compression).
File Layout
~/.orkify/logs/
myapp.stdout.log # active (current writes)
myapp.stdout.log-20260215T091200.123.gz # rotated + compressed
myapp.stdout.log-20260216T143052.456.gz
myapp.stderr.log # active stderr
myapp.stderr.log-20260217T080000.789.gz
Configuration
# Custom rotation settings
orkify up app.js --log-max-size 50M --log-max-files 30 --log-max-age 30
# Disable rotation (logs grow unbounded)
orkify up app.js --log-max-files 0
# Size accepts K, M, G suffixes
orkify up app.js --log-max-size 500K
orkify up app.js --log-max-size 1G
Viewing Logs
# View last 100 lines (default)
orkify logs my-api
# View last 500 lines
orkify logs my-api -n 500
# Follow log output (stream new logs)
orkify logs my-api -f
# Show only stdout or stderr
orkify logs my-api --out
orkify logs my-api --err
Flushing Logs
Truncate active log files and remove all rotated archives:
# Flush logs for all processes
orkify flush
# Flush logs for a specific process
orkify flush my-api
Environment Files
ORKIFY supports loading environment variables from .env files using Node.js native --env-file flag (Node 20.6+). Pass it via --node-args:
# Daemon mode
orkify up app.js -w 4 --node-args="--env-file=.env"
# Foreground mode
orkify run app.js -w 4 --node-args="--env-file=.env"
# Multiple node args
orkify up app.js --node-args="--env-file=.env --max-old-space-size=4096"
The env file format:
# .env
DATABASE_URL=postgres://localhost:5432/mydb
API_KEY=secret-key-123
NODE_ENV=production
Environment variables are passed to both the primary process and all workers in cluster mode.
Keeping Secrets Out of State
By default orkify snap persists the full process environment (including process.env inherited values like PATH, HOME, API keys, etc.) into ~/.orkify/snapshot.yml. Use --no-env to omit environment variables from the snapshot:
# Start with env loaded from .env file
orkify up app.js -n my-api -w 4 --node-args="--env-file=.env"
# Save without baking env vars into snapshot.yml
orkify snap --no-env
# Snap to a custom file for use as a declarative config
orkify snap config/processes.yml
Processes restored via orkify restore after a --no-env snap will inherit the daemon's own environment. Combined with --node-args="--env-file=.env", secrets stay in your .env file and are never duplicated into the snapshot.
Snapshot File
orkify snap writes a YAML file to ~/.orkify/snapshot.yml by default. orkify restore reads from the same path.
# Save and restore — most common usage
orkify snap
orkify restore
# Custom file paths
orkify snap config/processes.yml
orkify restore config/processes.yml
File format
version: 1
processes:
  - name: 'api'
    script: '/app/server.js'
    cwd: '/app'
    workerCount: 4
    execMode: 'cluster'
    watch: false
    env:
      NODE_ENV: 'production'
      PORT: '3000'
    nodeArgs: []
    args: []
    killTimeout: 5000
    maxRestarts: 10
    minUptime: 1000
    restartDelay: 100
    sticky: false
mcp:
  transport: 'simple-http'
  port: 8787
  bind: '127.0.0.1'
  cors: '*'
The mcp section is only present when the MCP HTTP server is running at snapshot time. Old snapshots without mcp are loaded normally — orkify restore skips MCP startup in that case.
Restore behavior
When you run orkify restore, the behavior depends on whether an API key and deploy metadata are present:
- With ORKIFY_API_KEY + active deploy — orkify first tries to restore from the remote deploy API. If the remote call fails, it falls back to the local snapshot file automatically.
- Without API key or deploy — orkify goes straight to the local snapshot file (~/.orkify/snapshot.yml).
- --no-remote — skips the remote deploy check entirely, always uses the local snapshot.
# Restore from remote deploy (if configured), otherwise snapshot
orkify restore
# Always use local snapshot, ignore remote deploy
orkify restore --no-remote
The file is plain YAML so you can hand-edit it and use it as a declarative config. Here's what it looks like:
version: 1
processes:
  - name: 'my-api'
    script: '/app/dist/server.js'
    cwd: '/app'
    workerCount: 4
    execMode: 'cluster'
    watch: false
    env:
      NODE_ENV: 'production'
    nodeArgs:
      - '--max-old-space-size=4096'
    args: []
    killTimeout: 5000
    maxRestarts: 10
    minUptime: 1000
    restartDelay: 100
    sticky: false
    port: 3000
Required fields:
| Field | Description |
| ----------- | --------------------------------------------------------- |
| processes | Array of process configs |
| script | Path to the entry script (absolute, or relative to cwd) |
Optional fields:
| Field | Default | Description |
| --------------- | ------------------ | -------------------------------------------------------- |
| version | 1 | Schema version |
| name | basename of script | Process name |
| cwd | daemon working dir | Working directory |
| workerCount | 1 | Number of workers (1 = fork mode, >1 = cluster) |
| execMode | from workerCount | "fork" or "cluster" |
| watch | false | Watch for file changes |
| watchPaths | — | Specific paths to watch |
| env | — | Environment variables |
| nodeArgs | — | Node.js CLI flags (e.g. ["--inspect"]) |
| args | — | Script arguments |
| killTimeout | 5000 | Graceful shutdown timeout in ms |
| maxRestarts | 10 | Max auto-restart attempts |
| minUptime | 1000 | Min uptime before a restart counts toward the limit (ms) |
| restartDelay | 100 | Delay between restarts in ms |
| sticky | false | Enable sticky sessions for WebSocket/Socket.IO |
| port | — | Port for sticky session routing |
| reloadRetries | 3 | Retries per worker slot during reload (0-3) |
| healthCheck | — | Health check endpoint path (e.g. /health) |
| cron | — | Cron jobs (array of schedule + path) |
| logMaxSize | 104857600 | Max log file size in bytes before rotation (100 MB) |
| logMaxFiles | 90 | Max rotated log files to keep (0 = no rotation) |
| logMaxAge | 7776000000 | Max age of rotated logs in ms (90 days, 0 = no limit) |
A minimal config:
version: 1
processes:
  - script: /app/dist/server.js
All string values are double-quoted in the generated file to prevent YAML type coercion (e.g. "3000" stays a string, not an integer). If you hand-edit the file, unquoted env values like PORT: 3000 or DEBUG: true are automatically coerced back to strings when loaded. Quoting is still recommended to avoid surprises (e.g. 1.0 parses as 1).
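The coercion described above can be sketched as follows — coerceEnv is a hypothetical helper, not an orkify export; it simply mirrors the requirement that process.env values must be strings:

```javascript
// Illustrative sketch: YAML may parse unquoted values as numbers or
// booleans; coerce everything back to strings before spawning.
function coerceEnv(env) {
  return Object.fromEntries(
    Object.entries(env).map(([key, value]) => [key, String(value)])
  );
}
```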
Boot Persistence
To automatically restore processes after a server reboot, use the provided systemd service template.
# Find your orkify binary path
which orkify
# Copy the template unit (shipped with the npm package)
sudo cp $(npm root -g)/@orkify/cli/boot/systemd/[email protected] /etc/systemd/system/
# If your orkify binary is not at /usr/local/bin/orkify, edit the unit file:
# sudo systemctl edit orkify@ → override ExecStart/ExecStop paths
# Enable for your user
sudo systemctl daemon-reload
sudo systemctl enable orkify@$(whoami)
The @ template runs as the user you specify after the @. Replace $(whoami) with any username:
# Run as the "deploy" user
sudo systemctl enable orkify@deploy
# Run as "app"
sudo systemctl enable orkify@appOn boot the service calls orkify restore to bring back all snapshotted processes, and orkify kill on stop. Each user has their own isolated process list under ~/.orkify/.
Make sure to snapshot your processes so there is something to restore:
orkify snap
Environment Variables (optional)
To inject environment variables (API keys, database credentials, etc.) into your managed processes, create an env file:
sudo mkdir -p /etc/orkify
sudo touch /etc/orkify/env
sudo chmod 600 /etc/orkify/env
The service template looks for /etc/orkify/env and loads it if present. Variables defined there are available to all orkify-managed processes. The file is read by systemd as root before dropping privileges, so chmod 600 keeps your secrets safe while still injecting them into the process environment.
Starting
To start immediately without rebooting:
sudo systemctl start orkify@$(whoami)
Container Mode
Use orkify run for Docker, Kubernetes, or any container environment where you need the process in the foreground.
Why run instead of up?
| Mode | Command | Use Case |
| ------------- | ------------ | ------------------------------------------- |
| Daemon | orkify up | Development, servers, long-running services |
| Container | orkify run | Docker, Kubernetes, any PID 1 scenario |
In containers, processes run as PID 1 and must handle signals directly. The run command:
- Runs in the foreground (no daemon)
- Properly forwards SIGTERM/SIGINT to child processes
- Exits with correct exit codes for orchestrators
- Supports graceful shutdown with configurable timeout
Single Instance (Fork Mode)
Best for most containers where the orchestrator handles scaling:
FROM node:22-alpine
WORKDIR /app
COPY . .
RUN npm install && npm run build
CMD ["orkify", "run", "app.js", "--silent"]
# docker-compose.yml
services:
  api:
    build: .
    deploy:
      replicas: 4 # Let Docker/K8s handle scaling
Cluster Mode (Multi-Core Containers)
For containers with multiple CPUs where you want in-process clustering:
CMD ["orkify", "run", "app.js", "-w", "4", "--silent"]
# kubernetes deployment
spec:
  containers:
    - name: api
      command: ['orkify', 'run', 'app.js', '-w', '4', '--silent']
      resources:
        limits:
          cpu: '4' # Match -w count to CPU limit
Socket.IO in Containers
CMD ["orkify", "run", "server.js", "-w", "4", "--sticky", "--port", "3000", "--silent"]
Container Options
The run command supports the same core options as up:
-n, --name <name> Process name
-w, --workers <number> Number of workers (cluster mode)
--cwd <path> Working directory
--node-args="<args>" Arguments passed to Node.js (quoted)
--args="<args>" Arguments passed to your script (quoted)
--sticky Enable sticky sessions for Socket.IO
--port <port> Port for sticky session routing
--kill-timeout <ms> Graceful shutdown timeout (default: 5000)
--reload-retries <count> Retries per worker slot during reload (0-3, default: 3)
--silent                   Suppress startup messages (cleaner container logs)
Signal Handling
The run command properly handles container signals:
Container Orchestrator
│
│ SIGTERM (graceful stop)
▼
┌─────────────────┐
│ orkify run │
│ │──► Forwards SIGTERM to child
│ kill-timeout │──► Waits up to --kill-timeout ms
│ │──► SIGKILL if timeout exceeded
└────────┬────────┘
│
▼
Exit code 0 (graceful) or 143 (SIGTERM) or 137 (SIGKILL)
- SIGTERM/SIGINT/SIGHUP → Forwarded to child process(es)
- Graceful shutdown → Waits for --kill-timeout ms (default: 5000)
- SIGKILL fallback → Force kills if child doesn't exit in time
- Exit codes → Preserves child exit code (or 128 + signal number)
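The exit-code rule can be sketched as a small helper — exitCodeFor is illustrative, using the standard POSIX signal numbers, so 128 + 15 = 143 for SIGTERM and 128 + 9 = 137 for SIGKILL:

```javascript
// Illustrative sketch: preserve the child's exit code, or encode the
// terminating signal as 128 + signal number (POSIX convention).
function exitCodeFor(code, signal) {
  if (code !== null) return code;
  const signalNumbers = { SIGHUP: 1, SIGINT: 2, SIGKILL: 9, SIGTERM: 15 };
  return 128 + (signalNumbers[signal] ?? 0);
}
```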
Quick Reference
| Scenario | Command |
| ---------------------- | ------------------------------------------------------ |
| Simple container | orkify run app.js --silent |
| Multi-core container | orkify run app.js -w 4 --silent |
| Socket.IO in container | orkify run app.js -w 4 --sticky --port 3000 --silent |
| Development (verbose) | orkify run app.js |
| Long graceful shutdown | orkify run app.js --kill-timeout 30000 --silent |
Deployment
orkify includes built-in deployment with automatic rollback. Create a tarball of your project, deploy it locally or through orkify.com, and orkify handles extract → install → build → symlink → reconcile → monitor.
How It Works
- Pack — orkify deploy pack creates a tarball of your project
- Deploy — Deploy locally with orkify deploy local, or upload to orkify.com with orkify deploy upload and trigger from the dashboard
- Execute — orkify extracts the artifact, runs install/build, and starts your app
- Monitor — orkify watches for crashes after deploy and automatically rolls back if workers fail
Deploy Quick Start
# First time: configure deploy settings (saved to orkify.yml)
orkify deploy upload --interactive
# Upload an artifact (defaults to current directory)
orkify deploy upload
# Upload from a specific directory
orkify deploy upload ./myapp
# Bump package.json patch version and upload (e.g. 1.0.0 → 1.0.1)
orkify deploy upload --npm-version-patch
# Explicit API key (alternative to ORKIFY_API_KEY env var)
orkify deploy upload --api-key orkify_xxx
Upload Options
| Flag | Description |
| --------------------- | ------------------------------------------------- |
| --interactive | Prompt for deploy settings (saved to orkify.yml) |
| --npm-version-patch | Bump package.json patch version before upload |
| --api-key <key> | API key (alternative to ORKIFY_API_KEY env var) |
| --api-host <url> | Override API host URL |
Local Deploy
Deploy from a local tarball — useful for self-managed servers, air-gapped environments, and custom CI/CD pipelines.
# Create a deploy artifact
orkify deploy pack ./myapp --output myapp.tar.gz
# Copy to server and deploy
scp myapp.tar.gz server:~/
ssh server orkify deploy local myapp.tar.gz
# With environment variables
orkify deploy local myapp.tar.gz --env-file .env.production
Deploy Configuration
Deploy configuration is stored in orkify.yml at your project root:
version: 1
deploy:
  install: npm ci
  build: npm run build
  crashWindow: 30
  buildEnv:
    NEXT_PUBLIC_API_URL: 'https://api.example.com'
    NEXT_PUBLIC_SITE_NAME: 'My App'
processes:
  - name: api
    script: dist/server.js
    workerCount: 4
    sticky: true
    port: 3000
    healthCheck: /health
  - name: worker
    script: dist/worker.js
    workerCount: 2
The deploy section configures build/install steps. The processes section defines what gets started — the same format used by orkify snap.
Deploy Options
| Field | Description |
| ------------- | -------------------------------------------------------------------------------- |
| install | Install command (auto-detected: npm, yarn, pnpm, bun) |
| build | Build command (optional, runs after install) |
| buildEnv | Build-time-only env vars (e.g. NEXT_PUBLIC_*). Not passed to runtime processes |
| crashWindow | Seconds to monitor for crashes after deploy (default: 30) |
Deploy Lifecycle
Pack → [Upload] → Extract → Install → Build → Reconcile → Monitor → Success
│
Crash detected? │
▼
Auto-rollback
On deploy (both local and remote), orkify reconciles running processes against the processes in orkify.yml:
- New processes are started
- Unchanged processes get a zero-downtime reload
- Changed processes (different script, worker count, etc.) are replaced
- Removed processes are stopped
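A minimal sketch of that reconcile diff, assuming processes are matched by name — reconcile and the plan shape are illustrative, not orkify's internal API:

```javascript
// Illustrative reconcile sketch: diff desired (orkify.yml) against
// running processes and bucket each one into an action.
function reconcile(running, desired) {
  const runningByName = new Map(running.map((p) => [p.name, p]));
  const desiredNames = new Set(desired.map((p) => p.name));
  const plan = { start: [], reload: [], replace: [], stop: [] };
  for (const proc of desired) {
    const current = runningByName.get(proc.name);
    if (!current) plan.start.push(proc.name); // new process
    else if (current.script !== proc.script || current.workerCount !== proc.workerCount)
      plan.replace.push(proc.name); // config changed → replace
    else plan.reload.push(proc.name); // unchanged → zero-downtime reload
  }
  for (const proc of running) {
    if (!desiredNames.has(proc.name)) plan.stop.push(proc.name); // removed
  }
  return plan;
}
```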
The daemon keeps the previous release on disk. If workers crash within the monitoring window, orkify automatically rolls back to the previous version.
SaaS Platform
orkify.com is an optional paid companion that provides:
- Deploy management — Upload artifacts, trigger deploys, track rollout status
- Real-time metrics — CPU, memory, and event loop monitoring with historical data
- Log streaming — Centralized log aggregation from all your servers
- Crash detection — Automatic error capture with stack traces and context
- Remote control — Start, stop, restart, and reload processes from the dashboard
- Secrets management — Encrypted environment variables injected at deploy time
- Multi-server — Manage processes across all your servers from one dashboard
- Support chat — Embeddable widget that routes visitor messages to Discord
The CLI works standalone without orkify.com. Connect it by setting an API key:
ORKIFY_API_KEY=orkify_xxx orkify up app.js
Source Map Support
When your application uses a bundler (webpack, esbuild, turbopack, rollup, vite), errors from minified or bundled code are automatically resolved to their original source locations using source maps.
The daemon reads .map files from disk at runtime and resolves every frame in the error's stack trace back to the original file, line, and column. The dashboard then shows the original source code instead of minified output. Resolution happens entirely on your server — source maps and original source code never leave your infrastructure. Unlike services that require uploading maps to external servers, there is no build-time upload step and no risk of source code exposure.
This works automatically when .map files are present alongside the bundled output. All major bundlers include sourcesContent in their source maps by default, so resolution works even when original source files aren't on disk.
Next.js
Next.js does not emit server-side source maps by default. To enable them, add the following to your next.config.ts:
import type { NextConfig } from 'next';

const nextConfig: NextConfig = {
  experimental: {
    serverSourceMaps: true,
  },
};

export default nextConfig;
This applies to both webpack and turbopack modes. With this option enabled, errors from API routes and server components will resolve to the original TypeScript source. Browser errors captured via Frontend Error Tracking go through the same source map resolution pipeline, so minified client-side stacks are mapped back to original source locations.
Deploy Artifacts
Source maps are available on the deploy target as long as your bundler generates them. In most setups, the build output directory (.next/, dist/) is gitignored and excluded from the tarball — the deploy build step regenerates everything including .map files on the target.
If your build output is committed and you want to exclude .map files from the artifact (smaller uploads), set sourcemaps: false in your orkify.yml:
deploy:
  install: npm ci
  build: npm run build
  sourcemaps: false
Or use the --no-sourcemaps flag:
orkify deploy upload --no-sourcemaps
orkify deploy pack --no-sourcemaps
Error Grouping
Errors are grouped on the dashboard by a fingerprint computed from the error type, message, file, and function name:
- Function name over line number. When a function name is available (from the stack trace or source map), the fingerprint uses file + function name instead of file + line number. This means errors stay grouped even when lines shift between deploys.
- Error type included. A TypeError and a ReferenceError at the same location produce different groups.
- Message normalization. Dynamic values (UUIDs, numbers, IP addresses, hex strings) are stripped from the message before hashing, so "User 123 not found" and "User 456 not found" group together.
- Fallback. When no function name is available (anonymous functions, top-level code), the fingerprint falls back to file + line number.
If you upgrade from a version without this algorithm, existing error groups will re-fingerprint once. This is expected — the new groups are more stable.
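The grouping rules above can be sketched as follows. The normalization patterns and hash layout here are illustrative assumptions, not orkify's exact algorithm:

```typescript
import { createHash } from "node:crypto";

// Strip dynamic values so "User 123 not found" and "User 456 not found"
// produce the same fingerprint. Order matters: UUIDs and IPs are replaced
// before the generic digit pass.
function normalizeMessage(message: string): string {
  return message
    .replace(/[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}/gi, "<uuid>")
    .replace(/\b(?:\d{1,3}\.){3}\d{1,3}\b/g, "<ip>")
    .replace(/\b0x[0-9a-f]+\b/gi, "<hex>")
    .replace(/\d+/g, "<n>");
}

function fingerprint(err: {
  type: string;          // e.g. "TypeError" — distinguishes error classes
  message: string;
  file: string;
  functionName?: string; // preferred: stable across line shifts
  line?: number;         // fallback for anonymous/top-level frames
}): string {
  const location = err.functionName
    ? `${err.file}:${err.functionName}`
    : `${err.file}:${err.line ?? 0}`;
  return createHash("sha256")
    .update(`${err.type}\n${normalizeMessage(err.message)}\n${location}`)
    .digest("hex");
}
```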
Cron Scheduler
The daemon includes a built-in cron scheduler that dispatches HTTP requests to managed processes on a schedule. This lets you trigger periodic tasks (health checks, cleanup jobs, cache warming) without external cron infrastructure.
Usage
# Run a cron job every 2 minutes
orkify up app.js --cron "*/2 * * * * /api/cron/heartbeat-check"
# Multiple cron jobs
orkify up app.js \
--cron "*/2 * * * * /api/cron/heartbeat-check" \
--cron "0 * * * * /api/cron/cleanup"
The --cron format is "<schedule> <path>" — the last whitespace-delimited token is the HTTP path, everything before it is the cron expression.
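That split-on-last-token rule can be sketched as follows (`parseCronArg` is an illustrative helper, not orkify's internal parser):

```typescript
// "<schedule> <path>": the last whitespace-delimited token is the HTTP
// path; everything before it is the cron expression (which itself
// contains spaces).
function parseCronArg(value: string): { schedule: string; path: string } {
  const tokens = value.trim().split(/\s+/);
  if (tokens.length < 2) throw new Error(`invalid --cron value: "${value}"`);
  const path = tokens[tokens.length - 1];
  if (!path.startsWith("/")) throw new Error(`cron path must start with "/": "${path}"`);
  return { schedule: tokens.slice(0, -1).join(" "), path };
}
```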
Ecosystem Config
# orkify.yml
processes:
  - name: web
    script: server.js
    workers: 4
    cron:
      - schedule: '*/2 * * * *'
        path: /api/cron/heartbeat-check
      - schedule: '0 * * * *'
        path: /api/cron/cleanup
        method: POST # default: GET
        timeout: 60000 # ms, default: 30000
How It Works
- When a job is due, the scheduler looks up the process port via the orchestrator
- It makes an HTTP request to http://localhost:{port}{path} with the cron secret as Authorization: Bearer <secret>
- In cluster mode, the OS routes each request to a single worker — no duplication across workers
- The port is auto-detected when your app calls server.listen() — works in both fork and cluster mode
Limits
| Limit | Value | Reason |
| ---------------- | -------- | --------------------------------------------------------------------------- |
| Minimum interval | 1 minute | Cron has minute granularity; jobs fire within seconds of their target time |
| Maximum interval | 24 hours | Cron jobs running less frequently than daily should use external scheduling |
Sub-minute schedules (e.g. 6-field expressions with seconds like */30 * * * * *) are rejected at registration time with a clear error.
Overlap Prevention
Each job tracks a running flag. If a previous invocation is still in-flight when the next tick fires, the job is skipped. This prevents slow handlers from stacking up.
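The running-flag pattern can be sketched like this (an illustrative model, not the daemon's actual scheduler code):

```typescript
// A job skips a tick when the previous invocation is still in flight,
// so slow handlers cannot stack up.
class CronJob {
  private running = false;
  constructor(private handler: () => Promise<void>) {}

  // Called by the scheduler on each due tick; returns false when skipped.
  async tick(): Promise<boolean> {
    if (this.running) return false; // previous run still in flight
    this.running = true;
    try {
      await this.handler();
    } finally {
      this.running = false; // always clear, even if the handler throws
    }
    return true;
  }
}
```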
Cron Secret
When cron jobs are configured, orkify generates a random secret per process and:
- Sets ORKIFY_CRON_SECRET in the child process environment
- Sends it as Authorization: Bearer <secret> on every cron request
Your route should validate the header to ensure only the daemon can trigger it:
import type { NextRequest } from 'next/server';

export async function GET(request: NextRequest) {
  const authHeader = request.headers.get('authorization');
  if (authHeader !== `Bearer ${process.env.ORKIFY_CRON_SECRET}`) {
    return new Response('Unauthorized', { status: 401 });
  }
  // ... handle cron job
}
The secret is regenerated on every process spawn — no config needed. You can also check process.env.ORKIFY_CRON_SECRET to detect whether orkify cron is active (e.g. to skip internal timers).
Persistence and Recovery
Cron jobs are part of the process config and persisted in snapshots. They survive:
- orkify snap / orkify restore — cron config is saved and restored with the snapshot
- orkify daemon-reload — the daemon captures running configs (including cron), starts a new daemon, and restores them
- Daemon crash — crash recovery spawns a new daemon and restores all process configs including cron jobs
In all cases, cron jobs are re-registered automatically when the process is restored. The first tick after recovery evaluates the cron expression from the current time, so no "catch-up" runs are fired for ticks missed while the daemon was down.
Edge Cases
| Scenario | Behavior |
| ------------------------------------ | --------------------------------------------------------------------------- |
| Process has no detected port | Job logs "no port detected, skipping" and advances to next run |
| Process is stopped (orkify down) | Cron jobs are unregistered immediately |
| Process is deleted (orkify delete) | Cron jobs are unregistered immediately |
| HTTP request fails or times out | Error is logged, job advances to next run |
| Daemon crashes mid-tick | Crash recovery restores all configs; in-flight requests are lost (no retry) |
| Invalid cron expression | Rejected at registration with an error message |
| Deploy reconcile | New cron config from orkify.yml is registered after reconcile completes |
MCP Integration
orkify includes a built-in Model Context Protocol server, enabling AI assistants like Claude Code to manage your processes directly. It supports two transports:
- Stdio (default) — for local AI tools running on the same machine. No auth needed.
- HTTP — for remote AI agents over the network. Authenticated via bearer tokens.
Stdio Mode (Local)
Stdio is the default transport. The MCP client spawns orkify as a subprocess — same user, same machine, no network involved. No authentication is required.
There are three ways to register the MCP server with your AI tools:
Option A — add-mcp (multi-tool)
add-mcp auto-detects installed AI tools (Claude Code, Cursor, VS Code, Windsurf, etc.) and writes the correct config for each one.
# Install once (global)
npm install -g add-mcp
# Auto-detect tools and register orkify (interactive)
npx add-mcp "orkify mcp"
# Register globally (user-level, all projects)
npx add-mcp "orkify mcp" -g
# Target a specific tool
npx add-mcp "orkify mcp" -a claude-code
npx add-mcp "orkify mcp" -a cursor
npx add-mcp "orkify mcp" -a vscode
Option B — Claude Code CLI
If you only use Claude Code:
claude mcp add orkify -- orkify mcp
Option C — Manual JSON
Add to your Claude Code MCP settings (~/.claude/settings.json):
{
"mcpServers": {
"orkify": {
"command": "orkify",
"args": ["mcp"]
}
}
}
For Cursor, VS Code, and other tools, consult their docs for the equivalent MCP config location.
HTTP Mode (Remote)
HTTP mode starts an authenticated HTTP server inside the daemon that remote AI agents can connect to. Because it runs in-process with the daemon, the MCP server is automatically managed by orkify kill, orkify snap/orkify restore, orkify daemon-reload, and crash recovery.
1. Generate a key
# Full access (all tools)
orkify mcp keygen --name "my-agent"
# Read-only (list and logs only)
orkify mcp keygen --name "monitor" --tools list,logs
# Ops access (specific tools)
orkify mcp keygen --name "ops" --tools list,logs,restart,reload,down
# Restrict to specific IPs (individual or CIDR)
orkify mcp keygen --name "ci-agent" --allowed-ips "10.0.0.0/8,192.168.1.50"
The command prints the token to stdout and adds it to ~/.orkify/mcp.yml.
2. Start the HTTP server
# Default: localhost:8787
orkify mcp --simple-http
# Custom port and bind address
orkify mcp --simple-http --port 9090 --bind 0.0.0.0
3. Manage the HTTP server
# Check if the MCP HTTP server is running
orkify mcp status
# Stop the MCP HTTP server
orkify mcp stop
4. Connect a client
MCP clients authenticate with Authorization: Bearer <token>:
curl -X POST http://localhost:8787/mcp \
-H "Authorization: Bearer orkify_mcp_..." \
-H "Content-Type: application/json" \
-d '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2025-03-26","capabilities":{},"clientInfo":{"name":"test","version":"1.0.0"}}}'
5. Register with AI tools
add-mcp:
npx add-mcp "http://your-server:8787/mcp" \
--header "Authorization: Bearer orkify_mcp_..." \
-n orkify
Claude Code CLI:
claude mcp add --transport http \
--header "Authorization: Bearer orkify_mcp_..." \
orkify http://your-server:8787/mcp
HTTP Options
--simple-http Use HTTP transport with local key auth
--port <port> HTTP port (default: 8787)
--bind <address> HTTP bind address (default: 127.0.0.1)
--cors <origin> Enable CORS ("*", a specific URL, or comma-separated URLs)
CORS (Browser Clients)
By default, browser-based MCP clients are blocked by CORS policy. Enable CORS with the --cors flag:
# Allow any origin
orkify mcp --simple-http --cors "*"
# Allow a specific origin
orkify mcp --simple-http --cors "https://dashboard.example.com"
When a specific origin is set (not *), the server includes a Vary: Origin header for correct HTTP caching. OPTIONS preflight requests are handled automatically and cached for 24 hours.
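The origin-matching behavior described above can be sketched as follows (an illustrative helper; the daemon's actual header logic may differ):

```typescript
// Compute CORS response headers for a configured --cors value.
function corsHeaders(
  configured: string,                // "*", one URL, or comma-separated URLs
  requestOrigin: string | undefined, // the request's Origin header
): Record<string, string> {
  if (configured === "*") {
    return { "Access-Control-Allow-Origin": "*" };
  }
  const allowed = configured.split(",").map((o) => o.trim());
  if (requestOrigin && allowed.includes(requestOrigin)) {
    // Echo the specific origin and add Vary: Origin so shared caches
    // don't serve one origin's response to another.
    return { "Access-Control-Allow-Origin": requestOrigin, Vary: "Origin" };
  }
  return {}; // origin not allowed: no CORS headers, the browser blocks it
}
```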
Key Management
Keys are stored in ~/.orkify/mcp.yml (created with 0600 permissions):
keys:
  - name: my-agent
    token: orkify_mcp_a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6
    tools:
      - '*'
  - name: monitor
    token: orkify_mcp_f6e5d4c3b2a1f6e5d4c3b2a1f6e5d4c3b2a1f6e5d4c3b2a1
    tools:
      - list
      - logs
  - name: ci-agent
    token: orkify_mcp_9876543210ab9876543210ab9876543210ab9876543210ab
    tools:
      - list
      - logs
      - restart
      - reload
      - down
    allowedIps:
      - 10.0.0.0/8
      - 192.168.1.50
Each key has:
| Field | Description |
| ------------ | ---------------------------------------------------------------- |
| name | Identifier for logging and error messages |
| token | Bearer token (orkify_mcp_ + 48 hex chars) |
| tools | Allowed MCP tools — ["*"] for all, or explicit list |
| allowedIps | Optional IP allowlist — individual IPs or CIDRs (all if omitted) |
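The allowedIps check (individual IPs plus CIDR ranges) can be sketched with a minimal IPv4-only helper (illustrative, not orkify's implementation):

```typescript
// Convert dotted-quad IPv4 to a 32-bit unsigned integer.
function ipToInt(ip: string): number {
  return ip.split(".").reduce((acc, octet) => (acc << 8) | (parseInt(octet, 10) & 0xff), 0) >>> 0;
}

// Check an address against a list of individual IPs and CIDR ranges.
// An empty or missing list means all addresses are allowed.
function isIpAllowed(ip: string, allowlist?: string[]): boolean {
  if (!allowlist || allowlist.length === 0) return true;
  const addr = ipToInt(ip);
  return allowlist.some((entry) => {
    if (!entry.includes("/")) return ipToInt(entry) === addr; // exact match
    const [base, bits] = entry.split("/");
    const prefix = parseInt(bits, 10);
    const mask = prefix === 0 ? 0 : (~0 << (32 - prefix)) >>> 0;
    return ((addr & mask) >>> 0) === ((ipToInt(base) & mask) >>> 0);
  });
}
```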
Keygen Options
orkify mcp keygen [--name <name>] [--tools <tool,...>] [--allowed-ips <ips>]
--name <name> Key name for identification (default: "default")
--tools <tools> Comma-separated list of allowed tools (default: all)
--allowed-ips <ips> Comma-separated IPs or CIDRs (default: all)
Editing Keys
The config file is plain YAML — you can hand-edit it to rename keys, change tool permissions, or remove keys. The server reloads the config on SIGHUP or file change (polled every 2 seconds), so changes take effect without restarting.
Tool Scoping
When a key has a restricted tools list, any call to a tool not in the list returns a FORBIDDEN error. This lets you create read-only keys for monitoring dashboards or limited-access keys for specific teams.
Valid tool names: list, logs, snap, listAllUsers, up, down, restart, reload, delete, restore, kill.
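Tool scoping amounts to a membership check against the key's tools list; a minimal sketch (`isToolAllowed` and `assertToolAllowed` are hypothetical names, not orkify's API):

```typescript
// A tools list of ["*"] grants everything; otherwise the tool must be
// explicitly listed on the key.
function isToolAllowed(keyTools: string[], tool: string): boolean {
  return keyTools.includes("*") || keyTools.includes(tool);
}

// Hypothetical guard run before dispatching an MCP tool call.
function assertToolAllowed(keyTools: string[], tool: string): void {
  if (!isToolAllowed(keyTools, tool)) {
    throw new Error(`FORBIDDEN: tool "${tool}" not permitted for this key`);
  }
}
```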
Available MCP Tools
| Tool | Description |
| -------------- | ----------------------------------------------- |
| up | Start a new process with optional configuration |
| down | Stop process(es) by name, ID, or "all" |
| restart | Hard restart (stop + start) |
| reload | Zero-downtime rolling reload |
| delete | Stop and remove from process list |
| list | List all processes with status and metrics |
| listAllUsers | List processes from all users (requires sudo) |
| logs | Get recent log lines from a process |
| snap | Snapshot process list for later restoration |
| restore | Restore previously saved processes |
| kill | Stop the ORKIFY daemon |
Example Usage
Once configured, you can ask Claude to manage your processes:
- "Start my API server with 4 workers"
- "List all running processes"
- "Reload the web app with zero downtime"
- "Show me the logs for the worker process"
- "Stop all processes"
Architecture
Daemon Mode (orkify up)
┌─────────────────────────────────────────────────────────────┐
│ CLI (orkify up) │
└─────────────────────────────┬───────────────────────────────┘
│ IPC (Unix Socket / Named Pipe)
┌─────────────────────────────▼───────────────────────────────┐
│ Daemon │
│ ┌───────────────────────────────────────────────────────┐ │
│ │ Orchestrator │ │
│ └───────────────────────────┬───────────────────────────┘ │
│ │ │
│ ┌───────────────────────────▼───────────────────────────┐ │
│ │ ManagedProcess │ │
│ │ │ │
│ │ Fork Mode (-w 1): Cluster Mode (-w N): │ │
│ │ ┌─────────────┐ ┌─────────────────────┐ │ │
│ │ │ Child │ │ ClusterWrapper │ │ │
│ │ │ Process │ │ (Primary) │ │ │
│ │ └─────────────┘ │ ┌─────┐ ┌─────┐ │ │ │
│ │ │ │ W1 │ │ W2 │ │ │ │
│ │ │ └─────┘ └─────┘ │ │ │
│ │ │ ┌─────┐ ┌─────┐ │ │ │
│ │ │ │ W3 │ │ W4 │ │ │ │
│ │ │ └─────┘ └─────┘ │ │ │
│ │ └─────────────────────┘ │ │
│ └───────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
Container Mode (orkify run)
┌─────────────────────────────────────────────────────────────┐
│ Container (PID 1) │
│ ┌───────────────────────────────────────────────────────┐ │
│ │ orkify run │ │
│ │ │ │
│ │ Fork Mode (-w 1): Cluster Mode (-w N): │ │
│ │ ┌─────────────┐ ┌─────────────────────┐ │ │
│ │ │ Child │◄─SIGTERM │ ClusterWrapper │ │ │
│ │ │ Process │ │ (Primary) │ │ │
│ │ └─────────────┘ │ ┌─────┐ ┌─────┐ │ │ │
│ │ │ │ W1 │ │ W2 │◄─SIGTERM │ │
│ │ │ └─────┘ └─────┘ │ │ │
│ │ └─────────────────────┘ │ │
│ └───────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
Requirements
- Node.js 22.18.0 or higher
- Cross-platform: macOS, Linux, Windows (uses Unix sockets on macOS/Linux, Named Pipes on Windows)
License
Apache License 2.0 - see LICENSE for details.
