@rekl0w/panic-tool

v0.3.0

Published

3 months ago

A minimal production incident triage CLI and lightweight backend for fast health checks, summaries, root-cause heuristics, and recovery suggestions.

rekl0w

incident-response on-call triage health-check bun hono

Panic Tool

Panic Tool is a tiny dual-mode production incident response CLI for developers and on-call engineers.

It is not a dashboard, observability platform, log collector, alerting system, or Sentry clone. It is intentionally focused on one operational question:

What is broken, what is probably causing it, and what should I do now?

Built with Bun + Hono.

Features

CLI-first incident triage
FULL MODE for engineering/debug visibility with panic check --full
PANIC MODE for 4-line outage decisions with panic emergency
HTTP health checks
TCP health checks for DB, Redis, queues, brokers, and internal services
Unified service health report
Human-readable incident summaries
Simple rule-based probable root cause heuristics with confidence and evidence
Actionable recovery suggestions
Optional lightweight Hono JSON API

Dual-mode design

Panic Tool intentionally separates understanding from action.

FULL MODE: understand the system

Triggered by:

panic check --full

Use FULL MODE when you need engineering/debug visibility. It includes service status, latency, dependency status, optional log summaries from config, matched rule, confidence, evidence, explanation, and suggested next action.

FULL MODE is informative, structured, and calm. It does not force a decision.

PANIC MODE: fix the system now

Triggered by:

panic emergency

Use PANIC MODE during an active outage. It always prints exactly four decision lines:

WHAT: ...
ROOT CAUSE: ...
IMPACT: ...
NEXT ACTION: ...

PANIC MODE is intentionally minimal, decisive, and authoritative. It does not include extra debugging detail.

Install

Install Panic Tool from npm as @rekl0w/panic-tool.

Runtime requirement: Bun 1.3.11+ must be installed because the CLI binary runs on the Bun runtime.
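
If a wrapper script needs to confirm the Bun requirement before invoking the CLI, a version comparison along these lines may help. This is only a sketch: `parseVersion` and `meetsMinimum` are hypothetical helpers, not part of Panic Tool, and pre-release suffixes are ignored.

```typescript
// Hypothetical helpers (not part of Panic Tool): compare dotted numeric
// versions such as "1.3.11". Anything after a "-" is ignored in this sketch.
function parseVersion(version: string): number[] {
  const core = version.split("-")[0] || "";
  return core.split(".").map((part) => parseInt(part, 10) || 0);
}

// Returns true when `current` is equal to or newer than `minimum`.
function meetsMinimum(current: string, minimum: string): boolean {
  const a = parseVersion(current);
  const b = parseVersion(minimum);
  const length = Math.max(a.length, b.length);
  for (let i = 0; i < length; i++) {
    const left = a[i] || 0;
    const right = b[i] || 0;
    if (left !== right) return left > right;
  }
  return true; // all components are equal
}
```

Under Bun, the runtime version string is available as `Bun.version`, so a guard could read `meetsMinimum(Bun.version, "1.3.11")`.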

npm

Install globally:

npm install -g @rekl0w/panic-tool

Run without global install:

npx @rekl0w/panic-tool@latest check

You can also use npm exec:

npm exec @rekl0w/panic-tool@latest -- incident --config ./panic.demo.config.json

Bun

Install globally:

bun add -g @rekl0w/panic-tool

Run without global install:

bunx @rekl0w/panic-tool@latest check

Quick start

Create a config file after installing globally with npm or Bun:

panic init

Edit panic.config.json with your services, then run:

panic check
panic check --full
panic status
panic emergency
panic incident

Prefer one-shot usage instead of global install? Use either runner:

npx @rekl0w/panic-tool@latest incident --config ./panic.demo.config.json
bunx @rekl0w/panic-tool@latest emergency --config ./panic.demo.config.json

Use a custom config file:

panic incident --config ./prod.panic.json

Try it without your own infrastructure

This repo includes panic.demo.config.json, which uses public GitHub endpoints plus one intentionally failing local TCP check so you can see incident output immediately.

panic check --config ./panic.demo.config.json
panic check --full --config ./panic.demo.config.json
panic emergency --config ./panic.demo.config.json
panic incident --config ./panic.demo.config.json

Without global install:

npx @rekl0w/panic-tool@latest check --config ./panic.demo.config.json
bunx @rekl0w/panic-tool@latest emergency --config ./panic.demo.config.json

Useful public targets for manual testing:

https://api.github.com — public HTTP endpoint
https://www.githubstatus.com/api/v2/status.json — public status JSON endpoint
github.com:443 — public TCP/TLS target
127.0.0.1:65432 — intentionally failing TCP target for demo incidents

If you want an open-source app to test against, use any repo/service that exposes a /health, /ready, /status, or similar endpoint. Panic Tool only needs HTTP URLs or TCP host/port targets; it does not require Sentry, Datadog, Prometheus, or logs.

Config example

{
  "timeoutMs": 2000,
  "latencyWarningMs": 750,
  "services": [
    {
      "name": "api",
      "type": "http",
      "url": "https://api.example.com/health",
      "critical": true,
      "dependsOn": ["db", "redis"],
      "logSummary": "Recent errors show upstream dependency timeouts."
    },
    {
      "name": "db",
      "type": "tcp",
      "host": "localhost",
      "port": 5432,
      "critical": true,
      "logSummary": "Connection pool near limit; recent timeout spikes."
    },
    {
      "name": "redis",
      "type": "tcp",
      "host": "localhost",
      "port": 6379,
      "critical": false,
      "logSummary": "Eviction count normal; memory pressure unknown."
    }
  ]
}
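
For readers who want to generate or lint this file from code, the shape above could be modelled roughly as follows. The type names and the `validateConfig` checks are assumptions made for illustration; the package's own types in src/types.ts and validation in src/config.ts may differ, including which fields are optional.

```typescript
// Assumed shapes, inferred from the JSON example above (not the package's types).
type BaseService = {
  name: string;
  critical?: boolean;
  dependsOn?: string[];
  logSummary?: string;
};
type HttpService = BaseService & { type: "http"; url: string };
type TcpService = BaseService & { type: "tcp"; host: string; port: number };
type ServiceConfig = HttpService | TcpService;

type PanicConfig = {
  timeoutMs: number;
  latencyWarningMs: number;
  services: ServiceConfig[];
};

// Hypothetical lint pass: returns a list of problems, empty when none are found.
function validateConfig(config: PanicConfig): string[] {
  const problems: string[] = [];
  const names = config.services.map((s) => s.name);
  for (const service of config.services) {
    if (names.indexOf(service.name) !== names.lastIndexOf(service.name)) {
      problems.push(`duplicate service name: ${service.name}`);
    }
    for (const dep of service.dependsOn || []) {
      if (names.indexOf(dep) === -1) {
        problems.push(`${service.name} depends on unknown service: ${dep}`);
      }
    }
    if (service.type === "http" && !/^https?:\/\//.test(service.url)) {
      problems.push(`${service.name} has a non-HTTP url: ${service.url}`);
    }
    if (service.type === "tcp" && (service.port < 1 || service.port > 65535)) {
      problems.push(`${service.name} has an invalid port: ${service.port}`);
    }
  }
  return problems;
}

// The config example from above, expressed as a typed value.
const exampleConfig: PanicConfig = {
  timeoutMs: 2000,
  latencyWarningMs: 750,
  services: [
    { name: "api", type: "http", url: "https://api.example.com/health", critical: true, dependsOn: ["db", "redis"] },
    { name: "db", type: "tcp", host: "localhost", port: 5432, critical: true },
    { name: "redis", type: "tcp", host: "localhost", port: 6379, critical: false },
  ],
};
```

Calling `validateConfig(exampleConfig)` returns an empty list; a typo in `dependsOn` would surface as a single readable problem string.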

CLI usage

`panic check`

Runs all configured checks and prints a terminal-friendly health report.

panic check

Example:

Panic Tool — Health Check
=========================

✅ api          healthy     121ms https://api.example.com/health critical — HTTP health check passed
❌ db           down       2001ms localhost:5432 critical — TCP timeout after 2000ms
⚠️  redis        degraded    812ms localhost:6379 — Slow TCP connection (812ms)
✅ queue        healthy      96ms https://queue.example.com/health — HTTP health check passed
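
The three states in this report appear to follow from whether the probe succeeded and how its latency compares with latencyWarningMs. A minimal sketch of that classification, assuming a strict greater-than comparison (the tool's exact boundary behaviour is not documented here) and using hypothetical names:

```typescript
type Status = "healthy" | "degraded" | "down";

// Hypothetical probe outcome: `ok` is false on timeout, refused connection,
// or a failing HTTP response.
interface ProbeOutcome {
  ok: boolean;
  latencyMs: number;
}

// Assumption: a successful probe slower than the warning threshold is
// reported as degraded rather than down.
function classify(outcome: ProbeOutcome, latencyWarningMs: number): Status {
  if (!outcome.ok) return "down";
  return outcome.latencyMs > latencyWarningMs ? "degraded" : "healthy";
}
```

With the example config (latencyWarningMs of 750), the 812ms redis probe classifies as degraded and the 121ms api probe as healthy, matching the report above.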

`panic check --full`

Runs FULL MODE for engineering/debug visibility.

panic check --full

Example:

Panic Tool — Full System Check
==============================

Overall: DOWN
Checked: 2026-04-26T10:12:00.000Z

Services:
  - api
    status     : DOWN
    type       : http
    target     : https://api.example.com/health
    latency    : 1800ms
    critical   : yes
    message    : HTTP 503
  - db
    status     : DOWN
    type       : tcp
    target     : localhost:5432
    latency    : 2001ms
    critical   : yes
    message    : TCP timeout after 2000ms

Dependencies:
  - api: db=DOWN, redis=OK
  - db: none

Log summaries:
  - api: Recent errors show upstream dependency timeouts.
  - db: Connection pool near limit; recent timeout spikes.

Rule engine:
  Rule       : DB_DOWN_API_DOWN
  Confidence : HIGH
  Cause      : Database failure is causing API failure.
  Explanation: Rule DB_DOWN_API_DOWN matched: database is DOWN while API is DOWN.

Evidence:
  - db=DOWN latency=2001ms target=localhost:5432 critical message="TCP timeout after 2000ms"
  - api=DOWN latency=1800ms target=https://api.example.com/health critical message="HTTP 503"
  - api -> db: declared dependency

Suggested next action:
  Check database availability and connection pool before restarting the API.
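
To make the rule engine section concrete, here is one way a deterministic rule such as DB_DOWN_API_DOWN could be written. This is a sketch, not the code in src/incident.ts: the service names "api" and "db" are hard-coded for brevity, and lowering confidence to MEDIUM when no dependency is declared is an assumption of this example.

```typescript
type Status = "OK" | "DEGRADED" | "DOWN";

interface ServiceResult {
  name: string;
  status: Status;
  dependsOn: string[];
}

interface RuleMatch {
  rule: string;
  confidence: "HIGH" | "MEDIUM" | "LOW";
  cause: string;
  evidence: string[];
}

// Hypothetical rule: matches when both "api" and "db" are DOWN.
function matchDbDownApiDown(results: ServiceResult[]): RuleMatch | null {
  const api = results.filter((r) => r.name === "api")[0];
  const db = results.filter((r) => r.name === "db")[0];
  if (!api || !db) return null;
  if (api.status !== "DOWN" || db.status !== "DOWN") return null;

  const declared = api.dependsOn.indexOf("db") !== -1;
  const evidence = ["db=DOWN", "api=DOWN"];
  if (declared) evidence.push("api -> db: declared dependency");

  return {
    rule: "DB_DOWN_API_DOWN",
    // Assumption: a declared dependency is what justifies HIGH confidence.
    confidence: declared ? "HIGH" : "MEDIUM",
    cause: "Database failure is causing API failure.",
    evidence,
  };
}
```

Because the rule is a pure function of the check results, the same inputs always produce the same cause, confidence, and evidence, which keeps the output auditable during an incident.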

`panic status`

Shows a compact operational status summary.

panic status

Example:

Panic Tool — Status
===================

Failing : 1
Degraded: 1
Healthy : 2

Critical failures:
  - db: TCP timeout after 2000ms (2001ms)
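
The counts in this summary can be derived directly from the per-service results. The sketch below uses hypothetical type and function names, and it assumes the overall status is simply the worst individual status; the tool may weigh the critical flag differently.

```typescript
type Status = "healthy" | "degraded" | "down";

interface CheckResult {
  name: string;
  status: Status;
  critical: boolean;
  message: string;
  latencyMs: number;
}

interface StatusSummary {
  failing: number;
  degraded: number;
  healthy: number;
  overall: Status;
  criticalFailures: string[];
}

// Hypothetical aggregator: counts statuses and lists critical failures.
function summarize(results: CheckResult[]): StatusSummary {
  const failing = results.filter((r) => r.status === "down");
  const degraded = results.filter((r) => r.status === "degraded");
  const healthy = results.filter((r) => r.status === "healthy");
  // Assumption: overall status is the worst status seen.
  const overall: Status =
    failing.length > 0 ? "down" : degraded.length > 0 ? "degraded" : "healthy";
  return {
    failing: failing.length,
    degraded: degraded.length,
    healthy: healthy.length,
    overall,
    criticalFailures: failing
      .filter((r) => r.critical)
      .map((r) => `${r.name}: ${r.message} (${r.latencyMs}ms)`),
  };
}
```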

`panic emergency`

Runs PANIC MODE for an active incident. The output is intentionally limited to four lines.

panic emergency

Example:

WHAT: api, db is DOWN.
ROOT CAUSE: [HIGH] Database failure is causing API failure. Rule DB_DOWN_API_DOWN matched: database is DOWN while API is DOWN.
IMPACT: Critical outage affecting api, db.
NEXT ACTION: Check database availability and connection pool before restarting the API.
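
The fixed four-line shape is easy to reproduce from a decision object. The following sketch assumes a hypothetical `EmergencyDecision` type (the fields returned by the real tool or by GET /emergency may be named differently) and mirrors the wording of the example above verbatim.

```typescript
// Hypothetical decision object; field names are assumptions for illustration.
interface EmergencyDecision {
  down: string[];
  confidence: string;
  cause: string;
  explanation: string;
  nextAction: string;
}

// Always returns exactly four lines, in the order shown in the example.
function formatEmergency(decision: EmergencyDecision): string[] {
  const names = decision.down.join(", ");
  return [
    `WHAT: ${names} is DOWN.`,
    `ROOT CAUSE: [${decision.confidence}] ${decision.cause} ${decision.explanation}`,
    `IMPACT: Critical outage affecting ${names}.`,
    `NEXT ACTION: ${decision.nextAction}`,
  ];
}
```

Returning an array rather than a pre-joined string makes the "exactly four lines" contract trivial to assert in tests.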

`panic incident`

Generates a human-readable triage summary with probable root cause hints and recovery suggestions.

panic incident

Example:

Panic Tool — Incident Triage
============================

❌ Overall: DOWN
Checked: 2026-04-26T10:12:00.000Z

Summary:
  Overall status is DOWN. 1 failing, 1 degraded, 2 healthy. Most likely: Database is down, so API failures are likely caused by unavailable DB connections.

Failing:
  - db: TCP timeout after 2000ms (2001ms)

Degraded:
  - redis: Slow TCP connection (812ms) (812ms)

Healthy:
  - api: HTTP health check passed (121ms)
  - queue: HTTP health check passed (96ms)

Probable root cause:
  - Database is down, so API failures are likely caused by unavailable DB connections.
  - High latency detected in redis; check downstream dependencies and saturation.

What to do now:
  1. Check db logs and restart the service if it is safe.
  2. Verify network access, port, credentials, and connection pool limits for db.
  3. Check DB availability, max connections, slow queries, replication lag, and recent migrations.

Lightweight backend

Panic Tool also ships a small Hono backend for JSON output.

Run locally from the repository:

bun run dev

Endpoints:

GET / — service metadata
GET /health — raw check results
GET /status — compact status JSON
GET /incident — summary, root-cause hints, and suggestions
GET /emergency — panic-mode decision object

Default port is 3030. Override it with PORT.
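
Scripts that call the backend need the same port logic on the client side. A small sketch, assuming the server listens on localhost and that an unparsable PORT falls back to 3030 (how the real server treats invalid values is not documented here); `backendUrl` is a hypothetical helper:

```typescript
// The endpoints listed above, as a union type.
type Endpoint = "/" | "/health" | "/status" | "/incident" | "/emergency";

const DEFAULT_PORT = 3030;

// Hypothetical helper: builds a URL for one of the backend endpoints,
// honouring a PORT override when it is a valid port number.
function backendUrl(
  endpoint: Endpoint,
  env: { PORT?: string } = {},
  host: string = "localhost",
): string {
  const parsed = parseInt(env.PORT || "", 10);
  const port = parsed > 0 && parsed <= 65535 ? parsed : DEFAULT_PORT;
  return `http://${host}:${port}${endpoint}`;
}
```

A caller could then pass the result to `fetch`, for example `fetch(backendUrl("/emergency", process.env))`, and parse the JSON response.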

Architecture

+--------------------+        +--------------------------+
| panic CLI          |        | Hono lightweight backend |
|                    |        |                          |
| panic check        |        | GET /health              |
| panic check --full |        | GET /status              |
| panic emergency    |        | GET /emergency           |
| panic incident     |        | GET /incident            |
+---------+----------+        +------------+-------------+
          |                                |
          +---------------+----------------+
                          |
                          v
              +-----------------------+
              | Health aggregator     |
              | - HTTP checks         |
              | - TCP checks          |
              | - latency thresholds  |
              +-----------+-----------+
                          |
                          v
              +-----------------------+
              | Dual-mode engine      |
              | - FULL MODE details   |
              | - PANIC MODE decision |
              | - deterministic rules |
              | - next action         |
              +-----------------------+

Project structure

panic-tool/
├── src/
│   ├── checkers.ts     # HTTP + TCP health checks
│   ├── cli.ts          # panic check/status/emergency/incident commands
│   ├── config.ts       # JSON config loading + validation
│   ├── format.ts       # terminal-friendly output formatting
│   ├── incident.ts     # summaries, heuristics, suggestions
│   ├── index.ts        # public exports
│   ├── server.ts       # Hono backend
│   └── types.ts        # shared TypeScript types
├── scripts/build.ts    # package build script
├── panic.config.example.json
├── panic.demo.config.json
├── package.json
├── tsconfig.json
└── README.md

Development

Install dependencies:

bun install

Run typecheck:

bun run typecheck

Build package output:

bun run build

Run CLI from source:

bun run src/cli.ts check --full --config ./panic.demo.config.json
bun run src/cli.ts emergency --config ./panic.demo.config.json

Design principles

CLI-first
Keep FULL MODE and PANIC MODE separate
Small architecture, no microservices
Rule-based heuristics, no heavy ML
Root-cause output should include confidence and evidence in FULL MODE
Fast triage over historical forensics
Suggestions should be operationally useful
No dashboard in the MVP

Non-goals

No observability platform
No tracing system
No log ingestion pipeline
No Sentry or Datadog clone behavior
No ML-heavy root-cause analysis

Roadmap

--json CLI output
YAML config support
Docker image
GitHub Actions release workflow
Signal adapters for Datadog, CloudWatch, git history, and DB state
Pluggable checks for Kubernetes, systemd, and cloud load balancers
Configurable custom rule packs

Release notes

See CHANGELOG.md for version history.

Latest release: v0.3.0

Contributing

Issues and pull requests are welcome. Please keep contributions aligned with the product scope: fast incident triage and actionable recovery suggestions.

See CONTRIBUTING.md.

License

MIT
