nori-premortem
v1.0.2
Published

Readme
Nori Premortem
A system monitoring daemon that intelligently diagnoses machine issues before critical failure using Claude AI.
Why Premortem?
When a machine dies, you are often left with no real idea what happened and why, because the machine takes everything with it. Traditional monitoring rarely captures meaningful diagnostics, because it makes strong assumptions up front about possible sources of failure and is not able to dynamically adjust based on in-stream information. You can figure out that your system OOM'd, but you won't easily figure out why, or even more important, where in your code the problem came from.
Premortem spawns a Claude agent the moment issues arise, analyzing the system in real-time and streaming diagnostics to a safe backend. Instead of metric graphs, engineers get AI-powered root cause analysis.
Overview
Premortem continuously watches system vitals (CPU, memory, disk, processes) and spawns Claude agents to diagnose problems when thresholds are breached. All diagnostic output is streamed in real-time to a configured webhook endpoint. Since the analysis is done by an AI agent, it is more flexible and more able to dig into the actual cause of why things are going haywire.
Features
- Continuous Monitoring: Polls system metrics at configurable intervals
- Intelligent Diagnosis: Spawns Claude agents with full system context when thresholds breach
- Real-Time Streaming: Fire-and-forget webhook delivery ensures data reaches backend even if the machine crashes
- Reset on Completion: Daemon resets after each agent run, allowing multiple diagnostic sessions
- Configurable Thresholds: Monitor memory %, disk %, CPU %, and process counts
Quick Start
# Clone and install
git clone [email protected]:tilework-tech/nori-premortem.git
cd nori-premortem
npm install
npm run build
# Configure
cp defaultConfig.example.json defaultConfig.json
# Edit defaultConfig.json with your webhook URL and Anthropic API key
# Run
node build/cli.js --config ./defaultConfig.jsonInstallation
Clone and install:
git clone [email protected]:tilework-tech/nori-premortem.git
cd nori-premortem
npm install
npm run buildOr install globally:
npm install -g .Configuration
Create your configuration file from the example template:
cp defaultConfig.example.json defaultConfig.json
# Edit defaultConfig.json with your webhookUrl, anthropicApiKey, and desired thresholdsNote: defaultConfig.json is gitignored to prevent accidentally committing sensitive credentials.
Example configuration:
{
"webhookUrl": "https://your-server.com/webhook-endpoint",
"anthropicApiKey": "sk-ant-your-api-key-here",
"pollingInterval": 10000,
"thresholds": {
"memoryPercent": 90,
"diskPercent": 85,
"cpuPercent": 80
},
"agentConfig": {
"customPrompt": "You are diagnosing system performance issues. Focus on memory usage, disk space, CPU utilization, and process behavior."
},
"heartbeat": {
"url": "https://your-server.com/heartbeat-endpoint",
"interval": 60000,
"processName": "my-process"
}
}Configuration Options
- webhookUrl (required): HTTP endpoint to receive diagnostic output
- Must accept POST requests with JSON payloads containing Claude SDK message objects
- Messages are grouped by
session_idfield - Each message follows the format:
{type: string, session_id: string, ...other_fields}
- anthropicApiKey (required): Your Anthropic API key for Claude
- pollingInterval (optional, default: 10000): Milliseconds between system checks
- thresholds (required): At least one threshold must be configured
- memoryPercent: Trigger when memory usage exceeds this percentage (uses "available" memory, not "used", to avoid false alerts from Linux buffer/cache)
- diskPercent: Trigger when disk usage exceeds this percentage
- cpuPercent: Trigger when CPU usage exceeds this percentage
- agentConfig (optional): Claude agent configuration
- customPrompt: Additional context prepended to diagnostic prompt (default: null)
- Note: Model, allowed tools, and max turns are controlled by SDK defaults and not user-configurable
- heartbeat (optional): Health check configuration
- url: Endpoint to receive periodic heartbeat signals
- interval (default: 60000): Milliseconds between heartbeat signals
- processName: Process name to monitor and report in heartbeat
Usage
Run the daemon:
nori-premortem --config ./config.jsonThe daemon will:
- Validate the Anthropic API key with a test query (fail-fast if invalid)
- Create the archive directory at
~/.premortem-logsif it doesn't exist - Validate the archive directory is writable (fail-fast if not)
- Start monitoring system metrics
- When a threshold is breached, spawn a Claude agent with system context
- Stream all agent output to your webhook endpoint
- Save complete session transcripts to
~/.premortem-logs/agent-{sessionId}.jsonl - Reset after the agent completes, ready to trigger again
Stop the daemon with Ctrl+C.
Webhook Integration
Premortem streams diagnostic data to any HTTP endpoint that accepts POST requests. This allows integration with existing monitoring infrastructure, logging systems, or custom backends.
Webhook Endpoint Requirements
The configured webhook endpoint must:
- Accept POST requests with raw Claude SDK message payloads
- Handle messages grouped by
session_idfield - Be highly available (premortem uses fire-and-forget delivery with no retry logic)
Message Format
Messages are sent as raw Claude SDK output, one message per POST:
{
"type": "assistant",
"session_id": "session-abc123",
"message": {
"role": "assistant",
"content": "Analyzing system metrics..."
}
}The session_id field groups messages into a single diagnostic transcript artifact on the backend.
Architecture
Daemon (monitoring loop)
↓ (threshold breach detected)
Agent SDK (Claude diagnostics)
↓ (immediate streaming)
Webhook Endpoint (your server)Key Design Decisions:
- First-breach-only: When multiple thresholds breach, only the first (memory > disk > cpu) triggers
- Reset on completion: Agent finish resets daemon state, allowing new breaches to trigger
- Fire-and-forget webhooks: No retries - webhook endpoint must be reliable
- API key in config: anthropicApiKey stored in config file, set to env before SDK calls
Development
Run tests:
npm testWatch mode:
npm run test:watchBuild:
npm run buildTroubleshooting
Daemon not starting:
- Check that
anthropicApiKeyis valid in config - Verify webhook URL is reachable
No agent triggering:
- Check threshold values - may need to lower them for testing
- Review daemon logs for system metrics
Webhook not receiving data:
- Test webhook endpoint separately
- Check firewall/network settings
- Remember: no retries, so endpoint must be reliable
License
MIT
