# drain3-js

v1.1.0
A zero-dependency JavaScript/TypeScript port of Drain3, a log parsing algorithm that automatically clusters log messages into templates.
- **Input:** an array of structured log entries `{ id, log, severity }`
- **Output:** clustered templates grouped by severity, with log IDs preserved
## Installation

```sh
npm install drain3-js
```

## Quick Start
```js
import { clusterLogs } from 'drain3-js';

const logs = [
  { id: "req-1", log: "Requested is served to 3uedhx2wedock", severity: "INFO" },
  { id: "req-2", log: "Requested is served to 8fjsk29dkx", severity: "INFO" },
  { id: "req-3", log: "Requested is served to pq83nxlow2", severity: "INFO" },
  { id: "req-4", log: "Connection timeout for user abc123", severity: "ERROR" },
  { id: "req-5", log: "Connection timeout for user xyz789", severity: "ERROR" },
  { id: "req-6", log: "Connection timeout for user lmn456", severity: "WARN" },
  { id: "req-7", log: "Disk usage at 85% on server-01", severity: "WARN" },
  { id: "req-8", log: "Disk usage at 92% on server-03", severity: "WARN" },
];

const clusters = clusterLogs(logs);
console.log(clusters);
```

Output:
```js
[
  {
    clusterId: 1,
    template: "Requested is served to <*>",
    severity: "INFO",
    count: 3,
    logs: [
      { id: "req-1", log: "Requested is served to 3uedhx2wedock" },
      { id: "req-2", log: "Requested is served to 8fjsk29dkx" },
      { id: "req-3", log: "Requested is served to pq83nxlow2" },
    ]
  },
  {
    clusterId: 2,
    template: "Connection timeout for user <*>",
    severity: "ERROR",
    count: 2,
    logs: [
      { id: "req-4", log: "Connection timeout for user abc123" },
      { id: "req-5", log: "Connection timeout for user xyz789" },
    ]
  },
  {
    clusterId: 3,
    template: "Connection timeout for user <*>",
    severity: "WARN", // same template, separate cluster (different severity)
    count: 1,
    logs: [
      { id: "req-6", log: "Connection timeout for user lmn456" },
    ]
  },
  {
    clusterId: 4,
    template: "Disk usage at <*> on <*>",
    severity: "WARN",
    count: 2,
    logs: [
      { id: "req-7", log: "Disk usage at 85% on server-01" },
      { id: "req-8", log: "Disk usage at 92% on server-03" },
    ]
  }
]
```

## API
### clusterLogs(logs, options?)

Clusters an array of log entries into templates, grouped by severity.

```ts
clusterLogs(logs: LogEntry[], options?: DrainOptions): ClusterResult[]
```

| Parameter | Type | Description |
|-----------|------|-------------|
| `logs` | `LogEntry[]` | Array of structured log entries |
| `options` | `DrainOptions` | Optional configuration (see below) |
#### DrainOptions
| Option | Type | Default | Description |
|--------|------|---------|-------------|
| simTh | number | 0.4 | Similarity threshold (0-1). Higher values create more clusters. |
| depth | number | 4 | Prefix tree depth (minimum 3). Higher values give more precise matching. |
| maxChildren | number | 100 | Maximum children per tree node. |
| maxClusters | number | undefined | Maximum clusters to keep. Oldest are evicted via LRU when exceeded. |
| extraDelimiters | string[] | [] | Additional characters to split tokens on (beyond whitespace). |
| masking | MaskingRule[] | [] | Regex rules to mask tokens before clustering. |
| snapshot | DrainSnapshot | undefined | Restore state from a previous toJSON() call. Used with createDrain(). |
#### LogEntry
| Field | Type | Description |
|-------|------|-------------|
| id | string \| number | Unique identifier for the log entry |
| log | string | The log message content |
| severity | string | Severity level (e.g. "INFO", "ERROR", "WARN") |
#### ClusterResult

Logs are clustered per severity: the same log pattern with different severities produces separate clusters.
| Field | Type | Description |
|-------|------|-------------|
| clusterId | number | Unique cluster identifier (assigned in creation order) |
| template | string | Log template with <*> wildcards for variable parts |
| severity | string | Severity shared by all logs in this cluster |
| count | number | Number of logs in this cluster |
| logs | Array<{ id, log }> | Original entries (id + log message) in input order |
#### MaskingRule
| Field | Type | Description |
|-------|------|-------------|
| regex | RegExp | Pattern to match (use g flag for global replacement) |
| maskWith | string | Label for the mask (appears as <:label:> in templates) |
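In effect, a masking rule is a global regex replacement applied to each line before tokenization, with the match rewritten as `<:label:>`. A minimal sketch of that behavior (illustrative only, not the library's internal code):

```ts
// Sketch: how a MaskingRule rewrites a line before tokenization.
// The rule below is a hypothetical example, not part of the library.
const rule = { regex: /\b\d{1,3}(\.\d{1,3}){3}\b/g, maskWith: "IP" };

const line = "login from 10.0.0.1 failed";
const masked = line.replace(rule.regex, `<:${rule.maskWith}:>`);
// masked === "login from <:IP:> failed"
```

Because masked tokens are identical across logs, they never vary and so never degrade into plain `<*>` wildcards.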
## Incremental / Batch Processing

For processing logs in batches (e.g. streaming or tailing files), use `createDrain()` to maintain state across batches:
```js
import { createDrain } from 'drain3-js';

const drain = createDrain({ simTh: 0.4 });

// Process the first batch - returns results for this batch
const result1 = drain.addLogs([
  { id: 1, log: "connected to server-01", severity: "INFO" },
  { id: 2, log: "connected to server-02", severity: "INFO" },
  { id: 3, log: "error 404 on /api/users", severity: "ERROR" },
]);
// result1[0] = { template: "connected to <*>", severity: "INFO", count: 2, logs: [{id:1, ...}, {id:2, ...}] }
// result1[1] = { template: "error <*> on /api/users", severity: "ERROR", count: 1, logs: [{id:3, ...}] }

// Process the second batch - builds on existing templates
const result2 = drain.addLogs([
  { id: 4, log: "connected to server-03", severity: "INFO" },
  { id: 5, log: "error 500 on /api/orders", severity: "ERROR" },
]);
// result2[0] = { template: "connected to <*>", severity: "INFO", count: 3, logs: [{id:4, ...}] }
// result2[1] = { template: "error <*> on <*>", severity: "ERROR", count: 2, logs: [{id:5, ...}] }
// (count is the total across all batches; logs contains only this batch)
```

## Saving and Restoring State
Save the Drain state as a JSON snapshot and restore it later. Snapshots are lightweight: they store only templates, counts, and severities, not the original log strings.
```js
// Save state
const snapshot = drain.toJSON();
const json = JSON.stringify(snapshot);
// Store `json` in a file, database, Redis, etc.

// Later: restore and continue
const restored = createDrain({ simTh: 0.4, snapshot: JSON.parse(json) });
const result3 = restored.addLogs(newBatch);
// Picks up where it left off - counts accumulate, templates evolve
```

A snapshot for 1,000 clusters is roughly 80 KB, regardless of how many logs were processed.
### createDrain(options?)

Creates a stateful Drain instance for incremental processing.

Returns: `DrainInstance`

| Method | Returns | Description |
|--------|---------|-------------|
| `addLogs(logs)` | `ClusterResult[]` | Process a batch. `count` is the total across all batches; `logs` contains only this batch. |
| `getClusters()` | `ClusterResult[]` | All current clusters with total counts; `logs` is empty. |
| `toJSON()` | `DrainSnapshot` | Lightweight snapshot for persistence. Pass as `options.snapshot` to restore. |
## Examples

### Masking sensitive patterns

```js
const clusters = clusterLogs(logs, {
  masking: [
    { regex: /\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}/g, maskWith: "IP" },
    { regex: /\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b/g, maskWith: "UUID" },
  ],
});
// Templates will contain <:IP:> and <:UUID:> instead of <*>
```

### Tuning similarity
```js
// Strict: only very similar logs merge
const strict = clusterLogs(logs, { simTh: 0.8 });

// Loose: aggressively merge similar logs
const loose = clusterLogs(logs, { simTh: 0.2 });
```

### Extra delimiters
```js
// Split on = and ; in addition to whitespace
const clusters = clusterLogs(logs, {
  extraDelimiters: ["=", ";"],
});
```

### Limiting cluster count
```js
// Keep at most 1000 clusters, evicting the oldest via LRU
const clusters = clusterLogs(logs, { maxClusters: 1000 });
```

## How It Works
The Drain algorithm parses logs using a fixed-depth prefix tree:

1. **Preprocess** - optionally mask known patterns (IPs, UUIDs, etc.) with labeled placeholders
2. **Tokenize** - split the log by whitespace (and any extra delimiters) into tokens
3. **Group by severity** - each severity gets its own prefix tree, so clusters never mix severities
4. **Tree lookup** - walk the prefix tree: first by token count, then by leading tokens
5. **Similarity check** - compare with candidate clusters; if similarity >= threshold, merge
6. **Merge or create** - matched: update the template (differing tokens become `<*>`); unmatched: create a new cluster
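The similarity check and merge steps can be sketched in a few lines. This is an illustrative re-derivation of Drain's token-wise similarity (matching positions divided by sequence length, with existing wildcards counting as matches), not the library's actual implementation:

```ts
// Sketch of Drain-style similarity and template merging.
// Assumes both sequences have the same token count (Drain
// groups by length before comparing).
const WILDCARD = "<*>";

function seqSimilarity(template: string[], tokens: string[]): number {
  let matches = 0;
  for (let i = 0; i < template.length; i++) {
    // An existing wildcard matches any token
    if (template[i] === tokens[i] || template[i] === WILDCARD) matches++;
  }
  return matches / template.length;
}

function mergeTemplate(template: string[], tokens: string[]): string[] {
  // Positions that differ collapse to the wildcard
  return template.map((t, i) => (t === tokens[i] ? t : WILDCARD));
}

const a = "Connection timeout for user abc123".split(" ");
const b = "Connection timeout for user xyz789".split(" ");
const sim = seqSimilarity(a, b);        // 4/5 = 0.8, above the default simTh of 0.4
const merged = mergeTemplate(a, b);     // ["Connection", "timeout", "for", "user", "<*>"]
```

This is why a higher `simTh` produces more, tighter clusters: fewer candidate pairs clear the threshold, so fewer logs merge into a shared template.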
## Compatibility
- Node.js 18+
- Cloudflare Workers
- Modern browsers (ES2020)
- Zero dependencies
## License
MIT
