# drain3-js

v1.1.0
A zero-dependency JavaScript/TypeScript port of Drain3, a log parsing algorithm that automatically clusters log messages into templates.
- **Input:** an array of structured log entries `{ id, log, severity }`
- **Output:** clustered templates grouped by severity, with log IDs preserved
## Installation

```sh
npm install drain3-js
```

## Quick Start
```js
import { clusterLogs } from 'drain3-js';

const logs = [
  { id: "req-1", log: "Requested is served to 3uedhx2wedock", severity: "INFO" },
  { id: "req-2", log: "Requested is served to 8fjsk29dkx", severity: "INFO" },
  { id: "req-3", log: "Requested is served to pq83nxlow2", severity: "INFO" },
  { id: "req-4", log: "Connection timeout for user abc123", severity: "ERROR" },
  { id: "req-5", log: "Connection timeout for user xyz789", severity: "ERROR" },
  { id: "req-6", log: "Connection timeout for user lmn456", severity: "WARN" },
  { id: "req-7", log: "Disk usage at 85% on server-01", severity: "WARN" },
  { id: "req-8", log: "Disk usage at 92% on server-03", severity: "WARN" },
];

const clusters = clusterLogs(logs);
console.log(clusters);
```

Output:
```js
[
  {
    clusterId: 1,
    template: "Requested is served to <*>",
    severity: "INFO",
    count: 3,
    logs: [
      { id: "req-1", log: "Requested is served to 3uedhx2wedock" },
      { id: "req-2", log: "Requested is served to 8fjsk29dkx" },
      { id: "req-3", log: "Requested is served to pq83nxlow2" },
    ]
  },
  {
    clusterId: 2,
    template: "Connection timeout for user <*>",
    severity: "ERROR",
    count: 2,
    logs: [
      { id: "req-4", log: "Connection timeout for user abc123" },
      { id: "req-5", log: "Connection timeout for user xyz789" },
    ]
  },
  {
    clusterId: 3,
    template: "Connection timeout for user <*>",
    severity: "WARN", // same template, separate cluster (different severity)
    count: 1,
    logs: [
      { id: "req-6", log: "Connection timeout for user lmn456" },
    ]
  },
  {
    clusterId: 4,
    template: "Disk usage at <*> on <*>",
    severity: "WARN",
    count: 2,
    logs: [
      { id: "req-7", log: "Disk usage at 85% on server-01" },
      { id: "req-8", log: "Disk usage at 92% on server-03" },
    ]
  }
]
```

## API
### clusterLogs(logs, options?)

Clusters an array of log entries into templates, grouped by severity.

```ts
clusterLogs(logs: LogEntry[], options?: DrainOptions): ClusterResult[]
```

| Parameter | Type | Description |
|-----------|------|-------------|
| `logs` | `LogEntry[]` | Array of structured log entries |
| `options` | `DrainOptions` | Optional configuration (see below) |
#### DrainOptions
| Option | Type | Default | Description |
|--------|------|---------|-------------|
| simTh | number | 0.4 | Similarity threshold (0-1). Higher values create more clusters. |
| depth | number | 4 | Prefix tree depth (minimum 3). Higher values give more precise matching. |
| maxChildren | number | 100 | Maximum children per tree node. |
| maxClusters | number | undefined | Maximum clusters to keep. Oldest are evicted via LRU when exceeded. |
| extraDelimiters | string[] | [] | Additional characters to split tokens on (beyond whitespace). |
| masking | MaskingRule[] | [] | Regex rules to mask tokens before clustering. |
| snapshot | DrainSnapshot | undefined | Restore state from a previous toJSON() call. Used with createDrain(). |
#### LogEntry
| Field | Type | Description |
|-------|------|-------------|
| id | string \| number | Unique identifier for the log entry |
| log | string | The log message content |
| severity | string | Severity level (e.g. "INFO", "ERROR", "WARN") |
#### ClusterResult

Logs are clustered per severity: the same log pattern with different severities produces separate clusters.
| Field | Type | Description |
|-------|------|-------------|
| clusterId | number | Unique cluster identifier (assigned in creation order) |
| template | string | Log template with <*> wildcards for variable parts |
| severity | string | Severity shared by all logs in this cluster |
| count | number | Number of logs in this cluster |
| logs | Array<{ id, log }> | Original entries (id + log message) in input order |
#### MaskingRule
| Field | Type | Description |
|-------|------|-------------|
| regex | RegExp | Pattern to match (use g flag for global replacement) |
| maskWith | string | Label for the mask (appears as <:label:> in templates) |
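In effect, a masking rule is a global regex replacement applied to each line before tokenization, with the match rewritten as `<:label:>`. A minimal sketch of that behavior (illustrative only, not the library's internal code):

```ts
// Sketch: how a MaskingRule rewrites a line before tokenization.
// The rule below is a hypothetical example, not part of the library.
const rule = { regex: /\b\d{1,3}(\.\d{1,3}){3}\b/g, maskWith: "IP" };

const line = "login from 10.0.0.1 failed";
const masked = line.replace(rule.regex, `<:${rule.maskWith}:>`);
// masked === "login from <:IP:> failed"
```

Because masked tokens are identical across logs, they never vary and so never degrade into plain `<*>` wildcards.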
## Incremental / Batch Processing

For processing logs in batches (e.g. streaming or tailing files), use `createDrain()` to maintain state across batches:
```js
import { createDrain } from 'drain3-js';

const drain = createDrain({ simTh: 0.4 });

// Process the first batch - returns results for this batch
const result1 = drain.addLogs([
  { id: 1, log: "connected to server-01", severity: "INFO" },
  { id: 2, log: "connected to server-02", severity: "INFO" },
  { id: 3, log: "error 404 on /api/users", severity: "ERROR" },
]);
// result1[0] = { template: "connected to <*>", severity: "INFO", count: 2, logs: [{id:1, ...}, {id:2, ...}] }
// result1[1] = { template: "error <*> on /api/users", severity: "ERROR", count: 1, logs: [{id:3, ...}] }

// Process the second batch - builds on existing templates
const result2 = drain.addLogs([
  { id: 4, log: "connected to server-03", severity: "INFO" },
  { id: 5, log: "error 500 on /api/orders", severity: "ERROR" },
]);
// result2[0] = { template: "connected to <*>", severity: "INFO", count: 3, logs: [{id:4, ...}] }
// result2[1] = { template: "error <*> on <*>", severity: "ERROR", count: 2, logs: [{id:5, ...}] }
// (count is the total across all batches; logs contains only this batch)
```

## Saving and Restoring State
Save the Drain state as a JSON snapshot and restore it later. Snapshots are lightweight: they store only templates, counts, and severities, not the original log strings.
```js
// Save state
const snapshot = drain.toJSON();
const json = JSON.stringify(snapshot);
// Store `json` in a file, database, Redis, etc.

// Later: restore and continue
const restored = createDrain({ simTh: 0.4, snapshot: JSON.parse(json) });
const result3 = restored.addLogs(newBatch);
// Picks up where it left off - counts accumulate, templates evolve
```

A snapshot for 1,000 clusters is roughly 80 KB, regardless of how many logs were processed.
### createDrain(options?)

Creates a stateful Drain instance for incremental processing.

Returns: `DrainInstance`

| Method | Returns | Description |
|--------|---------|-------------|
| `addLogs(logs)` | `ClusterResult[]` | Process a batch. `count` is the total across all batches; `logs` contains only this batch. |
| `getClusters()` | `ClusterResult[]` | All current clusters with total counts; `logs` is empty. |
| `toJSON()` | `DrainSnapshot` | Lightweight snapshot for persistence. Pass as `options.snapshot` to restore. |
## Examples

### Masking sensitive patterns

```js
const clusters = clusterLogs(logs, {
  masking: [
    { regex: /\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}/g, maskWith: "IP" },
    { regex: /\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b/g, maskWith: "UUID" },
  ],
});
// Templates will contain <:IP:> and <:UUID:> instead of <*>
```

### Tuning similarity
```js
// Strict: only very similar logs merge
const strict = clusterLogs(logs, { simTh: 0.8 });

// Loose: aggressively merge similar logs
const loose = clusterLogs(logs, { simTh: 0.2 });
```

### Extra delimiters
```js
// Split on = and ; in addition to whitespace
const clusters = clusterLogs(logs, {
  extraDelimiters: ["=", ";"],
});
```

### Limiting cluster count
```js
// Keep at most 1000 clusters, evicting the oldest via LRU
const clusters = clusterLogs(logs, { maxClusters: 1000 });
```

## How It Works
The Drain algorithm parses logs using a fixed-depth prefix tree:

1. **Preprocess** - optionally mask known patterns (IPs, UUIDs, etc.) with labeled placeholders
2. **Tokenize** - split the log by whitespace (and any extra delimiters) into tokens
3. **Group by severity** - each severity gets its own prefix tree, so clusters never mix severities
4. **Tree lookup** - walk the prefix tree: first by token count, then by leading tokens
5. **Similarity check** - compare with candidate clusters; if similarity >= threshold, merge
6. **Merge or create** - matched: update the template (differing tokens become `<*>`); unmatched: create a new cluster
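The similarity check and merge steps can be sketched in a few lines. This is an illustrative re-derivation of Drain's token-wise similarity (matching positions divided by sequence length, with existing wildcards counting as matches), not the library's actual implementation:

```ts
// Sketch of Drain-style similarity and template merging.
// Assumes both sequences have the same token count (Drain
// groups by length before comparing).
const WILDCARD = "<*>";

function seqSimilarity(template: string[], tokens: string[]): number {
  let matches = 0;
  for (let i = 0; i < template.length; i++) {
    // An existing wildcard matches any token
    if (template[i] === tokens[i] || template[i] === WILDCARD) matches++;
  }
  return matches / template.length;
}

function mergeTemplate(template: string[], tokens: string[]): string[] {
  // Positions that differ collapse to the wildcard
  return template.map((t, i) => (t === tokens[i] ? t : WILDCARD));
}

const a = "Connection timeout for user abc123".split(" ");
const b = "Connection timeout for user xyz789".split(" ");
const sim = seqSimilarity(a, b);        // 4/5 = 0.8, above the default simTh of 0.4
const merged = mergeTemplate(a, b);     // ["Connection", "timeout", "for", "user", "<*>"]
```

This is why a higher `simTh` produces more, tighter clusters: fewer candidate pairs clear the threshold, so fewer logs merge into a shared template.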
## Compatibility
- Node.js 18+
- Cloudflare Workers
- Modern browsers (ES2020)
- Zero dependencies
## License
MIT
