logpare
v0.1.0
Semantic log compression that reduces repetitive events while preserving diagnostic information
logpare
Semantic log compression for LLM context windows. Reduces repetitive log output by 60-90% while preserving diagnostic information.
The Problem
AI assistants processing logs waste tokens on repetitive patterns. A 10,000-line log dump might contain 50 unique message templates repeated thousands of times — but the LLM sees (and bills for) every repetition.
The Solution
LogPare uses the Drain algorithm to identify log templates, then outputs a compressed format showing each template once with occurrence counts.
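The "one template, one count" idea can be sketched without the library. The snippet below is a deliberately simplified illustration, not LogPare's implementation: it skips Drain's parse tree and similarity threshold and simply treats any token containing a digit as a variable. The helper names `toTemplate` and `countTemplates` are hypothetical.

```javascript
// Simplified illustration only: real Drain uses a fixed-depth parse tree and a
// similarity threshold. Here, any token containing a digit counts as a variable.
function toTemplate(line) {
  return line
    .trim()
    .split(/\s+/)
    .map((token) => (/\d/.test(token) ? '<*>' : token))
    .join(' ');
}

// Count how often each template occurs, most frequent first.
function countTemplates(lines) {
  const counts = new Map();
  for (const line of lines) {
    const template = toTemplate(line);
    counts.set(template, (counts.get(template) || 0) + 1);
  }
  return Array.from(counts.entries()).sort((a, b) => b[1] - a[1]);
}

const sampleLines = [
  'INFO Connection from 192.168.1.1 established',
  'INFO Connection from 192.168.1.2 established',
  'INFO Connection from 10.0.0.55 established',
  'WARN Retry attempt 3 for job-17',
];

for (const [template, count] of countTemplates(sampleLines)) {
  console.log(`[${count}x] ${template}`);
}
// [3x] INFO Connection from <*> established
// [1x] WARN Retry attempt <*> for <*>
```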
Input (10,847 lines):
INFO Connection from 192.168.1.1 established
INFO Connection from 192.168.1.2 established
INFO Connection from 10.0.0.55 established
... (10,844 more similar lines)
Output (23 templates):
=== Log Compression Summary ===
Input: 10,847 lines → 23 templates (99.8% reduction)
Top templates by frequency:
1. [4,521x] INFO Connection from <*> established
2. [3,892x] DEBUG Request <*> processed in <*>
3. [1,203x] WARN Retry attempt <*> for <*>
...
Installation
As a CLI tool (recommended for command-line usage)
Install globally to use logpare directly from anywhere:
npm install -g logpare
# Now works directly
logpare server.log
As a library
Install locally in your project for programmatic usage:
npm install logpare
# or
pnpm add logpare
Note: Local installs require npx to run the CLI:
npx logpare server.log
CLI Usage
LogPare includes a command-line interface for quick log compression:
# Compress a log file
logpare server.log
# Pipe from stdin
cat /var/log/syslog | logpare
# JSON output
logpare --format json app.log
# Custom algorithm parameters
logpare --depth 5 --threshold 0.5 access.log
# Write to file
logpare --output templates.txt error.log
# Multiple files
logpare access.log error.log server.log
Using a local install? Prefix commands with npx:
npx logpare server.log
cat /var/log/syslog | npx logpare
CLI Options
| Option | Short | Description | Default |
|--------|-------|-------------|---------|
| --format | -f | Output format: summary, detailed, json | summary |
| --output | -o | Write output to file | stdout |
| --depth | -d | Parse tree depth | 4 |
| --threshold | -t | Similarity threshold (0.0-1.0) | 0.4 |
| --max-children | -c | Max children per node | 100 |
| --max-clusters | -m | Max total clusters | 1000 |
| --max-templates | -n | Max templates in output | 50 |
| --help | -h | Show help | |
| --version | -v | Show version | |
Programmatic Usage
Simple API
import { compress } from 'logpare';
const logs = [
'INFO Connection from 192.168.1.1 established',
'INFO Connection from 192.168.1.2 established',
'ERROR Connection timeout after 30s',
'INFO Connection from 10.0.0.1 established',
];
const result = compress(logs);
console.log(result.formatted);
// === Log Compression Summary ===
// Input: 4 lines → 2 templates (50.0% reduction)
// ...
Text Input
import fs from 'node:fs';
import { compressText } from 'logpare';
const logFile = fs.readFileSync('app.log', 'utf-8');
const result = compressText(logFile, { format: 'json' });
Advanced API
import { createDrain, defineStrategy } from 'logpare';
// Custom preprocessing strategy
const customStrategy = defineStrategy({
patterns: {
requestId: /req-[a-z0-9]+/gi,
},
getSimThreshold: (depth) => depth < 2 ? 0.5 : 0.4,
});
const drain = createDrain({
depth: 4,
maxClusters: 500,
preprocessing: customStrategy,
});
// `logs` is an array of log lines, as in the Simple API example
drain.addLogLines(logs);
const result = drain.getResult('detailed');
Output Formats
Summary (default)
Compact overview with top templates and rare events:
=== Log Compression Summary ===
Input: 10,847 lines → 23 templates (99.8% reduction)
Top templates by frequency:
1. [4,521x] INFO Connection from <*> established
2. [3,892x] DEBUG Request <*> processed in <*>
3. [1,203x] WARN Retry attempt <*> for <*>
Rare events (≤5 occurrences):
- [1x] FATAL Database connection lost
- [2x] ERROR Out of memory exception in <*>
Detailed
Full template list with all diagnostic metadata:
Template #1: INFO Connection from <*> established
Occurrences: 4,521
Severity: info
First seen: line 1
Last seen: line 10,234
Sample values: [["192.168.1.1"], ["10.0.0.55"], ["172.16.0.1"]]
URLs: api.example.com, cdn.example.com
Status codes: 200, 201
Correlation IDs: req-abc123, trace-xyz789
Durations: 45ms, 120ms, 2.5s
JSON
Machine-readable format with version field and complete metadata:
{
"version": "1.1",
"stats": {
"inputLines": 10847,
"uniqueTemplates": 23,
"compressionRatio": 0.998,
"estimatedTokenReduction": 0.95,
"processingTimeMs": 234
},
"templates": [{
"id": "abc123",
"pattern": "INFO Connection from <*> established",
"occurrences": 4521,
"severity": "info",
"isStackFrame": false,
"firstSeen": 1,
"lastSeen": 10234,
"sampleVariables": [["192.168.1.1"], ["10.0.0.55"]],
"urlSamples": ["api.example.com"],
"fullUrlSamples": ["https://api.example.com/v1/users"],
"statusCodeSamples": [200, 201],
"correlationIdSamples": ["req-abc123"],
"durationSamples": ["45ms", "120ms"]
}]
}
compress(logs, { format: 'json' });
Diagnostic Metadata
LogPare automatically extracts diagnostic information from matching log lines:
| Metadata | Description | Supported Formats |
|----------|-------------|-------------------|
| URLs | Hostnames and full URLs | https://..., http://... |
| Status codes | HTTP status codes | status 404, HTTP/1.1 500, code=200 |
| Correlation IDs | Request/trace identifiers | trace-id: xxx, request-id: xxx, UUIDs |
| Durations | Timing values | 45ms, 1.5s, 200µs, 2min, 1h |
This metadata is preserved in templates and available in detailed/JSON output formats.
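To make the table concrete, here is a minimal sketch of how durations and status codes could be pulled out of a line with regular expressions. The two patterns and the `extractDiagnostics` helper are assumptions written for this example; LogPare's own expressions may be stricter or broader.

```javascript
// Hypothetical patterns for illustration; logpare's own expressions may differ.
// \u00b5 is the micro sign, so "200\u00b5s" style values are covered.
const DURATION = /\b\d+(?:\.\d+)?(?:ms|\u00b5s|s|min|h)\b/g;
const STATUS = /(?:status\s+|HTTP\/\d\.\d\s+|code=)(\d{3})\b/gi;

function extractDiagnostics(line) {
  // String.prototype.match with a /g regex returns every match (or null)
  const durations = line.match(DURATION) || [];

  const statusCodes = [];
  STATUS.lastIndex = 0; // the regex is global and reused, so reset its cursor
  let match;
  while ((match = STATUS.exec(line)) !== null) {
    statusCodes.push(Number(match[1]));
  }
  return { durations, statusCodes };
}

console.log(extractDiagnostics('upstream returned status 404 after 45ms'));
// { durations: [ '45ms' ], statusCodes: [ 404 ] }
console.log(extractDiagnostics('GET /v1/users HTTP/1.1 500 took 1.5s'));
// { durations: [ '1.5s' ], statusCodes: [ 500 ] }
```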
Severity Detection
Each template is automatically tagged with a severity level:
| Severity | Detected Patterns |
|----------|------------------|
| error | ERROR, FATAL, Exception, Failed, TypeError, ReferenceError, panic |
| warning | WARN, Warning, Deprecated, [Violation] |
| info | Default for other logs |
Stack traces are also automatically detected (V8/Node.js, Firefox, Chrome DevTools formats) and marked with isStackFrame: true.
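A rough sketch of keyword-based severity tagging follows. The keywords come from the table above; case-insensitive substring matching and checking error patterns before warning patterns are assumptions of this example, and `detectSeverity` is a hypothetical name rather than a LogPare export.

```javascript
// Keywords are taken from the table above. Case-insensitive substring matching
// and "error wins over warning" are assumptions made for this sketch.
const ERROR_PATTERN = /ERROR|FATAL|Exception|Failed|TypeError|ReferenceError|panic/i;
const WARNING_PATTERN = /WARN|Deprecated|\[Violation\]/i;

function detectSeverity(template) {
  if (ERROR_PATTERN.test(template)) return 'error';
  if (WARNING_PATTERN.test(template)) return 'warning';
  return 'info'; // default for everything else
}

console.log(detectSeverity('FATAL Database connection lost'));       // error
console.log(detectSeverity('WARN Retry attempt <*> for <*>'));       // warning
console.log(detectSeverity('INFO Connection from <*> established')); // info
```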
API Reference
compress(lines, options?)
Compress an array of log lines.
- lines: string[] - Log lines to compress
- options.format: 'summary' | 'detailed' | 'json' - Output format (default: 'summary')
- options.maxTemplates: number - Max templates in output (default: 50)
- options.drain: DrainOptions - Algorithm configuration
Returns CompressionResult with templates, stats, and formatted output.
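The reduction figures quoted in this README are consistent with a ratio of `1 - templates / lines`. The sketch below assumes that formula (it is inferred from the examples, not confirmed by the library) and reproduces both percentages; `compressionRatio` and `formatReduction` are hypothetical helpers.

```javascript
// Assumption: ratio = 1 - templates / lines. This reproduces the figures shown
// in this README (10,847 -> 23 and 4 -> 2), but is not confirmed by the library.
function compressionRatio(inputLines, uniqueTemplates) {
  if (inputLines === 0) return 0; // avoid dividing by zero on empty input
  return 1 - uniqueTemplates / inputLines;
}

function formatReduction(inputLines, uniqueTemplates) {
  const percent = compressionRatio(inputLines, uniqueTemplates) * 100;
  return `${percent.toFixed(1)}% reduction`;
}

console.log(formatReduction(10847, 23)); // 99.8% reduction
console.log(formatReduction(4, 2));      // 50.0% reduction
```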
compressText(text, options?)
Compress a multi-line string (splits on newlines).
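For callers who want to split text themselves before calling `compress`, a minimal sketch is shown below. Handling of Windows line endings and of the trailing newline are choices made for this example; how `compressText` treats them is not documented here, and `splitLogText` is a hypothetical helper.

```javascript
// Hypothetical helper: split raw text into lines, accepting both \n and \r\n.
function splitLogText(text) {
  const lines = text.split(/\r?\n/);
  // A trailing newline leaves one empty final element; drop it
  if (lines.length > 0 && lines[lines.length - 1] === '') {
    lines.pop();
  }
  return lines;
}

console.log(splitLogText('INFO start\r\nWARN slow\nINFO done\n'));
// [ 'INFO start', 'WARN slow', 'INFO done' ]
```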
createDrain(options?)
Create a Drain instance for incremental processing.
- options.depth: number - Parse tree depth (default: 4)
- options.simThreshold: number - Similarity threshold 0-1 (default: 0.4)
- options.maxChildren: number - Max children per node (default: 100)
- options.maxClusters: number - Max total templates (default: 1000)
- options.preprocessing: ParsingStrategy - Custom preprocessing
- options.onProgress: ProgressCallback - Progress reporting callback
Progress Reporting
Track progress during long-running operations:
import { createDrain } from 'logpare';
const drain = createDrain({
onProgress: (event) => {
console.log(`${event.currentPhase}: ${event.processedLines} lines`);
if (event.percentComplete !== undefined) {
console.log(`Progress: ${event.percentComplete.toFixed(1)}%`);
}
}
});
drain.addLogLines(logs);
const result = drain.getResult();
The callback receives ProgressEvent with:
- processedLines: Lines processed so far
- totalLines: Total lines (if known)
- currentPhase: 'parsing' | 'clustering' | 'finalizing'
- percentComplete: 0-100 (only if totalLines known)
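If the callback fires often, logging every event can be noisy. The sketch below wraps a sink so it reports at most once per percentage step. `createThrottledReporter` is a hypothetical helper, not part of LogPare; it relies only on the `currentPhase` and `percentComplete` fields listed above, and the events in the loop are simulated.

```javascript
// Hypothetical helper (not part of logpare): calls `sink` at most once per
// `stepPercent` of progress, and stays quiet when the total is unknown.
function createThrottledReporter(stepPercent, sink) {
  let nextThreshold = 0;
  return (event) => {
    if (event.percentComplete === undefined) return;
    if (event.percentComplete >= nextThreshold) {
      sink(`${event.currentPhase}: ${event.percentComplete.toFixed(0)}%`);
      // Move the threshold past the current value so each step reports once
      nextThreshold = (Math.floor(event.percentComplete / stepPercent) + 1) * stepPercent;
    }
  };
}

// Simulated events; with logpare the function would be passed as onProgress.
const messages = [];
const report = createThrottledReporter(25, (message) => messages.push(message));
for (let processed = 0; processed <= 1000; processed += 10) {
  report({
    currentPhase: 'parsing',
    processedLines: processed,
    totalLines: 1000,
    percentComplete: (processed / 1000) * 100,
  });
}
console.log(messages);
// [ 'parsing: 0%', 'parsing: 25%', 'parsing: 50%', 'parsing: 75%', 'parsing: 100%' ]
```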
defineStrategy(overrides)
Create a custom preprocessing strategy.
const strategy = defineStrategy({
patterns: { customId: /id-\d+/g },
tokenize: (line) => line.split(','),
getSimThreshold: (depth) => 0.5,
});
Built-in Patterns
LogPare automatically masks common variable types:
- IPv4/IPv6 addresses
- Port numbers (e.g., :443, :8080)
- UUIDs
- Timestamps (ISO, Unix)
- File paths and URLs
- Hex IDs
- Block IDs (HDFS)
- Numbers with units (e.g., 250ms, 1024KB)
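As an illustration of what masking does to a line, the sketch below applies three hand-written patterns in order. These expressions and the `maskVariables` helper are assumptions for this example, not LogPare's built-in patterns, and they cover only a few of the types listed above.

```javascript
// Illustrative patterns only; logpare's built-in expressions may differ.
// Order matters: specific shapes (UUID, IPv4) run before generic numbers.
const MASKS = [
  /\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b/gi, // UUID
  /\b\d{1,3}(?:\.\d{1,3}){3}(?::\d+)?\b/g, // IPv4 with optional :port
  /\b\d+(?:\.\d+)?(?:ms|s|KB|MB|GB)\b/g, // numbers with units
];

function maskVariables(line) {
  // Apply each pattern in turn, replacing every match with the wildcard
  return MASKS.reduce((masked, pattern) => masked.replace(pattern, '<*>'), line);
}

console.log(maskVariables('Connection from 192.168.1.1:8080 took 250ms'));
// Connection from <*> took <*>
console.log(maskVariables('session 123e4567-e89b-12d3-a456-426614174000 closed after 1.5s'));
// session <*> closed after <*>
```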
Automatic detection features:
- Severity tagging: Templates are tagged as error, warning, or info
- Stack frame detection: Identifies stack traces (V8, Firefox, Chrome formats)
- Diagnostic extraction: Captures URLs, HTTP status codes, correlation IDs, and durations
Performance
- Speed: >10,000 lines/second
- Memory: O(templates), not O(lines)
- V8 Optimized: Uses Map for tree nodes, monomorphic constructors
Parameter Tuning Guide
When to Adjust Parameters
| Symptom | Cause | Solution |
|---------|-------|----------|
| Too many templates | Threshold too high | Lower simThreshold (e.g., 0.3) |
| Templates too generic | Threshold too low | Raise simThreshold (e.g., 0.5) |
| Similar logs not grouped | Depth too shallow | Increase depth (e.g., 5-6) |
| Too much memory usage | Too many clusters | Lower maxClusters |
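To see why the threshold moves the template count, it helps to compute a similarity score by hand. The sketch below uses the position-wise token match fraction described for Drain. How LogPare treats wildcards and differing lengths is not shown in this README, so those details are simplified here, and `similarity` and `wouldGroup` are hypothetical names.

```javascript
// Fraction of positions where two token sequences hold the same token.
// Simplification: sequences of different length never match.
function similarity(templateTokens, lineTokens) {
  if (templateTokens.length !== lineTokens.length) return 0;
  if (templateTokens.length === 0) return 1;
  let same = 0;
  for (let i = 0; i < templateTokens.length; i++) {
    if (templateTokens[i] === lineTokens[i]) same++;
  }
  return same / templateTokens.length;
}

function wouldGroup(a, b, simThreshold) {
  return similarity(a.split(' '), b.split(' ')) >= simThreshold;
}

const first = 'ERROR Unable to open file';
const second = 'ERROR Failed to parse config';
console.log(similarity(first.split(' '), second.split(' '))); // 0.4 (2 of 5 tokens)
console.log(wouldGroup(first, second, 0.3)); // true  -> merged, fewer templates
console.log(wouldGroup(first, second, 0.5)); // false -> kept apart, more templates
```

The two lines share only `ERROR` and `to`, so a threshold of 0.3 would merge them into one generic template while 0.5 keeps them separate, which matches the first two rows of the table.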
Recommended Settings by Log Type
Structured logs (JSON, CSV):
{ depth: 3, simThreshold: 0.5 }
Noisy application logs:
{ depth: 5, simThreshold: 0.3 }
System logs (syslog, journald):
{ depth: 4, simThreshold: 0.4 } // defaults work well
High-volume logs (>1M lines):
{ maxClusters: 500, maxChildren: 50 }
Troubleshooting
"Too many templates"
If you're getting more templates than expected:
Lower the similarity threshold: Templates that should group together may not meet the default 0.4 threshold
compress(logs, { drain: { simThreshold: 0.3 } })
Check for unmasked variables: Custom IDs or tokens may need masking
const strategy = defineStrategy({ patterns: { customId: /your-pattern/g } });
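Before handing a custom pattern to `defineStrategy`, it can be worth checking what it matches on a few real lines. The `previewPattern` helper below is hypothetical and independent of LogPare. It builds a fresh regular expression for each line because a `/g` regex keeps its `lastIndex` between calls, which can make repeated `.test()` checks skip matches.

```javascript
// Hypothetical helper: shows what a candidate pattern would match and mask.
function previewPattern(pattern, sampleLines) {
  // Force the global flag so every occurrence in a line is reported
  const flags = pattern.flags.includes('g') ? pattern.flags : pattern.flags + 'g';
  return sampleLines.map((line) => {
    // Fresh RegExp objects avoid lastIndex state leaking between lines
    const matches = line.match(new RegExp(pattern.source, flags)) || [];
    const masked = line.replace(new RegExp(pattern.source, flags), '<*>');
    return { line, matches, masked };
  });
}

const preview = previewPattern(/sess-[a-f0-9]+/gi, [
  'login ok sess-9f3a2c user=alice',
  'logout SESS-00ff',
  'heartbeat',
]);
for (const row of preview) {
  console.log(`${row.matches.length} match(es): ${row.masked}`);
}
// 1 match(es): login ok <*> user=alice
// 1 match(es): logout <*>
// 0 match(es): heartbeat
```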
"Templates are too generic"
If templates are over-grouping different log types:
Raise the similarity threshold:
compress(logs, { drain: { simThreshold: 0.5 } })
Increase tree depth:
compress(logs, { drain: { depth: 5 } })
"Memory usage too high"
For very large log files:
Limit clusters: Set maxClusters to cap memory usage
compress(logs, { drain: { maxClusters: 500 } })
Process in batches: Use createDrain() and process chunks
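A minimal sketch of the batching step is shown below. The `chunk` generator is a hypothetical helper. The commented lines show how it might be combined with `createDrain()`, `addLogLines()`, and `getResult()` from the examples above, and the batch size of 10,000 is arbitrary.

```javascript
// Hypothetical helper: yields fixed-size slices of an array.
function* chunk(items, size) {
  if (size <= 0) throw new RangeError('size must be positive');
  for (let start = 0; start < items.length; start += size) {
    yield items.slice(start, start + size);
  }
}

// With logpare, each batch would be fed to the same drain instance, e.g.:
//   const drain = createDrain({ maxClusters: 500 });
//   for (const batch of chunk(allLines, 10000)) drain.addLogLines(batch);
//   const result = drain.getResult();
const batches = Array.from(chunk(['a', 'b', 'c', 'd', 'e'], 2));
console.log(batches);
// [ [ 'a', 'b' ], [ 'c', 'd' ], [ 'e' ] ]
```

Chunking an array that is already in memory only limits how much is handed over at once. To bound memory for very large files, the lines themselves would need to arrive from a stream.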
"Some patterns not being masked"
Add custom patterns for domain-specific tokens:
const strategy = defineStrategy({
patterns: {
sessionId: /sess-[a-f0-9]+/gi,
orderId: /ORD-\d{10}/g,
}
});
Coming from Python Drain3?
See MIGRATION.md for a detailed comparison and migration guide.
License
MIT
