@piscodm/claw-compressor
v0.1.3
LLM context compression plugin for OpenClaw — reduces token usage by compressing tool results and normalizing messages at write time.
claw-compressor
An OpenClaw plugin that compresses tool results and normalizes messages at write time, reducing LLM token usage across all sessions — no proxy, no routing, no payments required.
Why
Every tool call (file reads, shell output, web fetches, API responses) gets stored in the session transcript and re-sent to the LLM on every subsequent turn. Most of that output is noise. A 10KB exec result usually has ~300 chars of useful signal. The rest is repeated headers, empty lines, and irrelevant detail.
This plugin intercepts messages at write time, compresses them once, and stores the compressed version permanently. Every future turn in that session benefits automatically.
This plugin works with any provider (Anthropic, OpenAI, etc.) called directly — it hooks into OpenClaw's session pipeline, not the HTTP layer.
How It Works
The plugin registers four OpenClaw hooks:
tool_result_persist
Fires before a tool result is written to the session transcript. This is the main compression path — the biggest wins come from compressing exec, read, web_fetch, and other tool outputs.
Compression is only applied when the content exceeds observationThreshold (default: 500 chars). Below that, the result is stored as-is.
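The gate can be sketched as follows (a minimal illustration — the function name and hook wiring are assumptions, not the plugin's actual source):

```javascript
// Sketch of the tool_result_persist gate: store small results verbatim,
// compress anything above the threshold. `compress` is a placeholder.
const OBSERVATION_THRESHOLD = 500; // default from the config table below

function onToolResultPersist(text, compress) {
  if (text.length <= OBSERVATION_THRESHOLD) return text; // stored as-is
  return compress(text); // compressed once, persisted permanently
}
```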
before_message_write
Fires before any message (user, assistant, system) is written. Used only for whitespace normalization — a low-risk pass that catches redundant blank lines and leading/trailing space.
Tool results are skipped here since they're already handled by tool_result_persist.
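The normalization pass amounts to two low-risk transformations (an illustrative approximation, not the plugin's exact rules):

```javascript
// Sketch of the whitespace pass: trim trailing spaces/tabs per line,
// then collapse runs of 2+ blank lines into a single blank line.
function normalizeWhitespace(text) {
  return text
    .split('\n')
    .map(line => line.replace(/[ \t]+$/, '')) // drop trailing whitespace
    .join('\n')
    .replace(/\n{3,}/g, '\n\n');              // collapse blank-line runs
}
```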
gateway_start
Fires once when the gateway starts. Kicks off an async, non-blocking check against the npm registry to see if a newer version of the plugin is available. No startup latency — runs after a 3-second delay.
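Conceptually the check looks like this (a simplified sketch; the function names are assumptions, though the registry endpoint shown is npm's standard `latest` dist-tag URL):

```javascript
// Sketch: non-blocking update check, fired 3 seconds after gateway start.
// Any failure (offline, slow registry) is swallowed silently.
function isNewer(latest, current) {
  const a = latest.split('.').map(Number);
  const b = current.split('.').map(Number);
  for (let i = 0; i < 3; i++) {
    if ((a[i] || 0) !== (b[i] || 0)) return (a[i] || 0) > (b[i] || 0);
  }
  return false;
}

function scheduleUpdateCheck(currentVersion, onUpdate) {
  setTimeout(async () => {
    try {
      const res = await fetch('https://registry.npmjs.org/@piscodm/claw-compressor/latest');
      const { version } = await res.json();
      if (isNewer(version, currentVersion)) onUpdate(version);
    } catch {
      // fail silently — no impact on startup
    }
  }, 3000);
}
```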
session_start
Fires at the start of each agent session. If an update was found at gateway start, injects a system event into the session so the agent naturally informs the user. Each session is notified at most once per gateway boot.
Compression Layers
Three layers run in order:
| Layer | What it does | Default |
|-------|-------------|---------|
| observation | Extracts key signal from large tool outputs (errors, status codes, JSON keys, first/last lines) | ✅ enabled |
| json-compact | Strips whitespace from pretty-printed JSON blobs embedded in output | ✅ enabled |
| whitespace | Normalizes repeated blank lines, trims trailing spaces | ✅ enabled |
Layers that require the model to decode a codebook (dictionary, path shortening, dynamic codebook) are intentionally excluded — they add tokens in the header and create risk of the model misunderstanding compressed content.
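The ordered pipeline amounts to folding the enabled layers over the text (an illustrative sketch; the real pipeline lives in src/compression/index.ts):

```javascript
// Sketch of compress(): apply enabled layers in order and record
// which ones actually shrank the content (as seen in the stats "layers" field).
function compress(text, layers) {
  const applied = [];
  let out = text;
  for (const { name, enabled, apply } of layers) {
    if (!enabled) continue;
    const next = apply(out);
    if (next.length < out.length) applied.push(name);
    out = next;
  }
  return { text: out, layers: applied };
}
```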
Observation Layer Details
The heaviest hitter. For tool results over the threshold:
- Detects and preserves error messages / stack traces
- Preserves status lines (exit codes, HTTP status, success/failure markers)
- Extracts top-level JSON keys (structure without values)
- Keeps the first and last N lines of output
- Truncates to `observationMaxLength` (default: 300 chars)
Result: a 10KB exec output becomes a 200-char summary that retains all actionable information.
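A stripped-down sketch of that extraction (the heuristics here are illustrative assumptions; the real logic in src/compression/layers/observation.ts is richer):

```javascript
// Sketch of the observation layer: keep error/status lines plus the
// first and last N lines, dedupe, then truncate to maxLength.
function observe(text, { maxLength = 300, edgeLines = 3 } = {}) {
  const lines = text.split('\n');
  const signal = lines.filter(l => /error|exception|exit code|HTTP \d{3}|fail/i.test(l));
  const head = lines.slice(0, edgeLines);
  const tail = lines.slice(-edgeLines);
  const kept = [...new Set([...head, ...signal, ...tail])].join('\n');
  return kept.length > maxLength ? kept.slice(0, maxLength) + '…' : kept;
}
```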
Installation
openclaw plugins install @piscodm/claw-compressor
openclaw gateway restart
That's it. The plugin activates automatically on the next gateway start. You'll see this in the logs:
[plugins] Claw Compressor active | layers: observation, json-compact, whitespace | threshold: 500 chars
Development / Local Install
To run from source (edits take effect after npm run build + gateway restart):
git clone https://github.com/jpbedoya/claw-compressor
cd claw-compressor
npm install
npm run build
openclaw plugins install -l .
openclaw gateway restart
Stats
Compression events are logged to ~/.openclaw/compressor-stats.jsonl (one JSON line per compressed message).
In chat: /compress_stats — shows last 7 days
With period: /compress_stats 30 — shows last 30 days (max 90)
From the shell:
node -e "
const fs = require('fs');
const lines = fs.readFileSync(process.env.HOME + '/.openclaw/compressor-stats.jsonl','utf8').trim().split('\n').map(l=>JSON.parse(l));
const totalOrig = lines.reduce((s,e)=>s+e.originalChars,0);
const totalComp = lines.reduce((s,e)=>s+e.compressedChars,0);
const saved = totalOrig - totalComp;
console.log('Saved:', saved.toLocaleString(), '/', totalOrig.toLocaleString(), '(' + Math.round(saved/totalOrig*100) + '%)');
console.log('Est tokens saved:', Math.round(saved/4).toLocaleString());
"
Each stats entry looks like:
{
"timestamp": "2026-03-05T04:26:32.370Z",
"toolName": "read",
"sessionKey": "agent:main:telegram:group:-1001234567890:topic:123",
"originalChars": 8432,
"compressedChars": 287,
"savedChars": 8145,
"layers": ["observation", "json-compact"]
}
Stats are global — all sessions (Telegram, Discord, sub-agents, etc.) write to the same file.
Configuration
In ~/.openclaw/openclaw.json, under plugins.entries.claw-compressor.config:
{
"plugins": {
"entries": {
"claw-compressor": {
"enabled": true,
"config": {
"enabled": true,
"observation": true,
"jsonCompact": true,
"whitespace": true,
"observationThreshold": 500,
"observationMaxLength": 300,
"logStats": true,
"statsPath": "~/.openclaw/compressor-stats.jsonl"
}
}
}
}
}
All fields are optional — defaults shown above.
| Field | Default | Description |
|-------|---------|-------------|
| enabled | true | Master switch. Set to false to disable all compression. |
| observation | true | Enable observation layer (main compression) |
| jsonCompact | true | Enable JSON compaction |
| whitespace | true | Enable whitespace normalization |
| observationThreshold | 500 | Min chars to trigger observation compression |
| observationMaxLength | 300 | Max chars after observation compression |
| logStats | true | Write stats to JSONL file |
| statsPath | ~/.openclaw/compressor-stats.jsonl | Path to stats file |
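Unset fields fall back to the defaults above; conceptually this is a shallow merge (sketch only — not the plugin's actual config loader):

```javascript
// Sketch: user config shallow-merged over defaults (mirrors the table above).
const DEFAULT_CONFIG = {
  enabled: true,
  observation: true,
  jsonCompact: true,
  whitespace: true,
  observationThreshold: 500,
  observationMaxLength: 300,
  logStats: true,
  statsPath: '~/.openclaw/compressor-stats.jsonl',
};

function resolveConfig(userConfig = {}) {
  return { ...DEFAULT_CONFIG, ...userConfig };
}
```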
Project Structure
claw-compressor/
├── package.json # npm package with openclaw.extensions field
├── tsconfig.json # TypeScript config (ESM, NodeNext)
├── openclaw.plugin.json # Plugin manifest (id, name, configSchema)
├── README.md # This file
├── src/
│ ├── index.ts # Plugin entry — registers hooks + /compress_stats command
│ ├── stats.ts # JSONL stats writer (initStats, logStats, buildStats)
│ ├── report.ts # Stats reader + formatter (loadStats, buildReport, formatReport)
│ └── compression/
│ ├── types.ts # CompressionConfig, DEFAULT_CONFIG, CompressionResult
│ ├── index.ts # compress() pipeline
│ └── layers/
│ ├── observation.ts # Observation layer (main compression logic)
│ ├── json-compact.ts # JSON compaction
│ └── whitespace.ts # Whitespace normalization
└── dist/ # Compiled output (loaded by OpenClaw)
Development
# Build
npm run build
# Build + watch
npm run build -- --watch
# After any change to src/:
npm run build && openclaw gateway restart
Key Technical Notes
- `AgentMessage` content format: Tool results (`role: "toolResult"`) use `content: Array<{ type: "text", text: string } | { type: "image", ... }>` — never a plain string. The plugin handles both formats via `getTextContent()` / `withContent()`.
- `tool_result_persist` is synchronous: The hook runner calls it with `runToolResultPersist()` (no async). Our compression is also synchronous — no issue.
- Write-time compression: Content is compressed once when stored. The model never sees the original. Every subsequent turn sends the compressed version automatically.
- Stats are cumulative: The JSONL file grows indefinitely. No rotation is built in — prune manually if needed.
- Update check is fully async: `gateway_start` fires the npm registry check after a 3-second delay. If the network is slow or down, the check fails silently — no impact on startup.
- `session_start` dedup: Each session is notified about a pending update at most once per gateway boot, tracked via an in-memory Set. No disk writes needed.
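The dual-format content handling mentioned above can be pictured as (a hypothetical reimplementation of `getTextContent()`, not the plugin's actual code):

```javascript
// Sketch: tool-result content may be a plain string (other message roles)
// or an array of typed blocks. Concatenate text blocks; skip images.
function getTextContent(content) {
  if (typeof content === 'string') return content;
  return content
    .filter(block => block.type === 'text')
    .map(block => block.text)
    .join('\n');
}
```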
Quality Assurance
The compression pipeline has two safety properties:
1. Per-tool thresholds — Read and web_fetch only compress above 2,000 chars and preserve up to 1,200 chars (the content is informational). exec and process compress above 300 chars and keep up to 400 (the output is mostly ephemeral noise).
2. Observation layer preserves actionable signal — error lines, status codes, key JSON fields, and first/last lines of output are always extracted before truncation.
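Those per-tool limits can be pictured as a lookup table (numbers taken from the text above; the structure and names are assumptions):

```javascript
// Sketch of the per-tool limits described above.
// Unlisted tools fall back to the global defaults (500 / 300).
const TOOL_LIMITS = {
  read:      { threshold: 2000, maxLength: 1200 }, // informational: keep more
  web_fetch: { threshold: 2000, maxLength: 1200 },
  exec:      { threshold: 300,  maxLength: 400 },  // ephemeral noise: compress hard
  process:   { threshold: 300,  maxLength: 400 },
};

function limitsFor(tool) {
  return TOOL_LIMITS[tool] || { threshold: 500, maxLength: 300 };
}
```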
Session transcript inspection confirmed: compressed results are meaningful and agents operated normally on compressed context. The most extreme cases (e.g. `read`: 3378→9 chars) turned out to be genuine short outputs — e.g. a `read` call on a non-existent file that returned an ENOENT error, correctly compressed to "not found".
Test Suite
18 tests across all layers (npm test, runs in ~150ms):
- Whitespace layer — collapses blank lines, trims trailing spaces, passes short content unchanged
- JSON compaction — validates round-trip correctness (`JSON.parse(result)` must equal original)
- Observation layer — asserts error lines, status, and result markers survive compression; respects `maxLength`
- Pipeline end-to-end — per-tool fixture tests assert key signal is preserved; ratio sanity checks ensure output is never longer than input
The fixture tests are the quality gate: each uses a real-world tool output and a must contain list of strings that the compressed result is required to include.
Results (First 2 Days, 2026-03-05 → 2026-03-06)
762 compression events across 28 sessions (including multiple spawned sub-agents):
| Metric | Value |
|--------|-------|
| Original size | 3,749,467 chars |
| Stored size | 856,574 chars |
| Chars saved | 2,892,893 (77%) |
| Tokens saved | ~723,223 |
| Cost saved | ~$2.17 @ $3/1M tokens |
Compression by Tool
| Tool | Events | Compression |
|------|--------|-------------|
| exec | 422 | 87% |
| browser | 22 | 97% |
| read | 250 | 74% |
| process | 7 | 82% |
| sessions_list | 3 | 97% |
| subagents | 1 | 96% |
browser and sessions_list lead at 97% — their outputs are large structured blobs with little reusable signal. exec at 87% is the volume driver (422 events). read at 74% is the char-volume driver (2.9M chars original).
Sub-agents drive the majority of savings — each doing heavy exec/read/browser work that compresses 87–97%. The orchestrating parent session compresses less (short sessions_spawn and message outputs have little to compress).
At this rate: ~$1/day saved on typical multi-agent usage. Token savings also improve model focus — less noise in the context window means less confusion over long sessions.
