# compress-on-input

Compress bloated tool results before they eat Claude's context window. Zero runtime dependencies.
## Quick Start

```bash
# Install globally
npm install -g compress-on-input

# Add hook to Claude Code (one-time setup)
compress-on-input install

# Verify everything works
compress-on-input check

# Restart Claude Code — done!
```

That's it. Every tool result is now automatically compressed before entering Claude's context.
## The Problem

MCP tools return massive payloads that burn through Claude's context window:

| Source | Typical Size | With compress-on-input |
|---|---|---|
| Screenshot (base64) | ~250k tokens | ~500 tokens (OCR text) |
| DOM snapshot | 10-50k tokens | 3-15k tokens |
| API response (500 rows) | 50-100k tokens | 2-5k tokens |
| Large text/docs | 10-50k tokens | 3-20k tokens |
Claude's attention is O(n²). More tokens = slower responses, higher cost, earlier compaction. compress-on-input fixes this at the source.
## How It Works

compress-on-input installs as a PostToolUse hook in Claude Code. After every tool call, it intercepts the result and compresses it before Claude sees it.

```
Tool executes → Hook fires → compress-on-input compresses → Claude receives compressed result
```

Works with all tools — MCP servers (Playwright, databases, APIs) and built-in tools (Read, Bash, Grep). No need to configure each server individually.
## Compression Strategies

Each result is automatically routed to the best compressor:

| Content Type | Strategy | What it does | Reduction |
|---|---|---|---|
| Screenshots | OCR | Apple Vision / Tesseract extracts text from image | ~99% |
| DOM snapshots | Cleanup | Strips noise, builds ref mapping table | 50-70% |
| Large JSON | Collapse | Schema-aware array/object summarization | 60-90% |
| Large text | Smart truncate | BM25-ranked middle + optional Gemini | 60-90% |
| Small content | Passthrough | Below threshold — untouched | 0% |
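A rough sketch of this routing step, with assumed block shapes and a crude 4-chars-per-token heuristic (the real logic lives in `src/classifier.ts` and `src/pipeline.ts`, and is more thorough than this):

```typescript
// Minimal sketch of content-aware routing. Shapes and heuristics here are
// illustrative assumptions, not the package's actual API.
type Strategy = "ocr" | "dom-cleanup" | "json-collapse" | "truncate" | "passthrough";

interface Block {
  type: "image" | "text";
  text?: string;
}

const THRESHOLD_TOKENS = 500; // below this, pass through untouched
const approxTokens = (s: string) => Math.ceil(s.length / 4); // ~4 chars per token

function classify(block: Block): Strategy {
  if (block.type === "image") return "ocr";
  const text = block.text ?? "";
  if (approxTokens(text) < THRESHOLD_TOKENS) return "passthrough";
  // Playwright-style accessibility snapshots carry [ref=...] markers
  if (/\[ref=e\d+\]/.test(text)) return "dom-cleanup";
  try {
    JSON.parse(text);
    return "json-collapse";
  } catch {
    return "truncate"; // large plain text → smart truncation
  }
}
```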
### Smart Text Compression

For large text (>5k tokens), instead of dumb truncation:

```
≤5k tokens     → passthrough (no compression needed)
5k–23k tokens  → head(2k) + full middle + tail(1k)
23k–53k tokens → head(2k) + BM25-ranked middle(→20k) + tail(1k)
>53k tokens    → head(2k) + BM25(→50k) + Gemini(→20k) + tail(1k)
```

How BM25 ranking works: the middle section is split into ~512-token chunks, and a synthetic relevance query is built from tool metadata:

```
read_file({path: "src/auth/login.ts"})   → query: "auth login"
browser_navigate({url: "react.dev/..."}) → query: "react useState reference"
browser_snapshot() after navigate(url)   → inherits URL intent from session
```

Chunks are ranked by BM25 similarity to this query. Top chunks (by relevance, in original order) fill the token budget. This keeps the most relevant content, not just the beginning.
Gemini 2.5 Flash-Lite (optional, ~$0.01/call) compresses further for very large texts. Falls back to BM25-only if no API key or if Gemini is unavailable.
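That fallback chain can be sketched as follows (synchronous for clarity; the real Gemini pass is an HTTP call, and the function names here are illustrative):

```typescript
// Sketch of the graceful fallback: try the optional Gemini pass over the
// BM25-selected text, and return the BM25-only result on any failure.
function compressMiddle(
  middle: string,
  bm25Pass: (s: string) => string,
  geminiPass?: (s: string) => string,
): string {
  const ranked = bm25Pass(middle);
  if (!geminiPass) return ranked; // no API key configured → BM25-only
  try {
    return geminiPass(ranked);
  } catch {
    return ranked; // Gemini unavailable → fall back to BM25-only
  }
}
```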
## Installation

### Option A: Hook mode (recommended)

Works with all tools automatically:

```bash
npm install -g compress-on-input
compress-on-input install
compress-on-input check   # verify everything works
# Restart Claude Code (exit + claude)
```

This adds to `~/.claude/settings.json`:
```json
{
  "hooks": {
    "PostToolUse": [{
      "matcher": ".*",
      "hooks": [{
        "type": "command",
        "command": "compress-on-input --hook --verbose",
        "timeout": 15
      }]
    }]
  }
}
```

#### Matcher examples
The `matcher` field is a regex that controls which tools trigger compression:
| Matcher | What gets compressed |
|---|---|
| .* | All tools (recommended — built-in tools are auto-skipped) |
| mcp__.* | Only MCP tools (Playwright, databases, APIs) |
| mcp__playwright__.* | Only Playwright MCP tools |
| mcp__playwright__\|mcp__webflow__ | Specific MCP servers |
Note: Built-in tools (Read, Bash, Grep) are always skipped internally — they don't support output replacement. Using `.*` is safe and recommended.
### Option B: Proxy mode (wrap specific MCP server)

```bash
# In ~/.claude.json, change the MCP server command:
compress-on-input --wrap "npx @playwright/mcp@latest --cdp-endpoint http://localhost:9222" --verbose
```

Useful for testing or when you only want compression for specific servers.
### From source

```bash
git clone https://github.com/Chill-AI-Space/compress-on-input.git
cd compress-on-input
npm install --include=dev
npm run build
# Then: node dist/index.js install
```

### Uninstall

```bash
compress-on-input uninstall
# Restart Claude Code
```

## Configuration
### Config file

`~/.config/compress-on-input/config.json`:
```json
{
  "threshold": 500,
  "maxTextTokens": 2000,
  "activationBytes": 400000,
  "ocrEngine": "auto",
  "verbose": true,
  "dryRun": false,
  "geminiApiKey": "your-gemini-api-key"
}
```

| Option | Default | Description |
|---|---|---|
| threshold | 500 | Min tokens to trigger compression |
| maxTextTokens | 2000 | Target token budget per text block |
| activationBytes | 400000 | Min transcript size to activate (hook mode) |
| ocrEngine | "auto" | auto / vision (macOS) / tesseract |
| verbose | false | Log compression stats to stderr |
| dryRun | false | Log without modifying results |
| geminiApiKey | — | Gemini API key for smart compression of huge texts |
| rules | (see below) | Per-tool compression rules |
### Environment variables

```bash
export GEMINI_API_KEY="your-key"   # Alternative to config file
```

### CLI flags
```
--hook                   Run as PostToolUse hook (used by install)
--wrap "cmd args"        Wrap an MCP server (proxy mode)
--config <path>          Custom config file path
--verbose                Log compression stats to stderr
--dry-run                Log what would be compressed, don't modify
--ocr-engine <engine>    auto | vision | tesseract
--max-text-tokens <n>    Token budget for text blocks (default: 2000)
--threshold <n>          Min tokens to trigger compression (default: 500)
--gemini-api-key <key>   Gemini API key for smart compression
```

Priority: CLI flags > env vars > config file > defaults
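That precedence can be expressed as nested object spreads, later sources winning (field names are illustrative; the real loader is in `src/config.ts`):

```typescript
// Sketch of "CLI flags > env vars > config file > defaults" via object spread.
interface Config {
  threshold: number;
  geminiApiKey?: string;
}

function resolveConfig(
  defaults: Config,
  file: Partial<Config>,
  env: Partial<Config>,
  cli: Partial<Config>,
): Config {
  // Later spreads win: CLI overrides env, which overrides the config file.
  return { ...defaults, ...file, ...env, ...cli };
}
```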
### Per-tool rules

Override the compression strategy for specific tools:

```json
{
  "rules": [
    { "toolName": "browser_snapshot", "strategy": "dom-cleanup" },
    { "toolName": "my_screenshot_tool", "strategy": "ocr" },
    { "toolNamePattern": "db_.*", "strategy": "json-collapse", "maxTokens": 5000 },
    { "toolNamePattern": ".*", "strategy": "auto" }
  ]
}
```

Strategies: `auto` | `ocr` | `dom-cleanup` | `json-collapse` | `truncate` | `passthrough`

Rules match tool names in order. First match wins. The default is `auto` (content-aware routing).
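First-match-wins resolution might look roughly like this (whether the real matcher anchors `toolNamePattern` to the full tool name is an assumption of this sketch):

```typescript
// Sketch of in-order, first-match-wins rule lookup (illustrative types).
interface Rule {
  toolName?: string;        // exact match
  toolNamePattern?: string; // regex match (anchored here by assumption)
  strategy: string;
}

function strategyFor(tool: string, rules: Rule[]): string {
  for (const rule of rules) {
    if (rule.toolName === tool) return rule.strategy;
    if (rule.toolNamePattern && new RegExp(`^(?:${rule.toolNamePattern})$`).test(tool)) {
      return rule.strategy;
    }
  }
  return "auto"; // default: content-aware routing
}
```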
## Compressors in Detail

### OCR

- macOS: Compiles and caches a Swift binary using the Apple Vision framework at `~/.cache/compress-on-input/vision-ocr-{hash}`
- Other OS: Falls back to Tesseract (`tesseract` must be in PATH)
- Quality check: if OCR returns <7 non-whitespace chars → keeps the original image
- Safety: Only OCRs images when a file path exists in sibling text blocks (the original is on disk). Generated images (base64-only) pass through untouched.
### DOM Cleanup

- Strips `[ref=e2]` markers from inline text
- Builds a compact ref mapping table at the bottom (Claude can still click elements)
- Removes `role="generic"` and `role="none"` noise
- Collapses empty generic nodes
- Deduplicates repeated navigation blocks
- Collapses multiple blank lines
### JSON Collapse

- Arrays >10 items → first 3 items + schema summary
- Detects homogeneous arrays (same keys) and shows the shape: `{id, name, email}`
- Nesting beyond depth 5 → collapsed to a key summary
- Strips null values, empty strings, empty arrays
- Falls through to truncate if content isn't valid JSON
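The array rule above can be sketched as follows (sample size, item limit, homogeneity check, and summary wording are all illustrative, not the real `json-collapse.ts`):

```typescript
// Sketch: collapse a long homogeneous array into a few sample items
// plus a shape summary like "… 497 more items of shape {id, name, email}".
function collapseArray(arr: unknown[], sample = 3, limit = 10): unknown[] {
  if (arr.length <= limit) return arr; // small arrays pass through unchanged
  const allObjects = arr.every(
    (x) => x !== null && typeof x === "object" && !Array.isArray(x),
  );
  const keys = allObjects ? Object.keys(arr[0] as object) : null;
  const summary = keys
    ? `… ${arr.length - sample} more items of shape {${keys.join(", ")}}`
    : `… ${arr.length - sample} more items`;
  return [...arr.slice(0, sample), summary];
}
```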
### Smart Truncate
- Splits text into head (2k tokens), middle, tail (1k tokens)
- Chunks middle into ~512-token segments with 50-token overlap
- Structure-aware splitting: paragraph → line → sentence → word → hard split
- Ranks chunks with BM25 using synthetic query from tool context
- Selects top chunks (preserving original order) to fit 20k token budget
- Optional Gemini 2.5 Flash-Lite pass for middle sections >20k tokens
- Graceful fallback chain: Gemini fails → BM25-only, no query → preserve order
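A stripped-down version of the chunking step, using words as a stand-in for tokens and skipping the structure-aware paragraph/line/sentence splitting the real chunker does:

```typescript
// Sketch: split text into ~fixed-size token chunks with a small overlap,
// so content cut at a chunk boundary still appears whole in one chunk.
function chunkWords(text: string, chunkTokens = 512, overlapTokens = 50): string[] {
  const words = text.split(/\s+/).filter(Boolean); // crude: 1 word ≈ 1 token
  const chunks: string[] = [];
  const step = chunkTokens - overlapTokens; // each chunk starts 462 words after the last
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + chunkTokens).join(" "));
    if (start + chunkTokens >= words.length) break; // last chunk reached the end
  }
  return chunks;
}
```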
## Architecture

```
src/
├── index.ts           CLI entry point, arg parsing
├── proxy.ts           JSON-RPC stdio proxy (proxy mode)
├── hook.ts            PostToolUse hook handler (hook mode)
├── pipeline.ts        Content-aware routing + compression orchestration
├── classifier.ts      Content type detection (image/DOM/JSON/text)
├── config.ts          Config loading, rule matching
├── chunker.ts         Structure-aware text chunking
├── bm25.ts            BM25 keyword ranking (~100 lines, zero deps)
├── query-builder.ts   Synthetic relevance query from tool metadata
├── session.ts         Tool call history for intent inheritance
├── logger.ts          Stderr logging
└── compressors/
    ├── ocr.ts             Apple Vision / Tesseract OCR
    ├── dom-cleanup.ts     Accessibility tree cleanup + ref mapping
    ├── json-collapse.ts   Schema-aware JSON summarization
    ├── truncate.ts        Smart head + BM25 middle + tail pipeline
    └── gemini.ts          Gemini Flash-Lite API (raw fetch, no SDK)
```

## Design Decisions
**Zero runtime dependencies.** BM25, chunking, Gemini API — all built from scratch. Only TypeScript and Vitest as dev deps. Small, fast, auditable.

**Hook-first architecture.** PostToolUse hooks intercept ALL tool results universally. No need to wrap each MCP server individually. Proxy mode exists as an alternative.

**Content-aware routing.** Each content type gets a specialized compressor. DOM cleanup preserves clickability. JSON collapse preserves schema. OCR preserves semantic content.

**Synthetic relevance queries.** Intent is inferred from tool call metadata (URL navigated, file path read, grep pattern searched). Based on the HyDE principle — approximate queries work surprisingly well for ranking.

**Session intent inheritance.** `browser_snapshot()` after `browser_navigate(url)` inherits the URL's intent. This dramatically improves BM25 ranking for follow-up tool calls.

**Fail-safe everywhere.** If a compressor fails, the original is returned. If "compression" grows the result by more than 10%, the original is returned. If Gemini fails, BM25-only ranking is used. If there is no query signal, original order is preserved. The tool never makes things worse.

**Context-aware activation (hook mode).** Checks transcript file size before compressing. Early in a session (context mostly empty), compression is skipped — full context is valuable. As context fills up, compression activates. Exception: screenshots are always compressed (~250k tokens each).
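The activation decision reduces to a small predicate (field names assumed for illustration; the real hook inspects the session transcript file on disk):

```typescript
// Sketch of context-aware activation: skip compression while the session
// transcript is still small, except for screenshots, which are always worth it.
interface ToolResult {
  isScreenshot: boolean;
}

function shouldCompress(
  transcriptBytes: number,
  result: ToolResult,
  activationBytes = 400_000, // the documented default
): boolean {
  if (result.isScreenshot) return true;      // ~250k tokens each: always compress
  return transcriptBytes >= activationBytes; // otherwise wait until context fills up
}
```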
## Testing

```bash
npm test          # 56 tests across 9 test files
npm run build     # compile TypeScript
```

## Troubleshooting

First step for any issue:

```bash
compress-on-input check
```

This runs 17 self-diagnostic checks: hook installation, binary in PATH, OCR engine, log directories, compression tests, performance benchmarks. It shows exact fix instructions for every failure.
### Hook not firing?

- Run `compress-on-input check` — it checks settings.json automatically
- Restart Claude Code after installing (`/exit` + `claude`)
- Run `compress-on-input --hook --verbose` manually with test input
### OCR not working?

- macOS: Should work automatically (Apple Vision framework)
- Linux: Install `tesseract` (`apt install tesseract-ocr`)
- Check: `compress-on-input --hook --verbose` will log OCR errors to stderr
### Want to disable for specific tools?

```json
{
  "rules": [
    { "toolName": "my_special_tool", "strategy": "passthrough" },
    { "toolNamePattern": ".*", "strategy": "auto" }
  ]
}
```

### Gemini compression not activating?

- It only triggers for text >53k tokens (after the head/tail split, middle >50k)
- It needs the `GEMINI_API_KEY` env var or `geminiApiKey` in the config
- Get a key at Google AI Studio (free tier available)
## License
MIT
