smart-data-pruner
v1.0.1
Smartly prune massive JSON/strings for LLM context optimization with cost estimation.
✂️ Smart Data Pruner
Reduce LLM Noise. Save Money. Optimize Context.
Smart Data Pruner is a production-ready utility for AI engineering. It shrinks massive JSON objects and strings to fit within your LLM's context window, without crashing on circular references or losing critical schema structure.
✨ Features
- 🧠 Intelligent Pruning: Multi-stage algorithm (Clean -> Light -> Heuristic -> Aggressive -> Nuclear -> Bedrock) adapts to your data.
- 💰 Cost Estimator: Calculate costs for GPT-4o, Claude 3.5, Gemini 1.5, and more.
- 🛡️ Robust & Safe: Handles circular references, deep nesting, and non-JSON inputs gracefully.
- 🚀 CLI & Library: Use it from Node.js code or through a CLI with spinners and pretty-printing.
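To illustrate the circular-reference point above, here is a minimal sketch of one common way to make a cyclic object safe to serialize: track the ancestors of the current node in a `WeakSet` and replace any back-reference with a marker. The `decycle` helper and the `"[Circular]"` marker are hypothetical names chosen for this example. They are not part of the package's documented API, and the library may handle cycles differently.

```javascript
// Hypothetical helper, not part of smart-data-pruner's documented API.
// Returns a deep copy of `value` in which any reference back to an
// ancestor object is replaced by the string "[Circular]".
function decycle(value, ancestors = new WeakSet()) {
  if (value === null || typeof value !== "object") return value;
  if (ancestors.has(value)) return "[Circular]";
  ancestors.add(value);
  const copy = Array.isArray(value)
    ? value.map((item) => decycle(item, ancestors))
    : Object.fromEntries(
        Object.entries(value).map(([k, v]) => [k, decycle(v, ancestors)])
      );
  ancestors.delete(value); // shared (non-circular) references stay intact
  return copy;
}

const node = { name: "root", children: [] };
node.children.push({ name: "child", parent: node }); // creates a cycle
console.log(JSON.stringify(decycle(node)));
```

Calling `JSON.stringify(node)` directly would throw a `TypeError` because of the cycle, while the decycled copy serializes normally.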
📦 Installation
npm install smart-data-pruner
🚀 Usage
As a Library
const { SmartPruner, estimateCost } = require('smart-data-pruner');
const massiveData = { /* ... 50MB of logs ... */ };
// 1. Check Cost
try {
  const cost = estimateCost(massiveData, 'gpt-4o');
  console.log(`Potential Cost: $${cost.costUSD}`);
} catch (err) {
  console.error(err);
}
// 2. Prune it!
const pruner = new SmartPruner();
const result = pruner.prune(massiveData, 4000); // Target: 4000 tokens
console.log(`Strategy Used: ${result.strategy}`);
console.log(result.output);
CLI Tool
# Prune a file to 4000 tokens (default) and save
npx smart-prune huge-logs.json --out pruned-logs.json
# Prune to specific budget with pretty printing
npx smart-prune data.json --tokens 2000 --pretty
# Estimate cost only
npx smart-prune data.json --cost --model claude-3-5-sonnet
🧠 Pruning Strategies
The pruner applies these strategies sequentially until the token budget is met:
- Clean: Removes null, undefined, and empty strings/arrays.
- Light Trim: Truncates strings > 1000 chars, arrays > 100 items.
- Heuristic: Targets specific noisy keys (logs, history, embeddings).
- Aggressive Trim: Truncates strings > 200 chars, arrays > 20 items.
- Nuclear: Truncates strings > 50 chars, arrays > 5 items.
- Bedrock: Truncates strings > 20 chars, arrays > 1 item (preserves only schema structure).
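The sequential escalation described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the library's implementation: the stage limits come from the list, but `pruneSketch`, `trim`, `estimateTokens`, the 4-characters-per-token estimate, and the `"..."` truncation marker are all hypothetical. The Clean and Heuristic stages are left out to keep the example short.

```javascript
// Illustrative only: limits are taken from the strategy list; all names
// and the token estimate are assumptions, not smart-data-pruner internals.
const STAGES = [
  { name: "light", maxString: 1000, maxArray: 100 },
  { name: "aggressive", maxString: 200, maxArray: 20 },
  { name: "nuclear", maxString: 50, maxArray: 5 },
  { name: "bedrock", maxString: 20, maxArray: 1 },
];

// Rough estimate: about 4 characters per token (an assumption).
const estimateTokens = (value) => Math.ceil(JSON.stringify(value).length / 4);

// Recursively truncate long strings and long arrays to the given limits.
function trim(value, limits) {
  if (typeof value === "string") {
    return value.length > limits.maxString
      ? value.slice(0, limits.maxString) + "..."
      : value;
  }
  if (Array.isArray(value)) {
    return value.slice(0, limits.maxArray).map((item) => trim(item, limits));
  }
  if (value !== null && typeof value === "object") {
    return Object.fromEntries(
      Object.entries(value).map(([k, v]) => [k, trim(v, limits)])
    );
  }
  return value;
}

// Try each stage in order and stop as soon as the budget is met.
function pruneSketch(data, tokenBudget) {
  let output = data;
  let strategy = "none";
  for (const stage of STAGES) {
    if (estimateTokens(output) <= tokenBudget) break;
    output = trim(data, stage);
    strategy = stage.name;
  }
  return { strategy, output, tokens: estimateTokens(output) };
}

const sample = {
  id: 7,
  logs: Array.from({ length: 500 }, (_, i) => `line ${i}: ` + "x".repeat(300)),
};
const result = pruneSketch(sample, 400);
console.log(result.strategy, result.tokens);
```

Note that this sketch returns the last stage's output even if it still exceeds the budget, and it assumes acyclic, JSON-serializable input.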
📊 Supported Models
- OpenAI: gpt-4o, gpt-4-turbo, gpt-3.5-turbo, gpt-4o-mini
- Anthropic: claude-3-5-sonnet, claude-3-opus, claude-3-haiku
- Google: gemini-1.5-pro, gemini-1.5-flash
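For any of the models above, a cost estimate comes down to an estimated token count multiplied by a per-token price. The sketch below shows that arithmetic only. `estimateCostSketch` is a hypothetical function, not the package's `estimateCost`, the 4-characters-per-token figure is a rough assumption, and the price passed in is a placeholder rather than a real rate for any model.

```javascript
// Hypothetical sketch: not the library's estimateCost implementation.
// `pricePerMillionTokens` should come from the provider's current price
// list; the value used below is a placeholder, not a real rate.
function estimateCostSketch(data, pricePerMillionTokens) {
  const text = typeof data === "string" ? data : JSON.stringify(data);
  const tokens = Math.ceil(text.length / 4); // rough 4-chars-per-token heuristic
  const costUSD = (tokens / 1_000_000) * pricePerMillionTokens;
  return { tokens, costUSD };
}

const estimate = estimateCostSketch("a".repeat(4_000_000), 2.0); // placeholder rate
console.log(estimate.tokens, estimate.costUSD);
```

A real tokenizer for the target model gives a more accurate count than the character heuristic, so treat numbers from a sketch like this as ballpark figures.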
📄 License
MIT
