smart-data-pruner

v1.0.1

Published

7 days ago

Smartly prune massive JSON/strings for LLM context optimization with cost estimation.

0High
0Medium
0Low

ritesh17rb

llm ai prune json cost-estimator openai token-limit context-window

✂️ Smart Data Pruner

Reduce LLM Noise. Save Money. Optimize Context.

Smart Data Pruner is a production-ready utility designed for AI engineering. It intelligently shrinks massive JSON objects and strings to strictly fit within your LLM's context window—without crashing on circular references or losing critical schema structure.

✨ Features

🧠 Intelligent Pruning: Multi-stage algorithm (Clean -> Light -> Aggressive -> Nuclear -> Bedrock) adapts to your data.
💰 Cost Estimator: Calculate costs for GPT-4o, Claude 3.5, Gemini 1.5, and more.
🛡️ Robust & Safe: Handles circular references, deep nesting, and non-JSON inputs gracefully.
🚀 CLI & Library: Professional CLI with spinners and pretty-printing.

📦 Installation

npm install smart-data-pruner

🚀 Usage

As a Library

const { SmartPruner, estimateCost } = require('smart-data-pruner');

const massiveData = { /* ... 50MB of logs ... */ };

// 1. Check Cost
try {
    const cost = estimateCost(massiveData, 'gpt-4o');
    console.log(`Potential Cost: $${cost.costUSD}`);
} catch (err) {
    console.error(err);
}

// 2. Prune it!
const pruner = new SmartPruner();
const result = pruner.prune(massiveData, 4000); // Target: 4000 tokens

console.log(`Strategy Used: ${result.strategy}`);
console.log(result.output);

CLI Tool

# Prune a file to 4000 tokens (default) and save
npx smart-prune huge-logs.json --out pruned-logs.json

# Prune to specific budget with pretty printing
npx smart-prune data.json --tokens 2000 --pretty

# Estimate cost only
npx smart-prune data.json --cost --model claude-3-5-sonnet

🧠 Pruning Strategies

The pruner applies these strategies sequentially until the token budget is met:

Clean: Removes null, undefined, empty strings/arrays.
Light Trim: Truncates strings > 1000 chars, arrays > 100 items.
Heuristic: target specific noisy keys (logs, history, embeddings).
Aggressive Trim: Strings > 200 chars, arrays > 20 items.
Nuclear: Strings > 50 chars, arrays > 5 items.
Bedrock: Strings > 20 chars, arrays > 1 item (Preserves only schema structure).

📊 Supported Models

OpenAI: gpt-4o, gpt-4-turbo, gpt-3.5-turbo, gpt-4o-mini
Anthropic: claude-3-5-sonnet, claude-3-opus, claude-3-haiku
Google: gemini-1.5-pro, gemini-1.5-flash

📄 License

MIT

Published

Vulnerabilities

Links

Maintainers

Keywords

Readme