dumbl

v1.0.0

Published

21 days ago

DUMBL - Token reduction algorithm for LLM inputs. Compress JSON/TOML while maintaining LLM readability. Save 15-30% on API costs.

💡 Why DUMBL?

LLMs can usually read text with missing vowels and abbreviated words, because natural language is highly redundant. DUMBL exploits this redundancy to reduce token count (and cost) by 15-30%.

Original:  "Desenvolva uma aplicação completa utilizando programação orientada"
DUMBL:     "Desenvlva uma aplicç complta utiliznd programç orientda"
Savings:   ~20% fewer tokens

Save money on API calls while maintaining semantic clarity for GPT-4, Claude, Llama, Gemini, and other LLMs.

✨ Features

🚀 15-30% token reduction on typical LLM prompts
🔧 Three compression levels - light, medium, aggressive
📦 Zero dependencies - TOML support is optional
🌍 Multilingual - English & Portuguese optimized
🔒 Smart preservation - URLs, emails, paths, tech terms stay intact
📝 TypeScript ready - Full type definitions included
⚡ Fast - Minimal overhead for real-time use

📦 Installation

npm install dumbl

For TOML support (optional):

npm install dumbl @iarna/toml

🚀 Quick Start

const { dumbl, dumblDry } = require('dumbl');

// Create an instance (levels 1-3)
const d = dumbl.aggressive(); // level 3

// Input to compress
const data = {
  prompt: "Explique detalhadamente o processamento"
};

// Compress an object
const result = d.compress(data);
// → { prompt: "Explqe detlhadmt o procesmt" }

// Output as JSON
const json = d.toJSON(data);

// Output as DUMBL format (most compact)
const compact = d.toDUMBL(data);

// Quick debug: log stats and the compressed result
dumblDry(data);

📖 API Reference

Factory Functions

const { dumbl } = require('dumbl');

// With options
const d = dumbl({ 
  level: 3,           // 1=light, 2=medium, 3=aggressive
  preserveKeys: true, // don't compress object keys
  minWordLength: 3    // min chars to compress
});

// Presets
dumbl.light()      // level 1 - safe, minimal
dumbl.medium()     // level 2 - balanced
dumbl.aggressive() // level 3 - maximum compression

Instance Methods

const d = dumbl.aggressive();

// Compress anything
d.compress(object)     // → compressed object
d.compress(jsonString) // → compressed object
d.compress(text)       // → compressed string

// Output formats
d.toJSON(input)   // → JSON string (compressed)
d.toDUMBL(input)  // → DUMBL format (most compact)

// With TOML (requires @iarna/toml)
const TOML = require('@iarna/toml');
d.compress(tomlString, TOML)
d.toTOML(input, TOML)

// Parse DUMBL back
d.parseDUMBL(dumblString) // → object

// Statistics
d.stats(original, compressed)
// → { savedChars, ratio, estimatedTokensSaved, ... }

One-shot Functions

const { compress, toJSON, toDUMBL, dumblDry } = require('dumbl');

// Quick compression (uses level 3)
compress({ prompt: "..." })
toJSON({ prompt: "..." })
toDUMBL({ prompt: "..." })

// Dry run - logs stats and result to console
dumblDry({ prompt: "..." })

Dry Run (Debug/Preview)

Use dumblDry to preview compression results with statistics:

const { dumblDry } = require('dumbl');

dumblDry({ prompt: "Explique detalhadamente o processamento de dados" });

Output:

┌─────────────────────────────────────────┐
│             DUMBL Dry Run               │
├─────────────────────────────────────────┤
│ Original:        56 chars
│ Compressed:      44 chars
│ Saved:           12 chars (21.4%)
│ Est. tokens: ~3 saved
├─────────────────────────────────────────┤
│ Result:
└─────────────────────────────────────────┘
{
  "prmpt": "Explque dtalhdamt o prcesmento de ddos"
}
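
The figures in the dry-run output above (56 to 44 characters, 12 saved, 21.4%, about 3 tokens) are consistent with a simple character-based estimate. The sketch below reproduces that arithmetic under an assumed rule of thumb of roughly 4 characters per token; the README does not say how the library estimates tokens. `computeStats`, `originalChars`, `compressedChars`, and `percent` are hypothetical names, and treating `ratio` as the saved fraction is also an assumption.

```javascript
// Sketch only: not the library's implementation of d.stats().
// Assumption: about 4 characters per token (a common rule of thumb).
const CHARS_PER_TOKEN = 4;

function computeStats(original, compressed) {
  const savedChars = original.length - compressed.length;
  // Fraction of the original that was saved; 0 for empty input.
  const ratio = original.length === 0 ? 0 : savedChars / original.length;
  return {
    originalChars: original.length,
    compressedChars: compressed.length,
    savedChars,
    ratio,
    percent: (ratio * 100).toFixed(1) + '%',
    estimatedTokensSaved: Math.round(savedChars / CHARS_PER_TOKEN),
  };
}

// Same sizes as the dry run above: 56 characters in, 44 out.
const sample = computeStats('x'.repeat(56), 'x'.repeat(44));
console.log(sample.savedChars, sample.percent, sample.estimatedTokensSaved);
```

Real tokenizers do not split text into fixed-size pieces, so a character-based estimate like this is only a rough guide; measuring with the target model's tokenizer gives the actual saving.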

📊 Compression Levels

| Level | Description | Use Case |
|-------|-------------|----------|
| 1 | Remove duplicates only | Conservative, max readability |
| 2 | + Suffix abbreviations | Balanced |
| 3 | + Vowel removal | Maximum savings |
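
To make the cumulative levels concrete, here is a minimal sketch of how such a pipeline could be layered. It is not DUMBL's actual algorithm and will not reproduce its output exactly. It assumes that "duplicates" means doubled letters, uses a small hypothetical suffix table (`SUFFIXES`), and only strips unaccented interior vowels. `compressWord` and `compressText` are illustrative names.

```javascript
// Illustrative only: NOT the library's real rules.
// Hypothetical suffix table for level 2.
const SUFFIXES = [
  [/mento$/i, 'mt'],
  [/mente$/i, 'mt'],
  [/tion$/i, 'tn'],
];

function compressWord(word, level, minWordLength = 3) {
  // Short words are left alone at every level.
  if (word.length <= minWordLength) return word;

  // Level 1 (assumed meaning of "duplicates"): collapse doubled letters.
  let out = word.replace(/(\p{L})\1/giu, '$1');

  // Level 2: also abbreviate the first matching suffix.
  if (level >= 2) {
    for (const [pattern, short] of SUFFIXES) {
      if (pattern.test(out)) {
        out = out.replace(pattern, short);
        break;
      }
    }
  }

  // Level 3: also drop unaccented vowels, keeping first and last characters.
  if (level >= 3 && out.length > 2) {
    const inner = out.slice(1, -1).replace(/[aeiou]/gi, '');
    out = out[0] + inner + out[out.length - 1];
  }
  return out;
}

// Apply the word-level rule to every run of letters in a text.
function compressText(text, level) {
  return text.replace(/\p{L}+/gu, (w) => compressWord(w, level));
}

console.log(compressText('Explique o processamento', 1));
console.log(compressText('Explique o processamento', 3));
```

Each level builds on the previous one, which is why level 3 gives the largest saving and the lowest readability.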

🔧 DUMBL Format

Custom ultra-compact format that carries the same data as JSON. The output is not itself valid JSON, so read it back with parseDUMBL:

// JSON
{"enabled":true,"count":null,"items":["a","b"]}

// DUMBL
{enabled:T,count:N,items:["a","b"]}

Features:

T/F for booleans
N for null
Unquoted keys when possible
No whitespace
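
The four rules above are enough to sketch a serializer. The version below is an illustration, not the library's `toDUMBL`. The function name `toCompact` and the `BARE_KEY` pattern that decides when a key can go unquoted are assumptions.

```javascript
// Sketch of a serializer that follows the listed rules.
// Assumption: a key may be unquoted when it looks like a plain identifier.
const BARE_KEY = /^[A-Za-z_][A-Za-z0-9_]*$/;

function toCompact(value) {
  if (value === null) return 'N';   // N for null
  if (value === true) return 'T';   // T/F for booleans
  if (value === false) return 'F';
  if (Array.isArray(value)) {
    return '[' + value.map(toCompact).join(',') + ']';
  }
  if (typeof value === 'object') {
    const pairs = Object.entries(value).map(([key, v]) => {
      const k = BARE_KEY.test(key) ? key : JSON.stringify(key);
      return k + ':' + toCompact(v);
    });
    return '{' + pairs.join(',') + '}'; // no whitespace anywhere
  }
  // Strings and numbers keep their JSON syntax.
  return JSON.stringify(value);
}

console.log(toCompact({ enabled: true, count: null, items: ['a', 'b'] }));
// → {enabled:T,count:N,items:["a","b"]}
```

Because string values stay quoted, a bare `T`, `F`, or `N` cannot be confused with the strings `"T"`, `"F"`, or `"N"`.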

📈 Benchmark

npm run benchmark

Sample results:

| Format | Size | vs JSON |
|--------|------|---------|
| JSON (pretty) | 1250 | +45% |
| JSON (compact) | 862 | baseline |
| TOML | 780 | -10% |
| JSON+DUMBL L3 | 680 | -21% |
| DUMBL format | 620 | -28% |
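
The "vs JSON" column can be checked against the sizes in the table, taking compact JSON (862) as the baseline. The helper below is a small sketch of that calculation; `vsBaseline` is a hypothetical name and is not part of the package.

```javascript
// Relative size change against a baseline, rounded to a whole percent.
function vsBaseline(size, baseline) {
  if (size === baseline) return 'baseline';
  const pct = Math.round(((size - baseline) / baseline) * 100);
  return (pct > 0 ? '+' : '') + pct + '%';
}

// Sizes copied from the sample results table.
const sizes = {
  'JSON (pretty)': 1250,
  'JSON (compact)': 862,
  'TOML': 780,
  'JSON+DUMBL L3': 680,
  'DUMBL format': 620,
};

for (const [format, size] of Object.entries(sizes)) {
  console.log(format, vsBaseline(size, sizes['JSON (compact)']));
}
```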

🛡️ What's Preserved

DUMBL intelligently preserves:

✅ Short words (≤3 chars)
✅ Connectors (the, of, de, para, etc.)
✅ URLs, emails, file paths
✅ Tech terms (API, JSON, HTTP, etc.)
✅ Numbers and booleans
✅ Object structure
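
One way to picture the preservation rules is as a predicate that runs before any word is compressed. The sketch below is a guess at the shape of such a check, not the library's code. The word lists are hypothetical and contain only the examples named above, and the regular expressions are deliberately simple.

```javascript
// Hypothetical lists: the library's real lists are not given in this README.
const CONNECTORS = new Set(['the', 'of', 'de', 'para']);
const TECH_TERMS = new Set(['API', 'JSON', 'HTTP']);

function shouldPreserve(token) {
  if (token.length <= 3) return true;                         // short words
  if (CONNECTORS.has(token.toLowerCase())) return true;       // connectors
  if (TECH_TERMS.has(token.toUpperCase())) return true;       // tech terms
  if (/^https?:\/\//i.test(token)) return true;               // URLs
  if (/^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(token)) return true;  // emails
  if (/^(\.{0,2}\/|[A-Za-z]:\\)/.test(token)) return true;    // file paths
  if (/^-?\d+(\.\d+)?$/.test(token)) return true;             // numbers
  if (token === 'true' || token === 'false') return true;     // booleans
  return false;
}

console.log(shouldPreserve('https://example.com/docs')); // true
console.log(shouldPreserve('processamento'));            // false
```

Only tokens for which a check like this returns false would go on to the compression step.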

🤖 LLM Compatibility

Tested and confirmed readable by:

✅ GPT-4 / GPT-4o / GPT-4o-mini
✅ Claude 3.5 / Claude 4
✅ Llama 3 / Llama 3.1
✅ Gemini Pro / Gemini Ultra
✅ Mistral / Mixtral

The compression maintains semantic meaning while reducing tokens.

📘 TypeScript

Full type definitions included:

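import { dumbl, DumblOptions, DumblStats } from 'dumbl';

const options: DumblOptions = { level: 3 };
const d = dumbl(options);

const original = { prompt: "..." };
const compressed = d.compress(original);
const stats: DumblStats = d.stats(original, compressed);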

🧪 Testing

npm test

🤝 Contributing

Contributions are welcome! Please read our Contributing Guide for details.

📄 License

👤 Author

Frederico Bezerra