prompt-minifier
Token-aware prompt minifier with explainable diff. Shrinks LLM prompts and tells you exactly what changed.
import { minify } from 'prompt-minifier';
const r = minify(
'In order to win, please could you take into consideration the fact that, ' +
'due to the fact that we are at this point in time absolutely essential, ' +
'you must respond in a timely manner.',
);
r.minified;
// → 'To win, consider the fact that, because we are now absolutely essential,
// you must respond promptly.'
r.savings.charPercent; // 45.1
r.changes.length; // 7
Zero runtime dependencies. Bring your own tokenizer (gpt-tokenizer, tiktoken, anything callable). Falls back to a chars / 4 estimator if you don't.
When you'd use this
prompt-minifier finds the wordy bits in a prompt and replaces them with shorter equivalents that mean the same thing. It's most useful when:
- you're embedding user-written prompts (template prompts, GPT instructions, customer support replies) where verbosity is normal
- you have a polite system prompt with "please could you", "I would like you to", "thanks in advance" that ships on every request
- you generated a prompt with another LLM and it came back full of "in order to", "due to the fact that", and "it is important to note that"
It is not going to find anything in a tightly-written engineering prompt — and that is fine. We measured: across all 1,624 prompts in awesome-chatgpt-prompts, the median prompt saves 0%, because those prompts are already well-engineered. The point of prompt-minifier is to flatten the long tail.
Install
npm i prompt-minifier
Node 18+. ESM and CJS, types included.
Quick start
Library
import { minify } from 'prompt-minifier';
const result = minify(prompt, { level: 'balanced' });
console.log(`${result.tokensBefore} → ${result.tokensAfter} tokens (-${result.savings.tokenPercent.toFixed(1)}%)`);
console.log(`${result.changes.length} changes`);
for (const c of result.changes) {
console.log(`${c.rule}: "${c.from}" → "${c.to}"`);
}
CLI
# minify a file (writes to stdout)
prompt-minifier my-prompt.txt > shrunk.txt
# pipe and inspect a colored diff
cat my-prompt.txt | prompt-minifier --diff
# emit the full result as JSON
cat my-prompt.txt | prompt-minifier --json
# list every available rule
prompt-minifier --rules
Levels
minify(prompt, { level: 'balanced' }) // default
| Level | Rules | Risk | What it does |
| --- | --- | --- | --- |
| safe | smart-quotes, dashes, nbsp, zero-width, collapse-whitespace | none | Character-level cleanup. Won't change a single word — but does fix the dozen-or-so Unicode characters that tokenize badly. |
| balanced (default) | safe + politeness, filler-words, verbose-phrases | low | Phrase-level rewrites: "in order to" → "to", strips "please could you", drops "as you can see". Preserves meaning. |
| aggressive | balanced + redundant-qualifiers, redundant-adverbs | medium | Strips tautological pairs ("end result" → "result", "very unique" → "unique"). May tighten phrasing in ways some readers consider stylistic, not literal. |
Each individual rule can be turned on or off — see API.
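For example, the rules option lets you drop a single rule from a level, or restrict a run to named rules (mirroring the CLI --include / --exclude flags). A minimal sketch using rule names from the table above:
import { minify } from 'prompt-minifier';
const prompt = 'Please could you summarise this, in order to save me time.';
// Balanced level, but leave polite phrasing untouched.
const keepPolite = minify(prompt, {
  level: 'balanced',
  rules: { exclude: ['politeness'] },
});
// Only run the named character-level rules.
const charCleanupOnly = minify(prompt, {
  rules: { include: ['smart-quotes', 'collapse-whitespace'] },
});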
Tokenizer setup
By default, prompt-minifier uses a quick Math.ceil(text.length / 4) estimator (OpenAI's rule of thumb). For accurate counts, plug in any tokenizer that exposes a (text) => count function.
gpt-tokenizer (OpenAI, pure JS)
import { encode } from 'gpt-tokenizer';
import { minify } from 'prompt-minifier';
minify(prompt, {
level: 'balanced',
countTokens: (t) => encode(t).length,
});
tiktoken (OpenAI, WASM)
import { encoding_for_model } from 'tiktoken';
import { minify } from 'prompt-minifier';
const enc = encoding_for_model('gpt-4o');
minify(prompt, {
level: 'balanced',
countTokens: (t) => enc.encode(t).length,
});
llama-tokenizer-js (Llama / Mistral)
import LlamaTokenizer from 'llama-tokenizer-js';
import { minify } from 'prompt-minifier';
minify(prompt, {
level: 'balanced',
countTokens: (t) => LlamaTokenizer.encode(t).length,
});
Preserve zones
Some text in a prompt is structural and must not be touched. By default prompt-minifier skips:
- ``` ... ``` fenced code blocks
- {{name}} (Handlebars / Jinja)
- ${name} (JS template literal)
You can disable either, or add custom regions:
minify(prompt, {
preserveCodeBlocks: false, // touch code too
preserveTemplateVars: false, // touch {{x}} and ${x} too
preserve: [/<thinking>[\s\S]*?<\/thinking>/g], // never touch <thinking> blocks
});
The matching string is left byte-for-byte identical in the output. The pipeline splits the input into editable + preserved chunks, runs rules only on the editable ones, and reassembles.
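For intuition, that chunking step works roughly like the sketch below. This is an illustrative outline of the idea, not the package's actual implementation; splitChunks, Chunk, and reassemble are hypothetical names.
type Chunk = { text: string; preserved: boolean };
// Split the input into editable and preserved chunks, using global regexes
// for the preserve zones (the preserve option expects /g patterns).
function splitChunks(input: string, preserve: RegExp[]): Chunk[] {
  const spans: { start: number; end: number }[] = [];
  for (const re of preserve) {
    for (const m of input.matchAll(re)) {
      spans.push({ start: m.index!, end: m.index! + m[0].length });
    }
  }
  spans.sort((a, b) => a.start - b.start);
  const chunks: Chunk[] = [];
  let cursor = 0;
  for (const s of spans) {
    if (s.start < cursor) continue; // ignore overlapping spans
    if (s.start > cursor) chunks.push({ text: input.slice(cursor, s.start), preserved: false });
    chunks.push({ text: input.slice(s.start, s.end), preserved: true });
    cursor = s.end;
  }
  if (cursor < input.length) chunks.push({ text: input.slice(cursor), preserved: false });
  return chunks;
}
// Rules only ever see editable chunks; preserved chunks are copied back verbatim.
const reassemble = (chunks: Chunk[], applyRules: (t: string) => string) =>
  chunks.map((c) => (c.preserved ? c.text : applyRules(c.text))).join('');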
API
function minify(input: string, options?: MinifyOptions): MinifyResult;
interface MinifyOptions {
level?: 'safe' | 'balanced' | 'aggressive'; // default 'balanced'
countTokens?: (text: string) => number; // default chars/4
preserve?: RegExp[];
preserveCodeBlocks?: boolean; // default true
preserveTemplateVars?: boolean; // default true
rules?: { include?: string[]; exclude?: string[] };
customRules?: Rule[];
}
interface MinifyResult {
minified: string;
original: string;
tokensBefore: number;
tokensAfter: number;
charsBefore: number;
charsAfter: number;
savings: {
tokens: number;
tokenPercent: number;
chars: number;
charPercent: number;
};
changes: Change[];
}
interface Change {
rule: string;
category: 'safe' | 'balanced' | 'aggressive';
from: string;
to: string;
index: number; // position in the input at the time the rule fired
savedChars: number;
}
Helpers:
import { listRules, listRulesForLevel } from 'prompt-minifier';
listRules(); // every built-in rule
listRulesForLevel('balanced'); // only the rules that fire at this level
Custom rules
A rule is a name, a category (which level it belongs to), and an apply function:
import type { Rule } from 'prompt-minifier';
const noEmoji: Rule = {
name: 'no-emoji',
category: 'safe',
description: 'Strip emoji',
apply(text) {
const re = /\p{Extended_Pictographic}/gu;
const matches = Array.from(text.matchAll(re));
let result = '';
let cursor = 0;
const changes = matches.map((m) => {
const idx = m.index!;
result += text.slice(cursor, idx);
cursor = idx + m[0].length;
return { from: m[0], to: '', index: idx, savedChars: m[0].length };
});
result += text.slice(cursor);
return { text: result, changes };
},
};
minify(prompt, { customRules: [noEmoji] });
Custom rules run after the built-ins selected for the level.
CLI reference
prompt-minifier [file] [options]
--level <level> safe | balanced | aggressive (default: balanced)
--diff colored before/after diff with per-rule savings
--json emit full MinifyResult as JSON
--quiet suppress trailing summary on TTY output
--rules list every built-in rule
--include <a,b> only run these rules
--exclude <a,b> run all except these
--no-preserve-code do not protect ``` ``` blocks
--no-preserve-vars do not protect {{var}} or ${var}
--version
--help
Benchmarks
Run npm run bench:corpus to reproduce these numbers locally — the script downloads awesome-chatgpt-prompts (CC0) and runs every prompt at level: 'balanced'.
Corpus: 1,624 prompts, 4.5M chars total.
| Metric | Value |
| --- | ---: |
| Aggregate savings | 0.40% |
| Median prompt | 0.00% |
| P90 (top decile) | 0.63% |
| Best single prompt | 59.89% |
| Prompts saving ≥ 2% | 47 (2.9%) |
| Prompts saving ≥ 5% | 14 (0.9%) |
The honest read: most prompts in this corpus are already well-engineered, so they save nothing. The package earns its keep on the long-tail prompts that aren't.
For a more representative picture of "what an actual verbose prompt saves," see the bundled fixtures under test/fixtures/:
| Fixture | Balanced savings |
| --- | ---: |
| synthesized-polite.txt (polite system prompt) | 29.6% |
| synthesized-technical.txt (verbose technical instructions) | 13.6% |
FAQ
Will this change my model's output? Probably not. The safe rules touch only invisible Unicode and whitespace. The balanced rules replace verbose phrases with their meaning-preserving short forms (the kind of edits a human editor would make). aggressive strips tautological pairs that some readers consider stylistic — when in doubt, stay on balanced. A verify mode that runs both versions through a cheap LLM and measures output similarity is on the v0.2 roadmap.
Why is the token count slightly different from my real tokenizer? Because by default, prompt-minifier uses chars / 4 as a free estimator. Pass your real tokenizer via countTokens for exact numbers — see Tokenizer setup.
Why didn't it touch my code block? Code fences (``` ... ```) and template variables ({{x}}, ${x}) are protected by default. Disable with preserveCodeBlocks: false or preserveTemplateVars: false. To protect arbitrary regions, pass preserve: [/.../].
Are the changes reversible? No — a minified prompt can't be exactly reconstructed from the output. But every change is recorded in result.changes (rule, before, after, position, chars saved), so you have a full audit trail of what the package did.
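If you want to keep that audit trail around, the documented Change fields serialize cleanly. A small sketch (the log format here is just an illustration, not part of the package):
import { minify } from 'prompt-minifier';
const prompt = 'Please could you respond in a timely manner.';
const result = minify(prompt);
// One line per change: which rule fired, where, and what it rewrote.
const auditLog = result.changes.map((c) =>
  JSON.stringify({ rule: c.rule, at: c.index, from: c.from, to: c.to, saved: c.savedChars }),
);
console.log(auditLog.join('\n'));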
Roadmap
- v0.2 — verify mode: run both prompts through a cheap LLM (Haiku, 4o-mini) and report a semantic-similarity score, so you can flag changes that move the needle.
- v0.3 — markdown-AST preserve mode (currently regex-based — works but doesn't understand nested fences).
- v0.4 — companion packages prompt-minifier-gpt, prompt-minifier-claude, prompt-minifier-llama that pre-wire popular tokenizers.
Contributing
The single biggest way to improve this package is to expand its phrase dictionaries. Each dictionary is a JSON file under src/data/:
- verbose-phrases.json — wordy → concise
- politeness.json — polite padding to delete
- filler-words.json — narrative filler
- redundant-qualifiers.json — tautological pairs
- redundant-adverbs.json — empty intensifiers
Open a PR adding entries — no code changes required for new phrases. Each entry should be unambiguous and meaning-preserving in nearly every context. If a substitution would change meaning under a plausible reading, leave it out.
Author
Cihangir Bozdogan — [email protected]
License
MIT © 2026 — see LICENSE.
