prompt-identifiers

v0.1.1

Published

20 days ago

Efficient ID compression for LLM prompts in JS/TS/Node. Reduce token usage by up to 90%. UUID, ULID, regex

Downloads

267


fogx

llm ai tokens compression prompt uuid ulid

prompt-identifiers

Efficient, reversible ID compression for LLM prompts - reduce token usage by up to 90%.

Zero runtime dependencies - pure TypeScript implementation.

Installation

npm install prompt-identifiers

Quick Start

import { encode, decode } from "prompt-identifiers";

// Encode UUIDs to short placeholders
const result = encode(
  "User 123e4567-e89b-42d3-a456-426655440000 sent message to 987fcdeb-51a2-43f7-8d9c-0123456789ab",
  { inputFormat: "UUID", outputFormat: "Numeric" }
);

console.log(result.encoded);
// "User 000 sent message to 001"

console.log(result.mapping);
// { "000": "123e4567-e89b-42d3-a456-426655440000", "001": "987fcdeb-51a2-43f7-8d9c-0123456789ab" }

// Decode LLM response back to original IDs
const restored = decode(result.encoded, result.mapping);
// "User 123e4567-e89b-42d3-a456-426655440000 sent message to 987fcdeb-51a2-43f7-8d9c-0123456789ab"

Why Use This?

LLM tokenizers split UUIDs inefficiently: a single UUID consumes roughly 18 tokens, with the exact count depending on the tokenizer. Replacing IDs with short placeholders lets you:

- Reduce token usage by up to 90% on ID-heavy prompts
- Lower API costs proportionally
- Increase effective context window for complex prompts

The mapping preserves the original IDs for perfect reconstruction.

Input Formats

Built-in Formats

| Format | Pattern | Example |
| -------- | -------------------------- | -------------------------------------- |
| 'UUID' | RFC 4122 UUID v4 | 123e4567-e89b-42d3-a456-426655440000 |
| 'ULID' | Crockford Base32, 26 chars | 01ARZ3NDEKTSV4RRFFQ69G5FAV |
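For orientation, the two built-in formats correspond roughly to patterns like the ones below. This is only a sketch: the constant names `UUID_V4` and `ULID` are hypothetical, and the expressions the library actually uses may differ (for example in how they treat word boundaries).

```typescript
// Hypothetical patterns, not taken from the library source.
// UUID v4: the version nibble is 4 and the variant nibble is 8, 9, a, or b.
const UUID_V4 =
  /[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}/gi;

// ULID: 26 Crockford Base32 characters (no I, L, O, U). The first character
// is 0-7 so that the value fits in 128 bits.
const ULID = /[0-7][0-9A-HJKMNP-TV-Z]{25}/g;

const sample =
  "user 123e4567-e89b-42d3-a456-426655440000 created 01ARZ3NDEKTSV4RRFFQ69G5FAV";

const uuidMatches = sample.match(UUID_V4) || [];
const ulidMatches = sample.match(ULID) || [];
```

Each pattern finds exactly one ID in `sample`; neither matches inside the other format's ID.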

Custom RegExp

Pass any RegExp to match custom ID patterns:

// Match custom user IDs
encode("User user-123456 logged in", {
  inputFormat: /user-\d{6}/gi,
  outputFormat: "Numeric",
});
// → { encoded: "User 000 logged in", mapping: { "000": "user-123456" } }

// Match order codes
encode("Order ORD-ABC-123 shipped", {
  inputFormat: /ORD-[A-Z]{3}-\d{3}/gi,
  outputFormat: "Numeric",
});

The global flag (g) is added automatically if not present.
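The global flag matters because a non-global RegExp only ever finds the first match. One way such normalization could work is sketched below; `withGlobalFlag` is a hypothetical helper for illustration, not part of the package's API.

```typescript
// Hypothetical helper: return a RegExp that is guaranteed to be global.
function withGlobalFlag(pattern: RegExp): RegExp {
  if (pattern.global) return pattern;
  // Keep the existing flags (i, m, u, ...) and append g.
  return new RegExp(pattern.source, pattern.flags + "g");
}

const single = /user-\d{6}/i; // no g flag
const globalPattern = withGlobalFlag(single);

const input = "user-123456 invited USER-654321";
const found = input.match(globalPattern) || [];
// found holds both IDs; with the original pattern only the first would match
```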

Output Formats

Built-in Formats

| Format | Description | Examples |
| --------------- | ---------------------------------------------------- | ---------------------------------- |
| 'SafeNumeric' | Recommended. Collision-safe with tildes | ~000~, ~001~, ~002~ |
| 'Numeric' | Smart triplet expansion | 000, 001, ..., 999, 001000 |
| 'IdToken' | Base62 compact | 0, A, z, 10 |
| 'Passthrough' | No replacement | Original text unchanged |

SafeNumeric Format (Recommended)

The SafeNumeric format wraps placeholders in tildes (~) to prevent collision with naturally-occurring numbers in LLM responses. See the main documentation for detailed format comparisons and delimiter guidance.

// Problem with Numeric format:
// encode("User abc-123...", config).encoded is "User 000"
// LLM responds: "User 000 reported error code 001"
// decode(response, mapping) rewrites "001" as well whenever the mapping
// contains a "001" entry, even though here it is an error code.

// Solution with SafeNumeric:
// encode("User abc-123...", config).encoded is "User ~000~"
// LLM responds: "User ~000~ reported error code 001"
// decode(response, mapping) restores only ~000~; the bare 001 is left alone.

For custom delimiters, use the template format (see below).
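The collision can be reproduced without the library. The sketch below uses a hypothetical `sketchDecode` (a single-pass literal replacement, which is an assumption about how decoding might work, not the package's implementation) and made-up IDs `id-A` and `id-B`.

```typescript
type Mapping = Record<string, string>;

// Escape regex metacharacters so placeholders are matched literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Illustrative decode: one pass over the text, longest placeholders first,
// so restored IDs are never rescanned for further placeholders.
function sketchDecode(encodedText: string, mapping: Mapping): string {
  const keys = Object.keys(mapping).sort((a, b) => b.length - a.length);
  if (keys.length === 0) return encodedText;
  const pattern = new RegExp(keys.map(escapeRegExp).join("|"), "g");
  return encodedText.replace(pattern, (placeholder: string) => {
    const original = mapping[placeholder];
    return original === undefined ? placeholder : original;
  });
}

const numericMapping: Mapping = { "000": "id-A", "001": "id-B" };
const safeMapping: Mapping = { "~000~": "id-A", "~001~": "id-B" };

// Bare digits: the error code 001 is indistinguishable from a placeholder.
const numericDecoded = sketchDecode(
  "User 000 reported error code 001",
  numericMapping
);
// "User id-A reported error code id-B"

// Tilde-delimited: only ~000~ is a placeholder, the bare 001 survives.
const safeDecoded = sketchDecode(
  "User ~000~ reported error code 001",
  safeMapping
);
// "User id-A reported error code 001"
```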

Template Strings

Use { template: string } with format specifiers:

// Plain numeric
encode(text, { inputFormat: "UUID", outputFormat: { template: "<id:{i}>" } });
// → <id:0>, <id:1>, <id:2>, ...

// Zero-padded to 4 digits
encode(text, { inputFormat: "UUID", outputFormat: { template: "ID_{i:04}" } });
// → ID_0000, ID_0001, ID_0002, ...

// Base62 encoding
encode(text, {
  inputFormat: "UUID",
  outputFormat: { template: "[{i:base62}]" },
});
// → [0], [A], [z], [10], ...

// Smart triplet expansion (like SafeNumeric but with custom delimiters)
encode(text, {
  inputFormat: "UUID",
  outputFormat: { template: "[[{i:zeroFilled}]]" },
});
// → [[000]], [[001]], ..., [[999]], [[001000]], ...

Format specifiers:

- {i} - plain numeric: 0, 1, 2, ...
- {i:02}, {i:03}, {i:04} - zero-padded to N digits
- {i:zeroFilled} - smart triplet expansion: 000, 001, ..., 999, 001000, ...
- {i:base62} - base62 encoding
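To make the specifiers concrete, here is a minimal sketch of how a template could be expanded for a given index. `renderTemplate` and its helpers are hypothetical, and the base62 alphabet (digits, then uppercase, then lowercase) is an assumption chosen to agree with the examples above (0, A, z, 10); the library's own expansion may differ.

```typescript
// Assumed alphabet: 0-9, then A-Z, then a-z.
const BASE62 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

function padLeft(digits: string, width: number): string {
  let out = digits;
  while (out.length < width) out = "0" + out;
  return out;
}

function toBase62(i: number): string {
  if (i === 0) return "0";
  let out = "";
  for (let n = i; n > 0; n = Math.floor(n / 62)) {
    out = BASE62.charAt(n % 62) + out;
  }
  return out;
}

// Pad to the next multiple of three digits: 7 -> 007, 1000 -> 001000.
function zeroFilled(i: number): string {
  const digits = String(i);
  return padLeft(digits, Math.ceil(digits.length / 3) * 3);
}

// Hypothetical renderer for the specifiers listed above.
function renderTemplate(template: string, i: number): string {
  return template.replace(
    /\{i(?::(\w+))?\}/g,
    (token: string, spec?: string) => {
      if (spec === undefined) return String(i);
      if (spec === "zeroFilled") return zeroFilled(i);
      if (spec === "base62") return toBase62(i);
      if (/^0\d+$/.test(spec)) return padLeft(String(i), parseInt(spec, 10));
      return token; // unknown specifier: leave the token untouched
    }
  );
}

const padded = renderTemplate("ID_{i:04}", 2); // "ID_0002"
const triplet = renderTemplate("[[{i:zeroFilled}]]", 1000); // "[[001000]]"
const compact = renderTemplate("[{i:base62}]", 62); // "[10]"
```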

Custom Functions

For full control, pass a formatter function:

// Custom prefix
encode(text, {
  inputFormat: "UUID",
  outputFormat: (i) => `[[ID_${i}]]`,
});
// → [[ID_0]], [[ID_1]], ...

// Hex encoding
encode(text, {
  inputFormat: "UUID",
  outputFormat: (i) => `0x${i.toString(16).toUpperCase()}`,
});
// → 0x0, 0x1, ..., 0xA, 0xB, ...

// Letter-based (valid for up to 26 distinct IDs; index 26 would yield "[")
encode(text, {
  inputFormat: "UUID",
  outputFormat: (i) => String.fromCharCode(65 + i),
});
// → A, B, C, ..., Z
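A single-letter formatter runs out of letters after 26 IDs. If more are needed, one option is spreadsheet-style column names (A ... Z, AA, AB, ...). The sketch below is a hypothetical formatter, `lettersFormatter`, written to fit the `(i) => string` shape shown above; it has not been tested against the library itself.

```typescript
// Hypothetical formatter: bijective base-26, like spreadsheet columns.
// 0 -> A, 25 -> Z, 26 -> AA, 27 -> AB, 701 -> ZZ, 702 -> AAA
function lettersFormatter(i: number): string {
  let n = i + 1; // shift to 1-based so every digit is in 1..26
  let out = "";
  while (n > 0) {
    const rem = (n - 1) % 26;
    out = String.fromCharCode(65 + rem) + out;
    n = Math.floor((n - 1) / 26);
  }
  return out;
}

const firstLabels = [0, 1, 25, 26, 27].map(lettersFormatter);
// ["A", "B", "Z", "AA", "AB"]
```

Every index maps to a distinct label, but purely alphabetic placeholders can still collide with ordinary words in a response, so a delimiter is worth considering.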

API Reference

See docs/API.md for complete type definitions and detailed documentation.

`encode(text, config)`

function encode(text: string, config: EncodeConfig): EncodeResult;

Replace IDs in text with placeholders. Returns encoded text and a mapping to restore original IDs.

`decode(text, mapping)`

function decode(text: string, mapping: Record<string, string>): string;

Restore original IDs from placeholders using the mapping from encode().

Features

- Deduplication: Repeated IDs get the same placeholder
- Case insensitive: 123E4567-... and 123e4567-... map to the same placeholder
- Unicode safe: Works with any surrounding text content
- Type-safe: Full TypeScript support with exported types
- Zero dependencies: No runtime dependencies, works anywhere JavaScript runs
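Deduplication and case-insensitive matching can be pictured with a small stand-alone sketch. `sketchEncode` below is hypothetical and is not the package's implementation; in particular, keeping the first spelling seen in the mapping is an assumption, and the pattern passed in must already carry the g flag.

```typescript
interface SketchResult {
  encoded: string;
  mapping: Record<string, string>;
}

// Illustrative encoder: one placeholder per distinct ID, compared
// case-insensitively. Placeholders follow the ~000~ style.
function sketchEncode(source: string, pattern: RegExp): SketchResult {
  const mapping: Record<string, string> = {};
  const seen: Record<string, string> = {}; // lowercased ID -> placeholder
  let next = 0;
  const encoded = source.replace(pattern, (id: string) => {
    const key = id.toLowerCase();
    if (seen[key] === undefined) {
      let digits = String(next++);
      while (digits.length < 3) digits = "0" + digits;
      const placeholder = "~" + digits + "~";
      seen[key] = placeholder;
      mapping[placeholder] = id; // assumption: keep the first spelling seen
    }
    return seen[key];
  });
  return { encoded, mapping };
}

const sketch = sketchEncode(
  "user-000001 wrote to USER-000001 and user-000002",
  /user-\d{6}/gi
);
// sketch.encoded: "~000~ wrote to ~000~ and ~001~"
// sketch.mapping: { "~000~": "user-000001", "~001~": "user-000002" }
```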

Performance

Native JavaScript implementation - 1.5-2.7x faster than Rust FFI for this workload.

| UUIDs | Roundtrip (μs) |
| ----- | -------------- |
| 1 | 0.85 |
| 10 | 5.09 |
| 50 | 26.66 |
| 100 | 52.33 |
| 500 | 258.19 |
| 1000 | 560.70 |

Both encode and decode run in O(n) time: as the table shows, roundtrip time grows roughly in proportion to the number of UUIDs.

License

MIT
