# mint-id
Cryptographically secure identifiers using single-token dictionary words. Zero runtime dependencies. Works in Node, Cloudflare Workers, Deno, and browsers.
```ts
import { mintId } from 'mint-id';

mintId();
// → "guide_pull_rich_deep_fast_sale_club_warm_bold_true_gain_plan"
```

Every word in the output tokenizes to exactly 1 token across all major LLM tokenizers. A pre-signed S3 URL costs 229 tokens; a mint-id with equivalent entropy costs 24.
## Install

```sh
npm install mint-id
```

## Quick Start
```ts
import { mintId, entropy, wordlist } from 'mint-id';

// Default: 12 words, ~133 bits entropy, universal compatibility
mintId();

// Optimize for a specific model (larger wordlist = more entropy per word)
mintId({ model: 'claude-opus-4-6' });

// Target a minimum entropy level
mintId({ minEntropy: 256 });

// Customize format
mintId({ words: 8, delimiter: '-' });
```

## Extended Mode
Need a larger word pool? Import from `mint-id/extended` to unlock `maxTokens`:
```ts
import { mintId, wordlist } from 'mint-id/extended';

// Allow words up to 2 tokens each — much larger pool
mintId({ maxTokens: 2 });

wordlist({ maxTokens: 1 }).length; // 2,220 (same as base)
wordlist({ maxTokens: 2 }).length; // ~100k words
wordlist({ maxTokens: 3 }).length; // ~195k words
```

The extended entry point includes all 234k dictionary words with per-tokenizer token counts. It is tree-shakeable: your bundler only includes the extended data if you import from `mint-id/extended`.
| Entry point | Bundle size | Word pool | maxTokens |
|-------------|-------------|-----------|-------------|
| mint-id | ~173 KB | 10k single-token words | N/A (always 1) |
| mint-id/extended | ~6.6 MB | 234k words | 1–∞ |
## API

### `mintId(options?): string`
Generate a cryptographically secure identifier.
| Option | Type | Default | Description |
|--------|------|---------|-------------|
| words | number | 12 | Number of words. Ignored if minEntropy is set. |
| minEntropy | number | — | Minimum bits of entropy. Overrides words. |
| delimiter | string | '_' | Separator between words. |
| model | string \| string[] | — | Model(s) to optimize for. |
| tokenizer | string \| string[] | — | Tokenizer(s) to target. Takes precedence over model. |
| maxTokens | number | 1 | Max tokens per word. Extended only. |
When no `model` or `tokenizer` is specified, `mintId` uses the universal wordlist (words that meet the token threshold in all supported tokenizers).
When multiple models or tokenizers are specified, it uses only the words that meet the threshold in all of them (the intersection).
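For example, a sketch using the documented option types (the model and tokenizer names are taken from the table under Supported Models & Tokenizers):

```ts
import { mintId } from 'mint-id';

// Intersection: only words that are single-token in both Claude and o200k_base
mintId({ model: ['claude-opus-4-6', 'gpt-4o'] });

// tokenizer takes precedence when both are given
mintId({ model: 'gpt-4', tokenizer: 'gpt2' });
```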
### `entropy(options?): EntropyInfo`
Calculate entropy for a given configuration without generating an ID.
```ts
entropy();
// → { bits: 133.4, poolSize: 2220, bitsPerWord: 11.12, words: 12 }

entropy({ model: 'claude-opus-4-6' });
// → { bits: 154.8, poolSize: 7629, bitsPerWord: 12.9, words: 12 }
```

### `wordlist(options?): string[]`
Get the raw wordlist for a model/tokenizer configuration.
```ts
wordlist().length; // 2220 (universal)
wordlist({ model: 'claude-opus-4-6' }).length; // 7629
wordlist({ tokenizer: 'gpt2' }).length; // 3495
```

## Error Classes
All errors extend `MintIdError` for easy catch-all handling:
```ts
import { mintId, UnknownModelError, UnknownTokenizerError, EmptyWordlistError } from 'mint-id';

try {
  mintId({ model: 'nonexistent' });
} catch (e) {
  if (e instanceof UnknownModelError) {
    console.log(e.model); // "nonexistent"
  }
}
```

| Class | Thrown when |
|-------|-----------|
| UnknownModelError | Unrecognized model name |
| UnknownTokenizerError | Unrecognized tokenizer name |
| EmptyWordlistError | Wordlist is empty after filtering |
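Since every error class extends `MintIdError`, one instance check can handle them all. A minimal sketch, assuming `MintIdError` is itself exported (it does not appear in the import list above):

```ts
import { mintId, MintIdError } from 'mint-id';

function safeId(model: string): string {
  try {
    return mintId({ model });
  } catch (e) {
    if (e instanceof MintIdError) {
      // Unknown model/tokenizer or empty wordlist: fall back to the universal pool
      return mintId();
    }
    throw e; // not a mint-id error, rethrow
  }
}
```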
## Supported Models & Tokenizers
| Model | Tokenizer | Pool size | Bits/word |
|-------|-----------|-----------|-----------|
| claude-opus-4-6, claude-sonnet-4-5, claude-haiku-4-5, etc. | claude | 7,629 | 12.90 |
| gpt-4, gpt-4-turbo, gpt-3.5-turbo | cl100k_base | 4,914 | 12.26 |
| gpt-4o, gpt-5, o1, o3, o4-mini, etc. | o200k_base | 5,640 | 12.46 |
| gpt-2, gpt-3 | gpt2 | 3,495 | 11.77 |
| (default — all models) | universal | 2,220 | 11.12 |
Tokenizer aliases: `o200k_harmony` → `o200k_base`; `p50k_base`, `p50k_edit`, and `r50k_base` → `gpt2`.
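If aliases resolve before pool lookup, as the mapping above implies, an alias and its canonical tokenizer should yield the same pool (a sketch, not verified against the library):

```ts
import { wordlist } from 'mint-id';

// 'p50k_base' is an alias for 'gpt2', so the pools should match
wordlist({ tokenizer: 'p50k_base' }).length; // 3495, same as { tokenizer: 'gpt2' }
```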
## Why Single-Token Words?
LLM tokenizers split text into tokens. Most English words tokenize to 1 token, but random hex/UUIDs tokenize poorly:
| Identifier type | Example | Tokens | Entropy (bits) | Bits/token |
|-----------------|---------|--------|----------------|------------|
| Pre-signed S3 URL | https://bucket.s3...&Signature=abc123 | 229 | ~330 | 1.44 |
| UUID v4 | 550e8400-e29b-41d4-a716-446655440000 | 25 | 122 | 4.88 |
| SHA-256 (hex) | e3b0c44298fc1c149afb... | 43 | 256 | 5.95 |
| mint-id (12 words) | guide_pull_rich_deep_fast_sale_... | 24 | 133 | 5.56 |
| mint-id (6 words, claude) | guide_pull_rich_deep_fast_sale | 12 | 77 | 6.44 |
Single-token words achieve 5.5–6.4 bits per token vs UUIDs at 4.9 bits/token. Over thousands of API calls, this saves significant token budget.
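As a back-of-the-envelope illustration using the numbers in the table above (the call volume here is hypothetical):

```ts
// Replacing a pre-signed S3 URL (229 tokens) with a 12-word mint-id (24 tokens)
const savedPerCall = 229 - 24;            // 205 tokens
const savedTotal = savedPerCall * 10_000; // 2,050,000 tokens over 10k calls
```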
## Token Budget Formula

```
total_tokens = base_tokens + (2 × word_count)
```

where `base_tokens` accounts for the delimiter overhead. For underscore-delimited IDs:
- 6-word ID: ~12 tokens
- 12-word ID: ~24 tokens
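A minimal sketch of the formula in TypeScript; `tokenBudget` is a hypothetical helper, not part of the library:

```ts
// base_tokens ≈ 0 for underscore-delimited IDs
const tokenBudget = (wordCount: number, baseTokens = 0): number =>
  baseTokens + 2 * wordCount;

tokenBudget(6);  // 12
tokenBudget(12); // 24
```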
## Entropy Guide

| Words | Bits (universal) | Comparable to | Use case |
|-------|------------------|---------------|----------|
| 4 | 44.5 | — | Short-lived, internal |
| 6 | 66.7 | — | Temporary URLs, session IDs |
| 8 | 89.0 | — | API keys, medium-term |
| 10 | 111.2 | — | Long-lived identifiers |
| 12 | 133.4 | UUID v4 (122 bits) | Default. General purpose |
| 16 | 177.9 | — | High security |
| 23 | 255.8 | SHA-256 | Maximum security |
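The table follows from bits = words × log2(pool size). A quick sanity check in plain TypeScript (not the library's `entropy()`):

```ts
const bits = (words: number, poolSize = 2220): number =>
  words * Math.log2(poolSize);

bits(12); // ≈ 133.4, matching entropy() at default settings
```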
## Security

- CSPRNG: uses `crypto.randomInt()` exclusively. No `Math.random()`.
- Diceware principle: security depends on pool size and word count, not on keeping the wordlist secret.
- What an attacker knows: the wordlist (it's public), the number of words, the delimiter.
- What an attacker doesn't know: which words were selected (chosen by the OS CSPRNG).
- Brute force: 2,220^12 ≈ 2^133 combinations at default settings.
## Runtime Compatibility

No `fs` or other Node.js-specific APIs beyond `crypto.randomInt()`. Works in:

- Node.js >= 14.10
- Cloudflare Workers (with `nodejs_compat`)
- Deno
- Bun
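For example, a minimal Cloudflare Worker (module syntax; assumes the `nodejs_compat` compatibility flag is enabled in your wrangler config):

```ts
import { mintId } from 'mint-id';

export default {
  async fetch(): Promise<Response> {
    // Mint a fresh identifier per request
    return new Response(mintId());
  },
};
```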
## Data Source

Wordlists are generated from `/usr/share/dict/words` (macOS, 235,976 entries), filtered by token count. Token counts are verified against:
- Claude: Anthropic token counting API
- `cl100k_base`: tiktoken (GPT-4, GPT-3.5)
- `o200k_base`: tiktoken (GPT-4o, GPT-5, o-series)
- `gpt2`: tiktoken (GPT-2, GPT-3)

Source data and export scripts live in `token-counter`.
## License
MIT
