tiktoken-ts
A pure TypeScript port of OpenAI's tiktoken-rs, providing exact BPE (Byte-Pair Encoding) tokenization compatible with OpenAI's models.
Features
- Exact BPE tokenization - Direct port of tiktoken-rs algorithm, produces identical tokens
- All OpenAI encodings - r50k_base, p50k_base, p50k_edit, cl100k_base, o200k_base, o200k_harmony
- Zero dependencies - Pure TypeScript, works in Node.js and browsers
- Lazy vocabulary loading - Vocabularies loaded on-demand from OpenAI CDN (~4-10MB each)
- Caching - Vocabularies and tokenizer instances are cached for performance
- Fast estimation API - Synchronous heuristic-based counting for quick estimates
- Model-aware - Automatic encoding selection for GPT-4, GPT-4o, GPT-5, o-series, and more
Installation
npm install tiktoken-ts
Quick Start
Exact BPE Tokenization (Async)
Use this for exact token counts that match OpenAI's tokenizer:
import {
getEncodingAsync,
countTokensAsync,
encodeAsync,
decodeAsync,
} from "tiktoken-ts";
// Load encoding and tokenize
const tiktoken = await getEncodingAsync("cl100k_base");
const tokens = tiktoken.encode("Hello, world!");
console.log(tokens); // [9906, 11, 1917, 0]
// Decode back to text (round-trip works!)
const text = tiktoken.decode(tokens);
console.log(text); // "Hello, world!"
// Count tokens
const count = tiktoken.countTokens("Hello, world!");
console.log(count); // 4
// Or use convenience functions
const count2 = await countTokensAsync("Hello, world!", "cl100k_base");
const tokens2 = await encodeAsync("Hello!", "o200k_base");
const decoded = await decodeAsync(tokens2, "o200k_base");
For a Specific Model (Async)
import {
getEncodingForModelAsync,
countTokensForModelAsync,
} from "tiktoken-ts";
// Automatically selects the correct encoding for the model
const tiktoken = await getEncodingForModelAsync("gpt-4o");
const tokens = tiktoken.encode("Hello!");
// Or count directly
const count = await countTokensForModelAsync("Hello!", "gpt-4o");
Token Estimation (Sync)
Use this for fast approximate counts when exact accuracy isn't required:
import {
countTokens,
estimateMaxTokens,
getTokenEstimation,
fitsInContext,
} from "tiktoken-ts";
// Fast token estimation (no vocabulary loading)
const count = countTokens("Hello, world!", { model: "gpt-4o" });
// Estimate safe max_tokens to avoid truncation
const maxTokens = estimateMaxTokens(promptText, "gpt-4o", {
desiredOutputTokens: 1000,
safetyMargin: 0.1,
});
// Get detailed estimation with warnings
const estimation = getTokenEstimation(promptText, "gpt-4o");
if (!estimation.fitsInContext) {
console.warn(estimation.warning);
}
// Check if text fits in context
if (fitsInContext(longText, "gpt-4o", 1000)) {
// Text fits with 1000 tokens reserved for output
}
API Reference
Exact BPE API (Async)
getEncodingAsync(encodingName)
Load an encoding by name. Returns a Tiktoken instance.
const tiktoken = await getEncodingAsync("cl100k_base");
getEncodingForModelAsync(modelName)
Get the appropriate encoding for a model.
const tiktoken = await getEncodingForModelAsync("gpt-4o");
// Uses o200k_base for GPT-4o
Tiktoken Class Methods
const tiktoken = await getEncodingAsync("cl100k_base");
// Encode text to tokens
const tokens = tiktoken.encode("Hello!"); // [9906, 0]
const ordinary = tiktoken.encodeOrdinary("Hello!"); // Same tokens, but no special token handling
const withSpecial = tiktoken.encodeWithSpecialTokens("<|endoftext|>"); // Handles special tokens
// Decode tokens to text
const text = tiktoken.decode(tokens); // "Hello!"
const bytes = tiktoken.decodeBytes(tokens); // Uint8Array
// Count tokens
const count = tiktoken.countTokens("Hello!"); // 2
// Properties
tiktoken.vocabSize; // Vocabulary size (excluding special tokens)
tiktoken.totalVocabSize; // Total vocabulary size
tiktoken.loaded; // Whether vocabulary is loaded
tiktoken.name; // Encoding name
// Special tokens
tiktoken.getSpecialTokens(); // Set of special token strings
tiktoken.isSpecialToken(100257); // Check if a token ID is special
Convenience Functions
// Encode/decode without managing instances
const tokens = await encodeAsync("Hello!", "cl100k_base");
const text = await decodeAsync(tokens, "cl100k_base");
const count = await countTokensAsync("Hello!", "cl100k_base");
const countForModel = await countTokensForModelAsync("Hello!", "gpt-4o");
Estimation API (Sync)
countTokens(text, options?)
Fast heuristic-based token counting.
// With the default encoding (o200k_base)
const count = countTokens("Hello, world!");
// With a specific model
const countForModel = countTokens("Hello, world!", { model: "gpt-4o" });
// With a specific encoding
const countForEncoding = countTokens("Hello, world!", { encoding: "cl100k_base" });
countChatTokens(messages, model?)
Count tokens in chat messages, including message overhead.
const messages = [
{ role: "system", content: "You are helpful." },
{ role: "user", content: "Hello!" },
];
const count = countChatTokens(messages, "gpt-4o");
estimateMaxTokens(promptText, model, options?)
Estimate a safe max_tokens value for API calls.
const maxTokens = estimateMaxTokens(prompt, "gpt-4o", {
desiredOutputTokens: 1000,
safetyMargin: 0.1, // 10% safety margin
minOutputTokens: 100,
maxOutputTokensCap: 4096,
});
getTokenEstimation(promptText, model, options?)
Get detailed estimation with context fit analysis.
const estimation = getTokenEstimation(longPrompt, "gpt-4o", {
desiredOutputTokens: 2000,
});
console.log({
promptTokens: estimation.promptTokens,
recommendedMaxTokens: estimation.recommendedMaxTokens,
contextLimit: estimation.contextLimit,
fitsInContext: estimation.fitsInContext,
warning: estimation.warning,
});
Utility Functions
// Check context fit
fitsInContext(text, "gpt-4o", 1000); // reservedOutputTokens
// Truncate to fit
const truncated = truncateToTokenLimit(longText, 1000, "gpt-4o");
// Split into chunks
const chunks = splitIntoChunks(longText, 500, 100, "gpt-4o"); // maxTokens, overlap
Model Configuration
import {
getModelConfig,
getModelContextLimit,
getModelMaxOutputTokens,
getEncodingForModel,
listModels,
} from "tiktoken-ts";
// Get full model config
const config = getModelConfig("gpt-4o");
// { name: "gpt-4o", encoding: "o200k_base", contextLimit: 128000, maxOutputTokens: 16384, family: "gpt-4o" }
// Get specific values
getModelContextLimit("gpt-4o"); // 128000
getModelMaxOutputTokens("gpt-4o"); // 16384
getEncodingForModel("gpt-4o"); // "o200k_base"
// List all supported models
listModels(); // ["gpt-5", "gpt-4o", "gpt-4", ...]
Encoding Selection Guide
This section explains which encoding to use for each model and why.
Quick Reference Table
| Model Family | Encoding | Type | Accuracy | When to Use |
| -------------------------------- | ------------------- | ---------- | -------------- | --------------------------------- |
| GPT-4o, GPT-4.1, GPT-5, o-series | o200k_base | Exact BPE | 100% | Billing, debugging, decode needed |
| GPT-4, GPT-3.5-turbo | cl100k_base | Exact BPE | 100% | Billing, debugging, decode needed |
| Claude (all versions) | claude_estimation | Estimation | ~80-90% (safe) | Context management, API limits |
| DeepSeek, Gemini | cl100k_base | Estimation | ~70-85% | Rough estimates only |
| Legacy GPT-3 | r50k_base | Exact BPE | 100% | Legacy applications |
| Codex | p50k_base | Exact BPE | 100% | Legacy code models |
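The selection logic in the table above can be sketched as a small prefix check. This is a simplified illustration, not the library's actual implementation (fine-tuned `ft:` prefixes and edit models are omitted here):

```typescript
// Sketch of encoding selection following the quick reference table.
// Simplified prefix matching; the library's own mapping may differ.
function pickEncoding(model: string): string {
  if (/^(gpt-4o|gpt-4\.1|gpt-5|o[134])/.test(model)) return "o200k_base";
  if (/^(gpt-4|gpt-3\.5)/.test(model)) return "cl100k_base";
  if (/^claude/.test(model)) return "claude_estimation"; // safe estimate only
  if (/^(code-davinci|text-davinci-00)/.test(model)) return "p50k_base";
  return "r50k_base"; // legacy GPT-3 fallback (davinci, curie, ...)
}
```

Note the ordering matters: `gpt-4o` must be tested before the bare `gpt-4` prefix, or GPT-4o models would incorrectly fall through to cl100k_base.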
Detailed Encoding Guide
o200k_base - Modern OpenAI Models (Recommended)
Use for: GPT-4o, GPT-4o-mini, GPT-4.1, GPT-4.1-mini, GPT-5, GPT-5-mini, o1, o3, o4-mini
// Exact tokenization (async, loads vocabulary)
const tiktoken = await getEncodingAsync("o200k_base");
const tokens = tiktoken.encode("Hello!"); // Exact tokens
// Or use model name (auto-selects o200k_base)
const tiktokenForModel = await getEncodingForModelAsync("gpt-4o");
Characteristics:
- 200,000 token vocabulary
- Most efficient for modern text (~4 chars/token)
- Required for exact billing calculations
- Supports round-trip encode/decode
cl100k_base - GPT-4 Era Models
Use for: GPT-4, GPT-4-turbo, GPT-3.5-turbo, text-embedding-ada-002, text-embedding-3-*
const tiktoken = await getEncodingAsync("cl100k_base");
Characteristics:
- 100,256 token vocabulary
- Slightly less efficient than o200k_base
- Still widely used for embeddings
claude_estimation - Anthropic Claude Models
Use for: all Claude models (claude-4.5-*, claude-4.1-*, claude-4-*, claude-3.5-*, claude-3-*, claude-2.*)
// Automatic (recommended)
const count = countTokens("Hello!", { model: "claude-3-5-sonnet" });
// Explicit encoding
const count = countTokens("Hello!", { encoding: "claude_estimation" });
// Content-aware (for code, adds extra safety margin)
import { estimateClaudeTokens } from "tiktoken-ts";
const codeCount = estimateClaudeTokens(pythonCode, "code");
IMPORTANT - Claude is estimation only:
- Claude uses a proprietary tokenizer (not publicly available)
- We apply a 1.25x safety multiplier to prevent API truncation
- Estimates are intentionally conservative (over-count)
- For exact counts, use Anthropic's Token Counting API
Why 1.25x multiplier?
- Research shows Claude produces 16-30% more tokens than GPT-4
- English text: +16%, Math: +21%, Code: +30%
- 1.25x covers worst-case while remaining practical
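The arithmetic behind this can be sketched in a few lines. The helper below is hypothetical (the library's real heuristics are more involved); it only illustrates how a chars-per-token baseline combined with the 1.25x factor yields a deliberately conservative count:

```typescript
// Hypothetical sketch of the conservative Claude estimate:
// a ~4 chars/token GPT-style baseline scaled by the 1.25x safety factor.
const CHARS_PER_TOKEN = 4; // assumption: GPT-4-style English text
const CLAUDE_MULTIPLIER = 1.25; // safety factor described above

function estimateClaudeTokensSketch(text: string): number {
  const baseline = Math.ceil(text.length / CHARS_PER_TOKEN);
  return Math.ceil(baseline * CLAUDE_MULTIPLIER); // err on over-counting
}
```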
p50k_base / p50k_edit - Legacy Codex
Use for: code-davinci-002, text-davinci-003, text-davinci-edit-001
const tiktoken = await getEncodingAsync("p50k_base");
r50k_base - Legacy GPT-3
Use for: davinci, curie, babbage, ada (original GPT-3 models)
const tiktoken = await getEncodingAsync("r50k_base");
Decision Flowchart
Is the model from OpenAI?
├─ YES → Is it GPT-4o, GPT-4.1, GPT-5, or o-series?
│ ├─ YES → Use o200k_base (exact)
│ └─ NO → Is it GPT-4 or GPT-3.5?
│ ├─ YES → Use cl100k_base (exact)
│ └─ NO → Is it Codex or text-davinci?
│ ├─ YES → Use p50k_base (exact)
│ └─ NO → Use r50k_base (exact)
├─ Is the model from Anthropic (Claude)?
│ └─ YES → Use claude_estimation (safe estimate, 1.25x multiplier)
└─ Other (DeepSeek, Gemini, etc.)
  └─ Use cl100k_base estimation (rough approximation only)
Exact vs Estimation: When to Use Which
| Scenario | Use Exact (Async) | Use Estimation (Sync) |
| ------------------------- | ----------------- | --------------------- |
| Billing/cost calculation | ✅ | ❌ |
| Debugging tokenization | ✅ | ❌ |
| Need to decode tokens | ✅ | ❌ |
| Context window management | Either | ✅ (faster) |
| Real-time UI feedback | ❌ (too slow) | ✅ |
| Claude models | N/A | ✅ (only option) |
| Batch processing | ✅ | Either |
Supported Encodings
| Encoding | Vocab Size | Type | Models |
| ------------------- | ---------- | ---------- | --------------------------------- |
| o200k_base | 200,000 | Exact BPE | GPT-4o, GPT-4.1, GPT-5, o-series |
| o200k_harmony | 200,000 | Exact BPE | gpt-oss |
| cl100k_base | 100,256 | Exact BPE | GPT-4, GPT-3.5-turbo, embeddings |
| p50k_base | 50,257 | Exact BPE | Code-davinci, text-davinci-003 |
| p50k_edit | 50,257 | Exact BPE | text-davinci-edit-001 |
| r50k_base | 50,257 | Exact BPE | GPT-3 (davinci, curie, etc.) |
| claude_estimation | ~22,000* | Estimation | All Claude models (safe estimate) |
*Claude's actual vocabulary size is estimated at ~22,000 based on research, but the encoding uses cl100k_base patterns with a safety multiplier.
Supported Models
OpenAI
- GPT-5 series: gpt-5, gpt-5-mini, gpt-5-nano, gpt-5-turbo
- GPT-4.1 series: gpt-4.1, gpt-4.1-mini, gpt-4.1-nano (1M context!)
- GPT-4o series: gpt-4o, gpt-4o-mini, chatgpt-4o-latest
- GPT-4 series: gpt-4, gpt-4-turbo, gpt-4-32k
- GPT-3.5 series: gpt-3.5-turbo, gpt-3.5-turbo-16k
- o-series: o1, o1-mini, o3, o3-mini, o4-mini (reasoning models)
- Embeddings: text-embedding-ada-002, text-embedding-3-small/large
- Fine-tuned: ft:gpt-4o, ft:gpt-4, ft:gpt-3.5-turbo
Anthropic Claude (Safe Estimation)
Claude models use a dedicated claude_estimation encoding that provides safe token estimates with a built-in safety margin. This is designed to prevent API truncation by intentionally over-counting tokens.
Why is Claude different?
Claude uses a proprietary tokenizer that is NOT publicly available. Based on research:
- Claude 3+ uses ~22,000 token vocabulary (vs OpenAI's 100K-200K)
- Claude produces 16-30% MORE tokens than GPT-4 for equivalent content
- Average ~3.5 characters per token (vs GPT-4's ~4)
Our solution:
The claude_estimation encoding applies a 1.25x safety multiplier to ensure estimates err on over-counting. This prevents API truncation while still providing useful estimates.
import {
countTokens,
usesClaudeEstimation,
estimateClaudeTokens,
} from "tiktoken-ts";
// Automatic safe estimation for Claude models
const count = countTokens("Hello, Claude!", { model: "claude-4-5-sonnet" });
// Check if model uses Claude estimation
if (usesClaudeEstimation("claude-3-opus")) {
console.log("This uses safe Claude estimation");
}
// Content-aware estimation (code has additional +10% multiplier)
const codeCount = estimateClaudeTokens(pythonCode, "code");
For exact Claude token counts, use Anthropic's official Token Counting API.
Supported Claude models:
- Claude 4.5, 4.1, 4, 3.5, 3, 2 series
Others (Estimation only)
- DeepSeek, Gemini (using cl100k_base approximation)
Accuracy
Exact BPE API
The async API produces identical tokens to OpenAI's tiktoken and tiktoken-rs. Use this when:
- You need exact token counts for billing
- You're debugging tokenization issues
- You need to decode tokens back to text
Estimation API
The sync estimation API uses heuristics and is:
- Fast - No vocabulary loading (instant)
- Approximate - Typically within ±10-15% for English (OpenAI models)
- Conservative - Tends to slightly over-estimate, safer for API calls
Use estimation when:
- You need quick approximate counts
- You're doing context window management
- Exact counts aren't critical
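The style of heuristic involved can be sketched as follows. The ratios here are common rules of thumb (~4 characters per token, ~0.75 words per token for English) rather than the library's actual constants:

```typescript
// Sketch of a heuristic counter (assumed ratios; the library's
// heuristics are more detailed and model-aware).
function estimateTokensSketch(text: string): number {
  const byChars = text.length / 4; // ~4 chars per token for English
  const byWords = text.split(/\s+/).filter(Boolean).length / 0.75; // ~0.75 words/token
  // Take the larger of the two signals to stay conservative.
  return Math.ceil(Math.max(byChars, byWords));
}
```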
Claude Estimation
For Claude models, the estimation is intentionally conservative with a 1.25x safety multiplier because:
- Claude's tokenizer is proprietary (not publicly available)
- Claude produces 16-30% more tokens than GPT-4 for equivalent content
- Over-estimation is safer than under-estimation for API limits
For exact Claude counts, use Anthropic's Token Counting API.
Browser Usage
The exact BPE API works in browsers but requires fetching vocabulary files (~4-10MB each). Vocabularies are cached after first load.
// Works in browsers
const tiktoken = await getEncodingAsync("cl100k_base");
const tokens = tiktoken.encode("Hello!");
For bundle-size-sensitive applications, consider:
- Using the estimation API (zero network requests)
- Pre-loading vocabularies at app startup
- Using a service worker to cache vocabularies
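The "cached after first load" behavior can be sketched as a promise-memoized loader. These are hypothetical internals (the library's real cache may differ); the point is that caching the promise, not the resolved value, lets concurrent callers share a single in-flight fetch:

```typescript
// Sketch of promise-memoized vocabulary loading (hypothetical names).
type Vocab = Map<string, number>;

const vocabCache = new Map<string, Promise<Vocab>>();

// Stand-in for the real CDN download (~4-10MB per encoding).
async function fetchVocab(encoding: string): Promise<Vocab> {
  return new Map([["<placeholder>", 0]]);
}

function loadVocab(encoding: string): Promise<Vocab> {
  let cached = vocabCache.get(encoding);
  if (!cached) {
    cached = fetchVocab(encoding);
    // Cache the promise itself, so parallel callers never double-fetch.
    vocabCache.set(encoding, cached);
  }
  return cached;
}
```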
Comparison with Other Libraries
| Library | Exact BPE | Sync API | Bundle Size | Dependencies |
| --------------- | --------- | --------------- | ----------- | -------------- |
| tiktoken-ts | ✅ | ✅ (estimation) | ~50KB | 0 |
| tiktoken (WASM) | ✅ | ✅ | ~4MB | WASM |
| gpt-tokenizer | ✅ | ✅ | ~10MB | Embedded vocab |
| gpt-3-encoder | ❌ | ✅ | ~2MB | r50k only |
Development
# Install dependencies
npm install
# Build
npm run build
# Run tests
npm test
# Type check
npm run typecheck
# Lint
npm run lint
# Format
npm run format
Architecture
See ARCHITECTURE.md for detailed implementation notes.
Key design decisions:
- Vocabularies loaded from CDN (not embedded) to keep package small
- Dual API: exact async + fast sync estimation
- Direct port of tiktoken-rs BPE algorithm for correctness
- Global caching of vocabularies and instances
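The BPE core amounts to repeatedly merging the adjacent pair with the lowest rank until no mergeable pair remains. A toy sketch over a hand-made vocabulary (real ranks come from the CDN vocabularies, and the real code operates on bytes, not characters):

```typescript
// Toy byte-pair merge loop, illustrative of the BPE idea only.
// `ranks` maps a merged string to its priority: lower rank merges first.
function bpeSketch(text: string, ranks: Map<string, number>): string[] {
  let parts = text.split(""); // start from single characters
  while (true) {
    let best = -1;
    let bestRank = Infinity;
    // Find the adjacent pair with the lowest rank in the vocabulary.
    for (let i = 0; i < parts.length - 1; i++) {
      const rank = ranks.get(parts[i] + parts[i + 1]);
      if (rank !== undefined && rank < bestRank) {
        bestRank = rank;
        best = i;
      }
    }
    if (best < 0) break; // no mergeable pair left
    parts.splice(best, 2, parts[best] + parts[best + 1]);
  }
  return parts;
}
```

With ranks like `he < ll < llo < hello`, the text "hello" collapses step by step into a single token, mirroring how a trained vocabulary greedily reassembles frequent substrings.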
License
MIT
Credits
- tiktoken-rs - Original Rust implementation
- tiktoken - OpenAI's Python implementation
