# ai-token-estimator

v1.7.1

Estimate and count tokens (incl. exact OpenAI BPE) and input costs for LLM API calls.
The best way to estimate tokens + input cost for LLM calls — with exact OpenAI tokenization (tiktoken-compatible BPE), a pure TypeScript SentencePiece tokenizer (T5, ALBERT, XLNet, Gemma, LLaMA 2, and more), and optional provider-backed token counting for Anthropic + Gemini.
Zero external dependencies — pure TypeScript implementation of both BPE and SentencePiece tokenizers.
## Features

- Exact OpenAI tokenization (tiktoken-compatible BPE): `encode()` / `decode()` / `openai_exact`
- Chat-aware tokenization: `encodeChat()` returns exact token IDs for chat messages using ChatML format
- Fast token limit checking: `isWithinTokenLimit()` / `isChatWithinTokenLimit()` with early-exit optimization (up to 1000x faster for large texts)
- Generator-based streaming: `encodeGenerator()` / `encodeChatGenerator()` / `decodeGenerator()` / `decodeAsyncGenerator()` for memory-efficient tokenization
- OpenAI chat completion token counting (legacy `functions` API): `countChatCompletionTokens()` with optional per-message breakdown
- Pure TypeScript SentencePiece tokenizer (no native dependencies):
  - Supports `.model` files (protobuf format)
  - Supports `tokenizer.json` files (HuggingFace format, validated configs)
  - Unigram + SentencePiece-style BPE, plus merges-based JSON-BPE (when represented in `tokenizer.json`)
  - Works in Node.js and browsers
- Official provider token counting (async):
  - Anthropic `POST /v1/messages/count_tokens` (`anthropic_count_tokens`)
  - Gemini `models/:countTokens` (`gemini_count_tokens`)
- Fast local fallback options:
  - Heuristic (`heuristic`, default)
  - Local SentencePiece tokenization for Gemma/LLaMA/T5 models
  - Automatic fallback to heuristic on provider failures (`fallbackToHeuristicOnError`)
- Cost estimation using a weekly auto-updated pricing/model list (GitHub Actions)
- TypeScript-first, ships ESM + CJS, zero runtime dependencies
## Installation

```sh
npm install ai-token-estimator
```

## Usage

```ts
import { countTokens, estimate, getAvailableModels } from 'ai-token-estimator';

// Basic usage
const result = estimate({
  text: 'Hello, world! This is a test message.',
  model: 'gpt-4o'
});

console.log(result);
// {
//   model: 'gpt-4o',
//   characterCount: 38,
//   estimatedTokens: 10,
//   estimatedInputCost: 0.000025,
//   charsPerToken: 4
// }

// List available models
console.log(getAvailableModels());
// ['gpt-5.2', 'gpt-4o', 'claude-opus-4.5', 'gemini-3-pro', ...]

// Exact tokens for OpenAI, heuristic for others
console.log(countTokens({ text: 'Hello, world!', model: 'gpt-5.1' }));
// { tokens: 4, exact: true, encoding: 'o200k_base' }
```

## Quick Recipes
### Encode chat messages to tokens (ChatML format)

```ts
import { encodeChat, decode } from 'ai-token-estimator';

const tokens = encodeChat([
  { role: 'system', content: 'You are helpful.' },
  { role: 'user', content: 'Hello!' }
], { model: 'gpt-4o' });

console.log(tokens); // [200264, 9125, 200266, 2610, 525, 11190, 13, 200265, ...]
console.log(decode(tokens, { encoding: 'o200k_base' }));
// <|im_start|>system<|im_sep|>You are helpful.<|im_end|>...
```

### OpenAI chat completion tokens (legacy functions API)
```ts
import { countChatCompletionTokens } from 'ai-token-estimator';

const { totalTokens } = countChatCompletionTokens({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Hello!' }],
});
```

### Fast token limit checking (early exit)
```ts
import { isWithinTokenLimit, isChatWithinTokenLimit } from 'ai-token-estimator';

// Plain text - returns token count or false if exceeded
const count = isWithinTokenLimit(longText, 4096, { model: 'gpt-4o' });
if (count === false) console.log('Text exceeds limit');

// Chat messages - same early-exit optimization
const chatCount = isChatWithinTokenLimit({
  messages: [{ role: 'user', content: longText }],
  model: 'gpt-4o',
  tokenLimit: 4096,
});
```

### Generator-based streaming tokenization
```ts
import { encodeGenerator, decodeAsyncGenerator } from 'ai-token-estimator';

// Stream-encode large text (memory efficient)
let tokenCount = 0;
for (const tokenChunk of encodeGenerator(hugeText, { model: 'gpt-4o' })) {
  tokenCount += tokenChunk.length;
  // Process chunk...
}

// Decode streaming LLM response
async function decodeLLMStream(tokenStream: AsyncIterable<number>) {
  for await (const text of decodeAsyncGenerator(tokenStream, { model: 'gpt-4o' })) {
    process.stdout.write(text);
  }
}
```

### Local SentencePiece token counting
```ts
import { countSentencePieceTokensAsync } from 'ai-token-estimator';

const tokens = await countSentencePieceTokensAsync('Hello!', {
  modelPath: './path/to/spiece.model',
});
```

### Provider-backed counts (server-side)
```ts
import { estimateAsync } from 'ai-token-estimator';

const out = await estimateAsync({
  model: 'claude-sonnet-4.5',
  text: 'Hello!',
  tokenizer: 'anthropic_count_tokens',
});
```

## Exact OpenAI tokenization (BPE)
This package includes exact tokenization for OpenAI models using a built-in tiktoken-compatible BPE tokenizer.
Notes:

- Exact tokenization is slower than heuristic estimation; `estimate()` defaults to `'heuristic'` to keep existing behavior fast.
- For distribution/build compatibility, all OpenAI vocabularies are bundled (trade-off: larger bundle size).
- Pure TypeScript implementation: works in Node.js and browsers (no native deps, no WASM).
```ts
import { encode, decode } from 'ai-token-estimator';

const text = 'Hello, world!';
const tokens = encode(text, { model: 'gpt-5.1' }); // exact OpenAI token IDs
const roundTrip = decode(tokens, { model: 'gpt-5.1' });

console.log(tokens.length);
console.log(roundTrip); // "Hello, world!"
```

Supported encodings: `r50k_base`, `p50k_base`, `p50k_edit`, `cl100k_base`, `o200k_base`, `o200k_harmony`
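To build intuition for what a BPE tokenizer does, here is a toy sketch of the core merge loop. The ranks below are made up for illustration only; real tiktoken vocabularies contain on the order of 100k-200k learned merges, and this is not the library's actual implementation:

```ts
// Toy BPE: repeatedly merge the adjacent pair with the lowest (earliest-learned)
// rank until no known merge applies. Ranks here are invented for the demo.
const ranks = new Map<string, number>([
  ['h e', 0], ['he l', 1], ['hel l', 2], ['hell o', 3],
]);

function bpe(word: string): string[] {
  let parts = [...word]; // start from individual characters
  for (;;) {
    let best = -1, bestRank = Infinity;
    for (let i = 0; i < parts.length - 1; i++) {
      const r = ranks.get(`${parts[i]} ${parts[i + 1]}`);
      if (r !== undefined && r < bestRank) { best = i; bestRank = r; }
    }
    if (best < 0) return parts; // no applicable merge left
    parts = [...parts.slice(0, best), parts[best] + parts[best + 1], ...parts.slice(best + 2)];
  }
}

bpe('hello'); // → ['hello'] via h+e → he+l → hel+l → hell+o
bpe('help');  // → ['hel', 'p'] (no merge rule covers the final 'p')
```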
## SentencePiece Tokenizer (T5, ALBERT, XLNet, Gemma, LLaMA 2, etc.)
This package includes a pure TypeScript SentencePiece tokenizer with zero native dependencies. It supports models that use Unigram or SentencePiece-style BPE tokenization (plus validated tokenizer.json configurations), including T5, ALBERT, XLNet, Gemma, LLaMA 2, and many HuggingFace models.
### Basic Usage
```ts
import {
  loadSentencePieceTokenizer,
  getSentencePieceTokenizer,
  encodeSentencePiece,
  decodeSentencePiece,
  countSentencePieceTokens
} from 'ai-token-estimator';

// Async API (Node.js) - load from file
const tokenizer = await loadSentencePieceTokenizer({
  modelPath: './path/to/tokenizer.model'
});

const tokens = tokenizer.encode('Hello, world!');
const text = tokenizer.decode(tokens);

console.log(tokens); // [8774, 6, 296, 55]
console.log(text); // "Hello, world!"
console.log(tokenizer.vocabSize); // 32000
console.log(tokenizer.algorithm); // "unigram" or "bpe"

// Sync API (browser/serverless) - from ArrayBuffer
const response = await fetch('/models/tokenizer.model');
const modelData = new Uint8Array(await response.arrayBuffer());
const tokenizer2 = getSentencePieceTokenizer({ modelData });
```

### Supported Model Formats
| Format | Extension | Description |
|--------|-----------|-------------|
| SentencePiece Protobuf | .model | Native SentencePiece format (T5, ALBERT, XLNet, Gemma) |
| HuggingFace JSON | tokenizer.json | HuggingFace tokenizers format (many models) |
Note: `tokenizer.json` support is intentionally scoped to validated configs (Unigram and merges-based BPE with Metaspace-style whitespace handling). If a tokenizer JSON uses ByteLevel/byte-fallback pipelines (GPT-2-style), this library throws a helpful error.
```ts
// Load .model file (protobuf)
const tokenizer = await loadSentencePieceTokenizer({
  modelPath: './t5-base/spiece.model'
});

// Load tokenizer.json (HuggingFace format)
const hfTokenizer = await loadSentencePieceTokenizer({
  modelPath: './my-hf-model/tokenizer.json',
  format: 'json' // optional, auto-detected from extension
});
```

### Model Download Helper
For convenience, you can automatically download known models (opt-in network access):
```ts
import { ensureSentencePieceModel, MODEL_REGISTRY } from 'ai-token-estimator';

// Download a known model (cached locally). No network calls unless allowDownload: true.
const modelPath = await ensureSentencePieceModel({
  tokenizer: 't5-base',
  allowDownload: true,
  cacheDir: './models', // optional; default: ~/.cache/sentencepiece (or SENTENCEPIECE_MODEL_CACHE_DIR)
});

const tokenizer = await loadSentencePieceTokenizer({ modelPath });

// Available pre-configured models (registry can be extended)
console.log(Object.keys(MODEL_REGISTRY));
// ['t5-base', 'albert-base-v2', 'xlnet-base-cased', 'gemma', 'llama2', ...]
```

Notes:

- Downloads are disabled by default (`allowDownload: false`) to avoid surprise network calls.
- Some registry entries may be gated and require HuggingFace authentication (`HF_TOKEN` / `HUGGINGFACE_HUB_TOKEN` or the `authToken` option).
### Convenience Functions
```ts
import {
  encodeSentencePieceAsync,
  decodeSentencePieceAsync,
  countSentencePieceTokensAsync
} from 'ai-token-estimator';

// One-liner encoding/decoding (loads the model each time - use a tokenizer instance for batch work)
const tokens = await encodeSentencePieceAsync('Hello!', { modelPath: './model.model' });
const text = await decodeSentencePieceAsync(tokens, { modelPath: './model.model' });
const count = await countSentencePieceTokensAsync('Hello!', { modelPath: './model.model' });
```

### Algorithm Support
| Algorithm | Description | Models |
|-----------|-------------|--------|
| Unigram | Probabilistic subword segmentation | T5, ALBERT, XLNet, mT5 |
| SentencePiece BPE | Score-based BPE used in .model files | Gemma, LLaMA 2 (and other SP-BPE models) |
| JSON-BPE (merges-based) | BPE defined by vocab + merges[] in tokenizer.json | Some HuggingFace tokenizers (validated configs) |
The algorithm is automatically detected from the model file.
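As a rough intuition for the Unigram algorithm: each vocabulary piece carries a score (a log probability), and the tokenizer picks the segmentation whose pieces have the highest total score. A toy dynamic-programming sketch with a made-up vocabulary (not the library's actual implementation, which also handles normalization, unknown pieces, and a trie-based lookup):

```ts
// Made-up vocabulary: piece → score (log prob). Higher (less negative) is better.
const vocab = new Map<string, number>([
  ['hello', -2.5], ['he', -1], ['llo', -2],
  ['h', -5], ['e', -5], ['l', -5], ['o', -5],
]);

// best[i] = highest-scoring segmentation of text.slice(0, i).
function segment(text: string): string[] {
  const best: { score: number; pieces: string[] }[] = [{ score: 0, pieces: [] }];
  for (let i = 1; i <= text.length; i++) {
    best[i] = { score: -Infinity, pieces: [] };
    for (let j = 0; j < i; j++) {
      const piece = text.slice(j, i);
      const s = vocab.get(piece);
      if (s !== undefined && best[j].score + s > best[i].score) {
        best[i] = { score: best[j].score + s, pieces: [...best[j].pieces, piece] };
      }
    }
  }
  return best[text.length].pieces;
}

segment('hello'); // → ['hello'] (score -2.5 beats 'he' + 'llo' at -3)
segment('hell');  // → ['he', 'l', 'l'] (best available split)
```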
### Advanced: Working with Custom Models
```ts
import fs from 'node:fs';
import { parseModelProto, UnigramEncoder, BPEEncoder } from 'ai-token-estimator';

// Low-level: parse model protobuf directly
const modelBytes = fs.readFileSync('./custom.model');
const model = parseModelProto(new Uint8Array(modelBytes));

console.log(model.pieces.length); // vocabulary size
console.log(model.trainerSpec?.modelType); // 1 = UNIGRAM, 2 = BPE
console.log(model.normalizerSpec?.name); // e.g., 'nmt_nfkc'

// Create encoder directly
const encoder = new UnigramEncoder(model.pieces, {
  trainerSpec: model.trainerSpec
});
const tokens = encoder.encode('Hello, world!');
```

### Normalization
The tokenizer handles SentencePiece normalization automatically:

- Dummy prefix: adds a space before the text (configurable)
- Whitespace escaping: converts spaces to `▁` (U+2581)
- NFKC normalization: Unicode normalization
- Extra whitespace removal: collapses multiple spaces
```ts
// Access normalizer directly
import { Normalizer } from 'ai-token-estimator';

const normalizer = new Normalizer({
  normalizerSpec: model.normalizerSpec,
  denormalizerSpec: model.denormalizerSpec,
});

const normalized = normalizer.normalize('Hello World');
// "▁Hello▁World" (with dummy prefix and escaped spaces)

const denormalized = normalizer.denormalize('▁Hello▁World');
// "Hello World"
```

### Browser Usage
The SentencePiece tokenizer works in browsers without any polyfills:
```ts
// Fetch model and create tokenizer
async function loadTokenizer(modelUrl: string) {
  const response = await fetch(modelUrl);
  const modelData = new Uint8Array(await response.arrayBuffer());
  return getSentencePieceTokenizer({ modelData });
}

const tokenizer = await loadTokenizer('/models/t5.model');
const tokens = tokenizer.encode('Browser tokenization!');
```

### Caching

Model parsing is automatically cached for performance:

```ts
import { clearModelCache } from 'ai-token-estimator';

// Clear cache if needed (e.g., for memory management)
clearModelCache();
```

## Using the exact tokenizer with estimate()
`estimate()` is heuristic by default (fast). To use exact OpenAI token counting instead:
```ts
import { estimate } from 'ai-token-estimator';

const result = estimate({
  text: 'Hello, world!',
  model: 'gpt-5.1',
  tokenizer: 'openai_exact',
});

console.log(result.tokenizerMode); // "openai_exact"
console.log(result.encodingUsed); // "o200k_base"
```

Or use `tokenizer: 'auto'` to get exact counting for OpenAI models and heuristic estimation for everything else.
## Provider token counting (Claude / Gemini)
For more accurate token counts for Anthropic or Gemini models, you can call their official token counting endpoints via `estimateAsync()`. This requires API keys and should therefore only be used server-side (never in the browser).

If you want these modes to fail open (fall back to heuristic estimation) when the provider API is throttled or unavailable, or the API key is invalid, set `fallbackToHeuristicOnError: true`.
### Anthropic: `POST /v1/messages/count_tokens`

- Env var: `ANTHROPIC_API_KEY`
```ts
import { estimateAsync } from 'ai-token-estimator';

const out = await estimateAsync({
  text: 'Hello, Claude',
  model: 'claude-sonnet-4.5',
  tokenizer: 'anthropic_count_tokens',
  fallbackToHeuristicOnError: true,
  anthropic: {
    // apiKey: '...' // optional; otherwise uses process.env.ANTHROPIC_API_KEY
    system: 'You are a helpful assistant',
  },
});

console.log(out.estimatedTokens);
```

### Gemini: `models/:countTokens` (Google AI Studio)
- Env var: `GEMINI_API_KEY`
```ts
import { estimateAsync } from 'ai-token-estimator';

const out = await estimateAsync({
  text: 'The quick brown fox jumps over the lazy dog.',
  model: 'gemini-2.0-flash',
  tokenizer: 'gemini_count_tokens',
  fallbackToHeuristicOnError: true,
  gemini: {
    // apiKey: '...' // optional; otherwise uses process.env.GEMINI_API_KEY
  },
});

console.log(out.estimatedTokens);
```

### Local Gemini option: Gemma SentencePiece
If you want a local tokenizer option for Gemini-like models, you can use a SentencePiece tokenizer model (e.g. Gemma's tokenizer.model) with our pure TypeScript SentencePiece implementation.
```ts
import { estimateAsync, countGemmaSentencePieceTokens } from 'ai-token-estimator';

// Via estimateAsync
const out = await estimateAsync({
  text: 'Hello!',
  model: 'gemini-2.0-flash',
  tokenizer: 'gemma_sentencepiece',
  gemma: {
    modelPath: '/path/to/tokenizer.model',
  },
});

console.log(out.estimatedTokens);

// Or use directly
const count = await countGemmaSentencePieceTokens({
  modelPath: '/path/to/tokenizer.model',
  text: 'Hello, world!'
});
```

Notes:

- This is not an official Gemini tokenizer; treat it as an approximation unless you have verified equivalence for your models.
- Uses the pure TypeScript SentencePiece implementation (no native dependencies).
## API Reference
### `estimate(input: EstimateInput): EstimateOutput`
Estimates token count and cost for the given text and model.
Parameters:

```ts
interface EstimateInput {
  text: string; // The text to estimate tokens for
  model: string; // Model ID (e.g., 'gpt-4o', 'claude-opus-4.5')
  rounding?: 'ceil' | 'round' | 'floor'; // Rounding strategy (default: 'ceil')
  tokenizer?: 'heuristic' | 'openai_exact' | 'auto'; // Token counting strategy (default: 'heuristic')

  // Extended cost estimation (optional)
  outputTokens?: number; // Output tokens for cost calculation
  cachedInputTokens?: number; // Cached input tokens (OpenAI only, must be <= estimatedTokens)
  mode?: 'standard' | 'batch'; // Pricing mode (default: 'standard')
}
```

Notes:

- Provider-backed modes (`anthropic_count_tokens`, `gemini_count_tokens`, `gemma_sentencepiece`) are only supported in `estimateAsync()`.
- When `outputTokens`, `cachedInputTokens`, or `mode` is provided, the model must have the corresponding pricing available or an error is thrown.
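For reference, the heuristic strategy is simple enough to sketch in a few lines. This is a standalone illustration, not the library's code; it assumes gpt-4o's 4 chars/token ratio and $2.50 per 1M input tokens from the pricing table:

```ts
// Standalone sketch of the heuristic: code points / charsPerToken, rounded up
// (the default 'ceil' strategy), then priced at the per-million input rate.
function heuristicEstimate(text: string, charsPerToken = 4, inputCostPerMillion = 2.5) {
  const characterCount = [...text].length; // Unicode code points, not UTF-16 units
  const estimatedTokens = Math.ceil(characterCount / charsPerToken);
  const estimatedInputCost = (estimatedTokens * inputCostPerMillion) / 1_000_000;
  return { characterCount, estimatedTokens, estimatedInputCost };
}

heuristicEstimate('Hello, world!');
// → { characterCount: 13, estimatedTokens: 4, estimatedInputCost: 0.00001 }
```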
Returns:

```ts
interface EstimateOutput {
  model: string; // The model used
  characterCount: number; // Number of Unicode code points
  estimatedTokens: number; // Estimated token count (integer)
  estimatedInputCost: number; // Estimated input cost in USD
  charsPerToken: number; // The ratio used for this model
  tokenizerMode?: 'heuristic' | 'openai_exact' | 'auto'; // Which strategy was used
  encodingUsed?: string; // OpenAI encoding when using exact tokenization

  // Extended cost fields (when cost inputs are provided)
  outputTokens?: number; // Echoed from input
  estimatedOutputCost?: number; // Output token cost in USD
  estimatedCachedInputCost?: number; // Cached input cost in USD
  estimatedTotalCost: number; // Total cost (input + output + cached)
}
```

### `estimateAsync(input: EstimateAsyncInput): Promise<EstimateOutput>`
Async estimator that supports provider token counting modes:

- `anthropic_count_tokens` (Anthropic token count endpoint)
- `gemini_count_tokens` (Gemini token count endpoint)
- `gemma_sentencepiece` (local SentencePiece tokenization using the built-in pure TypeScript implementation)

API keys should be provided via env vars (`ANTHROPIC_API_KEY`, `GEMINI_API_KEY`) or passed explicitly in the config objects.
If you pass `fallbackToHeuristicOnError: true`, provider-backed modes will fall back to heuristic estimation on:
- invalid/expired API key (401/403)
- rate limiting (429)
- provider errors (5xx) or network issues
### `countTokens(input: TokenCountInput): TokenCountOutput`
Counts tokens for a given model:
- OpenAI models: exact BPE tokenization
- Other providers: heuristic estimate
```ts
import { countTokens } from 'ai-token-estimator';

const result = countTokens({ text: 'Hello, world!', model: 'gpt-5.1' });
// { tokens: 4, exact: true, encoding: 'o200k_base' }
```

### `countChatCompletionTokens(input: ChatCompletionTokenCountInput): ChatCompletionTokenCountOutput`
Counts tokens for an OpenAI chat completion request, including messages, function definitions, and function_call controls. Achieves exact parity with OpenAI's actual token counting for normal text inputs.
Important limitations:

- Legacy functions API only: supports the `functions` and `function_call` parameters
- Tools API not supported: throws if `tools`, `tool_choice`, `tool_calls`, or `tool_call_id` are present
- Text content only: throws for multimodal content (arrays, images)
- Chat models only: rejects non-chat models like `davinci-002`
```ts
import { countChatCompletionTokens } from 'ai-token-estimator';

const result = countChatCompletionTokens({
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'What is the weather in Paris?' }
  ],
  model: 'gpt-4o',
  functions: [{
    name: 'get_weather',
    description: 'Get weather for a location',
    parameters: {
      type: 'object',
      properties: {
        location: { type: 'string', description: 'City name' }
      },
      required: ['location']
    }
  }],
  function_call: 'auto',
  includeBreakdown: true // optional: get per-message token breakdown
});

console.log(result);
// {
//   totalTokens: 75,
//   messageTokens: 25,
//   completionOverheadTokens: 3,
//   functionTokens: 42,
//   functionCallTokens: 0,
//   exact: true,
//   encoding: 'o200k_base',
//   messageBreakdown: [...] // when includeBreakdown: true
// }
```

Parameters:
```ts
interface ChatCompletionTokenCountInput {
  messages: ChatMessage[]; // Chat messages
  model: string; // OpenAI chat model (e.g., 'gpt-4o')
  encoding?: OpenAIEncoding; // Override encoding for new models
  functions?: FunctionDefinition[]; // Legacy function definitions
  function_call?: 'auto' | 'none' | { name: string }; // Function calling control
  includeBreakdown?: boolean; // Include per-message token breakdown
}
```

Returns:
```ts
interface ChatCompletionTokenCountOutput {
  totalTokens: number; // Total tokens in the request
  messageTokens: number; // Tokens from messages (including overhead)
  completionOverheadTokens: number; // Reply priming tokens (always 3)
  functionTokens: number; // Tokens from function definitions
  functionCallTokens: number; // Tokens from function_call control
  exact: true; // Always exact for this function
  encoding: OpenAIEncoding; // Encoding used
  messageBreakdown?: Array<{ // Per-message breakdown (if requested)
    role: string;
    stringTokens: number;
    overheadTokens: number;
    totalTokens: number;
  }>;
}
```

### `getAvailableModels(): string[]`
Returns an array of all supported model IDs.
### `encode(text: string, options?: EncodeOptions): number[]`
Encodes text into OpenAI token IDs using tiktoken-compatible BPE tokenization.
### `decode(tokens: Iterable<number>, options?: { encoding?: OpenAIEncoding; model?: string }): string`
Decodes OpenAI token IDs back into text using the selected encoding/model.
### `encodeChat(messages: ChatMessage[], options?: EncodeChatOptions): number[]`

Encodes chat messages into exact token IDs using ChatML format. Returns the ChatML message prompt tokens (messages plus optional assistant priming), including the special delimiter tokens `<|im_start|>`, `<|im_sep|>`, and `<|im_end|>`.
```ts
import { encodeChat, decode } from 'ai-token-estimator';

const tokens = encodeChat([
  { role: 'system', content: 'You are helpful.' },
  { role: 'user', content: 'Hello!' }
], { model: 'gpt-4o' });

// Tokens include ChatML structure:
// <|im_start|>system<|im_sep|>You are helpful.<|im_end|>
// <|im_start|>user<|im_sep|>Hello!<|im_end|>
// <|im_start|>assistant<|im_sep|> (priming)
```

Parameters:
```ts
interface EncodeChatOptions {
  model?: string; // OpenAI model (e.g., 'gpt-4o')
  encoding?: OpenAIEncoding; // Explicit encoding override
  primeAssistant?: boolean; // Append assistant priming (default: true)
}
```

Supported encodings:

- `cl100k_base` (GPT-4, GPT-3.5-turbo)
- `o200k_base` (GPT-4o, GPT-4o-mini)
- `o200k_harmony` (experimental)
Limitations:

- OpenAI models only: throws for `claude-*` and `gemini-*` models
- Legacy functions API only: throws for `tool_calls`, `tool_call_id`
- Text content only: throws for multimodal content (arrays)
Note on `function_call`: messages with `function_call` are encoded with the function name and arguments as content. The token count differs from `countChatCompletionTokens()` because the latter includes `FUNCTION_CALL_METADATA_TOKEN_OVERHEAD` (3 tokens) for API accounting. The exact difference depends on whether both name and arguments are present (2-token difference due to the newline separator) or only one field is present (3-token difference).

Note on `o200k_harmony`: support for the `o200k_harmony` encoding is experimental; the token structure may not match actual API behavior.
### `isWithinTokenLimit(text, tokenLimit, options?): false | number`
Checks whether text is within a token limit, with early-exit optimization. Returns `false` if the limit is exceeded, or the actual token count if within the limit.

This is significantly faster than full tokenization when the limit is exceeded early in the text (up to 1000x+ faster for large texts with small limits).
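The early-exit idea can be sketched independently of the library: consume token chunks lazily and stop as soon as the running count passes the limit, instead of tokenizing the whole input. The chunk source below is a hypothetical stand-in; the real implementation streams chunks from the BPE encoder:

```ts
// Stand-in for a streaming tokenizer: yields small token chunks lazily.
function* fakeTokenChunks(totalChunks: number): Generator<number[]> {
  for (let i = 0; i < totalChunks; i++) yield [1, 2, 3];
}

function withinLimit(chunks: Iterable<number[]>, limit: number): false | number {
  let count = 0;
  for (const chunk of chunks) {
    count += chunk.length;
    if (count > limit) return false; // early exit: the rest is never tokenized
  }
  return count;
}

withinLimit(fakeTokenChunks(1_000_000), 10); // → false after inspecting only 4 chunks
withinLimit(fakeTokenChunks(3), 100);        // → 9 (within limit, full count returned)
```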
```ts
import { isWithinTokenLimit } from 'ai-token-estimator';

// Returns token count if within limit
const count = isWithinTokenLimit('Hello, world!', 100, { model: 'gpt-4o' });
if (count !== false) {
  console.log(`Text has ${count} tokens`);
}

// Returns false if exceeds limit (with early exit)
const result = isWithinTokenLimit(longText, 10, { model: 'gpt-4o' });
if (result === false) {
  console.log('Text exceeds 10 tokens');
}
```

Parameters:
```ts
interface IsWithinTokenLimitOptions {
  model?: string; // OpenAI model (e.g., 'gpt-4o')
  encoding?: OpenAIEncoding; // Explicit encoding override
  allowSpecial?: SpecialTokenHandling; // How to handle special tokens
}
```

Throws:

- `Error` if `tokenLimit` is invalid (NaN, Infinity, negative, non-integer)
- `Error` if `model` is a known non-OpenAI model (`claude-*`, `gemini-*`)
### `isChatWithinTokenLimit(input): false | number`
Checks whether chat messages are within a token limit, with early-exit optimization. Returns `false` if exceeded, or the actual token count if within the limit.

Uses the same token counting logic as `countChatCompletionTokens()` but exits early when the limit is exceeded.
```ts
import { isChatWithinTokenLimit } from 'ai-token-estimator';

const result = isChatWithinTokenLimit({
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'Hello!' }
  ],
  model: 'gpt-4o',
  tokenLimit: 100,
  functions: [{ name: 'get_weather', parameters: { type: 'object' } }],
});

if (result === false) {
  console.log('Messages exceed token limit');
} else {
  console.log(`Messages use ${result} tokens`);
}
```

Parameters:
```ts
interface IsChatWithinTokenLimitInput {
  messages: ChatMessage[];
  model: string;
  tokenLimit: number;
  encoding?: OpenAIEncoding;
  functions?: FunctionDefinition[];
  function_call?: FunctionCallOption;
}
```

Throws:

- `Error` if `tokenLimit` is invalid (NaN, Infinity, negative, non-integer)
- `Error` if the model is not an OpenAI model (unless an encoding override is provided)
- `Error` if `tools`, `tool_choice`, `tool_calls`, or `tool_call_id` are present
- `Error` if any message has non-string content
### Generator APIs
Generator-based APIs for memory-efficient streaming tokenization.
#### `encodeGenerator(text, options?): Generator<number[], number, undefined>`
Encode text yielding token chunks. Memory-efficient for large inputs.
- Yields: `number[]`, token IDs per regex-matched piece (word/punctuation)
- Returns: `number`, total token count when iteration completes
```ts
import { encodeGenerator } from 'ai-token-estimator';

// Stream-encode large text
let tokenCount = 0;
for (const tokenChunk of encodeGenerator(hugeText, { model: 'gpt-4o' })) {
  tokenCount += tokenChunk.length;
}

// Or get total count from return value
const gen = encodeGenerator(text, { model: 'gpt-4o' });
let result = gen.next();
while (!result.done) result = gen.next();
console.log('Total tokens:', result.value);
```

#### `encodeChatGenerator(messages, options?): Generator<number[], number, undefined>`
Encode chat messages yielding token chunks per message component.
- Yields: `number[]`, token IDs per component (special tokens, role, content chunks, etc.)
- Returns: `number`, total token count
```ts
import { encodeChatGenerator } from 'ai-token-estimator';

const messages = [
  { role: 'system', content: 'You are helpful.' },
  { role: 'user', content: 'Hello!' }
];

for (const tokenChunk of encodeChatGenerator(messages, { model: 'gpt-4o' })) {
  console.log('Chunk:', tokenChunk);
}
```

#### `decodeGenerator(tokens, options?): Generator<string, void, void>`
Decodes tokens, yielding text chunks. Uses TextDecoder's streaming mode, so it may yield empty strings while buffering incomplete UTF-8 sequences.
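The empty-string behavior comes from TextDecoder itself: in streaming mode it buffers an incomplete multi-byte sequence until the remaining bytes arrive. A standalone illustration:

```ts
// '€' is a 3-byte UTF-8 sequence (0xE2 0x82 0xAC); feed it in two partial chunks.
const decoder = new TextDecoder('utf-8');
const bytes = new TextEncoder().encode('€');

const first = decoder.decode(bytes.slice(0, 2), { stream: true }); // '' (incomplete, buffered)
const rest = decoder.decode(bytes.slice(2), { stream: true });     // '€' (sequence completed)
```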
```ts
import { encode, decodeGenerator } from 'ai-token-estimator';

const tokens = encode('Hello, world!', { model: 'gpt-4o' });
for (const textChunk of decodeGenerator(tokens, { model: 'gpt-4o' })) {
  process.stdout.write(textChunk);
}
```

#### `decodeAsyncGenerator(tokens, options?): AsyncGenerator<string, void, void>`
Decodes an async token stream, yielding text chunks. Accepts `AsyncIterable<number | number[]>` for flexibility with streaming APIs.
```ts
import { decodeAsyncGenerator } from 'ai-token-estimator';

// Decode streaming LLM response
async function decodeLLMStream(tokenStream: AsyncIterable<number>) {
  for await (const text of decodeAsyncGenerator(tokenStream, { model: 'gpt-4o' })) {
    process.stdout.write(text);
  }
}
```

### `getModelConfig(model: string): ModelConfig`
Returns the configuration for a specific model. Throws if the model is not found.
```ts
interface ModelConfig {
  charsPerToken: number; // Characters per token ratio
  inputCostPerMillion: number; // USD per 1M input tokens
  outputCostPerMillion?: number; // USD per 1M output tokens (when available)
  cachedInputCostPerMillion?: number; // USD per 1M cached input tokens (OpenAI)
  batchInputCostPerMillion?: number; // USD per 1M batch input tokens (OpenAI)
  batchOutputCostPerMillion?: number; // USD per 1M batch output tokens (OpenAI)
}
```

### `DEFAULT_MODELS`
Read-only object containing all model configurations. Frozen to prevent runtime mutation.
## Cost Estimation API
### `estimateCost(options): CostEstimate`
Calculate cost from explicit token counts. Provides detailed cost breakdown for input, output, cached, and batch pricing.
```ts
import { estimateCost } from 'ai-token-estimator';

const result = estimateCost({
  model: 'gpt-4o',
  inputTokens: 1_000_000,
  outputTokens: 500_000,
  cachedInputTokens: 200_000, // optional
  mode: 'standard', // or 'batch'
});

console.log(result);
// {
//   model: 'gpt-4o',
//   mode: 'standard',
//   tokens: { input: 1000000, cachedInput: 200000, nonCachedInput: 800000, output: 500000 },
//   costs: { input: 2.0, cachedInput: 0.25, output: 5.0, total: 7.25 },
//   rates: { inputPerMillion: 2.5, outputPerMillion: 10.0, cachedInputPerMillion: 1.25, ... }
// }
```

Throws if:
- Model is unknown
- Token counts are negative or non-integer
- `cachedInputTokens > inputTokens`
- Required pricing is missing (output/cached/batch)
- `mode: 'batch'` with `cachedInputTokens > 0`
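The breakdown arithmetic is straightforward; a standalone sketch using gpt-4o's standard rates as listed by this package ($2.50/M input, $1.25/M cached input, $10.00/M output):

```ts
// Cached tokens are billed at the cached rate; only the remainder pays the full input price.
const rates = { inputPerMillion: 2.5, cachedInputPerMillion: 1.25, outputPerMillion: 10.0 };
const tokens = { input: 1_000_000, cachedInput: 200_000, output: 500_000 };

const nonCachedInput = tokens.input - tokens.cachedInput; // 800_000
const costs = {
  input: (nonCachedInput * rates.inputPerMillion) / 1e6,                // 2.0
  cachedInput: (tokens.cachedInput * rates.cachedInputPerMillion) / 1e6, // 0.25
  output: (tokens.output * rates.outputPerMillion) / 1e6,               // 5.0
};
const total = costs.input + costs.cachedInput + costs.output;           // 7.25
```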
### `estimateCostFromText(options): CostEstimate`
Sync version that counts input tokens from text. Uses heuristic/exact tokenization based on model.
```ts
import { estimateCostFromText } from 'ai-token-estimator';

const result = estimateCostFromText({
  model: 'gpt-4o',
  inputText: 'Hello, world!',
  outputText: 'Hi there!', // optional: auto-count output tokens
  outputTokens: 100, // or: explicit output count (takes precedence)
  cachedInputTokens: 0,
  mode: 'standard',
});
```

### `estimateCostFromTextAsync(options): Promise<CostEstimate>`
Async version that supports provider-backed tokenizers for accurate counts.
```ts
import { estimateCostFromTextAsync } from 'ai-token-estimator';

const result = await estimateCostFromTextAsync({
  model: 'claude-sonnet-4',
  inputText: 'Hello, world!',
  outputText: 'Hi there!',
  tokenizer: 'anthropic_count_tokens',
  anthropic: { apiKey: process.env.ANTHROPIC_API_KEY },
});
```

### `getTotalCost(model, inputTokens, outputTokens?): number`
Quick helper to get total cost for a model.
```ts
import { getTotalCost } from 'ai-token-estimator';

const cost = getTotalCost('gpt-4o', 1_000_000, 500_000);
// 7.5 (USD)
```

## SentencePiece API
### `loadSentencePieceTokenizer(options: FileOptions): Promise<SentencePieceTokenizer>`
Loads a SentencePiece tokenizer from a file path (Node.js async API).
```ts
interface FileOptions {
  modelPath: string; // Path to .model or tokenizer.json file
  format?: 'protobuf' | 'json'; // Auto-detected from extension if omitted
}

interface SentencePieceTokenizer {
  encode(text: string): number[]; // Encode text to token IDs
  decode(tokens: number[]): string; // Decode token IDs to text
  readonly vocabSize: number; // Vocabulary size
  readonly algorithm: 'bpe' | 'unigram'; // Tokenization algorithm
}
```

### `getSentencePieceTokenizer(options: DataOptions): SentencePieceTokenizer`
Creates a tokenizer from in-memory model data (sync API, browser-compatible).
```ts
interface DataOptions {
  modelData: Uint8Array | ArrayBuffer; // Model file bytes
  format?: 'protobuf' | 'json'; // Auto-detected if omitted
}
```

### `ensureSentencePieceModel(options: DownloadOptions): Promise<string>`
Downloads a known tokenizer model from HuggingFace and returns the local path.
```ts
type KnownTokenizer = keyof typeof MODEL_REGISTRY; // e.g. 't5-base', 'albert-base-v2', 'xlnet-base-cased', ...

interface DownloadOptions {
  tokenizer: KnownTokenizer;
  cacheDir?: string; // Cache directory (default: ~/.cache/sentencepiece or SENTENCEPIECE_MODEL_CACHE_DIR)
  allowDownload?: boolean; // Default: false (no surprise network calls)
  verifyHash?: boolean; // Default: true (when registry hash is present)
  authToken?: string; // HuggingFace auth token (or HF_TOKEN / HUGGINGFACE_HUB_TOKEN env vars)
  customUrl?: string; // Optional mirror/override URL (hash still verified)
}
```

### `encodeSentencePiece(text: string, options: DataOptions): number[]`
Encode text to tokens (sync, from in-memory model data).
### `decodeSentencePiece(tokens: number[], options: DataOptions): string`
Decode tokens to text (sync, from in-memory model data).
### `countSentencePieceTokens(text: string, options: DataOptions): number`
Count tokens in text (sync, from in-memory model data).
### `encodeSentencePieceAsync(text: string, options: FileOptions): Promise<number[]>`
Encode text to tokens (async, from file path).
### `decodeSentencePieceAsync(tokens: number[], options: FileOptions): Promise<string>`
Decode tokens to text (async, from file path).
### `countSentencePieceTokensAsync(text: string, options: FileOptions): Promise<number>`
Count tokens in text (async, from file path).
### `parseModelProto(buffer: Uint8Array): ModelProto`
Low-level: Parse a SentencePiece .model file (protobuf format).
```ts
interface ModelProto {
  pieces: SentencePiece[]; // Vocabulary pieces
  trainerSpec?: TrainerSpec; // Training configuration
  normalizerSpec?: NormalizerSpec; // Normalization settings
}

interface SentencePiece {
  piece: string; // Token string
  score: number; // Log probability score
  type: SentencePieceType; // NORMAL, UNKNOWN, CONTROL, etc.
}
```

### `clearModelCache(): void`
Clears the internal model parsing cache (useful for memory management).
## Rounding Options
By default, token counts are rounded up (ceil) for conservative budgeting. You can override this:
// Round up (default) - conservative for budgeting
estimate({ text, model: 'gpt-4o', rounding: 'ceil' });
// Round down - optimistic estimate
estimate({ text, model: 'gpt-4o', rounding: 'floor' });
// Round to nearest - balanced estimate
estimate({ text, model: 'gpt-4o', rounding: 'round' });

Character Counting
This package counts Unicode code points, not UTF-16 code units. This means:
- Emojis count as 1 character (not 2)
- Accented characters count correctly
- Most source code characters count as 1
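The distinction matters in JavaScript, where `String.prototype.length` counts UTF-16 code units. A quick sketch of counting code points instead (plain language behavior, not a package API):

```typescript
// `length` counts UTF-16 code units; string iteration walks code points.
const s = "café 🚀";
console.log(s.length);      // 7 — the emoji is a surrogate pair (2 units)
console.log([...s].length); // 6 — one per code point

// Equivalent code-point counter using for...of iteration.
function countCodePoints(text: string): number {
  let n = 0;
  for (const _ of text) n++; // for...of iterates Unicode code points
  return n;
}
console.log(countCodePoints("🚀🚀")); // 2
```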
Benchmarks (repo only)
This repository includes a small benchmark script to compare heuristic vs exact OpenAI tokenization:
npm run benchmark:tokenizer

Supported Models
Auto-updated weekly via GitHub Actions from provider pricing pages.
OpenAI Models
| Model | Chars/Token | Input Cost (per 1M tokens) |
|-------|-------------|----------------------------|
| babbage-002 | 4 | $0.40 |
| chatgpt-4o-latest | 4 | $5.00 |
| chatgpt-image-latest | 4 | $5.00 |
| codex-mini-latest | 4 | $1.50 |
| computer-use-preview | 4 | $3.00 |
| davinci-002 | 4 | $2.00 |
| gpt-3.5-0301 | 4 | $1.50 |
| gpt-3.5-turbo | 4 | $0.50 |
| gpt-3.5-turbo-0125 | 4 | $0.50 |
| gpt-3.5-turbo-0613 | 4 | $1.50 |
| gpt-3.5-turbo-1106 | 4 | $1.00 |
| gpt-3.5-turbo-16k-0613 | 4 | $3.00 |
| gpt-3.5-turbo-instruct | 4 | $1.50 |
| gpt-4-0125-preview | 4 | $10.00 |
| gpt-4-0314 | 4 | $30.00 |
| gpt-4-0613 | 4 | $30.00 |
| gpt-4-1106-preview | 4 | $10.00 |
| gpt-4-1106-vision-preview | 4 | $10.00 |
| gpt-4-32k | 4 | $60.00 |
| gpt-4-turbo-2024-04-09 | 4 | $10.00 |
| gpt-4.1 | 4 | $2.00 |
| gpt-4.1-mini | 4 | $0.40 |
| gpt-4.1-nano | 4 | $0.10 |
| gpt-4o | 4 | $2.50 |
| gpt-4o-2024-05-13 | 4 | $5.00 |
| gpt-4o-audio-preview | 4 | $2.50 |
| gpt-4o-mini | 4 | $0.15 |
| gpt-4o-mini-audio-preview | 4 | $0.15 |
| gpt-4o-mini-realtime-preview | 4 | $0.60 |
| gpt-4o-mini-search-preview | 4 | $0.15 |
| gpt-4o-realtime-preview | 4 | $5.00 |
| gpt-4o-search-preview | 4 | $2.50 |
| gpt-5 | 4 | $1.25 |
| gpt-5-chat-latest | 4 | $1.25 |
| gpt-5-codex | 4 | $1.25 |
| gpt-5-mini | 4 | $0.25 |
| gpt-5-nano | 4 | $0.05 |
| gpt-5-pro | 4 | $15.00 |
| gpt-5-search-api | 4 | $1.25 |
| gpt-5.1 | 4 | $1.25 |
| gpt-5.1-chat-latest | 4 | $1.25 |
| gpt-5.1-codex | 4 | $1.25 |
| gpt-5.1-codex-max | 4 | $1.25 |
| gpt-5.1-codex-mini | 4 | $0.25 |
| gpt-5.2 | 4 | $1.75 |
| gpt-5.2-chat-latest | 4 | $1.75 |
| gpt-5.2-codex | 4 | $1.75 |
| gpt-5.2-pro | 4 | $21.00 |
| gpt-audio | 4 | $2.50 |
| gpt-audio-mini | 4 | $0.60 |
| gpt-image-1 | 4 | $5.00 |
| gpt-image-1-mini | 4 | $2.00 |
| gpt-image-1.5 | 4 | $5.00 |
| gpt-realtime | 4 | $4.00 |
| gpt-realtime-mini | 4 | $0.60 |
| o1 | 4 | $15.00 |
| o1-mini | 4 | $1.10 |
| o1-pro | 4 | $150.00 |
| o3 | 4 | $2.00 |
| o3-deep-research | 4 | $10.00 |
| o3-mini | 4 | $1.10 |
| o3-pro | 4 | $20.00 |
| o4-mini | 4 | $1.10 |
| o4-mini-deep-research | 4 | $2.00 |
Anthropic Claude Models
| Model | Chars/Token | Input Cost (per 1M tokens) |
|-------|-------------|----------------------------|
| claude-haiku-3 | 3.5 | $0.25 |
| claude-haiku-3.5 | 3.5 | $0.80 |
| claude-haiku-4.5 | 3.5 | $1.00 |
| claude-opus-3 | 3.5 | $15.00 |
| claude-opus-4 | 3.5 | $15.00 |
| claude-opus-4.1 | 3.5 | $15.00 |
| claude-opus-4.5 | 3.5 | $5.00 |
| claude-sonnet-4 | 3.5 | $3.00 |
| claude-sonnet-4.5 | 3.5 | $3.00 |
Google Gemini Models
| Model | Chars/Token | Input Cost (per 1M tokens) |
|-------|-------------|----------------------------|
| gemini-2.0-flash | 4 | $0.10 |
| gemini-2.0-flash-lite | 4 | $0.08 |
| gemini-2.5-computer-use-preview-10-2025 | 4 | $1.25 |
| gemini-2.5-flash | 4 | $0.30 |
| gemini-2.5-flash-lite | 4 | $0.10 |
| gemini-2.5-flash-lite-preview-09-2025 | 4 | $0.10 |
| gemini-2.5-flash-native-audio-preview-12-2025 | 4 | $0.50 |
| gemini-2.5-flash-preview-09-2025 | 4 | $0.30 |
| gemini-2.5-flash-preview-tts | 4 | $0.50 |
| gemini-2.5-pro | 4 | $1.25 |
| gemini-2.5-pro-preview-tts | 4 | $1.00 |
| gemini-3-flash | 4 | $0.50 |
| gemini-3-pro | 4 | $2.00 |
Last updated: 2026-01-19
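The tables above combine into a simple heuristic: estimated tokens = characters / chars-per-token (rounded per the rounding option), and input cost = tokens / 1,000,000 × the per-million-token price. A self-contained sketch of that arithmetic (table values hard-coded for illustration; this is not the package's API):

```typescript
// Heuristic cost estimate built from sample table entries above.
// PRICING and estimateInputCost are illustrative names, not package exports.
const PRICING: Record<string, { charsPerToken: number; inputPerMTok: number }> = {
  "gpt-4o": { charsPerToken: 4, inputPerMTok: 2.5 },
  "claude-sonnet-4.5": { charsPerToken: 3.5, inputPerMTok: 3.0 },
};

function estimateInputCost(chars: number, model: string): { tokens: number; usd: number } {
  const { charsPerToken, inputPerMTok } = PRICING[model];
  const tokens = Math.ceil(chars / charsPerToken); // 'ceil' rounding (the default)
  return { tokens, usd: (tokens / 1_000_000) * inputPerMTok };
}

console.log(estimateInputCost(10_000, "gpt-4o"));
// tokens: 2500, usd ≈ 0.00625
```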
Pricing Updates
Model pricing is automatically updated weekly via GitHub Actions. The update script fetches the latest prices directly from each provider's official pricing pages.
You can check when prices were last updated:
import { LAST_UPDATED } from 'ai-token-estimator';
console.log(LAST_UPDATED); // e.g. '2026-01-14'

License
MIT
