llm-token-estimator
Offline token estimation for Large Language Models.
Installation
npm install llm-token-estimator
Usage
Basic usage
const { estimateTokens } = require("llm-token-estimator");
const result = estimateTokens(
  "Explain transformers like I'm five.",
  { model: "gpt-4o" }
);
console.log(result);
Output:
{
  tokens: 9,
  characters: 35,
  model: "gpt-4o",
  maxTokens: 128000,
  vendor: "openai",
  warning: null
}
Language-aware estimation
For better accuracy with non-English content:
const result = estimateTokens(
  "Bonjour, comment allez-vous?",
  {
    model: "gpt-4o",
    language: "fr" // French
  }
);
// Supported languages: en, es, fr, de, it, pt, ru, zh, ja, ko, ar, hi, code
Using chat-style inputs (array of strings)
Useful when estimating prompts made of multiple messages:
estimateTokens(
  [
    "You are a helpful assistant.",
    "Summarize the following text:",
    articleText
  ],
  { model: "claude-3-sonnet" }
);
Handling context limit warnings
const { warning } = estimateTokens(longPrompt, {
  model: "gpt-4"
});
if (warning) {
  console.warn(warning);
}
Listing supported models
const { listModels } = require("llm-token-estimator");
console.log(listModels());
Accuracy and Limitations
This library provides approximate token counts based on character-to-token ratios. While fast and dependency-free, it has limitations:
- ✅ Good for: Quick estimates, cost approximation, context limit checks
- ❌ Limitations: Language variations, content types, model-specific tokenization
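To make the ratio-based approach concrete, here is a self-contained sketch of how a characters-per-token estimate works. The ratios and the roughEstimate function below are illustrative assumptions, not the library's actual internals:

```javascript
// Illustrative characters-per-token ratios (assumptions, not the library's values).
const CHARS_PER_TOKEN = { en: 4.0, zh: 1.5, code: 3.5 };

// Hypothetical ratio-based estimator: divide character count by the ratio
// for the given language and round up.
function roughEstimate(text, language = "en") {
  const ratio = CHARS_PER_TOKEN[language] || CHARS_PER_TOKEN.en;
  return Math.ceil(text.length / ratio);
}

console.log(roughEstimate("Explain transformers like I'm five.")); // 9
```

This is why the method is fast and dependency-free, and also why it drifts on content whose real tokenization diverges from the assumed ratio.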
For production applications requiring high accuracy, consider using:
- tiktoken for OpenAI models
- Model-specific tokenizers for other providers
Supported Models
Includes 100+ models from major providers:
- OpenAI: GPT-5.2, GPT-5, GPT-4.1, o3, o4-mini, GPT-OSS models, and more
- Anthropic: Claude 4 series (Opus 4.6, Sonnet 4.5, Haiku 4.5)
- Google: Gemini 3, Gemini 2.5 series
- Meta: LLaMA 3.x series
- Mistral: Large 3, Medium 3.1, Ministral 3 series
- Others: xAI Grok, Cohere, Alibaba Qwen, DeepSeek, Amazon Nova, and more
Use listModels() to see all supported models.
Default behavior
- Default model: gpt-3.5-turbo
- Default language: en (English)
- Input can be a string or an array of strings
- Output tokens are not included (input only)
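The defaults above amount to a simple options merge. A hypothetical sketch of that behavior (an illustration of the documented defaults, not the library's actual internals):

```javascript
// Defaults as documented above; the merge itself is a hypothetical illustration.
const DEFAULTS = { model: "gpt-3.5-turbo", language: "en" };

function withDefaults(options = {}) {
  return { ...DEFAULTS, ...options };
}

console.log(withDefaults({ model: "gpt-4o" }));
// { model: 'gpt-4o', language: 'en' }
```

Any option you pass overrides the default; omitted options fall back to gpt-3.5-turbo and English.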
Example use cases
- Pre-flight prompt validation
- CI checks for context overflows
- Prompt truncation logic
- Cost estimation (approximate)
- Multi-language content estimation
- Model comparison and selection
- Rate limiting based on token counts
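The truncation use case can be sketched without the library at all, assuming a fixed characters-per-token ratio; the truncateToFit helper below is hypothetical, not part of this package's API:

```javascript
// Hypothetical pre-flight truncation: cut a prompt so its estimated token
// count fits within a model's context window, assuming ~4 chars per token.
function truncateToFit(text, maxTokens, charsPerToken = 4) {
  const maxChars = Math.floor(maxTokens * charsPerToken);
  return text.length <= maxChars ? text : text.slice(0, maxChars);
}

const prompt = "x".repeat(600);
console.log(truncateToFit(prompt, 100).length); // 400
```

Because the ratio is approximate, leave headroom below the real context limit rather than truncating to the exact maximum.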
API Reference
estimateTokens(input, options)
Parameters:
- input (string | string[]): Text to estimate tokens for
- options (object):
  - model (string): Model name (default: "gpt-3.5-turbo")
  - language (string): Language code for better estimation (default: "en")
Returns: Object with tokens, characters, model, maxTokens, vendor, warning
listModels()
Returns: Array of all supported model names
Contributing
We welcome contributions! Feel free to:
- Add new models
- Improve estimation accuracy
- Add new languages
- Fix bugs or enhance documentation
