

tokenx


Fast and lightweight token count estimation for any LLM without requiring a full tokenizer. This library provides quick approximations that are good enough for most use cases while keeping your bundle size minimal.

For advanced use cases requiring precise token counts, please use a full tokenizer like gpt-tokenizer.
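At its core, estimation of this kind divides a character count by an average characters-per-token ratio. The sketch below illustrates the general idea only; it is a simplified illustration, not tokenx's actual implementation:

// Simplified illustration of chars-per-token estimation (not the tokenx source).
// A ratio of ~4 characters per token is a common rule of thumb for English text.
function naiveEstimate(text, charsPerToken = 4) {
  return Math.ceil(text.length / charsPerToken)
}

tokenx refines this basic idea with language-specific ratios and heuristics, which is where the accuracy figures in the benchmarks below come from.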

Benchmarks

The following table shows the accuracy of the token count approximation for different input texts:

| Description | Actual GPT Token Count | Estimated Token Count | Token Count Deviation |
| --- | --- | --- | --- |
| Short English text | 10 | 11 | 10.00% |
| German text with umlauts | 48 | 49 | 2.08% |
| Metamorphosis by Franz Kafka (English) | 31796 | 35705 | 12.29% |
| Die Verwandlung by Franz Kafka (German) | 35309 | 35069 | 0.68% |
| 道德經 by Laozi (Chinese) | 11712 | 12059 | 2.96% |
| TypeScript ES5 Type Declarations (~ 4000 loc) | 49293 | 52340 | 6.18% |
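
The deviation column is the relative error of the estimate, which you can reproduce from the other two columns:

// Relative error between estimated and actual token counts, in percent
const deviationPercent = Math.abs(estimated - actual) / actual * 100
// e.g. Math.abs(11 - 10) / 10 * 100 === 10.00 (first row)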

Features

  • 94% accuracy compared to full tokenizers (see benchmarks above)
  • 📦 Just 2kB bundle size with zero dependencies
  • 🌍 Multi-language support with configurable language rules
  • 🗣️ Built-in support for accented characters (German, French, Spanish, etc.)
  • 🔧 Configurable and extensible

Installation

Run one of the following commands, depending on your package manager, to add tokenx to your project.

# npm
npm install tokenx

# pnpm
pnpm add tokenx

# yarn
yarn add tokenx

Usage

import { estimateTokenCount, isWithinTokenLimit, splitByTokens } from 'tokenx'

const text = 'Your text goes here.'

// Estimate the number of tokens in the text
const estimatedTokens = estimateTokenCount(text)
console.log(`Estimated token count: ${estimatedTokens}`)

// Check if text is within a specific token limit
const tokenLimit = 1024
const withinLimit = isWithinTokenLimit(text, tokenLimit)
console.log(`Is within token limit: ${withinLimit}`)

// Split text into token-based chunks
const chunks = splitByTokens(text, 100)
console.log(`Split into ${chunks.length} chunks`)

// Use custom options for different languages or models
const customOptions = {
  defaultCharsPerToken: 4, // More conservative estimation
  languageConfigs: [
    { pattern: /[你我他]/g, averageCharsPerToken: 1.5 }, // Custom Chinese rule
  ]
}

const customEstimate = estimateTokenCount(text, customOptions)
console.log(`Custom estimate: ${customEstimate}`)

API

estimateTokenCount

Estimates the number of tokens in a given input string using heuristic rules that work across multiple languages and text types.

Usage:

const estimatedTokens = estimateTokenCount('Hello, world!')

// With custom options
const customEstimate = estimateTokenCount('Bonjour le monde!', {
  defaultCharsPerToken: 4,
  languageConfigs: [
    { pattern: /[éèêëàâîï]/i, averageCharsPerToken: 3 }
  ]
})

Type Declaration:

function estimateTokenCount(
  text?: string,
  options?: TokenEstimationOptions
): number

interface TokenEstimationOptions {
  /** Default average characters per token when no language-specific rule applies */
  defaultCharsPerToken?: number
  /** Custom language configurations to override defaults */
  languageConfigs?: LanguageConfig[]
}

interface LanguageConfig {
  /** Regular expression to detect the language */
  pattern: RegExp
  /** Average number of characters per token for this language */
  averageCharsPerToken: number
}

isWithinTokenLimit

Checks if the estimated token count of the input is within a specified token limit.

Usage:

const withinLimit = isWithinTokenLimit('Check this text against a limit', 100)
// With custom options
const customCheck = isWithinTokenLimit('Text', 50, { defaultCharsPerToken: 3 })

Type Declaration:

function isWithinTokenLimit(
  text: string,
  tokenLimit: number,
  options?: TokenEstimationOptions
): boolean
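
For example, this function can serve as a cheap pre-flight check before sending a prompt to an LLM API. In the sketch below, sendToModel is a hypothetical placeholder for your own API call:

import { isWithinTokenLimit } from 'tokenx'

// Hypothetical guard: reject prompts that likely exceed the context window.
// sendToModel is a placeholder, not part of tokenx.
function guardedSend(prompt, contextLimit = 4096) {
  if (!isWithinTokenLimit(prompt, contextLimit)) {
    throw new Error('Prompt likely exceeds the token limit')
  }
  return sendToModel(prompt)
}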

sliceByTokens

Extracts a portion of text based on token positions, similar to Array.prototype.slice(). Supports both positive and negative indices.

Usage:

const text = 'Hello, world! This is a test sentence.'

const firstThree = sliceByTokens(text, 0, 3)
const fromSecond = sliceByTokens(text, 2)
const lastTwo = sliceByTokens(text, -2)
const middle = sliceByTokens(text, 1, -1)

// With custom options
const customSlice = sliceByTokens(text, 0, 5, {
  defaultCharsPerToken: 4,
  languageConfigs: [
    { pattern: /[éèêëàâîï]/i, averageCharsPerToken: 3 }
  ]
})

Type Declaration:

function sliceByTokens(
  text: string,
  start?: number,
  end?: number,
  options?: TokenEstimationOptions
): string

Parameters:

  • text - The input text to slice
  • start - The start token index (inclusive). If negative, treated as offset from end. Default: 0
  • end - The end token index (exclusive). If negative, treated as offset from end. If omitted, slices to the end
  • options - Token estimation options (same as estimateTokenCount)

Returns:

The sliced text portion corresponding to the specified token range.

splitByTokens

Splits text into chunks based on token count. Useful for chunking documents for RAG, batch processing, or staying within context windows.

Usage:

const text = 'Long text that needs to be split into smaller chunks...'

// Basic splitting
const chunks = splitByTokens(text, 100)
console.log(`Split into ${chunks.length} chunks`)

// With overlap for semantic continuity
const overlappedChunks = splitByTokens(text, 100, { overlap: 10 })

// With custom options
const customChunks = splitByTokens(text, 50, {
  defaultCharsPerToken: 4,
  overlap: 5
})

Type Declaration:

interface SplitByTokensOptions extends TokenEstimationOptions {
  /** Number of tokens to overlap between consecutive chunks (default: 0) */
  overlap?: number
}

function splitByTokens(
  text: string,
  tokensPerChunk: number,
  options?: SplitByTokensOptions
): string[]

Parameters:

  • text - The input text to split
  • tokensPerChunk - Maximum number of tokens per chunk
  • options - Token estimation options with optional overlap

Returns:

An array of text chunks, each containing approximately tokensPerChunk tokens.
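
Putting it together, a simple chunking pass for retrieval-style indexing might look like the sketch below, where indexChunk is a hypothetical placeholder for your own storage or embedding logic:

import { splitByTokens } from 'tokenx'

// Split a document into ~200-token chunks with a 20-token overlap,
// then hand each chunk to a hypothetical indexing function.
const chunks = splitByTokens(document, 200, { overlap: 20 })
for (const [i, chunk] of chunks.entries()) {
  indexChunk(i, chunk) // placeholder: store or embed the chunk
}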

License

MIT License © 2023-PRESENT Johann Schopplich