
llm-token-compare

Side-by-side token counts and cost across GPT-4o, Claude, Gemini, and Llama. One API. No keys. Same prompt — see what it costs everywhere.



import { compare, findings } from 'llm-token-compare';

const r = await compare(
  'Summarize this article in three bullet points: ...',
  { callsPerMonth: 1_000_000 },
);

r.find((m) => m.model === 'gpt-4o')?.costAtRate;            // 217.50
r.find((m) => m.model === 'gemini-1.5-flash')?.costAtRate;  //  6.52

findings(r);
// → [
//   { level: 'good', text: 'Cheapest: gemini-1.5-flash — 40× less than claude-3.5-sonnet' },
//   { level: 'good', text: 'Most efficient: gpt-4o (4.22 chars/token)' },
//   { level: 'info', text: 'Token-count spread of 12% across models on this text' },
// ]

Runs locally. No API keys. Lazy-loaded tokenizers per family — only the ones you ask about are touched. Tree-shakable subpath imports for every family.

When you'd use this

llm-token-compare answers the four questions every prompt engineer asks before picking a model:

  • What does this prompt cost across providers? A single call returns costPer1kCalls and projected monthly cost for every model.
  • Which tokenizer is most efficient for my content? Especially relevant for non-English text — Korean tokenization can swing 2× across models on the same string.
  • Will this fit in the context window? contextUsedPct for each model (see the sketch below), and cheapestThatFits() for hard constraints.
  • Which model gives me the best price for the constraint I have? cheapestThatFits(text, { needContext: 128_000 }) returns the cheapest model that fits.

It is not a model router and won't predict generation cost — for that, plug the input/output token counts back into your billing model.
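
For the context-window question specifically, here is a minimal sketch using the contextUsedPct field that compare() returns. The placeholder prompt and the 80% threshold are arbitrary choices for illustration, not library defaults:

import { compare } from 'llm-token-compare';

const longPrompt = '...your long prompt here...'; // placeholder input

const results = await compare(longPrompt);

for (const r of results) {
  // contextUsedPct is on every CompareResult row.
  if (r.contextUsedPct > 80) {
    console.warn(`${r.model}: prompt uses ${r.contextUsedPct.toFixed(1)}% of the context window`);
  }
}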

Install

npm i llm-token-compare

Node 18+. ESM and CJS, types included. Works in browsers via the same conditional exports.

Or run it directly without installing:

npx llm-token-compare "your prompt here"

Quick start

Library

import {
  compare,
  cheapestThatFits,
  findings,
  formatTable,
} from 'llm-token-compare';

const text = 'How do I sort a list in Python?';

const results = await compare(text, {
  models: ['gpt-4o', 'claude-3.5-sonnet', 'gemini-1.5-flash', 'llama-3.1'],
  callsPerMonth: 1_000_000,
});

console.log(formatTable(results));
console.log(findings(results, text));

const winner = await cheapestThatFits(text, { needContext: 128_000 });
console.log(`Cheapest fit: ${winner.model} (${winner.costPer1kCalls.toFixed(4)}/1k)`);

CLI

# Default: compare 4 models, print colored table + token visualization
llm-token-compare "your prompt here"

# Run on built-in showcase corpus (Python, Korean, JSON, prose)
llm-token-compare samples

# Find the cheapest model that fits a context constraint
llm-token-compare cheapest "your text" --context 128k

# Project monthly cost
llm-token-compare "your text" --calls-per-month 1m

# Markdown table — clipboard-friendly for tweets / PRs / Slack
llm-token-compare "your text" --share

# Standalone HTML report with token visualization
llm-token-compare "your text" --html report.html

# JSON for scripting
llm-token-compare "your text" --json

# Pipe stdin or read a file
cat prompt.txt | llm-token-compare
llm-token-compare -f prompt.txt

# Restrict / expand the model set
llm-token-compare "..." --models gpt-4o,claude-3.5-sonnet
llm-token-compare "..." --all

Models

| Model | Family | Encoding | Context window | Input $/M tokens | Accuracy |
| --- | --- | --- | ---: | ---: | --- |
| gpt-4o | OpenAI | o200k_base | 128k | $2.50 | exact |
| gpt-4-turbo | OpenAI | cl100k_base | 128k | $10.00 | exact |
| gpt-3.5-turbo | OpenAI | cl100k_base | 16k | $0.50 | exact |
| claude-3.5-sonnet | Anthropic | claude-bpe | 200k | $3.00 | approx |
| claude-3-opus | Anthropic | claude-bpe | 200k | $15.00 | approx |
| claude-3-haiku | Anthropic | claude-bpe | 200k | $0.25 | approx |
| gemini-1.5-pro | Google | sentencepiece-est | 2M | $1.25 | approx |
| gemini-1.5-flash | Google | sentencepiece-est | 1M | $0.075 | approx |
| llama-3.1 | Meta | tiktoken-llama3 | 128k | local ($0) | exact |
| llama-3 | Meta | tiktoken-llama3 | 8k | local ($0) | exact |

The default 4-model preset (when models is omitted) is gpt-4o, claude-3.5-sonnet, gemini-1.5-flash, llama-3.1 — one per family, picked as the most representative of each family's current generation. Use models: [...] for an explicit list, or --all on the CLI.

Accuracy: exact vs approximate

This is the part most token tools quietly fudge. The truth:

| Family | Library | What it gives you |
| --- | --- | --- |
| OpenAI | js-tiktoken (BPE) | Exact. Identical to OpenAI's reference tokenizer for cl100k_base and o200k_base. |
| Meta (Llama 3 / 3.1) | llama3-tokenizer-js | Exact. Vocab baked into the package. BOS/EOS suppressed for fair comparison. |
| Anthropic (Claude 3 / 3.5) | @anthropic-ai/tokenizer (Claude-2 BPE) | Approx. Anthropic publishes no local tokenizer for Claude 3+. The Claude-2 tokenizer is the best local approximation; expect ±10% drift on English, more on multilingual content. |
| Google (Gemini 1.5) | cl100k_base proxy | Approx. Google publishes no local tokenizer at all. We use OpenAI's cl100k as a proxy; expect ±15% drift on English content and larger drift on multilingual text. |

Every output format (table, JSON, markdown, HTML) tags each row with accuracy: 'exact' | 'approx'. The auto-generated findings include a warning for heavily non-ASCII text, where calibration drift is most likely.
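
If you want to act on that flag programmatically, it is the accuracy field on each CompareResult. A minimal sketch; the filtering policy is up to you:

import { compare } from 'llm-token-compare';

const results = await compare('your prompt here');

// Keep only rows backed by exact local tokenizers (OpenAI, Meta).
const exactOnly = results.filter((r) => r.accuracy === 'exact');
console.log(exactOnly.map((r) => r.model)); // e.g. ['gpt-4o', 'llama-3.1']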

If you need exact Claude / Gemini counts and you have API keys, that's planned for v0.2 (apiMode: 'anthropic' | 'google'). For v0.1 the design choice is fully local, no setup, transparent labels.

Cost: pricing data

Pricing lives in src/data/pricing.json with an as_of date, currency, and per-model input_per_1m / output_per_1m / context_window / optional self_hosted flag. The package exports the snapshot date so you can surface it in your own UI:

import { PRICING_AS_OF, PRICING_CURRENCY } from 'llm-token-compare';

console.log(`Pricing as of ${PRICING_AS_OF} (${PRICING_CURRENCY})`);
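
For reference, a sketch of the file's shape expressed as a TypeScript object. The field names come from the description above; the per-model nesting and the concrete values are illustrative assumptions, not the shipped file:

// Illustrative mirror of src/data/pricing.json, not the real data.
const pricing = {
  as_of: '2026-01-15', // snapshot date (assumed value)
  currency: 'USD',
  models: {            // per-model entries (nesting assumed)
    'gpt-4o':    { input_per_1m: 2.5, output_per_1m: 10.0, context_window: 128_000 },
    'llama-3.1': { input_per_1m: 0,   output_per_1m: 0,    context_window: 128_000, self_hosted: true },
  },
};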

To check freshness:

npm run sync-pricing
# warns to stderr if the snapshot is more than 30 days old, exits 0

Pull requests against pricing.json are very welcome — see Contributing.

Output formats

A single compare() call feeds five formatters:

import {
  compare,
  formatTable,         // colored terminal table
  formatVisualization, // colored token-by-token strip (terminal)
  formatMarkdown,      // markdown table — clipboard / PRs / blogs
  formatHtml,          // standalone HTML report with token viz
  formatJson,          // JSON envelope { results, findings? }
} from 'llm-token-compare';

const results = await compare(text);

formatTable(results, { color: true });
await formatVisualization(text, 'gpt-4o', { color: true });
formatMarkdown(results);
await formatHtml(text, results);   // returns a full <!doctype html> string
formatJson({ results });

The HTML output is a single self-contained file (inline CSS, no scripts, no external assets) suitable for embedding in blog posts or sending in Slack.
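
To save that report from your own code rather than the CLI, a minimal sketch (formatHtml returns the complete HTML string, as shown above):

import { writeFile } from 'node:fs/promises';
import { compare, formatHtml } from 'llm-token-compare';

const text = 'your prompt here';
const results = await compare(text);

// Write the self-contained report straight to disk.
await writeFile('report.html', await formatHtml(text, results));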

Subpath imports

If you only need one family, import its tokenizer directly. These are synchronous (the lazy-loading registry is bypassed) and tree-shakable:

import { count, tokenize } from 'llm-token-compare/openai';
import { count as llamaCount } from 'llm-token-compare/meta';
import { count as claudeCount } from 'llm-token-compare/anthropic';
import { count as geminiCount } from 'llm-token-compare/google';

count('hello world', 'gpt-4o');             // 2
llamaCount('hello world', 'llama-3.1');     // 2
claudeCount('hello world', 'claude-3-haiku');
geminiCount('hello world', 'gemini-1.5-flash');

Each subpath only pulls in its own tokenizer dependency — useful when you're shipping to the browser and don't want all four.

API

function compare(text: string, opts?: CompareOptions): Promise<CompareResult[]>;
function count(text: string, model: ModelId): Promise<number>;
function tokenize(text: string, model: ModelId): Promise<TokenizeResult>;
function cost(text: string, model: ModelId, opts?: { calls?: number }): Promise<CostResult>;
function cheapestThatFits(text: string, opts?: CheapestOptions): Promise<CheapestResult>;
function findings(results: CompareResult[], sourceText?: string): Finding[];

interface CompareOptions {
  models?: ModelId[];          // default: 4-model preset
  callsPerMonth?: number;      // populates costAtRate per model
}

interface CompareResult {
  model: ModelId;
  family: 'openai' | 'anthropic' | 'google' | 'meta';
  encoding: string;
  tokens: number;
  characters: number;
  charsPerToken: number;
  contextWindow: number;
  contextUsedPct: number;
  costPer1kCalls: number;      // USD
  costAtRate?: number;         // present iff callsPerMonth was provided
  accuracy: 'exact' | 'approx';
  selfHosted: boolean;         // true for llama-* (input cost = 0)
}

interface TokenizeResult {
  ids: number[];
  pieces: string[];            // decoded per-token strings (visualization-ready)
}

interface CheapestOptions {
  needContext?: number;        // require contextWindow >= n
  mustBeExact?: boolean;       // restrict to exact tokenizers (openai, meta)
  callsPerMonth?: number;
}

interface CheapestResult {
  model: ModelId;
  costPer1kCalls: number;
  contextUsedPct: number;
  savingsVs?: { model: ModelId; costPer1kCalls: number; pct: number };
}

interface Finding {
  level: 'good' | 'info' | 'warn';
  text: string;
}

type ModelId =
  | 'gpt-4o' | 'gpt-4-turbo' | 'gpt-3.5-turbo'
  | 'claude-3.5-sonnet' | 'claude-3-opus' | 'claude-3-haiku'
  | 'gemini-1.5-pro' | 'gemini-1.5-flash'
  | 'llama-3.1' | 'llama-3';

Helpers:

import {
  ALL_MODELS,        // ModelId[] of every supported model
  DEFAULT_MODELS,    // the 4-model preset
  models,            // Record<ModelId, ModelInfo>
  getModel,          // (id) => ModelInfo, throws on unknown
  isModelId,         // type guard
  samples,           // { python, korean, json, prose } — built-in showcase texts
  costPer1kCalls,    // (tokens, $/M) => USD
  tokensToCost,      // (tokens, $/M) => USD per call
} from 'llm-token-compare';
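
A short sketch tying a few of these together: tokenize() on a built-in showcase text, then the costPer1kCalls helper at gpt-4o's documented $2.50/M input rate. The printed values depend on the sample, so none are asserted here:

import { tokenize, samples, costPer1kCalls } from 'llm-token-compare';

// Tokenize the built-in Korean showcase text with gpt-4o's tokenizer.
const { ids, pieces } = await tokenize(samples.korean, 'gpt-4o');
console.log(ids.length);         // token count for the sample
console.log(pieces.slice(0, 5)); // first few decoded token strings

// USD per 1,000 calls at $2.50 per million input tokens.
console.log(costPer1kCalls(ids.length, 2.5));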

CLI reference

llm-token-compare [text] [options]

  -f, --file <path>            Read input from a file
  -m, --models <list>          Comma-separated model ids (default: 4-model preset)
      --all                    Compare all 10 supported models
      --calls-per-month <n>    Project monthly cost at this volume (accepts 1k, 1m, 1b)
      --no-viz                 Skip the token visualization panel
      --no-color               Disable colored output
      --share                  Print a markdown table (clipboard-friendly)
      --json                   Emit JSON: { results, findings }
      --html <path>            Write a standalone HTML report

llm-token-compare samples [options]
  -m, --models <list>          Same flags as above
      --all
      --no-color

llm-token-compare cheapest <text> [options]
  -c, --context <n>            Required context window (e.g. 128k, 1m)
      --exact                  Restrict to families with exact local tokenizers

llm-token-compare --version
llm-token-compare --help

Benchmarks

Run npm run bench to reproduce locally. Benched on Apple M-series, Node 20.

Compare across 4 default models (initial tokenizer load amortized):

| Sample | Chars | ms / call |
| --- | ---: | ---: |
| python (a quicksort snippet) | 350 | ~25 ms |
| korean (paragraph of CJK) | 171 | ~26 ms |
| json (nested user object) | 437 | ~25 ms |
| prose (Gettysburg, opening) | 309 | ~26 ms |

Compare across all 10 models, single prose call: ~75 ms.

Built-in showcase corpus (llm-token-compare samples) — four content types, each run through the four default models. Notable values:

| Sample | gpt-4o tokens | claude-3.5 tokens | gemini-1.5-flash tokens | llama-3.1 tokens |
| --- | ---: | ---: | ---: | ---: |
| python (350 chars) | 117 | 119 | 117 | 117 |
| korean (171 chars) | 88 | 177 | 149 | 94 |
| json (437 chars) | 155 | 161 | 155 | 155 |
| prose (309 chars) | 62 | 63 | 62 | 62 |

Korean is the killer line: same input, 2× spread between Claude 3.5 and GPT-4o, with proportional cost impact. This is the kind of number you can only see by comparing.

Bundle size

| Entry | Code only |
| --- | ---: |
| llm-token-compare (full barrel, ESM) | 24 KB |
| llm-token-compare/openai | 0.85 KB |
| llm-token-compare/anthropic | 0.81 KB |
| llm-token-compare/google | 0.80 KB |
| llm-token-compare/meta | 0.66 KB |
| CLI (minified) | 29 KB |

Tokenizer dependencies (js-tiktoken, llama3-tokenizer-js, @anthropic-ai/tokenizer) ship as runtime dependencies and load lazily — the full barrel only initializes the families it's asked about.

FAQ

Do I need API keys? No. Everything runs locally. v0.2 will add an opt-in API mode for exact Claude / Gemini counts; v0.1 is fully offline.

How wrong are the approx numbers? For English: ±10% on Anthropic, ±15% on Google, in our internal tests. For multilingual content the drift can be larger — findings() flags this when non-ASCII content exceeds 30%.

Why is gemini-1.5-flash so cheap in the comparison? Because it actually is. Google's pricing on Flash is roughly 1/30th of Claude 3.5 Sonnet's input rate. The package surfaces that, but the call you'd make on quality is yours — token-cost parity is not output-quality parity.

Why bundle several tokenizers — isn't this big? The lazy registry only initializes the families you actually compare against, so a compare(text, { models: ['gpt-4o'] }) call never touches Llama or Anthropic code. For browser bundles, use the subpath imports (/openai, /meta, etc.) to avoid pulling in tokenizers you don't need.

Does it work in browsers? Yes. The dual ESM/CJS build ships conditional exports; js-tiktoken and llama3-tokenizer-js are pure JS; the Anthropic tokenizer is WASM-backed and works in modern browsers. The formatHtml() output is a self-contained HTML file — perfect for embedding in static sites.

How do I count output (completion) tokens? v0.1 is input-only. For most prompt-engineering decisions the input dominates; for end-to-end cost you'll multiply your generation tokens by output_per_1m from pricing.json. Built-in support is on the v0.3 roadmap.
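
A back-of-envelope sketch of that multiplication. The token counts and dollar rates here are illustrative placeholders, not measured values:

const inputTokens = 120;  // from count() / compare()
const outputTokens = 300; // your own estimate of generation length
const inputPerM = 2.5;    // input_per_1m from pricing.json (example rate)
const outputPerM = 10.0;  // output_per_1m from pricing.json (example rate)

const costPerCall =
  (inputTokens * inputPerM + outputTokens * outputPerM) / 1_000_000;
// (120 * 2.5 + 300 * 10) / 1e6 = 0.0033 USD per call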

Why isn't gpt-4o-mini / claude-haiku-4 / gemini-2.5 here? v0.1 ships one entry per current generation across the four families — see the table in Models. Adding a new variant is one entry in pricing.json plus an entry in the encoding map; PRs welcome.

Are the token IDs / pieces accurate for approx families? The pieces returned by tokenize() for Anthropic are real — but from the Claude-2 vocab. For Google they reflect cl100k_base, which is a proxy. The visualization is a useful approximation, not a ground-truth tokenization.

Roadmap

  • v0.2 — opt-in API mode for exact Claude / Gemini counts via Anthropic's count_tokens and Google's countTokens endpoints. Same compare() shape, accuracy upgrades to 'exact' when keys are present.
  • v0.3 — output token estimation + completion-cost surfacing. Add outputCost, costPerCallEstimated columns and an --out-tokens CLI flag.
  • v0.4 — companion entry points: an MCP server (llm-token-compare/mcp) for Claude Desktop / Cursor, a GitHub Action that comments on PRs with token deltas across models for changed prompt files, and a Vercel-hosted demo page.

Contributing

The single biggest way to improve this package is to keep its data current.

  • src/data/pricing.json — input/output dollars per million tokens, context window, snapshot date. Add a new model, bump a price, refresh as_of after sweeping the values.
  • src/data/samples/index.ts — the four showcase texts. New languages or content types are welcome (rule of thumb: pick something whose tokenization differs visibly across providers).
  • test/fixtures/known-counts.json — hand-verified token counts driving the regression tests. Add new cases here when you want to lock down behavior.

Code-level contributions are equally welcome — the codebase is small (under 1k LoC), strictly typed, and Biome-formatted. Run npm run lint && npm test before opening a PR.

Author

Cihangir Bozdogan <[email protected]>

License

MIT © 2026 — see LICENSE.