tokenometer

v1.1.0

Tokenometer CLI — LLM token cost + latency benchmarking across Claude, GPT-4o, Gemini, Mistral, and Cohere. Multi-format, empirical mode, vision tokens, SARIF output.

Downloads

1,480

Empirical token-cost + latency benchmarking for LLM prompts. Tells you what your prompt actually costs and how fast each provider responds across Claude, GPT-4o, Gemini, Mistral, and Cohere — in every format.

See the root README for findings, methodology, and the full project overview.

Live playground: tokenometer.vercel.app · Source · MIT

npx tokenometer ./prompt.md --model claude-opus-4-7,gpt-4o
model            format    tokens  est. cost  tokenizer
---------------  --------  ------  ---------  --------------
claude-opus-4-7  json         ~78  $0.001170  cl100k_base
claude-opus-4-7  yaml         ~84  $0.001260  cl100k_base
gpt-4o           json          77  $0.000192  o200k_base
gpt-4o           yaml          83  $0.000208  o200k_base

Cheapest: gpt-4o as json ($0.000192)
Priciest: claude-opus-4-7 as yaml ($0.001260, 6.56x more)

A leading ~ marks an approximate count (offline mode for Claude / Gemini / Mistral-Tekken / Cohere, since none of those vendors publishes a public production tokenizer that ships in JS).
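The est. cost column is consistent with multiplying the token count by a per-million-token input price. Here is a minimal sketch of that arithmetic. The prices ($15.00 and $2.50 per million input tokens) are inferred from the sample table above, not read from tokenometer's pricing registry, and `estimateInputCost` is a hypothetical helper rather than part of the package's API.

```typescript
// Hypothetical helper: input cost in USD for a token count at a given
// price per one million input tokens.
function estimateInputCost(tokens: number, usdPerMillionTokens: number): number {
  return (tokens * usdPerMillionTokens) / 1e6;
}

// Prices are assumptions inferred from the sample output above.
const opusJsonCost = estimateInputCost(78, 15);
const gpt4oJsonCost = estimateInputCost(77, 2.5);

console.log(opusJsonCost.toFixed(6));
console.log(gpt4oJsonCost.toFixed(6));
```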

Flags

| Flag | Default | Notes |
|---|---|---|
| --model <id[,id…]> | claude-opus-4-7 (or auto-detected) | Any registered model id (63 across 5 providers). |
| --format <fmt[,fmt…]> | json,yaml,xml,markdown,text | Subset of supported formats. |
| --output <fmt> | table | One of table, json, or sarif. |
| --by-file | off | Append a per-file token/USD table (multi-file only). |
| --image <path> | none | Add vision-token cost for the image (repeatable). |
| --config <path> | none | Load this exact config file (skips walk-up). |
| --no-config | off | Skip .tokenometer.yml loading entirely. |
| --empirical | off | Use provider countTokens APIs (free, exact). |
| --latency | off | Measure real generation latency (TTFT, total ms, tokens/sec). Implies --empirical. |
| --latency-trials <n> | 3 | Trials per cell when --latency is set (1–10). |
| --max-spend <usd> | 0.05 (or 0.25 with --latency) | Hard ceiling for empirical / latency mode. |
| --offline | off | Force offline path (overrides --empirical). |
| -h, --help | | Print help. |
| -v, --version | | Print version. |

tokenometer <file> [options]
echo "prompt" | tokenometer - [options]

Models supported

63 models across 5 providers. Run tokenometer --help for the full list at runtime, or browse the Cost Atlas for sortable per-model pages.

| Provider | Examples | Offline tokenizer | Empirical |
|---|---|---|---|
| Anthropic | claude-opus-4-7, claude-sonnet-4-6, claude-haiku-4-5, Claude 3.x family | gpt-tokenizer cl100k_base (approximate) | messages.countTokens (free, exact) |
| OpenAI | gpt-4o, gpt-4o-mini, gpt-4-turbo, gpt-3.5-turbo, o1 family | gpt-tokenizer o200k_base (exact) | same o200k_base (matches production) |
| Google | gemini-2.5-pro, gemini-2.5-flash, gemini-1.5-pro, gemini-1.5-flash | chars / 4 (approximate) | model.countTokens (free, exact) |
| Mistral (19 models) | open-mistral-7b, open-mixtral-8x22b, mistral-large-latest, codestral-latest, mistral-nemo, pixtral-large-latest, mistral-medium-2505, magistral-small, ministral-3b-latest, devstral-small-2505 | mistral-tokenizer-js for SentencePiece V1/V2/V3 (exact); chars/4 for Tekken (approximate) | unsupported (no public token-count API) |
| Cohere | command-r, command-r-plus | chars / 4 (approximate) | POST /v1/tokenize (free, exact, requires COHERE_API_KEY) |
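The chars / 4 fallback listed for Gemini, Cohere, and Mistral-Tekken is simple enough to sketch. This is an illustration only: the rounding direction and the use of UTF-16 code units (`text.length`) are assumptions, and `approxTokensByChars` is a hypothetical name, not tokenometer's API.

```typescript
// Approximate token count using the chars / 4 heuristic.
// Assumptions: round up, and measure length in UTF-16 code units.
function approxTokensByChars(text: string): number {
  return Math.ceil(text.length / 4);
}

// A count produced this way would be shown with a leading ~ in the table.
console.log("~" + approxTokensByChars("Summarize the following document."));
```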

Pricing comes from the tokenlens registry, with a small set of local overrides for bleeding-edge models. Cohere pricing lives entirely in LOCAL_OVERRIDES because @tokenlens/models doesn't ship a Cohere catalog as of v1.3.0.

Empirical mode

For exact, vendor-billed counts on Claude, Gemini, and Cohere, set the matching API key environment variables and pass --empirical. The tool calls each provider's free countTokens-equivalent endpoint, so there is no charge.

ANTHROPIC_API_KEY=… GOOGLE_API_KEY=… COHERE_API_KEY=… \
  npx tokenometer ./prompt.md --empirical --model claude-opus-4-7,gemini-2.5-pro,command-r-plus

OpenAI's empirical path uses tiktoken o200k_base locally — that encoding matches OpenAI's production count exactly, so no API call is needed. Mistral has no public token-count endpoint; the offline mistral-tokenizer-js path is used regardless.

Auto provider detection

When --model is omitted, tokenometer picks a default based on which provider key is set in your environment:

  • ANTHROPIC_API_KEY only → claude-opus-4-7
  • OPENAI_API_KEY only → gpt-4o
  • GOOGLE_API_KEY / GEMINI_API_KEY only → first known gemini-* model (falls back to gemini-2.5-pro)
  • MISTRAL_API_KEY only → first known mistral-* model
  • COHERE_API_KEY only → command-r-plus
  • Multiple keys set → falls back to claude-opus-4-7 and prints a stderr note. Pass --model to disambiguate.
  • No keys set → existing default (claude-opus-4-7).

This means npx tokenometer prompt.md does the right thing in any of those environments without you having to remember model names.
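The selection rules above can be sketched as a pure function. This is not tokenometer's actual code: `pickDefaultModel`, the `knownModels` parameter (standing in for the model registry), and the Mistral fallback when no mistral-* id is registered are all assumptions made for illustration.

```typescript
type Env = Record<string, string | undefined>;

interface DefaultModel {
  model: string;
  note?: string; // message a CLI might print to stderr
}

const FALLBACK_MODEL = "claude-opus-4-7";

function pickDefaultModel(env: Env, knownModels: string[]): DefaultModel {
  const has = (name: string): boolean => Boolean(env[name]);
  const firstWithPrefix = (prefix: string): string | undefined =>
    knownModels.find((id) => id.startsWith(prefix));

  // Collect one candidate per provider whose key is present.
  const candidates: string[] = [];
  if (has("ANTHROPIC_API_KEY")) candidates.push("claude-opus-4-7");
  if (has("OPENAI_API_KEY")) candidates.push("gpt-4o");
  if (has("GOOGLE_API_KEY") || has("GEMINI_API_KEY")) {
    candidates.push(firstWithPrefix("gemini-") ?? "gemini-2.5-pro");
  }
  if (has("MISTRAL_API_KEY")) {
    // Assumption: fall back to the global default if no mistral-* id is known.
    candidates.push(firstWithPrefix("mistral-") ?? FALLBACK_MODEL);
  }
  if (has("COHERE_API_KEY")) candidates.push("command-r-plus");

  const first = candidates[0];
  if (candidates.length === 1 && first !== undefined) return { model: first };
  if (candidates.length > 1) {
    return {
      model: FALLBACK_MODEL,
      note: "multiple provider keys set; pass --model to disambiguate",
    };
  }
  return { model: FALLBACK_MODEL }; // no keys set
}
```

Taking the environment as a parameter instead of reading it globally keeps the rule table easy to unit-test.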

.tokenometer.yml config

Drop a .tokenometer.yml (or .yaml) at the project root and tokenometer will pick it up automatically (walks up from the cwd, stopping at .git):

models: [claude-opus-4-7, gpt-4o, mistral-large-latest]
formats: [json, yaml, markdown]
paths: [prompts/**/*.md]
budgets:
  total: 0.50
  per-file: 0.10

User-passed CLI flags always win over config defaults. Use --config <path> to load an explicit file (skips the walk-up). Use --no-config to skip config loading entirely.
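To make the precedence concrete, here is a rough sketch of the merge order (CLI flags, then config file, then built-in defaults). The `Options` shape and `resolveOptions` are hypothetical and cover only two of the config keys; the real loader may differ.

```typescript
interface Options {
  models: string[];
  formats: string[];
}

// Precedence: CLI flag, then .tokenometer.yml value, then built-in default.
function resolveOptions(
  defaults: Options,
  config: Partial<Options>,
  cli: Partial<Options>,
): Options {
  return {
    models: cli.models ?? config.models ?? defaults.models,
    formats: cli.formats ?? config.formats ?? defaults.formats,
  };
}

// Defaults taken from the Flags table; config taken from the example above.
const resolved = resolveOptions(
  { models: ["claude-opus-4-7"], formats: ["json", "yaml", "xml", "markdown", "text"] },
  { models: ["claude-opus-4-7", "gpt-4o", "mistral-large-latest"], formats: ["json", "yaml", "markdown"] },
  { models: ["gpt-4o"] }, // as if the user passed --model gpt-4o
);
console.log(resolved);
```

In this example the model list comes from the CLI flag, while the formats still come from the config file because no --format flag was passed.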

Output formats

The --output flag picks the display format (separate from --format, which controls how the prompt body is converted before tokenization):

  • --output table (default) — the human-readable per-cell table you've been seeing.
  • --output json — emits a TokenometerResult JSON shape: { files: [{ path, results: [...] }] }. One entry per input file. Pipe to jq for filtering.
  • --output sarif — emits SARIF 2.1.0 with one result per (file, model, format) cell. Drop the file into GitHub Code Scanning or any SARIF viewer.
npx tokenometer ./prompt.md --output sarif > tokenometer.sarif
npx tokenometer ./prompt.md --output json | jq '.files[].results | map(.inputCost) | add'
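If you'd rather post-process the JSON in code than in jq, the same per-file sum can be sketched in TypeScript. Only the fields named in this README (`files`, `path`, `results`, `inputCost`) are modelled; the real TokenometerResult shape likely carries more, and `inputCostPerFile` is a hypothetical helper.

```typescript
// Partial model of the --output json shape; other fields are omitted.
interface CellResult {
  inputCost: number;
}
interface FileResult {
  path: string;
  results: CellResult[];
}
interface TokenometerResult {
  files: FileResult[];
}

// Sum inputCost over every result of each file, keyed by file path.
function inputCostPerFile(report: TokenometerResult): Map<string, number> {
  const totals = new Map<string, number>();
  for (const file of report.files) {
    const sum = file.results.reduce((acc, cell) => acc + cell.inputCost, 0);
    totals.set(file.path, sum);
  }
  return totals;
}
```

You would feed it the parsed output, for example `inputCostPerFile(JSON.parse(stdout))`, where `stdout` holds the text printed by the --output json run.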

Latency

--latency measures real generation latency in addition to token cost. For each (model, format) cell, tokenometer streams n real chat completions (default n=3, override with --latency-trials 1..10) capped at max_tokens=200, and reports:

  • TTFT — time to first streamed token (ms)
  • Total — wall-clock from request start to stream end (ms)
  • tokens/sec = output_tokens / (total - ttft)

Numbers are reported as p50 / p95 / mean over the trials. Full per-trial data is included in --output json.
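The metrics above can be sketched as follows. The `Trial` field names are hypothetical, and the nearest-rank percentile is an assumption: the README does not say which percentile method tokenometer uses. Times are in milliseconds, so the generation window is converted to seconds before dividing.

```typescript
// Hypothetical per-trial record; field names are illustrative.
interface Trial {
  ttftMs: number;
  totalMs: number;
  outputTokens: number;
}

// tokens/sec = output_tokens / (total - ttft), with the window in seconds.
// Returns Infinity if the stream ends at the first token.
function tokensPerSecond(t: Trial): number {
  const generationSeconds = (t.totalMs - t.ttftMs) / 1000;
  return t.outputTokens / generationSeconds;
}

// Nearest-rank percentile (assumed method), p in 0..100.
function percentile(values: number[], p: number): number {
  if (values.length === 0) throw new RangeError("no values");
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1] as number;
}

function mean(values: number[]): number {
  return values.reduce((a, b) => a + b, 0) / values.length;
}
```

With the default of three trials, a nearest-rank p95 is simply the slowest trial, so treat p95 as meaningful only at higher trial counts.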

ANTHROPIC_API_KEY=… OPENAI_API_KEY=… \
  npx tokenometer ./prompt.md --latency --model claude-opus-4-7,gpt-4o

--latency implies --empirical (offline mode can't measure real latency). The default --max-spend ceiling is bumped from $0.05 to $0.25 to cover the n × 200-token generations; pass --max-spend explicitly to override.
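To see why the ceiling is raised, it helps to count the worst-case generated tokens for a run. The sketch below only restates the defaults given above (3 trials, 200-token cap, $0.25 ceiling). `planLatencyRun` is a hypothetical helper, and rejecting an out-of-range trial count with an error is an assumption about how the CLI behaves.

```typescript
const MAX_TOKENS_PER_TRIAL = 200;

interface LatencyPlan {
  trials: number;
  maxSpendUsd: number;
  worstCaseOutputTokens: number;
}

function planLatencyRun(
  modelCount: number,
  formatCount: number,
  trials: number = 3,
  maxSpendUsd?: number,
): LatencyPlan {
  // Assumption: an out-of-range value is rejected rather than clamped.
  if (!Number.isInteger(trials) || trials < 1 || trials > 10) {
    throw new RangeError("--latency-trials must be an integer from 1 to 10");
  }
  return {
    trials,
    // An explicit --max-spend wins; otherwise use the latency default.
    maxSpendUsd: maxSpendUsd ?? 0.25,
    // One cell per (model, format) pair, each streamed `trials` times.
    worstCaseOutputTokens: modelCount * formatCount * trials * MAX_TOKENS_PER_TRIAL,
  };
}

// Two models across the five default formats, default trial count.
console.log(planLatencyRun(2, 5));
```

For two models and five formats this comes to 6,000 generated tokens at most, on top of the input tokens sent with each request.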

Supported providers: Anthropic (messages.stream), OpenAI (/v1/chat/completions SSE), Google (generateContentStream), Cohere (/v1/chat NDJSON), Mistral (/v1/chat/completions SSE). Each trial retries once on transient failures.

Per-file attribution

--by-file appends a per-file token + USD summary table when you pass multiple input files (with a single input file, the flag is a no-op):

By file:
  File              Tokens   USD
  ────────────────  ───────  ───────
  prompts/agent.md  1,243    $0.0186
  prompts/router.md   872    $0.0131

Useful for figuring out which prompt files dominate the cost of a multi-file pipeline. The aggregator that produces this table is also what powers the GitHub Action's per-file Δ comment, and is unit-tested in packages/action.
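A per-file roll-up of this kind can be sketched as a simple group-and-sum. This is not the aggregator from packages/action: the `Cell` shape, summing across every cell of a file, and sorting by spend are all assumptions for illustration.

```typescript
// Hypothetical flattened cell: one row per (file, model, format).
interface Cell {
  file: string;
  tokens: number;
  usd: number;
}

interface FileTotal {
  file: string;
  tokens: number;
  usd: number;
}

function aggregateByFile(cells: Cell[]): FileTotal[] {
  const totals = new Map<string, FileTotal>();
  for (const cell of cells) {
    const current = totals.get(cell.file) ?? { file: cell.file, tokens: 0, usd: 0 };
    current.tokens += cell.tokens;
    current.usd += cell.usd;
    totals.set(cell.file, current);
  }
  // Most expensive file first (ordering is an assumption).
  return Array.from(totals.values()).sort((a, b) => b.usd - a.usd);
}
```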

Vision tokens

Pass --image <path> (repeatable) to factor image-based vision tokens into the cost estimate alongside your prompt text:

npx tokenometer ./prompt.md --image ./screenshot.png --image ./diagram.jpg

Each image's dimensions are read with image-size (no native deps), then dispatched to the provider-specific vision-token estimator:

  • Claude → Anthropic's (width × height) / 750, capped at 1600 tokens.
  • GPT-4o → OpenAI's high-detail tiling: 85 + 170 × ceil(w/512) × ceil(h/512) after the 2048/768 resize step.
  • Gemini → Google's 258 × ceil(w/768) × ceil(h/768) (with a flat 258 for ≤384×384 images).

Mistral and Cohere don't have published vision-token formulas, so vision images are skipped for those providers (with a stderr note). Vision-token cells are always marked approximate: true since they're formula-derived. Each image also gets its own row in the --by-file table as a virtual file <image-path> [vision].
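The three formulas above translate directly into code. The sketch below is an illustration, not tokenometer's estimator: function names are hypothetical, rounding the Claude result up is an assumption, and the GPT-4o resize step is assumed to only ever shrink an image (fit within 2048 × 2048, then bring the shortest side down to 768), rounding to whole pixels.

```typescript
interface Size {
  width: number;
  height: number;
}

// Claude: (width × height) / 750, capped at 1600 tokens.
function claudeVisionTokens({ width, height }: Size): number {
  return Math.min(Math.ceil((width * height) / 750), 1600);
}

// GPT-4o high detail: resize, then 85 + 170 per 512 px tile.
function gpt4oVisionTokens({ width, height }: Size): number {
  // Step 1: fit within 2048 x 2048 (downscale only, by assumption).
  const fit = Math.min(1, 2048 / Math.max(width, height));
  let w = width * fit;
  let h = height * fit;
  // Step 2: bring the shortest side down to 768 (downscale only).
  const shrink = Math.min(1, 768 / Math.min(w, h));
  w = Math.round(w * shrink);
  h = Math.round(h * shrink);
  return 85 + 170 * Math.ceil(w / 512) * Math.ceil(h / 512);
}

// Gemini: flat 258 for small images, otherwise 258 per 768 px tile.
function geminiVisionTokens({ width, height }: Size): number {
  if (width <= 384 && height <= 384) return 258;
  return 258 * Math.ceil(width / 768) * Math.ceil(height / 768);
}
```

For example, a 1024 × 1024 image is shrunk to 768 × 768 under the GPT-4o rule, which covers four 512 px tiles and yields 85 + 170 × 4 = 765 tokens.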

Why not just tiktoken?

tiktoken's cl100k_base (the encoding most "Claude tokenizer" libraries fall back on) under-counts Opus 4.7 by a median of 62% across a 10-prompt benchmark. Sonnet 4.6 and Haiku 4.5 are closer (about 17%). Format choice is a wash, while model choice swings cost by 12×. See the root README for the dataset and findings.

License

MIT