@emstack/token-tally

v1.0.0

Scan a project, count LLM tokens, and estimate cost using live pricing.

token-tally

Scan a project, count LLM tokens, and estimate cost — before you ship the call.

npx @emstack/token-tally . --model gpt-4o --output-tokens 0
┌─────────────────────────┬─────────────────┐
│ Metric                  │           Value │
├─────────────────────────┼─────────────────┤
│ Model                   │ gpt-4o (openai) │
│ Files scanned           │              42 │
│ Total input tokens      │          18,204 │
│ Estimated output tokens │               0 │
│ Input price / 1M tok    │         $2.5000 │
│ Output price / 1M tok   │        $10.0000 │
│ Input cost              │         $0.0455 │
│ Output cost             │       $0.000000 │
│ Total cost              │         $0.0455 │
└─────────────────────────┴─────────────────┘

Features

  • Live pricing: uses the latest model prices from LiteLLM, fetched over the network and cached locally for 24 hours.
  • Provider-aware tokenizers: exact counts for OpenAI (js-tiktoken) and optionally for Anthropic and Gemini via their APIs. DeepSeek uses cl100k_base as a close approximation.
  • Interactive model picker: run without arguments to launch a wizard with ↑/↓ navigation, live fuzzy search across all available models, and prompts for every option.
  • .gitignore-aware scanner: skips ignored files by default; supports --include / --exclude globs across 40+ file extensions.
  • CI-friendly: --json output, exit code 2 when --budget is exceeded, and --offline for hermetic builds.
  • Context window guard: --warn-context flags when total tokens exceed the model's limit.

Install

Run on demand (no install required):

npx @emstack/token-tally . --model gpt-4o

Install globally:

npm i -g @emstack/token-tally

Usage

Non-interactive

token-tally [path] --model <model> [options]

Interactive wizard

Run without a model to launch the interactive picker:

token-tally
# or during development:
bun run src/cli.ts

The wizard lists all available models with ↑/↓ navigation and live search, then prompts for every option — press Enter to accept the shown default.

Options

| Flag | Default | Description |
|---|---|---|
| [path] | . | Directory to scan. Defaults to the current working directory. |
| -m, --model <name> | (none) | LLM model ID used for tokenization and pricing, e.g. gpt-4o, claude-3-5-sonnet-20241022. |
| -i, --include <glob...> | all code files | Glob patterns for files to include. Multiple patterns are space-separated. |
| -e, --exclude <glob...> | (none) | Glob patterns for files to skip. |
| --no-gitignore | gitignore respected | Disables .gitignore filtering. |
| --max-files <n> | unlimited | Caps the total number of files scanned. |
| --output-tokens <n> | 20% of input | Estimated output tokens to include in the total cost calculation. See Output tokens below. |
| --budget <usd> | (none) | Exit with code 2 if total cost exceeds this USD amount. |
| --warn-context | false | Warn when total tokens exceed the model's max_input_tokens. |
| --json | false | Emit machine-readable JSON instead of a table. |
| -v, --verbose | false | Show a per-file token and cost breakdown. |
| --refresh | false | Force re-fetch of the remote price table, bypassing the 24-hour cache. |
| --offline | false | Use only the local cache or bundled static prices; never hit the network. |
| --concurrency <n> | min(8, cpus) | Number of parallel file workers. |
| --anthropic-api-key <key> | $ANTHROPIC_API_KEY | Use the Anthropic messages.count_tokens API for exact Claude 3+ counts. |
| --gemini-api-key <key> | $GOOGLE_API_KEY | Use the Google countTokens API for exact Gemini counts. |

Output tokens

LLM APIs charge for both the tokens you send (input) and the tokens the model returns (output). Because token-tally scans your source files statically, it cannot know how long the model's response will be.

When --output-tokens is not set, token-tally uses 20% of the total input token count as a default estimate. This is a rough heuristic based on the observation that typical LLM responses are 10–30% of the size of the input context.

Override it whenever you know your expected response length:

| Scenario | Suggested value (tokens) |
|---|---|
| Quick summary or classification | 500–1,000 |
| Moderate answer with explanation | 2,000–4,000 |
| Long code generation / detailed analysis | 8,000–16,000 |
| Full context-window response | up to max_output_tokens of the model |

# use a fixed output token count
token-tally . --model claude-opus-4 --output-tokens 4000

# disable the output cost estimate entirely
token-tally . --model claude-opus-4 --output-tokens 0
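
The default described in this section can be pictured with a small helper. This is a minimal sketch, not token-tally's source: the function name, the constant, and the choice to round up are all assumptions made for illustration.

```typescript
// Hypothetical helper; names and rounding are assumptions, not token-tally's code.
const DEFAULT_OUTPUT_RATIO = 0.2; // the 20% default described above

function resolveOutputTokens(inputTokens: number, override?: number): number {
  // An explicit --output-tokens value (including 0) always wins.
  if (override !== undefined) return override;
  // Otherwise estimate 20% of the input, rounded up (rounding is assumed).
  return Math.ceil(inputTokens * DEFAULT_OUTPUT_RATIO);
}

const estimatedOutput = resolveOutputTokens(18204);       // no flag given
const fixedOutput = resolveOutputTokens(18204, 4000);     // --output-tokens 4000
const disabledOutput = resolveOutputTokens(18204, 0);     // --output-tokens 0
```

Treating 0 as a real value rather than "unset" is what lets --output-tokens 0 switch the output estimate off.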

Examples

Per-file breakdown:

token-tally src --model gpt-4o -v

CI cost gate (fail if total exceeds $0.05):

token-tally . --model gpt-4o --budget 0.05 --json

Warn if the project won't fit in a single context window:

token-tally . --model claude-3-5-sonnet-20241022 --warn-context

Force-refresh prices once, then run offline from the cache:

token-tally . --model gpt-4o --refresh
token-tally . --model gpt-4o --offline

Exact token counts for Claude 3+ via API:

ANTHROPIC_API_KEY=sk-... token-tally . --model claude-3-5-sonnet-20241022

GitHub Actions

- name: Check token cost
  run: npx @emstack/token-tally . --model gpt-4o --budget 1.00 --json > tally.json
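
A CI step like this fails because the process exits with a non-zero code when the budget is exceeded. The sketch below shows one way the --budget and --warn-context checks could be expressed; the interface, function name, and warning text are hypothetical, and only the exit code 2 and the "exceeds" comparisons come from the options table.

```typescript
// Hypothetical sketch; names and message wording are illustrative assumptions.
interface GuardResult {
  exitCode: 0 | 2;
  warnings: string[];
}

function applyGuards(
  totalCostUsd: number,
  totalTokens: number,
  budgetUsd?: number,      // value of --budget <usd>, if given
  maxInputTokens?: number, // model limit checked by --warn-context, if enabled
): GuardResult {
  const warnings: string[] = [];
  // --warn-context only warns; it does not change the exit code here.
  if (maxInputTokens !== undefined && totalTokens > maxInputTokens) {
    warnings.push(`total tokens ${totalTokens} exceed context limit ${maxInputTokens}`);
  }
  // --budget fails the run only when the cost strictly exceeds the limit.
  const overBudget = budgetUsd !== undefined && totalCostUsd > budgetUsd;
  return { exitCode: overBudget ? 2 : 0, warnings };
}

const withinBudget = applyGuards(0.0455, 18204, 0.05, 128000);
const overBudgetRun = applyGuards(0.0455, 18204, 0.04);
```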

How it works

1 — Token counting

Each provider uses a different tokenization strategy. The tool picks the right one automatically based on the model name.

OpenAI — exact via js-tiktoken

js-tiktoken is a pure-JavaScript port of tiktoken, OpenAI's open-source BPE tokenizer. The encoder is selected per model family:

| Model family | Encoder |
|---|---|
| gpt-4o, o1, o3, o4, gpt-4.1, gpt-5 | o200k_base |
| gpt-4, gpt-3.5, older | cl100k_base |

The resulting count matches OpenAI's tokenizer exactly, token for token.
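
The family-to-encoder mapping in the table above could be implemented as a prefix check. This is a sketch under assumptions: the prefix list is copied from the table, but the matching strategy and the function name are illustrative, and the real tool may resolve models differently.

```typescript
type Encoding = "o200k_base" | "cl100k_base";

// Prefixes taken from the table above; prefix matching itself is an assumption.
const O200K_PREFIXES = ["gpt-4o", "gpt-4.1", "gpt-5", "o1", "o3", "o4"];

function pickEncoding(model: string): Encoding {
  const id = model.toLowerCase();
  // Anything not in the newer families falls back to cl100k_base.
  return O200K_PREFIXES.some((prefix) => id.startsWith(prefix))
    ? "o200k_base"
    : "cl100k_base";
}
```

Because "gpt-4-turbo" does not start with "gpt-4o" or "gpt-4.1", it falls through to cl100k_base, which matches the second row of the table.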

Anthropic — approximate by default, exact with API key

Without a key, the legacy Claude 2 BPE tokenizer (@anthropic-ai/tokenizer) is used offline. It was accurate for Claude 2, but drifts ~5–10% on Claude 3+ because Anthropic updated their tokenizer.

# enable exact counting via the official API
ANTHROPIC_API_KEY=sk-... token-tally . --model claude-3-5-sonnet-20241022

Gemini — rough approximation by default, exact with API key

Google does not publish an offline tokenizer. The fallback formula is:

tokens ≈ ceil(characters / 4)

This holds reasonably for average English text (~4 chars/token) but can diverge by ±20–40% on code, non-Latin scripts, or very short strings.
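
The fallback formula is simple enough to write out directly. In this sketch the function name is hypothetical, and counting Unicode code points rather than UTF-16 units is an assumption, since the README only says "characters".

```typescript
// Fallback estimate: tokens ≈ ceil(characters / 4).
// Counting code points (not UTF-16 units) is an assumption.
function approxTokens(text: string): number {
  const characters = Array.from(text).length;
  return Math.ceil(characters / 4);
}

const shortEstimate = approxTokens("hello world"); // 11 characters -> 3
```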

# enable exact counting via the Generative Language API
GOOGLE_API_KEY=... token-tally . --model gemini-1.5-pro

DeepSeek — close approximation

Uses cl100k_base (GPT-4 family BPE). DeepSeek's tokenizer is derived from the same family and produces near-identical results in practice, but it is not identical — expect ~2–5% drift.

No API-based exact mode is available for DeepSeek.

A warning is printed in the output whenever counts are approximate.

2 — Pricing

token-tally reads LiteLLM's community-maintained price table and caches it at ~/.cache/token-tally/prices.json for 24 hours, so the network is only hit when the cache is missing or expired.

  • --refresh forces a re-fetch.
  • --offline skips the network entirely, using the cache or the bundled static fallback.
  • If the network fetch fails, the stale cache is used with a warning.
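
The rules above amount to a small decision procedure. The following is a sketch only: the type and function names are hypothetical, and two behaviors are assumptions not stated in the README, namely that --offline takes precedence over --refresh and that the bundled prices are used when a fetch fails and no cache exists.

```typescript
type PriceSource = "network" | "cache" | "bundled";

const CACHE_TTL_MS = 24 * 60 * 60 * 1000; // 24 hours

interface PriceOptions {
  refresh: boolean;          // --refresh
  offline: boolean;          // --offline
  cacheAgeMs: number | null; // null when no cache file exists
}

// Source to try first, before any network attempt.
function choosePriceSource(o: PriceOptions): PriceSource {
  if (o.offline) {
    // Assumed: --offline wins even if --refresh is also set.
    return o.cacheAgeMs !== null ? "cache" : "bundled";
  }
  if (o.refresh) return "network";
  if (o.cacheAgeMs !== null && o.cacheAgeMs < CACHE_TTL_MS) return "cache";
  return "network";
}

// Where to fall back if the network fetch fails.
function fallbackAfterFetchError(o: PriceOptions): PriceSource {
  // A stale cache is used (with a warning); bundled prices are an assumed last resort.
  return o.cacheAgeMs !== null ? "cache" : "bundled";
}
```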

Prices are taken directly from the input_cost_per_token and output_cost_per_token fields in the LiteLLM table — no rounding or transformation is applied.

3 — Cost formula

total = (input_tokens  × input_cost_per_token)
      + (output_tokens × output_cost_per_token)

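When --output-tokens is not set, output tokens default to 20% of the input token count. Pass --output-tokens <n> to use an expected response length instead, or --output-tokens 0 to leave output cost out of the estimate.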

Note: The formula does not account for system prompts billed separately, API call overhead, caching discounts, or streaming surcharges. Use it as a planning estimate, not a billing guarantee.
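
As a worked instance of the formula, the sketch below recomputes the figures from the sample table at the top of the README (18,204 input tokens, 0 output tokens, gpt-4o at $2.50 and $10.00 per 1M tokens). The field names follow the LiteLLM table; the interface and function names are hypothetical.

```typescript
interface ModelPrice {
  input_cost_per_token: number;  // USD per token (LiteLLM field name)
  output_cost_per_token: number; // USD per token (LiteLLM field name)
}

interface CostEstimate {
  inputCost: number;
  outputCost: number;
  total: number;
}

function estimateCost(
  inputTokens: number,
  outputTokens: number,
  price: ModelPrice,
): CostEstimate {
  const inputCost = inputTokens * price.input_cost_per_token;
  const outputCost = outputTokens * price.output_cost_per_token;
  return { inputCost, outputCost, total: inputCost + outputCost };
}

// Per-1M prices from the sample table, converted to per-token prices.
const gpt4oPrice: ModelPrice = {
  input_cost_per_token: 2.5 / 1e6,
  output_cost_per_token: 10 / 1e6,
};

const sampleCost = estimateCost(18204, 0, gpt4oPrice);
const sampleTotal = sampleCost.total.toFixed(4); // "0.0455"
```

The result agrees with the "Total cost" row of the sample output, $0.0455.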

Accuracy summary

| Provider | Token accuracy | How to get exact counts |
|---|---|---|
| OpenAI | 100% (exact) | Built-in, no key needed |
| Anthropic | ~90–95% without key | Pass --anthropic-api-key |
| Gemini | ~60–80% without key | Pass --gemini-api-key |
| DeepSeek | ~95–98% | No exact mode available |

Pricing accuracy depends on LiteLLM's community table being up to date. Major models are typically current; niche or very new models may lag by a few days.

Development

bun install          # install dependencies
bun run dev          # run CLI locally
bun test             # run tests
bun run typecheck    # TypeScript check
bun run lint         # ESLint
bun run build        # build dist/cli.js

License

MIT