modelpricing-ai

v2026.4.3

TypeScript client for the ModelPricing.ai API — estimate LLM usage costs and track spending with a single call.

Zero runtime dependencies. Uses native fetch — works in Node.js 18+, Deno, Bun, Cloudflare Workers, and browsers.

Installation

npm install modelpricing-ai

Quick Start

import { ModelPricingClient } from 'modelpricing-ai'

const client = new ModelPricingClient({ apiKey: 'YOUR_API_KEY' })

const estimate = await client.estimate({
    model: 'gpt-4o-mini',
    tokensIn: 1000,
    tokensOut: 500,
    traceId: { requestId: 'abc-123' }
})

console.log(`Cost: $${estimate.total.toFixed(6)}`)

From a provider SDK response

If you already have a response object from the Anthropic or OpenAI SDK, pass it straight to estimateFromResponse — the client pulls the model name, token counts, and any cache-token fields for you. Works with Anthropic Messages, OpenAI Chat Completions, and the OpenAI Responses API. SDK class instances are honored via their .toJSON() method, and plain objects with the same shape work too.

import Anthropic from '@anthropic-ai/sdk'
import { ModelPricingClient } from 'modelpricing-ai'

const anthropic = new Anthropic()
const response = await anthropic.messages.create({
    model: 'claude-sonnet-4-6',
    max_tokens: 1024,
    messages: [{ role: 'user', content: 'hello' }]
})

const client = new ModelPricingClient({ apiKey: 'YOUR_API_KEY' })
const estimate = await client.estimateFromResponse(response)
console.log(`Cost: $${estimate.total.toFixed(6)}`)

The same flow with the OpenAI SDK:

import OpenAI from 'openai'
import { ModelPricingClient } from 'modelpricing-ai'

const openai = new OpenAI()
const response = await openai.chat.completions.create({
    model: 'gpt-4o-mini',
    messages: [{ role: 'user', content: 'hello' }]
})

const client = new ModelPricingClient({ apiKey: 'YOUR_API_KEY' })
const estimate = await client.estimateFromResponse(response, {
    traceId: { requestId: 'abc-123' }
})

estimateFromResponse accepts the same traceId option as estimate() and returns the same EstimateResponse shape.
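
The .toJSON() handling mentioned above could be approximated as follows. This is a minimal sketch under stated assumptions, not the library's code: toPlainObject and FakeSdkResponse are hypothetical names used only for illustration.

```typescript
// Hypothetical helper (not exported by modelpricing-ai): accept either an SDK
// class instance that exposes toJSON() or a plain object with the same shape.
function toPlainObject(value: unknown): Record<string, unknown> {
    if (typeof value !== 'object' || value === null) {
        throw new TypeError('Expected a response object')
    }
    const candidate = value as { toJSON?: unknown }
    // Prefer toJSON() when present, otherwise use the object as-is.
    const plain: unknown = typeof candidate.toJSON === 'function' ? candidate.toJSON() : value
    if (typeof plain !== 'object' || plain === null) {
        throw new TypeError('toJSON() did not return an object')
    }
    return plain as Record<string, unknown>
}

// Stand-in for an SDK response class, used only to exercise the helper.
class FakeSdkResponse {
    toJSON(): Record<string, unknown> {
        return { model: 'gpt-4o-mini', usage: { prompt_tokens: 10, completion_tokens: 5 } }
    }
}

const fromInstance = toPlainObject(new FakeSdkResponse())
const fromPlain = toPlainObject({ model: 'gpt-4o-mini' })
console.log(fromInstance['model'], fromPlain['model'])
```

Normalizing to a plain object first means the rest of the extraction only has to deal with one representation.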

Cache tokens

estimate() accepts three optional fields that map 1:1 to the cache rates the API tracks. They are additive: tokensIn is fresh, non-cached input, and the cache fields stack on top:

await client.estimate({
    model: 'claude-sonnet-4-6',
    tokensIn: 1000, // fresh input
    tokensOut: 500,
    cacheReadTokens: 50_000, // tokens billed at the cache-read rate
    cacheWrite5mTokens: 10_000, // 5-minute TTL writes (Anthropic)
    cacheWrite1hTokens: 2_000 // 1-hour TTL writes (Anthropic)
})

Rules:

  • All three fields are optional. Zero or undefined is treated as absent — the client only sends metrics.cache when at least one field is non-zero.
  • Negative, non-finite, and non-integer values throw before the request is sent (no silent truncation that could miscount billed tokens).
  • Models without cache pricing for a given field will drop it server-side and log a "Cache tokens dropped" entry. The estimate still succeeds.

When you use estimateFromResponse, all three fields are populated automatically from the SDK response — you never have to assemble them by hand.
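
The validation rules above can be sketched as a small pre-flight check. This is an illustration only: buildCacheMetrics, checkCount, and the output field names read / write5m / write1h are assumptions, since the README does not show the client's internals or the wire format of metrics.cache.

```typescript
// Hypothetical re-statement of the documented rules, not the library's code.
interface CacheTokenInput {
    cacheReadTokens?: number
    cacheWrite5mTokens?: number
    cacheWrite1hTokens?: number
}

interface CacheMetrics {
    read: number
    write5m: number
    write1h: number
}

// Undefined counts as absent (0); anything that is not a non-negative integer throws.
function checkCount(name: string, value: number | undefined): number {
    if (value === undefined) return 0
    // Number.isInteger is false for NaN, Infinity, and fractional values.
    if (!Number.isInteger(value) || value < 0) {
        throw new RangeError(`${name} must be a non-negative integer, got ${value}`)
    }
    return value
}

// Returns undefined when every field is zero or absent, so the caller can
// leave the cache section out of the request entirely.
function buildCacheMetrics(input: CacheTokenInput): CacheMetrics | undefined {
    const metrics: CacheMetrics = {
        read: checkCount('cacheReadTokens', input.cacheReadTokens),
        write5m: checkCount('cacheWrite5mTokens', input.cacheWrite5mTokens),
        write1h: checkCount('cacheWrite1hTokens', input.cacheWrite1hTokens)
    }
    const anyNonZero = metrics.read > 0 || metrics.write5m > 0 || metrics.write1h > 0
    return anyNonZero ? metrics : undefined
}

console.log(buildCacheMetrics({ cacheReadTokens: 50_000 }))
console.log(buildCacheMetrics({ cacheReadTokens: 0 })) // undefined: nothing to send
```

Throwing instead of rounding keeps a fractional or negative count from being silently turned into a different billed quantity.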

Provider response shape reference

The extractor reads these fields from each supported response shape. Knowing the shapes is useful when constructing test fixtures or debugging unexpected costs.

| Provider / Endpoint | Detected by usage keys | tokensIn | tokensOut | cacheRead | cacheWrite5m / cacheWrite1h |
| --- | --- | --- | --- | --- | --- |
| Anthropic Messages | input_tokens, output_tokens | input_tokens (already excludes cache reads and writes) | output_tokens | cache_read_input_tokens (additive, reported separately from input_tokens) | Nested cache_creation.ephemeral_5m_input_tokens / ephemeral_1h_input_tokens. Falls back to the aggregate cache_creation_input_tokens (treated as 5m) when the nested object is absent. |
| OpenAI Responses API | input_tokens, output_tokens | input_tokens − cached_tokens | output_tokens | input_tokens_details.cached_tokens (subset of input_tokens, subtracted to keep additive semantics) | Not surfaced by this shape; left at 0. |
| OpenAI Chat Completions (modern) | prompt_tokens, completion_tokens | prompt_tokens − cached_tokens | completion_tokens | prompt_tokens_details.cached_tokens (subset of prompt_tokens, subtracted) | Not surfaced; left at 0. |
| OpenAI Chat Completions (legacy) | prompt_tokens, completion_tokens | prompt_tokens − cached_tokens | completion_tokens | Top-level cached_tokens (older SDKs that didn't nest under prompt_tokens_details) | Not surfaced; left at 0. |

Concrete examples of each shape (these are the inputs extractUsage understands — a real SDK response has more fields, all ignored):

// Anthropic Messages — current shape with TTL split
{
    model: 'claude-sonnet-4-6',
    usage: {
        input_tokens: 100,
        output_tokens: 50,
        cache_read_input_tokens: 200,
        cache_creation_input_tokens: 700,
        cache_creation: {
            ephemeral_5m_input_tokens: 500,
            ephemeral_1h_input_tokens: 200
        }
    }
}
// → tokensIn=100, tokensOut=50, cacheRead=200, cacheWrite5m=500, cacheWrite1h=200

// OpenAI Responses API
{
    model: 'gpt-5',
    usage: {
        input_tokens: 1024,
        output_tokens: 200,
        input_tokens_details: { cached_tokens: 256 }
    }
}
// → tokensIn=768, tokensOut=200, cacheRead=256

// OpenAI Chat Completions
{
    model: 'gpt-5',
    usage: {
        prompt_tokens: 1024,
        completion_tokens: 200,
        prompt_tokens_details: { cached_tokens: 256 }
    }
}
// → tokensIn=768, tokensOut=200, cacheRead=256

TTL note for Anthropic: The SDK doesn't tag individual cache-write tokens with a TTL. Instead, Anthropic returns a nested cache_creation object that pre-splits writes into 5m and 1h buckets. If you call estimate() directly instead of estimateFromResponse and the request wrote to a 1h cache, pass that count under cacheWrite1hTokens yourself.

Anthropic vs OpenAI Responses (same top-level keys, different cache semantics): Both shapes use input_tokens / output_tokens, but their cache surfaces don't overlap — the extractor reads both safely. Anthropic's cache_read_input_tokens is additive (already excluded from input_tokens), so it's added to cacheRead as-is. OpenAI's input_tokens_details.cached_tokens is a subset of input_tokens, so it's subtracted from tokensIn and reported under cacheRead. Whichever provider sent the response, the other's fields are absent → 0, so the math works out either way.
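
To make the additive vs. subset arithmetic concrete, here is a minimal re-implementation of the rules in the table above. It is a sketch for test fixtures and debugging, not the library's extractUsage: normalizeUsage, NormalizedUsage, and the helper names are hypothetical, and the real extractor may handle edge cases differently.

```typescript
// Illustrative only: follows the documented per-provider rules.
interface NormalizedUsage {
    tokensIn: number
    tokensOut: number
    cacheRead: number
    cacheWrite5m: number
    cacheWrite1h: number
}

type UsageRecord = Record<string, unknown>

// Missing or non-numeric fields count as 0.
function num(value: unknown): number {
    return typeof value === 'number' && Number.isFinite(value) ? value : 0
}

function isRecord(value: unknown): value is UsageRecord {
    return typeof value === 'object' && value !== null
}

function normalizeUsage(usage: UsageRecord): NormalizedUsage {
    if ('input_tokens' in usage && 'output_tokens' in usage) {
        // Anthropic Messages and OpenAI Responses share these top-level keys.
        const details = usage['input_tokens_details']
        // OpenAI: cached_tokens is a subset of input_tokens, so subtract it.
        const cachedSubset = isRecord(details) ? num(details['cached_tokens']) : 0
        const nested = usage['cache_creation']
        return {
            tokensIn: num(usage['input_tokens']) - cachedSubset,
            tokensOut: num(usage['output_tokens']),
            // Anthropic: cache_read_input_tokens is already separate from input_tokens.
            cacheRead: num(usage['cache_read_input_tokens']) + cachedSubset,
            // Prefer the TTL split; otherwise treat the aggregate as 5m writes.
            cacheWrite5m: isRecord(nested)
                ? num(nested['ephemeral_5m_input_tokens'])
                : num(usage['cache_creation_input_tokens']),
            cacheWrite1h: isRecord(nested) ? num(nested['ephemeral_1h_input_tokens']) : 0
        }
    }
    if ('prompt_tokens' in usage && 'completion_tokens' in usage) {
        // Chat Completions: modern SDKs nest cached_tokens, legacy ones keep it top-level.
        const details = usage['prompt_tokens_details']
        const cached = isRecord(details) ? num(details['cached_tokens']) : num(usage['cached_tokens'])
        return {
            tokensIn: num(usage['prompt_tokens']) - cached,
            tokensOut: num(usage['completion_tokens']),
            cacheRead: cached,
            cacheWrite5m: 0,
            cacheWrite1h: 0
        }
    }
    throw new Error('Unrecognized usage shape')
}

const openAiResponses = normalizeUsage({
    input_tokens: 1024,
    output_tokens: 200,
    input_tokens_details: { cached_tokens: 256 }
})
console.log(openAiResponses.tokensIn, openAiResponses.cacheRead) // 768 256
```

In both branches, tokensIn + cacheRead equals the total input the provider reported, which is the additive invariant estimate() expects.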

Standalone extractor

If you want to inspect what would be sent without making a request, import extractUsage directly:

import { extractUsage } from 'modelpricing-ai'

const usage = extractUsage(response)
// {
//   model: 'claude-sonnet-4-6',
//   tokensIn: 100,
//   tokensOut: 50,
//   cacheRead: 200,
//   cacheWrite5m: 500,
//   cacheWrite1h: 200,
// }

It throws if the response shape is unrecognized or model is missing.

Response Structure

Both estimate() and estimateFromResponse() return an EstimateResponse:

interface EstimateResponse {
    total: number // total USD cost
    model: string // canonical model name returned by the server
    traceId: Record<string, unknown> | null // pass-through trace data
    trace: string // server-assigned trace identifier
    breakdown: EstimateBreakdownGroup
}

interface EstimateBreakdownGroup {
    input: EstimateBreakdown
    output: EstimateBreakdown
    cache?: EstimateCacheBreakdownGroup // present only when cache tokens were billed
}

interface EstimateCacheBreakdownGroup {
    read?: EstimateBreakdown
    write5m?: EstimateBreakdown
    write1h?: EstimateBreakdown
}

interface EstimateBreakdown {
    unit: string // e.g. "per-1M-input", "per-1M-cache-read"
    branch: string // pricing tier that matched ("flat", "low", "high")
    qty: number // tokens
    rate: number // per-million USD rate
    subtotal: number // computed cost for this line
}

A request that sent no cache tokens omits breakdown.cache entirely. When the model has no cache pricing for a given field (e.g. OpenAI for write5m), the server drops that field and breakdown.cache contains only the priced fields.
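
Because breakdown.cache and each of its fields are optional, code that walks the breakdown has to guard every level. The sketch below flattens a breakdown into its lines and sums the subtotals. It assumes that total is the sum of the line subtotals and that each subtotal is qty × rate / 1,000,000, which the field comments suggest but the README does not state outright. The interfaces are re-declared locally so the snippet stands alone, and the fixture values are made up for illustration, not real prices.

```typescript
// Local copies of the documented shapes so this snippet is self-contained.
interface EstimateBreakdown {
    unit: string
    branch: string
    qty: number
    rate: number
    subtotal: number
}
interface EstimateCacheBreakdownGroup {
    read?: EstimateBreakdown
    write5m?: EstimateBreakdown
    write1h?: EstimateBreakdown
}
interface EstimateBreakdownGroup {
    input: EstimateBreakdown
    output: EstimateBreakdown
    cache?: EstimateCacheBreakdownGroup
}

// Hypothetical helper: collect every line that is actually present.
function breakdownLines(breakdown: EstimateBreakdownGroup): EstimateBreakdown[] {
    const cache: EstimateCacheBreakdownGroup = breakdown.cache ?? {}
    const optional = [cache.read, cache.write5m, cache.write1h]
    const present = optional.filter((line): line is EstimateBreakdown => line !== undefined)
    return [breakdown.input, breakdown.output, ...present]
}

// Hypothetical helper: assumed relationship between lines and the total.
function sumSubtotals(breakdown: EstimateBreakdownGroup): number {
    return breakdownLines(breakdown).reduce((sum, line) => sum + line.subtotal, 0)
}

// Made-up fixture: rates are placeholders, not real pricing.
const sample: EstimateBreakdownGroup = {
    input: { unit: 'per-1M-input', branch: 'flat', qty: 1000, rate: 1, subtotal: 0.001 },
    output: { unit: 'per-1M-output', branch: 'flat', qty: 500, rate: 2, subtotal: 0.001 },
    cache: {
        read: { unit: 'per-1M-cache-read', branch: 'flat', qty: 50_000, rate: 0.1, subtotal: 0.005 }
    }
}

for (const line of breakdownLines(sample)) {
    console.log(`${line.unit}: ${line.qty} tokens -> $${line.subtotal.toFixed(6)}`)
}
console.log(`sum of lines: $${sumSubtotals(sample).toFixed(6)}`)
```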

Configuration

| Parameter | Default | Description |
| --- | --- | --- |
| apiKey | required | Your ModelPricing.ai API key (also reads MODELPRICING_API_KEY env var) |
| baseUrl | "https://api.modelpricing.ai" | API base URL (also reads MODELPRICING_BASE_URL env var) |
| timeout | 30000 | Request timeout in milliseconds |
| maxRetries | 3 | Maximum retry attempts for transient errors |
| fetch | globalThis.fetch | Optional custom fetch function for testing or custom HTTP |

Parameters are resolved in order: constructor option > environment variable > default.

const client = new ModelPricingClient({
    apiKey: 'YOUR_API_KEY',
    baseUrl: 'https://api.modelpricing.ai',
    timeout: 30000,
    maxRetries: 3
})
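
The resolution order (constructor option, then environment variable, then default) can be sketched as a simple fallback chain. resolveOption is a hypothetical helper, and the proxy URL is a placeholder. Whether the real client treats an empty string as "unset" is not documented, so this sketch only treats undefined as absent.

```typescript
// Hypothetical helper illustrating "constructor option > environment variable > default".
function resolveOption(
    name: string,
    optionValue: string | undefined,
    envValue: string | undefined,
    defaultValue?: string
): string {
    const resolved = optionValue ?? envValue ?? defaultValue
    if (resolved === undefined) {
        throw new Error(`${name} is required but was not provided`)
    }
    return resolved
}

// The environment is passed in explicitly so the sketch also runs where
// process.env does not exist (browsers, some edge runtimes).
const env: Record<string, string | undefined> = {
    MODELPRICING_BASE_URL: 'https://proxy.example.test'
}

const defaultBaseUrl = 'https://api.modelpricing.ai'

// Environment variable wins over the default.
const baseUrlFromEnv = resolveOption('baseUrl', undefined, env['MODELPRICING_BASE_URL'], defaultBaseUrl)
// An explicit constructor option wins over both.
const baseUrlFromOption = resolveOption(
    'baseUrl',
    'https://override.example.test',
    env['MODELPRICING_BASE_URL'],
    defaultBaseUrl
)
console.log(baseUrlFromEnv, baseUrlFromOption)
```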

Error Handling

The client raises typed exceptions for different failure modes:

| Exception | HTTP Status | When |
| --- | --- | --- |
| Unauthorized | 401 | Invalid or missing API key |
| ValidationError | 422 | Invalid model name or metrics |
| NotFound | 404 | Unknown endpoint |
| ServerError | 5xx | Server-side failures |

All exceptions inherit from ModelPricingError and include a statusCode property.

import { Unauthorized, ValidationError, ServerError } from 'modelpricing-ai'

try {
    const estimate = await client.estimate({
        model: 'gpt-4o-mini',
        tokensIn: 1000,
        tokensOut: 500
    })
} catch (error) {
    if (error instanceof Unauthorized) {
        console.log('Check your API key')
    } else if (error instanceof ValidationError) {
        console.log(`Bad request: ${error.message}`)
    } else if (error instanceof ServerError) {
        console.log('Server error: automatic retries were exhausted')
    }
}

estimateFromResponse additionally throws a plain Error if the response shape is unrecognized or model is missing — those cases never make it to the network.

Retry Behavior

The client automatically retries on transient errors with exponential backoff:

  • Retries: 5xx server errors and network errors (TypeError from fetch)
  • No retry: 4xx client errors (401, 404, 422)
  • Default: 3 retries with exponential backoff + jitter

// Increase retries for unreliable networks
const resilientClient = new ModelPricingClient({ apiKey: 'YOUR_API_KEY', maxRetries: 5 })

// Disable retries (no retry attempts)
const noRetryClient = new ModelPricingClient({ apiKey: 'YOUR_API_KEY', maxRetries: 0 })
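
For readers who want to see what "exponential backoff + jitter" means in practice, here is a generic sketch of the policy described above. It is not the client's implementation: the 200 ms base delay, the "full jitter" strategy, and the names HttpStatusError, isRetryableStatus, backoffDelayMs, and withRetries are all assumptions made for illustration.

```typescript
// Stand-in for an HTTP failure carrying a status code (hypothetical class).
class HttpStatusError extends Error {
    readonly status: number
    constructor(status: number) {
        super(`HTTP ${status}`)
        this.status = status
    }
}

// 5xx responses are transient; 4xx responses (401, 404, 422) are not retried.
function isRetryableStatus(status: number): boolean {
    return status >= 500 && status <= 599
}

// Full jitter: a random delay in [0, baseMs * 2^attempt). Base delay is assumed.
function backoffDelayMs(attempt: number, baseMs = 200, random: () => number = Math.random): number {
    return Math.floor(random() * baseMs * 2 ** attempt)
}

const realSleep = (ms: number): Promise<void> => new Promise((resolve) => setTimeout(resolve, ms))

// Runs the operation once, then up to maxRetries more times on transient failures.
async function withRetries<T>(
    operation: () => Promise<T>,
    maxRetries: number,
    sleep: (ms: number) => Promise<void> = realSleep
): Promise<T> {
    for (let attempt = 0; ; attempt++) {
        try {
            return await operation()
        } catch (error) {
            // fetch rejects with TypeError on network failures.
            const networkError = error instanceof TypeError
            const serverError = error instanceof HttpStatusError && isRetryableStatus(error.status)
            if (!(networkError || serverError) || attempt >= maxRetries) throw error
            await sleep(backoffDelayMs(attempt))
        }
    }
}
```

With maxRetries: 0 the loop rethrows on the first failure, which matches the "no retry attempts" case above. Passing sleep as a parameter keeps the retry logic testable without real delays.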

License

MIT