modelpricing-ai
v2026.4.3
TypeScript client for ModelPricing.ai — estimate LLM usage costs
TypeScript client for the ModelPricing.ai API — estimate LLM usage costs and track spending with a single call.
Zero runtime dependencies. Uses native fetch — works in Node.js 18+, Deno, Bun, Cloudflare Workers, and browsers.
Installation
npm install modelpricing-ai
Quick Start
import { ModelPricingClient } from 'modelpricing-ai'
const client = new ModelPricingClient({ apiKey: 'YOUR_API_KEY' })
const estimate = await client.estimate({
model: 'gpt-4o-mini',
tokensIn: 1000,
tokensOut: 500,
traceId: { requestId: 'abc-123' }
})
console.log(`Cost: $${estimate.total.toFixed(6)}`)
From a provider SDK response
If you already have a response object from the Anthropic or OpenAI SDK, pass it straight to estimateFromResponse — the client pulls the model name, token counts, and any cache-token fields for you. Works with Anthropic Messages, OpenAI Chat Completions, and the OpenAI Responses API. SDK class instances are honored via their .toJSON() method, and plain objects with the same shape work too.
import Anthropic from '@anthropic-ai/sdk'
import { ModelPricingClient } from 'modelpricing-ai'
const anthropic = new Anthropic()
const response = await anthropic.messages.create({
model: 'claude-sonnet-4-6',
max_tokens: 1024,
messages: [{ role: 'user', content: 'hello' }]
})
const client = new ModelPricingClient({ apiKey: 'YOUR_API_KEY' })
const estimate = await client.estimateFromResponse(response)
console.log(`Cost: $${estimate.total.toFixed(6)}`)
import OpenAI from 'openai'
import { ModelPricingClient } from 'modelpricing-ai'
const openai = new OpenAI()
const response = await openai.chat.completions.create({
model: 'gpt-4o-mini',
messages: [{ role: 'user', content: 'hello' }]
})
const client = new ModelPricingClient({ apiKey: 'YOUR_API_KEY' })
const estimate = await client.estimateFromResponse(response, {
traceId: { requestId: 'abc-123' }
})
estimateFromResponse accepts the same traceId option as estimate() and returns the same EstimateResponse shape.
Cache tokens
estimate() accepts three optional fields that map 1:1 to the cache rates the API tracks. They are additive — tokensIn is fresh non-cached input, and the cache fields stack on top:
await client.estimate({
model: 'claude-sonnet-4-6',
tokensIn: 1000, // fresh input
tokensOut: 500,
cacheReadTokens: 50_000, // tokens billed at the cache-read rate
cacheWrite5mTokens: 10_000, // 5-minute TTL writes (Anthropic)
cacheWrite1hTokens: 2_000 // 1-hour TTL writes (Anthropic)
})
Rules:
- All three fields are optional. Zero or undefined is treated as absent: the client only sends metrics.cache when at least one field is non-zero.
- Negative, non-finite, and non-integer values throw before the request is sent (no silent truncation that could miscount billed tokens).
- Models without cache pricing for a given field will drop it server-side and log a "Cache tokens dropped" entry. The estimate still succeeds.
When you use estimateFromResponse, all three fields are populated automatically from the SDK response — you never have to assemble them by hand.
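The rules above can be sketched as a small validation step. This is an illustration only: buildCacheMetrics and CacheTokenInput are hypothetical names, and the returned object is not the wire format the client actually sends. It just shows "zero or undefined means absent" and "reject instead of truncate" in one place.

```typescript
// Illustrative sketch only: `buildCacheMetrics` and `CacheTokenInput` are
// hypothetical names, not part of the modelpricing-ai API.
interface CacheTokenInput {
  cacheReadTokens?: number
  cacheWrite5mTokens?: number
  cacheWrite1hTokens?: number
}

type CacheMetrics = Partial<Record<keyof CacheTokenInput, number>>

function buildCacheMetrics(input: CacheTokenInput): CacheMetrics | undefined {
  const fields: Array<keyof CacheTokenInput> = [
    'cacheReadTokens',
    'cacheWrite5mTokens',
    'cacheWrite1hTokens'
  ]
  const cache: CacheMetrics = {}
  for (const field of fields) {
    const value = input[field]
    // Undefined and zero both mean "absent".
    if (value === undefined || value === 0) continue
    // Reject rather than round, so billed tokens are never miscounted.
    const isWholeNumber = isFinite(value) && Math.floor(value) === value
    if (!isWholeNumber || value < 0) {
      throw new RangeError(`${field} must be a non-negative integer, got ${value}`)
    }
    cache[field] = value
  }
  // Omit the cache object entirely when no field was set.
  return Object.keys(cache).length > 0 ? cache : undefined
}
```

With this sketch, buildCacheMetrics({ cacheReadTokens: 0 }) returns undefined, while buildCacheMetrics({ cacheReadTokens: 1.5 }) throws before anything could be sent.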
Provider response shape reference
The extractor reads these fields from each supported response shape. Knowing the shapes is useful when constructing test fixtures or debugging unexpected costs.
| Provider / Endpoint | Detected by usage keys | tokensIn | tokensOut | cacheRead | cacheWrite5m / cacheWrite1h |
| ------------------------------------ | ------------------------------------ | -------------------------------------------------------- | ------------------- | ------------------------------------------------------------------------------------------------------ | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Anthropic Messages | input_tokens, output_tokens | input_tokens (already excludes cache reads and writes) | output_tokens | cache_read_input_tokens (additive — reported separately from input_tokens) | Nested cache_creation.ephemeral_5m_input_tokens / ephemeral_1h_input_tokens. Falls back to the aggregate cache_creation_input_tokens (treated as 5m) when the nested object is absent. |
| OpenAI Responses API | input_tokens, output_tokens | input_tokens − cached_tokens | output_tokens | input_tokens_details.cached_tokens (subset of input_tokens, subtracted to keep additive semantics) | Not surfaced by this shape — left at 0. |
| OpenAI Chat Completions (modern) | prompt_tokens, completion_tokens | prompt_tokens − cached_tokens | completion_tokens | prompt_tokens_details.cached_tokens (subset of prompt_tokens, subtracted) | Not surfaced — left at 0. |
| OpenAI Chat Completions (legacy) | prompt_tokens, completion_tokens | prompt_tokens − cached_tokens | completion_tokens | Top-level cached_tokens (older SDKs that didn't nest under prompt_tokens_details) | Not surfaced — left at 0. |
Concrete examples of each shape (these are the inputs extractUsage understands — a real SDK response has more fields, all ignored):
// Anthropic Messages — current shape with TTL split
{
model: 'claude-sonnet-4-6',
usage: {
input_tokens: 100,
output_tokens: 50,
cache_read_input_tokens: 200,
cache_creation_input_tokens: 700,
cache_creation: {
ephemeral_5m_input_tokens: 500,
ephemeral_1h_input_tokens: 200
}
}
}
// → tokensIn=100, tokensOut=50, cacheRead=200, cacheWrite5m=500, cacheWrite1h=200
// OpenAI Responses API
{
model: 'gpt-5',
usage: {
input_tokens: 1024,
output_tokens: 200,
input_tokens_details: { cached_tokens: 256 }
}
}
// → tokensIn=768, tokensOut=200, cacheRead=256
// OpenAI Chat Completions
{
model: 'gpt-5',
usage: {
prompt_tokens: 1024,
completion_tokens: 200,
prompt_tokens_details: { cached_tokens: 256 }
}
}
// → tokensIn=768, tokensOut=200, cacheRead=256
TTL note for Anthropic: The SDK doesn't tag individual cache-write tokens with a TTL; Anthropic returns a nested cache_creation object that pre-splits writes into 5m vs 1h buckets. If you write to a 1h cache and call estimate() directly, bypassing estimateFromResponse, pass the count under cacheWrite1hTokens yourself.
Anthropic vs OpenAI Responses (same top-level keys, different cache semantics): Both shapes use input_tokens / output_tokens, but their cache fields don't overlap, so the extractor can read both safely. Anthropic's cache_read_input_tokens is additive (already excluded from input_tokens), so it's added to cacheRead as-is. OpenAI's input_tokens_details.cached_tokens is a subset of input_tokens, so it's subtracted from tokensIn and reported under cacheRead. Whichever provider sent the response, the other's fields are absent and count as 0, so the math works out either way.
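To check a test fixture by hand, the mapping in the table and notes above can be re-derived in a few lines. This is a sketch under assumptions, not the library's extractUsage: sketchExtractUsage is a hypothetical name, it takes only the usage object (not the full response), and it skips model detection and validation.

```typescript
// Illustrative re-derivation of the documented mapping.
// `sketchExtractUsage` is hypothetical, not the library's extractUsage.
interface UsageSketch {
  tokensIn: number
  tokensOut: number
  cacheRead: number
  cacheWrite5m: number
  cacheWrite1h: number
}

type RawUsage = Record<string, any>

function sketchExtractUsage(usage: RawUsage): UsageSketch {
  if ('input_tokens' in usage) {
    // Anthropic Messages and OpenAI Responses share these top-level keys.
    // OpenAI: cached tokens are a subset of input_tokens.
    const subsetCached: number = usage.input_tokens_details?.cached_tokens ?? 0
    // Anthropic: cache reads are reported outside input_tokens.
    const additiveCached: number = usage.cache_read_input_tokens ?? 0
    const nested = usage.cache_creation
    return {
      tokensIn: usage.input_tokens - subsetCached,
      tokensOut: usage.output_tokens,
      cacheRead: additiveCached + subsetCached,
      // Without the nested split, the aggregate is treated as 5m writes.
      cacheWrite5m: nested
        ? (nested.ephemeral_5m_input_tokens ?? 0)
        : (usage.cache_creation_input_tokens ?? 0),
      cacheWrite1h: nested ? (nested.ephemeral_1h_input_tokens ?? 0) : 0
    }
  }
  if ('prompt_tokens' in usage) {
    // Chat Completions: nested field on modern SDKs, top-level on legacy ones.
    const cached: number =
      usage.prompt_tokens_details?.cached_tokens ?? usage.cached_tokens ?? 0
    return {
      tokensIn: usage.prompt_tokens - cached,
      tokensOut: usage.completion_tokens,
      cacheRead: cached,
      cacheWrite5m: 0,
      cacheWrite1h: 0
    }
  }
  throw new Error('Unrecognized usage shape')
}
```

Feeding it the usage objects from the concrete examples above should reproduce the documented results, which makes it a quick cross-check when a fixture yields an unexpected cost.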
Standalone extractor
If you want to inspect what would be sent without making a request, import extractUsage directly:
import { extractUsage } from 'modelpricing-ai'
const usage = extractUsage(response)
// {
// model: 'claude-sonnet-4-6',
// tokensIn: 100,
// tokensOut: 50,
// cacheRead: 200,
// cacheWrite5m: 500,
// cacheWrite1h: 200,
// }
It throws if the response shape is unrecognized or model is missing.
Response Structure
Both estimate() and estimateFromResponse() return an EstimateResponse:
interface EstimateResponse {
total: number // total USD cost
model: string // canonical model name returned by the server
traceId: Record<string, unknown> | null // pass-through trace data
trace: string // server-assigned trace identifier
breakdown: EstimateBreakdownGroup
}
interface EstimateBreakdownGroup {
input: EstimateBreakdown
output: EstimateBreakdown
cache?: EstimateCacheBreakdownGroup // present only when cache tokens were billed
}
interface EstimateCacheBreakdownGroup {
read?: EstimateBreakdown
write5m?: EstimateBreakdown
write1h?: EstimateBreakdown
}
interface EstimateBreakdown {
unit: string // e.g. "per-1M-input", "per-1M-cache-read"
branch: string // pricing tier that matched ("flat", "low", "high")
qty: number // tokens
rate: number // per-million USD rate
subtotal: number // computed cost for this line
}
A request that sent no cache tokens omits breakdown.cache entirely. A model with no cache pricing for a given field (e.g. OpenAI for write5m) drops that field from the breakdown, so you'll see only the priced fields.
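Because breakdown.cache and each of its members are optional, code that walks the breakdown has to guard every level. The sketch below flattens a breakdown into labelled lines and sums the subtotals. The helper names and the sample rates are hypothetical, and the source does not state that total equals the sum of the subtotals, so treat any comparison against total as a sanity check only.

```typescript
// Interfaces copied from the Response Structure section above.
interface EstimateBreakdown {
  unit: string
  branch: string
  qty: number
  rate: number
  subtotal: number
}
interface EstimateCacheBreakdownGroup {
  read?: EstimateBreakdown
  write5m?: EstimateBreakdown
  write1h?: EstimateBreakdown
}
interface EstimateBreakdownGroup {
  input: EstimateBreakdown
  output: EstimateBreakdown
  cache?: EstimateCacheBreakdownGroup
}

// Hypothetical helper: flatten the breakdown, skipping absent cache lines.
function listBreakdownLines(
  breakdown: EstimateBreakdownGroup
): Array<[string, EstimateBreakdown]> {
  const lines: Array<[string, EstimateBreakdown]> = [
    ['input', breakdown.input],
    ['output', breakdown.output]
  ]
  const cache: EstimateCacheBreakdownGroup = breakdown.cache ?? {}
  if (cache.read) lines.push(['cache.read', cache.read])
  if (cache.write5m) lines.push(['cache.write5m', cache.write5m])
  if (cache.write1h) lines.push(['cache.write1h', cache.write1h])
  return lines
}

// Hypothetical helper: add up the per-line subtotals.
function sumSubtotals(breakdown: EstimateBreakdownGroup): number {
  let sum = 0
  for (const [, line] of listBreakdownLines(breakdown)) sum += line.subtotal
  return sum
}

// Sample data with made-up rates, for illustration only.
const sampleBreakdown: EstimateBreakdownGroup = {
  input: { unit: 'per-1M-input', branch: 'flat', qty: 1000, rate: 3, subtotal: 0.003 },
  output: { unit: 'per-1M-output', branch: 'flat', qty: 500, rate: 15, subtotal: 0.0075 },
  cache: {
    read: { unit: 'per-1M-cache-read', branch: 'flat', qty: 50000, rate: 0.3, subtotal: 0.015 }
  }
}
```

Compare sumSubtotals(sampleBreakdown) with a tolerance, not strict equality, since the subtotals are floating-point values.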
Configuration
| Parameter | Default | Description |
| ------------ | ------------------------------- | ------------------------------------------------------------------------ |
| apiKey | required | Your ModelPricing.ai API key (also reads MODELPRICING_API_KEY env var) |
| baseUrl | "https://api.modelpricing.ai" | API base URL (also reads MODELPRICING_BASE_URL env var) |
| timeout | 30000 | Request timeout in milliseconds |
| maxRetries | 3 | Maximum retry attempts for transient errors |
| fetch | globalThis.fetch | Optional custom fetch function for testing or custom HTTP |
Parameters are resolved in order: constructor option > environment variable > default.
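The resolution order can be sketched as a chain of fallbacks. This is not the client's actual code: resolveConfig and both interfaces are hypothetical, the environment is passed in explicitly so the sketch runs anywhere, and treating an empty string as a missing API key is an assumption.

```typescript
// Hypothetical sketch of "constructor option > environment variable > default".
interface ClientOptionsSketch {
  apiKey?: string
  baseUrl?: string
  timeout?: number
  maxRetries?: number
}

interface ResolvedConfigSketch {
  apiKey: string
  baseUrl: string
  timeout: number
  maxRetries: number
}

function resolveConfig(
  options: ClientOptionsSketch,
  env: Record<string, string | undefined>
): ResolvedConfigSketch {
  // Only apiKey and baseUrl are documented as reading environment variables.
  const apiKey = options.apiKey ?? env['MODELPRICING_API_KEY']
  if (!apiKey) throw new Error('apiKey is required')
  return {
    apiKey,
    baseUrl: options.baseUrl ?? env['MODELPRICING_BASE_URL'] ?? 'https://api.modelpricing.ai',
    timeout: options.timeout ?? 30000,
    maxRetries: options.maxRetries ?? 3
  }
}
```

Using ?? in place of || keeps an explicit maxRetries: 0 from being overwritten by the default, which matters because 0 is how retries are disabled.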
const client = new ModelPricingClient({
apiKey: 'YOUR_API_KEY',
baseUrl: 'https://api.modelpricing.ai',
timeout: 30000,
maxRetries: 3
})
Error Handling
The client raises typed exceptions for different failure modes:
| Exception | HTTP Status | When |
| ----------------- | ----------- | ----------------------------- |
| Unauthorized | 401 | Invalid or missing API key |
| ValidationError | 422 | Invalid model name or metrics |
| NotFound | 404 | Unknown endpoint |
| ServerError | 5xx | Server-side failures |
All exceptions inherit from ModelPricingError and include a statusCode property.
import { Unauthorized, ValidationError, ServerError } from 'modelpricing-ai'
try {
const estimate = await client.estimate({
model: 'gpt-4o-mini',
tokensIn: 1000,
tokensOut: 500
})
} catch (error) {
if (error instanceof Unauthorized) {
console.log('Check your API key')
} else if (error instanceof ValidationError) {
console.log(`Bad request: ${error.message}`)
} else if (error instanceof ServerError) {
console.log('Server error: automatic retries were exhausted')
}
}
estimateFromResponse additionally throws a plain Error if the response shape is unrecognized or model is missing; those cases never make it to the network.
Retry Behavior
The client automatically retries on transient errors with exponential backoff:
- Retries: 5xx server errors and network errors (TypeError from fetch)
- No retry: 4xx client errors (401, 404, 422)
- Default: 3 retries with exponential backoff + jitter
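The policy in the list above can be sketched with two pure functions: one that classifies a failure and one that computes a jittered delay. The client's real base delay, cap, and jitter strategy are not documented here, so the 250 ms base, 8000 ms cap, and "full jitter" choice below are assumptions, and all three function names are hypothetical.

```typescript
// Hypothetical helpers; the delay constants are assumptions, not the client's values.

// Retry network failures (TypeError from fetch) and 5xx status codes only.
function isRetryable(error: unknown): boolean {
  if (error instanceof TypeError) return true
  const status = (error as { statusCode?: unknown } | null | undefined)?.statusCode
  return typeof status === 'number' && status >= 500 && status <= 599
}

// Exponential backoff with full jitter: uniform in [0, min(cap, base * 2^attempt)).
function backoffDelayMs(
  attempt: number,
  baseMs: number = 250,
  capMs: number = 8000,
  random: () => number = Math.random
): number {
  const exponential = Math.min(capMs, baseMs * Math.pow(2, attempt))
  return Math.floor(random() * exponential)
}

// Delays that would precede each retry, for a given retry budget.
function retrySchedule(maxRetries: number, random: () => number = Math.random): number[] {
  const delays: number[] = []
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    delays.push(backoffDelayMs(attempt, 250, 8000, random))
  }
  return delays
}
```

Injecting random keeps the schedule deterministic in tests; with maxRetries set to 0 the schedule is empty, matching the "no retry attempts" configuration.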
// Increase retries for unreliable networks
const resilientClient = new ModelPricingClient({ apiKey: 'YOUR_API_KEY', maxRetries: 5 })
// Disable retries entirely
const noRetryClient = new ModelPricingClient({ apiKey: 'YOUR_API_KEY', maxRetries: 0 })
License
MIT
