@calcis/vercel-ai
Calcis cost-estimation middleware for the Vercel AI SDK.
Live pricing for 25+ models, side-by-side comparisons, and a web estimator: https://calcis.dev
- Full price index: https://calcis.dev/models
- Compare models: https://calcis.dev/compare
- API reference: https://calcis.dev/api-docs
Wrap any LanguageModelV1 with calcisMiddleware and get a one-line
cost estimate for every generateText / streamText call, plus a
rolling session total.
Install
npm install @calcis/vercel-ai ai

You also need a Calcis API key (Pro tier or above). Get one at
calcis.dev/dashboard. Browse every supported model at calcis.dev/models.
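The usage examples below read the key from the environment, e.g. (the key value here is a placeholder):

export CALCIS_API_KEY=calc_xxxxxxxxxxxx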
Usage
import { generateText, wrapLanguageModel } from "ai";
import { anthropic } from "@ai-sdk/anthropic";
import { calcisMiddleware } from "@calcis/vercel-ai";
const calcis = calcisMiddleware({
apiKey: process.env.CALCIS_API_KEY!,
});
const model = wrapLanguageModel({
model: anthropic("claude-sonnet-4-6"),
middleware: calcis,
});
const { text } = await generateText({
model,
prompt: "Explain quantum computing in three bullets",
});
console.log(`Session total: $${calcis.sessionTotal().toFixed(4)}`);

Every call now emits:
[calcis] claude-sonnet-4-6 · 243 in · ~650 out · $0.0105 · session: $0.0105 (1 calls)

Streaming
streamText works the same: the estimate fires once at stream
start.
const { textStream } = await streamText({
model,
prompt: "Write a limerick about quantum computing",
});
for await (const chunk of textStream) process.stdout.write(chunk);

Config
calcisMiddleware({
apiKey: string, // required: calc_… key
verbose?: boolean, // default true: logs per-call summary
onEstimate?: (e) => void, // optional structured sink
});

The returned middleware is a LanguageModelV1Middleware (so it
composes with any other wrapLanguageModel middleware) and exposes
three helpers:
calcis.sessionTotal(); // cumulative cost across calls
calcis.callCount(); // number of LLM calls so far
calcis.resetSession(); // zero the counters (e.g. per request)
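For example, a minimal sketch that combines onEstimate with a per-request reset (the shape of the onEstimate payload is an assumption here; check the API docs for the exact fields):

const calcis = calcisMiddleware({
  apiKey: process.env.CALCIS_API_KEY!,
  verbose: false, // we log structured JSON ourselves instead
  onEstimate: (e) => {
    // Payload fields are illustrative assumptions, not a documented contract.
    console.log(JSON.stringify(e));
  },
});

// At the end of each unit of work (e.g. one HTTP request):
console.log(`request cost: $${calcis.sessionTotal().toFixed(4)}`);
calcis.resetSession();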
How it works

The middleware attaches to both wrapGenerate and wrapStream. On
each call:
- Flatten the SDK prompt (role-tagged messages, multi-modal parts) into a single text string.
- Read the model ID from model.modelId.
- Fire a POST to https://www.calcis.dev/api/v1/estimate in the background, not awaited, so the real LLM call runs in parallel; Calcis latency never blocks generation.
- When the estimate resolves, log it and invoke onEstimate (sketched below).
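In outline, the background step looks something like the sketch below. This is a simplified illustration, not the package's actual source; flattenPrompt and the request-body fields are assumptions.

import type { LanguageModelV1Middleware } from "ai";

// Hypothetical helper: keep only the text parts of each message.
const flattenPrompt = (prompt: any): string =>
  prompt
    .map((m: any) =>
      typeof m.content === "string"
        ? m.content
        : m.content
            .filter((p: any) => p.type === "text")
            .map((p: any) => p.text)
            .join("\n"),
    )
    .join("\n");

const wrapGenerate: LanguageModelV1Middleware["wrapGenerate"] = async ({
  doGenerate,
  params,
  model,
}) => {
  // Fire-and-forget: no await, so the real call below starts immediately.
  void fetch("https://www.calcis.dev/api/v1/estimate", {
    method: "POST",
    headers: { "content-type": "application/json" },
    // Body shape is an assumption for illustration.
    body: JSON.stringify({ model: model.modelId, prompt: flattenPrompt(params.prompt) }),
  })
    .then((r) => r.json())
    .then((estimate) => console.log("[calcis]", estimate))
    .catch(() => {}); // on any failure, silently skip this call's estimate

  return doGenerate();
};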
Non-text prompt parts (images, files, tool results) are skipped; their cost contribution is provider-specific, so Calcis currently estimates from the text portion only.
Failure mode
The middleware never throws. If the Calcis API is unreachable,
returns an error, or sends back a malformed response, the handler
silently skips the estimate for that call and the underlying
generateText keeps running. Cost estimation is a nice-to-have;
your product is not.
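For example, a quick way to see this (with a deliberately invalid key):

const broken = calcisMiddleware({ apiKey: "calc_definitely_invalid" });

const model = wrapLanguageModel({
  model: anthropic("claude-sonnet-4-6"),
  middleware: broken,
});

// Generation still succeeds; only the [calcis] estimate line is absent.
const { text } = await generateText({ model, prompt: "ping" });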
Compatibility
Tested against ai 4.3. The LanguageModelV1Middleware interface
has been stable across the 3.x → 4.x series. If a future v2 interface
lands, this package will grow a v2 code path rather than be rewritten.
Links
- Web App
- API Docs
- CLI (calcis)
- LangChain integration (@calcis/langchain)
- LlamaIndex integration (@calcis/llamaindex)
- MCP server (@calcis/mcp-server)
License
MIT
