@calcis/vercel-ai
Calcis cost-estimation middleware for the Vercel AI SDK.
Live pricing for 25+ models, side-by-side comparisons, and a web estimator: https://calcis.dev
- Full price index: https://calcis.dev/models
- Compare models: https://calcis.dev/compare
- API reference: https://calcis.dev/api-docs
Wrap any LanguageModelV1 with calcisMiddleware and get a one-line
cost estimate for every generateText / streamText call, plus a
rolling session total.
Install
npm install @calcis/vercel-ai ai

You also need a Calcis API key (Pro tier or above). Get one at
calcis.dev/dashboard. Browse every supported model at calcis.dev/models.
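The usage examples below read the key from the environment, e.g. (the key value here is a placeholder):

export CALCIS_API_KEY=calc_xxxxxxxxxxxx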
Usage
import { generateText, wrapLanguageModel } from "ai";
import { anthropic } from "@ai-sdk/anthropic";
import { calcisMiddleware } from "@calcis/vercel-ai";
const calcis = calcisMiddleware({
apiKey: process.env.CALCIS_API_KEY!,
});
const model = wrapLanguageModel({
model: anthropic("claude-sonnet-4-6"),
middleware: calcis,
});
const { text } = await generateText({
model,
prompt: "Explain quantum computing in three bullets",
});
console.log(`Session total: $${calcis.sessionTotal().toFixed(4)}`);

Every call now emits:
[calcis] claude-sonnet-4-6 · 243 in · ~650 out · $0.0105 · session: $0.0105 (1 calls)

Streaming
streamText works the same: the estimate fires once at stream
start.
const { textStream } = await streamText({
model,
prompt: "Write a limerick about quantum computing",
});
for await (const chunk of textStream) process.stdout.write(chunk);

Config
calcisMiddleware({
apiKey: string, // required: calc_… key
verbose?: boolean, // default true: logs per-call summary
onEstimate?: (e) => void, // optional structured sink
});

The returned middleware is a LanguageModelV1Middleware (so it
composes with any other wrapLanguageModel middleware) and exposes
three helpers:
calcis.sessionTotal(); // cumulative cost across calls
calcis.callCount(); // number of LLM calls so far
calcis.resetSession(); // zero the counters (e.g. per request)
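For example, a minimal sketch that combines onEstimate with a per-request reset (the shape of the onEstimate payload is an assumption here; check the API docs for the exact fields):

const calcis = calcisMiddleware({
  apiKey: process.env.CALCIS_API_KEY!,
  verbose: false, // we log structured JSON ourselves instead
  onEstimate: (e) => {
    // Payload fields are illustrative assumptions, not a documented contract.
    console.log(JSON.stringify(e));
  },
});

// At the end of each unit of work (e.g. one HTTP request):
console.log(`request cost: $${calcis.sessionTotal().toFixed(4)}`);
calcis.resetSession();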
How it works

The middleware attaches to both wrapGenerate and wrapStream. On
each call:
- Flatten the SDK prompt (role-tagged messages, multi-modal parts) into a single text string.
- Read the model ID from model.modelId.
- Fire a POST to https://www.calcis.dev/api/v1/estimate in the background, not awaited, so the real LLM call runs in parallel; Calcis latency never blocks generation.
- When the estimate resolves, log it and invoke onEstimate (sketched below).
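In outline, the background step looks something like the sketch below. This is a simplified illustration, not the package's actual source; flattenPrompt and the request-body fields are assumptions.

import type { LanguageModelV1Middleware } from "ai";

// Hypothetical helper: keep only the text parts of each message.
const flattenPrompt = (prompt: any): string =>
  prompt
    .map((m: any) =>
      typeof m.content === "string"
        ? m.content
        : m.content
            .filter((p: any) => p.type === "text")
            .map((p: any) => p.text)
            .join("\n"),
    )
    .join("\n");

const wrapGenerate: LanguageModelV1Middleware["wrapGenerate"] = async ({
  doGenerate,
  params,
  model,
}) => {
  // Fire-and-forget: no await, so the real call below starts immediately.
  void fetch("https://www.calcis.dev/api/v1/estimate", {
    method: "POST",
    headers: { "content-type": "application/json" },
    // Body shape is an assumption for illustration.
    body: JSON.stringify({ model: model.modelId, prompt: flattenPrompt(params.prompt) }),
  })
    .then((r) => r.json())
    .then((estimate) => console.log("[calcis]", estimate))
    .catch(() => {}); // on any failure, silently skip this call's estimate

  return doGenerate();
};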
Non-text prompt parts (images, files, tool results) are skipped; their cost contribution is provider-specific, so Calcis currently estimates from the text portion only.
Failure mode
The middleware never throws. If the Calcis API is unreachable,
returns an error, or sends back a malformed response, the handler
silently skips the estimate for that call and the underlying
generateText keeps running. Cost estimation is a nice-to-have;
your product is not.
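For example, a quick way to see this (with a deliberately invalid key):

const broken = calcisMiddleware({ apiKey: "calc_definitely_invalid" });

const model = wrapLanguageModel({
  model: anthropic("claude-sonnet-4-6"),
  middleware: broken,
});

// Generation still succeeds; only the [calcis] estimate line is absent.
const { text } = await generateText({ model, prompt: "ping" });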
Compatibility
Tested against ai 4.3. The LanguageModelV1Middleware interface
has been stable across the 3.x → 4.x series. If a future v2 interface
lands, this package will grow a v2 code path rather than be rewritten.
Links
- Web App
- API Docs
- CLI (calcis)
- LangChain integration (@calcis/langchain)
- LlamaIndex integration (@calcis/llamaindex)
- MCP server (@calcis/mcp-server)
License
MIT
