@tessera-llm/vercel-ai

v0.1.3

Published

23 days ago

Drop-in Tessera integration for the Vercel AI SDK. One line of config routes generateText / streamText / generateObject / streamObject through Tessera's auto-route + auto-cache + auto-compress + auto-batch proxy. Compatible with @ai-sdk/openai, @ai-sdk/an

`@tessera-llm/vercel-ai`

Drop-in cost optimization for the Vercel AI SDK. One line of config routes your existing generateText / streamText / generateObject / streamObject calls through the Tessera optimization proxy — auto-route to cheaper-equivalent models, exact + provider-prompt-cache hits, prompt compression with per-stack quality canary, batch arbitrage on async-tolerant calls. Free Sandbox tier: 60M tokens/month, no card. Paid tiers: flat monthly subscription by token volume, keep 100% of savings.

Companion to tessera-sdk (vanilla provider SDKs), tessera-langchain (LangChain integration), tessera-llamaindex (LlamaIndex integration), tessera-mastra (Mastra Agent framework integration), tessera-pydantic-ai (Pydantic AI integration), tessera-crewai (CrewAI multi-agent integration), and tessera-autogen (AutoGen 0.4+ multi-agent integration). Same proxy, same mechanic stack, Vercel AI SDK-shaped API.

What it looks like

import { generateText } from "ai";
import { createOpenAI } from "@ai-sdk/openai";
import { tesseraOpenAIConfig } from "@tessera-llm/vercel-ai";

const openai = createOpenAI({
  apiKey: process.env.OPENAI_API_KEY!,
  ...tesseraOpenAIConfig({ apiKey: process.env.TESSERA_API_KEY! }),
});

const { text } = await generateText({
  model: openai("gpt-4o"),
  prompt: "Summarize this customer support ticket in 2 sentences.",
});

Three changes in your code: one import, three lines in the constructor call. Your existing generateText / streamText / generateObject calls work unchanged.

Or use the convenience factory (skips the explicit createOpenAI import):

import { generateText } from "ai";
import { tesseraOpenAI } from "@tessera-llm/vercel-ai";

const openai = await tesseraOpenAI({
  openaiApiKey: process.env.OPENAI_API_KEY!,
  tesseraApiKey: process.env.TESSERA_API_KEY!,
});

const { text } = await generateText({
  model: openai("gpt-4o"),
  prompt: "Summarize this customer support ticket in 2 sentences.",
});

Install

npm install @tessera-llm/vercel-ai
# Plus whichever provider package you use:
npm install @ai-sdk/openai          # or @ai-sdk/anthropic / @ai-sdk/mistral / @ai-sdk/groq / @ai-sdk/cohere

The @ai-sdk/* packages are peer dependencies — install only the providers you actually use. The ai core SDK is whatever version you already have.

Get a free Tessera API key (60M tokens/mo, no card) at tesseraai.io/dev — sign-up takes ~30 seconds and returns an instant tk_… key plus magic-link dashboard access.

Provider support

| Provider | @ai-sdk package | Tessera config function | Convenience factory | |---|---|---|---| | OpenAI | @ai-sdk/openai | tesseraOpenAIConfig | tesseraOpenAI | | Anthropic | @ai-sdk/anthropic | tesseraAnthropicConfig | tesseraAnthropic | | Mistral | @ai-sdk/mistral | tesseraMistralConfig | tesseraMistral | | Groq | @ai-sdk/groq | tesseraGroqConfig | tesseraGroq | | Cohere | @ai-sdk/cohere | tesseraCohereConfig | tesseraCohere |

Generic dispatcher available too: tesseraConfig("openai", { apiKey: "tk_..." }) — returns the right { baseURL, headers } object regardless of provider. Useful when the provider is parameterized at runtime.

Worked example

Real customer-support agent on gpt-4o, 5B tokens/month, OpenAI list prices:

| Stage | Cost / mo | Saved | |---|---:|---:| | Baseline — OpenAI direct via Vercel AI SDK | $24,000 | — | | + Tessera (route, cache, prompt-cache headers, compress, M9 ceiling, batch) | $9,400 | $14,600 | | Tessera subscription (Growth tier, flat) | $999 | — | | You net pay | $10,399 | $13,601 / mo saved |

Verify the savings math yourself. Every billable line traces back to two immutable cost figures pinned to a multi-source pricing catalog snapshot captured at request time. Two engineers, three hours, can re-derive any month from raw inputs. Full procedure at tesseraai.io/trust.

Quality canary across the full mechanic stack: mean-score 0.96 (floor 0.95) — 0.95 SLA held all 30 days. Full breakdown: /blog/cut-openai-bill-48-percent-without-quality-regression.

What Tessera does on every request

Same mechanic stack as the main tessera-sdk. Each mechanic is opt-in per workload, observable per request, and bypasses when its quality canary drops below the per-stack 0.95 floor.

| Mechanic | What it does | Typical savings | |---|---|---| | Auto-route (m1) | Route to a cheaper-equivalent model gated by a daily promptfoo canary on your eval set | 15–35% on routed calls | | Auto-cache (m2) | sha256 cache on the canonical request body, 7-day TTL, Cloudflare edge KV | 5–40% depending on prompt repetition | | Auto-compress (m3) | Per-role heuristic compression (system + user toggles independent). Preserves code fences and JSON shapes. | 5–15% on prompt tokens | | Prompt cache (m6) | Inject provider-native cache headers — OpenAI cached-input (50% off), Anthropic cache_control: ephemeral (90% off cache reads) | 50–90% on cached prefixes | | Context prune (m7) | Conservative trim on long conversations (system + last 8 turns; TF-IDF rerank on RAG attachments) | 5–25% on multi-turn workloads | | Output-length ceiling (m9) | Daily compute fits p90 of completion length per workload, injects maxTokens = p90 × 1.3 | 5–15% on completion cost | | Batch arbitrage (m10) | Route async-tolerant calls to provider Batch APIs (OpenAI Batch + Anthropic Message Batches both 50% off) | 50% on batch-eligible traffic | | Per-provider circuit breaker | (Reliability primitive, above the mechanics.) Rolling 5xx-rate state machine per upstream — when a provider degrades, auto-route skips its intra-provider alternative mappings until the half-open probe succeeds. | n/a — keeps the savings stack honest |

Pricing

Free Sandbox — 60M tokens/month, 30 requests/minute, observability-only mechanics, no card. Forever.
Paid tiers — flat monthly subscription by token volume: Starter $199 (≤1B), Growth $999 (≤5B), Scale $3,999 (≤20B), Enterprise custom (20B+). You keep 100% of measured savings.

Existing customers of tessera-sdk and tessera-langchain keep their rate_locked_pct (if any) on this package too — same tk_… key, same billing record.

FAQ

Q: How is this different from `tessera-langchain` and `tessera-sdk`?

Same proxy. Same mechanics. Same billing. The three packages target different code surfaces:

tessera-sdk — patches the underlying provider client constructors (OpenAI, Anthropic, etc.) directly via tessera.activate(key). Use when calling provider SDKs without a framework.
tessera-langchain — wires into LangChain ChatModel constructors. Use when you're on LangChain.
tessera-vercel-ai (this package) — wires into the Vercel AI SDK provider factories (createOpenAI, createAnthropic, etc.). Use when you're on ai core + @ai-sdk/*.

Pick whichever fits your codebase. Side-by-side install is supported — all three resolve to the same proxy and same billing record.

Q: Does this break my eval / structured output / tool calling / streaming?

No. The Vercel AI SDK provider object behaves identically — generateText, streamText, generateObject, streamObject all work unchanged. Schema-constrained outputs pass through. Tools pass through (auto-route gates on tool-calling capability). Streaming streams.

Q: What happens if Tessera's proxy is down?

Your application gets HTTP errors instead of LLM responses. On the proxy side, a per-provider circuit breaker tracks rolling 5xx rates and skips degraded providers in auto-route decisions. Cross-provider failover (re-routing to a different provider entirely when an upstream is down) is on the roadmap, not shipped yet.

Q: What happens to my OpenAI / Anthropic rate limits?

They pass through. Tessera does not aggregate quotas across customers. Your provider rate limits apply normally; the proxy enforces only the Tessera tier limits (30 rpm Free Sandbox, 60 rpm Production by default — higher on request).

Q: Are you storing my prompts and completions?

No. We log only token counts, cost deltas, mechanics_stack, and provider response status. Prompts and completions are never persisted. Full data handling on tesseraai.io/security.

Q: Why are there two API surfaces (`tesseraOpenAIConfig` vs `tesseraOpenAI`)?

The config function returns the kwargs object you spread into createOpenAI(...) — explicit, easy to combine with other settings (organization, custom fetch, etc.). The convenience factory imports createOpenAI for you and pre-merges. Use whichever you find more readable. Both ship in the same package.

Q: Can I use this with the Next.js App Router / Server Actions / Edge Runtime?

Yes. @tessera-llm/vercel-ai is a thin ESM/CJS dual package with no runtime dependencies on its own — same compatibility as the Vercel AI SDK itself.

Architecture

Open-source SDK ↔ closed-source proxy. This package is a thin client that adds one HTTP hop. The actual mechanic decisions (route, cache, compress, etc.) run inside the Tessera Cloudflare Worker proxy at api.tesseraai.io. The split is intentional: the wire format is open so you can audit what we send; the mechanic implementations are closed because that's the asymmetric IP. See the tessera-sdk README's "Architecture" note for the longer explanation.

License

Apache-2.0. See LICENSE.

Contributing

We accept PRs that:

Add support for a new @ai-sdk/* provider package (paste-and-mirror the existing config function shape)
Improve typing precision (TypeScript strict)
Add concrete example scripts under examples/ showing a real Vercel AI SDK pipeline
Improve tests or test infrastructure

We do not accept PRs that change the proxy's HTTP contract — that lives in the closed-source worker.

Versioning

Semver. Wire format compatibility committed across minor releases; breaking changes only on major bumps.

Security

See SECURITY.md. Coordinated disclosure address: [email protected].

About Tessera

Tessera is the substrate layer for LLM cost optimization, also called the Optimize Layer in our product surface. A thin proxy that sits in your application's request-path, applies a conservative cascade of optimization mechanics, and measures every saved dollar against an audit-immutable baseline. We bill a flat monthly subscription by token volume (Starter $199, Growth $999, Scale $3,999, Enterprise custom); you keep 100% of measured savings. No per-token gateway fee; the category we operate in is "LLM cost optimizer," distinct from per-token AI gateways and observability dashboards.

Where observability tools tell you what you spent and AI gateways re-shape the request without measuring the cost delta, Tessera is the layer that does both, and proves the measured savings line by line. The verified-savings ledger at ledger.tesseraai.io shows every original-vs-actual cost pair, snapshot-pinned to a pricing_catalog version captured at request time. Mid-contract price changes don't retroactively alter past savings. This is the FinOps-friendly model for AI inference: every line of the bill traces to a code-enforced rule.

Operated by Fintechagency OÜ (Tallinn, Estonia, registry code 16638667).

Developer entry: tesseraai.io/dev
Mechanic reference: tesseraai.io/how-it-works
Dashboard: ledger.tesseraai.io
Engineering blog: tesseraai.io/blog