@tessera-llm/mastra
v0.1.2
Drop-in cost optimization for the Mastra agent framework. One line of config routes every Mastra Agent's LLM calls through the Tessera optimization proxy: auto-route to cheaper-equivalent models, exact + provider-prompt-cache hits, prompt compression with per-stack quality canary, batch arbitrage on async-tolerant calls. Free Sandbox tier: 60M tokens/month, no card. Production: 20% of measured savings, $0 if we save you nothing.
Companion to tessera-sdk (vanilla provider SDKs), tessera-langchain (LangChain integration), tessera-vercel-ai (Vercel AI SDK integration), tessera-llamaindex (LlamaIndex integration), tessera-pydantic-ai (Pydantic AI integration), tessera-crewai (CrewAI multi-agent integration), and tessera-autogen (AutoGen 0.4+ multi-agent integration). Same proxy, same mechanic stack, Mastra-shaped API.
What it looks like
import { Agent } from "@mastra/core/agent";
import { createOpenAI } from "@ai-sdk/openai";
import { tesseraOpenAIConfig } from "@tessera-llm/mastra";

const openai = createOpenAI({
  apiKey: process.env.OPENAI_API_KEY!,
  ...tesseraOpenAIConfig({ apiKey: process.env.TESSERA_API_KEY! }),
});

export const supportAgent = new Agent({
  name: "Support",
  instructions: "Triage incoming support tickets in two sentences.",
  model: openai("gpt-4o"),
});

The change to your code is small: one extra import and one spread line in the createOpenAI call. Every agent.generate() and agent.stream() call lands on the Tessera proxy with auto-route + auto-cache + auto-compress + auto-batch applied.
Or use the convenience factory (skips the explicit createOpenAI import):
import { Agent } from "@mastra/core/agent";
import { tesseraOpenAI } from "@tessera-llm/mastra";

const openai = await tesseraOpenAI({
  openaiApiKey: process.env.OPENAI_API_KEY!,
  tesseraApiKey: process.env.TESSERA_API_KEY!,
});

export const supportAgent = new Agent({
  name: "Support",
  instructions: "Triage incoming support tickets in two sentences.",
  model: openai("gpt-4o"),
});

Mastra's model router accepts AI SDK provider modules natively, so any provider returned by Tessera's factory works as a drop-in for the model: field on new Agent({...}).
Install
npm install @tessera-llm/mastra @mastra/core
# Plus whichever provider package you use:
npm install @ai-sdk/openai # or @ai-sdk/anthropic / @ai-sdk/mistral / @ai-sdk/groq / @ai-sdk/cohere

The @ai-sdk/* and @mastra/core packages are peer dependencies. Install only the providers you actually use. For @mastra/core, keep whichever version your project already has.
Get a free Tessera API key (60M tokens/mo, no card) at tesseraai.io/dev. Sign-up takes ~30 seconds and returns an instant tk_… key plus magic-link dashboard access.
Provider support
| Provider | @ai-sdk package | Tessera config function | Convenience factory |
|---|---|---|---|
| OpenAI | @ai-sdk/openai | tesseraOpenAIConfig | tesseraOpenAI |
| Anthropic | @ai-sdk/anthropic | tesseraAnthropicConfig | tesseraAnthropic |
| Mistral | @ai-sdk/mistral | tesseraMistralConfig | tesseraMistral |
| Groq | @ai-sdk/groq | tesseraGroqConfig | tesseraGroq |
| Cohere | @ai-sdk/cohere | tesseraCohereConfig | tesseraCohere |
Generic dispatcher available too: tesseraConfig("openai", { apiKey: "tk_..." }) returns the right { baseURL, headers } object regardless of provider. Useful when the provider is parameterized at runtime.
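When the provider name comes from configuration at runtime, it helps to narrow the raw string to one of the five supported providers before handing it to the dispatcher. The sketch below is one way to do that. parseProvider and the LLM_PROVIDER environment variable are hypothetical names used for illustration, not part of @tessera-llm/mastra, and the exact parameter type that tesseraConfig accepts is an assumption.

```typescript
// The five providers listed in the provider-support table.
const SUPPORTED_PROVIDERS = ["openai", "anthropic", "mistral", "groq", "cohere"] as const;
type Provider = (typeof SUPPORTED_PROVIDERS)[number];

// Hypothetical helper (not exported by @tessera-llm/mastra): turns an
// untrusted string into a supported provider name, or throws.
function parseProvider(raw: string | undefined, fallback: Provider = "openai"): Provider {
  if (raw === undefined || raw.trim() === "") return fallback;
  const normalized = raw.trim().toLowerCase();
  const match = SUPPORTED_PROVIDERS.find((p) => p === normalized);
  if (match === undefined) {
    throw new Error(
      `Unsupported provider "${raw}". Expected one of: ${SUPPORTED_PROVIDERS.join(", ")}`
    );
  }
  return match;
}

// Usage sketch, commented out so the block runs without the package installed.
// LLM_PROVIDER is an assumed variable name.
// const provider = parseProvider(process.env.LLM_PROVIDER);
// const { baseURL, headers } = tesseraConfig(provider, { apiKey: process.env.TESSERA_API_KEY! });
```

Failing fast on an unknown name keeps a typo in configuration from surfacing later as an opaque HTTP error.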
Worked example
Real customer-support agent on gpt-4o, 5B tokens/month, OpenAI list prices:
| Stage | Cost / mo | Saved |
|---|---:|---:|
| Baseline (OpenAI direct via Mastra) | $24,000 | n/a |
| + Tessera (route, cache, prompt-cache headers, compress, M9 ceiling, batch) | $9,400 | $14,600 |
| Tessera fee (20% × savings) | $2,920 | n/a |
| Net you pay | $12,320 | $11,680 / mo saved |
Verify the savings math yourself. Every billable line traces back to two immutable cost figures pinned to a multi-source pricing catalog snapshot captured at request time. Two engineers, three hours, can re-derive any month from raw inputs. Full procedure at tesseraai.io/trust.
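The fee arithmetic in the worked example can be re-derived in a few lines. This is a minimal sketch of the stated rule (20% of measured savings, zero fee when savings are zero), not Tessera's billing code. settleMonth and its field names are hypothetical, and clamping negative savings to zero is an assumption based on the "zero savings, zero fee" wording.

```typescript
interface Settlement {
  savedUsd: number;    // baseline minus optimized provider cost
  feeUsd: number;      // success fee on the measured savings
  netPaidUsd: number;  // optimized provider cost plus fee
  netSavedUsd: number; // baseline minus everything actually paid
}

// Hypothetical helper: applies a success fee of feePct percent to savings.
function settleMonth(baselineUsd: number, optimizedUsd: number, feePct = 20): Settlement {
  // Assumption: savings never go below zero for fee purposes.
  const savedUsd = Math.max(0, baselineUsd - optimizedUsd);
  const feeUsd = (savedUsd * feePct) / 100;
  const netPaidUsd = optimizedUsd + feeUsd;
  const netSavedUsd = baselineUsd - netPaidUsd;
  return { savedUsd, feeUsd, netPaidUsd, netSavedUsd };
}

// Figures from the worked example: $24,000 baseline, $9,400 with Tessera.
const workedExample = settleMonth(24000, 9400);
console.log(workedExample);
```

With the table's inputs this yields $14,600 saved, a $2,920 fee, $12,320 paid in total, and $11,680 net saved, matching the rows above.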
Quality canary across the full mechanic stack: mean score 0.96 against a 0.95 floor, and the 0.95 SLA held on all 30 days. Full breakdown: worked example with mechanic-level numbers + canary results.
What Tessera does on every Agent call
Same mechanic stack as the main tessera-sdk. Each mechanic is opt-in per workload, observable per request, and bypassed when its quality canary drops below the per-stack 0.95 floor.
| Mechanic | What it does | Typical savings |
|---|---|---|
| Auto-route (m1) | Route to a cheaper-equivalent model gated by a daily promptfoo canary on your eval set | 15–35% on routed calls |
| Auto-cache (m2) | sha256 cache on the canonical request body, 7-day TTL, Cloudflare edge KV | 5–40% depending on prompt repetition |
| Auto-compress (m3) | Per-role heuristic compression (system + user toggles independent). Preserves code fences and JSON shapes. | 5–15% on prompt tokens |
| Prompt cache (m6) | Inject provider-native cache headers: OpenAI cached-input (50% off), Anthropic cache_control: ephemeral (90% off cache reads) | 50–90% on cached prefixes |
| Context prune (m7) | Conservative trim on long conversations (system + last 8 turns; TF-IDF rerank on RAG attachments) | 5–25% on multi-turn workloads |
| Output-length ceiling (m9) | A daily job computes the p90 completion length per workload and injects maxTokens = p90 × 1.3 | 5–15% on completion cost |
| Batch arbitrage (m10) | Route async-tolerant Agent calls to provider Batch APIs (OpenAI Batch + Anthropic Message Batches both 50% off) | 50% on batch-eligible traffic |
| Cross-provider failover (m11) | When primary upstream returns 5xx / connection error / timeout, retry on OpenRouter (opt-in, default OFF) | Reliability primitive, n/a cost |
| Per-provider circuit breaker | Rolling 5xx-rate state machine per upstream. When a provider degrades, auto-route skips its intra-provider alternative mappings until the half-open probe succeeds. | n/a (keeps the savings stack honest) |
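To make the output-length ceiling (m9) row concrete, here is a minimal sketch of the maxTokens = p90 × 1.3 rule. The table does not say how Tessera estimates the percentile, so the nearest-rank method used here is an assumption, as are the function names and the choice to round the result up to a whole token.

```typescript
// Nearest-rank percentile (assumed method): the smallest sample such that
// at least p percent of all samples are less than or equal to it.
function percentileNearestRank(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("need at least one sample");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

// Hypothetical helper: ceiling = p90 of observed completion lengths,
// multiplied by a headroom factor (1.3 in the table), rounded up.
function outputCeiling(completionLengths: number[], headroom = 1.3): number {
  const p90 = percentileNearestRank(completionLengths, 90);
  return Math.ceil(p90 * headroom);
}

// Ten observed completion lengths (in tokens) for one workload.
const observed = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000];
const ceiling = outputCeiling(observed);
console.log(ceiling);
```

The headroom factor is what keeps the cap from truncating typical completions: only outputs well beyond the 90th percentile would hit it.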
Pricing
- Free Sandbox: 60M tokens/month, 30 requests/minute, observability-only mechanics, no card. Forever.
- Production: for workloads over 60M tokens/month or that need a higher rate limit. 20% of measured savings only. Zero savings, zero fee. Prepaid Stripe balance, $100 minimum top-up. No subscription, no commit, no minimum monthly.
Existing customers of tessera-sdk, tessera-langchain, tessera-llamaindex, or tessera-vercel-ai keep their rate_locked_pct (if any) on this package too. Same tk_… key, same billing record.
FAQ
Q: How is this different from the other tessera-* packages?
Same proxy. Same mechanics. Same billing. The five packages target different code surfaces:
- tessera-sdk: patches the underlying provider client constructors (OpenAI, Anthropic, etc.) directly via tessera.activate(key). Use when calling provider SDKs without a framework.
- tessera-langchain: wires into LangChain ChatModel constructors. Use when you're on LangChain.
- tessera-llamaindex: wires into LlamaIndex LLM adapter constructors. Use when you're on LlamaIndex.
- tessera-vercel-ai: wires into the Vercel AI SDK provider factories. Use when you're on ai core + @ai-sdk/* without Mastra.
- tessera-mastra (this package): same Vercel AI SDK provider factory shape, Mastra-shaped README and E2E gate. Use when you're on Mastra Agents.
Pick whichever fits your codebase. Side-by-side install is supported: all five resolve to the same proxy and same billing record.
Q: Does this break my Agent tools / structured outputs / streaming / RAG?
No. The Vercel AI SDK provider object that Mastra accepts is unchanged in shape, so agent.generate(), agent.stream(), generateObject, schema-constrained tool calls, and RAG retrieval workflows all work as before. Auto-route gates on tool-calling capability, so an agent using tools never gets routed to a non-tool-capable model.
Q: Does this work with Mastra's string-shorthand model ID ("openai/gpt-4o")?
Not directly. Mastra's string shorthand uses environment variables like OPENAI_API_KEY against the provider's canonical endpoint. Tessera needs a custom baseURL + custom x-tessera-api-key header, which the string shorthand doesn't surface. Use the AI SDK provider factory form (shown in the examples above); Mastra accepts it as a first-class alternative to the string shorthand.
Q: What happens if Tessera's proxy is down?
Your Agent gets HTTP errors instead of LLM responses. On the proxy side, a per-provider circuit breaker tracks rolling 5xx rates and skips degraded providers in auto-route decisions. Cross-provider failover (m11) is opt-in and re-routes to OpenRouter when the primary upstream is down. See the workload toggle in the dashboard.
Q: What happens to my OpenAI / Anthropic rate limits?
They pass through. Tessera does not aggregate quotas across customers. Your provider rate limits apply normally; the proxy enforces only the Tessera tier limits (30 rpm Free Sandbox, 60 rpm Production by default; higher on request).
Q: Are you storing my prompts and completions?
No. We log only token counts, cost deltas, mechanics_stack, and provider response status. Prompts and completions are never persisted. Full data handling on tesseraai.io/security.
Q: Why are there two API surfaces (tesseraOpenAIConfig vs tesseraOpenAI)?
The config function returns the kwargs object you spread into createOpenAI(...): explicit, easy to combine with other settings (organization, custom fetch, etc.). The convenience factory imports createOpenAI for you and pre-merges. Use whichever you find more readable. Both ship in the same package.
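One thing to watch when combining the spread with your own settings: object spread is shallow, so passing your own headers after the spread replaces Tessera's headers instead of merging with them. The sketch below uses plain objects to show the pitfall and one way around it. tesseraPart is a stand-in for what tesseraOpenAIConfig(...) returns, assuming it has the same { baseURL, headers } shape the generic dispatcher is described as returning; the URL, keys, and the x-request-source header are placeholders.

```typescript
// Subset of the provider settings relevant here.
interface ProviderSettings {
  apiKey?: string;
  organization?: string;
  baseURL?: string;
  headers?: Record<string, string>;
}

// Stand-in for the object returned by tesseraOpenAIConfig(...).
// Assumption: shape is { baseURL, headers }. The URL is a placeholder.
const tesseraPart: ProviderSettings = {
  baseURL: "https://proxy.example.invalid/v1",
  headers: { "x-tessera-api-key": "tk_example" },
};

// Pitfall: a later `headers` key replaces the spread one wholesale,
// which drops the x-tessera-api-key header.
const clobbered: ProviderSettings = {
  ...tesseraPart,
  headers: { "x-request-source": "support-bot" },
};

// Fix: spread the Tessera headers into your own headers object.
const merged: ProviderSettings = {
  apiKey: "sk-example",
  organization: "org_example",
  ...tesseraPart,
  headers: { ...tesseraPart.headers, "x-request-source": "support-bot" },
};

console.log(Object.keys(clobbered.headers ?? {}));
console.log(Object.keys(merged.headers ?? {}));
```

The merged object is what you would pass to createOpenAI(...) in place of the bare spread.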
Links
- Dashboard + free signup: tesseraai.io/dev
- How it works (per-mechanic deep dives): tesseraai.io/how-it-works
- Security + data handling: tesseraai.io/security
- Worked-numbers blog post: Customer-support workload, 48% saved, quality held
- Mastra docs: mastra.ai
About Tessera
Tessera is the substrate layer for LLM cost optimization, also called the Optimize Layer in our product surface. It is a thin proxy that sits in your application's request path, applies a conservative cascade of optimization mechanics, and measures every saved dollar against an audit-immutable baseline. We bill 20% of verified savings, prepaid. Zero savings = zero fee. There is no per-token gateway fee, no subscription, and no minimum monthly commitment. The category we operate in is "success-fee LLM optimizer," distinct from per-token AI gateways and observability dashboards.
Where observability tools tell you what you spent and AI gateways re-shape the request without measuring the cost delta, Tessera is the layer that does both, and only takes a cut when the measured savings are positive. The verified-savings ledger at ledger.tesseraai.io shows every original-vs-actual cost pair, snapshot-pinned to a pricing_catalog version captured at request time. Mid-contract price changes don't retroactively alter past savings. This is the FinOps-friendly model for AI inference: every line of the bill traces to a code-enforced rule.
Apache-2.0. Operated by Fintechagency OÜ (Tallinn, Estonia, registry code 16638667). Issues: github.com/tessera-llm/tessera-mastra/issues.
- Developer entry: tesseraai.io/dev
- Mechanic reference: tesseraai.io/how-it-works
- Dashboard: ledger.tesseraai.io
- Engineering blog: tesseraai.io/blog
