@oncely/ai
v1.0.1
AI SDK middleware for oncely idempotency - prevent duplicate LLM calls
AI SDK middleware for idempotent LLM calls. Wrap once, every retry is free.
Before & After
import { openai } from '@ai-sdk/openai';
import { generateText, wrapLanguageModel } from 'ai';
import { idempotencyMiddleware } from '@oncely/ai';
// ❌ Before: Every call costs money
const model = openai('gpt-4-turbo');
await generateText({ model, prompt: 'Hello' }); // API call → $0.01
await generateText({ model, prompt: 'Hello' }); // API call → $0.01
await generateText({ model, prompt: 'Hello' }); // API call → $0.01
// Total: $0.03 for the same response 3x 💸
// ✅ After: Add one wrapper
const idempotentModel = wrapLanguageModel({
model: openai('gpt-4-turbo'),
middleware: idempotencyMiddleware(),
});
await generateText({ model: idempotentModel, prompt: 'Hello' }); // API call → $0.01
await generateText({ model: idempotentModel, prompt: 'Hello' }); // Cache hit → $0.00 ✨
await generateText({ model: idempotentModel, prompt: 'Hello' }); // Cache hit → $0.00 ✨
// Total: $0.01, about 67% saved 🎉
Installation
npm install @oncely/ai @oncely/core ai
For production, add a storage adapter:
npm install @oncely/redis ioredis # Standard Redis
npm install @oncely/upstash # Serverless (Upstash, Vercel KV)
Usage
Basic (Memory Storage)
import { wrapLanguageModel } from 'ai';
import { idempotencyMiddleware } from '@oncely/ai';
const model = wrapLanguageModel({
model: yourModel,
middleware: idempotencyMiddleware(),
});
Production (Redis)
import { openai } from '@ai-sdk/openai';
import { wrapLanguageModel } from 'ai';
import { idempotencyMiddleware } from '@oncely/ai';
import { redis } from '@oncely/redis';
const model = wrapLanguageModel({
model: openai('gpt-4-turbo'),
middleware: idempotencyMiddleware({
storage: redis(),
ttl: '5m',
}),
});
Serverless (Upstash)
import { anthropic } from '@ai-sdk/anthropic';
import { wrapLanguageModel } from 'ai';
import { idempotencyMiddleware } from '@oncely/ai';
import { upstash } from '@oncely/upstash';
const model = wrapLanguageModel({
model: anthropic('claude-3-opus'),
middleware: idempotencyMiddleware({
storage: upstash(),
ttl: '10m',
}),
});
How Keys Are Generated
By default, the middleware generates a cache key by hashing:
- Model ID
- Prompt/messages
- Temperature, max tokens, and other generation parameters
Same inputs = same key = cached response.
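To make the hashing idea concrete, here is a minimal sketch of deterministic key derivation. It is not the library's actual implementation: the names KeyInput, stableStringify, and deriveKey are hypothetical, and the choice of SHA-256 over a sorted-key serialization is an assumption made for illustration.

```typescript
import { createHash } from 'node:crypto';

// Hypothetical shape: only the inputs the README lists as part of the key.
interface KeyInput {
  modelId: string;
  prompt: unknown; // a prompt string or a messages array
  settings: Record<string, unknown>; // temperature, max tokens, ...
}

// Serialize with object keys sorted, so property order never changes the hash.
function stableStringify(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map(stableStringify).join(',')}]`;
  }
  if (value !== null && typeof value === 'object') {
    const obj = value as Record<string, unknown>;
    const entries = Object.keys(obj)
      .filter((k) => obj[k] !== undefined) // treat undefined as "not set"
      .sort()
      .map((k) => `${JSON.stringify(k)}:${stableStringify(obj[k])}`);
    return `{${entries.join(',')}}`;
  }
  if (value === undefined) return 'null';
  return JSON.stringify(value);
}

// Assumed hashing scheme: SHA-256 over the stable serialization, hex encoded.
function deriveKey(input: KeyInput): string {
  return createHash('sha256').update(stableStringify(input)).digest('hex');
}
```

Under this sketch, two requests with the same model, prompt, and settings map to the same key regardless of property order, while changing any single parameter (for example, temperature) yields a different key.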
Explicit Keys
Pass an explicit key via providerOptions:
const result = await generateText({
model,
prompt: 'Hello',
providerOptions: {
oncely: { key: 'user-123-greeting' },
},
});
Custom Key Function
const model = wrapLanguageModel({
model: yourModel,
middleware: idempotencyMiddleware({
getKey: (params) => {
// Your custom key logic; hashObject is a placeholder for your own hashing helper
return `custom:${hashObject(params.prompt)}`;
},
}),
});
Options
| Option | Type | Default | Description |
| ------------------- | ------------------------- | --------------- | ----------------------------- |
| storage | StorageAdapter | MemoryStorage | Storage backend |
| ttl | string \| number | '5m' | Cache duration |
| getKey | (params) => string | Auto-hash | Custom key generation |
| includeModelInKey | boolean | true | Include model ID in cache key |
| onHit | (key, response) => void | — | Callback on cache hit |
| onMiss | (key) => void | — | Callback on cache miss |
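The ttl option accepts either a string such as '5m' or a number. As a rough illustration of how a duration string can be normalized, the sketch below converts both forms to milliseconds. The helper name ttlToMs, the set of unit suffixes, and the reading of a bare number as milliseconds are all assumptions; check the package's own documentation for the formats it actually accepts.

```typescript
// Hypothetical unit table: these suffixes are an assumption, not the library's documented set.
const UNIT_MS: Record<string, number> = {
  ms: 1,
  s: 1_000,
  m: 60_000,
  h: 3_600_000,
  d: 86_400_000,
};

// Convert a TTL such as '5m' or 300000 into milliseconds.
function ttlToMs(ttl: string | number): number {
  if (typeof ttl === 'number') {
    if (!Number.isFinite(ttl) || ttl < 0) {
      throw new RangeError(`Invalid TTL: ${ttl}`);
    }
    return ttl; // assumption: bare numbers are already milliseconds
  }
  const match = /^(\d+(?:\.\d+)?)\s*(ms|s|m|h|d)$/.exec(ttl.trim());
  if (match === null) {
    throw new TypeError(`Unrecognized TTL: ${ttl}`);
  }
  const amount = Number(match[1] ?? '');
  const factor = UNIT_MS[match[2] ?? ''];
  if (factor === undefined || Number.isNaN(amount)) {
    throw new TypeError(`Unrecognized TTL: ${ttl}`);
  }
  return amount * factor;
}
```

With this sketch, the values used elsewhere in this README ('5m', '10m', '1h') resolve to 300000, 600000, and 3600000 milliseconds respectively.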
Per-Request Options
Override settings per-request via providerOptions.oncely:
const result = await generateText({
model,
prompt: 'Hello',
providerOptions: {
oncely: {
key: 'explicit-key', // Override auto-generated key
ttl: '1h', // Override TTL for this request
skip: true, // Skip idempotency entirely
},
},
});
Works With Any Provider
The middleware works with any AI SDK provider:
- OpenAI (@ai-sdk/openai)
- Anthropic (@ai-sdk/anthropic)
- Google (@ai-sdk/google)
- Mistral (@ai-sdk/mistral)
- Cohere (@ai-sdk/cohere)
- Local models (Ollama, llama.cpp)
- Any custom provider
Streaming Support
Works with both generateText and streamText:
const result = await streamText({
model,
prompt: 'Write a poem',
});
// Cached streams are replayed from storage
for await (const chunk of result.textStream) {
console.log(chunk);
}
Combining with Other Middleware
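import { openai } from '@ai-sdk/openai';
import { wrapLanguageModel } from 'ai';
import { idempotencyMiddleware } from '@oncely/ai';
import { redis } from '@oncely/redis';
const model = wrapLanguageModel({
model: openai('gpt-4-turbo'),
middleware: [
idempotencyMiddleware({ storage: redis() }),
loggingMiddleware(), // placeholder for your own middleware
rateLimitMiddleware(), // placeholder for your own middleware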
],
});
Real-World Examples
Chat Endpoint — Deduplicate Impatient Users
app.post('/api/chat', async (c) => {
const { message, userId } = await c.req.json();
// User can spam the send button — you only pay once
const { text } = await generateText({
model: idempotentModel,
prompt: message,
providerOptions: {
oncely: { key: `chat:${userId}:${hash(message)}` }, // hash is a placeholder for your own string-hashing helper
},
});
return c.json({ response: text });
});
AI Agent — Exactly-Once Tool Execution
const sendEmail = tool({
description: 'Send an email',
parameters: z.object({ to: z.string(), subject: z.string(), body: z.string() }),
execute: async (params) => {
// Without idempotency: retry = duplicate email sent
// With idempotency: retry = cached result, no duplicate
return await emailService.send(params);
},
});
const { text } = await generateText({
model: idempotentModel,
tools: { sendEmail },
prompt: task,
providerOptions: {
oncely: { key: `agent:${taskId}` }, // Entire agent run is idempotent
},
});
Batch Processing — Crash-Resilient
for (const item of items) {
// Crash at item 50/100? Restart and items 1-49 are instant from cache
const { text } = await generateText({
model: idempotentModel,
prompt: `Summarize: ${item.content}`,
providerOptions: {
oncely: { key: `batch:${item.id}` },
},
});
}
Track Your Savings
let tokensSaved = 0;
const model = wrapLanguageModel({
model: openai('gpt-4-turbo'),
middleware: idempotencyMiddleware({
storage: redis(),
onHit: (key, response) => {
tokensSaved += response.data?.usage?.totalTokens ?? 0;
console.log(`💰 Saved ${tokensSaved} tokens so far`);
},
}),
});
Use Cases
- Chatbots — Prevent duplicate completions on network retries
- Agents — Ensure tool calls execute exactly once
- Batch processing — Resume interrupted jobs without re-running
- Rate limit recovery — Retry without burning extra tokens
- Development — Cache expensive calls during iteration
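The use cases above rest on the same mechanism: run the work once per key and hand the stored result to every later caller. The sketch below illustrates that idea in isolation, with no dependency on this package. The name createIdempotentRunner is hypothetical, and the choices to share in-flight calls and to leave failures uncached are assumptions made for illustration, not a description of how @oncely/ai behaves internally.

```typescript
type Task<T> = () => Promise<T>;

// Hypothetical helper: run a task at most once per key within ttlMs.
function createIdempotentRunner<T>(ttlMs: number, now: () => number = Date.now) {
  const completed = new Map<string, { value: T; expiresAt: number }>();
  const inFlight = new Map<string, Promise<T>>();

  return async function run(key: string, task: Task<T>): Promise<T> {
    // 1. Fresh stored result: return it without running the task.
    const hit = completed.get(key);
    if (hit !== undefined && hit.expiresAt > now()) return hit.value;

    // 2. Same key already running: share that promise instead of starting again.
    const pending = inFlight.get(key);
    if (pending !== undefined) return pending;

    // 3. First caller: run the task, store the result on success only.
    const started = task()
      .then((value) => {
        completed.set(key, { value, expiresAt: now() + ttlMs });
        return value;
      })
      .finally(() => {
        inFlight.delete(key); // failures are not cached, so a retry runs again
      });
    inFlight.set(key, started);
    return started;
  };
}
```

In this sketch, a user double-clicking send, a network retry, and a restarted batch job all resolve to the same stored value as long as they present the same key before the TTL expires.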
Related Packages
- @oncely/core — Core library
- @oncely/redis — Redis adapter
- @oncely/upstash — Upstash adapter
License
MIT
