@oncely/ai

v1.0.1

Published

4 months ago

AI SDK middleware for oncely idempotency - prevent duplicate LLM calls

stacks0x

oncely idempotency ai llm openai anthropic vercel-ai-sdk middleware caching deduplication

AI SDK middleware for idempotent LLM calls. Wrap once, and every identical retry is served from cache for free.

Before & After

import { openai } from '@ai-sdk/openai';
import { generateText, wrapLanguageModel } from 'ai';
import { idempotencyMiddleware } from '@oncely/ai';

// ❌ Before: Every call costs money
const model = openai('gpt-4-turbo');

await generateText({ model, prompt: 'Hello' }); // API call → $0.01
await generateText({ model, prompt: 'Hello' }); // API call → $0.01
await generateText({ model, prompt: 'Hello' }); // API call → $0.01
// Total: $0.03 for the same response 3x 💸

// ✅ After: Add one wrapper
const idempotentModel = wrapLanguageModel({
  model: openai('gpt-4-turbo'),
  middleware: idempotencyMiddleware(),
});

await generateText({ model: idempotentModel, prompt: 'Hello' }); // API call → $0.01
await generateText({ model: idempotentModel, prompt: 'Hello' }); // Cache hit → $0.00 ✨
await generateText({ model: idempotentModel, prompt: 'Hello' }); // Cache hit → $0.00 ✨
// Total: $0.01, saved ~67% 🎉

Installation

npm install @oncely/ai @oncely/core ai

For production, add a storage adapter:

npm install @oncely/redis ioredis   # Standard Redis
npm install @oncely/upstash         # Serverless (Upstash, Vercel KV)

Usage

Basic (Memory Storage)

import { wrapLanguageModel } from 'ai';
import { idempotencyMiddleware } from '@oncely/ai';

const model = wrapLanguageModel({
  model: yourModel,
  middleware: idempotencyMiddleware(),
});
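
To make "memory storage" concrete, here is a minimal sketch of an in-memory store with per-entry expiry. It is not the `MemoryStorage` shipped by `@oncely/core`; the class name `TtlMemoryStore` and its methods are hypothetical and exist only for illustration.

```typescript
// Hypothetical in-memory store with per-entry expiry (illustration only,
// not the @oncely/core implementation).
interface Entry<V> {
  value: V;
  expiresAt: number; // epoch milliseconds
}

class TtlMemoryStore<V> {
  private entries = new Map<string, Entry<V>>();
  private now: () => number;

  // `now` is injectable so expiry can be exercised without real timers.
  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  set(key: string, value: V, ttlMs: number): void {
    this.entries.set(key, { value, expiresAt: this.now() + ttlMs });
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // lazily evict expired entries on read
      return undefined;
    }
    return entry.value;
  }
}
```

A store like this lives inside one process and is lost on restart, which is why a shared backend such as Redis is the usual choice in production.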

Production (Redis)

import { openai } from '@ai-sdk/openai';
import { wrapLanguageModel } from 'ai';
import { idempotencyMiddleware } from '@oncely/ai';
import { redis } from '@oncely/redis';

const model = wrapLanguageModel({
  model: openai('gpt-4-turbo'),
  middleware: idempotencyMiddleware({
    storage: redis(),
    ttl: '5m',
  }),
});

Serverless (Upstash)

import { anthropic } from '@ai-sdk/anthropic';
import { wrapLanguageModel } from 'ai';
import { idempotencyMiddleware } from '@oncely/ai';
import { upstash } from '@oncely/upstash';

const model = wrapLanguageModel({
  model: anthropic('claude-3-opus'),
  middleware: idempotencyMiddleware({
    storage: upstash(),
    ttl: '10m',
  }),
});

How Keys Are Generated

By default, the middleware generates a cache key by hashing:

Model ID
Prompt/messages
Temperature, max tokens, and other generation parameters

Same inputs = same key = cached response.
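
The idea of "same inputs = same key" can be sketched as follows. This is an assumption about how such a key might be derived, not the package's actual hashing code: `stableStringify`, `deriveKey`, and the `CallSignature` shape are hypothetical names, and SHA-256 is simply a convenient choice.

```typescript
import { createHash } from 'node:crypto';

// Serialize with object keys sorted so property order does not change the hash.
function stableStringify(value: unknown): string {
  if (value === undefined) return 'null';
  if (Array.isArray(value)) {
    return `[${value.map(stableStringify).join(',')}]`;
  }
  if (value !== null && typeof value === 'object') {
    const obj = value as Record<string, unknown>;
    const fields = Object.keys(obj)
      .filter((k) => obj[k] !== undefined) // unset options should not affect the key
      .sort()
      .map((k) => `${JSON.stringify(k)}:${stableStringify(obj[k])}`);
    return `{${fields.join(',')}}`;
  }
  return JSON.stringify(value);
}

// Hypothetical shape of the inputs that feed the key.
interface CallSignature {
  modelId: string;
  prompt: unknown;
  temperature?: number;
  maxTokens?: number;
}

// Hash the canonical serialization into a fixed-length hex key.
function deriveKey(call: CallSignature): string {
  return createHash('sha256').update(stableStringify(call)).digest('hex');
}
```

Sorting keys before hashing matters because two logically identical requests can be built with properties in a different order. A helper in this style could also stand in for the `hashObject` and `hash` placeholders used in the examples in this README.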

Explicit Keys

Pass an explicit key via providerOptions:

const result = await generateText({
  model,
  prompt: 'Hello',
  providerOptions: {
    oncely: { key: 'user-123-greeting' },
  },
});

Custom Key Function

const model = wrapLanguageModel({
  model: yourModel,
  middleware: idempotencyMiddleware({
    getKey: (params) => {
      // Your custom key logic. hashObject is a placeholder for your own hashing helper.
      return `custom:${hashObject(params.prompt)}`;
    },
  }),
});

Options

| Option | Type | Default | Description |
| ------------------- | ------------------------- | --------------- | ----------------------------- |
| storage | StorageAdapter | MemoryStorage | Storage backend |
| ttl | string \| number | '5m' | Cache duration |
| getKey | (params) => string | Auto-hash | Custom key generation |
| includeModelInKey | boolean | true | Include model ID in cache key |
| onHit | (key, response) => void | (none) | Callback on cache hit |
| onMiss | (key) => void | (none) | Callback on cache miss |
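
The `ttl` option accepts either a string such as `'5m'` or a number. The sketch below shows one way such a value could be normalized to milliseconds. The function name `parseTtl`, the supported units, and the assumption that numeric values are already milliseconds are all illustrative guesses, not documented behavior of the package.

```typescript
// Assumed unit table; the package may support a different set of suffixes.
const UNIT_MS: Record<string, number> = {
  ms: 1,
  s: 1_000,
  m: 60_000,
  h: 3_600_000,
  d: 86_400_000,
};

// Normalize a ttl like '5m', '1h', or 250 into milliseconds.
function parseTtl(ttl: string | number): number {
  if (typeof ttl === 'number') {
    if (!Number.isFinite(ttl) || ttl < 0) throw new RangeError(`Invalid ttl: ${ttl}`);
    return ttl; // assumption: numbers are already milliseconds
  }
  const match = /^(\d+(?:\.\d+)?)\s*(ms|s|m|h|d)$/.exec(ttl.trim());
  if (!match) throw new RangeError(`Invalid ttl: ${ttl}`);
  return Number(match[1]) * UNIT_MS[match[2]];
}
```

For example, `parseTtl('5m')` yields `300000` under these assumptions. Check the package's own documentation before relying on how it interprets numeric values.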

Per-Request Options

Override settings per-request via providerOptions.oncely:

const result = await generateText({
  model,
  prompt: 'Hello',
  providerOptions: {
    oncely: {
      key: 'explicit-key', // Override auto-generated key
      ttl: '1h', // Override TTL for this request
      skip: true, // Skip idempotency entirely
    },
  },
});
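
One way to picture how per-request values interact with the middleware defaults is the resolution step sketched below. The names `resolveRequest`, `RequestOverrides`, and `Resolved` are hypothetical, and the precedence shown (skip first, then explicit key, then auto-generated key) is an assumption about reasonable behavior rather than a description of the package internals.

```typescript
interface Defaults {
  ttl: string | number;
}

// Mirrors the fields shown under providerOptions.oncely.
interface RequestOverrides {
  key?: string;
  ttl?: string | number;
  skip?: boolean;
}

type Resolved =
  | { skip: true }
  | { skip: false; key: string; ttl: string | number };

// Merge per-request overrides over middleware defaults.
function resolveRequest(
  defaults: Defaults,
  overrides: RequestOverrides | undefined,
  autoKey: () => string,
): Resolved {
  if (overrides?.skip) return { skip: true }; // bypass the cache entirely
  return {
    skip: false,
    key: overrides?.key ?? autoKey(), // explicit key wins over the auto-hash
    ttl: overrides?.ttl ?? defaults.ttl,
  };
}
```

Under this reading, setting `skip: true` makes `key` and `ttl` irrelevant for that request, so in practice you would set either `skip` or the other two fields.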

Works With Any Provider

The middleware works with any AI SDK provider:

OpenAI (@ai-sdk/openai)
Anthropic (@ai-sdk/anthropic)
Google (@ai-sdk/google)
Mistral (@ai-sdk/mistral)
Cohere (@ai-sdk/cohere)
Local models (Ollama, llama.cpp)
Any custom provider

Streaming Support

Works with both generateText and streamText:

const result = await streamText({
  model,
  prompt: 'Write a poem',
});

// Cached streams are replayed from storage
for await (const chunk of result.textStream) {
  console.log(chunk);
}
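
Replaying a cached stream can be sketched with two small async generators: one records chunks as they pass through, the other replays them later. This is an illustration of the general technique under stated assumptions, not the package's implementation; `recordStream` and `replayStream` are hypothetical names.

```typescript
// Pass chunks through to the caller while recording them for later replay.
async function* recordStream<T>(
  source: AsyncIterable<T>,
  sink: T[],
): AsyncGenerator<T> {
  for await (const chunk of source) {
    sink.push(chunk);
    yield chunk;
  }
}

// Replay previously recorded chunks as a fresh async stream.
async function* replayStream<T>(chunks: readonly T[]): AsyncGenerator<T> {
  for (const chunk of chunks) {
    yield chunk;
  }
}
```

A design point worth noting: the recorded chunks should only be written to storage once the source stream has finished, otherwise an aborted request could cache a truncated response.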

Combining with Other Middleware

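import { openai } from '@ai-sdk/openai';
import { wrapLanguageModel } from 'ai';
import { idempotencyMiddleware } from '@oncely/ai';
import { redis } from '@oncely/redis';

// loggingMiddleware and rateLimitMiddleware are placeholders for your own middleware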

const model = wrapLanguageModel({
  model: openai('gpt-4-turbo'),
  middleware: [
    idempotencyMiddleware({ storage: redis() }),
    loggingMiddleware(),
    rateLimitMiddleware(),
  ],
});
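
Order in the middleware array is likely to matter. The simplified model below assumes the first middleware in the list becomes the outermost wrapper, so a cache hit short-circuits everything after it. The `Generate` and `Middleware` types and the `compose` helper are hypothetical simplifications, not the AI SDK's real middleware interface; confirm the actual ordering in the AI SDK documentation.

```typescript
type Generate = (prompt: string) => Promise<string>;
type Middleware = (next: Generate) => Generate;

// Compose so the first middleware in the list is the outermost wrapper.
function compose(model: Generate, middleware: Middleware[]): Generate {
  return middleware.reduceRight((next, mw) => mw(next), model);
}

// Toy cache layer: returns a stored result without calling inner layers.
function cacheMiddleware(cache: Map<string, string>): Middleware {
  return (next) => async (prompt) => {
    const hit = cache.get(prompt);
    if (hit !== undefined) return hit; // short-circuit: inner layers never run
    const text = await next(prompt);
    cache.set(prompt, text);
    return text;
  };
}

// Toy inner layer that counts how often it is reached.
function countingMiddleware(counter: { calls: number }): Middleware {
  return (next) => async (prompt) => {
    counter.calls += 1;
    return next(prompt);
  };
}
```

With the cache layer first, a repeated prompt never reaches the counting layer, which is the behavior you would want if the inner layer were a rate limiter that should not be charged for cache hits.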

Real-World Examples

Chat Endpoint — Deduplicate Impatient Users

app.post('/api/chat', async (c) => {
  const { message, userId } = await c.req.json();

  // User can spam the send button and you only pay once.
  // `hash` is a placeholder for any stable string hash of your choice.
  const { text } = await generateText({
    model: idempotentModel,
    prompt: message,
    providerOptions: {
      oncely: { key: `chat:${userId}:${hash(message)}` },
    },
  });

  return c.json({ response: text });
});
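
A result cache only helps once the first response has been stored, so rapid duplicate clicks that arrive while the first call is still running need a second mechanism: sharing the in-flight promise. Whether `@oncely/ai` does this internally is not stated here; the `InFlight` class below is a hypothetical sketch of the technique.

```typescript
// Share one in-flight promise per key so concurrent duplicates trigger a single call.
class InFlight<V> {
  private pending = new Map<string, Promise<V>>();

  run(key: string, work: () => Promise<V>): Promise<V> {
    const existing = this.pending.get(key);
    if (existing) return existing; // join the call that is already running

    const promise = work().finally(() => {
      this.pending.delete(key); // allow a fresh attempt after success or failure
    });
    this.pending.set(key, promise);
    return promise;
  }
}
```

In a handler like the one above, the call to `generateText` would be passed as `work`, keyed by the same string used for the idempotency key.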

AI Agent — Exactly-Once Tool Execution

import { generateText, tool } from 'ai';
import { z } from 'zod';

const sendEmail = tool({
  description: 'Send an email',
  parameters: z.object({ to: z.string(), subject: z.string(), body: z.string() }),
  execute: async (params) => {
    // Without idempotency: retry = duplicate email sent
    // With idempotency: retry = cached result, no duplicate
    return await emailService.send(params);
  },
});

const { text } = await generateText({
  model: idempotentModel,
  tools: { sendEmail },
  prompt: task,
  providerOptions: {
    oncely: { key: `agent:${taskId}` }, // Entire agent run is idempotent
  },
});

Batch Processing — Crash-Resilient

for (const item of items) {
  // Crash at item 50/100? Restart and items 1-49 are instant from cache
  const { text } = await generateText({
    model: idempotentModel,
    prompt: `Summarize: ${item.content}`,
    providerOptions: {
      oncely: { key: `batch:${item.id}` },
    },
  });
}

Track Your Savings

let tokensSaved = 0;

const model = wrapLanguageModel({
  model: openai('gpt-4-turbo'),
  middleware: idempotencyMiddleware({
    storage: redis(),
    onHit: (key, response) => {
      tokensSaved += response.data?.usage?.totalTokens ?? 0;
      console.log(`💰 Saved ${tokensSaved} tokens so far`);
    },
  }),
});
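
Token totals are one metric; hit rate is another. The sketch below is a small, self-contained tracker whose methods match the `onHit` and `onMiss` callback shapes listed in the Options table. The `CacheStats` class is hypothetical and not part of the package.

```typescript
// Hypothetical hit/miss tracker with callbacks shaped like onHit and onMiss.
class CacheStats {
  hits = 0;
  misses = 0;

  // Arrow properties keep `this` bound when passed as callbacks.
  onHit = (_key: string, _response: unknown): void => {
    this.hits += 1;
  };

  onMiss = (_key: string): void => {
    this.misses += 1;
  };

  // Fraction of lookups served from cache, or 0 before any lookups.
  get hitRate(): number {
    const total = this.hits + this.misses;
    return total === 0 ? 0 : this.hits / total;
  }
}
```

You could then pass `stats.onHit` and `stats.onMiss` as the corresponding middleware options and log `stats.hitRate` periodically.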

Use Cases

Chatbots — Prevent duplicate completions on network retries
Agents — Ensure tool calls execute exactly once
Batch processing — Resume interrupted jobs without re-running
Rate limit recovery — Retry without burning extra tokens
Development — Cache expensive calls during iteration

Related Packages

@oncely/core — Core library
@oncely/redis — Redis adapter
@oncely/upstash — Upstash adapter

License

MIT
