@mzedstudio/llm-cache v0.1.0

A Convex component that caches LLM API request/response pairs with tiered TTL, time travel, and built-in observability. Stop paying for duplicate calls — get instant responses for identical prompts.

Features

  • Deterministic cache keys — SHA-256 hash of normalized request parameters. Same prompt always hits the same cache entry, regardless of key order or whitespace.
  • Tiered TTL with auto-promotion — entries start at 24h, promote to 7 days on first hit, and can be pinned permanently.
  • Time travel — every time a cached request gets a new response, the old one is archived. Query the full response history for any request to see how model output has changed over time.
  • Request normalization — trims whitespace, lowercases model names, rounds floats, and sorts keys so {model: "GPT-4o", temperature: 0.70001} hits the same cache as {temperature: 0.7, model: "gpt-4o"}.
  • Flexible invalidation — delete by cache key, model name, model version, tag, or time range.
  • Query and inspect — filter cached entries by model, tag, or time range. Get hit counts, stats breakdowns, and storage metrics.
  • Configurable TTLs — set per-model and per-tag TTL overrides. Give ephemeral chat completions a short TTL and expensive embedding calls a long one.
  • OpenAI-compatible — works with any request format that has messages, model, and optional parameters like temperature, max_tokens, top_p.

Installation

npm install @mzedstudio/llm-cache convex

Setup

1. Register the component

// convex/convex.config.ts
import { defineApp } from "convex/server";
import llmCache from "@mzedstudio/llm-cache/convex.config.js";

const app = defineApp();
app.use(llmCache, { name: "llmCache" });
export default app;

2. Initialize the client

// convex/llm.ts
import { LLMCache } from "@mzedstudio/llm-cache";
import { components } from "./_generated/api";

const cache = new LLMCache(components.llmCache);

Usage

Cache an LLM call

import { action } from "./_generated/server";
import { v } from "convex/values";

export const chat = action({
  args: {
    messages: v.array(v.object({ role: v.string(), content: v.string() })),
  },
  handler: async (ctx, args) => {
    const request = {
      messages: args.messages,
      model: "gpt-4o",
      temperature: 0.7,
    };

    // Check cache first
    const cached = await cache.lookup(ctx, { request });
    if (cached) return cached.response;

    // Cache miss — call your LLM provider
    const response = await openai.chat.completions.create(request);

    // Store the response
    await cache.store(ctx, { request, response, tags: ["chat"] });

    return response;
  },
});

Pin important responses

Pinned entries never expire (TTL tier 2):

await cache.store(ctx, {
  request,
  response,
  pin: true,
  tags: ["system-prompt"],
});

Read-only peek (no hit counting)

Use peek from query contexts where mutations aren't available:

export const checkCache = query({
  args: { request: v.any() },
  handler: async (ctx, args) => {
    return await cache.peek(ctx, { request: args.request });
  },
});

Filter and browse cached entries

// All GPT-4o entries from the last hour
const entries = await cache.query(ctx, {
  model: "gpt-4o",
  after: Date.now() - 3600000,
});

// All entries tagged "summarize"
const summaries = await cache.query(ctx, { tag: "summarize", limit: 20 });

Time travel

When store() is called with a response that differs from the existing cached response for the same request, the old response is automatically archived. Use history() to retrieve the full timeline:

// Get every response this request has ever produced
const timeline = await cache.history(ctx, { request });

// timeline = [
//   { response: {...}, storedAt: 1706745600000, isCurrent: false },  // oldest
//   { response: {...}, storedAt: 1707000000000, isCurrent: false },
//   { response: {...}, storedAt: 1707350400000, isCurrent: true },   // current
// ]

Each entry includes isCurrent: boolean to distinguish the live cached response from archived ones. Archived entries preserve the model, modelVersion, tags, and metadata they had at the time of archival.

Identical responses are not archived — if the model returns the same output, no duplicate history entry is created.
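The dedupe rule above can be sketched as a structural comparison. This is a hypothetical model of the logic, not the component's internal code; the component may compare hashes instead:

```typescript
// Canonical JSON serialization: object keys sorted so key order is ignored.
function canonical(v: unknown): string {
  if (v === null || typeof v !== "object") return JSON.stringify(v);
  if (Array.isArray(v)) return "[" + v.map(canonical).join(",") + "]";
  const obj = v as Record<string, unknown>;
  return (
    "{" +
    Object.keys(obj)
      .sort()
      .map((k) => JSON.stringify(k) + ":" + canonical(obj[k]))
      .join(",") +
    "}"
  );
}

// Archive the current response only if the incoming one actually differs.
function responseChanged(current: unknown, incoming: unknown): boolean {
  return canonical(current) !== canonical(incoming);
}
```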

Invalidate on model upgrade

// Clear all cached responses for a specific model
await cache.invalidate(ctx, { model: "gpt-4o" });

// Or invalidate by model version
await cache.invalidate(ctx, { modelVersion: "gpt-4o-2024-05-13" });

Scheduled cleanup

export const cleanupExpired = action({
  handler: async (ctx) => {
    const result = await cache.cleanup(ctx, { batchSize: 200 });
    console.log(`Deleted ${result.deletedCount} expired entries`);
    // result.hasMore === true means there are more to clean
  },
});

Preview before deleting with dryRun:

const preview = await cache.cleanup(ctx, { dryRun: true });
// preview.keys lists what *would* be deleted, but nothing is removed
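Cleanup can be wired to a Convex cron so expired entries are removed continuously. A minimal sketch, assuming the cleanupExpired action above lives in convex/llm.ts (for cron use it would typically be declared with internalAction rather than action):

```typescript
// convex/crons.ts
import { cronJobs } from "convex/server";
import { internal } from "./_generated/api";

const crons = cronJobs();

// Run the "Scheduled cleanup" action every hour.
// Assumption: cleanupExpired is exported as an internal function from convex/llm.ts.
crons.interval("llm-cache cleanup", { hours: 1 }, internal.llm.cleanupExpired, {});

export default crons;
```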

Cache stats

const stats = await cache.getStats(ctx);
// {
//   totalEntries: 142,
//   totalHits: 891,
//   entriesByModel: { "gpt-4o": 98, "claude-sonnet-4-5-20250929": 44 },
//   hitsByModel: { "gpt-4o": 720, "claude-sonnet-4-5-20250929": 171 },
//   oldestEntry: 1706745600000,
//   newestEntry: 1707350400000,
// }

Configuration

Tune TTLs, toggle normalization, and set per-model or per-tag overrides:

await cache.setConfig(ctx, {
  config: {
    defaultTtlMs: 12 * 60 * 60 * 1000,            // 12 hours (default: 24h)
    promotionTtlMs: 14 * 24 * 60 * 60 * 1000,      // 14 days (default: 7d)
    ttlByModel: {
      "gpt-4o-mini": 1 * 60 * 60 * 1000,           // 1 hour for cheap models
      "gpt-4o": 48 * 60 * 60 * 1000,               // 2 days for expensive models
    },
    ttlByTag: {
      "embedding": 30 * 24 * 60 * 60 * 1000,       // 30 days for embeddings
    },
    normalizeRequests: true,                         // default: true
    maxEntries: 10000,
  },
});

TTL priority order: tag > model > default. When multiple tags match, the longest TTL wins.
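The resolution order can be sketched as a pure function (a hypothetical helper, not exported by the package; field names follow the config example above):

```typescript
interface TtlConfig {
  defaultTtlMs: number;
  ttlByModel: Record<string, number>;
  ttlByTag: Record<string, number>;
}

// Tag overrides beat model overrides, which beat the default.
// When several tags match, the longest TTL wins.
function resolveTtl(config: TtlConfig, model: string, tags: string[] = []): number {
  const tagTtls = tags
    .map((t) => config.ttlByTag[t])
    .filter((ttl): ttl is number => ttl !== undefined);
  if (tagTtls.length > 0) return Math.max(...tagTtls);
  return config.ttlByModel[model] ?? config.defaultTtlMs;
}
```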

Use replace: true to overwrite the entire config (unset fields revert to defaults):

await cache.setConfig(ctx, {
  config: { defaultTtlMs: 3600000 },
  replace: true,
});

How It Works

Cache Key Generation

Every request is hashed into a deterministic 64-character hex string (SHA-256). Before hashing, the request is normalized:

| Normalization step | Example |
|---|---|
| Sort object keys | {b:2, a:1} and {a:1, b:2} produce the same key |
| Trim message content | " Hello " matches "Hello" |
| Lowercase model name | "GPT-4o" matches "gpt-4o" |
| Round floats to 2 places | 0.7000001 matches 0.7 |
| Strip null/undefined fields | {model: "x", top_p: undefined} matches {model: "x"} |

Normalization can be disabled via config if you need exact-match behavior.
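An illustrative sketch of the normalization pipeline, approximating the steps in the table above (this is not the package's internal code; the real implementation then serializes the result deterministically and hashes it with SHA-256 to produce the cache key):

```typescript
// Approximate the documented normalization steps.
function normalizeRequest(req: Record<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const key of Object.keys(req).sort()) {           // sort object keys
    let value = req[key];
    if (value === null || value === undefined) continue; // strip null/undefined fields
    if (key === "model" && typeof value === "string") {
      value = value.toLowerCase();                       // lowercase model name
    } else if (typeof value === "number" && !Number.isInteger(value)) {
      value = Math.round(value * 100) / 100;             // round floats to 2 places
    } else if (key === "messages" && Array.isArray(value)) {
      value = value.map((m) => ({ ...m, content: String(m.content).trim() })); // trim content
    }
    out[key] = value;
  }
  return out;
}
```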

TTL Tiers

Cache entries automatically promote through three tiers based on access patterns:

| Tier | Name | Default Duration | Trigger |
|---|---|---|---|
| 0 | Default | 24 hours | Entry is first stored |
| 1 | Promoted | 7 days | First cache hit (via lookup) |
| 2 | Pinned | Never expires | store with pin: true |

On each lookup hit:

  • Tier 0 entries promote to Tier 1 (expiry extends to 7 days from now)
  • Tier 1 entries refresh their expiry (another 7 days from now)
  • Tier 2 entries are unaffected (pinned)

Popular requests naturally survive longer without manual intervention.
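The on-hit rules above can be sketched as follows (a hypothetical model of the logic; tier numbers follow the table above, and the 7-day constant is the default promotionTtlMs):

```typescript
type TtlTier = 0 | 1 | 2;

const PROMOTION_TTL_MS = 7 * 24 * 60 * 60 * 1000; // default 7-day tier-1 TTL

// Apply the on-hit promotion rules: tier 0 promotes to tier 1, tier 1
// refreshes its expiry, tier 2 (pinned) is left untouched.
function promoteOnHit(
  entry: { ttlTier: TtlTier; expiresAt: number },
  now: number
): { ttlTier: TtlTier; expiresAt: number } {
  if (entry.ttlTier === 2) return entry; // pinned: unaffected
  return { ttlTier: 1, expiresAt: now + PROMOTION_TTL_MS };
}
```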

Response History (Time Travel)

The component maintains two tables:

  • cachedResponses — the current/active cached response for each unique request
  • responseHistory — archived responses that were replaced by newer ones

When store() is called for a request that already has a cached entry:

  1. If the new response differs from the existing one, the old response is archived to responseHistory
  2. The active entry is updated with the new response
  3. If the response is identical, no archive entry is created

This gives you a complete audit trail of how model outputs have changed over time for any given request.

Architecture

Your Convex action                     llm-cache component
──────────────────                     ───────────────────
                                       ┌─────────────────────┐
cache.lookup(ctx, { request }) ───────>│ cachedResponses     │
  │                                    │   cacheKey (SHA-256)│
  ├─ hit ──> return cached response    │   request           │
  │   + increment hit count            │   response          │
  │   + promote TTL tier               │   hitCount, ttlTier │
  │                                    │   expiresAt         │
  └─ miss ──> call LLM API             │   model, tags       │
     then:                             └─────────────────────┘
cache.store(ctx, {             ───────>          │
  request, response                              │ (if response changed)
})                                               ▼
                                       ┌─────────────────────┐
cache.history(ctx, { request })───────>│ responseHistory     │
                                       │   cacheKey          │
                                       │   response          │
                                       │   model, tags       │
                                       │   storedAt          │
                                       └─────────────────────┘

API Reference

LLMCache class

All methods are accessed through an LLMCache instance:

import { LLMCache } from "@mzedstudio/llm-cache";
const cache = new LLMCache(components.llmCache);

cache.lookup(ctx, { request, modelVersion? })

Find a cached response and increment the hit count. Promotes TTL tier on hit.

  • Context: mutation or action
  • Returns: CacheEntry | null
const cached = await cache.lookup(ctx, { request });
if (cached) {
  console.log(cached.response, cached.hitCount, cached.ttlTier);
}

cache.peek(ctx, { request, modelVersion? })

Read-only cache lookup. Does not increment hit count or promote TTL tier. Safe to use from queries.

  • Context: any (query, mutation, action)
  • Returns: CacheEntry | null

cache.store(ctx, { request, response, tags?, metadata?, pin?, modelVersion? })

Store an LLM response. If an entry with the same cache key already exists and the response differs, the old response is archived to the history table before the entry is updated.

  • Context: mutation or action
  • Returns: string (cache key)

| Parameter | Type | Description |
|---|---|---|
| request | Record<string, unknown> | The LLM request object |
| response | unknown | The LLM response to cache |
| tags | string[] | Optional tags for filtering and TTL overrides |
| metadata | unknown | Optional arbitrary metadata |
| pin | boolean | Pin entry (never expires, TTL tier 2) |
| modelVersion | string | Version string for model-version-based invalidation |


cache.get(ctx, { cacheKey })

Direct lookup by cache key. No hit counting.

  • Context: any
  • Returns: CacheEntry | null

cache.query(ctx, { model?, tag?, after?, before?, limit? })

Filter and list cached entries. Results are ordered newest-first.

  • Context: any
  • Returns: CacheEntry[]

| Parameter | Type | Description |
|---|---|---|
| model | string | Filter by model name |
| tag | string | Filter by tag |
| after | number | Exclude entries created before this timestamp |
| before | number | Exclude entries created after this timestamp |
| limit | number | Max results (default 50, max 200) |


cache.history(ctx, { request })

Get the full response timeline for a request. Returns all archived responses plus the current one, ordered oldest-first.

  • Context: any
  • Returns: HistoryEntry[]

Each HistoryEntry contains:

| Field | Type | Description |
|---|---|---|
| cacheKey | string | The cache key for this request |
| request | unknown | The original request |
| response | unknown | The response at this point in time |
| model | string | Model name |
| modelVersion | string? | Model version at time of storage |
| tags | string[]? | Tags at time of storage |
| metadata | unknown? | Metadata at time of storage |
| storedAt | number | Timestamp when this response was stored |
| isCurrent | boolean | true for the active cached response, false for archived |


cache.invalidate(ctx, { cacheKey?, model?, modelVersion?, tag?, before? })

Delete matching cache entries. At least one filter parameter is required.

  • Context: mutation or action
  • Returns: number (count of deleted entries)

cache.cleanup(ctx, { batchSize?, dryRun? })

Remove expired entries in batches.

  • Context: action
  • Returns: CleanupResult

| Field | Type | Description |
|---|---|---|
| deletedCount | number | Entries deleted (0 if dryRun) |
| keys | string[] | Cache keys that were (or would be) deleted |
| hasMore | boolean | true if more expired entries remain |


cache.setConfig(ctx, { config, replace? })

Update cache configuration. By default, merges with existing config. Pass replace: true to overwrite entirely.

  • Context: mutation or action

cache.getConfig(ctx)

Read current cache configuration.

  • Context: any
  • Returns: CacheConfig

cache.getStats(ctx)

Get cache statistics.

  • Context: any
  • Returns: CacheStats

| Field | Type | Description |
|---|---|---|
| totalEntries | number | Total cached entries |
| totalHits | number | Sum of all hit counts |
| entriesByModel | Record<string, number> | Entry count per model |
| hitsByModel | Record<string, number> | Hit count per model |
| oldestEntry | number? | Oldest entry timestamp |
| newestEntry | number? | Newest entry timestamp |

Configuration options

| Field | Type | Default | Description |
|---|---|---|---|
| defaultTtlMs | number | 86400000 (24h) | Default TTL for new entries |
| promotionTtlMs | number | 604800000 (7d) | TTL after promotion to tier 1 |
| ttlByModel | Record<string, number> | {} | Per-model TTL overrides |
| ttlByTag | Record<string, number> | {} | Per-tag TTL overrides |
| normalizeRequests | boolean | true | Enable request normalization |
| maxEntries | number | none | Max cache entries (informational) |

Exported types

import type {
  CacheEntry,      // Full cache entry with metadata
  HistoryEntry,    // Time travel entry (archived or current)
  CacheConfig,     // Configuration document
  CacheStats,      // Statistics response
  CleanupResult,   // Cleanup action result
  ConfigUpdate,    // Partial config for updates
} from "@mzedstudio/llm-cache";

Validators are also exported for use in Convex function args/returns:

import {
  cacheEntryValidator,
  historyEntryValidator,
  configDocValidator,
  cacheStatsValidator,
  cleanupResultValidator,
  configUpdateValidator,
} from "@mzedstudio/llm-cache";
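For example, a validator can type the return value of a wrapper query. A sketch, assuming the cache instance from the Setup section is in scope and the file lives under convex/ (getByKey is a hypothetical name):

```typescript
// convex/cacheApi.ts — sketch assuming `cache` from the Setup section
import { query } from "./_generated/server";
import { v } from "convex/values";
import { cacheEntryValidator } from "@mzedstudio/llm-cache";

export const getByKey = query({
  args: { cacheKey: v.string() },
  // The exported validator types the return value for clients.
  returns: v.union(cacheEntryValidator, v.null()),
  handler: async (ctx, args) => {
    return await cache.get(ctx, { cacheKey: args.cacheKey });
  },
});
```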

Example App

The example/ directory contains a full Next.js demo with:

  • Prompt — send prompts to LLM models, see cache hit/miss status
  • Explorer — browse and filter all cached entries
  • History — time travel UI showing the full response timeline for a request
  • Admin — configure TTLs, invalidate entries, run cleanup
  • Normalize — see how request normalization maps variants to the same cache key

To run the example:

cd example
npm install
npx convex dev   # in one terminal
npm run dev       # in another

Testing

The suite contains 39 tests covering cache operations, normalization, TTL tiers, time travel, queries, invalidation, cleanup, config, and stats.

pnpm test

To register the component in your own tests with convex-test:

import { convexTest } from "convex-test";
import { register } from "@mzedstudio/llm-cache/test";
import schema from "./schema";

const modules = import.meta.glob("./**/*.ts");

test("my test", async () => {
  const t = convexTest(schema, modules);
  register(t, "llmCache");
  // ... test your functions that use the cache
});

License

Apache-2.0