llm-cacher

v1.0.7

Cache LLM responses — exact and semantic matching

Cache LLM responses with exact and semantic matching. Works with OpenAI, Anthropic, and any SDK that follows a similar API shape. Supports in-memory, file, Redis, SQLite, and DynamoDB storage backends.

Installation

npm install llm-cacher

Install the storage backend you need (all are optional peer dependencies):

# Redis
npm install ioredis

# SQLite
npm install better-sqlite3

# DynamoDB
npm install @aws-sdk/client-dynamodb

# Semantic caching with local model (no API key needed)
npm install @huggingface/transformers

# Semantic caching with OpenAI embeddings
npm install openai

# HNSW index (only needed for 10 000+ cached entries)
npm install hnswlib-node

Quick Start

OpenAI

import OpenAI from 'openai'
import { createCachedClient } from 'llm-cacher'

const openai = createCachedClient(new OpenAI(), {
  ttl: '24h',
  storage: 'memory',
})

// First call hits the API
const res1 = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'What is 2+2?' }],
})

// Second identical call is served from cache instantly
const res2 = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'What is 2+2?' }],
})

Anthropic

import Anthropic from '@anthropic-ai/sdk'
import { createCachedAnthropicClient } from 'llm-cacher'

const anthropic = createCachedAnthropicClient(new Anthropic(), {
  ttl: '12h',
  storage: 'sqlite',
})

const msg = await anthropic.messages.create({
  model: 'claude-opus-4-7',
  max_tokens: 1024,
  messages: [{ role: 'user', content: 'Hello!' }],
})

How It Works

Requests are cached by a SHA-256 hash of the request parameters (model, messages, temperature, etc.). The stream flag is excluded from the key so streaming and non-streaming calls share the same cache entry.
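
The key derivation described above can be sketched roughly as follows. This is an illustration only, not llm-cacher's actual internals: the helper names canonicalize and cacheKey are hypothetical, and sorting object keys before hashing is an assumption made here so that property order does not affect the key.

```typescript
import { createHash } from 'node:crypto'

type Json = string | number | boolean | null | Json[] | { [key: string]: Json }

// Hypothetical helper: serialize with object keys sorted, so two requests
// that differ only in property order produce the same string.
function canonicalize(value: Json): string {
  if (Array.isArray(value)) {
    return `[${value.map(canonicalize).join(',')}]`
  }
  if (value !== null && typeof value === 'object') {
    const obj: { [key: string]: Json } = value
    const body = Object.keys(obj)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${canonicalize(obj[k])}`)
      .join(',')
    return `{${body}}`
  }
  return JSON.stringify(value)
}

// Hypothetical helper: drop `stream` before hashing, so streaming and
// non-streaming calls map to the same key.
function cacheKey(params: { [key: string]: Json }): string {
  const rest: { [key: string]: Json } = { ...params }
  delete rest.stream
  return createHash('sha256').update(canonicalize(rest)).digest('hex')
}
```

With this sketch, cacheKey({ model, messages, stream: true }) and cacheKey({ messages, model }) return the same 64-character hex string, while any change to the messages yields a different key.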

  • Cache hit: the response is returned immediately without calling the LLM API.
  • Cache miss: the request goes to the API, the response is stored, then returned.
  • Streaming: chunks are accumulated, stored as a list, and replayed as an AsyncGenerator on subsequent calls — the caller's code doesn't need to change.
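
The streaming behaviour in the list above can be sketched as an async generator that either replays stored chunks or forwards and records live ones. The names chunkCache and cachedStream are hypothetical, and the in-process Map stands in for whichever storage backend is configured; the library's real implementation may differ.

```typescript
// Hypothetical chunk store keyed by cache key; a real backend would persist this.
const chunkCache = new Map<string, unknown[]>()

async function* cachedStream<T>(
  key: string,
  startStream: () => AsyncIterable<T>,
): AsyncGenerator<T> {
  const cached = chunkCache.get(key) as T[] | undefined
  if (cached) {
    // Cache hit: replay the stored chunks in their original order.
    for (const chunk of cached) yield chunk
    return
  }

  // Cache miss: forward each chunk to the caller while accumulating it.
  const chunks: T[] = []
  for await (const chunk of startStream()) {
    chunks.push(chunk)
    yield chunk
  }

  // Store only once the upstream stream has completed, so a stream the
  // caller abandoned halfway is never cached as if it were complete.
  chunkCache.set(key, chunks)
}
```

Because both paths yield through the same generator, the caller's for await loop is identical on a hit and on a miss.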

Storage Backends

Memory (default, no extra deps)

createCachedClient(client, {
  ttl: '1h',
  storage: 'memory',
  maxSize: 500, // max entries; evicts the least recently used (LRU) entry when full, default: 1000
})
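
For readers unfamiliar with LRU eviction, here is a minimal sketch of the idea using a Map, which iterates in insertion order. The class name LruCache is hypothetical and this is not the library's MemoryStorage; it only illustrates what a maxSize limit with LRU eviction means.

```typescript
class LruCache<V> {
  private readonly entries = new Map<string, V>()
  private readonly maxSize: number

  constructor(maxSize = 1000) {
    this.maxSize = maxSize
  }

  get(key: string): V | undefined {
    if (!this.entries.has(key)) return undefined
    const value = this.entries.get(key) as V
    // Re-insert so this key becomes the most recently used.
    this.entries.delete(key)
    this.entries.set(key, value)
    return value
  }

  set(key: string, value: V): void {
    this.entries.delete(key)
    this.entries.set(key, value)
    if (this.entries.size > this.maxSize) {
      // The first key in iteration order is the least recently used.
      const lruKey = this.entries.keys().next().value as string
      this.entries.delete(lruKey)
    }
  }

  get size(): number {
    return this.entries.size
  }
}
```

Note that a read counts as a use: an entry that is read often survives even if it was written long ago.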

File (JSON, useful for local dev and CI)

createCachedClient(client, {
  storage: 'file',
  storagePath: './llm-cacher.json',
})

Redis

import { RedisStorage } from 'llm-cacher'
import Redis from 'ioredis'

createCachedClient(client, {
  storage: new RedisStorage({ client: new Redis(), keyPrefix: 'llm:' }),
})

SQLite (great for single-process apps and scripts)

import { SQLiteStorage } from 'llm-cacher'

createCachedClient(client, {
  storage: new SQLiteStorage({ path: './llm-cacher.db' }),
  ttl: '7d',
})

DynamoDB

import { DynamoDBStorage } from 'llm-cacher'

createCachedClient(client, {
  storage: new DynamoDBStorage({
    tableName: 'llm-cache',
    region: 'us-east-1',
  }),
  ttl: '24h',
})

DynamoDB table requirements: partition key pk (String), optional TTL attribute ttl (Number). Enable TTL on the ttl attribute in the AWS console for automatic expiry.

To use different attribute names, pass keyAttribute, valueAttribute, or ttlAttribute to the constructor:

new DynamoDBStorage({
  tableName: 'llm-cache',
  region: 'us-east-1',
  keyAttribute: 'cacheKey',   // default: 'pk'
  valueAttribute: 'payload',  // default: 'value'
  ttlAttribute: 'expiresAt',  // default: 'ttl'
})

Error Handling

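By default (onStorageError: 'passthrough'), a storage failure does not break the request: if the backend is unavailable, the call falls through to the LLM API. Set onStorageError to 'throw' if storage errors should surface to the caller instead:

createCachedClient(client, {
  storage: new RedisStorage({ client: new Redis() }),
  onStorageError: 'throw', // default: 'passthrough'
})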
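
The difference between the two modes can be sketched with a small read-through helper. The function readThrough and its parameters are hypothetical and only model the documented behaviour; they are not part of llm-cacher's API.

```typescript
type OnStorageError = 'throw' | 'passthrough'

// Hypothetical helper: try the cache first, then the API.
async function readThrough<T>(
  read: () => Promise<T | null>,
  callApi: () => Promise<T>,
  onStorageError: OnStorageError = 'passthrough',
): Promise<T> {
  let cached: T | null = null
  try {
    cached = await read()
  } catch (err) {
    // 'throw': surface the storage failure to the caller.
    if (onStorageError === 'throw') throw err
    // 'passthrough': treat the failure as a cache miss and continue.
  }
  return cached ?? callApi()
}
```

In 'passthrough' mode an outage costs only the cache benefit; in 'throw' mode it fails the request, which may be preferable when unexpected API spend is worse than an error.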

Semantic Caching

Semantic caching matches similar prompts, not just identical ones. "What is 2+2?" and "What does 2 plus 2 equal?" can share the same cache entry.

Using a local model (no API key)

import { LocalEmbedder } from 'llm-cacher'

createCachedClient(client, {
  storage: 'sqlite',
  semantic: {
    embedder: new LocalEmbedder(), // downloads ~22MB model on first use
    threshold: 0.92,               // cosine similarity 0–1, higher = stricter
  },
})

Using OpenAI embeddings

import OpenAI from 'openai'
import Redis from 'ioredis'
import { OpenAIEmbedder, RedisStorage } from 'llm-cacher'

createCachedClient(client, {
  storage: new RedisStorage({ client: new Redis() }),
  semantic: {
    embedder: new OpenAIEmbedder({ client: new OpenAI() }),
    threshold: 0.95,
    indexType: 'hnsw', // 'flat' (default, O(N)) or 'hnsw' (O(log N), needs hnswlib-node)
  },
})

indexType: 'hnsw' is recommended when you expect more than ~10 000 cached entries.
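
To make the threshold and the 'flat' index concrete, here is a minimal sketch of a linear scan over stored embeddings using cosine similarity. The names cosineSimilarity, IndexedEntry, and findSemanticMatch are hypothetical, and the library's own index may be organised differently; the point is only that a flat search compares the query against every entry, which is why it is O(N).

```typescript
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB)
  return denom === 0 ? 0 : dot / denom
}

// Hypothetical shape of an indexed entry: cache key plus prompt embedding.
interface IndexedEntry {
  key: string
  embedding: number[]
}

// Flat scan: return the key of the most similar entry at or above the
// threshold, or null when nothing is similar enough (a cache miss).
function findSemanticMatch(
  query: number[],
  entries: IndexedEntry[],
  threshold = 0.92,
): string | null {
  let bestKey: string | null = null
  let bestScore = threshold
  for (const entry of entries) {
    const score = cosineSimilarity(query, entry.embedding)
    if (score >= bestScore) {
      bestKey = entry.key
      bestScore = score
    }
  }
  return bestKey
}
```

Raising the threshold reduces the chance of returning an answer to a subtly different question, at the cost of fewer hits.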


Framework Integrations

Express

import express from 'express'
import OpenAI from 'openai'
import { llmCacheMiddleware } from 'llm-cacher/express'

const app = express()
app.use(llmCacheMiddleware({ ttl: '24h', storage: 'memory' }))

app.post('/chat', async (req, res) => {
  const openai = req.withCache(new OpenAI())
  const response = await openai.chat.completions.create({
    model: 'gpt-4o',
    messages: req.body.messages,
  })
  res.json(response)
})

Hono

import { Hono } from 'hono'
import OpenAI from 'openai'
import { llmCacheMiddleware } from 'llm-cacher/hono'

const app = new Hono()
app.use(llmCacheMiddleware({ ttl: '24h', storage: 'sqlite' }))

app.post('/chat', async (c) => {
  const openai = c.get('withCache')(new OpenAI())
  // Read `messages` from the JSON body, mirroring the Express example
  const { messages } = await c.req.json()
  const response = await openai.chat.completions.create({
    model: 'gpt-4o',
    messages,
  })
  return c.json(response)
})

NestJS

// app.module.ts
import { Module } from '@nestjs/common'
import Redis from 'ioredis'
import { LlmCacheModule } from 'llm-cacher/nestjs'
import { RedisStorage } from 'llm-cacher'

@Module({
  imports: [
    LlmCacheModule.forRoot({
      ttl: '24h',
      storage: new RedisStorage({ client: new Redis() }),
      onStorageError: 'passthrough',
    }),
  ],
})
export class AppModule {}

// chat.service.ts
import { Injectable } from '@nestjs/common'
import OpenAI from 'openai'
import { LlmCacheService, InjectLlmCache } from 'llm-cacher/nestjs'

@Injectable()
export class ChatService {
  private readonly openai: OpenAI

  constructor(@InjectLlmCache() private readonly llmCache: LlmCacheService) {
    this.openai = this.llmCache.wrap(new OpenAI())
  }

  async chat(messages: OpenAI.ChatCompletionMessageParam[]) {
    return this.openai.chat.completions.create({ model: 'gpt-4o', messages })
  }
}

CLI

The CLI lets you inspect and manage cache files without writing code.

npx llm-cacher --help
Commands:
  stats   Show cache statistics
  list    List cached entries
  clear   Delete all cached entries

Options:
  --storage  Storage type: file | sqlite (default: sqlite)
  --path     Path to cache file (default: ./llm-cacher.db or ./llm-cacher.json)
  --limit    Max entries to list (default: 20)

Examples:

# SQLite stats
npx llm-cacher stats --storage sqlite --path ./llm-cacher.db

# List entries in a JSON cache
npx llm-cacher list --storage file --path ./llm-cacher.json --limit 10

# Clear SQLite cache
npx llm-cacher clear --storage sqlite --path ./llm-cacher.db

Examples

Runnable examples are in the examples/ folder. They require the OPENAI_API_KEY environment variable to be set.

| File | What it shows |
|---|---|
| basic.ts | Memory cache: first call vs cached call, timing comparison |
| streaming.ts | Streaming request on first call, chunk replay from cache on second |
| with-redis.ts | Redis storage with onStorageError: 'passthrough' |
| semantic.ts | Local embedder: different phrasings hit the same cache entry |

npx tsx examples/basic.ts
npx tsx examples/streaming.ts
npx tsx examples/semantic.ts     # needs: npm install @huggingface/transformers
npx tsx examples/with-redis.ts   # needs: Redis on localhost:6379

API Reference

createCachedClient(client, options?)

Wraps any OpenAI-compatible client with caching. Returns a Proxy with the same TypeScript type as the original.

createCachedAnthropicClient(client, options?)

Same as createCachedClient, but wraps Anthropic's messages.create.

LlmCacheOptions

| Option | Type | Default | Description |
|---|---|---|---|
| ttl | string \| number | undefined | Time-to-live. String: "24h", "30m", "7d", "500ms". Number: milliseconds. |
| storage | 'memory' \| 'file' \| 'sqlite' \| IStorage | 'memory' | Storage backend. Pass an IStorage instance for Redis/DynamoDB. |
| storagePath | string | see below | File path for 'file' (default ./llm-cacher.json) or 'sqlite' (default ./llm-cacher.db). |
| maxSize | number | 1000 | Max entries for 'memory' storage. |
| onStorageError | 'throw' \| 'passthrough' | 'passthrough' | Behaviour when storage read/write fails. |
| semantic | SemanticOptions | undefined | Enable semantic matching. |

SemanticOptions

| Option | Type | Default | Description |
|---|---|---|---|
| embedder | IEmbedder | required | Embedding model to use. |
| threshold | number | 0.92 | Minimum cosine similarity (0–1) to count as a cache hit. |
| indexType | 'flat' \| 'hnsw' | 'flat' | Search index. Use 'hnsw' for large caches (10k+ entries). |

Storage classes

| Class | Package | Description |
|---|---|---|
| MemoryStorage | built-in | LRU in-memory cache. |
| FileStorage | built-in | JSON file. |
| RedisStorage | ioredis | Redis via ioredis. |
| SQLiteStorage | better-sqlite3 | SQLite file. |
| DynamoDBStorage | @aws-sdk/client-dynamodb | AWS DynamoDB. |

Resource cleanup: Call the appropriate method when your process shuts down to release connections and background timers:

memoryStorage.destroy()   // stops the expiry sweep timer
sqliteStorage.close()     // closes the SQLite connection
await redisStorage.quit() // disconnects from Redis

Embedders

| Class | Package | Dimensions | Description |
|---|---|---|---|
| LocalEmbedder | @huggingface/transformers | 384 | all-MiniLM-L6-v2, runs locally. |
| OpenAIEmbedder | openai | 1536 | text-embedding-3-small. |


Configuration

TTL format

| String | Meaning |
|---|---|
| "500ms" | 500 milliseconds |
| "30m" | 30 minutes |
| "24h" | 24 hours |
| "7d" | 7 days |

A numeric value is treated as milliseconds.
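
As an illustration of the format, a TTL value could be normalised to milliseconds along these lines. The function parseTtl is hypothetical, not the library's parser, and it handles only the four units documented above (ms, m, h, d); whether llm-cacher accepts other units is not stated here.

```typescript
// Milliseconds per documented unit.
const UNIT_MS: Record<string, number> = {
  ms: 1,
  m: 60 * 1000,
  h: 60 * 60 * 1000,
  d: 24 * 60 * 60 * 1000,
}

// Hypothetical helper: numbers pass through as milliseconds, strings
// must be a whole number followed by one of the documented units.
function parseTtl(ttl: string | number): number {
  if (typeof ttl === 'number') return ttl
  const match = /^(\d+)(ms|m|h|d)$/.exec(ttl.trim())
  if (!match) throw new Error(`Unsupported TTL format: ${ttl}`)
  return Number(match[1]) * UNIT_MS[match[2]]
}
```

For example, under this sketch parseTtl('30m') and parseTtl(1800000) describe the same lifetime.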

Custom storage

Implement IStorage to plug in any backend:

import type { IStorage, CacheEntry } from 'llm-cacher'

class MyStorage implements IStorage {
  async get(key: string): Promise<CacheEntry | null> { ... }
  async set(key: string, entry: CacheEntry): Promise<void> { ... }
  async delete(key: string): Promise<void> { ... }
  async clear(): Promise<void> { ... }
}

createCachedClient(client, { storage: new MyStorage() })
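
As a starting point, a custom backend might look like the Map-backed sketch below. To keep it self-contained, IStorage and CacheEntry are declared locally as stand-ins; in real code you would import them from 'llm-cacher'. The CacheEntry fields shown here (value, expiresAt) are assumptions for illustration, so check the exported type before relying on them.

```typescript
// Local stand-ins; the real types come from 'llm-cacher'.
interface CacheEntry {
  value: unknown
  expiresAt?: number // assumed field: expiry as epoch milliseconds
}

interface IStorage {
  get(key: string): Promise<CacheEntry | null>
  set(key: string, entry: CacheEntry): Promise<void>
  delete(key: string): Promise<void>
  clear(): Promise<void>
}

// Hypothetical backend: swap the Map for your own store or client.
class MapStorage implements IStorage {
  private readonly entries = new Map<string, CacheEntry>()

  async get(key: string): Promise<CacheEntry | null> {
    const entry = this.entries.get(key)
    if (!entry) return null
    // Drop entries whose (assumed) expiry time has passed.
    if (entry.expiresAt !== undefined && entry.expiresAt <= Date.now()) {
      this.entries.delete(key)
      return null
    }
    return entry
  }

  async set(key: string, entry: CacheEntry): Promise<void> {
    this.entries.set(key, entry)
  }

  async delete(key: string): Promise<void> {
    this.entries.delete(key)
  }

  async clear(): Promise<void> {
    this.entries.clear()
  }
}
```

Returning null from get for both missing and expired entries keeps the calling code simple: either case is just a cache miss.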

License

MIT

❤️ Support

If llm-cacher saves you money on LLM bills, consider supporting development.