@krishgupta/ai-prompt-cache

Middleware for the Vercel AI SDK that enables prompt caching across providers. Benchmarks show a 50%+ reduction in time-to-first-token (TTFT).

Features

  • Multi-provider prompt caching (OpenAI, Anthropic, Bedrock, Gemini)
  • Full response caching with streaming replay
  • In-flight request coalescing to deduplicate concurrent calls
  • Key sharding for high-QPS scenarios
  • Observability hooks for tracking cache hits and TTFT

Installation

npm install @krishgupta/ai-prompt-cache

Usage

import { wrapLanguageModel, streamText } from 'ai';
import { openai } from '@ai-sdk/openai';
import { withPromptCache } from '@krishgupta/ai-prompt-cache';

const model = wrapLanguageModel({
  model: openai('gpt-4o'),
  middleware: withPromptCache({
    select: 'system-head',
    extraKeySalt: 'my-app-v1',
  }),
});

const result = await streamText({
  model,
  messages: [
    { role: 'system', content: 'You are a helpful assistant...' },
    { role: 'user', content: 'Hello!' },
  ],
});
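
With the default select: 'system-head', only the leading system message is hashed into the cache key, so user turns can vary from request to request without invalidating the cached prefix.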

Benchmark Results

Tested with OpenAI gpt-4o and a ~1200-line system prompt:

| Request | Mode       | TTFT    |
|---------|------------|---------|
| 1       | Baseline   | 2223 ms |
| 2       | With Cache | 1090 ms |

Result: 51% faster TTFT

Server-side metrics confirmed that cache keys were generated consistently across requests, with TTFT dropping from 377 ms to 337 ms on subsequent cached requests.

Supported Providers

  • OpenAI (via promptCacheKey)
  • Anthropic (via cacheControl markers)
  • AWS Bedrock (via cachePoint)
  • Google Gemini (implicit caching)
  • OpenAI-compatible APIs

Options

| Option        | Default     | Description |
|---------------|-------------|-------------|
| select        | system-head | Which prefix to cache. Options: system-head, tools+system, or a custom function |
| extraKeySalt  | undefined   | Additional data to include in the cache key (useful for RAG chunk IDs) |
| onCacheResult | undefined   | Callback invoked with a cache report after the request completes |
| debug         | false       | Enable debug logging |
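
As a hedged example, the remaining options can be wired together as below. The sketch assumes onCacheResult receives the cache report as its single argument; the report's exact shape isn't documented here, so it is logged wholesale, and the salt value is a hypothetical placeholder:

import { wrapLanguageModel } from 'ai';
import { openai } from '@ai-sdk/openai';
import { withPromptCache } from '@krishgupta/ai-prompt-cache';

const model = wrapLanguageModel({
  model: openai('gpt-4o'),
  middleware: withPromptCache({
    select: 'tools+system',          // cache the tool definitions plus the system prompt
    extraKeySalt: 'rag-chunks-v42',  // hypothetical salt, e.g. the ID of the active RAG chunk set
    debug: true,                     // verbose logging while tuning
    onCacheResult: (report) => {
      // The report's shape is package-defined; forward it to your metrics pipeline.
      console.log('[ai-prompt-cache]', report);
    },
  }),
});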

Response Caching

For full response caching with streaming replay:

import { withPromptCache, withResponseCache, MemoryStore } from '@krishgupta/ai-prompt-cache';

const store = new MemoryStore({ maxSize: 1000 });

const model = wrapLanguageModel({
  model: openai('gpt-4o'),
  middleware: [
    withPromptCache({ select: 'system-head' }),
    withResponseCache({ store, ttlSeconds: 3600 }),
  ],
});
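
Since the package advertises streaming replay, an identical second call within ttlSeconds should be served from the store rather than the provider. A sketch of that flow, reusing the model above and assuming identical messages map to the same response-cache key:

import { streamText, type CoreMessage } from 'ai';

const messages: CoreMessage[] = [
  { role: 'system', content: 'You are a helpful assistant...' },
  { role: 'user', content: 'Summarize the release notes.' },
];

// First call reaches the provider and populates the store.
const first = await streamText({ model, messages });
for await (const chunk of first.textStream) process.stdout.write(chunk);

// An identical call within ttlSeconds can be replayed from the store.
const second = await streamText({ model, messages });
for await (const chunk of second.textStream) process.stdout.write(chunk);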

Cache Stores

Built-in stores:

// In-memory LRU cache
import { MemoryStore } from '@krishgupta/ai-prompt-cache';
const store = new MemoryStore({ maxSize: 1000 });

// File-based cache (for development)
import { FileStore } from '@krishgupta/ai-prompt-cache';
const store = new FileStore({ directory: '.cache' });

Custom stores need to implement:

interface CacheStore {
  get(key: string): Promise<string | null>;
  set(key: string, value: string, ttlSeconds?: number): Promise<void>;
}
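
For instance, a store shared across processes could be backed by Redis. The sketch below is hypothetical (Redis and the ioredis client are not part of this package); TypeScript checks the interface structurally, so the class can be passed anywhere a store is expected:

import Redis from 'ioredis';

// Hypothetical Redis-backed store. It matches the CacheStore interface above
// structurally, so it can be passed to withResponseCache like any built-in store.
class RedisStore {
  private redis = new Redis(process.env.REDIS_URL ?? 'redis://localhost:6379');

  async get(key: string): Promise<string | null> {
    return this.redis.get(key);
  }

  async set(key: string, value: string, ttlSeconds?: number): Promise<void> {
    if (ttlSeconds) {
      await this.redis.set(key, value, 'EX', ttlSeconds);
    } else {
      await this.redis.set(key, value);
    }
  }
}

const store = new RedisStore();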

How It Works

The middleware generates a stable SHA-256 hash of the cacheable prefix (system prompt by default) and passes it to the provider:

  • OpenAI: Sets promptCacheKey in provider options
  • Anthropic: Adds cacheControl markers to messages
  • Bedrock: Inserts cachePoint for Claude models

OpenAI caches prompts with 1024+ tokens and reuses them in 128-token increments. Anthropic requires similar minimum thresholds depending on the model.
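
The key-derivation step described above is straightforward to picture. A minimal illustrative sketch using Node's crypto module, not the package's exact internals:

import { createHash } from 'node:crypto';

// Derive a stable key from the cacheable prefix plus an optional salt.
// Illustrative only; the package's actual derivation may differ.
function cacheKeyFor(prefix: string, extraKeySalt = ''): string {
  return createHash('sha256').update(prefix).update(extraKeySalt).digest('hex');
}

// The same prefix always yields the same key, which is what lets
// providers recognize and reuse the cached prompt across requests.
const key = cacheKeyFor('You are a helpful assistant...', 'my-app-v1');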

Provider Requirements

| Provider                | Minimum Tokens |
|-------------------------|----------------|
| OpenAI                  | 1024 |
| Anthropic Claude Sonnet | 1024 |
| Anthropic Claude Haiku  | 2048 |

License

MIT