prompt-cache-optimizer
v0.2.1
Published
Drop-in wrapper for the Anthropic SDK that auto-places cache_control breakpoints, measures cache hit rate, and explains exactly why your prompt caching silently broke.
Maintainers
Readme
prompt-cache-optimizer
Drop-in wrapper for the Anthropic SDK that makes prompt caching effortless. Auto-places cache_control breakpoints based on observed prompt stability, measures real cache hit rate from the response usage object, and explains exactly what changed when your cache silently breaks.

Real output from
bun run example. Six calls — autoCache marks the system prompt cacheable after observing it twice, calls 3–5 hit the cache (~1569 cached tokens each), and a final deliberate drift triggers the diagnostic showing the exact characters that changed.client.stability()reportssystem score=0.80cumulative across the run.
Status: v0.2 — auto-placement + cache-miss diagnostics + per-segment stability report. Backwards compatible with v0.1.
Why this exists
Anthropic prompt caching gives you a 90% discount on the cached portion of your prompt. But the API is finicky:
- A misplaced
cache_controlbreakpoint silently degrades to a full-price call - You only get 4 breakpoints per request — they have to be spent well
- Cache prefixes break if message order shifts even slightly
- The default TTL is 5 minutes; lots of setups silently regress when calls come in slower than that
- The only way to know it's working is to parse
cache_read_input_tokensyourself
prompt-cache-optimizer handles all of that for you.
Install
npm install prompt-cache-optimizer @anthropic-ai/sdk
# or
bun add prompt-cache-optimizer @anthropic-ai/sdkQuick start (v0.2 — auto-placement)
import { CachedAnthropic } from "prompt-cache-optimizer";
const client = new CachedAnthropic({
apiKey: process.env.ANTHROPIC_API_KEY!,
autoCache: true, // ← let the wrapper place cache_control for you
diagnoseMisses: true, // ← explain what changed when the cache misses
warnIfHitRateBelow: 0.6,
});
// Use the SDK exactly like normal. No placeBreakpoints() needed.
const response = await client.messages.create({
model: "claude-sonnet-4-6",
max_tokens: 1024,
system: longSystemPrompt,
messages: conversation,
});
console.log(response.cacheInfo);
// { hit: true, cachedTokens: 8420, uncachedTokens: 312, dollarsSaved: 0.024, ... }
console.log(client.stats());
// { totalCalls: 1, hitRate: 1, totalCachedTokens: 8420, dollarsSaved: 0.024, ... }
console.log(client.stability());
// { entries: [{ segment: 'system', stabilityScore: 1, approxTokens: 2103, ... }], ... }The first call always misses (that's when the cache is written). Once the wrapper has seen the system prompt twice unchanged, it auto-marks it cacheable and subsequent calls hit. No code changes needed when your prompt shape evolves — auto-placement re-evaluates each call.
How auto-placement decides what to cache
On every call the wrapper:
- Fingerprints each candidate segment —
system,tools, and every cumulativemessages[0..N]prefix — using SHA-256 over a canonical form (cache_control markers stripped so they don't affect the hash). - Tracks the fingerprint history per segment.
- Once a segment has been seen unchanged for at least
autoCacheMinObservationsconsecutive calls (default2), it qualifies for auto-placement. - Picks the highest-value placements within Anthropic's 4-breakpoint budget: system first, then tools, then the longest stable message prefix.
You can inspect this state live with client.stability().
Manual breakpoint placement (still supported)
If you want explicit control, placeBreakpoints from v0.1 still works exactly as before. Auto-placement is a no-op whenever you've already marked anything cacheable yourself — your intent is always respected.
import { placeBreakpoints } from "prompt-cache-optimizer";
const { system, messages } = placeBreakpoints({
system: longSystemPrompt,
messages: conversation,
strategy: "after-system",
});
await client.messages.create({ model, max_tokens, system, messages });Three strategies are available:
after-system— cache the system prompt (best for RAG and long instructions)after-last-assistant— cache the conversation history (best for chat)system-and-history— cache both (uses 2 of your 4 breakpoints)
Stats
client.stats();
// {
// totalCalls: 142,
// cacheHits: 124,
// hitRate: 0.873,
// totalCachedTokens: 1_240_000,
// totalUncachedTokens: 52_400,
// totalCacheWriteTokens: 21_000,
// dollarsSaved: 3.72,
// dollarsSpent: 1.41,
// }Cache-miss diagnostics
Enable diagnoseMisses: true and every cache-write-without-read warning gets a structured diff explaining what changed. Example:
new CachedAnthropic({
apiKey,
diagnoseMisses: true,
onWarning: (event) => {
if (event.code === "cache-write-without-read") {
console.error(event.message);
// → "...Detected: system prompt changed at character 1240: ...the docs as of [Tuesday|Wednesday]..."
console.error(event.detail?.diff);
// → [{ segment: 'system', summary: '...', detail: { changeIndex: 1240, ... } }]
}
},
});Common things it catches:
- system prompt drift (inserted timestamps, dynamic context)
- tool order changes
- retrieved-document reordering
- TTL expiration (cache was fine, then nobody called within 5 minutes)
Warnings
The client emits passive warnings (never throws, never blocks a request):
no-cache-control-found— you forgot to mark anything cacheable AND auto-cache hasn't activated yetcache-write-without-read— your prefix changed call-over-call; cache is broken (carries a diff whendiagnoseMisses: true)low-hit-rate— rolling hit rate fell below your thresholdunknown-model— pricing unknown, so dollar accounting is skippedauto-placement-applied— info-level: the wrapper just placed cache_control on a newly-stable segment
Route them anywhere:
new CachedAnthropic({
apiKey,
onWarning: (event) => logger.warn(event),
});Roadmap
- ~~v0.2 — auto-placement of
cache_controlbreakpoints based on observed prompt stability~~ ✅ shipped - v0.3 — safe message and tool reordering to maximize the stable prefix
- v0.4 — OpenAI and Gemini prompt caching support
- v1.0 — persistent stats adapter, middleware mode
Zero runtime dependencies
@anthropic-ai/sdk is a peer dependency. prompt-cache-optimizer itself has zero runtime deps. v0.2 uses Node's built-in node:crypto for fingerprinting.
Contributing
PRs welcome — see CONTRIBUTING.md.
Support this project
If this package saved you money on your Anthropic bill, consider buying me a coffee. This project is MIT-licensed and free forever; sponsorship just helps me spend more time on it.
License
MIT © Leonhail Paypa
⭐ If this package saved you money on your Anthropic bill, please star the repo. It's the single biggest signal that helps other developers find it.
