
@mcowger/pi-better-messages-cache

v1.5.0


Downloads

163


pi-better-messages-cache

A pi extension that implements the dual cache-breakpoint strategy for Anthropic models, dramatically improving prompt-cache hit rates on MiniMax, Kimi, and other Anthropic-compatible providers.

It also fixes a streaming control-character bug where the Anthropic SDK's stream() crashes on raw \t / \n bytes inside tool-call JSON, leaving tool arguments as {} and producing unrecoverable error results.

This implements the optimization proposed in badlogic/pi-mono#1737, which the upstream maintainer declined to merge into core.


The problem

The built-in Anthropic provider places its message-level cache_control marker on the last user message block only. On some providers, notably MiniMax and Kimi, the preceding assistant tool_use and thinking blocks then fall outside the cached window, so most of the prompt is reprocessed as a cache miss on almost every turn:

turn N
  [assistant]  thinking …          ← NOT cached ✗
               tool_use foo        ← NOT cached ✗
  [user]       tool_result foo     ← cache_control ✓  (only marker)

turn N+1
  The cache window starts at tool_result, missing the assistant blocks above.

The fix

Mark two locations per turn:

| Location | Who marks it |
|---|---|
| Last assistant tool_use block | This extension (new) |
| Last user message block | Built-in provider (preserved) |

Both markers together ensure the full assistant turn (thinking + tool_use + tool_result) sits inside the growing cached prefix on every subsequent call:

turn N
  [assistant]  thinking …
               tool_use foo  ← cache_control ✓  (marker 1 — NEW)
  [user]       tool_result foo  ← cache_control ✓  (marker 2 — existing)

turn N+1
  The cache window now covers the entire assistant turn above.

This dual-marking pattern aligns with the cache strategies used by OpenCode, Kilo Code, and Roo Code.
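The two markers described above can be sketched roughly as follows. This is a minimal illustration, not the extension's actual code: the `Block` and `Message` shapes are simplified stand-ins for the Anthropic Messages API types, and `applyDualBreakpoints` and `findLast` are hypothetical names chosen for this example.

```typescript
// Simplified, hypothetical message shapes; the real API types carry more fields.
type CacheControl = { type: "ephemeral" };

interface Block {
  type: "text" | "thinking" | "tool_use" | "tool_result";
  cache_control?: CacheControl;
}

interface Message {
  role: "user" | "assistant";
  content: Block[];
}

// Returns the last item matching `pred`, or undefined if none matches.
function findLast<T>(items: T[], pred: (item: T) => boolean): T | undefined {
  for (let i = items.length - 1; i >= 0; i--) {
    const item = items[i];
    if (item !== undefined && pred(item)) return item;
  }
  return undefined;
}

// Returns a copy of `messages` with two cache markers added:
//   marker 1: the last assistant tool_use block in the conversation
//   marker 2: the last block of the last non-empty user message
function applyDualBreakpoints(messages: Message[]): Message[] {
  const copy: Message[] = messages.map((m) => ({
    role: m.role,
    content: m.content.map((b) => ({ ...b })),
  }));

  // Collect assistant blocks in conversation order (same object references).
  const assistantBlocks: Block[] = [];
  for (const m of copy) {
    if (m.role === "assistant") assistantBlocks.push(...m.content);
  }

  const lastToolUse = findLast(assistantBlocks, (b) => b.type === "tool_use");
  if (lastToolUse) lastToolUse.cache_control = { type: "ephemeral" };

  const lastUser = findLast(copy, (m) => m.role === "user" && m.content.length > 0);
  const lastUserBlock = lastUser ? lastUser.content[lastUser.content.length - 1] : undefined;
  if (lastUserBlock) lastUserBlock.cache_control = { type: "ephemeral" };

  return copy;
}
```

The sketch copies the input rather than mutating it, so the caller's message history stays free of markers and a fresh pair can be computed for each request.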

Streaming control-character fix

The Anthropic SDK's stream() method parses each SSE event with bare JSON.parse. When the model emits raw tab (\t) or newline (\n) bytes inside a tool-call JSON argument (e.g. tab-indented oldText in an Edit call), JSON.parse throws:

Bad control character in string literal in JSON at position N

This error propagates out of the stream, ends it before content_block_stop fires, and leaves the tool-call arguments as {}. The resulting error is then reported as the tool result, and the model cannot meaningfully retry because the original arguments were lost.

Fix: this extension replaces client.messages.stream() with client.messages.create().asResponse() to get the raw HTTP response, then parses SSE events using parseJsonWithRepair (from @earendil-works/pi-ai), which escapes raw control characters before handing off to JSON.parse. Streaming tool-call argument accumulation also uses parseStreamingJson instead of bare JSON.parse, ensuring partial JSON with control characters is handled correctly throughout the stream lifecycle.
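To illustrate the repair idea, here is a minimal sketch of escaping raw control characters inside JSON string literals before calling JSON.parse. It is not the implementation of parseJsonWithRepair, whose internals are not shown in this README; `escapeRawControlChars` and `escapeFor` are hypothetical names. The sketch relies only on the JSON grammar rule that characters below U+0020 must be escaped inside strings.

```typescript
// Maps one raw control character to its JSON escape sequence.
function escapeFor(ch: string, code: number): string {
  switch (ch) {
    case "\t":
      return "\\t";
    case "\n":
      return "\\n";
    case "\r":
      return "\\r";
    default:
      // Generic \uXXXX form for any other character below U+0020.
      return "\\u" + ("0000" + code.toString(16)).slice(-4);
  }
}

// Escapes raw control characters that appear inside string literals.
// Whitespace between tokens (outside strings) is left untouched, and
// sequences that are already escaped (for example \t written as
// backslash + t) pass through unchanged.
function escapeRawControlChars(json: string): string {
  let out = "";
  let inString = false;
  let afterBackslash = false; // previous character was an unconsumed backslash

  for (let i = 0; i < json.length; i++) {
    const ch = json.charAt(i);
    const code = json.charCodeAt(i);

    if (!inString) {
      if (ch === '"') inString = true;
      out += ch;
    } else if (afterBackslash) {
      afterBackslash = false;
      out += ch;
    } else if (ch === "\\") {
      afterBackslash = true;
      out += ch;
    } else if (ch === '"') {
      inString = false;
      out += ch;
    } else if (code < 0x20) {
      out += escapeFor(ch, code);
    } else {
      out += ch;
    }
  }
  return out;
}
```

With this in place, a payload such as `'{"oldText":"\tindented"}'` (containing a real tab byte) can be parsed via `JSON.parse(escapeRawControlChars(payload))`, whereas bare JSON.parse rejects it.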

Anthropic cache breakpoint limit

Anthropic-compatible APIs allow a maximum of 4 total blocks with cache_control in a single request.

That limit applies across the entire payload, including:

  • system prompt blocks
  • assistant tool_use blocks
  • user / tool_result blocks

In longer multi-turn conversations, a naive dual-marking strategy can accidentally exceed that limit and trigger errors like:

A maximum of 4 blocks with cache_control may be provided. Found 5.

To prevent this, this extension now enforces the limit before sending the request:

  • keep system prompt cache markers intact
  • keep the newest message-level cache breakpoints
  • remove older message-level cache breakpoints first

This preserves the most useful recent cache anchors while ensuring requests never exceed Anthropic's hard cap.
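The pruning rules above could look roughly like the sketch below. The payload shape is a simplified assumption, and `enforceBreakpointLimit` and `MAX_BREAKPOINTS` are hypothetical names for illustration; the extension's real logic may differ in detail.

```typescript
// Simplified, hypothetical request shapes for illustration only.
type CacheControl = { type: "ephemeral" };

interface Block {
  type: string;
  cache_control?: CacheControl;
}

interface Message {
  role: "user" | "assistant";
  content: Block[];
}

interface RequestPayload {
  system: Block[];
  messages: Message[];
}

const MAX_BREAKPOINTS = 4;

// Returns a copy of the payload with at most MAX_BREAKPOINTS markers,
// provided the system prompt alone does not already exceed the cap.
// System markers are never removed; message-level markers are dropped
// oldest first.
function enforceBreakpointLimit(payload: RequestPayload): RequestPayload {
  const system: Block[] = payload.system.map((b) => ({ ...b }));
  const messages: Message[] = payload.messages.map((m) => ({
    role: m.role,
    content: m.content.map((b) => ({ ...b })),
  }));

  const systemCount = system.filter((b) => b.cache_control !== undefined).length;
  let budget = Math.max(0, MAX_BREAKPOINTS - systemCount);

  // Walk from the newest block to the oldest: the newest markers consume
  // the remaining budget, and any older marker beyond it is removed.
  for (let i = messages.length - 1; i >= 0; i--) {
    const msg = messages[i];
    if (msg === undefined) continue;
    for (let j = msg.content.length - 1; j >= 0; j--) {
      const block = msg.content[j];
      if (block === undefined || block.cache_control === undefined) continue;
      if (budget > 0) {
        budget--;
      } else {
        delete block.cache_control;
      }
    }
  }

  return { system, messages };
}
```

Walking newest to oldest means the budget is always spent on the most recent anchors, which are the ones most likely to match the prefix of the next request.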

Empirical impact (from PR #1737 field data)

| Provider | Before | After |
|---|---|---|
| MiniMax / Kimi | near-zero cache hits | 80%+ cache hit rate |
| Anthropic native | baseline | small positive improvement |

Figure: Built-in pi caching, showing the "cache hit wall" (MiniMax)

Note: The orange cache-hit line flatlines at ~4.2K cache hits (the "cache hit wall"), while the cache-miss line continues climbing.

Figure: With the pi-better-messages-cache extension, showing drastically improved cache hits

Note: Cache hits continue climbing throughout the session. The orange line no longer flatlines, which is the intended behavior of the dual cache-breakpoint strategy.


How it works

pi.registerProvider("anthropic", { api: "anthropic-messages", streamSimple }) replaces the global api-registry entry for the "anthropic-messages" API type. This transparently intercepts every model that uses that API — all native Anthropic models — without touching any model definitions, pricing, OAuth config, or other settings.

The custom streamSimple handler:

  1. Applies dual cache breakpoints — marks both the last assistant tool_use block and the last user message block with cache_control.
  2. Enforces the 4-breakpoint limit — keeps system prompt markers and the newest message-level breakpoints, removing older ones first.
  3. Streams via raw HTTP + custom SSE parser — uses client.messages.create().asResponse() instead of the SDK's stream(), then parses SSE events with parseJsonWithRepair to handle raw control characters in tool-call JSON.
  4. Uses parseStreamingJson for argument accumulation — ensures partial tool-call JSON containing control characters is parsed correctly throughout the stream, not just at the end.
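As an illustration of step 3, the sketch below splits buffered server-sent-event text into complete events, following the standard SSE framing (fields on separate lines, events separated by a blank line). It is an assumption-laden simplification: `parseSseBuffer` and `SseEvent` are hypothetical names, the sketch works on already-decoded text rather than the raw HTTP body, and it leaves each data payload as a string where the extension would pass it to parseJsonWithRepair.

```typescript
interface SseEvent {
  event: string; // value of the "event:" field, "message" if absent
  data: string; // "data:" lines joined with newlines, still unparsed
}

// Splits buffered SSE text into complete events plus the unfinished tail.
// Callers append newly received text to `rest` and call this again.
function parseSseBuffer(buffer: string): { events: SseEvent[]; rest: string } {
  const normalized = buffer.replace(/\r\n/g, "\n");
  const parts = normalized.split("\n\n");
  // The final part is incomplete until its terminating blank line arrives.
  const rest = parts.pop() ?? "";

  const events: SseEvent[] = [];
  for (const part of parts) {
    let event = "message";
    const dataLines: string[] = [];
    for (const line of part.split("\n")) {
      if (line.startsWith("event:")) {
        event = line.slice("event:".length).trim();
      } else if (line.startsWith("data:")) {
        // The SSE format strips a single leading space after the colon.
        dataLines.push(line.slice("data:".length).replace(/^ /, ""));
      }
      // Comment lines (starting with ":") and other fields are ignored.
    }
    if (dataLines.length > 0) {
      events.push({ event, data: dataLines.join("\n") });
    }
  }
  return { events, rest };
}
```

Keeping the payload as a string at this layer separates framing from decoding, so a tolerant JSON parser can be swapped in without touching the event splitter.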

Installation

# Global install (all projects)
pi install npm:@mcowger/pi-better-messages-cache

# Project-local install
pi install -l npm:@mcowger/pi-better-messages-cache

Try without installing

pi -e npm:@mcowger/pi-better-messages-cache

From git (latest unreleased)

pi install git:github.com/mcowger/pi-better-messages-cache

Requirements

  • pi (any recent version)
  • @earendil-works/pi-coding-agent and @earendil-works/pi-ai (bundled with pi, listed as peerDependencies)

Uninstalling

pi remove npm:@mcowger/pi-better-messages-cache

This restores the built-in Anthropic stream handler automatically.


License

MIT © mcowger