llm-hedge

v0.2.0

An execution layer that hides LLM tail latency with hedging, speculation, prefetch, and cancellation. It wraps an Anthropic-compatible client with a small set of domain-agnostic primitives so the slowest request stops setting the pace.

The library owns the mechanism (race, hedge, queue, retry, timeout, cancel, JSON extraction). Policy — what to make redundant, which candidates to race, whether a result is acceptable — stays with the caller and is injected through callbacks and generics. llm-hedge knows nothing about your domain types.

Reference implementation: among-ai drives an entire browser werewolf simulation through these primitives — speculative speaker races, hedged decision calls, a shared admission queue, and trace-based latency probing. See src/game/engine.ts and src/game/agents.ts there.

Install

pnpm add llm-hedge @anthropic-ai/sdk

@anthropic-ai/sdk is a peer dependency. The client targets any Anthropic-compatible baseURL (Anthropic, z.ai, OpenAI-compatible gateways) — provider selection is the caller's concern.

Primitives

Races — hide tail latency

import { hedge, raceCandidates } from "llm-hedge";

// Redundancy: run the SAME call N times, take the first success, abort the rest.
const answer = await hedge((ctx) => callModel({ signal: ctx.signal }), { slots: 3 });

// Speculation: run DIFFERENT candidates, adopt the first that finishes.
const { item, value } = await raceCandidates(
  speakers,
  (speaker, ctx) => generateSpeech(speaker, { signal: ctx.signal }),
  {
    onLosersAborted: ({ winner, losers, raceSize }) => {
      // emit a diagnostic — this hook is where your policy lives
    }
  }
);

Each attempt gets its own AbortSignal; when one attempt wins, the others are aborted. The race rejects only if every attempt rejects, in which case it rejects with the last error. An optional outer signal, when aborted, cancels all in-flight attempts.
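To make those race semantics concrete, here is a minimal, self-contained sketch of a first-success race with per-attempt abort. It is an illustration of the technique only, not llm-hedge's source: `raceFirstSuccess` and `fakeCall` are hypothetical names, and the real `hedge` / `raceCandidates` may differ in detail.

```typescript
// Illustrative sketch only; names are hypothetical and not part of llm-hedge.
type Attempt<T> = (ctx: { signal: AbortSignal }) => Promise<T>;

async function raceFirstSuccess<T>(
  attempts: Attempt<T>[],
  outer?: AbortSignal
): Promise<{ index: number; value: T }> {
  if (attempts.length === 0) throw new Error("no attempts");
  // One controller per attempt so losers can be aborted individually.
  const controllers = attempts.map(() => new AbortController());
  const abortAll = () => controllers.forEach((c) => c.abort());
  if (outer?.aborted) abortAll();
  outer?.addEventListener("abort", abortAll, { once: true });

  return new Promise((resolve, reject) => {
    let failures = 0;
    let settled = false;
    attempts.forEach((run, index) => {
      // Promise.resolve().then(...) also catches attempts that throw synchronously.
      Promise.resolve()
        .then(() => run({ signal: controllers[index].signal }))
        .then(
          (value) => {
            if (settled) return;
            settled = true;
            outer?.removeEventListener("abort", abortAll);
            // First success wins; abort every other attempt.
            controllers.forEach((c, i) => {
              if (i !== index) c.abort();
            });
            resolve({ index, value });
          },
          (error) => {
            failures += 1;
            // Reject only once every attempt has failed, with the last error seen.
            if (!settled && failures === attempts.length) {
              settled = true;
              outer?.removeEventListener("abort", abortAll);
              reject(error);
            }
          }
        );
    });
  });
}

// Hypothetical stand-in for a model call: settles after `ms` unless aborted first.
function fakeCall(ms: number, result: string, fail = false): Attempt<string> {
  return ({ signal }) =>
    new Promise<string>((resolve, reject) => {
      if (signal.aborted) return reject(new Error("aborted"));
      const timer = setTimeout(
        () => (fail ? reject(new Error(result)) : resolve(result)),
        ms
      );
      signal.addEventListener(
        "abort",
        () => {
          clearTimeout(timer);
          reject(new Error("aborted"));
        },
        { once: true }
      );
    });
}
```

Passing the same function several times gives the redundancy case; passing different functions gives the speculation case. The sketch only aborts losers cooperatively, so each attempt has to honor its signal for the cancellation to save any work.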

Admission queue — bound concurrency and rate

import { createLlmQueue } from "llm-hedge";

const queue = createLlmQueue({
  concurrency: 5,        // or () => number for dynamic limits
  minIntervalMs: 0,
  onTrace: (event) => {  // queued / started / finished / aborted_waiting
    metrics.record(event);
  }
});

const release = await queue.acquire(signal, { model, maxTokens: 384, label: "speech" });
try {
  /* ... call the model ... */
} finally {
  release(); // idempotent
}

Waiting entries are cancellable via their signal and are dropped from the queue without ever consuming a slot.
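The cancellable-waiter behaviour can be sketched as a small counting semaphore. This is an assumption-laden illustration, not the library's queue: `SlotQueue`, `inFlight`, and `waiting` are hypothetical names, and the rate limiting (`minIntervalMs`) and tracing of the real queue are left out.

```typescript
// Illustrative sketch only; SlotQueue is a hypothetical name, not llm-hedge's queue.
type Release = () => void;

class SlotQueue {
  private active = 0;
  private waiters: Array<() => void> = [];

  constructor(private readonly limit: number) {}

  acquire(signal?: AbortSignal): Promise<Release> {
    return new Promise<Release>((resolve, reject) => {
      if (signal?.aborted) return reject(new Error("aborted while waiting"));

      const grant = () => {
        signal?.removeEventListener("abort", onAbort);
        this.active += 1;
        let released = false;
        resolve(() => {
          if (released) return; // release is idempotent
          released = true;
          this.active -= 1;
          this.waiters.shift()?.(); // hand the freed slot to the next waiter
        });
      };

      const onAbort = () => {
        // Drop the waiter from the queue; it never consumed a slot.
        this.waiters = this.waiters.filter((w) => w !== grant);
        reject(new Error("aborted while waiting"));
      };

      if (this.active < this.limit) return grant();
      this.waiters.push(grant);
      signal?.addEventListener("abort", onAbort, { once: true });
    });
  }

  get inFlight(): number {
    return this.active;
  }

  get waiting(): number {
    return this.waiters.length;
  }
}
```

Removing an aborted waiter immediately, rather than skipping it lazily when its turn comes, keeps the queue length an honest measure of pending demand.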

Client + retry + timeout gate

import { createLlmClient, completeWithRetry } from "llm-hedge";

const client = createLlmClient({ apiKey, baseUrl, timeoutMs: 120_000 });

const text = await completeWithRetry({
  client,
  params: { model, system, messages, max_tokens: 384, temperature: 0.8 },
  queue,
  timeoutMs: 120_000,
  signal,
  label: "decision"
});

Each attempt acquires a queue slot, races the request against a timeout + external-cancellation gate, and releases the slot in finally. Retries are bounded (DEFAULT_LLM_ATTEMPTS) and gated by isRetryableError (429 / 5xx / ECONNRESET; timeouts and cancellations are terminal). Override attempts, isRetryable, or retryDelayMs to change the policy.
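The retry policy described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the library's code: `withTimeout`, `retrying`, and `isRetryableSketch` are hypothetical names, the error shape (`status`, `code`, `name`) is assumed, and the queue slot and external cancellation signal are omitted.

```typescript
// Illustrative sketch only; all names and the error shape are assumptions.
function isRetryableSketch(error: unknown): boolean {
  const e = (error ?? {}) as { status?: number; code?: string; name?: string };
  // Timeouts and cancellations are terminal.
  if (e.name === "TimeoutError" || e.name === "AbortError") return false;
  const status = e.status ?? 0;
  return status === 429 || (status >= 500 && status < 600) || e.code === "ECONNRESET";
}

async function withTimeout<T>(
  run: (signal: AbortSignal) => Promise<T>,
  timeoutMs: number
): Promise<T> {
  const controller = new AbortController();
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => {
      const err = new Error(`timed out after ${timeoutMs} ms`);
      err.name = "TimeoutError";
      reject(err); // reject first so the timeout error wins the race
      controller.abort(); // then tell the underlying request to stop
    }, timeoutMs);
  });
  try {
    return await Promise.race([run(controller.signal), timeout]);
  } finally {
    clearTimeout(timer);
  }
}

async function retrying<T>(
  run: (signal: AbortSignal) => Promise<T>,
  opts: { attempts: number; timeoutMs: number; delayMs: (attempt: number) => number }
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= opts.attempts; attempt++) {
    try {
      return await withTimeout(run, opts.timeoutMs);
    } catch (error) {
      lastError = error;
      // Stop on terminal errors or when the attempt budget is spent.
      if (!isRetryableSketch(error) || attempt === opts.attempts) break;
      await new Promise((resolve) => setTimeout(resolve, opts.delayMs(attempt)));
    }
  }
  throw lastError;
}
```

Treating timeouts as terminal keeps the worst-case latency of one logical call close to a single `timeoutMs`, instead of `attempts × timeoutMs`.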

Cancellation & JSON helpers

import { mergeAbortSignals, throwIfAborted, abortError, sleep } from "llm-hedge";
import { parseJsonObject, tryParseJson } from "llm-hedge";

const { signal, cleanup } = mergeAbortSignals(parentSignal, requestSignal);
try {
  // ... use signal ...
} finally {
  cleanup(); // always run cleanup, even if the call throws
}

// Tolerant structured-output parsing: strict parse, then first {...} span.
const obj = parseJsonObject(modelText); // Record<string, unknown> | null
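
The "strict parse, then first {...} span" strategy can be sketched as follows. This is one plausible reading, not the library's implementation: `extractJsonObject` is a hypothetical name, and locating the span by brace matching (skipping braces inside string literals) is an assumption about how such a helper might find it.

```typescript
// Illustrative sketch only; extractJsonObject is hypothetical, not llm-hedge's parser.
function extractJsonObject(text: string): Record<string, unknown> | null {
  // Parse and accept only plain objects (not arrays, not primitives).
  const asObject = (raw: string): Record<string, unknown> | null => {
    try {
      const parsed: unknown = JSON.parse(raw);
      return typeof parsed === "object" && parsed !== null && !Array.isArray(parsed)
        ? (parsed as Record<string, unknown>)
        : null;
    } catch {
      return null;
    }
  };

  // Step 1: strict parse of the whole text.
  const strict = asObject(text.trim());
  if (strict) return strict;

  // Step 2: find the first balanced {...} span, ignoring braces inside strings.
  const start = text.indexOf("{");
  if (start === -1) return null;
  let depth = 0;
  let inString = false;
  let escaped = false;
  for (let i = start; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === "{") depth += 1;
    else if (ch === "}" && --depth === 0) {
      return asObject(text.slice(start, i + 1));
    }
  }
  return null; // unbalanced braces: nothing recoverable
}
```

Returning `null` instead of throwing lets the caller decide whether an unparseable reply should trigger a retry, a fallback, or a rejection of that race candidate.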

Design

  • Dependency direction: your app → llm-hedge. The runtime never imports the app.
  • Mechanism vs policy: primitives are generic (generics + callbacks); slot allocation, candidate selection, result validation, and diagnostics are injected by the caller.
  • Tracing: the queue emits structured LlmTraceEvents to onTrace; fan that out to a metrics sink and/or stdout in your own glue code.

API

| Export | Purpose |
| --- | --- |
| hedge(run, { slots, signal? }) | Redundant race over N copies of one call |
| raceCandidates(items, run, opts?) | Speculative race over different candidates |
| mapConcurrentUnordered(items, run, opts) | Bounded pool that yields every result in completion order |
| createLlmQueue(opts) → LlmQueue | Concurrency/rate admission queue |
| createLlmClient({ apiKey, baseUrl, timeoutMs }) | Anthropic-compatible client |
| completeWithRetry(opts) | One completion: queue + timeout gate + retries |
| isRetryableError, defaultRetryDelayMs | Retry policy defaults |
| mergeAbortSignals, throwIfAborted, abortError, sleep | Cancellation utils |
| parseJsonObject, tryParseJson | Tolerant JSON-object extraction |
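
mapConcurrentUnordered is the only export in the table without an example elsewhere in this README. The sketch below shows the general idea of a bounded pool that reports results in completion order. It is not the library's implementation: `mapPoolUnordered` is a hypothetical name, and it collects results into an array rather than yielding them, which may differ from the real API.

```typescript
// Illustrative sketch only; mapPoolUnordered is hypothetical, not llm-hedge's export.
async function mapPoolUnordered<T, R>(
  items: readonly T[],
  run: (item: T) => Promise<R>,
  concurrency: number
): Promise<R[]> {
  const results: R[] = [];
  let next = 0;

  // Each worker pulls the next unclaimed item until none remain.
  const worker = async (): Promise<void> => {
    while (next < items.length) {
      const item = items[next++];
      // Pushed when the call finishes, so array order is completion order.
      results.push(await run(item));
    }
  };

  const workerCount = Math.max(0, Math.min(concurrency, items.length));
  const workers: Promise<void>[] = [];
  for (let i = 0; i < workerCount; i++) workers.push(worker());
  await Promise.all(workers);
  return results;
}
```

In this sketch a rejected call rejects the whole pool while the other workers keep draining items; a production version would need an explicit policy for partial failure.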

License

MIT