@askalf/brio

v0.0.4

The capability layer for AI workloads — semantic cache, cost-aware tiering, structured cost reporting, policy enforcement. Sits in front of any Anthropic-compat endpoint (dario, api.anthropic.com, OpenRouter, vLLM, Ollama).

Why this exists

You're paying Anthropic per token (or via subscription routed through dario) and watching half the spend go to questions you've already answered. A coding agent reads the same package.json thirty times in a session. A research agent re-fetches the same source on the second turn. A team of five engineers asks the same dependency-version question across the week. Every one of those requests costs the full prompt tokens, even when the answer was identical the last time.

That's the wedge. brio caches the prompt-response pair under a semantic key and serves the cached answer when the same shape recurs, saving tokens, latency, and rate-limit headroom. Cache miss → request flows through to your backend untouched. Cache hit → response comes back in single-digit milliseconds with the cached answer marked so the calling agent knows it's a replay.
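
A quick way to watch this from the outside: send the identical request twice and check the response headers. A minimal sketch, assuming brio on its default port and a standard Anthropic Messages API body; the x-brio-cache header and the "any value through dario" API key are both described elsewhere in this readme, but the model string and prompt here are just placeholders.

REQ='{"model":"claude-opus-4-7","max_tokens":64,"messages":[{"role":"user","content":"What does the semver caret (^) mean?"}]}'

# First call: a miss, forwarded upstream.
curl -s -o /dev/null http://localhost:8765/v1/messages \
  -H "content-type: application/json" -H "anthropic-version: 2023-06-01" \
  -H "x-api-key: brio" -d "$REQ"

# Second call: same shape, so it should come back from the cache.
curl -s -D - -o /dev/null http://localhost:8765/v1/messages \
  -H "content-type: application/json" -H "anthropic-version: 2023-06-01" \
  -H "x-api-key: brio" -d "$REQ" | grep -i x-brio-cache
# → x-brio-cache: hit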

That's v0.1. Around it: cost-aware model tiering (route easy prompts to Haiku, hard ones to Opus, by length + complexity heuristics), structured cost reports (per-conversation, per-user, per-day), and a policy layer (model allowlists, cost caps, PII redaction). Eventually team mode (multi-user auth + per-user quotas + audit log) for org adoption.

Where it sits

   client (Cursor, Aider, Continue,            client (Claude Code,
   custom code, etc.)                          OpenClaw, Hermes, etc.)
        │                                            │
        └─────► http://localhost:8765 ◄──────────────┘
                        │
                       brio    ─── cache (semantic key)
                        │       ─── cost report
                        │       ─── tier (haiku / sonnet / opus)
                        │       ─── policy (allowlists, caps, DLP)
                        │
                        ▼
        ┌─────────────────────────────────────┐
        │  ANY Anthropic-compatible endpoint  │
        │                                     │
        │  - http://localhost:3456 (dario)    │ ← Claude Max via OAuth
        │  - https://api.anthropic.com        │ ← per-token API
        │  - https://openrouter.ai/v1         │ ← OpenRouter
        │  - http://localhost:11434           │ ← Ollama, etc.
        └─────────────────────────────────────┘

brio doesn't replace dario. dario solves "speak Anthropic's wire shape exactly so my Claude Max subscription works outside Claude Code." brio solves "make every backend smarter about cost, latency, and policy." Composing them: clients hit brio, brio caches what it can, the rest flows to dario, dario routes to your subscription. Either layer can run alone; neither requires the other.

60 seconds

# 1. Install.
npm install -g @askalf/brio

# 2. Point brio at whichever backend you want to wrap. Default is dario at :3456.
brio start                                   # wraps dario on localhost:3456
brio start --upstream=https://api.anthropic.com --api-key=$ANTHROPIC_API_KEY
brio start --upstream=https://openrouter.ai/v1 --api-key=$OPENROUTER_API_KEY

# 3. Point your client at brio instead of the backend directly.
export ANTHROPIC_BASE_URL=http://localhost:8765
export ANTHROPIC_API_KEY=brio                # any value when running through dario

# 4. Use whatever client you already use. Everything routes through brio.
claude                                       # Claude Code
cursor                                       # Cursor
aider --model=claude-opus-4-7                # Aider
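
# 5. Optional: sanity-check the wiring (brio doctor is described below).
brio doctor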

Run brio cost after a session. You'll see the cache hit rate, the dollar value of replay traffic, and the per-conversation breakdown.
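
If you'd rather slice the numbers yourself, brio cost --json emits the raw records. A hedged sketch, assuming the JSON output is an array of the per-request records described in the next section (the cacheHit field is documented there; the jq expression is mine):

# Overall cache hit rate, assuming an array of {..., cacheHit, ...} records.
brio cost --json | jq 'map(.cacheHit) | (map(select(.)) | length) / length'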

What v0.1 ships

  • Semantic response cache — every successful request keyed by a hash over {model, system_prompt, messages, tools} (sketched after this list). TTL configurable (default 1 hour). Hits return in single-digit ms with an x-brio-cache: hit header. Disk-backed at ~/.brio/cache/<sha>.json. Verify with brio cache stats.
  • Cache-aware streaming — cache hits replay the original SSE event stream so streaming clients see the same chunks they would have without the cache. No bypass needed for streaming requests.
  • Structured cost reporting — every request records {timestamp, model, inputTokens, outputTokens, cacheHit, latencyMs, conversationId}. brio cost summarizes per-day, per-conversation, per-model. brio cost --json for piping into your own dashboards.
  • Pass-through everything else — non-cacheable requests, tool-call patterns brio doesn't understand yet, anything that touches /v1/files or other side-channels — all forwarded byte-for-byte to the upstream. brio is additive; it shouldn't change what works.
  • brio doctor — health check across upstream reachability, cache directory writability, and a smoke probe to verify the upstream is what you said it was.
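
The keying scheme in the first bullet can be illustrated with plain shell tools: canonicalize the keyed fields, then hash. This is a conceptual sketch of the idea, not brio's actual code; the exact canonicalization and field names (system on the wire vs. system_prompt above) are assumptions.

# Conceptual sketch only. jq -S sorts object keys so that equivalent
# requests serialize to identical bytes before hashing.
jq -cS '{model, system, messages, tools}' request.json | sha256sum
# → conceptually, the <sha> behind ~/.brio/cache/<sha>.json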

What v0.2 will probably add

  • Cost-aware tiering — --tier=auto routes prompts under N tokens to Haiku, complex ones to Opus, with explainable decisions surfaced via brio explain <request-id>.
  • Per-user accounting (single-machine) — a user header / API key from the client lets brio attribute spend per developer when one machine serves a team.
  • Policy file — declarative model allowlists, cost caps, PII regex strip-list, blocked tool patterns. brio policy validate lints the file.

What v1 will look like

  • Team mode — multi-user auth, per-user quotas, signed audit log. Turns brio from a personal middleware into a small ops-deployable service.
  • Federation — multiple brio instances coordinate cache + cost across machines.
  • Hot-reload config — change tier/policy/upstream without restart.

What v0.1 is NOT

  • Not a model proxy that forwards to multiple backends in parallel. brio talks to one upstream at a time. (dario's pool mode handles multi-backend routing at the subscription layer.)
  • Not a vector store / RAG layer. The cache is keyed on the literal request shape, not on semantic similarity of message content. RAG and brio are orthogonal: brio caches the request whether or not it includes RAG-fetched context.
  • Not a guardrails framework with model-level safety reasoning. Policy is rule-based: regex / allowlist / cap. If you need an LLM-as-judge guardrails layer, run that as a separate service in front of brio.
  • Not branded as the askalf commercial product. brio is open-source infrastructure. askalf is something else, and that something else may eventually run brio internally as a component.

Flags you'll reach for

| Flag | Default | Why |
|---|---|---|
| --upstream <url> | http://localhost:3456 | Where requests go on cache miss. Anthropic-compat endpoint. |
| --port <n> | 8765 | brio's listen port. |
| --api-key <k> | — | API key brio sends to upstream when upstream isn't dario. |
| --cache-ttl <ms> | 3600000 (1h) | TTL on cache entries. 0 disables caching. |
| --cache-dir <path> | ~/.brio/cache | Where cache files live. |
| --no-cache | off | Bypass cache for this run. |
| --no-cost | off | Suppress per-request cost line on stderr. |
| --verbose, -v | — | Stream cache hits / misses / forward decisions to stderr. |
| --upstream-format <anthropic\|openai> | auto | Wire format the upstream expects. Auto-detected from URL. |
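
Combining a few of these (flags from the table above; the values are illustrative):

brio start \
  --upstream=https://api.anthropic.com \
  --api-key=$ANTHROPIC_API_KEY \
  --port=9000 \
  --cache-ttl=600000 \
  --verbose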

Every flag mirrors a BRIO_* env var. CLI wins over env.
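
The exact variable names aren't listed here, so this sketch assumes the conventional flag-to-env mapping (upper-snake-case with the BRIO_ prefix):

# Assumed names, derived from the BRIO_* pattern above.
export BRIO_UPSTREAM=https://api.anthropic.com
export BRIO_CACHE_TTL=600000
export BRIO_PORT=9000
brio start --port=8765     # CLI wins over env: listens on 8765, not 9000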

Trust and transparency

| Signal | Status |
|---|---|
| Runtime dependencies | Two — one HTTP framework, one schema validator. Pinned, audited. No hosted services, no telemetry. |
| Credentials | API keys live in env vars or CLI flags; brio never persists them. Cache files store request + response payloads only. |
| Network scope | Whatever upstream you point at, plus the cache TTL clock (no external time service). No other outbound traffic. Verify with lsof -i during a run. |
| Telemetry | None. Zero analytics, tracking, or data collection. Deliberately, not aspirationally. |
| License | MIT |
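
To check the network-scope claim yourself (standard lsof flags; brio is a Node CLI, so the process typically shows up as node):

# -i internet sockets, -P numeric ports, -n numeric hosts.
lsof -i -P -n | grep node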

See DISCLAIMER.md for the full AS IS / no-affiliation / user-responsibility terms.

Relationship to other askalf projects

  • dario — wire-fidelity LLM router. brio's default upstream. Stable maintenance mode (drift watch only); brio is where active feature work lives.
  • hands — computer-use agent. Routes through brio (or dario, or anything Anthropic-compat) like any other client.
  • arnie — IT troubleshooting companion. Same — client of brio.
  • deepdive — local research agent. Same — client of brio.

askalf (the org) is the umbrella. The future commercial chat/agent product, also called askalf, is something else entirely; brio is not it.

Contributing

PRs welcome. Code style matches dario — small TypeScript, pure decision functions, node --test assertions on anything with logic in it. Run npm run build && npm test before submitting.

License

MIT — see LICENSE and DISCLAIMER.md.