@darylkang/arbiter

v0.1.0

Research-grade CLI for studying LLM behavior as a distribution.

Arbiter

Arbiter is a research-grade CLI for studying LLM response distributions under repeated, controlled sampling.

It is designed for teams that need:

  • deterministic trial planning,
  • auditable artifact outputs,
  • reproducible run verification,
  • and clear provenance for requested vs. actual model behavior.

Arbiter focuses on measurement quality and traceability. It does not claim model correctness.


Contract Status

This README defines Arbiter's stabilized v1 product and artifact contracts.

Implementation rollout is tracked in docs/exec-plans/. If runtime behavior diverges from this document, treat that as either:

  • an implementation defect to fix, or
  • an explicit migration step that must be recorded in an ExecPlan.

What Arbiter does

Arbiter runs many trials against a fixed question and configuration, then records:

  • trial-level execution outputs with parse and embedding summaries,
  • batch-level novelty monitoring signals,
  • optional embedding-group outputs,
  • and a complete run manifest for verification.

This supports analysis of how response behavior changes across model/persona/protocol sampling choices.


Core principles

  • Schema-first: output contracts are defined by JSON Schemas.
  • Deterministic planning: trial plans are seeded and reproducible.
  • Audit-first artifacts: runs emit machine-verifiable files, not just terminal logs.
  • Provenance-aware: requested and actual model identifiers are both recorded.

Requirements

  • Node.js >=24
  • macOS/Linux terminal (TTY for interactive mode)
  • OpenRouter API key (OPENROUTER_API_KEY), required only for live runs

Install

Option A: Install globally from npm

npm install -g @darylkang/arbiter

Option B: Install from source (editable/local development)

git clone https://github.com/darylkang/arbiter.git
cd arbiter
npm install
npm run build
npm link

Verify installation:

arbiter --version
arbiter --help

Quick start

Wizard entry (TTY)

Launch the wizard:

arbiter

Initialize a config

arbiter init

This writes arbiter.config.json in the current working directory (CWD). If that file already exists, Arbiter writes to the first collision-safe filename instead:

  • arbiter.config.1.json
  • arbiter.config.2.json
  • and so on
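
The collision-safe naming above can be sketched as a small pure function. This is a minimal illustration, not Arbiter's actual implementation: `nextConfigName` and its `existing` parameter (a stand-in for the filenames already present in CWD) are hypothetical.

```typescript
// Hypothetical helper: pick the first config filename that is not already taken.
// `existing` stands in for a directory listing; Arbiter's real code may differ.
function nextConfigName(existing: ReadonlySet<string>): string {
  const base = "arbiter.config";
  let candidate = `${base}.json`;
  let counter = 0;
  // Try arbiter.config.1.json, arbiter.config.2.json, ... until one is free.
  while (existing.has(candidate)) {
    counter += 1;
    candidate = `${base}.${counter}.json`;
  }
  return candidate;
}

console.log(nextConfigName(new Set<string>())); // arbiter.config.json
console.log(
  nextConfigName(new Set<string>(["arbiter.config.json", "arbiter.config.1.json"])),
); // arbiter.config.2.json
```

Because the function only ever returns a name that is absent from `existing`, it never selects a file that would be overwritten.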

After writing, Arbiter prints:

  • the created config file path
  • suggested next commands: arbiter and arbiter run --config <file>

Headless run (default)

arbiter run --config arbiter.config.json

Live run override

export OPENROUTER_API_KEY=<your_key>
arbiter run --config arbiter.config.json --mode live

Dashboard monitor (human-only)

arbiter run --config arbiter.config.json --dashboard

If stdout is not a TTY, Arbiter prints a warning to stderr and continues headless.


CLI Contract (v1)

Arbiter exposes exactly three primary entry points:

  1. arbiter
  2. arbiter init
  3. arbiter run

Global flags:

  • --help, -h
  • --version, -V

Command behavior:

  • arbiter: launch the wizard when stdout is a TTY; otherwise print help and exit 0.
  • arbiter init: write a collision-safe default config in CWD and never overwrite existing files.
  • arbiter run: headless execution command; requires --config <path>.
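
The command behavior above can be sketched as a dispatch function. This is an illustration under stated assumptions, not Arbiter's source: `chooseEntry`, the `EntryAction` type, and the `usage_error` case with its messages are hypothetical, and global flags such as --help are not handled here.

```typescript
// Hypothetical result type describing which entry point would be taken.
type EntryAction =
  | { kind: "wizard" }
  | { kind: "help"; exitCode: 0 }
  | { kind: "init" }
  | { kind: "run"; configPath: string }
  | { kind: "usage_error"; message: string };

// `args` is the argument list after the program name.
function chooseEntry(args: readonly string[], stdoutIsTTY: boolean): EntryAction {
  const [command, ...rest] = args;
  if (command === undefined) {
    // Bare `arbiter`: wizard on a TTY, otherwise help with exit code 0.
    return stdoutIsTTY ? { kind: "wizard" } : { kind: "help", exitCode: 0 };
  }
  if (command === "init") return { kind: "init" };
  if (command === "run") {
    const flagIndex = rest.indexOf("--config");
    const configPath = flagIndex >= 0 ? rest[flagIndex + 1] : undefined;
    // Messages below are illustrative, not Arbiter's actual output.
    return configPath !== undefined
      ? { kind: "run", configPath }
      : { kind: "usage_error", message: "arbiter run requires --config <path>" };
  }
  return { kind: "usage_error", message: `unknown command: ${command}` };
}

console.log(chooseEntry([], false).kind); // help
console.log(chooseEntry(["run", "--config", "arbiter.config.json"], false).kind); // run
```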

Run override flags (arbiter run):

  • --out <dir> (default: ./runs)
  • --workers <n>
  • --batch-size <n>
  • --max-trials <n>
  • --mode <mock|live>
  • --dashboard (TTY-only Stage 2/3 monitor)

Not part of v1:

  • no --headless
  • no --verbose
  • no wizard flag (--wizard)
  • no experiment-variable CLI flags (models, personas, protocol, decode, debate params, clustering thresholds)
  • no redundant aliases beyond -h and -V

Config Resolution Contract

Resolution precedence (lowest to highest; later sources override earlier ones):

  1. built-in defaults
  2. config file
  3. CLI override flags

Per run directory, Arbiter writes:

  • config.source.json (exact input config as read)
  • config.resolved.json (final resolved config used to execute)

The source config file is never mutated during run execution.
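
A minimal sketch of layered resolution, assuming later sources override earlier ones. The `RunSettings` shape, its field names, and the default values for `workers` and `mode` are hypothetical; only the `./runs` default for `--out` comes from this README.

```typescript
// Hypothetical subset of settings; Arbiter's real config schema is richer.
interface RunSettings {
  out: string;
  workers: number;
  mode: "mock" | "live";
}

// Assumed defaults for illustration (only `out` is documented in this README).
const BUILT_IN_DEFAULTS: RunSettings = { out: "./runs", workers: 1, mode: "mock" };

// Later spreads win: built-in defaults < config file < CLI override flags.
// Inputs are copied, never mutated, mirroring the "source config is never mutated" rule.
function resolveSettings(
  fileConfig: Partial<RunSettings>,
  cliOverrides: Partial<RunSettings>,
): RunSettings {
  return { ...BUILT_IN_DEFAULTS, ...fileConfig, ...cliOverrides };
}

const resolvedExample = resolveSettings({ workers: 4 }, { mode: "live" });
console.log(JSON.stringify(resolvedExample));
// {"out":"./runs","workers":4,"mode":"live"}
```

In this sketch, `fileConfig` plays the role of config.source.json and the return value plays the role of config.resolved.json.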


Run Directory Contract

Scope note:

  • The artifact lists below apply to executed runs (arbiter run ...), including graceful user stop.
  • Resolve-only directories are tooling-internal and intentionally slimmer than executed-run artifact packs.

Each run writes to:

runs/<run_id>/

Run ID format:

  • YYYYMMDDTHHMMSSZ_<random6> (UTC timestamp + random suffix)
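
The run ID format can be sketched as follows. The helper names are hypothetical, and the suffix alphabet (lowercase letters and digits) is an assumption: this README specifies only a six-character random suffix.

```typescript
// Formats a run ID as YYYYMMDDTHHMMSSZ_<suffix> from a timestamp.
function formatRunId(when: Date, suffix: string): string {
  // toISOString() is always UTC, e.g. "2026-01-02T03:04:05.000Z".
  const stamp = when
    .toISOString()
    .replace(/\.\d{3}Z$/, "Z") // drop milliseconds
    .replace(/[-:]/g, ""); // drop date and time separators
  return `${stamp}_${suffix}`;
}

// Hypothetical suffix generator: six base-36 characters (assumed alphabet).
function randomSuffix(): string {
  let out = "";
  while (out.length < 6) out += Math.floor(Math.random() * 36).toString(36);
  return out;
}

// Loose shape check matching the assumed alphabet above.
const RUN_ID_PATTERN = /^\d{8}T\d{6}Z_[a-z0-9]{6}$/;

const exampleRunId = formatRunId(new Date(Date.UTC(2026, 0, 2, 3, 4, 5)), "abc123");
console.log(exampleRunId); // 20260102T030405Z_abc123
console.log(RUN_ID_PATTERN.test(formatRunId(new Date(), randomSuffix()))); // true
```

Because the timestamp comes first and is zero-padded, IDs in this shape sort chronologically as plain strings.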

Always-produced files:

  • config.source.json
  • config.resolved.json
  • manifest.json
  • trial_plan.jsonl
  • trials.jsonl
  • monitoring.jsonl
  • receipt.txt
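
A quick completeness check for an executed run directory can be sketched like this. `missingArtifacts` is a hypothetical helper; it takes a plain list of filenames (for example, a directory listing of runs/<run_id>/) so that it stays independent of any filesystem API.

```typescript
// The always-produced files listed above, for executed runs only.
const ALWAYS_PRODUCED = [
  "config.source.json",
  "config.resolved.json",
  "manifest.json",
  "trial_plan.jsonl",
  "trials.jsonl",
  "monitoring.jsonl",
  "receipt.txt",
] as const;

// Returns the always-produced files that are absent from `present`.
function missingArtifacts(present: readonly string[]): string[] {
  const seen = new Set<string>(present);
  return ALWAYS_PRODUCED.filter((file) => !seen.has(file));
}

// A complete executed run reports nothing missing.
console.log(missingArtifacts([...ALWAYS_PRODUCED]).length); // 0
// A listing with only two of the seven files reports the other five.
console.log(missingArtifacts(["config.resolved.json", "manifest.json"]).length); // 5
```

Conditionally produced files such as embeddings.arrow are deliberately not checked, since their absence is valid under this contract.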

Conditionally produced files:

  • embeddings.arrow when at least one eligible embedding is finalized to Arrow
  • embeddings.jsonl as fallback when Arrow is not written, or when debug mode explicitly keeps JSONL embeddings
  • groups/assignments.jsonl and groups/state.json when grouping artifacts are emitted
  • debug/events.jsonl and debug/execution.log only when debug mode is enabled

Resolve-only run artifacts:

  • config.resolved.json
  • manifest.json

Consolidation notes:

  • trials.jsonl is the canonical per-trial record and includes parse plus embedding summaries.
  • for Debate runs, intermediate turns are persisted in per-trial transcript records in trials.jsonl.
  • final run-level metrics and embedding provenance summaries live in manifest.json.
  • this contract supersedes legacy artifact names such as parsed.jsonl, convergence_trace.jsonl, aggregates.json, embeddings.provenance.json, and clusters/*.
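
Since trials.jsonl is the canonical per-trial record, downstream analysis usually starts by parsing it line by line. The sketch below makes no assumption about the record schema; `parseJsonl` is a hypothetical helper and the `trial_id` field in the sample input is invented for illustration.

```typescript
// Parses JSONL text into one value per non-empty line.
// Records are returned as `unknown` because this sketch does not assume a schema.
function parseJsonl(text: string): unknown[] {
  return text
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line) as unknown);
}

// Sample input with an invented field name, not Arbiter's actual trial schema.
const sampleTrials = '{"trial_id":0}\n{"trial_id":1}\n';
console.log(parseJsonl(sampleTrials).length); // 2
```

For real analysis, validate each record against the JSON Schemas that define the output contracts rather than relying on ad hoc field access.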

Exit Code Contract

Exit 0 for:

  • normal completion,
  • novelty saturation stop,
  • max-trials stop,
  • graceful Ctrl+C stop.

Exit non-zero only for:

  • invalid config,
  • inability to start run,
  • fatal execution failure.
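
The exit code contract can be sketched as a mapping from stop reason to exit code. The `StopReason` identifiers are hypothetical labels for the cases listed above, and the specific non-zero value (1) is an assumption, since the contract only requires "non-zero".

```typescript
// Hypothetical labels for the stop reasons named in the contract.
type StopReason =
  | "completed"
  | "novelty_saturation"
  | "max_trials"
  | "user_interrupt"
  | "invalid_config"
  | "start_failure"
  | "fatal_error";

// Every graceful stop maps to exit code 0.
const GRACEFUL_STOPS: ReadonlySet<StopReason> = new Set<StopReason>([
  "completed",
  "novelty_saturation",
  "max_trials",
  "user_interrupt",
]);

// Returns 0 for graceful stops; 1 (assumed value) for everything else.
function exitCodeFor(reason: StopReason): number {
  return GRACEFUL_STOPS.has(reason) ? 0 : 1;
}

console.log(exitCodeFor("user_interrupt")); // 0
console.log(exitCodeFor("invalid_config")); // 1
```

One practical consequence for scripts that wrap arbiter run: an exit code of 0 does not by itself say why the run stopped, so the stop reason has to be read from the run artifacts.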

Interpreting results responsibly

Arbiter measures distributional behavior, not correctness.

Important guidance:

  • Stopping indicates novelty saturation under the configured measurement setup.
  • Embedding groups are measurement artifacts, not ground-truth semantic classes.
  • Free-tier models are useful for exploration but not ideal for publication-grade claims.
  • Always report measurement settings and model provenance when sharing results.

Troubleshooting

error: config not found ...

Initialize a config first:

arbiter init

Live run fails with missing API key

Set the key in your environment:

export OPENROUTER_API_KEY=<your_key>

--dashboard used in non-TTY

By contract, Arbiter prints a warning to stderr and continues headless.


Documentation

  • Design reference: docs/DESIGN.md
  • Wizard UX spec: docs/product-specs/tui-wizard.md
  • ExecPlan contract: docs/PLANS.md
  • Contributor/agent rules: AGENTS.md

License

MIT