
@tinydarkforge/arbiter

v0.1.1

Deterministic guardrails for TypeScript AI agent loops. Tool-call validation, dollar-cost circuit breakers, loop detection. No LLM dependency.


    ╔═╦═╦═╦═╗       █████ █████ █   █ █   █ █████ █   █
    ║▣║▣║▣║▣║       █     █   █ █   █ ██  █   █    █ █
    ╠═╩═╩═╩═╣       █  ██ █████ █   █ █ █ █   █     █
    ║ ░░⊗░░ ║       █   █ █   █ █   █ █  ██   █    █ █
    ║ ▒▒▒▒▒ ║       █████ █   █ █████ █   █   █   █   █
    ╠═══════╣
    ║▓▓▓▓▓▓▓║       ━━━━━━━━━━ AGENT GAUNTLET ━━━━━━━━━━
╔═══╩═══════╩═══╗   Limits · Schema · Tools · Cost · Loops
║▓▓░░░░░░░░░░░▓▓║   — one guard, one verdict, sub-5ms.
║▓▓▒▒▒▒▒▒▒▒▒▒▒▓▓║   No LLM. MIT · No account · No tel.
╚═══════════════╝

Arbiter is a deterministic safety gauntlet for TypeScript AI agent loops. Every step runs through one synchronous check() — token & dollar budgets, tool-call rules, schema, repetition and cycle detection — and returns ok / retry / abort in under 5ms. No LLM in the path. No embeddings. No surprises.

Status: v0.1.1 — live on npm: @tinydarkforge/arbiter. Public API frozen for v0.1; reason codes stable.


░▒▓█ TL;DR

npm install @tinydarkforge/arbiter

import { createGuard } from "@tinydarkforge/arbiter";

const guard = createGuard("./arbiter.config.json");
const verdict = guard.check({ task_id, output, tool_calls, model, tokens_in, tokens_out, attempt });

// inside your agent loop:
if (verdict.status === "abort") throw new Error(verdict.reasons[0].code);
if (verdict.status === "retry") continue;

arbiter does not generate. it judges.


░▒▓█ What it does today

Most AI guardrail tools protect a single LLM call. Arbiter protects the whole agent loop:

  • Budget enforcement — cumulative token + dollar caps across N steps, per-step caps, per-task caps. Stops $50 runaway agents.
  • Tool-call discipline — allowlist, per-tool arg schemas, mutex groups, blast-radius caps, sequence rules (deploy requires prior run_tests).
  • Output validation — JSON schema, length bounds, forbidden-pattern regex, deterministic — no model in the path.
  • Loop detection — n-gram repetition on output, identical-tool-call detection, state-cycle detection. No embeddings.
  • Retry budget — knows when to give up. retry_exhausted is an abort, not a hang.

One sync call. One verdict. Three statuses: ok, retry, abort.
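
To make the verdict concrete, here is roughly what a sequence violation produces, given the guard from the TL;DR (the message text is illustrative; only the status and reason-code values are part of the frozen v0.1 surface):

// "deploy" without a prior "run_tests" in the same task:
const verdict = guard.check({
  task_id: "task-1",
  tool_calls: [{ name: "deploy", args: { target: "prod" } }],
});
// verdict.status          → "abort"
// verdict.reasons[0].code → "tool_sequence"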


░▒▓█ Guards

| Guard      | Class               | What it catches                                                       |
|------------|---------------------|-----------------------------------------------------------------------|
| limits     | budget              | max_steps, per-step + cumulative tokens, output length out of range   |
| forbidden  | output              | case-insensitive regex over output ("as an ai language model", etc.)  |
| schema     | output              | Ajv on output JSON, on tool args                                      |
| tool-calls | sequencing / safety | allowlist, mutex, blast radius, requires_prev sequencing              |
| cost       | budget              | model + token → dollars via price table; max_dollars_per_task cap     |
| loop       | runaway             | n-gram repetition, identical tool repeat, state cycles                |

Every guard is deterministic, reproducible, and runs in under 5ms p95 against a 2 KB output with three tool calls. No external calls. No model dependency.
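
For instance, with "as an ai language model" in forbidden_patterns, the forbidden guard returns a recoverable retry (the output string here is invented for illustration):

const v = guard.check({
  task_id: "demo",
  output: "As an AI language model, I cannot browse the web.",
});
// v.status          → "retry"              (recoverable: adjust the prompt, bump attempt)
// v.reasons[0].code → "forbidden_pattern"  (matching is case-insensitive)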


░▒▓█ Positioning

Arbiter is not an output parser, a content moderation classifier, or a dialog flow engine. It is a loop-level safety gate that sits between your agent and its next step.

| Alternative          | When to pick it instead of Arbiter                                                                       |
|----------------------|----------------------------------------------------------------------------------------------------------|
| Guardrails AI        | Single-call validation in Python, with managed validators.                                                |
| NeMo Guardrails      | Conversational dialog flow with Colang. Different problem.                                                |
| instructor           | Structured-output extraction in Python. Schema only, no loop guards.                                      |
| LangChain parsers    | You already live in LangChain and only need output shape validation.                                      |
| Llama Guard / Lakera | Semantic safety classification (toxicity, jailbreaks). Use as a classifier behind Arbiter (v0.3 hook).    |

Arbiter's niche: TypeScript-native, sub-5ms, deterministic, multi-step, no LLM dependency. If you need a hosted classifier or dialog rails, use those tools — and put Arbiter in front to enforce the budget.


░▒▓█ Three ways teams use it

  1. Cost circuit breaker. Set max_dollars_per_task: 0.50. Arbiter tracks model + tokens per step against a price table and aborts the loop the moment cumulative cost crosses the line (see the sketch after this list).
  2. Tool sandbox. Restrict which tools the agent can call, validate args per tool, enforce sequence rules (deploy requires prior run_tests), cap blast radius (write_file max 5 per task).
  3. Loop killer. When an agent spirals (same output repeated, same tool call retried, state revisited), Arbiter detects the pattern deterministically — no embeddings — and aborts before your bill or your database gets shredded.
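
A sketch of the cost circuit breaker from case 1, using the gpt-4o prices from the reference config below (the per-step token counts are invented for illustration):

// assumes "cost": { ..., "max_dollars_per_task": 0.5 } in arbiter.config.json
for (let step = 0; step < 1000; step++) {
  const verdict = guard.check({
    task_id: "job-42",
    model: "gpt-4o",
    tokens_in: 20_000,  // ≈ $0.05 input  at $2.50 per 1M tokens
    tokens_out: 4_000,  // ≈ $0.04 output at $10.00 per 1M tokens
    attempt: 0,
  });
  if (verdict.status === "abort") {
    // fires once metrics.total_dollars crosses 0.5: reasons[0].code === "cost_cap"
    break;
  }
}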

░▒▓█ Prerequisites

Node.js >=20. Single runtime dep: ajv. No other native or network dependencies.


░▒▓█ Install

From npm

npm install @tinydarkforge/arbiter

One-shot CLI via npx

echo '{...}' | npx @tinydarkforge/arbiter check --config ./arbiter.config.json

From source

git clone https://github.com/tinydarkforge/arbiter.git
cd arbiter
npm install
npm run build

░▒▓█ 30-second quickstart

import { createGuard } from "@tinydarkforge/arbiter";

const guard = createGuard("./arbiter.config.json");

let attempt = 0;
while (!done) {
  const { output, tool_calls, tokens_in, tokens_out } = await callModel(...);

  const verdict = guard.check({
    task_id: "task-1",
    state: "execute",
    output,
    tool_calls,
    model: "gpt-4o",
    tokens_in,
    tokens_out,
    attempt,
  });

  if (verdict.status === "abort") throw new Error(`arbiter aborted: ${verdict.reasons[0].code}`);
  if (verdict.status === "retry") { attempt++; continue; }

  attempt = 0;
  await executeToolCalls(tool_calls);
}

A runnable demo loop hitting cost_cap, loop_repeat_tool, tool_blast_radius, and tool_sequence lives at examples/agent.js.


░▒▓█ Configuration

arbiter.config.json:

{
  "limits": {
    "max_steps": 20,
    "max_tokens_per_step": 4000,
    "max_total_tokens": 50000,
    "output_min": 1,
    "output_max": 10000
  },
  "forbidden_patterns": ["as an ai language model", "lorem ipsum"],
  "tool_calls": {
    "allowed": ["search", "read_file", "write_file", "run_tests", "deploy"],
    "blast_radius": { "write_file": 5, "deploy": 1 },
    "mutex": [["deploy", "run_tests"]],
    "sequence": [{ "tool": "deploy", "requires_prev": "run_tests" }]
  },
  "cost": {
    "prices": {
      "gpt-4o":            { "input_per_1m": 2.5,  "output_per_1m": 10.0 },
      "gpt-4o-mini":       { "input_per_1m": 0.15, "output_per_1m": 0.6  },
      "claude-sonnet-4-6": { "input_per_1m": 3.0,  "output_per_1m": 15.0 },
      "claude-haiku-4-5":  { "input_per_1m": 1.0,  "output_per_1m": 5.0  }
    },
    "max_dollars_per_task": 0.5
  },
  "loop_detection": {
    "ngram_size": 5,
    "max_repeats": 2,
    "detect_identical_tool_calls": true,
    "max_state_visits": 3
  },
  "retry": { "max_attempts": 2 },
  "store": { "ttl_ms": 600000, "history_limit": 50 }
}

Schema: schema/config.schema.json.
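
If you want to fail fast on a bad config before handing it to createGuard, here is a minimal sketch validating it against that schema (file paths are assumptions, adjust to your layout; ajv is already the package's one runtime dep):

import Ajv from "ajv";
import { readFileSync } from "node:fs";

const schema = JSON.parse(readFileSync("./schema/config.schema.json", "utf8"));
const config = JSON.parse(readFileSync("./arbiter.config.json", "utf8"));

const validate = new Ajv({ allErrors: true }).compile(schema);
if (!validate(config)) {
  throw new Error("invalid arbiter config: " + JSON.stringify(validate.errors, null, 2));
}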

Field reference

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| limits.max_steps | number | — | Hard cap on steps per task_id before max_steps abort |
| limits.max_tokens_per_step | number | — | Per-step token cap (sum of tokens_in + tokens_out) |
| limits.max_total_tokens | number | — | Cumulative token cap across the task |
| limits.output_min / output_max | number | — | Output length bounds; out of range → length_min / length_max retry |
| forbidden_patterns | string[] | [] | Regex sources, applied case-insensitive over output |
| tool_calls.allowed | string[] | — | Tool allowlist; anything else → tool_not_allowed abort |
| tool_calls.arg_schemas | object | {} | Per-tool JSON schema for args (Ajv compiled, cached) |
| tool_calls.mutex | string[][] | [] | Groups; calling two from the same group in one task → tool_mutex abort |
| tool_calls.blast_radius | object | {} | Per-tool max call count across the task |
| tool_calls.sequence | array | [] | { tool, requires_prev } rules; missing predecessor → tool_sequence |
| cost.prices | object | {} | Per-model input_per_1m / output_per_1m USD |
| cost.max_dollars_per_task | number | — | Cumulative cap; overflow → cost_cap abort |
| loop_detection.ngram_size | number | 5 | Token n-gram window over output history |
| loop_detection.max_repeats | number | 2 | Repeats before loop_repeat_output fires |
| loop_detection.detect_identical_tool_calls | boolean | true | Stable JSON of {name, args} vs immediate previous |
| loop_detection.max_state_visits | number | 3 | State-cycle threshold |
| retry.max_attempts | number | — | Retry budget; attempt > max_attempts → retry_exhausted abort |
| store.ttl_ms | number | 600000 | Per-task state TTL; reclaimed by gc() |
| store.history_limit | number | 50 | Ring buffer length for output / tool / state histories |

Models missing from cost.prices are silently skipped (no abort, no cost accumulated). Missing config sections fall back to library defaults.
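
Assuming straight per-million-token pricing against the table above, a gpt-4o step with 1,000 tokens in and 500 tokens out costs (1000 / 1,000,000) × 2.5 + (500 / 1,000,000) × 10.0 = $0.0025 + $0.0050 = $0.0075.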


░▒▓█ API

createGuard(configOrPath): Guard

type Guard = {
  check(input: CheckInput): CheckResult;
  reset(task_id: string): void;
  gc(ttl_ms?: number): number;   // returns count of evicted task states
};
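
reset and gc in practice (the interval cadence is an arbitrary example, and gc() with no argument is assumed to fall back to the configured store.ttl_ms):

// drop state for a finished task so its task_id can be reused immediately
guard.reset("task-1");

// periodically evict expired per-task state
setInterval(() => {
  const evicted = guard.gc();
  if (evicted > 0) console.log(`arbiter: evicted ${evicted} stale task states`);
}, 60_000);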

CheckInput

type CheckInput = {
  task_id: string;
  step?: number;          // optional; auto-incremented
  state?: string;         // for state-cycle detection
  output?: string;        // model text/JSON output
  tool_calls?: { name: string; args: unknown; id?: string }[];
  model?: string;         // pricing key
  tokens_in?: number;
  tokens_out?: number;
  attempt?: number;       // for retry budget
};

CheckResult

type CheckResult = {
  status: "ok" | "retry" | "abort";
  reasons: { code: ReasonCode; message: string; meta?: unknown }[];
  metrics: {
    steps: number;
    total_tokens_in: number;
    total_tokens_out: number;
    total_dollars: number;
    tool_counts: Record<string, number>;
    elapsed_ms: number;
  };
};

Status semantics

| Status | Meaning                                                                 |
|--------|-------------------------------------------------------------------------|
| ok     | Step accepted; state committed; continue the loop                       |
| retry  | Step rejected, recoverable; state not committed; bump attempt + retry   |
| abort  | Step rejected, terminal; state not committed; tear the task down        |

Aborts and retries never mutate task state. The store only commits on ok.

Reason codes

| Code | Class | Meaning |
|------|-------|---------|
| schema_invalid | retry | output failed JSON schema |
| forbidden_pattern | retry | regex hit in output |
| length_min / length_max | retry | output length out of range |
| tool_args_invalid | retry | tool args failed schema |
| max_steps | abort | step counter exceeded |
| max_tokens_step | abort | per-step token cap exceeded |
| max_tokens_total | abort | cumulative token cap exceeded |
| tool_not_allowed | abort | tool call outside allowlist |
| tool_sequence | abort | sequence rule violated |
| tool_mutex | abort | mutually exclusive tools called |
| tool_blast_radius | abort | tool called too many times |
| cost_cap | abort | dollar cap exceeded |
| loop_repeat_output | abort | n-gram repetition detected |
| loop_repeat_tool | abort | identical tool call repeated |
| loop_state_cycle | abort | state visited too often |
| retry_exhausted | abort | retry budget hit |

Reasons accumulate in order. The first reason determines the status (abort > retry > ok).
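
In practice that means you can log every accumulated reason but branch only on the verdict (input is a CheckInput as above; logReason is your own function):

const verdict = guard.check(input);
for (const { code, message } of verdict.reasons) {
  logReason(code, message); // e.g. console.warn(`[arbiter] ${code}: ${message}`)
}
switch (verdict.status) {
  case "ok":    /* commit the step, continue the loop */ break;
  case "retry": /* bump attempt, re-run the step */      break;
  case "abort": /* tear the task down */                 break;
}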


░▒▓█ CLI

arbiter check --config ./arbiter.config.json < input.json
# stdin: a single CheckInput object as JSON
# stdout: a CheckResult object as JSON
# exit 0 = ok, 1 = retry, 2 = abort, 3 = CLI / config error
arbiter --version
arbiter --help

The CLI is intentionally thin — one in, one out, exit code carries the verdict. Wire it into any orchestrator that can pipe JSON.
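
For example, driving the CLI from Node and mapping exit codes back to statuses (a sketch; assumes the arbiter binary is resolvable, e.g. via a local install or npx):

import { spawnSync } from "node:child_process";

const input = { task_id: "task-1", output: "hello", tokens_in: 10, tokens_out: 5, attempt: 0 };

const proc = spawnSync("arbiter", ["check", "--config", "./arbiter.config.json"], {
  input: JSON.stringify(input),
  encoding: "utf8",
});

const statusByExit: Record<number, string> = { 0: "ok", 1: "retry", 2: "abort" };
const status = statusByExit[proc.status ?? -1] ?? "error";   // 3 = CLI / config error
const result = proc.stdout ? JSON.parse(proc.stdout) : null; // CheckResult JSON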


░▒▓█ Performance

Measured on a 2 KB output with three tool calls, against the v0.1 reference config, on a single thread:

| Percentile | check() latency |
|-----------:|-----------------|
| p50        | ~0.003 ms       |
| p95        | ~0.007 ms       |
| p99        | ~0.015 ms       |

CI gate enforces p95 < 5ms per release. The metrics.elapsed_ms field on every CheckResult lets callers track this in production.
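
To watch the same number in production, sample metrics.elapsed_ms off every verdict; the percentile math below is a deliberately naive sketch:

const samples: number[] = [];

function recordLatency(verdict: { metrics: { elapsed_ms: number } }) {
  samples.push(verdict.metrics.elapsed_ms);
  if (samples.length >= 1_000) {
    const sorted = [...samples].sort((a, b) => a - b);
    console.log("arbiter check() p95:", sorted[Math.floor(sorted.length * 0.95)], "ms");
    samples.length = 0; // reset the window
  }
}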


░▒▓█ Limits / non-goals

  • No semantic-drift detection via embeddings. Intentional — keeps zero AI deps. Use the v0.3 classifier hook with Llama Guard / Lakera if you need it.
  • No async check() in v0.1. Sync only. v0.3 adds checkAsync for HTTP / in-process classifiers.
  • No PII / prompt-injection detection in v0.1. Bundled deterministic packs ship in v0.2.
  • No dialog flow / conversational rails. Different problem — use NeMo Colang.
  • No LLM-as-judge evals. Use Promptfoo, Braintrust.
  • Arbiter never generates or rewrites output. It judges. Forever.

░▒▓█ Roadmap

  • v0.1 — multi-step loop guards, tool-call validation, cost cap, loop detection (current)
  • v0.2 — bundled deterministic safety packs (PII, prompt-injection, secrets) — regex/heuristic, no AI dep
  • v0.3 — classifier plug-in hook (HTTP + in-process) — bring your own Llama Guard / Lakera / OpenAI moderation / Presidio
  • v0.4+ — arbiter replay <log.jsonl> --config new.json, per-tenant quotas, OTel emission

Full plan: docs/tasks.md.


░▒▓█ Documentation

| Doc | What's in it |
|-----|--------------|
| docs/description.md | Positioning, voice, target users, elevator pitch |
| docs/tasks.md | Build sequence, status, post-launch roadmap |
| schema/config.schema.json | Ajv-validated config schema (source of truth) |
| examples/agent.js | Runnable demo loop hitting four distinct abort codes |
| examples/config.json | Realistic reference config |


░▒▓█ License

MIT — © TinyDarkForge

            ╔═══╗
            ║ ⊗ ║   "JUDGE. BUDGET. HALT."
            ╚═══╝