@tinydarkforge/arbiter
v0.1.1
Deterministic guardrails for TypeScript AI agent loops. Tool-call validation, dollar-cost circuit breakers, loop detection. No LLM dependency.
╔═╦═╦═╦═╗ █████ █████ █ █ █ █ █████ █ █
║▣║▣║▣║▣║ █ █ █ █ █ ██ █ █ █ █
╠═╩═╩═╩═╣ █ ██ █████ █ █ █ █ █ █ █
║ ░░⊗░░ ║ █ █ █ █ █ █ █ ██ █ █ █
║ ▒▒▒▒▒ ║ █████ █ █ █████ █ █ █ █ █
╠═══════╣
║▓▓▓▓▓▓▓║ ━━━━━━━━━━ AGENT GAUNTLET ━━━━━━━━━━
╔═══╩═══════╩═══╗ Limits · Schema · Tools · Cost · Loops
║▓▓░░░░░░░░░░░▓▓║ — one guard, one verdict, sub-5ms.
║▓▓▒▒▒▒▒▒▒▒▒▒▒▓▓║ No LLM. MIT · No account · No tel.
╚═══════════════╝

Arbiter is a deterministic safety gauntlet for TypeScript AI agent loops. Every step runs through one synchronous `check()` — token & dollar budgets, tool-call rules, schema, repetition and cycle detection — and returns `ok`/`retry`/`abort` in under 5 ms. No LLM in the path. No embeddings. No surprises.
Status: v0.1.1 — live on npm: `@tinydarkforge/arbiter`. Public API frozen for v0.1; reason codes stable.
░▒▓█ TL;DR
```sh
npm install @tinydarkforge/arbiter
```

```ts
import { createGuard } from "@tinydarkforge/arbiter";

const guard = createGuard("./arbiter.config.json");
const verdict = guard.check({ task_id, output, tool_calls, model, tokens_in, tokens_out, attempt });

if (verdict.status === "abort") throw new Error(verdict.reasons[0].code);
if (verdict.status === "retry") continue;
```

arbiter does not generate. it judges.
░▒▓█ What it does today
Most AI guardrail tools protect a single LLM call. Arbiter protects the whole agent loop:
- Budget enforcement — cumulative token + dollar caps across N steps, per-step caps, per-task caps. Stops $50 runaway agents.
- Tool-call discipline — allowlist, per-tool arg schemas, mutex groups, blast-radius caps, sequence rules (`deploy` requires prior `run_tests`).
- Output validation — JSON schema, length bounds, forbidden-pattern regex. Deterministic, no model in the path.
- Loop detection — n-gram repetition on output, identical-tool-call detection, state-cycle detection. No embeddings.
- Retry budget — knows when to give up. `retry_exhausted` is an abort, not a hang.

One sync call. One verdict. Three statuses: `ok`, `retry`, `abort`.
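The dollar side of budget enforcement is plain arithmetic over the price table. A minimal sketch of that accounting, assuming per-1M-token pricing as in the `cost.prices` config (helper names are hypothetical, not Arbiter's internals):

```typescript
type Price = { input_per_1m: number; output_per_1m: number };

// Dollars for one step: input and output tokens priced separately, per 1M tokens.
function stepDollars(price: Price, tokensIn: number, tokensOut: number): number {
  return (tokensIn / 1e6) * price.input_per_1m + (tokensOut / 1e6) * price.output_per_1m;
}

// Cumulative cap check across a task, mirroring max_dollars_per_task.
function exceedsCap(
  steps: { tokensIn: number; tokensOut: number }[],
  price: Price,
  cap: number,
): boolean {
  const total = steps.reduce((sum, s) => sum + stepDollars(price, s.tokensIn, s.tokensOut), 0);
  return total > cap;
}
```

At gpt-4o rates ($2.50/$10.00 per 1M), a hundred 4k-in/2k-out steps is $3.00, so a $0.50 cap trips long before the loop finishes.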
░▒▓█ Guards
| Guard | Class | What it catches |
|----------------|----------------------|--------------------------------------------------------------------------|
| limits | budget | max_steps, per-step + cumulative tokens, output length out of range |
| forbidden | output | case-insensitive regex over output ("as an ai language model", etc.) |
| schema | output | Ajv on output JSON, on tool args |
| tool-calls | sequencing / safety | allowlist, mutex, blast radius, requires_prev sequencing |
| cost | budget | model + token → dollars via price table; max_dollars_per_task cap |
| loop | runaway | n-gram repetition, identical tool repeat, state cycles |
Every guard is deterministic, reproducible, and runs in under 5ms p95 against a 2 KB output with three tool calls. No external calls. No model dependency.
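The loop guard's repetition check can be pictured as counting word n-grams and flagging any that recur more than `max_repeats` times. A rough sketch of the idea, not Arbiter's actual tokenizer or windowing:

```typescript
// True if any n-gram of `size` words appears more than `maxRepeats` times.
function repeatsNgram(text: string, size: number, maxRepeats: number): boolean {
  const words = text.toLowerCase().split(/\s+/).filter(Boolean);
  const counts = new Map<string, number>();
  for (let i = 0; i + size <= words.length; i++) {
    const gram = words.slice(i, i + size).join(" ");
    const n = (counts.get(gram) ?? 0) + 1;
    if (n > maxRepeats) return true; // threshold crossed: deterministic flag
    counts.set(gram, n);
  }
  return false;
}
```

Because the check is pure string counting, the same output history always yields the same verdict, which is what makes the loop guard reproducible.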
░▒▓█ Positioning
Arbiter is not an output parser, a content moderation classifier, or a dialog flow engine. It is a loop-level safety gate that sits between your agent and its next step.
| Alternative | When to pick it instead of Arbiter |
|----------------------|--------------------------------------------------------------------------|
| Guardrails AI | Single-call validation in Python, with managed validators. |
| NeMo Guardrails | Conversational dialog flow with Colang. Different problem. |
| instructor | Structured-output extraction in Python. Schema only, no loop guards. |
| LangChain parsers | You already live in LangChain and only need output shape validation. |
| Llama Guard / Lakera | Semantic safety classification (toxicity, jailbreaks). Use as classifier behind Arbiter (v0.3 hook). |
Arbiter's niche: TypeScript-native, sub-5ms, deterministic, multi-step, no LLM dependency. If you need a hosted classifier or dialog rails, use those tools — and put Arbiter in front to enforce the budget.
░▒▓█ Three ways teams use it
- Cost circuit breaker. Set `max_dollars_per_task: 0.50`. Arbiter tracks model + tokens per step against a price table and aborts the loop the moment cumulative cost crosses the line.
- Tool sandbox. Restrict which tools the agent can call, validate args per tool, enforce sequence rules (`deploy` requires prior `run_tests`), cap blast radius (`write_file` max 5 per task).
- Loop killer. When an agent spirals (same output repeated, same tool call retried, state revisited), Arbiter detects the pattern deterministically — no embeddings — and aborts before your bill or your database gets shredded.
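Sequence rules and blast-radius caps reduce to simple bookkeeping over the task's tool-call history. A hedged sketch of that bookkeeping (illustrative only; return values mimic Arbiter's reason codes, but this is not its internal code):

```typescript
type SeqRule = { tool: string; requires_prev: string };

// Returns a reason-code-style string for the first violation, or null if clean.
function checkToolCall(
  name: string,
  history: string[], // tools already called in this task
  rules: SeqRule[],
  blastRadius: Record<string, number>,
): string | null {
  // Sequence: the required predecessor must appear somewhere in the history.
  const rule = rules.find((r) => r.tool === name);
  if (rule && !history.includes(rule.requires_prev)) return "tool_sequence";

  // Blast radius: this call must not push the per-tool count past its cap.
  const used = history.filter((t) => t === name).length;
  const cap = blastRadius[name];
  if (cap !== undefined && used + 1 > cap) return "tool_blast_radius";

  return null;
}
```

The same pair of checks runs on every step, so a `deploy` with no prior `run_tests` is rejected before it executes, not after.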
░▒▓█ Prerequisites
Node.js >= 20. Single runtime dep: `ajv`. No other native or network dependencies.
░▒▓█ Install
From npm

```sh
npm install @tinydarkforge/arbiter
```

One-shot CLI via npx

```sh
echo '{...}' | npx @tinydarkforge/arbiter check --config ./arbiter.config.json
```

From source

```sh
git clone https://github.com/tinydarkforge/arbiter.git
cd arbiter
npm install
npm run build
```

░▒▓█ 30-second quickstart
```ts
import { createGuard } from "@tinydarkforge/arbiter";

const guard = createGuard("./arbiter.config.json");

let attempt = 0;
while (!done) {
  const { output, tool_calls, tokens_in, tokens_out } = await callModel(...);

  const verdict = guard.check({
    task_id: "task-1",
    state: "execute",
    output,
    tool_calls,
    model: "gpt-4o",
    tokens_in,
    tokens_out,
    attempt,
  });

  if (verdict.status === "abort") throw new Error(`arbiter aborted: ${verdict.reasons[0].code}`);
  if (verdict.status === "retry") { attempt++; continue; }

  attempt = 0;
  await executeToolCalls(tool_calls);
}
```

A runnable demo loop hitting `cost_cap`, `loop_repeat_tool`, `tool_blast_radius`, and `tool_sequence` lives at `examples/agent.js`.
░▒▓█ Configuration
`arbiter.config.json`:

```json
{
  "limits": {
    "max_steps": 20,
    "max_tokens_per_step": 4000,
    "max_total_tokens": 50000,
    "output_min": 1,
    "output_max": 10000
  },
  "forbidden_patterns": ["as an ai language model", "lorem ipsum"],
  "tool_calls": {
    "allowed": ["search", "read_file", "write_file", "run_tests", "deploy"],
    "blast_radius": { "write_file": 5, "deploy": 1 },
    "mutex": [["deploy", "run_tests"]],
    "sequence": [{ "tool": "deploy", "requires_prev": "run_tests" }]
  },
  "cost": {
    "prices": {
      "gpt-4o": { "input_per_1m": 2.5, "output_per_1m": 10.0 },
      "gpt-4o-mini": { "input_per_1m": 0.15, "output_per_1m": 0.6 },
      "claude-sonnet-4-6": { "input_per_1m": 3.0, "output_per_1m": 15.0 },
      "claude-haiku-4-5": { "input_per_1m": 1.0, "output_per_1m": 5.0 }
    },
    "max_dollars_per_task": 0.5
  },
  "loop_detection": {
    "ngram_size": 5,
    "max_repeats": 2,
    "detect_identical_tool_calls": true,
    "max_state_visits": 3
  },
  "retry": { "max_attempts": 2 },
  "store": { "ttl_ms": 600000, "history_limit": 50 }
}
```

Schema: `schema/config.schema.json`.
Field reference
| Field | Type | Default | Description |
|----------------------------------|-----------|-----------------|--------------------------------------------------------------------------|
| limits.max_steps | number | — | Hard cap on steps per task_id before max_steps abort |
| limits.max_tokens_per_step | number | — | Per-step token cap (sum of tokens_in + tokens_out) |
| limits.max_total_tokens | number | — | Cumulative token cap across the task |
| limits.output_min / output_max | number | — | Output length bounds — out of range = length_min / length_max retry |
| forbidden_patterns | string[] | [] | Regex sources, applied case-insensitive over output |
| tool_calls.allowed | string[] | — | Tool allowlist; anything else → tool_not_allowed abort |
| tool_calls.arg_schemas | object | {} | Per-tool JSON schema for args (Ajv compiled, cached) |
| tool_calls.mutex | string[][] | [] | Groups; calling two from the same group in one task → tool_mutex abort |
| tool_calls.blast_radius | object | {} | Per-tool max call count across the task |
| tool_calls.sequence | array | [] | { tool, requires_prev } rules; missing predecessor = tool_sequence |
| cost.prices | object | {} | Per-model input_per_1m / output_per_1m USD |
| cost.max_dollars_per_task | number | — | Cumulative cap; overflow → cost_cap abort |
| loop_detection.ngram_size | number | 5 | Token n-gram window over output history |
| loop_detection.max_repeats | number | 2 | Repeats before loop_repeat_output fires |
| loop_detection.detect_identical_tool_calls | boolean | true | Stable JSON of {name, args} vs immediate previous |
| loop_detection.max_state_visits | number | 3 | State-cycle threshold |
| retry.max_attempts | number | — | Retry budget; attempt > max_attempts → retry_exhausted abort |
| store.ttl_ms | number | 600000 | Per-task state TTL; reclaimed by gc() |
| store.history_limit | number | 50 | Ring buffer length for output / tool / state histories |
Unknown models in cost.prices are silently skipped (no abort, no cost accumulated). Missing config sections fall back to library defaults.
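The mutex field above is a set-membership check over the task's tool history. A toy sketch of the rule (the `rollback` tool and its group here are hypothetical, chosen for illustration; this is not Arbiter's internal code):

```typescript
// True if calling `name` now would mean two *different* tools from the same
// mutex group have been used within one task.
function violatesMutex(name: string, history: string[], mutex: string[][]): boolean {
  return mutex.some(
    (group) =>
      group.includes(name) &&
      history.some((t) => t !== name && group.includes(t)),
  );
}
```

Note that repeating the same tool never trips a mutex group; only mixing two distinct members of a group does.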
░▒▓█ API
`createGuard(configOrPath): Guard`

```ts
type Guard = {
  check(input: CheckInput): CheckResult;
  reset(task_id: string): void;
  gc(ttl_ms?: number): number; // returns count of evicted task states
};
```

CheckInput
```ts
type CheckInput = {
  task_id: string;
  step?: number;       // optional; auto-incremented
  state?: string;      // for state-cycle detection
  output?: string;     // model text/JSON output
  tool_calls?: { name: string; args: unknown; id?: string }[];
  model?: string;      // pricing key
  tokens_in?: number;
  tokens_out?: number;
  attempt?: number;    // for retry budget
};
```

CheckResult
```ts
type CheckResult = {
  status: "ok" | "retry" | "abort";
  reasons: { code: ReasonCode; message: string; meta?: unknown }[];
  metrics: {
    steps: number;
    total_tokens_in: number;
    total_tokens_out: number;
    total_dollars: number;
    tool_counts: Record<string, number>;
    elapsed_ms: number;
  };
};
```

Status semantics
| Status | Meaning |
|----------|-----------------------------------------------------------------------------|
| ok | Step accepted; state committed; continue the loop |
| retry | Step rejected, recoverable; state not committed; bump attempt + retry |
| abort | Step rejected, terminal; state not committed; tear the task down |
Aborts and retries never mutate task state. The store only commits on ok.
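The commit-on-ok contract suggests a small dispatcher on the caller's side. A sketch of one way to write it, using locally declared types (the `Action` union and `nextAction` helper are illustrative, not part of the API):

```typescript
type Status = "ok" | "retry" | "abort";
type Verdict = { status: Status; reasons: { code: string; message: string }[] };
type Action =
  | { kind: "continue" }
  | { kind: "retry"; attempt: number }
  | { kind: "teardown"; code: string };

// Map a verdict to the caller's next move; attempt only advances on retry,
// since ok resets it and abort ends the task entirely.
function nextAction(verdict: Verdict, attempt: number): Action {
  switch (verdict.status) {
    case "ok":
      return { kind: "continue" };
    case "retry":
      return { kind: "retry", attempt: attempt + 1 };
    case "abort":
      return { kind: "teardown", code: verdict.reasons[0]?.code ?? "unknown" };
  }
}
```

Keeping this mapping in one place makes it hard to accidentally treat a retry as an ok and commit state the guard refused.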
Reason codes
| Code | Class | Meaning |
|------|-------|---------|
| schema_invalid | retry | output failed JSON schema |
| forbidden_pattern | retry | regex hit in output |
| length_min / length_max | retry | output length out of range |
| tool_args_invalid | retry | tool args failed schema |
| max_steps | abort | step counter exceeded |
| max_tokens_step | abort | per-step token cap exceeded |
| max_tokens_total | abort | cumulative token cap exceeded |
| tool_not_allowed | abort | tool call outside allowlist |
| tool_sequence | abort | sequence rule violated |
| tool_mutex | abort | mutually exclusive tools called |
| tool_blast_radius | abort | tool called too many times |
| cost_cap | abort | dollar cap exceeded |
| loop_repeat_output | abort | n-gram repetition detected |
| loop_repeat_tool | abort | identical tool call repeated |
| loop_state_cycle | abort | state visited too often |
| retry_exhausted | abort | retry budget hit |
Reasons accumulate in evaluation order; the most severe class present determines the status (abort > retry > ok).
░▒▓█ CLI
```sh
arbiter check --config ./arbiter.config.json < input.json
# stdin:  a single CheckInput object as JSON
# stdout: a CheckResult object as JSON
# exit 0 = ok, 1 = retry, 2 = abort, 3 = CLI / config error

arbiter --version
arbiter --help
```

The CLI is intentionally thin — one in, one out, exit code carries the verdict. Wire it into any orchestrator that can pipe JSON.
░▒▓█ Performance
Measured on a 2 KB output with three tool calls, against the v0.1 reference config, on a single thread:
| Percentile | check() latency |
|-----------:|-------------------|
| p50 | ~0.003 ms |
| p95 | ~0.007 ms |
| p99 | ~0.015 ms |
CI gate enforces p95 < 5ms per release. The metrics.elapsed_ms field on every CheckResult lets callers track this in production.
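Callers tracking `metrics.elapsed_ms` in production need a percentile over their samples. A minimal nearest-rank sketch (one of several percentile conventions; not how the CI gate computes it):

```typescript
// Nearest-rank percentile: p in [0, 100] over a sample of elapsed_ms values.
function percentile(samples: number[], p: number): number {
  const sorted = [...samples].sort((a, b) => a - b); // copy, don't mutate caller's data
  const rank = Math.ceil((p / 100) * sorted.length); // 1-based rank
  return sorted[Math.max(0, rank - 1)];
}
```

Feed it a rolling window of `elapsed_ms` values and alert if `percentile(window, 95)` drifts toward the 5 ms budget.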
░▒▓█ Limits / non-goals
- No semantic drift via embeddings. Intentional — keeps zero AI deps. Use the v0.3 classifier hook with Llama Guard / Lakera if you need it.
- No async `check()` in v0.1. Sync only. v0.3 adds `checkAsync` for HTTP / in-process classifiers.
- No PII / prompt-injection detection in v0.1. Bundled deterministic packs ship in v0.2.
- No dialog flow / conversational rails. Different problem — use NeMo Colang.
- No LLM-as-judge evals. Use Promptfoo, Braintrust.
- Arbiter never generates or rewrites output. It judges. Forever.
░▒▓█ Roadmap
- v0.1 — multi-step loop guards, tool-call validation, cost cap, loop detection (current)
- v0.2 — bundled deterministic safety packs (PII, prompt-injection, secrets) — regex/heuristic, no AI dep
- v0.3 — classifier plug-in hook (HTTP + in-process) — bring your own Llama Guard / Lakera / OpenAI moderation / Presidio
- v0.4+ — `arbiter replay <log.jsonl> --config new.json`, per-tenant quotas, OTel emission
Full plan: docs/tasks.md.
░▒▓█ Documentation
| Doc | What's in it |
|----------------------------------------------------|---------------------------------------------------------------|
| docs/description.md | Positioning, voice, target users, elevator pitch |
| docs/tasks.md | Build sequence, status, post-launch roadmap |
| schema/config.schema.json | Ajv-validated config schema (source of truth) |
| examples/agent.js | Runnable demo loop hitting four distinct abort codes |
| examples/config.json | Realistic reference config |
░▒▓█ License
MIT — © TinyDarkForge
╔═══╗
║ ⊗ ║ "JUDGE. BUDGET. HALT."
╚═══╝