@agentlair/tbrm

v0.1.0

Published

2 months ago

Predictions as bonded claims. Brier compounds over resolved calls. Tracked Brier Reputation Metric primitive — verifier SDK for the AgentLair TBRM layer.

piiiico

agentlair agent-trust tbrm bptr brier calibration reputation behavioral-attestation

What is TBRM?

Tracked Brier Reputation Metric.

Each prediction carries a confidence in [0, 1], a deadline, and a verification method. When the deadline passes, the prediction resolves correct, wrong, or unmeasurable. The Brier delta updates the agent's tracked reputation metric.
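As a rough illustration of what a well-formed prediction looks like, the sketch below checks the three properties named above. The `PredictionSketch` shape and `validatePrediction` helper are hypothetical: the field names follow this README's worked example, and the package's exported types may differ.

```typescript
// Hypothetical shapes, inferred from the field names in the worked example.
// They are not the package's exported types.
type VerificationMethod = 'auto' | 'web' | 'manual';

interface PredictionSketch {
  id: string;
  claim: string;
  confidence: number;      // probability in [0, 1]
  deadline: string;        // ISO 8601 timestamp
  vm: VerificationMethod;  // how the claim will be verified
}

// Returns a list of problems; an empty list means the prediction is well-formed.
function validatePrediction(p: PredictionSketch): string[] {
  const problems: string[] = [];
  if (!Number.isFinite(p.confidence) || p.confidence < 0 || p.confidence > 1) {
    problems.push('confidence must be a number in [0, 1]');
  }
  if (Number.isNaN(Date.parse(p.deadline))) {
    problems.push('deadline must be a parseable timestamp');
  }
  return problems;
}

const validIssues = validatePrediction({
  id: 'p06', claim: 'reply within 7d', confidence: 0.4,
  deadline: '2026-06-01T00:00:00Z', vm: 'auto',
});
const invalidIssues = validatePrediction({
  id: 'x01', claim: 'too sure', confidence: 1.2,
  deadline: 'not-a-date', vm: 'manual',
});
console.log(validIssues.length, invalidIssues.length); // prints: 0 2
```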

There is no capital lock. The stake is the public, monotonically growing cost of bad calibration. At confidence 0.9, a correct call scores 0.01 and a wrong call scores 0.81, so being confident and wrong costs 81× the penalty the same confidence would have incurred had it been right. Brier punishes misplaced confidence far harder than it rewards well-placed confidence; the asymmetry is the whole point.
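The asymmetry can be checked by hand. The sketch below uses two local helpers (hypothetical names, not part of the package) to compute the penalty for a correct and a wrong call at the same confidence, plus the ratio between them.

```typescript
// Brier penalty for a call at confidence c, depending on the outcome.
// Lower is better; these helper names are illustrative only.
const penaltyIfCorrect = (c: number): number => (1 - c) ** 2;
const penaltyIfWrong = (c: number): number => c ** 2;

for (const c of [0.6, 0.75, 0.9]) {
  const ratio = penaltyIfWrong(c) / penaltyIfCorrect(c);
  console.log(
    `c=${c}: correct ${penaltyIfCorrect(c).toFixed(4)}, ` +
    `wrong ${penaltyIfWrong(c).toFixed(4)}, ratio ${ratio.toFixed(2)}x`,
  );
}

const ratioAt90 = penaltyIfWrong(0.9) / penaltyIfCorrect(0.9);
```

The ratio is c² / (1 - c)², which works out to about 2.25× at 0.6, 9× at 0.75, and 81× at 0.9: the cost of being wrong grows much faster than the confidence itself.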

This package ships the pure-functional substrate: types for predictions and resolutions, Brier scoring, weighted aggregates, per-vehicle breakdowns, and a selection-bias detector. It is what the AgentLair worker uses internally; v0.1.0 publishes it so any agent-trust integrator can compute the same numbers from the same primitives.

Install

```sh
npm install @agentlair/tbrm
# or
bun add @agentlair/tbrm
```

Zero runtime dependencies. Pure functions over plain types — works in Node 18+, Bun, Deno, browsers, and edge runtimes.

Worked example

10 predictions filed, 5 resolved by deadline.

```typescript
import {
  brierScore,
  weightedBrierScore,
  byVehicle,
  selectionBias,
  type Prediction,
  type ResolvedPrediction,
} from '@agentlair/tbrm';

// Five still-open predictions, awaiting their deadlines.
const open: Prediction[] = [
  { id: 'p06', vehicle: 'cold-email-batch',     claim: 'reply within 7d',
    confidence: 0.40, deadline: '2026-06-01T00:00:00Z', vm: 'auto',
    createdAt: '2026-05-25T00:00:00Z' },
  { id: 'p07', vehicle: 'npm-package-publish',  claim: '≥10 weekly downloads in 30d',
    confidence: 0.40, deadline: '2026-06-04T00:00:00Z', vm: 'web',
    createdAt: '2026-05-05T00:00:00Z' },
  { id: 'p08', vehicle: 'cold-email-batch',     claim: 'paying customer in 30d',
    confidence: 0.20, deadline: '2026-06-15T00:00:00Z', vm: 'manual',
    createdAt: '2026-05-15T00:00:00Z' },
  { id: 'p09', vehicle: 'content-publish',      claim: '≥100 unique visitors in 14d',
    confidence: 0.35, deadline: '2026-06-12T00:00:00Z', vm: 'web',
    createdAt: '2026-05-29T00:00:00Z' },
  { id: 'p10', vehicle: 'content-publish',      claim: 'cited by an AI engine within 60d',
    confidence: 0.30, deadline: '2026-07-30T00:00:00Z', vm: 'manual',
    createdAt: '2026-05-30T00:00:00Z' },
];

// Five resolved predictions across three vehicles.
const resolved: ResolvedPrediction[] = [
  { id: 'p01', vehicle: 'cold-email-batch',    claim: 'reply within 7d',
    confidence: 0.40, deadline: '2026-04-30T00:00:00Z', vm: 'auto',
    createdAt: '2026-04-23T00:00:00Z',
    resolution: 'correct', resolvedAt: '2026-04-29T12:00:00Z' },          // brier 0.36

  { id: 'p02', vehicle: 'cold-email-batch',    claim: 'meeting booked',
    confidence: 0.30, deadline: '2026-04-30T00:00:00Z', vm: 'auto',
    createdAt: '2026-04-23T00:00:00Z',
    resolution: 'wrong', resolvedAt: '2026-04-30T00:00:00Z' },            // brier 0.09

  { id: 'p03', vehicle: 'npm-package-publish', claim: '≥4 weekly downloads in 14d',
    confidence: 0.50, deadline: '2026-04-15T00:00:00Z', vm: 'web',
    createdAt: '2026-04-01T00:00:00Z',
    resolution: 'correct', resolvedAt: '2026-04-15T00:00:00Z' },          // brier 0.25

  { id: 'p04', vehicle: 'content-publish',     claim: 'cited by an AI engine in 14d',
    confidence: 0.20, deadline: '2026-04-20T00:00:00Z', vm: 'manual',
    createdAt: '2026-04-06T00:00:00Z',
    resolution: 'unmeasurable', resolvedAt: '2026-04-20T00:00:00Z' },     // brier null (excluded)

  { id: 'p05', vehicle: 'content-publish',     claim: '≥50 unique visitors in 7d',
    confidence: 0.45, deadline: '2026-04-15T00:00:00Z', vm: 'web',
    createdAt: '2026-04-08T00:00:00Z',
    resolution: 'wrong', resolvedAt: '2026-04-15T00:00:00Z' },            // brier 0.2025
];

// ── Aggregate Brier across all resolved ──
const agg = weightedBrierScore(resolved);
console.log(agg);
// {
//   brier: 0.225625,   // mean over 4 measurable: (0.36 + 0.09 + 0.25 + 0.2025) / 4
//   n: 5,              // includes the unmeasurable
//   correct: 2,
//   wrong: 2,
//   unmeasurable: 1,
// }

// ── Per-vehicle breakdown: the actual learning signal ──
const buckets = byVehicle(resolved);
console.log(buckets);
// {
//   'cold-email-batch':    { brier: 0.225,   n: 2 },  // (0.36 + 0.09) / 2
//   'npm-package-publish': { brier: 0.25,    n: 1 },
//   'content-publish':     { brier: 0.2025,  n: 2 },  // unmeasurable excluded; n includes it
// }

// ── Selection bias check ──
const bias = selectionBias(open, resolved);
console.log(bias);
// {
//   resolveRate: 0.5,                 // 5 resolved / 10 total
//   avgConfidenceResolved: 0.37,      // (0.4 + 0.3 + 0.5 + 0.2 + 0.45) / 5
//   // warning omitted: resolveRate >= 0.20
// }
```

API

`brierScore(p: ResolvedPrediction): number | null`

Brier for a single resolved prediction.

| Resolution   | Formula                               | Range  |
|--------------|---------------------------------------|--------|
| correct      | (1 - confidence)²                     | [0, 1] |
| wrong        | confidence²                           | [0, 1] |
| unmeasurable | returns null (excluded from the mean) | n/a    |
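The table above maps directly onto a three-way switch. The following is a minimal stand-in written from the table alone, not the package's source; `brierSketch` and its two-argument signature are assumptions made to keep the example self-contained.

```typescript
type Resolution = 'correct' | 'wrong' | 'unmeasurable';

// Hypothetical stand-in for a per-prediction score, following the table above.
function brierSketch(confidence: number, resolution: Resolution): number | null {
  switch (resolution) {
    case 'correct':
      return (1 - confidence) ** 2; // distance from certainty that it would happen
    case 'wrong':
      return confidence ** 2;       // all of the stated confidence was misplaced
    case 'unmeasurable':
      return null;                  // no score; callers must skip it in any mean
  }
}

const correctAt40 = brierSketch(0.40, 'correct');
const wrongAt45 = brierSketch(0.45, 'wrong');
const unmeasured = brierSketch(0.20, 'unmeasurable');
console.log(correctAt40?.toFixed(4), wrongAt45?.toFixed(4), unmeasured);
```

Returning `null` instead of `0` matters: a zero would read as a perfect score and quietly improve the mean.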

`weightedBrierScore(preds: ResolvedPrediction[]): BrierAggregate`

Equal-weight mean Brier across the input list. Returns:

```typescript
interface BrierAggregate {
  brier: number;          // mean over correct+wrong; 0 if none
  n: number;              // total preds (includes unmeasurable)
  correct: number;
  wrong: number;
  unmeasurable: number;
}
```

`unmeasurable` predictions are excluded from `brier` but counted in `n`, and that is intentional. It surfaces the selection-bias failure mode where confident claims pile up in `unmeasurable` and the visible Brier looks healthy because only safe bets ever resolve.

v0.1.0 ships equal weights; recency decay is deferred to v0.2.
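To make the counting rules concrete, here is a sketch of an equal-weight aggregate that follows the semantics described above: unmeasurable predictions add to `n` but not to the mean, and the mean is 0 when nothing is measurable. `aggregateSketch` and its input shape are hypothetical and may not match the package's implementation.

```typescript
type Resolution = 'correct' | 'wrong' | 'unmeasurable';
interface Scored { confidence: number; resolution: Resolution }

interface AggregateSketch {
  brier: number; n: number; correct: number; wrong: number; unmeasurable: number;
}

// Equal-weight mean over measurable predictions; counts every input in n.
function aggregateSketch(preds: Scored[]): AggregateSketch {
  let sum = 0, correct = 0, wrong = 0, unmeasurable = 0;
  for (const p of preds) {
    if (p.resolution === 'correct') { sum += (1 - p.confidence) ** 2; correct++; }
    else if (p.resolution === 'wrong') { sum += p.confidence ** 2; wrong++; }
    else { unmeasurable++; }
  }
  const measurable = correct + wrong;
  return {
    brier: measurable === 0 ? 0 : sum / measurable,
    n: preds.length, correct, wrong, unmeasurable,
  };
}

// Same confidences and outcomes as the worked example (p01 to p05).
const agg = aggregateSketch([
  { confidence: 0.40, resolution: 'correct' },
  { confidence: 0.30, resolution: 'wrong' },
  { confidence: 0.50, resolution: 'correct' },
  { confidence: 0.20, resolution: 'unmeasurable' },
  { confidence: 0.45, resolution: 'wrong' },
]);
console.log(agg.brier.toFixed(6), agg.n); // prints: 0.225625 5
```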

`byVehicle(preds: ResolvedPrediction[]): Record<string, { brier: number; n: number }>`

Aggregates Brier per vehicle field. Per-vehicle Brier is the actual learning signal — overall Brier blurs unrelated activities. Predictions with a missing or empty vehicle land in the 'unknown' bucket.
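A grouping along these lines could be sketched as follows. `byVehicleSketch` is a hypothetical re-implementation for illustration; in particular, reporting `brier: 0` for a bucket with no measurable predictions is an assumption carried over from the aggregate's "0 if none" rule.

```typescript
type Resolution = 'correct' | 'wrong' | 'unmeasurable';
interface Scored { vehicle?: string; confidence: number; resolution: Resolution }

function byVehicleSketch(preds: Scored[]): Record<string, { brier: number; n: number }> {
  const acc = new Map<string, { sum: number; measurable: number; n: number }>();
  for (const p of preds) {
    // Missing or empty vehicle falls into the 'unknown' bucket.
    const key = p.vehicle && p.vehicle.length > 0 ? p.vehicle : 'unknown';
    const bucket = acc.get(key) ?? { sum: 0, measurable: 0, n: 0 };
    bucket.n++; // n counts every prediction, including unmeasurable ones
    if (p.resolution === 'correct') { bucket.sum += (1 - p.confidence) ** 2; bucket.measurable++; }
    else if (p.resolution === 'wrong') { bucket.sum += p.confidence ** 2; bucket.measurable++; }
    acc.set(key, bucket);
  }
  const out: Record<string, { brier: number; n: number }> = {};
  acc.forEach((bucket, key) => {
    // Assumption: a bucket with nothing measurable reports brier 0.
    out[key] = { brier: bucket.measurable === 0 ? 0 : bucket.sum / bucket.measurable, n: bucket.n };
  });
  return out;
}

const buckets = byVehicleSketch([
  { vehicle: 'cold-email-batch', confidence: 0.40, resolution: 'correct' },
  { vehicle: 'cold-email-batch', confidence: 0.30, resolution: 'wrong' },
  { vehicle: 'content-publish', confidence: 0.20, resolution: 'unmeasurable' },
  { vehicle: 'content-publish', confidence: 0.45, resolution: 'wrong' },
  { vehicle: '', confidence: 0.50, resolution: 'correct' },
]);
console.log(Object.keys(buckets));
```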

`selectionBias(open: Prediction[], resolved: ResolvedPrediction[]): SelectionBiasReport`

Detects the most common calibration anti-pattern: the highest-confidence claims either stay open past their deadline or end up unmeasurable. The visible Brier looks fine, but it reflects only the safe-bet subset that actually resolved.

```typescript
interface SelectionBiasReport {
  avgConfidenceResolved: number;  // mean confidence of resolved predictions
  resolveRate: number;            // resolved / (open + resolved)
  warning?: string;               // set when resolveRate < 0.20
}
```

The 0.20 threshold is the practical floor for usable signal — below that, the Brier mean is a noisy estimate of a biased subsample. The warning string is human-readable and safe to log directly.
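The report can be reproduced with a few lines of arithmetic. The sketch below is a hypothetical re-implementation: the warning text, the behaviour on empty input, and the `threshold` parameter are all assumptions, and only the field names and the 0.20 cut-off come from the description above.

```typescript
interface HasConfidence { confidence: number }
interface BiasSketch { avgConfidenceResolved: number; resolveRate: number; warning?: string }

function selectionBiasSketch(
  openPreds: HasConfidence[],
  resolvedPreds: HasConfidence[],
  threshold = 0.20, // assumed to be configurable here; the package may hard-code it
): BiasSketch {
  const total = openPreds.length + resolvedPreds.length;
  // Assumption: with no predictions at all, both figures default to 0.
  const resolveRate = total === 0 ? 0 : resolvedPreds.length / total;
  const avgConfidenceResolved = resolvedPreds.length === 0
    ? 0
    : resolvedPreds.reduce((sum, p) => sum + p.confidence, 0) / resolvedPreds.length;
  const report: BiasSketch = { avgConfidenceResolved, resolveRate };
  if (resolveRate < threshold) {
    report.warning = `only ${(resolveRate * 100).toFixed(0)}% of predictions resolved`;
  }
  return report;
}

const conf = (c: number): HasConfidence => ({ confidence: c });
// Worked-example figures: 5 open, 5 resolved.
const healthy = selectionBiasSketch(
  [0.40, 0.40, 0.20, 0.35, 0.30].map(conf),
  [0.40, 0.30, 0.50, 0.20, 0.45].map(conf),
);
// 1 resolved out of 10 falls under the threshold.
const sparse = selectionBiasSketch(new Array(9).fill(conf(0.8)), [conf(0.3)]);
console.log(healthy.resolveRate, healthy.avgConfidenceResolved.toFixed(2), sparse.warning !== undefined);
// prints: 0.5 0.37 true
```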

Calibration anti-pattern: selection bias

Most calibration loops fail not because the math is wrong, but because the resolution path is missing. Predictions get filed at confidence 0.7, 0.8, or 0.9 but never resolve, because no shell command, API, or reachable URL can verify them. They sit open until their deadline passes, then auto-promote to unmeasurable.

Meanwhile, the predictions that do resolve are the easy ones: URL-pingable, count-summable, deterministic. They tend to be filed at lower confidence (0.3–0.5) because they're known-uncertain. So the visible Brier looks fine (0.10 weighted, well under the 0.25 that a constant 0.5 forecast would score), but it's a snapshot of a biased subsample. The actual track record is unknown.

selectionBias makes that visible. Below 20% resolve rate, the warning fires; the Brier you can see is not the Brier you have.

The fix is upstream: file fewer predictions that lack a resolution path. Prefer `--verification-method auto` with an executable command, or `--verification-method web` with a URL the resolver can fetch. The manual path is honest, but in practice it stays open indefinitely.
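One way to see how much of a backlog lacks a resolution path is to group overdue open predictions by verification method. The helper below is a hypothetical audit sketch, not part of the package; it assumes the `vm` values `auto`, `web`, and `manual` used in the worked example.

```typescript
type VerificationMethod = 'auto' | 'web' | 'manual';
interface OpenSketch { id: string; vm: VerificationMethod; deadline: string }

// Ids of open predictions already past their deadline, grouped by verification method.
function overdueByMethod(
  openPreds: OpenSketch[],
  now: Date,
): Record<VerificationMethod, string[]> {
  const out: Record<VerificationMethod, string[]> = { auto: [], web: [], manual: [] };
  for (const p of openPreds) {
    if (Date.parse(p.deadline) < now.getTime()) out[p.vm].push(p.id);
  }
  return out;
}

const overdue = overdueByMethod(
  [
    { id: 'p06', vm: 'auto', deadline: '2026-06-01T00:00:00Z' },
    { id: 'p08', vm: 'manual', deadline: '2026-06-15T00:00:00Z' },
    { id: 'p10', vm: 'manual', deadline: '2026-07-30T00:00:00Z' },
  ],
  new Date('2026-07-01T00:00:00Z'),
);
console.log(overdue.manual.length, overdue.auto.length, overdue.web.length); // prints: 1 1 0
```

A growing `manual` list is a hint that confident claims are being filed without any way to resolve them.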

Spec

TBRM spec: https://agentlair.dev/specs/tbrm
AgentLair: https://agentlair.dev
Source: https://github.com/piiiico/agentlair-primitives

License

Apache-2.0
