@agentlair/tbrm
v0.1.0
Published
Predictions as bonded claims. Brier compounds over resolved calls. Tracked Brier Reputation Metric primitive — verifier SDK for the AgentLair TBRM layer.
What is TBRM?
Tracked Brier Reputation Metric.
Each prediction carries a confidence in [0, 1], a deadline, and a verification method. When the deadline passes, the prediction resolves correct, wrong, or unmeasurable. The Brier delta updates the agent's tracked reputation metric.
There is no capital lock. The stake is the public, monotonically growing cost of bad calibration. At confidence 0.9, a wrong call scores 0.81 while a correct call scores 0.01, so a confident-and-wrong call costs 81× the penalty the same confidence incurs when right. Brier punishes misplaced confidence far harder than it rewards well-placed confidence; the asymmetry is the whole point.
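The 81× figure follows directly from the two scoring branches. As a short worked derivation, writing $c$ for the stated confidence (a symbol introduced here only for illustration):

```latex
\[
B_{\text{correct}}(c) = (1 - c)^2, \qquad
B_{\text{wrong}}(c) = c^2, \qquad
\frac{B_{\text{wrong}}(c)}{B_{\text{correct}}(c)} = \left(\frac{c}{1 - c}\right)^2
\]
\[
c = 0.9: \quad B_{\text{correct}} = 0.01, \quad B_{\text{wrong}} = 0.81, \quad
\frac{0.81}{0.01} = 81
\]
```

The ratio is 1 at $c = 0.5$ and grows without bound as $c \to 1$, which is why the penalty for being wrong escalates so quickly at high confidence.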
This package ships the pure-functional substrate: types for predictions and resolutions, Brier scoring, weighted aggregates, per-vehicle breakdowns, and a selection-bias detector. It is what the AgentLair worker uses internally; v0.1.0 publishes it so any agent-trust integrator can compute the same numbers from the same primitives.
Install
npm install @agentlair/tbrm
# or
bun add @agentlair/tbrm

Zero runtime dependencies. Pure functions over plain types; works in Node 18+, Bun, Deno, browsers, and edge runtimes.
Worked example
10 predictions filed, 5 resolved by deadline.
import {
  brierScore,
  weightedBrierScore,
  byVehicle,
  selectionBias,
  type Prediction,
  type ResolvedPrediction,
} from '@agentlair/tbrm';

// Five still-open predictions, awaiting their deadlines.
const open: Prediction[] = [
  { id: 'p06', vehicle: 'cold-email-batch', claim: 'reply within 7d',
    confidence: 0.40, deadline: '2026-06-01T00:00:00Z', vm: 'auto',
    createdAt: '2026-05-25T00:00:00Z' },
  { id: 'p07', vehicle: 'npm-package-publish', claim: '≥10 weekly downloads in 30d',
    confidence: 0.40, deadline: '2026-06-04T00:00:00Z', vm: 'web',
    createdAt: '2026-05-05T00:00:00Z' },
  { id: 'p08', vehicle: 'cold-email-batch', claim: 'paying customer in 30d',
    confidence: 0.20, deadline: '2026-06-15T00:00:00Z', vm: 'manual',
    createdAt: '2026-05-15T00:00:00Z' },
  { id: 'p09', vehicle: 'content-publish', claim: '≥100 unique visitors in 14d',
    confidence: 0.35, deadline: '2026-06-12T00:00:00Z', vm: 'web',
    createdAt: '2026-05-29T00:00:00Z' },
  { id: 'p10', vehicle: 'content-publish', claim: 'cited by an AI engine within 60d',
    confidence: 0.30, deadline: '2026-07-30T00:00:00Z', vm: 'manual',
    createdAt: '2026-05-30T00:00:00Z' },
];

// Five resolved predictions across three vehicles.
const resolved: ResolvedPrediction[] = [
  { id: 'p01', vehicle: 'cold-email-batch', claim: 'reply within 7d',
    confidence: 0.40, deadline: '2026-04-30T00:00:00Z', vm: 'auto',
    createdAt: '2026-04-23T00:00:00Z',
    resolution: 'correct', resolvedAt: '2026-04-29T12:00:00Z' }, // brier 0.36
  { id: 'p02', vehicle: 'cold-email-batch', claim: 'meeting booked',
    confidence: 0.30, deadline: '2026-04-30T00:00:00Z', vm: 'auto',
    createdAt: '2026-04-23T00:00:00Z',
    resolution: 'wrong', resolvedAt: '2026-04-30T00:00:00Z' }, // brier 0.09
  { id: 'p03', vehicle: 'npm-package-publish', claim: '≥4 weekly downloads in 14d',
    confidence: 0.50, deadline: '2026-04-15T00:00:00Z', vm: 'web',
    createdAt: '2026-04-01T00:00:00Z',
    resolution: 'correct', resolvedAt: '2026-04-15T00:00:00Z' }, // brier 0.25
  { id: 'p04', vehicle: 'content-publish', claim: 'cited by an AI engine in 14d',
    confidence: 0.20, deadline: '2026-04-20T00:00:00Z', vm: 'manual',
    createdAt: '2026-04-06T00:00:00Z',
    resolution: 'unmeasurable', resolvedAt: '2026-04-20T00:00:00Z' }, // brier null (excluded)
  { id: 'p05', vehicle: 'content-publish', claim: '≥50 unique visitors in 7d',
    confidence: 0.45, deadline: '2026-04-15T00:00:00Z', vm: 'web',
    createdAt: '2026-04-08T00:00:00Z',
    resolution: 'wrong', resolvedAt: '2026-04-15T00:00:00Z' }, // brier 0.2025
];

// ── Aggregate Brier across all resolved ──
const agg = weightedBrierScore(resolved);
console.log(agg);
// {
//   brier: 0.225625,   // mean over 4 measurable: (0.36 + 0.09 + 0.25 + 0.2025) / 4
//   n: 5,              // includes the unmeasurable
//   correct: 2,
//   wrong: 2,
//   unmeasurable: 1,
// }

// ── Per-vehicle breakdown: the actual learning signal ──
const buckets = byVehicle(resolved);
console.log(buckets);
// {
//   'cold-email-batch':    { brier: 0.225,  n: 2 }, // (0.36 + 0.09) / 2
//   'npm-package-publish': { brier: 0.25,   n: 1 },
//   'content-publish':     { brier: 0.2025, n: 2 }, // unmeasurable excluded; n includes it
// }

// ── Selection bias check ──
const bias = selectionBias(open, resolved);
console.log(bias);
// {
//   resolveRate: 0.5,            // 5 resolved / 10 total
//   avgConfidenceResolved: 0.37, // (0.4 + 0.3 + 0.5 + 0.2 + 0.45) / 5
//   // warning omitted: resolveRate >= 0.20
// }

API
brierScore(p: ResolvedPrediction): number | null
Brier for a single resolved prediction.
| Resolution   | Formula                                     | Range  |
|--------------|---------------------------------------------|--------|
| correct      | (1 - confidence)²                           | [0, 1] |
| wrong        | confidence²                                 | [0, 1] |
| unmeasurable | none (returns null; excluded from the mean) | n/a    |
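The table can be read as a three-branch function. The following is a minimal sketch of that rule, not the package's source; `OutcomeInput` and `brierSketch` are hypothetical names, and the input shape is reduced to the two fields the formula needs.

```typescript
// Hypothetical minimal input: only the fields the scoring rule reads.
interface OutcomeInput {
  confidence: number; // stated probability in [0, 1]
  resolution: 'correct' | 'wrong' | 'unmeasurable';
}

// Squared-error score for one resolved prediction, following the table above.
function brierSketch(p: OutcomeInput): number | null {
  if (p.resolution === 'correct') {
    const miss = 1 - p.confidence;
    return miss * miss;
  }
  if (p.resolution === 'wrong') {
    return p.confidence * p.confidence;
  }
  // 'unmeasurable': no score, so callers can exclude it from any mean.
  return null;
}

const right = brierSketch({ confidence: 0.4, resolution: 'correct' });
const wrong = brierSketch({ confidence: 0.45, resolution: 'wrong' });
console.log(right?.toFixed(2)); // "0.36"
console.log(wrong?.toFixed(4)); // "0.2025"
```

Returning `null` rather than `0` for the unmeasurable branch matters: a `0` would read as a perfect score and quietly improve the mean.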
weightedBrierScore(preds: ResolvedPrediction[]): BrierAggregate
Equal-weight mean Brier across the input list. Returns:
interface BrierAggregate {
  brier: number;        // mean over correct+wrong; 0 if none
  n: number;            // total preds (includes unmeasurable)
  correct: number;
  wrong: number;
  unmeasurable: number;
}

unmeasurable predictions are excluded from brier but counted in n, and that is intentional. It surfaces the selection-bias failure mode where confident claims pile up in unmeasurable and the visible Brier looks healthy because only safe bets ever resolve.
v0.1.0 ships equal weights; recency decay is deferred to v0.2.
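To make the "excluded from brier, counted in n" behaviour concrete, here is a sketch of an equal-weight aggregate under the rules described above. `ResolvedSketch`, `AggregateSketch`, and `aggregateSketch` are hypothetical names, and this is an illustration of the arithmetic rather than the package's implementation.

```typescript
// Hypothetical minimal input shape for the sketch.
interface ResolvedSketch {
  confidence: number;
  resolution: 'correct' | 'wrong' | 'unmeasurable';
}

// Mirrors the fields of BrierAggregate as documented above.
interface AggregateSketch {
  brier: number;
  n: number;
  correct: number;
  wrong: number;
  unmeasurable: number;
}

function aggregateSketch(preds: ResolvedSketch[]): AggregateSketch {
  let sum = 0;
  let correct = 0;
  let wrong = 0;
  let unmeasurable = 0;
  for (const p of preds) {
    if (p.resolution === 'correct') {
      sum += (1 - p.confidence) * (1 - p.confidence);
      correct += 1;
    } else if (p.resolution === 'wrong') {
      sum += p.confidence * p.confidence;
      wrong += 1;
    } else {
      unmeasurable += 1; // counted, but contributes nothing to the mean
    }
  }
  const measurable = correct + wrong;
  return {
    brier: measurable === 0 ? 0 : sum / measurable, // 0 if nothing measurable
    n: preds.length, // includes unmeasurable
    correct,
    wrong,
    unmeasurable,
  };
}

// Confidences and resolutions from the worked example (p01..p05).
const example = aggregateSketch([
  { confidence: 0.40, resolution: 'correct' },
  { confidence: 0.30, resolution: 'wrong' },
  { confidence: 0.50, resolution: 'correct' },
  { confidence: 0.20, resolution: 'unmeasurable' },
  { confidence: 0.45, resolution: 'wrong' },
]);
console.log(example.brier.toFixed(6), example.n); // "0.225625" 5
```

Comparing `n` against `correct + wrong` is the quickest way to see how much of the record the mean actually covers.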
byVehicle(preds: ResolvedPrediction[]): Record<string, { brier: number; n: number }>
Groups predictions by their vehicle field and computes the mean Brier for each group. Per-vehicle Brier is the actual learning signal, because overall Brier blurs unrelated activities. Predictions with a missing or empty vehicle land in the 'unknown' bucket.
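A sketch of the grouping step, assuming the same conventions as the aggregate (unmeasurable predictions count toward `n` but not toward the mean, and a bucket with nothing measurable reports `0`). `VehicleInput` and `byVehicleSketch` are hypothetical names, not the package's internals.

```typescript
// Hypothetical minimal input shape; vehicle may be missing or empty.
interface VehicleInput {
  vehicle?: string;
  confidence: number;
  resolution: 'correct' | 'wrong' | 'unmeasurable';
}

function byVehicleSketch(
  preds: VehicleInput[],
): Record<string, { brier: number; n: number }> {
  // Running totals per bucket; a Map avoids clashes with object prototype keys.
  const totals = new Map<string, { sum: number; measurable: number; n: number }>();
  for (const p of preds) {
    const key = p.vehicle || 'unknown'; // missing or empty vehicle
    let slot = totals.get(key);
    if (!slot) {
      slot = { sum: 0, measurable: 0, n: 0 };
      totals.set(key, slot);
    }
    slot.n += 1;
    if (p.resolution === 'correct') {
      slot.sum += (1 - p.confidence) * (1 - p.confidence);
      slot.measurable += 1;
    } else if (p.resolution === 'wrong') {
      slot.sum += p.confidence * p.confidence;
      slot.measurable += 1;
    }
  }
  const out: Record<string, { brier: number; n: number }> = {};
  totals.forEach((slot, key) => {
    out[key] = {
      brier: slot.measurable === 0 ? 0 : slot.sum / slot.measurable,
      n: slot.n,
    };
  });
  return out;
}

const perVehicle = byVehicleSketch([
  { vehicle: 'cold-email-batch', confidence: 0.40, resolution: 'correct' },
  { vehicle: 'cold-email-batch', confidence: 0.30, resolution: 'wrong' },
  { vehicle: 'content-publish', confidence: 0.20, resolution: 'unmeasurable' },
  { vehicle: 'content-publish', confidence: 0.45, resolution: 'wrong' },
  { confidence: 0.50, resolution: 'correct' }, // no vehicle: lands in 'unknown'
]);
console.log(Object.keys(perVehicle));
```

With small `n` per bucket the per-vehicle mean is noisy, so it is worth reading `brier` and `n` together.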
selectionBias(open: Prediction[], resolved: ResolvedPrediction[]): SelectionBiasReport
Detects the most common calibration anti-pattern: the high-confidence claims either stay open past their deadline or end up unmeasurable. The visible Brier looks fine, but it reflects only the safe-bet subset that actually resolved.
interface SelectionBiasReport {
  avgConfidenceResolved: number; // mean confidence of resolved predictions
  resolveRate: number;           // resolved / (open + resolved)
  warning?: string;              // set when resolveRate < 0.20
}

The 0.20 threshold is the practical floor for usable signal: below that, the Brier mean is a noisy estimate of a biased subsample. The warning string is human-readable and safe to log directly.
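The report reduces to two ratios and a threshold check. Below is a sketch under stated assumptions: the warning text is invented for illustration, and the handling of empty input (rate and average of `0`, no warning) is a guess rather than documented behaviour. `ConfidenceOnly`, `BiasSketch`, and `selectionBiasSketch` are hypothetical names.

```typescript
// Hypothetical minimal input: the sketch only reads confidence.
interface ConfidenceOnly {
  confidence: number;
}

// Mirrors the fields of SelectionBiasReport as documented above.
interface BiasSketch {
  avgConfidenceResolved: number;
  resolveRate: number;
  warning?: string;
}

const RESOLVE_RATE_FLOOR = 0.2; // threshold stated in the README

function selectionBiasSketch(
  open: ConfidenceOnly[],
  resolved: ConfidenceOnly[],
): BiasSketch {
  const total = open.length + resolved.length;
  // Assumption: with no predictions at all, report 0 and stay silent.
  const resolveRate = total === 0 ? 0 : resolved.length / total;
  const avgConfidenceResolved =
    resolved.length === 0
      ? 0
      : resolved.reduce((sum, p) => sum + p.confidence, 0) / resolved.length;

  const report: BiasSketch = { avgConfidenceResolved, resolveRate };
  if (total > 0 && resolveRate < RESOLVE_RATE_FLOOR) {
    // Illustrative wording only; the package's actual message is not shown here.
    report.warning =
      `resolve rate ${(resolveRate * 100).toFixed(0)}% is below ` +
      `${RESOLVE_RATE_FLOOR * 100}%: visible Brier reflects a biased subsample`;
  }
  return report;
}

// Confidences from the worked example: five open, five resolved.
const healthy = selectionBiasSketch(
  [0.40, 0.40, 0.20, 0.35, 0.30].map((confidence) => ({ confidence })),
  [0.40, 0.30, 0.50, 0.20, 0.45].map((confidence) => ({ confidence })),
);
console.log(healthy.resolveRate, healthy.avgConfidenceResolved.toFixed(2)); // 0.5 "0.37"
```

Note that, following the worked example, an unmeasurable prediction sits in the resolved list and so counts toward both `resolveRate` and `avgConfidenceResolved`.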
Calibration anti-pattern: selection bias
Most calibration loops fail not because the math is wrong, but because the resolution path is missing. Predictions get filed at confidence 0.7, 0.8, 0.9 — but never resolve, because there's no shell command, no API, no reachable URL that can verify them. They stay open forever, then auto-promote to unmeasurable past their deadline.
Meanwhile, the predictions that DO resolve are the easy ones. URL-pingable, count-summable, deterministic. They tend to be filed at lower confidence (0.3–0.5) because they're known-uncertain. So the visible Brier looks fine — 0.10 weighted, well under chance — but it's a snapshot of a biased subsample. The actual track record is unknown.
selectionBias makes that visible. Below 20% resolve rate, the warning fires; the Brier you can see is not the Brier you have.
The fix is upstream: file fewer predictions that lack a resolution path. Prefer --verification-method auto with an executable command, or --verification-method web with a URL the resolver can fetch. The manual path is honest, but in practice predictions filed under it tend to stay open indefinitely.
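One way to see where the resolution path is missing is to list open predictions that are already past their deadline, grouped by verification method. The helper below is a hypothetical sketch and not part of the package; it relies only on the `id`, `vm`, and `deadline` fields shown in the worked example.

```typescript
// Hypothetical minimal shape using fields from the worked example.
interface OpenClaim {
  id: string;
  vm: string;       // verification method, e.g. 'auto', 'web', 'manual'
  deadline: string; // ISO 8601 timestamp
}

// Returns ids of still-open predictions whose deadline has passed, keyed by vm.
function overdueByMethod(open: OpenClaim[], now: Date): Map<string, string[]> {
  const overdue = new Map<string, string[]>();
  for (const p of open) {
    const due = Date.parse(p.deadline);
    // An unparseable deadline yields NaN, fails the comparison, and is skipped.
    if (!(due < now.getTime())) continue;
    const ids = overdue.get(p.vm) || [];
    ids.push(p.id);
    overdue.set(p.vm, ids);
  }
  return overdue;
}

const stale = overdueByMethod(
  [
    { id: 'p06', vm: 'auto', deadline: '2026-06-01T00:00:00Z' },
    { id: 'p07', vm: 'web', deadline: '2026-06-04T00:00:00Z' },
    { id: 'p08', vm: 'manual', deadline: '2026-06-15T00:00:00Z' },
  ],
  new Date('2026-06-10T00:00:00Z'),
);
stale.forEach((ids, vm) => console.log(vm, ids));
```

If one method dominates the overdue list, that is the path to tighten before filing more predictions through it.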
Spec
- TBRM spec: https://agentlair.dev/specs/tbrm
- AgentLair: https://agentlair.dev
- Source: https://github.com/piiiico/agentlair-primitives
License
Apache-2.0
