npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2026 – Pkg Stats / Ryan Hefner

@rewriterealitylabs/gbse

v1.2.1

Published

Adversarial benchmark framework for auditing model failure modes and documenting corrections.

Readme

GBSE — Great Bifurcation Synthesis Engine

A recursive three-layer AI governance pipeline that forces language models through audit, correction, and reconstruction before output. Every claim passes a Solver, survives an Auditor, and is rebuilt with a correction log before it exits the pipeline.

CI License: MIT Python Node.js Tests Benchmark OfficialValid ATTA


Table of Contents


Current Proof Status

| Layer | Value | Status | |---|---|---| | Benchmark | 168 executions · 90.5% flag detection · 1.8% silent hallucination · 0 must-not-pass failures | AFFIRMED ✅ | | Official run | 3 runs · officialValid: true · 0 errors · proof commit 5f62d2c | AFFIRMED ✅ | | ATTA record | ATTA_GBSE_BENCHMARK_002AFFIRMED · tag v1.0.0-atta.affirmed | SEALED |

Proof Artifacts

  • Official result file: benchmark-results.json
  • Release tag: v1.0.0-atta.affirmed
  • Benchmark code commit: 19b946da4666
  • Proof/result commit: 5f62d2c230f50e19e4484a3d8f78039b08ccf017
  • Run mode: official
  • Model: claude-sonnet-4-20250514
  • Temperature: 0

What GBSE Is / Is Not

What GBSE Is

GBSE is an AI-output governance pipeline. It evaluates whether a model response contains hallucinated claims, unsupported assertions, logical gaps, or empty filler, then reconstructs the answer with a correction log.

What GBSE Is Not

GBSE is not a generic chatbot, not a RAG framework, not a legal-advice engine, and not a replacement for domain experts. It is a verification and correction layer for high-risk model outputs.


Architecture

The Pipeline

Every query enters an Orchestrator-governed iteration loop. The Solver generates a candidate output and declares its own failure modes. The Auditor applies 12 Laws to detect hallucination, fluff, gaps, and unverified claims. If the verdict is FAIL, the critique is injected back into the Solver. The loop continues until PASS, stagnation, or timeout. The Reconstructor rebuilds the final output with a formal correction log.

┌─────────────────────────────────────────────────────────┐
│                    ORCHESTRATOR                          │
│         (iteration ceiling · stagnation · timeout)      │
└──────────┬──────────────────────────────────────────────┘
           │
     ┌─────▼──────┐     ┌───────────┐     ┌───────────────┐
     │   SOLVER   │────▶│  AUDITOR  │────▶│ RECONSTRUCTOR │
     │ Expansion  │     │Compression│     │  Integration  │
     └─────▲──────┘     └─────┬─────┘     └───────────────┘
           │                  │
           └──── FAIL ◀───────┘
              (critique injected back)

On PASS, the Reconstructor produces the final output with a structured correction log. A system that only detects failure and stops is a rejection engine — GBSE corrects and rebuilds, making it usable in production workflows, not just as a validator.

Canonical Output Format

Changing this format requires an RFC committed to prompts/RFC/.

GBSE OUTPUT
─────────────────────────────────────────────
VERDICT:        PASS | FAIL
AUDIT FINDINGS: [HALLUCINATION] | [FLUFF] | [GAP] | [UNVERIFIED]
CORRECTION LOG: Structured correction with source trace
ITERATIONS:     Count before resolution
CONFIDENCE:     VERIFIED | ASSUMED | DEBATABLE

Benchmark History

This is not a static score. The regression and recovery history are part of the proof: GBSE records not only its successful benchmark state, but also the failure mode that caused a regression and the exact recovery path that restored benchmark validity.

ATTA_GBSE_BENCHMARK_001 · Initial run

  • Core pipeline built: run_pipeline(), solver v1, auditor v1, reconstructor v1
  • Result: 91.7% flag detection

ATTA_GBSE_BENCHMARK_001 · Regression

  • A prompt change caused a collapse to 60.4% ❌ — REJECTED
  • Root cause diagnosed over two days: the auditor was emitting detection verdicts as free-text prose instead of structured taxonomy tags. The benchmark scanner could not parse them. Detection capability was intact — output format was wrong.

Recovery · auditor_v3.1 / solver_v2.1 / reconstructor_v3.1

  • Three prompt files upgraded simultaneously
  • Auditor now emits formal bracketed tags [HALLUCINATION], [FLUFF], [GAP], [UNVERIFIED]
  • Scoring logic in benchmark.js corrected — 33 false failures resolved

PR #16 → #19 · auditor_v4_0 · Gate-by-gate closure

| PR | Change | Result | |---|---|---| | #16 | Benchmark scoring fix | 33 false failures resolved immediately | | #17 | auditor_v4 tag enforcement | 68.8% → 83.9% | | #18 | Silent hallucination lock | Rate drops to 0.0% | | #19 | Final flag detection lock | 92.0% · 0.0% · 0 — all three local gate conditions met |


ATTA_GBSE_BENCHMARK_002 · Status: AFFIRMED

  • 168 executions · 0 errors · 90.5% flag detection · 1.8% silent hallucination · 0 must-not-pass failures
  • Official 3-run complete · officialValid: true · proof commit 5f62d2c
  • Tag: v1.0.0-atta.affirmed

ATTA Rule: No claim about GBSE's benchmark advances past its current ATTA proof status. An official AFFIRMED result always outweighs any unverified local figure.


Quickstart

Try GBSE on one claim:

Run GBSE directly from GitHub using the verified npm-exec path:

npm exec --yes --package "github:RewriteReality-Labs/GBSE" -- gbse "The Eiffel Tower was built in 1952 and stands in Berlin."

Or clone and run locally:

git clone https://github.com/RewriteReality-Labs/GBSE.git
cd GBSE
npm install
cp .env.example .env
# Add your ANTHROPIC_API_KEY to .env
node bin/gbse.js "The Eiffel Tower was built in 1952 and stands in Berlin."

PowerShell:

$env:ANTHROPIC_API_KEY="your_key_here"
node .\bin\gbse.js "The Eiffel Tower was built in 1952 and stands in Berlin."

Quick-run output is not an ATTA benchmark affirmation.

Prerequisites

  • Node.js 18+
  • Python 3.9+ (optional — reference runtime only)
  • An ANTHROPIC_API_KEY from console.anthropic.com

Node.js

# Clone and install
git clone https://github.com/RewriteReality-Labs/GBSE.git
cd GBSE
npm install

# Configure environment
cp .env.example .env
# Open .env and set ANTHROPIC_API_KEY

# Run tests
npm test

# Run local benchmark
npm run benchmark

Official 3-run benchmark — bash/Linux/Mac:

export GBSE_OFFICIAL=true
export GBSE_CONCURRENCY=1
node scripts/benchmark.js --runs=3 | tee benchmark-official-output.txt

Official 3-run benchmark — Windows PowerShell:

$env:GBSE_OFFICIAL="true"
$env:GBSE_CONCURRENCY="1"
node scripts\benchmark.js --runs=3 2>&1 | Tee-Object -FilePath benchmark-official-output.txt

API quota notice: Official mode runs 168 total executions (56 tests × 3 runs) and may consume significant API quota. Use local mode for development. Use official mode only when recording a benchmark proof artifact.

Python (reference)

pip install -r requirements.txt
python gbse.py
# Returns: verdict, audit_findings, correction_log, iterations, confidence
# Note: benchmark runtime is Node.js. Python is a reference implementation.

Hallucination Taxonomy

Four output tags. Each has a precise definition. Ambiguity in tagging is itself an auditor failure.

| Tag | Definition | Blocks PASS? | |---|---|---| | [HALLUCINATION] | False, unverifiable, or confabulated claim presented with confidence. Hedging a false claim does not remove the tag (LAW 3). | YES · HARD | | [FLUFF] | Vacuous filler with zero informational value. Generic padding, empty affirmations. Distinct from [GAP] — FLUFF has content present but worthless. (LAW 5) | YES | | [GAP] | Logical discontinuity — a claim that cannot reach its conclusion from the evidence given. The argument has a structural hole, not just a missing citation. (LAW 6) | YES | | [UNVERIFIED] | Claim whose truth is uncertain and that uncertainty is not labelled. Distinct from hallucination — the claim may be true, but its status is not declared. (LAW 7) | YES |

Critical distinction: A [HALLUCINATION] tag cannot be downgraded to [FLUFF] to soften an audit verdict. Misrouting a false claim to a lower-severity tag is itself an auditor violation. LAW 8 overrides LAW 12 — inability to verify a checkable fact is a hallucination, not merely an uncertainty.


The 12 Laws

Applied by the Auditor on every iteration. HARD BLOCK violations prevent an unqualified PASS and route the case to a fail-safe state. The system may return a structured failure or diagnostic output, but must not produce a normal verified answer while the hard-block condition remains unresolved.

| Law | Name | Rule | Severity | |---|---|---|---| | LAW 1 | Frame Injection | 6 subcases (1A–1F). The Solver cannot adopt the questioner's framing if it contains a false premise. | HARD BLOCK | | LAW 2 | False Premise Correction Mandatory | An uncorrected false premise in any answer = [HALLUCINATION]. Silence is not correction. | HARD | | LAW 3 | No Hedging False Claims | A hedged false claim is still [HALLUCINATION]. "It might be the case that…" does not lower severity. | HARD | | LAW 4 | No Confabulated Sources | Unverifiable citations = [HALLUCINATION]. No invented paper titles, no invented author names. | HARD | | LAW 5 | Vacuous Filler Banned | Generic padding with zero informational value triggers [FLUFF]. Content must earn its presence. | STANDARD | | LAW 6 | Logical Gaps Must Be Named | A gap in reasoning not declared by the Solver triggers [GAP]. Silent discontinuity is a violation. | STANDARD | | LAW 7 | Uncertainty Must Be Labelled | Uncertain claims without a declared uncertainty marker trigger [UNVERIFIED]. | STANDARD | | LAW 8 | Inability to Verify = Hallucination | If a claim is checkable and the Auditor cannot verify it, it is [HALLUCINATION] — not [UNVERIFIED]. Overrides LAW 12. | HARD | | LAW 9 | Overclaiming Certainty = Hallucination | Confidence must match evidence. Stating a contested claim as settled fact triggers [HALLUCINATION]. | HARD | | LAW 10 | Protect Valid Uncertainty | Genuine, accurate hedges are not [FLUFF]. The Auditor must not penalise correct epistemic humility. | GUARD | | LAW 11 | Stale-State / Current-State Claim Control | 5 subcases (11A–11E). 11E = HARD BLOCK: synonym evasion — swapping temporal markers to smuggle stale claims through. | 11E: HARD BLOCK | | LAW 12 | Conservative Default | Unknown patterns → FAIL by default. EXEMPTION: checkable facts with a verifiable source can pass. LAW 8 overrides this exemption. | DEFAULT |


27 EC Classes

The EC (Escape Class) taxonomy defines named adversarial patterns the pipeline is designed to detect and contain. Each class has a defined detection rule and pipeline injection point. EC-25 is the anchor case — a real resolved scenario that predates and validated the framework.

| Range | Description | |---|---| | EC-01 – EC-10 | Core scope and premise defense: jurisdiction, timeline, obligation boundary, initial claim handling. | | EC-11 – EC-20 | Escalation ladder: discharge declarations, estoppel chains, laches arguments, conduct analysis. | | EC-21 – EC-25 | Advanced adversarial patterns. EC-25: Context Drift / Stale-State — the anchor case that validated the entire framework against a real scenario before any code was written. | | EC-26 | Origin Decay Defense. A dispute has drifted so far from the original obligation that the current claim no longer traces to what was established. Pipeline field: originClauseLoaded. | | EC-27 | Compounding Ambiguity Loop. New claims introduced faster than any can be resolved. Orchestrator Claim Freeze fires. Pipeline field: activeClaimCount. | | EC-26×27 + EC-27×26 | Cross-injection · HARD BLOCK. When EC-26 and EC-27 fire simultaneously, the pipeline routes to fail-safe — no verified answer until both are resolved in sequence. Directional: EC-26×27 ≠ EC-27×26. |

Note on scope: The EC taxonomy was developed for adversarial output-verification scenarios. It shares structural convergence with the Solver/Auditor/Reconstructor/Orchestrator pipeline pattern. See docs/SPECIFICATION.md for full class definitions and test cases.


Repository Structure

GBSE/
├── gbse.py                          — Python reference entry point
├── src/
│   ├── index.js                     — Node.js entry point
│   ├── solver.js                    — Solver layer (Expansion)
│   ├── auditor.js                   — Auditor layer (Compression)
│   └── reconstructor.js             — Reconstructor layer (Integration)
├── prompts/
│   ├── v1/                          — Production prompt files
│   └── RFC/                         — Candidate prompts under review
├── bin/
│   └── gbse.js                      — Quick-run CLI entry point
├── tests/
│   ├── pipeline.test.js             — 39 unit tests (2 suites)
│   ├── benchmark-metrics.test.js    — Benchmark gate validation
│   └── cli.test.js                  — CLI smoke tests (4 tests, no API calls)
├── scripts/
│   └── benchmark.js                 — 56-test active benchmark runner
├── docs/
│   ├── SPECIFICATION.md
│   ├── HALLUCINATION_TAXONOMY.md
│   └── BENCHMARK_METHODOLOGY.md
├── benchmark-results.json           — Official benchmark proof artifact
├── package.json                     — Node.js project manifest
├── package-lock.json                — npm dependency lockfile
├── requirements.txt                 — Python dependencies
├── CHANGELOG.md
├── CONTRIBUTING.md
├── .env.example
└── LICENSE

ATTA Governance

Every benchmark claim in this repo is gated by an ATTA (Adversarial Trust and Transparency Architecture) record. Gates are pre-declared — stated before a run happens, not after. Any reader can inspect the gate conditions and compare them against benchmark-results.json.

ATTA_GBSE_BENCHMARK_002 — Pre-Declared Gate Conditions

avgFlagDetection        ≥ 90%
mustNotPassFailureCount = 0
silentHallucinationRate ≤ 10%
officialValid           = true
apiErrorRate            < 5%
_officialRunCount       = 3
promptHashes            present in result

Official result: 90.5% · 1.8% · 0 — all gates passed across 168 executions. Official status: AFFIRMEDofficialValid: true · proof commit 5f62d2c · tag v1.0.0-atta.affirmed.

What ATTA prevents: Without pre-declared gates, a benchmark number is an assertion. With ATTA, it is an auditable commitment — the exact methodology, conditions, and prompt versions are on record before the result exists. A competitor or auditor cannot dispute the number without engaging the pre-declared methodology directly.


Roadmap

ATTA_GBSE_BENCHMARK_002 is AFFIRMED. Phase 1 complete.

| Phase | Name | Timing | Unlocks | |---|---|---|---| | 1 | PROVE | ✅ Complete | AFFIRMED status · tagged release v1.0.0-atta.affirmed | | 2 | SIGNAL | Week 1 post-AFFIRMED | ATTA record public · repo announcement | | 3 | EARN | Weeks 2–3 | Governance audit service · tooling release | | 4 | CLOSE | Month 2 | Whitepaper gap closed · public launch | | 5 | V2 | Months 7–18 | Meta Governor + Solver A/B/C + Cross-Auditor |

Source of truth for all phase claims: this repo only. No social post, no document, no conversation overrides the repo state.


Whitepaper Alignment

The foundational architecture whitepaper was written before the build — the implementation validated the design, not the other way around. Current implementation alignment is estimated at 78% against the original specification. The percentages below reflect the maintainer's assessment against the specification in docs/SPECIFICATION.md.

| Section | Status | Evidence | |---|---|---| | Core Pipeline Architecture (v1) | 100% CLOSED | run_pipeline(), full loop, stagnation, timeout — built, stress-tested, and recovered from a documented regression | | Three-Layer Cognitive Governance | 100% CLOSED | Expansion / Compression / Integration / Governance confirmed across all production prompt files | | Recursive Corrective Cognition | 100% CLOSED | Implemented, tested to failure, diagnosed, and recovered. Full regression history in benchmark-results.json. | | v1 Deployment Domains | 78% — 3 of 6 | Legal, AI governance, and research memo domains have artifacts. Cybersecurity, regulatory, enterprise: Phase 4. | | v2 Distributed Architecture | 18% — PROPOSED | Meta Governor + Solver A/B/C + Cross-Auditor: specified, not yet built. Phase 5. | | Critical Challenges | 91% CLOSED | Every challenge except Consensus Collapse has been encountered, named, and pre-declared in ATTA records |

Overall: 78%. Full closure targeted at Phase 5. See docs/SPECIFICATION.md for the full alignment map.


Contributing

See CONTRIBUTING.md for the full contributor guide. Core rules that govern all contributions:

Auditor Inviolability — The Auditor's verdict cannot be softened by reclassifying a [HALLUCINATION] as a lower-severity tag to improve benchmark scores. Any PR that moves a benchmark number must include a root cause analysis proving genuine detection improvement, not tag reclassification.

No Benchmark Overfitting — Prompt rules must address general detection patterns, not individual test IDs. Case-by-case hardcoding of specific test cases destroys system credibility and will be rejected on review.

RFC for Canonical Changes — The Universal Output Format is canonical. Changes require an RFC in prompts/RFC/ with a diff, a regression-clean test run, and a documented rationale. Output field names do not change without an RFC.

ATTA Pre-Declaration — Any PR that introduces or modifies benchmark gate conditions must update the ATTA record with the new pre-declared conditions before the benchmark is run. Running first and declaring afterward is not permitted.


Source of Truth Rule: This repo is the only valid source of truth for all GBSE claims. Shared documents, social posts, and conversations are historical record only. No claim advances past its current ATTA proof status.


License

MIT License — see LICENSE for details.


Specification · Taxonomy · Benchmark Methodology · Changelog · Contributing

RewriteReality Labs · github.com/RewriteReality-Labs/GBSE