npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2026 – Pkg Stats / Ryan Hefner

rulith

v0.3.2

Published

An external reasoning board for LLM agents: exact arithmetic, rule-derived conclusions with provenance, consume/produce actions with history. Derived or it didn't happen.

Readme

RULITH

An external reasoning board for LLM agents. Derived or it didn't happen.

rulith — rule + -lith (Greek líthos, "stone") — is a stone tablet for an agent's rules: a working memory with a rule engine. The agent proposes facts, rules, and actions; the board computes deductive closures, does exact arithmetic, tracks consumption/production, and keeps an evidence chain for every conclusion. The agent cannot launder a guess into a result: every claim on the board is derived (closure-backed), an effect (action product), or asserted (bare claim) — and results that rest on bare claims are rejected.

First run through the published package, a 27B local model driving the board from Claude Code: the frontier model supervising the release (Claude Fable 5 Max) had mentally computed 9381274 × 6473 and confidently repeated the wrong answer three times. The board derived the right one. That incident is validation round #27 — the product demoing itself on its own author.

Papers

  • The Driving Floor: When an External Symbolic Reasoning Board Helps an LLM — the empirical study (board vs baseline across quantized local models). PDF · 中文版
  • The Rulith Decision Kernel: Proof-Carrying Decisions for Autonomous Agents — the whitepaper (the trust invariants + the commitment ladder). PDF · 中文版

Why

LLMs assert; they do not prove. For work where an unverified step is expensive — audits, multi-step analysis, planning, anything that carries state — the fix is not a smarter model but a surface the model has to show its work on, symbolically:

  • Deductive closure — the agent proposes facts and rules; the board derives every consequence to fixpoint and tags it [derived] with an evidence chain. A finding(...) must be derived from primitive observations, not asserted — bare findings block record_result. No way to claim a result without showing it.
  • Exact-or-fail arithmetic — integer math is exact within ±2^53; overflow, NaN, and silent precision loss fail loudly instead of rounding. The sharpest case of "show your work" — the model never does arithmetic in its head.
  • Actions with history — consume/produce transformations archive what they consume and record an event (binding, consumed, produced). The board keeps the process, not just the end state.
  • Truth maintenance — retract an input and everything resting on it falls; declare a key functional and contradictions taint downstream conclusions as disputed.
  • Teaching errors — every rejection explains how to fix the call. Validated to keep 27B-class local models productive.
  • No model, no GPU, no network — rulith never calls an LLM. It is a pure local kernel (Node ≥ 20, two pure-JS dependencies) that the agent drives over MCP stdio.

Install

As a Claude Code / Cowork plugin (MCP server + skill in one step):

/plugin marketplace add rulith-dev/rulith
/plugin install rulith@rulith

As a bare MCP server in Claude Code:

claude mcp add rulith -- npx -y rulith

Or in any MCP host, project-scoped .mcp.json:

{
  "mcpServers": {
    "rulith": { "command": "npx", "args": ["-y", "rulith"] }
  }
}

Optional persistence across sessions: set env RULITH_DB to a .jsonl file path. Without it, the board lives and dies with the session.

Quickstart

rulith is driven by an agent over MCP: the agent proposes facts and rules, the board derives the rest — and shows its work. A minimal session — line-item costs the model can't fudge:

1. Open a board. create_space { "title": "invoice" } returns a space id.

2. Assert the line items + a costing rule in one update_working_memory call:

{
  "operations": [
    { "op": "assert_fact", "id": "line_widget", "predicate": "line", "args": { "item": "widget", "unit": 1299, "qty": 7 } },
    { "op": "assert_fact", "id": "line_gasket", "predicate": "line", "args": { "item": "gasket", "unit": 4500, "qty": 12 } },
    { "op": "add_axiom", "id": "ax_cost", "label": "cost = unit * qty",
      "when": [
        { "predicate": "line", "args": { "item": "?i", "unit": "?u", "qty": "?q" } },
        { "predicate": "mul",  "args": { "left": "?u", "right": "?q", "result": "?t" } }
      ],
      "then": [{ "predicate": "cost", "args": { "item": "?i", "total": "?t" } }] }
  ]
}

mul is a built-in arithmetic predicate: the board computes 1299 × 7 and 4500 × 12 exactly (BigInt-checked) and binds ?t.

3. Read the board. get_logic_context returns cost(widget, 9093) and cost(gasket, 54000), each tagged [derived] with an evidence chain back to its line fact. The model never did the arithmetic — the board did, and it cannot be off by a digit.

From here: roll the costs into a total with derive_aggregate (sum), guard a budget with the gt built-in, or consume/produce inventory with define_action. Open goals come back with teaching hints (needs via <rule>: ...) naming the missing fact. And record_result on a bare assertion — rather than a derived fact — is rejected: show your work, or get nothing.

Beyond arithmetic — what else the board enforces

The same board does three more things an LLM can't be trusted to do by feel:

Actions leave a trail. define_action describes a consume/produce transformation — negated effects are consumed, positive ones produced — and apply_action runs it, recording an event:

{ "op": "define_action", "id": "craft", "action": "craft_sword", "label": "spend 2 gold, get 1 sword",
  "preconditions": [
    { "predicate": "have", "args": { "item": "gold", "qty": "?q" } },
    { "predicate": "gte", "args": { "left": "?q", "right": 2 } },
    { "predicate": "sub", "args": { "left": "?q", "right": 2, "result": "?rest" } }
  ],
  "effects": [
    { "predicate": "have", "args": { "item": "gold", "qty": "?q" }, "negated": true },
    { "predicate": "have", "args": { "item": "gold", "qty": "?rest" } },
    { "predicate": "have", "args": { "item": "sword", "qty": 1 } }
  ] }

With have(gold, 3) on the board, apply_action leaves have(gold, 1) and have(sword, 1) plus a consumed/produced event — the board keeps the process, not just the end state. simulate_action previews the same delta without committing.

Contradictions taint, they don't merge. Declare a key functional and a clash can't slip through:

{ "op": "assert_fact", "predicate": "functional_dependency", "args": { "predicate": "cost", "key": "item" } }

Now if two sources ever assert a different cost for the same item, the board raises a functional_conflict and flags both facts disputed — conclusions resting on them inherit the taint, instead of one silently winning.

Retract an input and everything resting on it falls. Truth maintenance is automatic — drop the widget line by its id:

{ "op": "retract_node", "nodeId": "line_widget" }

and cost(widget, 9093), plus any total derived from it, disappears with it. No conclusion outlives its premises.

Not just a calculator — a substrate for what to do next

A board that only answered "what is true" would be a calculator. rulith also helps drive what to do next, in the same symbolic, auditable way:

  • It names the next move. An open goal comes back with needs via <rule>: ... (which fact is missing) and producible via action ... (which defined action would produce it) — not just "unproven".
  • It previews before acting. simulate_action and validate_plan return the exact delta a step, or an ordered plan, would commit — so the agent checks consequences before touching the world.
  • It advances state, audited. consume/produce actions carry the board forward one verified transformation at a time, each leaving an event in the record.

This is the driving floor the paper measures: how far a board carries an LLM from a single claim, through a multi-step task, toward an autonomous lifetime — the commitment ladder of the whitepaper. The kernel here is the public floor of that ladder; the higher rungs rest on the same derived-or-it-didn't-happen contract.

Tools

create_space, update_working_memory (declare_goal / assert_fact / add_axiom / define_action / declare_hypothesis / record_result / retract_node / revise_fact), simulate_action, apply_action, get_logic_context, distill_space, list_spaces.

Open goals come back with teaching hints: which rule is missing which facts (needs via ...), and which defined action could produce the missing atom (producible via action ...).

Validated, not vibe-coded

This kernel was built against a discipline of red-tests-first and real-model validation: 100+ logged rounds of local models (gemma/qwen, 27B–35B class) driving the board through real tasks — judgment, diagnosis-and-repair, open-ended audit, stoichiometric reactions — each round documented with board evidence, each kernel gap found by a real run, exposed by a failing test, then fixed. The entire series ran on an AMD Strix Halo iGPU (Radeon 8060S); no discrete GPU was involved at any point.

Hard-arithmetic A/B (validation round #28, seeded and reproducible — 8-digit × 5-digit line items, 5–8 lines, exact totals, same 27B local model both arms): plain chat scored 0/10 (three confidently wrong totals, seven non-terminating DNFs at a 10-minute cap); the board arm scored 8/10, every solved value closure-derived, median 3 turns. Across all ten problems the board never displayed a single wrong number — it either derived the exact value or claimed nothing. The two board losses were generation-level runaways, replayed clean and re-verified with BigInt. 1,200+ unit tests; CI on Linux and Windows.

Eight A/B benchmark fixtures ship in src/examples/, each pitting board against baseline on a dimension a strong model still gets wrong: bench-arith (exact arithmetic), bench-audit (error-finding audits), bench-coding-trust (a fabricated "fix" is blocked — the board certifies a repair only from a real edit plus a passing test), bench-repair (diagnosis-first bug repair), bench-revision (content-addressed consistency under concurrent edits), and bench-aggregate (exact summation) — plus the bench-arms / bench-pool cross-model harness (run any fixture board-vs-baseline, two models, token-metered).

License

Apache License 2.0.