abax-mcp
v0.2.0
Published
An external reasoning board for LLM agents: exact arithmetic, rule-derived conclusions with provenance, consume/produce actions with history. Derived or it didn't happen.
Maintainers
Readme
abax
An external reasoning board for LLM agents. Derived or it didn't happen.
abax (Greek ἄβαξ, counting board — the ancestor of the abacus) gives an agent a working memory with a rule engine. The agent proposes facts, rules, and actions; the board computes deductive closures, does exact arithmetic, tracks consumption/production, and keeps an evidence chain for every conclusion. The agent cannot launder a guess into a result: every claim on the board is derived (closure-backed), an effect (action product), or asserted (bare claim) — and results that rest on bare claims are rejected.
First run through the published package, a 27B local model driving the board from Claude Code: the frontier model supervising the release (Claude Fable 5 Max) had mentally computed
9381274 × 6473and confidently repeated the wrong answer three times. The board derived the right one. That incident is validation round #27 — the product demoing itself on its own author.
Why
LLMs assert; they do not prove. For tasks where a wrong number or an unverified claim is expensive — audits, invoices, inventory, multi-step analysis — the fix is not a smarter model but a surface the model must show its work on:
- Exact-or-fail arithmetic — integer math is exact within ±2^53; overflow, NaN, and silent precision loss fail loudly instead of rounding. The model never does arithmetic in its head.
- Derivation gate —
finding(...)facts must be derived by the rule closure from primitive observations. Asserted findings blockrecord_result. There is no way to claim without showing. - Actions with history — consume/produce transformations archive what they consume and record an event (binding, consumed, produced). The board keeps the process, not just the end state.
- Truth maintenance — retract an input and everything resting on it falls; contradictions taint downstream conclusions as disputed.
- Teaching errors — every rejection explains how to fix the call. Validated to keep 27B-class local models productive.
- No model, no GPU, no network — abax never calls an LLM. It is a pure local kernel (Node ≥ 20, two pure-JS dependencies) that the agent drives over MCP stdio.
Install
As a Claude Code / Cowork plugin (MCP server + skill in one step):
/plugin marketplace add nvwaonline/abax
/plugin install abax@abaxAs a bare MCP server in Claude Code:
claude mcp add abax -- npx -y abax-mcpOr in any MCP host, project-scoped .mcp.json:
{
"mcpServers": {
"abax": { "command": "npx", "args": ["-y", "abax-mcp"] }
}
}Optional persistence across sessions: set env ABAX_DB to a .jsonl
file path. Without it, the board lives and dies with the session.
Tools
create_space, update_working_memory (declare_goal / assert_fact /
add_axiom / define_action / declare_hypothesis / record_result /
retract_node / revise_fact), simulate_action, apply_action,
get_logic_context, distill_space, list_spaces.
Open goals come back with teaching hints: which rule is missing which
facts (needs via ...), and which defined action could produce the
missing atom (producible via action ...).
Validated, not vibe-coded
This kernel was built against a discipline of red-tests-first and real-model validation: 27 logged rounds of local models (gemma/qwen, 27B–35B class) driving the board through real tasks — judgment, diagnosis-and-repair, open-ended audit, stoichiometric reactions — each round documented with board evidence, each kernel gap found by a real run, exposed by a failing test, then fixed. The entire series ran on an AMD Strix Halo iGPU (Radeon 8060S); no discrete GPU was involved at any point.
Hard-arithmetic A/B (validation round #28, seeded and reproducible —
8-digit × 5-digit line items, 5–8 lines, exact totals, same 27B local
model both arms): plain chat scored 0/10 (three confidently wrong
totals, seven non-terminating DNFs at a 10-minute cap); the board arm
scored 8/10, every solved value closure-derived, median 3 turns.
Across all ten problems the board never displayed a single wrong number
— it either derived the exact value or claimed nothing. The two board
losses were generation-level runaways, replayed clean and re-verified
with BigInt. Fixtures: src/examples/bench-arith.ts (and bench-audit.ts). 170+ unit
tests; CI on Linux and Windows. A/B benchmark fixtures (exact
arithmetic, error-finding audits) ship in src/examples/.
The lab notes and agent runtime around the kernel live in this repo too: see docs/galaxy-lab.md (Chinese) for the REPL, web board visualizer, and verification fixtures.
License
MIT
