knitbrain
v0.4.0
Knit Brain
The local-first brain for coding agents: per-project memory, task-tier workflow routing, and lossless context compression — measured ~48% of all tool-result tokens on real sessions (55% on sizable blocks; 60–70% on code/JSON/logs), with answer-preservation gates, reproducible with one command.
Pure TypeScript. No Python, no native binaries, no network beyond npm install.
npx knitbrain profile # measure what it would save on YOUR real sessions — before installing anything
npx knitbrain evals # prove the answers survive: same corpus, deterministic judging
The honest number
Most tools in this space quote their best workload ("up to 90%!"). We publish the number nobody else does: the all-inclusive average — every tool-result token from real coding sessions, including the small outputs that pass through uncompressed.
On ~3M tokens of tool results from real Claude Code sessions: ~48% saved overall, lossless. That denominator includes every tool-result token, even the small outputs that pass through untouched (counting only sizable blocks ≥400 chars, it's ~55%). The exact figure moves with your workload mix; run knitbrain profile for yours. Every original is recoverable byte-for-byte.
| shape | % of real burn | saved |
|---|---|---|
| code & file reads | 47% | 60.3% |
| repetitive logs | 17% | 70.5% |
| short prose (reports, summaries) | 16% | 18.0% |
| long prose | 7% | 69.2% |
| test output | 6% | 47.4% |
| JSON | 5% | 65.1% |
| diffs | 1% | 62.7% |
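As a rough cross-check, the per-shape rows can be folded back into one blended figure by weighting each row's savings by its share of tokens. The sketch below is illustrative arithmetic only: the `rows` array is copied from the table, `blendedSavings` is a hypothetical helper that is not part of knitbrain, and reading the shares as weights over sizable blocks is an assumption the table does not state.

```typescript
// Illustrative arithmetic only; not part of the knitbrain API.
interface ShapeRow {
  shape: string;
  sharePct: number; // "% of real burn" column
  savedPct: number; // "saved" column
}

const rows: ShapeRow[] = [
  { shape: "code & file reads", sharePct: 47, savedPct: 60.3 },
  { shape: "repetitive logs", sharePct: 17, savedPct: 70.5 },
  { shape: "short prose", sharePct: 16, savedPct: 18.0 },
  { shape: "long prose", sharePct: 7, savedPct: 69.2 },
  { shape: "test output", sharePct: 6, savedPct: 47.4 },
  { shape: "JSON", sharePct: 5, savedPct: 65.1 },
  { shape: "diffs", sharePct: 1, savedPct: 62.7 },
];

// Share-weighted mean, normalized by the total share
// (the published shares sum to 99 because of rounding).
function blendedSavings(table: ShapeRow[]): number {
  const totalShare = table.reduce((sum, r) => sum + r.sharePct, 0);
  const weighted = table.reduce((sum, r) => sum + r.sharePct * r.savedPct, 0);
  return weighted / totalShare;
}

console.log(blendedSavings(rows).toFixed(1)); // prints 55.3
```

The blended value lands near the ~55% quoted for sizable blocks; the ~48% headline is lower because it also counts small outputs that pass through uncompressed. Swapping in your own shares shows how far the number can move with workload mix.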
Measured the way others measure — single best-case workloads — we land 60–99% (import graphs 98.9%, whole files 88.8%, body-heavy code 71.6%). But that's not the number you'll feel; the all-inclusive average is.
Don't take our word for any of this. knitbrain profile runs the actual optimizer over your own transcripts (~/.claude/projects by default) and prints your number. Local only — nothing is uploaded.
Same answers, measured
Saving tokens is worthless if the agent loses the answer. knitbrain evals checks — on the same real corpus, with deterministic string-containment judging (no LLM judge to flatter us) — that the facts agents act on survive compression:
| check | result | gate |
|---|---|---|
| error-fidelity — every error/failure line survives in the skeleton | 142/142 = 100% | 100% |
| summary-fidelity — test/build result totals survive | 189/189 = 100% | ≥95% |
| identifier-fidelity — top-level declared names survive | ≥99% (corpus-dependent) | ≥99% |
| round-trip — ⟨ccr:hash⟩ recovers the original byte-for-byte | 100% | 100% |
| never-expand — no compressed block got bigger | 100% | 100% |
Error lines, result summaries, and round-trip recovery are hard guarantees (always 100%). Identifier-fidelity runs ≥99% on normal corpora; on declaration-dense corpora a name in a very large block can land in an elided region — and even then it's lossless (the full block is one knitbrain_retrieve away). Holding the fidelity line costs ~1pp of savings; we pay it. Run npx knitbrain evals (exit code 1 on gate failure) for your own number.
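To make "deterministic string-containment judging" concrete, here is a minimal sketch of what an error-fidelity check of this kind could look like. It is not the code behind knitbrain evals: `errorFidelity`, `ERROR_PATTERN`, and the sample log are all hypothetical, and the regular expression is an assumed heuristic for spotting error lines.

```typescript
// Hypothetical sketch of a string-containment gate; the real evals may differ.
const ERROR_PATTERN = /(error|fail|exception)/i; // assumed heuristic for "error/failure line"

interface FidelityResult {
  total: number;    // error lines found in the original
  survived: number; // of those, how many appear verbatim in the skeleton
  pass: boolean;    // true only when every error line survived
}

function errorFidelity(original: string, skeleton: string): FidelityResult {
  const errorLines = original
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0 && ERROR_PATTERN.test(line));
  // Plain substring containment: no model in the loop, same answer on every run.
  const survived = errorLines.filter((line) => skeleton.includes(line)).length;
  return { total: errorLines.length, survived, pass: survived === errorLines.length };
}

const original = [
  "PASS src/a.test.ts",
  "FAIL src/b.test.ts",
  "  TypeError: x is not a function",
  "Tests: 1 failed, 1 passed",
].join("\n");

const skeleton = [
  "FAIL src/b.test.ts",
  "TypeError: x is not a function",
  "Tests: 1 failed, 1 passed",
].join("\n");

console.log(errorFidelity(original, skeleton));
```

Because the judge is a substring test, a failure is always explainable: the missing line can be printed verbatim.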
Why this and not a point tool
Compression-only layers shrink tokens but remember nothing. Memory-only layers remember but burn your window. Knit Brain is one substrate doing both, plus the workflow layer that makes agents use them:
- Memory: per-project learnings, session handoffs, a knowledge graph (imports/exports/blast-radius), on-demand skills that compound across tasks, and knitbrain learn, an offline failure miner that writes corrections from your real sessions into CLAUDE.md.
- Lossless optimization: structure-preserving skeletons (JSON keeps its schema; code keeps its signatures via tree-sitter AST across TypeScript/TSX/JS, Python, Go, Rust, Java, C++, C#, Ruby, PHP, Bash), dedicated handlers for search results, build/test logs, and diffs (error lines always survive), cross-turn dedup of re-sent bulk, and sentence anchoring for prose. All of it is reversible through a content-addressed store.
- Workflow intelligence: a deterministic tier classifier (inquiry/trivial/standard/complex) routing how much process a task deserves; complex verdicts carry an explicit ENTER-PLAN-MODE directive the agent follows before touching files; guardrailed agent generation and a shared team board.
- Every rung compounds, not just remembers: the difference between an asset and a log is the closed loop (memory → signal → adjustment). Each subsystem closes it: compression backs off any kind that gets over-retrieved (TOIN), the classifier shifts its thresholds after 3 wrong-verdict votes (record_false_positive), skills get flagged needs-revision when reported failing (skill_outcome), and learnings are ranked by whether they actually helped; a learning reported wrong is discredited and demoted, its correction folded into the lesson (learning_outcome). Wrong tuning costs efficiency, never correctness.
- Closed loop, enforced not just nudged: the full operating protocol (load session → classify → plan-mode adherence → skills → agents → context discipline → record learning) rides the MCP handshake. On hook-capable hosts (Claude Code), setup also installs lifecycle hooks so the loop's first step is automatic: a SessionStart hook injects the protocol + prior handoff + proven learnings into every session, a PreToolUse hook hard-redirects large raw Reads to knitbrain_read, and PreCompact/Stop hooks keep work resumable. Honest ceiling: enforced where the host has hooks, strongly nudged everywhere else.
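The "3 wrong-verdict votes shift the threshold" loop described in the list above can be pictured with a small sketch. Everything here is an assumption made for illustration: `createTierLoop`, the numeric score, the step size, and the direction of the shift are hypothetical, and only the count of three votes comes from the text.

```typescript
// Hypothetical sketch of a vote-driven threshold adjustment; not knitbrain's implementation.
interface TierLoop {
  classify(score: number): "standard" | "complex";
  recordFalsePositive(): boolean; // returns true when the threshold just shifted
  threshold(): number;
}

function createTierLoop(initialThreshold: number, step = 1, votesNeeded = 3): TierLoop {
  let threshold = initialThreshold; // score at or above which a task counts as "complex" (assumed)
  let votes = 0;                    // wrong-verdict votes since the last shift
  return {
    classify: (score) => (score >= threshold ? "complex" : "standard"),
    recordFalsePositive: () => {
      votes += 1;
      if (votes < votesNeeded) return false; // not enough signal yet
      votes = 0;
      threshold += step; // raise the bar so similar tasks stop being over-classified
      return true;
    },
    threshold: () => threshold,
  };
}

const loop = createTierLoop(5);
console.log(loop.classify(5)); // "complex" before any feedback
loop.recordFalsePositive();
loop.recordFalsePositive();
loop.recordFalsePositive();    // third vote shifts the threshold from 5 to 6
console.log(loop.classify(5)); // "standard" after the shift
```

Requiring several votes before moving keeps a single mistaken report from retuning the classifier, which fits the stated property that wrong tuning costs efficiency rather than correctness.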
Architecture
agent (Claude Code / Cursor / Codex)          your app (API key)
                 │                                     │
                 ▼                                     ▼
    ┌──────────────────────────┐        ┌──────────────────────────────┐
    │ knitbrain · MCP server   │        │ knitbrain-proxy (loopback)   │
    │ 27 tools                 │        │ rolling window: old turns    │
    │ ├ memory: learnings,     │        │ compressed harder · exact    │
    │ │ handoffs, sessions     │        │ repeats deduped to markers   │
    │ ├ knowledge graph        │        │ · your directive verbatim    │
    │ ├ classifier + FP loop   │        │ · CacheAligner prefix        │
    │ ├ skills · agents        │        └──────────────┬───────────────┘
    │ ├ team board · meter     │                       │ smaller request
    │ └ optimize / retrieve    │                       ▼
    └────────────┬─────────────┘            LLM provider (Anthropic
                 │ every data payload        /v1/messages · OpenAI
                 ▼                           /v1/chat/completions)
┌──────────────────────────────────────────────┐
│ optimizer router                             │
│ json   → schema-preserving skeleton          │
│ code   → tree-sitter AST body elision        │
│          (TS/JS · Py · Go · Rust · Java ·    │
│           C++ · C# · Ruby · PHP · Bash)      │
│ search → per-file collapse + counts          │
│ logs   → errors+summaries kept, runs         │
│          collapsed (races template dedup)    │
│ diffs  → headers kept, hunks → ±counts       │
│ prose  → sentence anchor (TOIN-gated)        │
│ errors / result lines NEVER elided           │
└────────────────────┬─────────────────────────┘
                     ▼ skeleton + ⟨ccr:hash⟩
┌──────────────────────────────────────────────┐     ┌─────────────────┐
│ CCR store: lossless, content-addressed       │◀───▶│ live dashboard  │
│ (sha256 = handle) · integrity-checked reads  │     │ 127.0.0.1:8790  │
│ hot → cold gzip → budgeted purge             │     └─────────────────┘
└──────────────────────────────────────────────┘
self-healing: TOIN backs off over-retrieved kinds ·
classifier recalibrates after 3 wrong-verdict votes
One brain, two doors, one lossless store:
- MCP server (knitbrain): 27 tools covering memory (learnings, session handoff), knowledge graph (imports/exports/dependents), workflow classification with a self-healing false-positive loop (3 wrong-verdict votes shift the threshold), a knitbrain_run orchestrator (task → skill → agents → directive), an on-demand skills engine with an outcome signal (skills that keep failing are flagged needs-revision, failure notes fold into the playbook), project-specific agent generation, a shared team board, a context-window meter (warns and tells the agent to save a handoff before the window blows), and explicit optimize/retrieve. Every data payload flows through one dispatch chokepoint, where it is compressed in a structure-preserving way and tagged with a ⟨ccr:hash⟩ handle.
- Proxy (knitbrain-proxy): a loopback HTTP proxy in front of the LLM API (provider auto-detected per request: Anthropic /v1/messages, OpenAI /v1/chat/completions). It compresses the full request (old turns harder than recent ones, exact repeats across turns collapsed to a marker, pasted bulk inside your message compressed while your directive stays verbatim) and streams the response back.
- CCR store: content-addressed (SHA-256 = handle), integrity-checked on every read, atomic writes, tiered retention (hot → cold gzip archive → budgeted purge). The pristine original is always one retrieve away, which is what makes aggressive compression safe.
- Live dashboard: context meter, tokens saved, CCR tiers, self-tuning stats, knowledge graph, skills, recent learnings, team board. All stores are cross-process fresh: what the agent writes, the dashboard shows on the next tick.
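The CCR store's two core properties, a handle that is the SHA-256 of the content and an integrity check on every read, can be sketched in a few lines. This is an in-memory illustration under stated assumptions, not knitbrain's store: `ContentStore` and `sha256` are hypothetical names, and persistence, atomic writes, and tiered retention are left out. Only `createHash` from Node's built-in `node:crypto` module is a real API.

```typescript
import { createHash } from "node:crypto";

// Hex-encoded SHA-256 of a UTF-8 string.
function sha256(text: string): string {
  return createHash("sha256").update(text, "utf8").digest("hex");
}

// Hypothetical in-memory content-addressed store.
class ContentStore {
  private readonly blobs = new Map<string, string>();

  // The handle is derived from the content, so identical content maps to one entry.
  put(content: string): string {
    const handle = sha256(content);
    this.blobs.set(handle, content);
    return handle;
  }

  // Re-hash on read; a mismatch means the stored content no longer matches its handle.
  get(handle: string): string {
    const content = this.blobs.get(handle);
    if (content === undefined) throw new Error(`unknown handle: ${handle}`);
    if (sha256(content) !== handle) throw new Error(`integrity check failed: ${handle}`);
    return content;
  }
}

const store = new ContentStore();
const handle = store.put('{"users":[{"id":1},{"id":2}]}');
console.log(handle.length);                 // 64 hex characters
console.log(store.get(handle).length > 0);  // true: the original comes back unchanged
```

Because the handle is a function of the bytes, a skeleton only needs to carry the handle to make the original recoverable, and a corrupted entry is detected rather than silently returned.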
Quickstart
npm install -g knitbrain
knitbrain profile # your savings, on your transcripts, before you commit to anything
# in your project — ONE command configures everything (memory, workflow,
# plan-mode adherence, skills, teams, meter, hooks; non-clobbering):
knitbrain setup # native integration: Claude Code, Cursor, VS Code +
# Copilot, Windsurf written directly
knitbrain setup --yes # ALSO writes the global configs (Codex, Copilot CLI,
# Zed) for you — backed up + non-clobbering, no paste
knitbrain dashboard # live local dashboard (127.0.0.1:8790)
knitbrain learn # mine past sessions for failure→success corrections (--apply writes CLAUDE.md)
knitbrain evals # answer-preservation gates on your own transcripts
knitbrain prompt # full operating prompt, for platforms without MCP-instructions support
# pay per token? one command wires the optimizer proxy into your agent:
knitbrain wrap claude # (or codex / aider / copilot) — sets the base URL,
# starts the proxy, launches the agent. No manual export.
# teams — shared optimized sessions (one URL + one token):
knitbrain hub # start the team hub (host runs this once)
knitbrain join <hub-url> <token> <name> # everyone else; postings mirror automatically
Use as a library
The same router that powers the proxy and MCP server is importable directly — no server, no config:
import { createOptimizer } from "knitbrain";
const kb = createOptimizer(); // CCR store under ~/.knitbrain/ccr
const r = kb.compress(bigToolOutput); // detect → route → compress
console.log(r.savedPct, r.contentType); // e.g. 62.4 "json"
// r.skeleton → hand to the model; kb.retrieve(r.handle) → exact original, byte-for-byte
compress() is lossless (original always recoverable via the CCR handle) and guarded: if compression doesn't save at least 5%, the original passes through untouched.
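The 5% guard can be read as a simple decision rule. The sketch below shows one way such a rule could work; `guardCompression` is a hypothetical helper, not the library's internals, and it uses character length as a stand-in for a token count, which is an assumption.

```typescript
// Hypothetical guard: character length stands in for a token count.
interface GuardResult {
  output: string;      // what would be handed to the model
  compressed: boolean; // false when the original passed through
  savedPct: number;    // 0 when the original passed through
}

function guardCompression(original: string, candidate: string, minSavedPct = 5): GuardResult {
  const savedPct =
    original.length === 0 ? 0 : (1 - candidate.length / original.length) * 100;
  if (savedPct >= minSavedPct) {
    return { output: candidate, compressed: true, savedPct };
  }
  // Below the floor (including a candidate that grew): keep the original untouched.
  return { output: original, compressed: false, savedPct: 0 };
}

const big = "x".repeat(100);
console.log(guardCompression(big, "x".repeat(50)).compressed); // true: 50% saved
console.log(guardCompression(big, "x".repeat(97)).compressed); // false: only 3% saved
```

A rule of this shape also gives never-expand for free: a candidate that is longer than the original has negative savings and is rejected.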
If you pay per token
Agent loops re-send the entire conversation on every turn, so input tokens dominate the bill — usually by an order of magnitude over output. That makes context the thing worth optimizing:
- The proxy shrinks the request itself, on the wire. ~48% fewer tool-result tokens means a proportionally smaller input bill on the bulk of every request, every turn, compounding over a session. knitbrain wrap claude wires it in with one command, with no manual base-URL export.
- It stacks with provider prompt caching. CacheAligner keeps the system prefix byte-stable across turns: whitespace normalization, volatile lines ("Today's date is …") moved out of the prefix to a marked tail, and (when your client doesn't manage its own) Anthropic cache_control breakpoints inserted at the system prompt and the stable history boundary. Cached input reads are ~90% cheaper on Anthropic; OpenAI prefix caching needs exactly the byte-stability this provides. Compression is deterministic, so optimized history prefixes stay stable turn over turn, and the two levers stack.
- It can never make a request more expensive. The never-expand guard is enforced by tests: output tokens ≤ input tokens, always.
- On a subscription instead? Same mechanics, different currency: fewer tokens per turn means the context window fills slower, which means fewer compactions, fewer lost-context restarts, and longer useful sessions.
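The prefix-stability idea in the caching bullet above can be illustrated with a small sketch: trailing whitespace is normalized and volatile lines are moved behind a marker, so the text before the marker is identical from one day to the next. This is not CacheAligner itself; `alignPrefix`, `stablePrefix`, the marker string, and the volatile-line pattern are all hypothetical choices for the example.

```typescript
// Hypothetical sketch; the marker text and the volatile-line pattern are assumptions.
const VOLATILE = /^Today's date is\b/;
const TAIL_MARKER = "--- volatile ---";

// Move volatile lines to a marked tail and strip trailing whitespace from every line.
function alignPrefix(systemPrompt: string): string {
  const stable: string[] = [];
  const volatile: string[] = [];
  for (const raw of systemPrompt.split("\n")) {
    const line = raw.replace(/\s+$/, "");
    if (VOLATILE.test(line)) {
      volatile.push(line);
    } else {
      stable.push(line);
    }
  }
  if (volatile.length === 0) return stable.join("\n");
  return [...stable, TAIL_MARKER, ...volatile].join("\n");
}

// The part a provider-side prefix cache could reuse: everything before the marker.
function stablePrefix(aligned: string): string {
  const idx = aligned.indexOf(TAIL_MARKER);
  return idx === -1 ? aligned : aligned.slice(0, idx);
}

const day1 = "You are a coding agent.\nToday's date is 2025-01-01.\nFollow the plan.  ";
const day2 = "You are a coding agent.\nToday's date is 2025-01-02.\nFollow the plan.";
console.log(stablePrefix(alignPrefix(day1)) === stablePrefix(alignPrefix(day2))); // true
```

Without the reordering, the date line sits in the middle of the prompt and every byte after it misses the cache once the date changes.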
Run knitbrain profile to see the percentage on your own workload before believing any of this.
Guarantees (enforced by gated tests, not promises)
- Lossless: every compressed payload recovers byte-for-byte from CCR; the round-trip test gates the build.
- Never-expand: output tokens ≤ input tokens, always.
- Errors survive: error/failure lines and result summaries are never elided, and top-level declarations survive at ≥99% on normal corpora; knitbrain evals gates these at 100%/≥95%/≥99% on real transcripts.
- Governance verbatim: your instructions and protocol/classification text are never skeletonized.
- Local-first: proxy, hub, and dashboard bind 127.0.0.1 by default; nothing leaves your machine.
- Reproducible claims: the headline numbers come from knitbrain profile and knitbrain evals on real transcripts, both of which you can run on yours. (npm run bench is a CI regression gate: a real-shape suite whose fixture mix mirrors the profiled distribution, with per-shape savings floors and fidelity checks, plus a clearly-labeled best-case suite. Fixture numbers are never quoted as real-world savings.)
Development
npm install
npm run verify # typecheck → lint → test → build → consistency → bench (all must pass)
npm run e2e # built-artifact E2E: stdio session + real-file compression
npm run audit:prod # cold-start proof: clone → install → pack → installed binaries → all 27 tools
Current proof status: 233 tests passing on Linux and Windows (CI matrix: Ubuntu + Windows, Node 20 & 22), eval gates PASS on real transcript blocks, and the production audit (audit:prod) passes: fresh clone, clean install, packed tarball installed into a new project, all 27 tools and the three binaries verified working (incl. a 27-tool live stdio E2E and live proxy/hook/dashboard/hub checks). One opt-in test (live LLM endpoint) requires your own API key: KNITBRAIN_LIVE_TEST=1 ANTHROPIC_API_KEY=… npm test.
License
MIT © Piyush Dua
