zettel-compress

v1.0.0

Deterministic memory engine for LLM apps — compress, search, and inject conversation memory offline, on-device, or at the edge. Zero external calls, zero infrastructure, zero dependencies.

Downloads

1,397

Memory for LLM apps that runs anywhere your code runs — offline, on-device, at the edge. Compresses conversation history into structured, searchable memory and hands your model exactly the context it needs. Zero external calls. Zero infrastructure. Zero dependencies. 18 kB gzipped.

And the property nothing else in this space has: it's deterministic — the same messages produce byte-identical memory, every time, on every machine. Memory you can snapshot-test, replay in CI, diff in code review, and trust in an air-gapped deployment.

▶ Try it in the playground — the full engine runs client-side in your tab (14 kB gzipped): paste a conversation, watch it become structured memory, ask it questions.

import { compress, recall, injectContext } from 'zettel-compress'

const memory = compress(conversationHistory)

// search memory at question time — BM25 + graph expansion, no embeddings
recall(memory, 'what did we decide about authentication?', { topK: 5 })

// or inject a hard-budgeted memory block into your prompt
injectContext(memory, { maxTokenBudget: 300, format: 'markdown' })

Measured results

Every number below is reproducible: npm run bench (deterministic) and npm run bench:llm (requires an OpenAI key in .env). Datasets: a real assistant conversation (~2.5k tokens) and two public-domain books — Pride and Prejudice (~182k tokens) and On the Origin of Species (~233k tokens) — each with 12 decision facts planted at seeded positions.

Can a real model answer questions from the compressed memory?

QA accuracy with gpt-4o-mini answering from each context (the reply must contain the unique answer token):

| context given to the model | avg tokens | conversation | novel (182k) | science (233k) |
|---|---|---|---|---|
| nothing | n/a | 0% | 0% | 0% |
| first 300 tokens of the document | ~370 | 8% | 0% | 0% |
| injectContext top-10, markdown | 430–860 | 58% | 0% | 0% |
| injectContext budget-300, markdown | ~290 | 33% | 0% | 0% |
| recall(question) top-5 | 90–200 | 58% | 67% | 75% |

The honest read:

  • recall() is the headline. From a 233k-token corpus, ~150 tokens of retrieved memory let the model answer 75% of questions — the naive same-cost baseline answers 0%. No embeddings, no API, sub-millisecond.
  • Static injection is for conversation-scale memory. On a 19-zettel conversation, top-10 injection carries 83% of planted decision signals (58% end-to-end with the model). On a 1,085-zettel book, 10 zettels cannot cover 12 scattered facts — use recall() for archives.
  • Use format: 'markdown' when injecting directly into prompts — models read plain quotes better than the compact AAAK lines (33% vs 17% at the same budget in our runs). Use AAAK for storage and round-tripping.

Public benchmark: LoCoMo-10 (very-long-term conversational QA)

Full protocol over all 1,986 questions: 10 multi-session conversations (260k tokens total) compressed once each, gpt-4o-mini answering from recallContext(question) top-10 passages only (npm run bench:locomo).

| retrieval mode | overall F1 (cat 1–4) | single-hop | temporal | multi-hop | answer-in-context |
|---|---|---|---|---|---|
| quotes only (pre-0.3) | 5.9 | 8.0 | 1.3 | 5.4 | 9.7% |
| small-to-big (recallContext) | 42.4 | 58.4 | 23.9 | 27.8 | 38.1% |

Adversarial category (446 unanswerable questions): 88.3% correct abstention, 5.2% trapped. Average context: ~1,700 tokens/question — under 1% of the conversation. No-context baseline: F1 0.0.

Honest framing: long-context GPT-4-class models with the entire conversation in the prompt score in the 30s–50s F1 band on these categories; LLM-write memory systems (mem0) report ~67 under a more lenient LLM-judge metric. zettel-compress reaches the full-context band at ~1% of the tokens with zero model calls on the memory side. Ceiling analysis: only 45.1% of LoCoMo gold answers appear verbatim anywhere in the conversation (annotators reworded the rest), and BM25 retrieval already surfaces 86% of that ceiling in the top-10 — a measured GloVe-blend spike gained +0.4 points, so the remaining gap is answer rewording and metric strictness, not retrieval ranking.

Speed, size, and guarantees

| dataset | input tokens | zettels | compress time | throughput |
|---|---|---|---|---|
| conversation | 2,571 | 19 | 5.3 ms | ~485 tok/ms |
| novel | 182,179 | 1,085 | 757 ms | ~241 tok/ms |
| science | 233,248 | 753 | 525 ms | ~444 tok/ms |
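
The throughput column is input tokens divided by compress time. A small check of that arithmetic using the figures from the table (the helper and variable names are ours, not part of the package):

```typescript
// Throughput in tokens per millisecond, rounded to the nearest integer.
function throughput(inputTokens: number, compressMs: number): number {
  return Math.round(inputTokens / compressMs)
}

// Figures copied from the table above.
const benchRows = [
  { dataset: 'conversation', inputTokens: 2571, compressMs: 5.3 },
  { dataset: 'novel', inputTokens: 182179, compressMs: 757 },
  { dataset: 'science', inputTokens: 233248, compressMs: 525 },
]

const throughputs = benchRows.map(r => throughput(r.inputTokens, r.compressMs))
```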

  • Compression: injectContext top-10 reduces the 233k-token text to 882 tokens (0.38%); a 300-token budget always lands ≤ its ceiling (measured 82–100% utilization, zero overruns across all tiers and datasets).
  • Lossless round-trip: decode(encode(result)) reproduces zettels, tunnels, and the entity index exactly — verified by deep equality on all three datasets and a 200-case property test (multi-line quotes, unicode, pipes, snake_case all survive).
  • Streaming: CompressStream processes ~0.1 ms/message with bounded memory.
  • Entity detection: 100% precision / 100% recall on the labeled benchmark fixture (10 gold entities among changelog/chat noise).
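
To make the "never exceeds the ceiling" claim concrete, here is a minimal sketch of budgeted selection: rank candidates, then admit each one only if its measured cost still fits. This illustrates the idea only and is not the package's code; `BudgetCandidate`, `selectWithinBudget`, and the 4-characters-per-token estimate are assumptions made for the example.

```typescript
type BudgetCandidate = { text: string; score: number }

// Rough estimate of about 4 characters per token (an assumption for this
// sketch; pass an exact counter when the budget must be precise).
const roughTokens = (text: string): number => Math.ceil(text.length / 4)

function selectWithinBudget(
  candidates: BudgetCandidate[],
  maxTokens: number,
  countTokens: (text: string) => number = roughTokens,
): BudgetCandidate[] {
  // Highest score first; Array.prototype.sort is stable, so ties keep input order.
  const ranked = [...candidates].sort((a, b) => b.score - a.score)
  const chosen: BudgetCandidate[] = []
  let used = 0
  for (const candidate of ranked) {
    const cost = countTokens(candidate.text)
    if (used + cost > maxTokens) continue // does not fit; a smaller one still might
    chosen.push(candidate)
    used += cost
  }
  return chosen
}

const picked = selectWithinBudget(
  [
    { text: 'x'.repeat(40), score: 0.9 }, // about 10 tokens
    { text: 'y'.repeat(80), score: 0.8 }, // about 20 tokens, skipped at budget 16
    { text: 'z'.repeat(20), score: 0.7 }, // about 5 tokens
  ],
  16,
)
```

Because the cost is measured before anything is admitted, the running total can only approach the ceiling from below.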

When to use it — and when not to

Use zettel-compress when your constraints look like this:

  • Offline / on-device / air-gapped — local-first apps, privacy-bound deployments, anywhere user text must never leave the process
  • Edge runtimes — Cloudflare Workers, Vercel Edge, browsers: no vector DB to stand up, no embedding service to call, nothing to operate
  • Determinism matters — agent test suites, replayable sessions, auditable memory; byte-identical output is something no LLM- or embedding-based memory can offer even in principle
  • No external calls allowed — compliance, latency budgets measured in milliseconds, or simply zero appetite for another vendor dependency

Use something else when:

  • You want maximum recall quality and can run infrastructure → embeddings RAG (pgvector + any embedding model) handles paraphrase ("pottery class" ↔ "ceramics workshop") better than lexical matching ever will
  • You want managed memory with fact-updating and contradiction handling → mem0 / Zep are good products; they trade API calls, latency, and nondeterminism for higher QA scores
  • Your conversations are short → a sliding window of recent messages is simpler and good enough

| | zettel-compress | mem0 / Zep | embeddings RAG |
|---|---|---|---|
| External calls | none, ever | every write | every index + query |
| Infrastructure | none | hosted service | vector store |
| Runs offline / on-device / edge | yes | no | rarely |
| Deterministic / replayable / testable | byte-exact | no | no |
| Lossless text serialization (diffable memory) | yes (AAAK) | no | no |
| Semantic paraphrase matching | no (lexical + graph) | yes | yes |
| LoCoMo QA (our measurement / their reported) | 41.6 F1 | ~67 (LLM-judged) | unmeasured here |

The trade is explicit and we publish the numbers on it.


How it works

Text is chunked on paragraph boundaries (overlap snaps to word boundaries; every chunk carries exact source offsets). Each chunk becomes a zettel:
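
A minimal sketch of paragraph-boundary chunking with exact source offsets, to show what "every chunk carries exact source offsets" can look like. It is an illustration under simplifying assumptions: `SourceChunk` and `chunkParagraphs` are hypothetical names, overlap is omitted, and a paragraph longer than `chunkSize` is kept whole rather than split.

```typescript
type SourceChunk = { text: string; start: number; end: number }

// Find paragraph spans: runs of text separated by one or more blank lines.
function paragraphSpans(source: string): SourceChunk[] {
  const spans: SourceChunk[] = []
  const separator = /\n\s*\n/g
  let start = 0
  let match: RegExpExecArray | null
  while ((match = separator.exec(source)) !== null) {
    if (match.index > start) {
      spans.push({ text: source.slice(start, match.index), start, end: match.index })
    }
    start = match.index + match[0].length
  }
  if (start < source.length) {
    spans.push({ text: source.slice(start), start, end: source.length })
  }
  return spans
}

// Pack whole paragraphs into chunks of at most `chunkSize` characters.
function chunkParagraphs(source: string, chunkSize = 800): SourceChunk[] {
  const chunks: SourceChunk[] = []
  let current: SourceChunk | null = null
  for (const paragraph of paragraphSpans(source)) {
    if (current !== null && paragraph.end - current.start <= chunkSize) {
      // Extend the open chunk up to the end of this paragraph.
      current = {
        text: source.slice(current.start, paragraph.end),
        start: current.start,
        end: paragraph.end,
      }
    } else {
      if (current !== null) chunks.push(current)
      current = paragraph
    }
  }
  if (current !== null) chunks.push(current)
  return chunks
}
```

Each chunk satisfies `source.slice(chunk.start, chunk.end) === chunk.text`, which is what makes later provenance lookups possible.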

  • entities — proper nouns detected by capitalization evidence (sentence-start noise like Added, Please is filtered; chat speaker labels are kept), with pronoun coreference: she/he link to the most recent gender-matching entity, so a person stays attached to the conversation after their first mention
  • topics — key terms with CamelCase/ALL-CAPS/hyphenation boosts
  • quote — the most information-dense sentence (TextRank blended with decision-word density; falls back gracefully on lowercase chat text)
  • weight — importance in [0, 1], rank-normalized with tie-aware midranks (equal raw scores always get equal weights; relative within a result)
  • flags: DECISION | ORIGIN | CORE | PIVOT | GENESIS | TECHNICAL
  • emotions — 30 states via word-boundary lexicons with negation scope (a useful filtering signal, not sentiment analysis — calibrate expectations accordingly)
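
The weight bullet above mentions tie-aware midranks. A small sketch of that idea follows: scores are replaced by their rank position, and tied scores share the average position of their group, so equal raw scores always map to equal weights. The function name and the exact scaling to [0, 1] are assumptions for illustration; the package may combine this with other steps.

```typescript
// Map raw scores to weights in [0, 1] by rank. Tied scores receive the
// midrank (average 0-based position) of the group they belong to.
function midrankWeights(scores: number[]): number[] {
  const n = scores.length
  if (n === 0) return []
  if (n === 1) return [1]
  const order = scores
    .map((score, index) => ({ score, index }))
    .sort((a, b) => a.score - b.score)
  const weights = new Array<number>(n).fill(0)
  let i = 0
  while (i < n) {
    // Extend j to the end of the run of equal scores starting at i.
    let j = i
    while (j + 1 < n && order[j + 1]!.score === order[i]!.score) j++
    const midrank = (i + j) / 2
    for (let k = i; k <= j; k++) weights[order[k]!.index] = midrank / (n - 1)
    i = j + 1
  }
  return weights
}
```

For example, `midrankWeights([3, 1, 3, 2])` gives the two 3s the same weight, the 1 a weight of 0, and the 2 a weight of one third.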

Tunnels link zettels sharing entities/topics above a Jaccard threshold (capped per zettel). recall() runs BM25 over each zettel's source chunk (falling back to quotes+topics+entities when no source is kept) with automatic synonym expansion (move↔relocate, marry↔wedding, job↔career, and city abbreviations like NYC↔"New York") and a date-proximity bonus that boosts zettels whose resolvedDate matches a year/month mentioned in the query. Hits expand one associative hop along tunnels with personalized PageRank.
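
A rough sketch of threshold-and-cap linking as described above: compute Jaccard similarity over each pair's combined entity/topic tags, keep pairs at or above the threshold, and cap the links per node. The names `TaggedNote`, `TunnelLink`, and `buildTunnels` are hypothetical, and this brute-force version compares every pair.

```typescript
type TaggedNote = { id: string; tags: string[] } // tags = entities + topics
type TunnelLink = { from: string; to: string; similarity: number }

// Jaccard similarity: size of the intersection over size of the union.
function jaccard(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 && b.size === 0) return 0
  let shared = 0
  a.forEach(tag => {
    if (b.has(tag)) shared++
  })
  return shared / (a.size + b.size - shared)
}

function buildTunnels(notes: TaggedNote[], threshold = 0.3, topK = 3): TunnelLink[] {
  const tagSets = notes.map(n => new Set(n.tags))
  const links: TunnelLink[] = []
  notes.forEach((note, i) => {
    const candidates: TunnelLink[] = []
    notes.forEach((other, j) => {
      if (i === j) return
      const similarity = jaccard(tagSets[i]!, tagSets[j]!)
      if (similarity >= threshold) candidates.push({ from: note.id, to: other.id, similarity })
    })
    // Strongest first; break ties by id so the output is deterministic.
    candidates.sort((x, y) => y.similarity - x.similarity || (x.to < y.to ? -1 : x.to > y.to ? 1 : 0))
    links.push(...candidates.slice(0, topK))
  })
  return links
}
```

The explicit tie-break matters for reproducibility: without it, equal similarities would fall back to input order.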


Documentation

  • Getting started — core concepts, the three verbs (compress → recall → inject), persistence, options
  • Integration recipes — chatbot memory loop, Vercel AI SDK, Cloudflare Workers + KV, browser/local-first, streams, multi-session, exact token budgets
  • Deterministic testing — snapshot-test and replay your agent's memory in CI (the thing only deterministic memory can do)

Install

npm install zettel-compress

Quick start

import { compress, injectContext, recall, recallContext, wakeUp, CompressStream } from 'zettel-compress'

const result = compress(conversationHistory)

// hard token budget — measured output, never exceeds the ceiling
const block = injectContext(result, { maxTokenBudget: 300, format: 'markdown' })

// guarantee decisions survive selection even when ranked low
injectContext(result, { maxZettels: 10, guaranteeFlags: ['DECISION'] })

// diversity-aware selection (maximal marginal relevance)
injectContext(result, { maxZettels: 10, selection: 'mmr' })

// search memory at question time — ranked zettels, or ready-to-inject passages
const hits = recall(result, 'what did we decide about auth?', { topK: 5 })
const context = recallContext(result, 'what did we decide about auth?', { maxTokens: 2000 })

// short narrative of the top moments (top 15% by weight)
const summary = wakeUp(result)

// streaming: compress each message as it arrives, bounded memory
const mem = new CompressStream({ halfLifeTurns: 50, maxZettels: 200 })
mem.push('Alice: the login service keeps timing out')
mem.push('Bob: we decided to rotate tokens hourly')
mem.recall('token decision')   // search the live stream
mem.snapshot()                 // CompressResult at any point — replayable

API

compress(text, options?): CompressResult

compress(text, {
  chunkSize: 800,          // chars per chunk (default 800)
  chunkOverlap: 100,       // overlap, snapped to word boundaries (default 100)
  date: '2026-06-12',      // ISO date for the AAAK header
  title: 'My Session',     // title for the AAAK header
  minEntityFrequency: 1,   // min occurrences to count as entity
  stopWords: ['foo'],      // extra stop words for topic extraction
  temperature: 0.5,        // softmax temperature for weight spread
  tunnelThreshold: 0.3,    // min Jaccard similarity for a tunnel
  tunnelTopK: 3,           // max tunnels per zettel
  dedupe: true,            // merge near-duplicate zettels (default false)
  dedupeThreshold: 0.9,    // token-set Jaccard that counts as duplicate
  verboseLabels: true,     // tunnel labels as Alice+Bob instead of ALC+BBB
  keepSource: true,        // retain normalized input on meta.source for
                           // provenance-expanded recall (default true)
})

Zettels carry exact sourceStart/sourceEnd offsets into meta.source; the offsets serialize in AAAK, the source text never does (the format stays compact — re-supply the text via recallContext's source option after decoding).

Tunnel building switches to MinHash/LSH candidate generation above 500 zettels — 10,000 zettels link in ~400ms instead of 50M pairwise comparisons, deterministically.
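
For readers unfamiliar with MinHash/LSH, here is a compact, deterministic sketch of candidate generation. Each item gets a signature of per-hash minima, signatures are cut into bands, and items that share any whole band become candidate pairs, so only those pairs need an exact Jaccard check. The seeded FNV-1a hash, the function names, and the band layout are assumptions for this example rather than the package's internals. `numHashes` should be a multiple of `bands`.

```typescript
// 32-bit FNV-1a over the string, with the seed mixed into the initial state.
function hash32(value: string, seed: number): number {
  let h = (0x811c9dc5 ^ seed) >>> 0
  for (let i = 0; i < value.length; i++) {
    h ^= value.charCodeAt(i)
    h = Math.imul(h, 0x01000193) >>> 0
  }
  return h
}

// One minimum per seeded hash function; similar sets share many minima.
function minhashSignature(tokens: string[], numHashes: number): number[] {
  const signature: number[] = []
  for (let seed = 0; seed < numHashes; seed++) {
    let min = 0xffffffff
    for (const token of tokens) min = Math.min(min, hash32(token, seed))
    signature.push(min)
  }
  return signature
}

// Items whose signatures agree on every row of at least one band are paired.
function lshCandidatePairs(signatures: number[][], bands: number): Array<[number, number]> {
  const rowsPerBand = Math.floor((signatures[0]?.length ?? 0) / bands)
  const pairs = new Set<string>()
  for (let band = 0; band < bands; band++) {
    const buckets = new Map<string, number[]>()
    signatures.forEach((signature, index) => {
      const key = signature.slice(band * rowsPerBand, (band + 1) * rowsPerBand).join(',')
      const bucket = buckets.get(key) ?? []
      bucket.push(index)
      buckets.set(key, bucket)
    })
    buckets.forEach(bucket => {
      for (let i = 0; i < bucket.length; i++) {
        for (let j = i + 1; j < bucket.length; j++) pairs.add(`${bucket[i]},${bucket[j]}`)
      }
    })
  }
  // Sort so the result does not depend on bucket iteration order.
  return Array.from(pairs)
    .sort()
    .map(p => p.split(',').map(Number) as [number, number])
}
```

Nothing here uses randomness at run time: the "random" hash functions are fixed by their seeds, which is why the same input always yields the same candidate pairs.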

recall(result, query, options?): Zettel[]

Query-time retrieval: BM25 over each zettel's full source chunk (falling back to quote/topics/entities when no source is kept), with built-in synonym expansion (common paraphrase clusters: move/relocate/transfer, job/career/work, marry/wedding/spouse, and city abbreviations) and a date-proximity bonus that promotes zettels whose resolvedDate matches any year/month found in the query. Hits optionally expand one hop along the tunnel graph with personalized PageRank. { topK?: number, hops?: boolean, expandQuery?: boolean, after?: string, before?: string }. Deterministic.
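
A self-contained sketch of BM25 ranking with a tiny synonym table, to show how lexical retrieval can still match "move" against "relocate". This is a textbook BM25 with common default parameters (k1 = 1.2, b = 0.75), not the package's scorer. `SearchDoc`, `bm25Search`, and the two-entry synonym table are assumptions for illustration, and the date bonus and graph expansion are left out.

```typescript
type SearchDoc = { id: string; text: string }

const tokenize = (text: string): string[] => text.toLowerCase().match(/[a-z0-9]+/g) ?? []

// Tiny illustrative table; real synonym clusters would be much larger.
const SYNONYMS: Record<string, string[]> = { move: ['relocate'], relocate: ['move'] }

function expandTerms(terms: string[]): string[] {
  const expanded = new Set<string>()
  for (const term of terms) {
    expanded.add(term)
    for (const synonym of SYNONYMS[term] ?? []) expanded.add(synonym)
  }
  return Array.from(expanded)
}

function bm25Search(docs: SearchDoc[], query: string, topK = 5, k1 = 1.2, b = 0.75): SearchDoc[] {
  const tokenized = docs.map(d => tokenize(d.text))
  const avgLen = tokenized.reduce((sum, t) => sum + t.length, 0) / Math.max(1, docs.length)
  const terms = expandTerms(tokenize(query))
  const scored = docs.map((doc, position) => {
    const tokens = tokenized[position] ?? []
    let score = 0
    for (const term of terms) {
      const tf = tokens.filter(t => t === term).length
      if (tf === 0) continue
      const df = tokenized.filter(t => t.indexOf(term) !== -1).length
      const idf = Math.log(1 + (docs.length - df + 0.5) / (df + 0.5))
      score += (idf * tf * (k1 + 1)) / (tf + k1 * (1 - b + (b * tokens.length) / avgLen))
    }
    return { doc, score, position }
  })
  return scored
    .filter(s => s.score > 0)
    .sort((x, y) => y.score - x.score || x.position - y.position) // stable, deterministic ties
    .slice(0, topK)
    .map(s => s.doc)
}
```

With three documents where only one mentions relocating an office, `bm25Search(docs, 'move')` returns that document purely through the expanded term.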

recallContext(result, query, options?): string

Small-to-big retrieval — the recommended way to build LLM context. Ranks on the compact zettel index, then returns the full source passages the hits came from: overlapping spans merge, a token budget admits passages in rank order, and the output assembles in document order so narrative/temporal flow survives. { topK?, hops?, maxTokens?, source? }. Falls back to quotes when no source text is available (e.g. decoded AAAK; pass source to restore it). This is what lifted LoCoMo F1 from 5.9 to 41.6.
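
The assembly step described above can be sketched in a few lines: admit passages in rank order under a token budget, then sort the admitted spans into document order and merge overlaps. This is a simplified illustration under stated assumptions: `RankedSpan` and `assembleContext` are hypothetical names, the default token counter is a rough 4-characters-per-token estimate, and the budget is checked before merging, so separators are not counted.

```typescript
type RankedSpan = { start: number; end: number } // source offsets, listed in rank order

function assembleContext(
  source: string,
  rankedSpans: RankedSpan[],
  maxTokens: number,
  countTokens: (text: string) => number = t => Math.ceil(t.length / 4),
): string {
  // 1. Admit passages in rank order while the budget allows.
  const admitted: RankedSpan[] = []
  let used = 0
  for (const span of rankedSpans) {
    const cost = countTokens(source.slice(span.start, span.end))
    if (used + cost > maxTokens) continue
    admitted.push(span)
    used += cost
  }
  // 2. Restore document order, then merge spans that overlap.
  admitted.sort((a, b) => a.start - b.start)
  const merged: RankedSpan[] = []
  for (const span of admitted) {
    const last = merged[merged.length - 1]
    if (last !== undefined && span.start <= last.end) {
      last.end = Math.max(last.end, span.end)
    } else {
      merged.push({ ...span }) // copy so the caller's spans are not mutated
    }
  }
  return merged.map(s => source.slice(s.start, s.end)).join('\n\n')
}
```

Ranking decides what gets in; document order decides how it reads. Merging after admission means overlapping hits never repeat text in the output.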

injectContext(result, options?): string

injectContext(result, {
  maxZettels: 10,            // top N by 0.7·weight + 0.3·signal-flag bonus
  selection: 'mmr',          // 'weight' (default) | 'mmr' diversity selection
  guaranteeFlags: ['DECISION'], // always include one zettel per flag if present
  minWeight: 0.5,            // weight floor
  flags: ['DECISION'],       // filter to flags
  format: 'markdown',        // 'aaak' (default) | 'json' | 'markdown'
  maxTokenBudget: 300,       // hard ceiling — output measured, never exceeded
  countTokens: myTokenizer,  // optional exact counter (e.g. js-tiktoken)
})

Only tunnels and entity-index entries belonging to the selected zettels are emitted.
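
As an illustration of what `selection: 'mmr'` aims for, here is a minimal maximal-marginal-relevance sketch: each pick maximizes its own weight minus a penalty for overlapping with what was already chosen. The names `WeightedNote` and `mmrSelect`, the Jaccard overlap over tags, and the trade-off value `lambda = 0.7` are assumptions for the example, not the package's actual parameters.

```typescript
type WeightedNote = { id: string; weight: number; tags: string[] }

// Jaccard overlap between two notes' tag lists.
function tagOverlap(a: WeightedNote, b: WeightedNote): number {
  const union = new Set<string>([...a.tags, ...b.tags])
  if (union.size === 0) return 0
  const shared = a.tags.filter(tag => b.tags.indexOf(tag) !== -1).length
  return shared / union.size
}

function mmrSelect(notes: WeightedNote[], count: number, lambda = 0.7): WeightedNote[] {
  const remaining = [...notes]
  const chosen: WeightedNote[] = []
  while (chosen.length < count && remaining.length > 0) {
    let bestIndex = 0
    let bestScore = -Infinity
    remaining.forEach((candidate, index) => {
      // Redundancy = highest overlap with anything already selected.
      const redundancy = chosen.reduce((max, c) => Math.max(max, tagOverlap(candidate, c)), 0)
      const score = lambda * candidate.weight - (1 - lambda) * redundancy
      if (score > bestScore) {
        bestScore = score
        bestIndex = index
      }
    })
    chosen.push(...remaining.splice(bestIndex, 1))
  }
  return chosen
}
```

Given two near-identical high-weight notes about auth and one lower-weight note about something else, plain weight ordering picks both auth notes, while this sketch picks one auth note and the unrelated one.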

CompressStream

Incremental memory for message streams. push(text), snapshot(), recall(query, opts?), size. Options: all of CompressOptions plus halfLifeTurns (recency decay in pushes) and maxZettels (bounded memory via lowest-decayed-weight eviction). With dedupe: true, a re-sent or boilerplate message refreshes the recency of the zettel it duplicates instead of growing the stream — repetition strengthens a memory rather than copying it. Entity codes never change once assigned; replaying the same pushes reproduces a byte-identical snapshot.
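
A sketch of how half-life decay and bounded eviction can interact. Each entry's effective weight halves every `halfLifeTurns` pushes since it was last seen, and when the store is over capacity the entry with the lowest decayed weight is dropped. The class and field names are hypothetical and the exponential form is an assumption; only the two option names come from the description above.

```typescript
type StoredEntry = { id: number; weight: number; lastSeenTurn: number }

class DecayingStore {
  private stored: StoredEntry[] = []
  private turn = 0
  private nextId = 1
  private readonly halfLifeTurns: number
  private readonly maxEntries: number

  constructor(halfLifeTurns: number, maxEntries: number) {
    this.halfLifeTurns = halfLifeTurns
    this.maxEntries = maxEntries
  }

  // Effective weight: halves for every `halfLifeTurns` pushes of age.
  decayedWeight(entry: StoredEntry): number {
    const age = this.turn - entry.lastSeenTurn
    return entry.weight * Math.pow(0.5, age / this.halfLifeTurns)
  }

  push(weight: number): void {
    this.turn++
    this.stored.push({ id: this.nextId++, weight, lastSeenTurn: this.turn })
    while (this.stored.length > this.maxEntries) {
      // Evict the entry with the lowest decayed weight (first one on ties).
      let worst = 0
      this.stored.forEach((entry, index) => {
        if (this.decayedWeight(entry) < this.decayedWeight(this.stored[worst]!)) worst = index
      })
      this.stored.splice(worst, 1)
    }
  }

  ids(): number[] {
    return this.stored.map(entry => entry.id)
  }
}
```

With a half-life of 2 and room for 2 entries, pushing weights 0.9, 0.2, 0.5 evicts the 0.2 entry; a later push of 0.4 evicts the original 0.9 entry, because three turns of decay have pulled it below the newer ones.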

wakeUp(result, topPct = 0.15): string

Narrative summary of the top topPct zettels by weight (plus ORIGIN/CORE/GENESIS flags), capped at 5. Never empty on non-empty input.

encode(result): string / decode(aaak, options?): CompressResult

AAAK v2 text serialization — fully lossless: E: lines carry the entity index, quotes/topics/headers are escaped (multi-line quotes, ", |, snake_case topics all survive exactly). decode reads v1 and v2; { strict: true } throws on malformed lines, default mode collects meta.warnings (including header-count mismatches and unknown emotion/flag tokens).

FILE:002|ALC+BOB|2026-06-12|Auth Design|v2
E:ALC=Alice;BOB=Bob
001:ALC+BOB|authentication,security|"We decided to use JWT tokens."|0.91|conviction|DECISION+TECHNICAL
T:001<->002|ALC+BOB
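
To illustrate why escaping is what makes a pipe-delimited line format lossless, here is a small round-trip sketch. The escape scheme (`\\` for a backslash, `\p` for a pipe, `\n` for a newline) is invented for this example and is not the AAAK escaping rules; the function names are likewise hypothetical.

```typescript
// Escape the backslash first so the escapes added afterwards are not re-escaped.
function escapeField(value: string): string {
  return value.replace(/\\/g, '\\\\').replace(/\|/g, '\\p').replace(/\n/g, '\\n')
}

// Read escape pairs left to right: backslash + p is a pipe, backslash + n is
// a newline, and backslash + anything else is that character itself.
function unescapeField(value: string): string {
  return value.replace(/\\(.)/g, (_match, c: string) => (c === 'p' ? '|' : c === 'n' ? '\n' : c))
}

// After escaping, no field contains a raw pipe or newline, so splitting is safe.
const encodeLine = (fields: string[]): string => fields.map(escapeField).join('|')
const decodeLine = (line: string): string[] => line.split('|').map(unescapeField)
```

A property test in the spirit of the one mentioned earlier would generate random field lists (at least one field) and assert `decodeLine(encodeLine(fields))` deep-equals `fields`.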

Others

compressMany(texts, options?) · mergeResults(results) (re-normalizes weights onto one scale) · topZettels(result, n) · normalizeWeights(zettels, temperature?) · estimateTokens(text) · encodeZettelLine / encodeTunnelLine · runtime constants ALL_FLAGS, ALL_EMOTIONS.


Integration examples

Vercel AI SDK — budgeted memory block

import { streamText } from 'ai'
import { openai } from '@ai-sdk/openai'
import { compress, injectContext } from 'zettel-compress'

const memory = compress(messages.map(m => `${m.role}: ${m.content}`).join('\n'))
const block = injectContext(memory, { maxTokenBudget: 300, format: 'markdown' })

const response = await streamText({
  model: openai('gpt-4o'),
  messages: [
    { role: 'system', content: `Relevant past context:\n${block}` },
    ...recentMessages,
  ],
})

Question-time recall — only inject what the question needs

import { compress, recall } from 'zettel-compress'

const memory = compress(fullHistory)
const relevant = recall(memory, userQuestion, { topK: 5 })
const block = relevant.map(z => z.quote).join('\n')   // ~100–200 tokens

Cloudflare Workers — persistent compressed memory in KV

import { compress, encode, decode, recall } from 'zettel-compress'

export default {
  async fetch(request: Request, env: Env) {
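    // request.json() is untyped, so state the expected body shape explicitly
    const { sessionId, message, question } = (await request.json()) as {
      sessionId: string
      message: string
      question?: string
    }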

    if (question) {
      const stored = await env.KV.get(`memory:${sessionId}`)
      if (!stored) return Response.json([])
      const hits = recall(decode(stored), question, { topK: 5 })
      return Response.json(hits.map(z => z.quote))
    }

    // append to the session log, store the compressed memory alongside it
    const log = ((await env.KV.get(`log:${sessionId}`)) ?? '') + '\n\n' + message
    await env.KV.put(`log:${sessionId}`, log)
    await env.KV.put(`memory:${sessionId}`, encode(compress(log)))
    return new Response('ok')
  },
}

Emotion states detected

conviction, grief, joy, fear, hope, trust, wonder, rage, exhaustion, shame, pride, nostalgia, anxiety, relief, anticipation, frustration, gratitude, loneliness, inspiration, confusion, clarity, guilt, awe, regret, determination, vulnerability, acceptance, resistance, love, loss

Importance flags

| Flag | Triggered by |
|---|---|
| DECISION | "decided", "chose", "committed", "concluded", "agreed to", "going to", "we will", "final decision" |
| ORIGIN | "founded", "originated", "first time ever", "inception", "birth of", "how it began" |
| CORE | "fundamental", "essential", "key principle", "foundation of", "bedrock", "non-negotiable" |
| PIVOT | "turning point", "breakthrough", "changed everything", "transformed", "pivotal", "game changer" |
| GENESIS | "led to", "resulted in", "because of this", "gave rise to", "which caused", "set in motion" |
| TECHNICAL | "architecture", "implement", "deploy", "config", "database", "module", "infrastructure", "stack", "endpoint", "schema" |

All keyword matching is word-boundary anchored with negation-scope handling ("we never decided" does not flag). TECHNICAL is metadata only — it does not affect zettel weight so technical detail chunks don't crowd out decisions and emotional moments in ranked selection.
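
A minimal sketch of word-boundary matching with a negation scope, assuming a short phrase list and a fixed look-back window of three words. The phrase subset, the negator list, the window size, and the function name are all assumptions for illustration; the package's lexicons and scope rules may differ.

```typescript
// A subset of the DECISION triggers from the table above (letters and spaces
// only, so they can be dropped into a RegExp without escaping).
const DECISION_PHRASES = ['decided', 'chose', 'agreed to', 'we will']
const NEGATORS = new Set(['not', 'never', 'no', "didn't", "haven't", "won't"])

function hasDecision(sentence: string, scope = 3): boolean {
  const lower = sentence.toLowerCase()
  for (const phrase of DECISION_PHRASES) {
    // \b anchors keep "decided" from matching inside "undecided".
    const pattern = new RegExp('\\b' + phrase + '\\b', 'g')
    let match: RegExpExecArray | null
    while ((match = pattern.exec(lower)) !== null) {
      // Look at the few words just before the match for a negator.
      const wordsBefore = lower.slice(0, match.index).match(/[a-z']+/g) ?? []
      const preceding = wordsBefore.slice(-scope)
      if (!preceding.some(word => NEGATORS.has(word))) return true
    }
  }
  return false
}
```

So "We decided to use JWT tokens." flags, while "we never decided" and "The undecided voters left." do not.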


Reproducing the benchmarks

npm run bench       # deterministic: performance, compression, budgets,
                    # round-trip, answer-in-context QA, MRR, entities, streaming
npm run bench:llm   # end-to-end QA with a real model; needs OPENAI_API_KEY in .env

Both harnesses use seeded PRNGs — same machine, same numbers.
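
For context on what "seeded" buys you, here is a sketch using mulberry32, a small well-known 32-bit PRNG, to choose fact-planting positions reproducibly. Whether the harnesses use this particular generator is not stated here, so treat both the generator choice and the helper name `seededPositions` as assumptions.

```typescript
// mulberry32: a compact seeded PRNG returning floats in [0, 1).
function mulberry32(seed: number): () => number {
  let state = seed >>> 0
  return () => {
    state = (state + 0x6d2b79f5) >>> 0
    let t = state
    t = Math.imul(t ^ (t >>> 15), t | 1)
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61)
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296
  }
}

// Choose `count` distinct positions in [0, length), fully determined by the seed.
function seededPositions(length: number, count: number, seed: number): number[] {
  const rand = mulberry32(seed)
  const positions = new Set<number>()
  const wanted = Math.min(count, length)
  while (positions.size < wanted) positions.add(Math.floor(rand() * length))
  return Array.from(positions).sort((x, y) => x - y)
}
```

Calling `seededPositions(1000, 12, 42)` twice returns the same 12 positions, which is the property that lets a benchmark plant facts in identical places on every run.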


License

MIT