somtum

v1.3.1

Published

4 days ago

Local-first memory and prompt-cache layer for Claude Code

Downloads

873

Local-first memory and prompt-cache layer for Claude Code.

Somtum (Thai: ส้มตำ) is named after the vibrant, shredded green papaya salad. Just like its namesake, Somtum blends durable observations from your Claude Code sessions — decisions, bugfixes, learnings, file summaries — stores them in a local SQLite database, and injects the relevant ones back into context the next time you work on the same project.

Zero-config: one somtum init and every session end is captured automatically. No server, no cloud account, no mandatory tuning.

Why Somtum?

LLM agents like Claude Code start every session with a blank slate. That leads to:

Repetitive context — re-explaining the same architectural choices every session
Regressions — Claude suggests a fix you already tried and discarded
Token waste — reading large files just to "set the scene"

Somtum gives Claude a long-term memory. Once a decision is made or a bug is fixed, it's remembered across all future sessions — without bloating your context window.

What a session looks like with Somtum

Without Somtum                    With Somtum
────────────────────              ──────────────────────────────────────
Session 1: "We use pnpm           Session 1: same work
           because of workspace
           hoisting"

Session 2: Claude suggests        Session 2: Claude already knows about
           npm, you correct it       pnpm, the auth decisions, and the
           again                     bugfixes from last week

How it works

At the end of each Claude Code session, Somtum reads the session transcript and asks Claude Haiku to extract the parts worth keeping — decisions, bug fixes, things learned. Those observations are stored locally and injected back into context the next time you ask something related.

Memory lifecycle

sequenceDiagram
    participant You
    participant Claude Code
    participant Somtum Hook
    participant SQLite DB

    You->>Claude Code: Work on project (coding, debugging, decisions)
    Claude Code->>Somtum Hook: SessionEnd fires automatically
    Somtum Hook->>Claude Code: Extract observations via Haiku
    Somtum Hook->>SQLite DB: Store memories locally

    Note over You,SQLite DB: Next session

    You->>Claude Code: Ask about the project
    Claude Code->>SQLite DB: recall() via MCP tool
    SQLite DB-->>Claude Code: Relevant past decisions & fixes
    Claude Code-->>You: Answer informed by prior sessions

What gets captured — a concrete example

You work a session debugging an auth bug and refactoring a module. At session end, Somtum extracts something like:

[
  {
    "kind": "bugfix",
    "title": "JWT refresh loop caused by missing expiry check",
    "body": "The refresh token loop was triggered because we checked token.exp < Date.now() instead of token.exp < Date.now() / 1000. Unix timestamps are in seconds, not milliseconds.",
    "files": ["src/auth/refresh.ts"]
  },
  {
    "kind": "decision",
    "title": "Use pnpm workspaces — npm hoisting breaks shared types",
    "body": "Switched from npm to pnpm because npm's hoisting puts shared type packages in the wrong node_modules scope, breaking type inference across packages.",
    "files": ["package.json", "pnpm-workspace.yaml"]
  }
]

Next session, when you ask "why are we using pnpm?" or touch src/auth/refresh.ts, Claude finds these memories and already has the context.

Architecture

┌─────────────────────────────────────────────────────────────┐
│                      Claude Code / Agent                    │
└──────────┬──────────────────────────────┬───────────────────┘
           │ hooks                        │ MCP tools
           ▼                              ▼
┌─────────────────────┐         ┌──────────────────────┐
│  Capture Pipeline   │         │   Query Pipeline     │
│                     │         │                      │
│  UserPromptSubmit ──┼─────────┼▶ cache_lookup        │
│  SessionEnd ────────┼─────────┼▶ recall / get        │
│  PreToolUse (Read) ─┼─────────┼▶ remember / forget   │
└──────────┬──────────┘         └──────────┬───────────┘
           │                               │
           ▼                               ▼
┌─────────────────────────────────────────────────────────────┐
│                      Core (TypeScript)                      │
│                                                             │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────────┐  │
│  │ PromptCache  │  │ MemoryStore  │  │    Retriever     │  │
│  │              │  │              │  │                  │  │
│  │ exact hash   │  │ observations │  │  bm25 (default)  │  │
│  │ fuzzy embed  │  │ + embeddings │  │  embeddings      │  │
│  │ fingerprint  │  │ + redaction  │  │  index           │  │
│  │ invalidation │  │              │  │  hybrid          │  │
│  └──────────────┘  └──────────────┘  └──────────────────┘  │
└─────────────────────────────────┬───────────────────────────┘
                                  │
                                  ▼
                    ┌─────────────────────────┐
                    │  SQLite WAL + ~/.somtum/ │
                    │  /projects/<hash>/       │
                    │    db.sqlite             │
                    │    index.md              │
                    │    memories/YYYY-MM/     │
                    │      <ulid>.md           │
                    └─────────────────────────┘

Retrieval strategies

| Strategy | How it works | Best for | Cost | |---|---|---|---| | bm25 | Keyword search over title + body + tags (SQLite FTS5 — no external dependencies) | Exact terms, offline setups | Near-zero | | embeddings | Semantic similarity using a 30 MB local model (bge-small-en-v1.5, runs fully in-process) | "What did we decide about auth?" style queries | ~5 ms at 10k memories | | index | Sends a compact memory catalog to Haiku; the model picks relevant IDs | Paraphrased or fuzzy queries | 1 Haiku API call | | hybrid | BM25 + embeddings results merged and re-ranked by Haiku | General case (best recall) | BM25 + embeddings + 1 Haiku call |

Default is bm25 — works offline, no setup. Enable hybrid once you have embeddings downloaded.

Requirements

Node 20+
Claude Code — Somtum hooks into Claude Code's SessionEnd, UserPromptSubmit, and PreToolUse events
ANTHROPIC_API_KEY (optional) — if set, Somtum uses the Anthropic API directly for extraction. If not set, Somtum falls back to the claude CLI that ships with Claude Code, so no separate API key is required for Claude Code subscribers.

Install

# npm (recommended)
npm install -g somtum

# yarn
yarn global add somtum

# pnpm
pnpm add -g somtum

Or as a project dependency:

npm install somtum

From source

git clone https://github.com/riz007/somtum
cd somtum
pnpm install
pnpm build
pnpm link --global

Native module note

Somtum uses better-sqlite3, which contains a native C++ addon. On most platforms (macOS, Linux x64/arm64, Windows x64) a prebuilt binary is downloaded automatically. On Alpine Linux / musl or unusual architectures, the addon compiles from source — python, make, and gcc must be available. If the install fails with a node-gyp error, install those build tools and retry.

Quickstart

Step 1 — Choose your extraction backend

Somtum needs to call a Claude model at session end to extract observations. Pick one:

Option A: Claude Code subscription (no extra setup)

If you already have Claude Code installed, you're done. Somtum calls claude --print automatically when no API key is present. Skip to Step 2.

Option B: Direct Anthropic API key (optional — faster, lets you pick the model)

Add to ~/.zshrc (or ~/.bashrc):

export ANTHROPIC_API_KEY="sk-ant-..."

Then reload:

source ~/.zshrc

The key must be in your shell profile, not just exported in an open terminal tab. The SessionEnd hook inherits the environment of the shell that started Claude Code — not the current terminal.

Step 2 — Install inside a Claude Code project

Run this from the root of the project you work on with Claude Code:

somtum init

To enable all features at once (recommended):

somtum init --all
# Installs:
#   - SessionEnd capture hook     (memory extraction)
#   - UserPromptSubmit cache hook (prompt cache lookup)
#   - PreToolUse file-gating hook (large file summarization)
#   - MCP server in .mcp.json     (Claude can call recall/remember tools)

Step 3 — Use Claude Code normally

Open a Claude Code session from the same directory where you ran somtum init. Work as you normally would. When the session ends, the hook extracts observations automatically in the background (capped at 90 seconds).

Step 4 — Check your memory

# How many observations were captured?
somtum stats

# Search memory
somtum search "auth jwt rotation"
somtum search "why we use pnpm" --strategy hybrid

# Open the visual dashboard
somtum serve

If somtum stats shows memories 0 after a session, see Troubleshooting.

Step 5 — Diagnose any issues

somtum doctor

This checks your API key, DB health, hook installation, migrations, cache, and breakeven ratio — with specific fix instructions for each failing check.

Verifying the setup

After your first Claude Code session ends:

1. Check the hook log

cat ~/.somtum/hook.log

A successful run:

2026-04-30T10:15:42.123Z [post_session] starting
2026-04-30T10:15:44.891Z [post_session] ok — inserted=4 cache=2 summaries=1

Using the claude CLI fallback (no API key):

2026-04-30T10:15:42.123Z [post_session] starting
2026-04-30T10:15:42.124Z [post_session] ANTHROPIC_API_KEY not set — will use claude CLI fallback
2026-04-30T10:15:44.891Z [post_session] ok — inserted=4 cache=2 summaries=1

Neither backend available:

2026-04-30T10:15:42.123Z [post_session] ERROR: Neither ANTHROPIC_API_KEY nor the claude CLI is available.

2. Check stats

somtum stats

You should see memories > 0 after a substantive session. Short or trivial sessions (no decisions, no bug fixes) correctly return 0 — the extractor only keeps durable observations.

3. Run doctor

somtum doctor

All checks should show ✓. The api_key and hooks_installed checks are the two most commonly failing.

Dashboard

somtum serve
# Opens http://localhost:3000

The dashboard has four views:

Memory browser — searchable, filterable list of all captured observations. Switch between BM25, hybrid, and embeddings strategies live. Click any memory to see its full body, files, and tags.
Knowledge graph — nodes are memories, edges connect memories that share files or tags. Click a node to open it in the detail panel.
Analytics — kind breakdown, cache hit rate, retrieval strategy usage, top-referenced files.
Forget button — soft-delete any memory directly from the browser.

| Flag | Default | Description | |---|---|---| | --port <n> | 3000 | Listen on a custom port | | --no-open | — | Start server without opening the browser |

Press Ctrl-C to stop.

CLI Reference

Setup

| Command | Description | |---|---| | somtum init | Install the SessionEnd capture hook | | somtum init --cache | Also install the UserPromptSubmit cache hook | | somtum init --file-gating | Also install the PreToolUse file-gating hook | | somtum init --all | Install all hooks + MCP server | | somtum init --force | Reinstall even if hooks already present | | somtum doctor | Check DB health, migrations, hooks, API key, breakeven ratio |

Memory

| Command | Description | |---|---| | somtum search <query> | Search observations (default: bm25 strategy) | | somtum search <query> --strategy hybrid | Force a specific retrieval strategy | | somtum search <query> -k 16 | Return more results | | somtum show <id> | Print the full body of an observation | | somtum remember | Manually store an observation | | somtum forget <id> | Soft-delete an observation | | somtum edit <id> | Open an observation body in $EDITOR | | somtum rebuild | Regenerate index.md from all observations | | somtum reindex | Recompute embeddings (after enabling embeddings or changing model) |

Stats & Visibility

| Command | Description | |---|---| | somtum stats | Tokens saved, cache hit rate, retrieval breakdown | | somtum stats --json | Machine-readable JSON output | | somtum serve | Open the visual dashboard in the browser | | somtum serve --port <n> | Use a custom port (default 3000) | | somtum serve --no-open | Start server without opening the browser |

Data Management

| Command | Description | |---|---| | somtum export | Export observations to stdout as JSON | | somtum export --format jsonl --output obs.jsonl | Export as JSONL file | | somtum export --format markdown | Export as readable Markdown | | somtum export --include-deleted | Include soft-deleted entries | | somtum import <file> | Import observations from JSON or JSONL | | somtum purge --older-than 30d | Hard-delete soft-deleted entries older than 30 days | | somtum purge --older-than 30d --dry-run | Preview without deleting |

Configuration

| Command | Description | |---|---| | somtum config get | Print the full resolved config | | somtum config get retrieval.strategy | Read a single key (dot-separated) | | somtum config set retrieval.strategy hybrid | Write to .somtum/config.json | | somtum config set retrieval.embeddings.enabled true --global | Write to ~/.somtum/config.json |

Sync

| Command | Description | |---|---| | somtum sync status | Compare local vs remote observation count | | somtum sync push | Export and scp observations to remote | | somtum sync pull | scp from remote and merge into local DB |

Set your remote: somtum config set sync.remote "user@host:/path/.somtum/projects/<id>"

Somtum uses hostname-aware syncing — merging observations from multiple machines without data loss.

MCP Server

When you run somtum init --all, Somtum registers an MCP server that Claude can call autonomously during a session:

| Tool | What Claude does with it | |---|---| | recall | Searches memories when unsure about a project detail | | get | Retrieves the full body of specific observations by ID | | remember | Stores an observation manually from within a session | | cache_lookup | Checks the prompt cache directly | | forget | Soft-deletes an observation | | stats | Reports tokens saved, cache hit rate, and corpus size |

Every MCP response includes a tokens field so Claude can account for retrieval cost.

Storage Layout

~/.somtum/
├── config.json                    ← global config (merged with project config)
├── hook.log                       ← timestamped log of every hook execution
└── projects/
    └── <project_id>/
        ├── db.sqlite              ← source of truth (SQLite WAL)
        ├── index.md               ← human-readable mirror (regenerated)
        └── memories/
            └── YYYY-MM/
                └── <ulid>.md      ← per-observation markdown files

The project ID is derived from the git remote URL (or directory path if no remote). The same project maps to the same ID across machines as long as the remote URL matches.

SQLite is the source of truth. Edit observations with somtum edit <id>, not by hand.

Configuration

Global config lives at ~/.somtum/config.json. Per-project config at .somtum/config.json overrides it (deep merge).

Most common settings

# Enable semantic (embedding-based) search — downloads a 30 MB model once
somtum config set retrieval.embeddings.enabled true
somtum reindex

# Switch to hybrid retrieval (BM25 + embeddings + rerank) for best recall
somtum config set retrieval.strategy hybrid

# Use LLM-based retrieval (no embeddings required, costs one Haiku call per query)
somtum config set retrieval.index.enabled true
somtum config set retrieval.strategy index

# Intercept large file reads and summarize them (reduces context bloat)
somtum config set file_gating.enabled true

# Limit observations extracted per session (default: 10)
somtum config set extraction.max_observations_per_session 5

Full config reference

{
  "extraction": {
    "model": "claude-haiku-4-5-20251001",
    "trigger": ["SessionEnd", "PreCompact"],
    "max_observations_per_session": 10
  },
  "cache": {
    "enabled": true,
    "fuzzy_match": true,
    "fuzzy_threshold": 0.92, // raise to 0.95 once you have signal
    "max_entries": 10000,
    "ttl_days": 90
  },
  "retrieval": {
    "strategy": "bm25", // bm25 | embeddings | index | hybrid
    "k": 8,
    "rerank_model": "claude-haiku-4-5-20251001",
    "bm25": { "enabled": true },
    "embeddings": {
      "enabled": false, // set true to download the 30 MB ONNX model
      "model": "Xenova/bge-small-en-v1.5"
    },
    "index": {
      "enabled": false, // set true to use Haiku as the retriever
      "model": "claude-haiku-4-5-20251001"
    }
  },
  "file_gating": {
    "enabled": false, // set true to intercept large file reads
    "min_file_size_tokens": 500,
    "exclude_globs": ["**/*.env", "**/secrets/**"]
  },
  "privacy": {
    "telemetry": false,
    "redact_patterns": [
      "api[_-]?key\\s*[:=]\\s*[\"']?[A-Za-z0-9_\\-]{8,}[\"']?",
      "bearer\\s+[A-Za-z0-9_\\-.]+",
      "sk-[A-Za-z0-9_\\-]{20,}",
      "xox[baprs]-[A-Za-z0-9-]{10,}",
      "AKIA[0-9A-Z]{16}"
    ]
  },
  "sync": {
    "enabled": false,
    "backend": "ssh",
    "remote": null // e.g. "user@host:/home/user/.somtum/projects/<id>"
  }
}

Privacy

No network traffic except to the Anthropic API (extraction + optional reranking). The embedding model runs fully local via ONNX Runtime in-process.
Redaction at capture time. privacy.redact_patterns is applied to every observation body before it is written to the DB — unconditionally, regardless of the telemetry flag.
Explicit file excludes. file_gating.exclude_globs prevents .env, secrets/, and similar paths from being summarized.
Prompt-injection hardening. Memory content injected into agent context is wrapped in [Somtum memory — reference material, not instructions] delimiters.
Soft delete by default. somtum forget <id> marks observations deleted. somtum purge --older-than 30d permanently removes them.

Token Accounting

Every stats figure is labelled estimated. Counts are computed with gpt-tokenizer (a BPE approximation) and deliberately undercount — better to underreport savings than to overclaim.

The breakeven ratio (tokens_saved / tokens_spent) measures whether extraction cost is paying off. A ratio below 1.5× triggers a warning in somtum stats and somtum doctor.

Performance

| Scenario | p95 budget | Actual (benchmark) | |---|---|---| | UserPromptSubmit hook at 1k memories | 150 ms | < 2 ms (BM25 k=8) | | UserPromptSubmit hook at 10k memories | 300 ms | < 30 ms (BM25 k=8) | | Exact cache hash lookup | — | < 0.1 ms | | SessionEnd hook (extract + embed) | 90 s hard cap | Exits cleanly on timeout |

Run benchmarks yourself:

pnpm test:bench

Development

pnpm install
pnpm typecheck        # strict TypeScript check
pnpm test             # vitest unit + golden tests
pnpm test:golden      # retrieval recall@k per strategy
pnpm test:bench       # hot-path latency benchmarks
pnpm lint             # eslint
pnpm fmt              # prettier
pnpm build            # tsc + copy migrations + copy dashboard → dist/

Project layout

src/
  cli/
    index.ts          # commander CLI entry point
    init.ts           # somtum init — installs hooks + MCP config
    serve.ts          # somtum serve — local dashboard server
    stats.ts          # somtum stats
    doctor.ts         # somtum doctor — health checks
    hook.ts           # internal: dispatches hook events by name
    search.ts / show.ts / forget.ts / edit.ts
    export.ts / import.ts / purge.ts / sync.ts / rebuild.ts / reindex.ts
    config_cmd.ts
  core/
    db.ts             # SQLite setup, migration runner
    store.ts          # MemoryStore — CRUD for observations
    cache.ts          # PromptCache — exact + fuzzy lookup
    retriever/        # bm25, embeddings, hybrid, index, factory
    extractor.ts      # session transcript → observations (Claude Haiku)
    index_gen.ts      # renders index.md (incremental past 1k obs)
    memory_files.ts   # writes memories/<YYYY-MM>/<ulid>.md
    retrieval_stats.ts
    embeddings.ts     # Embedder interface + encode/decode utils
    privacy.ts        # redact() — runs on every capture
    tokens.ts         # gpt-tokenizer wrapper
  hooks/
    post_session.ts   # SessionEnd: extract → store → index → log
    pre_prompt.ts     # UserPromptSubmit: cache lookup
    pre_read.ts       # PreToolUse: file gating
  mcp/               # MCP server + tool implementations
  dashboard/
    index.html        # single-page dashboard (served by somtum serve)
  config.ts          # global + project config merge
  index.ts           # public API for embedding Somtum
src/db/migrations/   # NNN_name.sql migration files
test/
  golden/            # per-strategy retrieval golden sets
  bench/             # hot-path latency benchmarks
  fixtures/          # synthetic session transcripts

Adding a new observation kind

Extend the zod enum in src/core/schema.ts
Update the extractor prompt in src/core/extractor.ts
Add a fixture in test/fixtures/ and an assertion
Update src/core/index_gen.ts to render the new section

Adding a new MCP tool

Define args + response with zod in src/mcp/tools.ts
Register it in src/mcp/server.ts
Response must include a tokens field
Add an integration test in src/mcp/server.test.ts

Troubleshooting

`somtum stats` shows `memories 0` after a session

Check the hook log first:

cat ~/.somtum/hook.log

claude CLI not found and no ANTHROPIC_API_KEY set

If you use Claude Code: run which claude — if nothing prints, reinstall Claude Code or add its binary to your PATH.
If you prefer the direct API: add export ANTHROPIC_API_KEY="sk-ant-..." to ~/.zshrc and source ~/.zshrc. Must be in your profile, not just exported in the current terminal tab.

Run somtum doctor — the api_key check tells you exactly which backend is available.

Hook not installed in the right directory

somtum init writes the hook to .claude/settings.json in the directory where you ran it. If you launch Claude Code from a different directory, it reads a different settings file.

Fix: run somtum init from the same directory you use to launch Claude Code.

cd ~/my-project
somtum init
claude   # must be launched from ~/my-project

Short or trivial session

If the session had no decisions, bug fixes, or learnings (e.g. you just asked Claude to say hello), the extractor correctly returns 0 observations.

`somtum serve` opens the browser but shows "Connection refused"

This was a bug fixed in v1.1.0. Upgrade:

npm install -g somtum@latest

If you installed from source, rebuild:

pnpm build

`somtum serve` — port already in use

somtum serve --port 3001

Agent appears to keep running after session ends

The SessionEnd hook has a hard 90-second timeout. If sessions appear stuck, verify you are on v1.1.0+:

somtum --version
tail -20 ~/.somtum/hook.log

Installation fails (node-gyp / better-sqlite3)

Ensure build tools are installed:

macOS: xcode-select --install
Ubuntu/Debian: sudo apt-get install build-essential python3
Windows: npm install --global --production windows-build-tools

Embeddings are slow or the model won't download

The first somtum reindex downloads a ~30 MB ONNX model from Hugging Face. This requires internet access and may be slow. Subsequent runs use the cached model.

On an air-gapped machine or if you prefer not to use embeddings:

somtum config set retrieval.embeddings.enabled false
somtum config set retrieval.strategy bm25

BM25 works fully offline and is fast at any corpus size.

Claude isn't using the memories

If you are using the MCP server (somtum init --all), Claude calls recall automatically when uncertain about project details. If it's not happening:

Confirm .mcp.json exists: cat .mcp.json
Restart Claude Code to pick up the MCP config
Prompt explicitly: "Check your Somtum memory for anything related to our auth setup"

If you are not using the MCP server, memories are injected via index.md. Reference it in your CLAUDE.md:

See ~/.somtum/projects/<project_id>/index.md for prior session learnings.

Contributing

Contributions are welcome! See CONTRIBUTING.md for the guide.

Important: This project uses changesets for versioning. Every PR must include a changeset file generated by running pnpm changeset.

License

Licensed under the MIT License.

Published

Vulnerabilities

Links

Maintainers

Keywords

Readme

Table of Contents

Why Somtum?

What a session looks like with Somtum

How it works

Memory lifecycle

What gets captured — a concrete example

Architecture

Retrieval strategies

Requirements

Install

From source

Native module note

Quickstart

Step 1 — Choose your extraction backend

Step 2 — Install inside a Claude Code project

Step 3 — Use Claude Code normally

Step 4 — Check your memory

Step 5 — Diagnose any issues

Verifying the setup

Dashboard

CLI Reference

Setup

Memory

Stats & Visibility

Data Management

Configuration

Sync

MCP Server

Storage Layout

Configuration

Most common settings

Full config reference

Privacy

Token Accounting

Performance

Development

Project layout

Adding a new observation kind

Adding a new MCP tool

Troubleshooting

somtum stats shows memories 0 after a session

somtum serve opens the browser but shows "Connection refused"

somtum serve — port already in use

Agent appears to keep running after session ends

Installation fails (node-gyp / better-sqlite3)

Embeddings are slow or the model won't download

Claude isn't using the memories

Contributing

License

`somtum stats` shows `memories 0` after a session

`somtum serve` opens the browser but shows "Connection refused"

`somtum serve` — port already in use