
© 2026 – Pkg Stats / Ryan Hefner

@lableaks/spotless

v0.2.0

Published

Persistent memory for Claude Code — local proxy, zero config, SQLite-backed

Downloads

413

Readme

Spotless

Persistent memory for Claude Code.

curl -fsSL https://bun.sh/install | bash   # if you don't have Bun
npm install -g @lableaks/spotless
spotless code --agent myagent

v0.1.2 — Early release. The core works but expect breaking changes. Back up ~/.spotless/ if you have data you care about.

Token usage: Spotless replaces Claude Code's messages with a longer history trace, which increases token consumption. Use --max-context to control the budget. Recommended for Max plan subscribers; API-pricing users should set a lower budget (e.g. --max-context 120000).

The problem

Claude Code forgets everything between sessions. Close the terminal, come back, and it has no idea what you were working on — or who you are. Your decisions, breakthroughs, preferences, the way you like to work — gone.

Within a session, things aren't much better. Long conversations hit "Compacting Conversation," which lossy-summarizes your context and often undoes hours of careful work. Post-compaction, Claude forgets recent corrections and regresses on things it just got right. Every session, you start from scratch with a stranger.

What Spotless fixes

  1. No more amnesia between sessions. Every conversation is archived to SQLite. When you start a new session, your agent picks up where you left off — across days, weeks, months. "We discussed this yesterday" actually works.

  2. Compaction stops destroying your work. When Claude Code compacts, it replaces your conversation with a lossy summary — and your agent loses corrections, decisions, and context it just had. Spotless replaces that summary with actual conversation history from its archive. Your agent remembers what was really said, not a garbled approximation.

  3. Your agent knows you, not just the project. Memory is keyed to a named agent, not a directory. Your agent learns your preferences, communication style, and decision-making patterns across every project you work on together. This is a different bet than project-scoped memory — a project-specific system will know the codebase better, but your agent won't know you.

  4. Knowledge compounds over time. A background digest process consolidates raw conversation into a memory graph — extracting facts, building associations, tracking corrections. When you told your agent three weeks ago that you prefer PostgreSQL over MongoDB, that surfaces automatically when databases come up again.

  5. Your agent develops a self-concept. Over time, the digest process builds an identity for your agent — values, working style, relationship dynamics — from the pattern of your interactions. This isn't a static persona file; it evolves as the agent accumulates experience.

Design philosophy: treat it like a coworker

Spotless is designed so that your agent's memory works like a human colleague's would. It remembers what you've discussed, learns from corrections, builds on past context, and occasionally forgets old details that haven't come up in a while — all without you managing a knowledge base or writing to special files. The best mental model is a coworker who was there yesterday and last week: you don't re-explain your preferences, you don't re-introduce the project, you just pick up where you left off. Feedback sticks — if you tell it something was wrong, that correction is encoded and surfaces when relevant, not just during the current session. This also means careless criticism sticks. Your agent's memory is designed to behave predictably by human standards, so treat it accordingly.

Background

Spotless started as a practical fix for compaction amnesia, but it's also a philosophical experiment. What happens when an AI agent has continuous, persistent memory — not just a scratchpad, but an evolving identity built from accumulated experience? The companion essay on the Lab Leaks Substack explores what it might mean for AI agents to develop accountable selves.

How it works

Spotless is a local reverse proxy. It sits between Claude Code and the Anthropic API, transparently rewriting every request before it goes out. Claude doesn't know it's there.

Two data sources, one request

Spotless maintains two independent stores that feed into every API request:

Tier 1 — History Archive. Every conversation turn is recorded verbatim to SQLite — append-only, never summarized, never modified. When assembling a request, Spotless replaces Claude Code's messages with a history trace reconstructed from this archive: real user/assistant exchanges from past sessions, in chronological order. Oldest turns drop off the back when the budget fills up (~62K tokens in a typical session), the way a coworker naturally loses detail about what happened months ago.
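The trimming behavior can be sketched as a backwards walk over the archive (a toy model in TypeScript; the `Turn` shape, token counts, and budget value here are illustrative, not Spotless's actual internals):

```typescript
// Sketch: assemble a history trace from an append-only archive,
// dropping the oldest turns when the token budget fills up.
// Types and numbers are illustrative, not Spotless internals.
interface Turn {
  role: "user" | "assistant";
  text: string;
  tokens: number; // pre-computed token estimate for this turn
}

function buildHistoryTrace(archive: Turn[], budgetTokens: number): Turn[] {
  const trace: Turn[] = [];
  let used = 0;
  // Walk backwards from the newest turn; stop when the budget is full.
  for (let i = archive.length - 1; i >= 0; i--) {
    if (used + archive[i].tokens > budgetTokens) break;
    trace.unshift(archive[i]); // keep chronological order
    used += archive[i].tokens;
  }
  return trace;
}

const archive: Turn[] = [
  { role: "user", text: "old question", tokens: 50 },
  { role: "assistant", text: "old answer", tokens: 60 },
  { role: "user", text: "recent question", tokens: 40 },
  { role: "assistant", text: "recent answer", tokens: 45 },
];
// Budget of 100 tokens: only the two newest turns fit.
const trace = buildHistoryTrace(archive, 100);
```

Note that nothing is deleted from `archive`; trimming only decides what gets assembled into the outgoing request.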

Tier 2 — Memory Graph. A background digest process reads the raw archive and extracts structured knowledge: facts ("project uses PostgreSQL 15"), experiences ("we spent two hours debugging the race condition"), corrections (superseding outdated facts), and self-concept observations. These are stored as nodes in a graph, connected by what was discussed together — the way you'd associate "that database migration" with "the day everything broke." When a new turn arrives, a lightweight selector picks which memories are relevant to the current conversation. This runs asynchronously — zero added latency.
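The graph-plus-selector idea can be illustrated with a toy sketch (the node shape, keyword scoring, and link-following here are stand-ins for whatever Spotless actually implements):

```typescript
// Toy sketch of a memory graph with association-based recall.
// Node shape and the naive keyword scoring are illustrative stand-ins.
interface MemoryNode {
  id: number;
  kind: "fact" | "experience" | "correction";
  text: string;
  linked: number[]; // ids of memories discussed in the same context
}

function selectRelevant(graph: MemoryNode[], message: string, limit = 3): MemoryNode[] {
  const words = new Set(message.toLowerCase().split(/\W+/).filter(Boolean));
  const byId = new Map<number, MemoryNode>(
    graph.map((n): [number, MemoryNode] => [n.id, n]),
  );
  // Score each memory by keyword overlap with the incoming message.
  const scored = graph
    .map((n) => ({
      node: n,
      score: n.text.toLowerCase().split(/\W+/).filter((w) => words.has(w)).length,
    }))
    .filter((s) => s.score > 0)
    .sort((a, b) => b.score - a.score);
  const picked = new Map<number, MemoryNode>();
  for (const { node } of scored) {
    if (picked.size >= limit) break;
    picked.set(node.id, node);
    // Association: recalling one memory pulls in directly linked ones too.
    for (const id of node.linked) {
      const linked = byId.get(id);
      if (linked && picked.size < limit) picked.set(id, linked);
    }
  }
  return [...picked.values()];
}

const graph: MemoryNode[] = [
  { id: 1, kind: "fact", text: "project uses PostgreSQL 15", linked: [2] },
  { id: 2, kind: "experience", text: "migration broke on the day everything failed", linked: [1] },
  { id: 3, kind: "fact", text: "user prefers tabs over spaces", linked: [] },
];
const recalled = selectRelevant(graph, "what database does the project use?");
```

Asking about the project surfaces the PostgreSQL fact, and the linked migration experience comes along with it, while the unrelated tabs-vs-spaces preference stays out.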

These two sources are assembled into different parts of the API request:

system prompt:
  ┌─────────────────────────────────────────┐
  │ <spotless-orientation>                  │ ← tells the agent about its memory
  │ [Claude Code's system prompt, unchanged]│
  └─────────────────────────────────────────┘

messages array:
  ┌─────────────────────────────────────────┐
  │ HISTORY TRACE (from Tier 1)             │ ← replaces CC's messages
  │                                         │
  │ Preamble: "[Spotless Memory System]     │
  │   Your name is wren..."                 │
  │                                         │
  │ Real past conversation pairs:           │
  │   user: "Tell me about the database"    │
  │   assistant: "The schema uses..."       │
  │   user: "--- new session ---            │
  │     Let's add that caching layer"       │
  │   assistant: "I'll use Redis for..."    │
  ├─────────────────────────────────────────┤
  │ CURRENT USER MESSAGE                    │ ← the actual new message
  │                                         │
  │ Prepended with Tier 2 content:          │
  │   <your identity>                       │ ← from memory graph
  │     I am wren.                          │
  │     - I tend to be thorough.            │
  │   </your identity>                      │
  │   <relevant knowledge>                  │ ← from memory graph
  │     Project uses PostgreSQL 15.         │
  │     I learned to use migrations.        │
  │   </relevant knowledge>                 │
  │                                         │
  │ "Now add a migration for the new column"│ ← what the user typed
  └─────────────────────────────────────────┘

The history trace is real conversation — actual message pairs, not summaries. The memory tags are synthesized knowledge, injected as text that looks like it was always part of the user's message. The model reads both as natural context; it never sees database IDs, scores, or retrieval metadata.

After building the request, Spotless archives the current turn to Tier 1 (for future history traces) and forwards everything to the Anthropic API. Responses are streamed back and archived too.
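As a rough sketch, the Tier 2 injection amounts to prepending tagged plain text to the user's message (the tag names follow the diagram above; the function and its inputs are illustrative, not Spotless's actual code):

```typescript
// Sketch: wrap the current user message with synthesized Tier 2 content,
// mirroring the request layout shown in the diagram. Illustrative only.
function wrapUserMessage(
  userText: string,
  identityLines: string[],
  knowledgeLines: string[],
): string {
  const identity = ["<your identity>", ...identityLines, "</your identity>"].join("\n");
  const knowledge = ["<relevant knowledge>", ...knowledgeLines, "</relevant knowledge>"].join("\n");
  // The model sees one plain-text message; no IDs, scores, or retrieval metadata.
  return [identity, knowledge, userText].join("\n\n");
}

const message = wrapUserMessage(
  "Now add a migration for the new column",
  ["I am wren.", "- I tend to be thorough."],
  ["Project uses PostgreSQL 15."],
);
```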

How memories get created

Raw conversation piles up in Tier 1. When enough unconsolidated history accumulates (tracked by consolidation pressure), a two-phase digest runs:

  1. Consolidation — a small model reads recent conversation and catalogs what happened: new facts, merged duplicates, corrections that supersede outdated knowledge. Memories are linked to related memories, so recalling one can bring back others from the same context.

  2. Reflection — the same model reviews what was just consolidated and updates the agent's self-concept: how it works, what it values, how the relationship with you is going. These observations live in the same graph as project facts — the agent's sense of self is built from the same material as its knowledge.

Digesting is triggered automatically when pressure is high and the history trace has to drop old messages. You can also trigger it manually with spotless digest.
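The trigger condition can be sketched like this (the pressure metric and the 0.8 threshold are assumptions for illustration, not Spotless's real heuristic):

```typescript
// Sketch: decide when a digest pass should run. "Pressure" here is just
// the ratio of unconsolidated tokens to the history budget; the real
// heuristic in Spotless may differ.
interface DigestState {
  unconsolidatedTokens: number; // raw turns not yet distilled into memories
  historyBudgetTokens: number;  // how much raw history fits in a request
}

function consolidationPressure(s: DigestState): number {
  return s.unconsolidatedTokens / s.historyBudgetTokens;
}

// Trigger automatically once pressure is high enough that the trace
// would start dropping turns that were never consolidated.
function shouldDigest(s: DigestState, threshold = 0.8): boolean {
  return consolidationPressure(s) >= threshold;
}

const calm: DigestState = { unconsolidatedTokens: 20_000, historyBudgetTokens: 404_000 };
const urgent: DigestState = { unconsolidatedTokens: 380_000, historyBudgetTokens: 404_000 };
```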

Consolidation pressure

As you talk faster than the digest process can keep up, consolidation pressure builds. When it gets high enough, your agent will ask you to slow down — this is a real message in the conversation, not a system notification. It means unconsolidated conversation is piling up and the oldest turns are being dropped from the history trace before they've been distilled into memories.

Nothing is erased — the raw archive is append-only and everything is still in Tier 1. But turns that haven't been consolidated yet won't surface as memories in Tier 2. Slowing down gives the digest process time to catch up, ensuring important details get encoded into the memory graph before they age out of the history window.

The dashboard's Health tab shows current pressure levels. A future release will add a menubar indicator so you can see pressure at a glance without opening the dashboard.

Context budget

--max-context controls how many tokens Spotless assembles into each API request. The default is 500,000 — designed for 1M context models where Claude Code compacts at ~80% of the window.

The history trace fills whatever's left after the system prompt, tools, and memories are accounted for:

history budget = max-context - system prompt (~15K) - tools (~30K) - tier 2 memories (10%) - overhead (1K)

With the default 500K budget, your agent retains ~404K tokens of raw conversation — actual message pairs from recent and past sessions. The oldest turns trim from the front as the budget fills. Nothing is erased from the archive; trimmed turns simply stop appearing in the history trace, though their content can still surface through memory recall.
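The arithmetic works out like this (using the approximate overheads quoted in the formula above):

```typescript
// History budget arithmetic from the formula above. The overhead numbers
// are the documented approximations, not measured values.
function historyBudget(maxContext: number): number {
  const systemPrompt = 15_000;                 // ~15K
  const tools = 30_000;                        // ~30K
  const tier2 = Math.floor(maxContext * 0.10); // 10% reserved for memories
  const overhead = 1_000;
  return maxContext - systemPrompt - tools - tier2 - overhead;
}

const defaultBudget = historyBudget(500_000); // → 404000, the ~404K quoted above
const apiBudget = historyBudget(120_000);     // → 62000, the ~62K history figure
```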

If you're paying per-token (API pricing), a lower budget saves money:

spotless start --max-context 120000   # ~62K history, similar to pre-1M behavior

If you're on a Max plan, the default 500K gives you maximum conversational memory at no extra cost. Consolidation pressure still triggers at the same pace regardless of budget — memories are created early relative to when turns would age out of the history window.

Requirements

  • Bun >= 1.0 (runtime — Spotless uses Bun's built-in SQLite and HTTP server)
  • Claude Code with a Claude Max subscription
  • macOS or Linux

Quick start

Install

npm install -g @lableaks/spotless
# or
bun add -g @lableaks/spotless

Or build from source

git clone https://github.com/lableaks/spotless.git
cd spotless
bun install
bun link

Run

spotless start
spotless code --agent myagent

That's it. Your agent now has persistent memory.

CLI reference

| Command | Description |
|---------|-------------|
| spotless start [--port 9000] [--no-digest] [--max-context &lt;tokens&gt;] | Start the proxy. See Context budget for --max-context. |
| spotless stop | Stop the running proxy. |
| spotless status | Check if the proxy is running. |
| spotless code [--agent &lt;name&gt;] [--port 9000] [-- ...claude args] | Launch Claude Code through the proxy. Auto-starts the proxy if needed. |
| spotless agents | List all agents with DB sizes. |
| spotless digest [--agent &lt;name&gt;] [--dry-run] [--model haiku\|sonnet] | Manually trigger a digest pass (memory consolidation). |
| spotless logs [--agent &lt;name&gt;] | Collect proxy logs and diagnostics into a report file. |
| spotless repair [--agent &lt;name&gt;] [--fix] [--purge-history] | Diagnose and repair database issues. |

Dashboard

While the proxy is running, open http://localhost:9000/_dashboard/ for a web UI showing:

  • Memories — browse, search, and filter the memory graph (with salience scores and associations)
  • Identity — the agent's current self-concept and relationship observations
  • History — raw conversation archive, searchable
  • Digests — log of every consolidation and reflection pass
  • Selector — what memories were selected on each turn and why
  • Health — consolidation pressure, unconsolidated token count, database stats

Everything is read-only. The dashboard queries the agent's SQLite database directly — no extra infrastructure.

Agents

Memory is keyed by agent name, not project directory. The same agent remembers across all projects. Data is stored at ~/.spotless/agents/<name>/spotless.db.

spotless code --agent wren          # use agent "wren" in any project
spotless code                       # pick or create an agent interactively

How it compares

Claude Code (as of early 2026) has three built-in memory mechanisms: CLAUDE.md files you write by hand, Auto Memory where Claude writes its own notes to MEMORY.md, and Session Memory which saves session summaries. All three work the same way — flat Markdown files loaded wholesale into the context window. There's no retrieval, no search, no consolidation. If the file fits, it's injected; if it doesn't, it's truncated at 200 lines.

MCP memory servers like Mem0 and basic-memory add semantic search or knowledge graphs, but they require the model to explicitly call tools to save and retrieve memories. The model knows it has a memory system and must choose to use it.

Spotless is architecturally different in several ways:

| | CLAUDE.md / Auto Memory | MCP Memory Servers | Spotless |
|---|---|---|---|
| Mechanism | Flat files loaded into context | Model calls tools explicitly | Transparent proxy rewrites API requests |
| Model awareness | Model knows about the files | Model knows about the tools | Model is told it has memory, but doesn't manage it |
| What's stored | Markdown notes (human or Claude-written) | Extracted facts or embeddings | Full conversation history + synthesized memory graph |
| Retrieval | Entire file, or nothing | Vector similarity or manual navigation | Relevant memories surface based on current conversation |
| Cross-session | Yes | Yes | Yes |
| Scoping | Per-repo (auto memory) or global (CLAUDE.md) | Varies | Per-agent — knows you across projects, not the project itself |
| Consolidation | None — you maintain it | None, or manual | Automatic background digesting when pressure builds |
| Identity | Static persona in a file | Not supported | Evolving self-concept built from accumulated experience |
| Failure mode | Missing context | Tool call errors surface to user | Falls back to vanilla Claude Code |

The built-in mechanisms are complementary — CLAUDE.md and project instructions live in the system prompt, which Spotless preserves (it only rewrites the conversation messages). Spotless adds the layer that doesn't exist yet: a continuous, evolving memory that works the way you'd expect a colleague's to.

What it doesn't do

  • No API keys required. Spotless forwards Claude Code's auth headers unchanged. It never touches your credentials.
  • No model changes. Your chosen model (Opus, Sonnet, etc.) passes through untouched.
  • No tool modifications. Claude Code's tools work exactly as before.
  • No cloud dependency. Everything runs locally. Your data stays in ~/.spotless/.
  • No degradation on failure. If anything goes wrong, Spotless falls back to vanilla pass-through. You get normal Claude Code, not a broken session.

Development

bun test              # unit tests (380+, no API calls)
bun run test:live     # live tests (requires Claude Code + tmux, hits real API)
bun run typecheck     # type-check

Live tests

test/live/ contains end-to-end tests that run real Claude Code through the real proxy. These are Playwright-for-terminals — they use tmux to drive interactive Claude Code sessions, send keystrokes, capture pane output, and assert on results.

Requirements: Claude Code installed and authenticated, tmux, Max plan (tests cost tokens).

What they test:

  • Prompt mode: round-trip archival, cross-session memory, multi-session history
  • Interactive mode: multi-turn conversations, state detection (idle/working/exited), session lifecycle
  • Context budget: proxy starts and processes requests with default and custom --max-context values

The harness (test/live/harness.ts) provides:

  • runPrompt(text) — run claude -p through the proxy, return output + DB access
  • createLiveSession() — start an interactive Claude session in tmux with full control: .type(), .submit(), .waitForIdle(), .capture(), .state(), .db()
  • Automatic proxy lifecycle, bypass-permissions handling, and cleanup

Live tests are excluded from bun test — they only run via bun run test:live.

For architecture details, see _project/adrs/ and the PRD.

License

MIT