tailings
v0.0.1
Published
Pull a directory's coding-agent session history (Claude Code, Codex, OpenCode, Gemini) into that directory so the next agent is instantly caught up.
Maintainers
Readme
Tailings
The tail-end of the value you already received. In mining, tailings are the material left over after the valuable ore is extracted — except here the "waste" (your pruned agent transcripts and the memories agents wrote) is the gold. The labs throw it away on a clock; we pan it back out and keep it.
Tailings pulls a directory's coding-agent history into that directory. Run it in a folder and it gathers the sessions + memories that Claude Code, Codex, OpenCode, and Gemini produced for that folder, and drops them in as plain text — so when you switch agents (or hand off to a weaker one), the next agent reads them and is instantly caught up on the prior work and decisions. It grabs the chats before the tools prune/delete them, because once they're gone they're gone.
This file is the brief for whoever builds it next. Nothing is scaffolded yet on
purpose. Read this, then read tvagent, then build.
Scope — read this twice
- Per-directory, not global.
tailings pull --dir .returns only the sessions whose working directory is this one (and its subdirs). It must never dump your entire chat history into a folder. The directory filter is the whole point. - Just text. Transcripts are small text — KB to a few MB each. Don't treat
size as a problem; agents grep
node_modulesall day. No need to summarize for size reasons (a distill pass is optional polish, not a requirement). - The consumer is the next agent, in any tool — even a weak one. The memories these tools write are already well-structured for an agent to read. Tailings' job is just to co-locate them in the working directory so the next agent finds them without being told where to look. A weaker model + this context beats a stronger model starting cold.
- Local-first, no selling, no outcome prediction. This is purely about not losing your own context. Keep it boring.
Why this exists
- Claude Code prunes session transcripts after ~30 days. Codex and OpenCode have their own retention policies and large, churning local stores.
- Agent-written memories (Claude's per-project
memory/, Codex's git-backed~/.codex/memories/) are some of the highest-signal artifacts an agent produces — and are exactly the kind of thing a lab is likely to stop exposing locally over time. - Once it's gone, it's gone. The only durable hedge is to copy the on-disk artifacts onto our own storage before the source deletes them, then make that archive usable by the next agent.
The one thing to internalize first: tvagent already did the hard part
../tvagent is a working Effect + TypeScript monorepo that locates and parses
all three tools' local stores. It does this purely to extract token usage
and then discards the conversation content.
Tailings is the inverse: keep the content (and memories), discard the token math. So ~80% of the locating/parsing code is already written next door — copy the structure, change what you keep.
Files to study and lift from (../tvagent):
| Concern | File | What to reuse |
|---|---|---|
| Where each tool stores data | packages/core/src/LocalPaths.ts | The LocalPaths Effect service — already resolves all three roots. |
| Claude transcript parsing | packages/adapter-claude/src/ClaudeLocal.ts | discoverClaudeFiles, the JSONL line decode, the cwd field handling. |
| Codex rollout parsing | packages/adapter-codex/src/CodexLocal.ts | discoverCodexFiles, session_meta/turn_context decoding for cwd. |
| OpenCode SQLite reads | packages/adapter-opencode/src/OpenCodeLocal.ts | node:sqlite DatabaseSync read-only pattern via Effect.acquireRelease + Effect.scoped. |
| Domain modeling | packages/core/src/Domain.ts | Tagged errors (SourceReadError, UsageDecodeError), Schema decoding idioms. |
The adapters in tvagent only build UsageEvents. For Tailings, the equivalent
output is a SessionArchive / MemoryArchive that preserves the actual
message content.
Where the data actually lives (verified on this machine)
| Tool | Sessions (full transcripts) | Memories / agent knowledge | Keyed by dir? |
|---|---|---|---|
| Claude Code | ~/.claude/projects/<encoded-cwd>/<uuid>.jsonl | ~/.claude/projects/<encoded-cwd>/memory/*.md + memory/MEMORY.md; global ~/.claude/CLAUDE.md | Yes, natively |
| Codex | ~/.codex/sessions/YYYY/MM/DD/rollout-*.jsonl | ~/.codex/memories/ (git repo: MEMORY.md, memory_summary.md, raw_memories.md, rollout_summaries/) + memories_1.sqlite | No — date-partitioned |
| OpenCode | ~/.local/share/opencode/opencode.db → session / message / part tables | ~/.config/opencode/ (AGENTS.md, knowledge/); per-project .opencode/ | Yes (session.directory column) |
| Gemini ⚠️ | Not installed on this machine — verify before building. Expected: ~/.gemini/tmp/<project-hash>/ (logs + /chat save checkpoints) | GEMINI.md (hierarchical, like CLAUDE.md); ~/.gemini/ | Likely (per-project-hash dirs) — confirm |
tvagentcovers Claude/Codex/OpenCode but not Gemini — Gemini is net-new for Tailings. Confirm its real on-disk layout the first time it's present before writing the adapter; treat the paths above as a hypothesis.
Directory keying — the crux of "run it in any dir"
Tailings must work as tailings pull --dir . and return only that directory's
history. The difficulty differs per tool:
- Claude — trivial. The cwd is encoded into the folder name by replacing
both
/and.with-. So/Users/me/Desktop/projects→-Users-me-Desktop-projects, and/Users/me/.config/x→-Users-me--config-x(note the double dash from the dot). Compute the encoded name fromprocess.cwd()and read that one folder. Every JSONL record also carries acwdfield, so you can filter exactly (covers subdirs). - OpenCode — trivial.
SELECT … FROM session WHERE directory = ?, then joinmessage/partonsession_id. Content is JSON inmessage.dataandpart.data. - Codex — the hard one. Sessions are date-partitioned, and
~/.codex/session_index.jsonlonly hasid/thread_name/updated_at(no cwd). You must scanrollout-*.jsonl, readcwdfrom thesession_meta(andturn_context) records, and filter. Build and maintain our owncwd → [session files]index (e.g. a small SQLite at~/.local/share/tailings/index.db), updated incrementally on each run. - Gemini — unknown until verified. If sessions live under a per-project-hash
folder (
~/.gemini/tmp/<hash>/), recover the mapping the same way: read the cwd recorded inside the logs/checkpoints, or hash the cwd if the scheme is derivable. Don't guess — inspect a real session first.
⚠️ Cautions for the builder:
- These transcripts contain secrets/keys/tokens that have been pasted into chats. Scrub before anything leaves the machine. Add a redaction pass.
- The Codex store is large (
logs_2.sqlitewas ~1.6 GB here). Read selectively, stream, never load whole stores into memory.
How to use the SDKs + Effect
The project already depends on all three SDKs and Effect (mirror the versions
pinned in ../tvagent/package.json):
@anthropic-ai/claude-agent-sdk^0.3.15@openai/codex-sdk^0.135.0@opencode-ai/sdk^1.15.12effect4.0.0-beta.74
Be precise about what the SDKs are and aren't for. The SDKs drive agents; they do not hand you back history the tool has already pruned. So:
Extraction layer → filesystem / SQLite, NOT the SDKs
Saving chats from deletion = copying the on-disk artifacts before the source
deletes them. That's direct FS + node:sqlite reads — exactly the tvagent
pattern. This is the core of Tailings and does not need any SDK.
- One Effect service per source (
ClaudeArchive,CodexArchive,OpenCodeArchive), each exposingarchive()(copy everything to our store) andpull(dir, period)(return one directory's content). - Reuse the tvagent idioms:
Context.Servicefor services,Layer.effectfor wiring,Effect.fn(...)for traced operations,Schemafor decoding each record shape, tagged errors for failures, andEffect.acquireRelease+Effect.scopedfor SQLite handles (open{ readOnly: true }). - A
TailingsPathsservice (copyLocalPaths.ts) resolves source roots and our own archive root (~/.local/share/tailings/).
Where the SDKs do earn their place
@anthropic-ai/claude-agent-sdk— optional distillation. Not for size — the raw text is fine to keep and read. It's only worth a summarization pass if you want a tightAGENTS.mdindex (decisions, files touched, gotchas, open threads) sitting above the full transcripts in./.tailings/. Use a cheap model (e.g.claude-haiku-4-5-20251001); it's volume work. Skip it for v1.@opencode-ai/sdk— cleaner read path (optional). When an OpenCode server is running, its client (session.list(), message APIs) is a more stable way to read sessions than parsing the raw DB. Keep the raw-SQLite reader as the offline fallback (no server required), the way tvagent does.@openai/codex-sdk— rehydration (later). Useful for programmatically resuming/seeding Codex threads from an archive, not for reading old rollouts.
Effect everywhere
Keep the whole pipeline in Effect: typed errors instead of throws, Layers for
dependency injection (so adapters are swappable and testable with in-memory
fixtures like tvagent's *.test.ts files), and Schema as the single source of
truth for every on-disk record shape. Mirror tvagent's test style — feed sample
JSONL lines / rows through the normalizers and assert on the decoded output.
How the next agent consumes the archive (the real unlock)
An npx/CLI process cannot inject into a running agent's context window. It
doesn't need to — all three tools auto-read a shared instructions file on
startup:
- Claude Code reads
CLAUDE.md(andAGENTS.md) - Codex reads
AGENTS.md - OpenCode reads
AGENTS.md - Gemini reads
GEMINI.md(andAGENTS.md)
So tailings pull should write a cross-tool digest into the target directory's
AGENTS.md (or a ./.tailings/CONTEXT.md referenced from it), and keep the
fuller per-session text alongside in ./.tailings/. Then every next agent, in
any tool, picks it up natively — no piping, no MCP required. A weaker model
with this folder beats a stronger model starting cold. That's the whole payoff.
Keep AGENTS.md itself tight (an index + the high-signal bits: decisions,
touched files, gotchas, and the agent-written memories) and let the raw
per-session transcripts live in ./.tailings/ for an agent to open on demand —
they're just text; size isn't the constraint.
Secondary delivery modes (nice-to-have, build later): stdout for piping, an MCP
server (tailings mcp) for on-demand querying, a slash-command wrapper.
Proposed CLI surface
# THE command. Gather ONLY this directory's sessions + memories, from every
# agent tool that touched it, into the directory for the next agent to read.
tailings pull --dir . # → ./AGENTS.md (+ ./.tailings/), this dir ONLY
tailings pull --dir . --since 30d # bound how far back
tailings pull --dir . --out - # …to stdout instead of files
tailings pull --tools claude,codex,opencode,gemini # restrict sources
# Secondary: preservation. Copy raw sessions to our own store BEFORE the tools
# prune them. This store can be global, but `pull` ALWAYS re-filters by directory —
# nothing from another dir ever lands in your folder.
tailings sync # beat-the-prune-clock + refresh the cwd index
tailings mcp # expose the archive over MCP (later)pull is the product. sync exists only so the data survives long enough to be
pulled — run it on a schedule (launchd / cron) so the store stays ahead of each
tool's pruning clock. (tvagent has scripts/print-launchd-plist.mjs worth
copying for the macOS scheduling story.)
Suggested build order
- Port
LocalPaths→TailingsPaths(add the archive root). - Claude adapter first (easiest: native dir keying, plain JSONL). Get
pull --dir .writing this dir's raw transcripts to./.tailings/end-to-end. - OpenCode adapter (SQLite,
WHERE directory = ?). - Codex adapter + the
cwd → sessionsindex it requires. - Gemini adapter — verify the on-disk layout first (not present here yet).
- Memories: co-locate Claude
memory/, Codex~/.codex/memories/, OpenCodeknowledge/, and GeminiGEMINI.mdfor the matching dir. - Emit
AGENTS.mdas the defaultpulloutput (index + high-signal bits). sync/archive writing copies into~/.local/share/tailings/so data survives the prune clock; schedule it.- (optional) Distillation pass via the Claude Agent SDK (raw → brief) for
tighter
AGENTS.md— polish, not a requirement. MCP server last.
Open questions to resolve while building
- Archive format: keep raw JSONL/SQLite copies verbatim and derived digests? (Recommend yes — never lose fidelity; derive on top.)
- Dedup across re-runs (sessions grow; don't re-archive unchanged files — use mtime/size like tvagent's incremental OpenCode reader).
- Redaction: not needed for the local-only flow, but since
pullwrites into the working directory, make sure./.tailings/is.gitignored by default so pasted secrets in old transcripts don't get committed.
Built so far — pull (v1)
The pull command is implemented and verified end-to-end on this machine against
real Claude Code, Codex, and OpenCode stores. Deliberately not built yet (the
"sink"/preservation side and polish the brief marks optional): tailings sync,
the MCP server, and the distillation pass.
The CLI is built on Effect's CLI framework (effect/unstable/cli) — real
--help/--version/shell-completions, with a styled terminal summary.
pnpm install
pnpm build # bundles the CLI to dist/cli.js (shebang'd)
node dist/cli.js pull --dir . # → ./AGENTS.md (+ ./.tailings/), this dir ONLY
node dist/cli.js doctor # which agent stores are present on this machine
# or, after `pnpm link --global` (bin: tailings):
tailings pull --dir . --since 30d
tailings pull --tools claude,codex --out - # digest to stdout, no files
tailings # interactive: confirm, then pull the current dirWhat a run does:
- Resolves each tool's store via
TailingsPaths(ported fromtvagent'sLocalPaths) and gathers only this directory's sessions:- Claude — selects the encoded project folder(s) by dir prefix, parses the
JSONL, and re-confirms each session's recorded
cwdis in-tree. - OpenCode —
node:sqliteread-only;sessionrows filtered bydirectory, joined tomessage/partand rendered to plain text. - Codex — scans
rollout-*.jsonl, recovering each file'scwdfromsession_meta/turn_context, cached in our own incremental index at~/.local/share/tailings/index.db(keyed by mtime+size). Filters to in-tree. - Gemini — best-effort skeleton; skips cleanly when
~/.geminiis absent and never mis-attributes history when the (unverified) layout isn't matched.
- Claude — selects the encoded project folder(s) by dir prefix, parses the
JSONL, and re-confirms each session's recorded
- Co-locates agent memories (Claude per-project
memory/, Codex/OpenCode global knowledge). - Writes full transcripts to
./.tailings/sessions/<tool>/, drops a self-ignoring./.tailings/.gitignore, and splices a tight digest into./AGENTS.mdbetween<!-- tailings:start -->markers (never clobbering hand-written content).
Layout
packages/
core/ Domain, TailingsPaths, Period, Render (shared transcript→markdown)
adapter-claude/ ClaudeArchive (+ normalizer tests)
adapter-codex/ CodexArchive + CodexNormalize (+ index, tests)
adapter-opencode/ OpenCodeArchive + OpenCodeNormalize (+ tests)
adapter-gemini/ GeminiArchive (best-effort)
cli/ App (layers + run*), Commands (Effect CLI), Pull (orchestrator),
AgentsMd (digest+merge), Main (#! entry → Command.run)
core/Terminal.ts ANSI styling + table renderer (house style, ported from ai-hr)pnpm test runs tsc --noEmit + Vitest (21 tests over the normalizers and the
AGENTS.md merge). Everything stays in Effect: tagged errors, Layer DI per source,
Schema for on-disk record shapes — mirroring the tvagent idioms.
