tailings

v0.0.1

Published

a day ago

Pull a directory's coding-agent session history (Claude Code, Codex, OpenCode, Gemini) into that directory so the next agent is instantly caught up.


waryawayne

tailings claude-code codex opencode gemini coding-agents cli sessions memory local-first effect node

Tailings

The tail-end of the value you already received. In mining, tailings are the material left over after the valuable ore is extracted — except here the "waste" (your pruned agent transcripts and the memories agents wrote) is the gold. The labs throw it away on a clock; we pan it back out and keep it.

Tailings pulls a directory's coding-agent history into that directory. Run it in a folder and it gathers the sessions + memories that Claude Code, Codex, OpenCode, and Gemini produced for that folder, and drops them in as plain text — so when you switch agents (or hand off to a weaker one), the next agent reads them and is instantly caught up on the prior work and decisions. It grabs the chats before the tools prune/delete them, because once they're gone they're gone.

This file is the brief for whoever builds it next. Nothing is scaffolded yet on purpose. Read this, then read tvagent, then build.

Scope — read this twice

Per-directory, not global. tailings pull --dir . returns only the sessions whose working directory is this one (and its subdirs). It must never dump your entire chat history into a folder. The directory filter is the whole point.
Just text. Transcripts are small text — KB to a few MB each. Don't treat size as a problem; agents grep node_modules all day. No need to summarize for size reasons (a distill pass is optional polish, not a requirement).
The consumer is the next agent, in any tool — even a weak one. The memories these tools write are already well-structured for an agent to read. Tailings' job is just to co-locate them in the working directory so the next agent finds them without being told where to look. A weaker model + this context beats a stronger model starting cold.
Local-first, no selling, no outcome prediction. This is purely about not losing your own context. Keep it boring.

Why this exists

Claude Code prunes session transcripts after ~30 days. Codex and OpenCode have their own retention policies and large, churning local stores.
Agent-written memories (Claude's per-project memory/, Codex's git-backed ~/.codex/memories/) are some of the highest-signal artifacts an agent produces — and are exactly the kind of thing a lab is likely to stop exposing locally over time.
Once it's gone, it's gone. The only durable hedge is to copy the on-disk artifacts onto our own storage before the source deletes them, then make that archive usable by the next agent.

The one thing to internalize first: `tvagent` already did the hard part

../tvagent is a working Effect + TypeScript monorepo that locates and parses the local stores of three of these tools (Claude Code, Codex, and OpenCode). It does this purely to extract token usage and then discards the conversation content.

Tailings is the inverse: keep the content (and memories), discard the token math. So ~80% of the locating/parsing code is already written next door — copy the structure, change what you keep.

Files to study and lift from (../tvagent):

| Concern | File | What to reuse |
|---|---|---|
| Where each tool stores data | packages/core/src/LocalPaths.ts | The LocalPaths Effect service, which already resolves all three roots. |
| Claude transcript parsing | packages/adapter-claude/src/ClaudeLocal.ts | discoverClaudeFiles, the JSONL line decode, the cwd field handling. |
| Codex rollout parsing | packages/adapter-codex/src/CodexLocal.ts | discoverCodexFiles, session_meta/turn_context decoding for cwd. |
| OpenCode SQLite reads | packages/adapter-opencode/src/OpenCodeLocal.ts | node:sqlite DatabaseSync read-only pattern via Effect.acquireRelease + Effect.scoped. |
| Domain modeling | packages/core/src/Domain.ts | Tagged errors (SourceReadError, UsageDecodeError), Schema decoding idioms. |

The adapters in tvagent only build UsageEvents. For Tailings, the equivalent output is a SessionArchive / MemoryArchive that preserves the actual message content.

Where the data actually lives (verified on this machine)

| Tool | Sessions (full transcripts) | Memories / agent knowledge | Keyed by dir? |
|---|---|---|---|
| Claude Code | ~/.claude/projects/<encoded-cwd>/<uuid>.jsonl | ~/.claude/projects/<encoded-cwd>/memory/*.md + memory/MEMORY.md; global ~/.claude/CLAUDE.md | Yes, natively |
| Codex | ~/.codex/sessions/YYYY/MM/DD/rollout-*.jsonl | ~/.codex/memories/ (git repo: MEMORY.md, memory_summary.md, raw_memories.md, rollout_summaries/) + memories_1.sqlite | No (date-partitioned) |
| OpenCode | ~/.local/share/opencode/opencode.db → session / message / part tables | ~/.config/opencode/ (AGENTS.md, knowledge/); per-project .opencode/ | Yes (session.directory column) |
| Gemini ⚠️ | Not installed on this machine; verify before building. Expected: ~/.gemini/tmp/<project-hash>/ (logs + /chat save checkpoints) | GEMINI.md (hierarchical, like CLAUDE.md); ~/.gemini/ | Likely (per-project-hash dirs); confirm |

tvagent covers Claude/Codex/OpenCode but not Gemini — Gemini is net-new for Tailings. Confirm its real on-disk layout the first time it's present before writing the adapter; treat the paths above as a hypothesis.

Directory keying — the crux of "run it in any dir"

Tailings must work as tailings pull --dir . and return only that directory's history. The difficulty differs per tool:

Claude — trivial. The cwd is encoded into the folder name by replacing both / and . with -. So /Users/me/Desktop/projects → -Users-me-Desktop-projects, and /Users/me/.config/x → -Users-me--config-x (note the double dash from the dot). Compute the encoded name from process.cwd() and read that one folder. Every JSONL record also carries a cwd field, so you can filter exactly (covers subdirs).
OpenCode — trivial. SELECT … FROM session WHERE directory = ?, then join message / part on session_id. Content is JSON in message.data and part.data.
Codex — the hard one. Sessions are date-partitioned, and ~/.codex/session_index.jsonl only has id / thread_name / updated_at (no cwd). You must scan rollout-*.jsonl, read cwd from the session_meta (and turn_context) records, and filter. Build and maintain our own cwd → [session files] index (e.g. a small SQLite at ~/.local/share/tailings/index.db), updated incrementally on each run.
Gemini — unknown until verified. If sessions live under a per-project-hash folder (~/.gemini/tmp/<hash>/), recover the mapping the same way: read the cwd recorded inside the logs/checkpoints, or hash the cwd if the scheme is derivable. Don't guess — inspect a real session first.
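The keying rules above can be sketched as three small pure functions. This is a rough illustration, not the tvagent code: the function names are hypothetical, paths are assumed to be absolute, normalized POSIX paths, and the Codex record shape (a `type` field plus a `payload` object carrying `cwd`) is an assumption to check against a real rollout file.

```typescript
// Encode a cwd the way the text describes Claude's project folder names:
// every "/" and every "." becomes "-".
function encodeClaudeProjectDir(cwd: string): string {
  return cwd.replace(/[/.]/g, "-");
}

// True when `candidate` is `root` itself or sits somewhere beneath it.
// Assumes both are absolute, normalized POSIX paths.
function isInTree(root: string, candidate: string): boolean {
  const base = root.endsWith("/") ? root.slice(0, -1) : root;
  return candidate === base || candidate.startsWith(base + "/");
}

const isObject = (v: unknown): v is Record<string, unknown> =>
  typeof v === "object" && v !== null;

// Recover the working directory from the lines of one Codex rollout file.
// ASSUMED shape: {"type":"session_meta"|"turn_context","payload":{"cwd":"/abs"}}.
function cwdFromCodexRollout(lines: ReadonlyArray<string>): string | undefined {
  for (const line of lines) {
    if (line.trim() === "") continue;
    let record: unknown;
    try {
      record = JSON.parse(line);
    } catch {
      continue; // skip truncated or corrupt lines instead of failing the scan
    }
    if (!isObject(record)) continue;
    if (record.type !== "session_meta" && record.type !== "turn_context") continue;
    const payload = record.payload;
    if (!isObject(payload)) continue;
    const cwd = payload.cwd;
    if (typeof cwd === "string") return cwd;
  }
  return undefined;
}
```

In practice the Codex helper would be fed only the first few lines of each rollout, read as a stream, so that large files are never loaded whole.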

⚠️ Cautions for the builder:

These transcripts contain secrets/keys/tokens that have been pasted into chats. Scrub before anything leaves the machine. Add a redaction pass.
The Codex store is large (logs_2.sqlite was ~1.6 GB here). Read selectively, stream, never load whole stores into memory.
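A redaction pass could start as a simple pattern sweep over the rendered text. The sketch below is a minimal illustration: the function name and the three patterns are assumptions chosen as examples, and a real pass would need a much broader rule set.

```typescript
// Illustrative patterns only (assumed, not exhaustive).
const SECRET_PATTERNS: ReadonlyArray<RegExp> = [
  /\bsk-[A-Za-z0-9_-]{20,}\b/g, // API keys that use an "sk-" prefix
  /\bgh[pousr]_[A-Za-z0-9]{36,}\b/g, // GitHub-style tokens
  /\bAKIA[0-9A-Z]{16}\b/g, // AWS access key IDs
];

// Replace every match of every pattern with a fixed placeholder.
function redactSecrets(text: string): string {
  return SECRET_PATTERNS.reduce(
    (acc, pattern) => acc.replace(pattern, "[REDACTED]"),
    text,
  );
}
```

Running this on each transcript line as it is streamed keeps the pass compatible with the "never load whole stores into memory" caution.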

How to use the SDKs + Effect

The project already depends on all three SDKs and Effect (mirror the versions pinned in ../tvagent/package.json):

@anthropic-ai/claude-agent-sdk ^0.3.15
@openai/codex-sdk ^0.135.0
@opencode-ai/sdk ^1.15.12
effect 4.0.0-beta.74

Be precise about what the SDKs are and aren't for. The SDKs drive agents; they do not hand you back history the tool has already pruned. So:

Extraction layer → filesystem / SQLite, NOT the SDKs

Saving chats from deletion = copying the on-disk artifacts before the source deletes them. That's direct FS + node:sqlite reads — exactly the tvagent pattern. This is the core of Tailings and does not need any SDK.

One Effect service per source (ClaudeArchive, CodexArchive, OpenCodeArchive), each exposing archive() (copy everything to our store) and pull(dir, period) (return one directory's content).
Reuse the tvagent idioms: Context.Service for services, Layer.effect for wiring, Effect.fn(...) for traced operations, Schema for decoding each record shape, tagged errors for failures, and Effect.acquireRelease + Effect.scoped for SQLite handles (open { readOnly: true }).
A TailingsPaths service (copy LocalPaths.ts) resolves source roots and our own archive root (~/.local/share/tailings/).

Where the SDKs do earn their place

@anthropic-ai/claude-agent-sdk — optional distillation. Not for size — the raw text is fine to keep and read. It's only worth a summarization pass if you want a tight AGENTS.md index (decisions, files touched, gotchas, open threads) sitting above the full transcripts in ./.tailings/. Use a cheap model (e.g. claude-haiku-4-5-20251001); it's volume work. Skip it for v1.
@opencode-ai/sdk — cleaner read path (optional). When an OpenCode server is running, its client (session.list(), message APIs) is a more stable way to read sessions than parsing the raw DB. Keep the raw-SQLite reader as the offline fallback (no server required), the way tvagent does.
@openai/codex-sdk — rehydration (later). Useful for programmatically resuming/seeding Codex threads from an archive, not for reading old rollouts.

Effect everywhere

Keep the whole pipeline in Effect: typed errors instead of throws, Layers for dependency injection (so adapters are swappable and testable with in-memory fixtures like tvagent's *.test.ts files), and Schema as the single source of truth for every on-disk record shape. Mirror tvagent's test style — feed sample JSONL lines / rows through the normalizers and assert on the decoded output.
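As a rough picture of what such a normalizer test exercises, here is a plain-TypeScript sketch of a Claude line normalizer. It deliberately leaves out Effect and Schema to stay self-contained. The names are hypothetical, and the record shape (`type`, `cwd`, and `message.content` as either a string or a list of `{type: "text", text}` blocks) is an assumption to verify against real transcripts.

```typescript
interface NormalizedMessage {
  readonly role: "user" | "assistant";
  readonly cwd: string | undefined;
  readonly text: string;
}

const isObject = (v: unknown): v is Record<string, unknown> =>
  typeof v === "object" && v !== null;

// Flatten message content to plain text, keeping only text blocks.
function textOf(content: unknown): string {
  if (typeof content === "string") return content;
  if (!Array.isArray(content)) return "";
  const parts: string[] = [];
  for (const block of content as unknown[]) {
    if (!isObject(block) || block.type !== "text") continue;
    const text = block.text;
    if (typeof text === "string") parts.push(text);
  }
  return parts.join("\n");
}

// Decode one JSONL line; return undefined for anything that is not a
// user/assistant message with text (summaries, tool calls, corrupt lines).
function normalizeClaudeLine(line: string): NormalizedMessage | undefined {
  let record: unknown;
  try {
    record = JSON.parse(line);
  } catch {
    return undefined;
  }
  if (!isObject(record)) return undefined;
  const role =
    record.type === "user" ? "user" : record.type === "assistant" ? "assistant" : undefined;
  if (role === undefined) return undefined;
  const message = record.message;
  const text = isObject(message) ? textOf(message.content) : "";
  if (text === "") return undefined;
  const cwd = typeof record.cwd === "string" ? record.cwd : undefined;
  return { role, cwd, text };
}
```

A test then feeds a handful of sample lines through `normalizeClaudeLine` and asserts on the decoded role, cwd, and text.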

How the next agent consumes the archive (the real unlock)

An npx/CLI process cannot inject into a running agent's context window. It doesn't need to, because each of the four tools auto-reads an instructions file on startup:

Claude Code reads CLAUDE.md (and AGENTS.md)
Codex reads AGENTS.md
OpenCode reads AGENTS.md
Gemini reads GEMINI.md (and AGENTS.md)

So tailings pull should write a cross-tool digest into the target directory's AGENTS.md (or a ./.tailings/CONTEXT.md referenced from it), and keep the fuller per-session text alongside in ./.tailings/. Then every next agent, in any tool, picks it up natively — no piping, no MCP required. A weaker model with this folder beats a stronger model starting cold. That's the whole payoff.

Keep AGENTS.md itself tight (an index + the high-signal bits: decisions, touched files, gotchas, and the agent-written memories) and let the raw per-session transcripts live in ./.tailings/ for an agent to open on demand — they're just text; size isn't the constraint.

Secondary delivery modes (nice-to-have, build later): stdout for piping, an MCP server (tailings mcp) for on-demand querying, a slash-command wrapper.

Proposed CLI surface

# THE command. Gather ONLY this directory's sessions + memories, from every
# agent tool that touched it, into the directory for the next agent to read.
tailings pull --dir .                                 # → ./AGENTS.md (+ ./.tailings/), this dir ONLY
tailings pull --dir . --since 30d                     # bound how far back
tailings pull --dir . --out -                         # …to stdout instead of files
tailings pull --tools claude,codex,opencode,gemini    # restrict sources

# Secondary: preservation. Copy raw sessions to our own store BEFORE the tools
# prune them. This store can be global, but `pull` ALWAYS re-filters by directory —
# nothing from another dir ever lands in your folder.
tailings sync                                         # beat-the-prune-clock + refresh the cwd index
tailings mcp                                          # expose the archive over MCP (later)

pull is the product. sync exists only so the data survives long enough to be pulled — run it on a schedule (launchd / cron) so the store stays ahead of each tool's pruning clock. (tvagent has scripts/print-launchd-plist.mjs worth copying for the macOS scheduling story.)
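The `--since 30d` bound has to become a cutoff timestamp before any filtering happens. A minimal sketch of that conversion follows. Only the `30d` form appears above, so the `h` and `w` units and the function name are assumptions.

```typescript
// Milliseconds per supported unit. Only "d" is attested in the CLI examples;
// "h" and "w" are assumed extensions.
const UNIT_MS: Record<string, number> = {
  h: 3_600_000,
  d: 86_400_000,
  w: 604_800_000,
};

// Turn a value such as "30d" into the earliest timestamp to include.
// Returns undefined for anything that does not look like <number><unit>.
function parseSince(value: string, now: Date = new Date()): Date | undefined {
  const match = /^(\d+)([hdw])$/.exec(value.trim());
  if (match === null) return undefined;
  const amount = Number(match[1]);
  const unitMs = UNIT_MS[match[2] ?? ""];
  if (unitMs === undefined) return undefined;
  return new Date(now.getTime() - amount * unitMs);
}
```

Sessions whose last-modified time falls before the returned date would be skipped; an undefined result is best surfaced as a usage error rather than silently ignored.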

Suggested build order

Port LocalPaths → TailingsPaths (add the archive root).
Claude adapter first (easiest: native dir keying, plain JSONL). Get pull --dir . writing this dir's raw transcripts to ./.tailings/ end-to-end.
OpenCode adapter (SQLite, WHERE directory = ?).
Codex adapter + the cwd → sessions index it requires.
Gemini adapter — verify the on-disk layout first (not present here yet).
Memories: co-locate Claude memory/, Codex ~/.codex/memories/, OpenCode knowledge/, and Gemini GEMINI.md for the matching dir.
Emit AGENTS.md as the default pull output (index + high-signal bits).
sync/archive writing copies into ~/.local/share/tailings/ so data survives the prune clock; schedule it.
(optional) Distillation pass via the Claude Agent SDK (raw → brief) for tighter AGENTS.md — polish, not a requirement. MCP server last.

Open questions to resolve while building

Archive format: keep both the raw JSONL/SQLite copies (verbatim) and the derived digests? (Recommend yes: never lose fidelity; derive on top.)
Dedup across re-runs (sessions grow; don't re-archive unchanged files — use mtime/size like tvagent's incremental OpenCode reader).
Redaction: not needed for the local-only flow, but since pull writes into the working directory, make sure ./.tailings/ is .gitignored by default so pasted secrets in old transcripts don't get committed.
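The mtime/size dedup idea can be sketched as a fingerprint comparison against whatever was recorded on the previous run. The names here are hypothetical and the in-memory `Map` stands in for a persisted index. The caller is assumed to supply stat values, for example from `fs.statSync`.

```typescript
interface FileStat {
  readonly mtimeMs: number;
  readonly size: number;
}

// A cheap change signal: a growing session file changes size, mtime, or both.
const fingerprint = (stat: FileStat): string =>
  `${Math.trunc(stat.mtimeMs)}:${stat.size}`;

// Return the files whose fingerprint differs from the recorded one, and
// record the new fingerprint in `seen` so the next run skips them.
function changedSince(
  seen: Map<string, string>,
  current: ReadonlyMap<string, FileStat>,
): string[] {
  const changed: string[] = [];
  current.forEach((stat, file) => {
    const fp = fingerprint(stat);
    if (seen.get(file) !== fp) {
      changed.push(file);
      seen.set(file, fp);
    }
  });
  return changed;
}
```

Only the returned files need re-archiving; everything else is untouched since the last run.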

Built so far — `pull` (v1)

The pull command is implemented and verified end-to-end on this machine against real Claude Code, Codex, and OpenCode stores. Deliberately not built yet (the "sink"/preservation side and polish the brief marks optional): tailings sync, the MCP server, and the distillation pass.

The CLI is built on Effect's CLI framework (effect/unstable/cli) — real --help/--version/shell-completions, with a styled terminal summary.

pnpm install
pnpm build                       # bundles the CLI to dist/cli.js (shebang'd)
node dist/cli.js pull --dir .    # → ./AGENTS.md (+ ./.tailings/), this dir ONLY
node dist/cli.js doctor          # which agent stores are present on this machine
# or, after `pnpm link --global` (bin: tailings):
tailings pull --dir . --since 30d
tailings pull --tools claude,codex --out -   # digest to stdout, no files
tailings                          # interactive: confirm, then pull the current dir

What a run does:

Resolves each tool's store via TailingsPaths (ported from tvagent's LocalPaths) and gathers only this directory's sessions:
- Claude — selects the encoded project folder(s) by dir prefix, parses the JSONL, and re-confirms each session's recorded cwd is in-tree.
- OpenCode — node:sqlite read-only; session rows filtered by directory, joined to message/part and rendered to plain text.
- Codex — scans rollout-*.jsonl, recovering each file's cwd from session_meta/turn_context, cached in our own incremental index at ~/.local/share/tailings/index.db (keyed by mtime+size). Filters to in-tree.
- Gemini — best-effort skeleton; skips cleanly when ~/.gemini is absent and never mis-attributes history when the (unverified) layout isn't matched.
Co-locates agent memories (Claude per-project memory/, Codex/OpenCode global knowledge).
Writes full transcripts to ./.tailings/sessions/<tool>/, drops a self-ignoring ./.tailings/.gitignore, and splices a tight digest into ./AGENTS.md between a pair of markers (never clobbering hand-written content).
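The marker-based splice in the last step could look roughly like the sketch below. The marker strings and the function name are hypothetical, since the real markers are not shown here. The point is only that re-runs replace the managed block and leave everything else alone.

```typescript
// HYPOTHETICAL marker strings; substitute whatever the CLI actually writes.
const START = "<!-- tailings:start -->";
const END = "<!-- tailings:end -->";

// Replace the managed block if a complete marker pair exists; otherwise
// append a new block after the existing hand-written content.
function spliceDigest(
  existing: string,
  digest: string,
  start: string = START,
  end: string = END,
): string {
  const block = `${start}\n${digest.trim()}\n${end}`;
  const from = existing.indexOf(start);
  const to = from === -1 ? -1 : existing.indexOf(end, from + start.length);
  if (from === -1 || to === -1) {
    const sep =
      existing === "" || existing.endsWith("\n\n")
        ? ""
        : existing.endsWith("\n")
          ? "\n"
          : "\n\n";
    return `${existing}${sep}${block}\n`;
  }
  return existing.slice(0, from) + block + existing.slice(to + end.length);
}
```

Because the second run finds the markers written by the first, repeated pulls keep a single managed block instead of stacking digests.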

Layout

packages/
  core/          Domain, TailingsPaths, Period, Render (shared transcript→markdown)
  adapter-claude/    ClaudeArchive   (+ normalizer tests)
  adapter-codex/     CodexArchive + CodexNormalize   (+ index, tests)
  adapter-opencode/  OpenCodeArchive + OpenCodeNormalize   (+ tests)
  adapter-gemini/    GeminiArchive   (best-effort)
  cli/           App (layers + run*), Commands (Effect CLI), Pull (orchestrator),
                 AgentsMd (digest+merge), Main (#! entry → Command.run)
  core/Terminal.ts   ANSI styling + table renderer (house style, ported from ai-hr)

pnpm test runs tsc --noEmit + Vitest (21 tests over the normalizers and the AGENTS.md merge). Everything stays in Effect: tagged errors, Layer DI per source, Schema for on-disk record shapes — mirroring the tvagent idioms.