memto-cli
v0.5.0
Wake up any past AI coding-agent session and ask it a question. Works with Claude Code, Codex, Hermes, OpenClaw.
🎬 Who this is for
If you run one AI coding tab at a time, you don't need this.
If you run five — résumé in one, startup in another, debugging a customer issue in a third, deep research in a fourth, taxes in a fifth — then today the answer to "where is the LaTeX file for my résumé?" lives in exactly one agent's head.
The other four have no idea. You, the human, are the only thing connecting them, and your short-term memory is the bottleneck.
memto is built for that scenario: multiple AI coding agents running in parallel across unrelated projects. Not for enterprise teams locked to a single tool. Not for deep single-codebase work. For the super-individual with five tabs open.
🧭 Three axioms
Product decisions come from these three. None of them are negotiable.
- The memory IS the session. No extraction, no embeddings, no "facts in a vector DB". The raw transcript file your agent CLI already wrote — that's the memory. We just make it queryable.
- Never mutate the past. Every ask forks a non-destructive copy. Your original session files are never touched. Rolling back is always a no-op because nothing changed.
- Agent-native, zero ops. One bundled CLI, --json on every command, a bundled skill that teaches your agents when to call it. No daemon. No database. No cloud. npx memto-cli and go.
✨ What memto gives you
| | |
|---|---|
| 🔗 Cross-runtime, no extraction | One unified interface for Claude Code, Codex, Hermes, and OpenClaw. Every adapter reads native files directly; no conversion step, no ingestion pipeline. This is the only thing in the market that does this. |
| 🪞 Fork-safe by design | Every ask copies the session, asks on the copy, deletes the copy. Original files untouched. You can safely query a 3-month-old session without fear of polluting it. |
| ⚡ Two-tier access | memto messages reads the transcript directly. memto ask forks and revives the original agent. Pick the one that fits the question — agents learn to read first, synthesize second. |
| 🤖 Agent-native output | --json everywhere. Ships a markdown skill so any modern agent CLI picks up the usage pattern automatically. No MCP server needed. |
| 🧪 No DB, no daemon, no cloud | Contrast with Mem0 / Letta / Zep / chum-mem — all require ingestion pipelines and external stores. memto ships a single 60 KB JS file. |
| ⏱ Auto-scaled timeouts | 120s floor + 1s per MB of transcript. Large sessions (60 MB+) don't silently die from premature kills. |
| 🕵️ Prompt wrapper filtering | Runtime-specific noise (<environment_context>, Sender (untrusted metadata):, slash-command blobs, skill-injection headers) gets stripped so first_user_prompt is what the human actually typed. |
| 🧪 61 tests, 4 runtimes verified | Every adapter has synthetic-fixture tests. All four runtimes end-to-end verified against real local stores. |
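The auto-scaled timeout rule in the table can be written out as a small calculation. This is only a sketch: the function name `askTimeoutSeconds` and the constants are hypothetical, and it assumes "120s floor + 1s per MB" means the two terms are added, with partial megabytes rounded up. memto's actual implementation may differ.

```typescript
// Hypothetical helper, not memto's real API.
// Assumption: timeout = 120s + 1s per MB of transcript (additive reading).
const FLOOR_SECONDS = 120;
const SECONDS_PER_MB = 1;
const BYTES_PER_MB = 1024 * 1024;

function askTimeoutSeconds(transcriptBytes: number): number {
  const megabytes = transcriptBytes / BYTES_PER_MB;
  // Round partial megabytes up so a larger file never gets a shorter timeout.
  return FLOOR_SECONDS + Math.ceil(megabytes * SECONDS_PER_MB);
}

console.log(askTimeoutSeconds(60 * BYTES_PER_MB)); // 180
```

Under this reading, an empty transcript still gets the full 120 seconds, and a 60 MB session gets three minutes.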
🏃 Install
# one-shot, no install
npx memto-cli list
# global install
npm i -g memto-cli && memto --help
Teach your agents to call memto automatically — drop the bundled skill into your agent's skills directory:
curl -fsSL https://raw.githubusercontent.com/shizhigu/memto/main/skills/memto.md \
  > ~/.claude/skills/memto.md # adjust path for your agent
Once dropped in, your agent automatically learns when to use memto messages vs memto ask.
🔍 Five commands
memto list — see every past session, merged
memto list --limit 10
[claude-code] 2026-04-10 refactor-billing-service
cwd: ~/Projects/billing
first: migrate Stripe webhooks to async handlers, preserve idempotency…
model: claude-opus-4-6
[codex ] 2026-04-09 fix-memory-leak-in-parser
cwd: ~/Projects/lsp-server
first: investigate heap growth during long document parses
[hermes ] 2026-04-08 onboarding-email-sequence
first: draft a 5-email welcome series for new B2B signups
[openclaw ] 2026-04-05 deploy-staging
first: verify the CD pipeline is green before Tuesday's release cut
Every runtime, one merged view. Pipe to jq for filtering (sessions without a cwd are treated as an empty string so jq does not error on null):
memto list --json --limit 30 | jq '.[] | select((.cwd // "") | test("billing"))'
memto grep — find the session that holds the answer
memto grep "retry.*policy" -i --role user --json
memto grep "stripe.*webhook" --runtime claude-code --since 2026-03-01 --json
Scans every session's transcript in parallel (default: all four runtimes, most-recent-first, up to 200 per runtime). Returns hits grouped by session, each with role + timestamp + snippet. 2–20 seconds for 170+ sessions.
This is the right first command for any "find the thing" question — usually you don't know up front which session holds the answer.
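To make the "hits grouped by session, each with role + timestamp + snippet" shape concrete, here is a minimal in-memory sketch. All type and function names (`Session`, `Message`, `grepSessions`, `snippetRadius`) are hypothetical and are not memto's real types or JSON output. It scans sequentially over already-loaded messages and records only the first match per message, whereas the CLI scans files in parallel.

```typescript
// Hypothetical shapes for illustration; memto's real types and output may differ.
type Role = 'user' | 'assistant';

interface Message {
  role: Role;
  timestamp: string; // ISO 8601
  text: string;
}

interface Session {
  id: string;
  runtime: string;
  messages: Message[];
}

interface Hit {
  role: Role;
  timestamp: string;
  snippet: string;
}

interface SessionHits {
  sessionId: string;
  runtime: string;
  hits: Hit[];
}

function grepSessions(
  sessions: Session[],
  pattern: RegExp,
  role?: Role,
  snippetRadius = 40,
): SessionHits[] {
  // Drop the g/y flags so exec() always searches from the start of each message.
  const matcher = new RegExp(pattern.source, pattern.flags.replace(/[gy]/g, ''));
  const results: SessionHits[] = [];
  for (const session of sessions) {
    const hits: Hit[] = [];
    for (const message of session.messages) {
      if (role !== undefined && message.role !== role) continue;
      const match = matcher.exec(message.text);
      if (match === null) continue;
      // Keep a little context on both sides of the first match.
      const start = Math.max(0, match.index - snippetRadius);
      const end = match.index + match[0].length + snippetRadius;
      hits.push({
        role: message.role,
        timestamp: message.timestamp,
        snippet: message.text.slice(start, end),
      });
    }
    // Sessions with no hits are omitted from the result.
    if (hits.length > 0) {
      results.push({ sessionId: session.id, runtime: session.runtime, hits });
    }
  }
  return results;
}
```

Grouping by session rather than returning a flat list keeps the follow-up step simple: pick a session id from the result and pass it to memto messages or memto ask.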
memto messages — read the transcript directly
memto messages --id <session_id> --last 10 --json
memto messages --id <session_id> --grep "retry" --role user --json
Sub-second, zero tokens. Use this for content lookup — file paths, error messages, decisions stated verbatim. 80% of memory queries can be answered here without ever forking.
memto ask — fork and revive the original agent
memto ask --id <session_id> --question "what did we decide about retry logic?"
━━━ [claude-code] refactor-billing-service ━━━
We settled on exponential backoff keyed by (customer_id, event_type),
capped at 24h, with idempotency keys persisted to Redis for 7 days.
Use when raw content isn't enough — when you need the original agent's synthesis, not just its transcript. Fork is non-destructive; originals are never touched.
memto reconstruct — ask a window, not the whole session
# what did past-me think during messages 20..40?
memto reconstruct --id <session_id> --from-msg 20 --upto-msg 40 \
--question "what was my position on the retry debate?"
# what did I believe before I learned X?
memto reconstruct --id <session_id> --upto 2026-03-15T10:30:00Z \
  --question "what's the leading approach?"
Forks the session, truncates to the window [from, upto], then asks. The
agent answers from that slice only — no hindsight from later messages, no
noise from unrelated earlier episodes. This is the closest thing memto has
to cognitive science's "reconstructive episodic memory": you're not
replaying the whole session, you're reconstructing what the agent could
have known at that moment.
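The truncation step can be pictured as a filter over the forked transcript. The sketch below is an illustration under stated assumptions, not memto's code: `TimedMessage`, `ReconstructWindow`, and `truncateToWindow` are hypothetical names, message indices are assumed to be zero-based, and both ends of the window are assumed inclusive, following the [from, upto] notation above.

```typescript
// Hypothetical shapes; field names mirror the CLI flags but are assumptions.
interface TimedMessage {
  timestamp: string; // ISO 8601
  text: string;
}

interface ReconstructWindow {
  fromMsg?: number; // assumed zero-based, inclusive
  uptoMsg?: number; // assumed zero-based, inclusive
  upto?: string;    // ISO 8601 cutoff, inclusive
}

function truncateToWindow<T extends TimedMessage>(
  messages: T[],
  window: ReconstructWindow,
): T[] {
  const first = window.fromMsg ?? 0;
  const last = window.uptoMsg ?? messages.length - 1;
  const cutoff =
    window.upto !== undefined ? Date.parse(window.upto) : Number.POSITIVE_INFINITY;
  // Keep only messages inside the index range and at or before the cutoff.
  return messages.filter(
    (message, index) =>
      index >= first && index <= last && Date.parse(message.timestamp) <= cutoff,
  );
}
```

Because the agent is only ever shown the filtered slice, anything said after the cutoff cannot leak into its answer.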
🧩 Architecture
you / your agent
│
│ memto list · messages · ask
▼
┌────────────────────────────────────┐
│ memto — one CLI, npx-able │
└──────────────┬─────────────────────┘
│ NormalizedSession / NormalizedMessage
▼
┌────────────────────────────────────┐
│ @memto/session-core │
│ claude-code · codex · hermes · openclaw
└──┬──────────┬──────────┬───────┬───┘
▼ ▼ ▼ ▼
~/.claude ~/.codex ~/.hermes ~/.openclaw
Four native stores, one normalized shape. Each adapter reads its runtime's files directly — no ingestion, no duplicate store. SQLite for hermes uses bun:sqlite under bun and better-sqlite3 under node (picked at runtime).
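One way to picture "four native stores, one normalized shape" is an adapter interface per runtime plus a merge step. The sketch below is a guess at the general pattern only: the fields on `NormalizedSession` (in particular `updatedAt`), the `SessionAdapter` interface, and `mergeSessions` are assumptions for illustration and are not taken from @memto/session-core.

```typescript
// Hypothetical types; the real NormalizedSession in @memto/session-core may differ.
type Runtime = 'claude-code' | 'codex' | 'hermes' | 'openclaw';

interface NormalizedSession {
  id: string;
  runtime: Runtime;
  title?: string;
  cwd?: string;
  updatedAt: string; // ISO 8601, assumed field used for ordering
}

// Each adapter would read its runtime's native files and emit the shared shape.
interface SessionAdapter {
  runtime: Runtime;
  listSessions(limit: number): Promise<NormalizedSession[]>;
}

// Merge per-runtime lists into one view, most recent first.
function mergeSessions(
  perRuntime: NormalizedSession[][],
  limit?: number,
): NormalizedSession[] {
  const merged: NormalizedSession[] = [];
  for (const sessions of perRuntime) {
    for (const session of sessions) merged.push(session);
  }
  merged.sort((a, b) => Date.parse(b.updatedAt) - Date.parse(a.updatedAt));
  return limit === undefined ? merged : merged.slice(0, limit);
}
```

Keeping the merge step free of any runtime-specific logic is what lets a new adapter be added without touching the code that consumes the merged list.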
📚 Use it as a library
import { listAllSessions, getMessages, ask } from '@memto/session-core';

// 1. enumerate
const sessions = await listAllSessions({
  limitPerRuntime: 20,
  sampling: { strategy: 'head-and-tail', head: 2, tail: 2 },
});

// 2. read transcript directly
const resumeSession = sessions.find(s => /résumé/i.test(s.title ?? ''));
if (resumeSession) {
  const msgs = await getMessages(resumeSession.runtime, resumeSession.id);
  const hit = msgs.find(m => /\.tex/.test(m.text));
  if (hit) console.log(hit.text);
}

// 3. synthesize — wake up the original agent
if (resumeSession) {
  const { answer, timed_out } = await ask(resumeSession, 'where is the LaTeX file?');
  if (!timed_out) console.log(answer);
}
🧠 The mental model
Think of memory not as a database but as a fleet of dormant coworkers.
Each past session is one coworker. They kept detailed notes while they were working — the full transcript, every file they touched, every decision they made. They went home at the end of the day.
When you want to know something, you don't try to rebuild their knowledge from scratch by reading their notes. Either:
- You read their notes directly — that's memto messages. Fast, free, but you have to scan.
- You tap one on the shoulder — that's memto ask. "Hey, quick question." They wake up, answer from the full context already in their head, then go back to sleep.
The "tap on the shoulder" is called fork-resume — we clone their session state just enough to run the question, get the answer, and discard the clone. The original session file is never modified.
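The copy, ask, discard cycle described above can be sketched with Node's file APIs. This is a simplified illustration, not memto's implementation: `withFork` is a hypothetical helper, it assumes a session is a single file, and the `query` callback stands in for whatever the real tool does to resume the agent on the copy.

```typescript
import * as fs from 'node:fs';
import * as os from 'node:os';
import * as path from 'node:path';

// Hypothetical helper: runs `query` against a throwaway copy of a session file.
// Assumption: one session == one file; real runtimes may store more state.
function withFork<T>(originalPath: string, query: (forkPath: string) => T): T {
  const forkDir = fs.mkdtempSync(path.join(os.tmpdir(), 'memto-fork-'));
  const forkPath = path.join(forkDir, path.basename(originalPath));
  fs.copyFileSync(originalPath, forkPath);
  try {
    // Everything the query appends or rewrites lands in the copy only.
    return query(forkPath);
  } finally {
    // Discard the clone whether the query succeeded or threw.
    fs.rmSync(forkDir, { recursive: true, force: true });
  }
}
```

Putting the cleanup in `finally` means a failed or timed-out question still leaves no stray copies behind, and the original file is only ever read.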
🎯 Why not Mem0 / Letta / Zep?
| | memto | Mem0 / Zep | Letta |
|---|---|---|---|
| Unit of memory | whole past session, queryable live | extracted facts in a vector DB | hierarchical summary tiers in one agent |
| Cross-runtime | ✅ 4 runtimes, 1 interface | ❌ app-specific | ❌ per-agent |
| Non-destructive read | ✅ fork-safe | n/a | ✅ internal only |
| External dependencies | 0 — just node | ChromaDB etc. | Postgres / SQLite |
| First-time cost | none — indexes what your CLIs already wrote | re-ETL every conversation | re-architect your agent |
| Best for | the super-individual running 5+ AI tabs | single-app long-term memory | single-agent role-played memory |
The fundamental divide: everything on the right takes your agent conversations, extracts structured claims from them, and stores those claims elsewhere. memto doesn't extract. The raw session IS the memory — you just wake it up and ask.
📦 What's in the box
memto/
├── packages/
│ ├── cli/ ← the `memto` binary
│ └── session-core/ ← universal adapter + fork/ask orchestration
│ └── src/
│ ├── types.ts
│ ├── jsonl.ts ← streaming JSONL reader
│ ├── sqlite.ts ← bun:sqlite / better-sqlite3 shim
│ ├── derive.ts ← title / prompt / sampling helpers
│ ├── resume.ts ← ask() orchestrator per runtime
│ └── adapters/
│ ├── claude-code.ts
│ ├── codex.ts
│ ├── hermes.ts
│ └── openclaw.ts
├── skills/
│ └── memto.md ← standard-format skill; drop into your agent's skills/
├── examples/
└── assets/
🛣 Roadmap
- v0.4 — Cursor / Windsurf / Zed adapters · live file-watch indexing · richer summary hooks
- v0.5 — cross-device encrypted sync · per-session privacy tags
- v0.6 — team-shared memory (opt-in sharing of specific sessions between people) · simple web dashboard
File an issue if one of these matters to you, or open a PR.
🤝 Contributing
See CONTRIBUTING.md. TL;DR: each adapter is ~200 lines, tests use synthetic fixtures, PRs welcome.
