@kky42/mem-cli
v0.2.3
Agent memory CLI using markdown + local embeddings + SQLite
mem (mem-cli)
A tiny, local “memory” tool for agents:
- Store memories as plain Markdown files
- Search them fast (semantic embeddings)
- Keep everything on disk (no server)
Install
npm i -g @kky42/mem-cli
If npm fails with EEXIST .../bin/mem:
which mem
rm "$(which mem)" # or: npm i -g --force @kky42/mem-cli
Try it in 60 seconds (public workspace)
mem init --public
mem add short "I am Kevin." --public
echo "Prefer low-cost index funds for stock exposure." | mem add long --public --stdin
mem search "equity allocation" --public
mem state --public
What gets created:
- ~/.mem-cli/public/MEMORY.md (long-term memory)
- ~/.mem-cli/public/memory/YYYY-MM-DD.md (daily notes)
- ~/.mem-cli/public/index.db (local search index)
You can also edit the Markdown files directly; run mem reindex --public afterwards.
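Because the files are plain Markdown, a manual edit can be scripted. The sketch below is illustrative only: `daily_note_path` is a hypothetical helper (not part of mem-cli) that builds a daily-note path from the layout listed above, and it assumes the date in the filename follows the local calendar day.

```shell
# Hypothetical helper, not part of mem-cli: print the daily-note path
# for a workspace directory ($1) and a YYYY-MM-DD date ($2, default: today).
daily_note_path() {
  printf '%s/memory/%s.md\n' "$1" "${2:-$(date +%Y-%m-%d)}"
}

daily_note_path "$HOME/.mem-cli/public" 2026-01-28
```

After changing that file by hand, run mem reindex --public so the search index picks up the edit.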
Private workspace (token-protected)
mem init --token "my-token-123"
mem add short "User prefers concise answers." --token "my-token-123"
mem search "preferences" --token "my-token-123"
To avoid repeating --token, you can set MEM_CLI_TOKEN:
export MEM_CLI_TOKEN="my-token-123"
mem init
mem add short "User prefers concise answers."
mem search "preferences"
Precedence:
- --public always uses the public workspace (ignores MEM_CLI_TOKEN).
- --token <token> overrides MEM_CLI_TOKEN.
- Otherwise MEM_CLI_TOKEN (trimmed, non-empty) is used.
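The precedence rules can be modeled as a small bash function. This is a sketch of the documented behavior, not mem-cli's actual code: `resolve_workspace` and its output strings are hypothetical, and `none` is only a placeholder for the case where no workspace is selected, since this README does not say what mem does then.

```shell
# Illustrative only: mirrors the documented precedence, not mem-cli's code.
# Prints "public", "token:<value>", or "none" (placeholder: nothing selected).
resolve_workspace() {
  local public=0 cli_token="" env_token="${MEM_CLI_TOKEN-}"
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --public) public=1 ;;
      --token)  cli_token="${2-}"; [ "$#" -gt 1 ] && shift ;;
    esac
    shift
  done
  # Trim leading and trailing whitespace from the environment value.
  env_token="${env_token#"${env_token%%[![:space:]]*}"}"
  env_token="${env_token%"${env_token##*[![:space:]]}"}"

  if [ "$public" -eq 1 ]; then
    echo "public"                 # --public wins over everything
  elif [ -n "$cli_token" ]; then
    echo "token:$cli_token"       # --token overrides MEM_CLI_TOKEN
  elif [ -n "$env_token" ]; then
    echo "token:$env_token"       # fall back to the trimmed env value
  else
    echo "none"
  fi
}

MEM_CLI_TOKEN="  env-token  " resolve_workspace --token "cli-token"   # prints token:cli-token
MEM_CLI_TOKEN="  env-token  " resolve_workspace --public              # prints public
MEM_CLI_TOKEN="  env-token  " resolve_workspace                       # prints token:env-token
```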
Keep your token somewhere safe (password manager / env var). mem-cli only stores a hash and cannot recover a lost token.
Semantic search (local embeddings)
mem search is semantic-only (embeddings).
- Default embedding model: Qwen3-Embedding-0.6B (GGUF) via hf:Qwen/Qwen3-Embedding-0.6B-GGUF/Qwen3-Embedding-0.6B-Q8_0.gguf
- Model cache dir: ~/.mem-cli/model-cache
- If settings.embeddings.modelPath starts with hf:, the model is downloaded lazily on the first embeddings-backed command and stored in the cache dir.
- If embeddings can’t load (e.g. node-llama-cpp missing), mem search will error.
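To check ahead of time whether a given modelPath value would trigger the lazy download, the hf: prefix rule can be tested in the shell. This is a minimal sketch: `is_hf_model_path` is a hypothetical helper, and how mem-cli treats values without the hf: prefix is not stated here, so the else branch only reports that the rule does not apply.

```shell
# Hypothetical helper: succeeds when the value uses the hf: prefix.
is_hf_model_path() {
  case "$1" in
    hf:*) return 0 ;;
    *)    return 1 ;;
  esac
}

model_path="hf:Qwen/Qwen3-Embedding-0.6B-GGUF/Qwen3-Embedding-0.6B-Q8_0.gguf"
if is_hf_model_path "$model_path"; then
  echo "downloaded lazily into ~/.mem-cli/model-cache on first use"
else
  echo "no hf: prefix, so the lazy-download rule does not apply"
fi
```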
macOS note:
node-llama-cpp uses Metal by default on macOS (including integrated GPUs). If Metal causes issues, run with export NODE_LLAMA_CPP_GPU=off.
Daemon (fast repeated queries)
By default, mem add|search|reindex runs via a background daemon so the embeddings model stays loaded (no model load per CLI call).
- Disable: MEM_CLI_DAEMON=0
- Idle shutdown: MEM_CLI_DAEMON_IDLE_MS=600000 (ms; default 10 min)
- Stop now (advanced): mem __daemon --shutdown
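These variables can be set for a whole shell session or for a single call. The wrappers below are a sketch for bash: `mem_no_daemon` and `minutes_to_ms` are hypothetical helper names, and only the two environment variables come from the list above.

```shell
# Hypothetical helper: run one mem command with the daemon disabled,
# without changing the setting for the rest of the session.
mem_no_daemon() {
  MEM_CLI_DAEMON=0 mem "$@"
}

# Hypothetical helper: convert minutes to the millisecond value
# that MEM_CLI_DAEMON_IDLE_MS expects.
minutes_to_ms() {
  echo $(( $1 * 60 * 1000 ))
}

# 10 minutes is 600000 ms, which matches the documented default.
export MEM_CLI_DAEMON_IDLE_MS="$(minutes_to_ms 10)"
```

With these defined, a one-off call such as mem_no_daemon search "equity allocation" --public loads the model in the CLI process itself, which may be convenient when comparing timings with and without the daemon.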
E2E performance (agent scenarios)
Run:
bash scripts/e2e-performance.sh
To measure end-to-end mem search latency (CLI + daemon overhead), run:
bash scripts/e2e-performance-v2.sh
To benchmark mem reindex time on large synthetic workspaces, run:
bash scripts/e2e-reindex-performance.sh
Latest recorded scores (v0.1.4, 2026-01-28, Qwen3-Embedding-0.6B-Q8_0.gguf):
Test device:
- MacBook Pro (Apple M1 Max, 32GB RAM)
| Metric | Value |
| --- | --- |
| Overall score | 0.917 |
| Avg query latency | 20ms |
| P95 query latency | 22ms |

| Dataset | Scenario | Docs | Queries | R@1 | R@5 | R@10 | MRR@10 | Score |
| --- | --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| stackoverflow | coding | 25 | 25 | 80.0% | 100.0% | 100.0% | 0.880 | 0.940 |
| askubuntu | automation_tasks | 25 | 25 | 96.0% | 100.0% | 100.0% | 0.973 | 0.987 |
| ux | design_tasks | 25 | 25 | 84.0% | 92.0% | 100.0% | 0.885 | 0.942 |
| money | finance_investment | 25 | 25 | 80.0% | 96.0% | 100.0% | 0.869 | 0.935 |
| pm | personal_work_management | 25 | 25 | 76.0% | 100.0% | 100.0% | 0.863 | 0.932 |
| meta.stackoverflow | community_management | 25 | 25 | 80.0% | 96.0% | 100.0% | 0.875 | 0.938 |
| movielens | user_preference | 200 | 30 | 33.3% | 83.3% | 100.0% | 0.554 | 0.777 |
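For readers unfamiliar with the column names: R@k is usually the fraction of queries whose expected document appears in the top k results, and MRR@10 is the mean of 1/rank over all queries (counting 0 when the document is not in the top 10). The sketch below assumes the benchmark uses these standard definitions. The ranks are made-up illustration data, not benchmark output, `retrieval_metrics` is a hypothetical helper, and the Score column is left out because its formula is not stated here.

```shell
# Hypothetical helper: each argument is the 1-based rank of the expected
# document for one query, or 0 if it was not returned in the top 10.
retrieval_metrics() {
  printf '%s\n' "$@" | LC_ALL=C awk '
    {
      n++
      if ($1 == 1)             r1++
      if ($1 >= 1 && $1 <= 5)  r5++
      if ($1 >= 1 && $1 <= 10) { r10++; mrr += 1 / $1 }
    }
    END {
      printf "R@1=%.3f R@5=%.3f R@10=%.3f MRR@10=%.3f\n", r1 / n, r5 / n, r10 / n, mrr / n
    }'
}

# Five queries: two hits at rank 1, one at rank 2, one at rank 5, one miss.
retrieval_metrics 1 1 2 5 0
# prints R@1=0.400 R@5=0.800 R@10=0.800 MRR@10=0.540
```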
Reindex benchmark (v0.1.4, 2026-01-29, synthetic docs; daemon off; mock embeddings):
| Docs | Approx bytes | Indexed chunks | mem reindex wall time |
| ---: | ---: | ---: | ---: |
| 1000 | 766112 | 1000 | 1.18s |
| 10000 | 7669077 | 10000 | 43.74s |
Notes:
- The benchmark is cached + size-limited to run locally; timings depend on hardware.
- e2e-performance.sh calls dist/core/* directly (no CLI spawn / daemon overhead). For end-to-end latency, use e2e-performance-v2.sh.
- See docs/performance-datasets.md and docs/performance_records.md for dataset definitions + history.
Configuration
All configuration lives in one place:
~/.mem-cli/settings.json (shared by all workspaces)
Settings are read on each mem command (daemon included), so runtime settings take effect immediately.
Some settings affect how the index is built (e.g. chunking.*, embeddings.modelPath) and require rebuilding the index per workspace:
- mem reindex --public
- mem reindex --token ... (repeat for each token workspace)
- mem reindex --all (rebuilds all workspaces on disk)
mem reindex is safe to run any time; it will no-op when the workspace index is already up to date.
If you don’t run mem reindex, the next mem search / mem add in that workspace will auto-detect the mismatch and rebuild (the first run may be slower).
mem reindex --public only rebuilds the public workspace; private token workspaces keep their existing index until you reindex (or use them and let auto-rebuild happen).
If you don’t have a private workspace token, you can’t run mem ... --token for that workspace (tokens can’t be recovered; create a new token workspace and move the Markdown files if needed).
Note: mem-cli records the embedding model in the index and won’t run “new model” queries against “old model” vectors — it will rebuild the workspace first.
Use with an agent (Codex skill)
This repo includes a Codex skill at skills/mem-cli/SKILL.md. To install it:
mkdir -p ~/.codex/skills/mem-cli
cp skills/mem-cli/SKILL.md ~/.codex/skills/mem-cli/SKILL.md
Then the agent can use mem for:
- Writing memories: mem add short|long
- Retrieval before answering: mem search
Tip: for private workspaces, set MEM_CLI_TOKEN so the agent can run mem init|add|search|summary|state|reindex without repeating --token.
