@jhizzard/rumen
v0.5.3
Published
Async learning layer that runs on top of any pgvector memory store. The LLM is stateless. Rumen isn't.
Maintainers
Readme
Rumen
The LLM is stateless. Rumen isn't.
Rumen is an async learning layer that runs on top of any pgvector memory store (such as Mnestra). It wakes up on a schedule, looks at what you did recently, cross-references it with everything you've ever done, and writes the connections back into your memory store as insights.
A rumen is the first chamber of a ruminant's stomach where food is continuously broken down and re-processed long after the animal stops eating. The word ruminate literally comes from it. The metaphor IS the product: your thoughts keep getting processed after you stop working.
Safety Warning
WARNING: Rumen v0.1 writes to a
rumen_insightstable. It does NOT modify or delete any existing memory rows. Run against a TEST instance for the first two weeks of use. Do NOT point at production memory stores until validated.
Rumen is non-destructive by design. It only ever INSERTs new rows into its own tables (rumen_jobs, rumen_insights, rumen_questions). Nothing in your existing memory store is modified. Validate on a non-critical database first.
What Rumen does (v0.4)
The full Extract → Relate → Synthesize → Surface pipeline is live as of v0.4.0.
The loop:
- Extract — pull recent session memories (last 24–72 hours) from Mnestra. Filter out trivial sessions (<3 events).
- Relate — for each signal, run a hybrid keyword + semantic search (via OpenAI
text-embedding-3-largeembeddings) across all historical memories. Falls back to keyword-only gracefully whenOPENAI_API_KEYis unset. Keep top-5 candidates with similarity > 0.7. - Synthesize — pass related memories through Claude Haiku to produce real insight text with confidence scoring.
- Surface — write a new row into
rumen_insightsfor each signal, withsource_memory_ids[]populated so the connection is traceable.
Pairs with Mnestra
Rumen is a reasoning layer, not a memory store. It assumes the schema exposed by Mnestra:
memory_items(id, content, source_type, project, created_at, embedding vector(1536))memory_sessions(id, project, summary, created_at)memory_hybrid_search(query_text, query_embedding, limit_count, project_filter)SQL function
See docs/MNESTRA-COMPATIBILITY.md for the full compatibility contract. Rumen currently only works with Mnestra-compatible schemas.
Mnestra stores your developer memory. Rumen learns from it while you're not looking, and writes new memories back into the store with source_type='insight' so every existing Mnestra consumer automatically benefits.
Install
npm install @jhizzard/rumenPeer requirement: a Postgres database with the vector extension and the Mnestra schema (migrations in the Mnestra repo).
Deploy as a Supabase Edge Function
Rumen is designed to run as a scheduled Supabase Edge Function, triggered by pg_cron every 15 minutes.
Apply the Rumen tables:
psql "$DATABASE_URL" -f migrations/001_rumen_tables.sqlDeploy the Edge Function:
supabase functions deploy rumen-tick supabase secrets set DATABASE_URL="$DATABASE_URL"Schedule it via
pg_cron:psql "$DATABASE_URL" -f migrations/002_pg_cron_schedule.sql(Edit the function URL in the SQL file first.)
Verify:
SELECT * FROM rumen_jobs ORDER BY started_at DESC LIMIT 5;
Connection URL convention
Per docs/MNESTRA-COMPATIBILITY.md and the operational lessons inherited from Podium, Rumen always uses Supabase Shared Pooler IPv4 URLs, never Dedicated Pooler. The URL format:
postgresql://postgres.<project-ref>:<encoded-pw>@aws-0-<region>.pooler.supabase.com:6543/postgres?connection_limit=1Do not append ?pgbouncer=true — that parameter is Prisma-specific and rejected by node-postgres/libpq.
Set it as DATABASE_URL in your Supabase function secrets.
Run locally (development)
cp .env.example .env # then fill in DATABASE_URL
npm install
npm run test:localscripts/test-locally.ts runs a single Rumen job against a local or test Postgres, printing all [rumen-*] log output to stdout. Use this to validate extract/relate behavior without deploying.
Logging convention
Every log line in Rumen uses one of these tags:
| Tag | Phase |
|---|---|
| [rumen] | General job lifecycle |
| [rumen-extract] | Pulling structured events from session memories |
| [rumen-relate] | Semantic search for prior art |
| [rumen-synthesize] | LLM synthesis via Claude Haiku |
| [rumen-question] | Follow-up question generation |
| [rumen-surface] | Writing insights back to DB |
This makes Supabase Edge Function logs trivially greppable.
Cost controls
Guardrails in place:
- Max 10 sessions per run (override with
MAX_SESSIONS_PER_RUN) - Skip sessions with fewer than 3 events
- Skip sessions that already have a
rumen_jobsrow referencing them
Roadmap
| Version | Adds | Status |
|---|---|---|
| v0.1 | Extract + Relate + Surface. Read-only cross-reference. | Shipped |
| v0.2 | Synthesize step via Claude Haiku. Real insight text, confidence scoring, batching. | Shipped |
| v0.3 | Questions. Rumen starts asking the developer things. Morning briefing surface. | Shipped |
| v0.4 | Vector embeddings in Relate (hybrid keyword+semantic search via OpenAI text-embedding-3-large), per-signal error tolerance, graceful fallback when OPENAI_API_KEY is unset. | This release |
Why
Nothing else does this:
- Obsidian plugins index notes — they don't run when you stop editing.
- Mem0 stores memories — it doesn't cross-reference or synthesize.
- LangGraph orchestrates agents — it doesn't have persistent cross-project memory.
- Cursor / Copilot are in-editor assistants — they forget when you close the editor.
Rumen keeps working when you stop. It cross-references across all your projects automatically, and (in future versions) asks you follow-up questions about work you thought was done. The moat is the loop: your memory store captures → Rumen learns → insights flow back into the store. Each pass makes the store smarter about you specifically.
License
MIT © 2026 Joshua Izzard
