semantic-memory
v0.3.0
Local semantic memory with PGlite + pgvector - budget Qdrant for AI agents
semantic-memory
Local semantic memory with PGlite + pgvector. Budget Qdrant that runs anywhere Bun runs.
Why
You want semantic search for your AI agents but don't want to run a vector database server. This gives you:
- Zero infrastructure - PGlite is Postgres compiled to WASM, runs in-process
- Real vector search - pgvector with HNSW indexes, not some janky cosine similarity loop
- Collection-based organization - Different collections for different contexts (codebase, research, notes)
- Configurable tool descriptions - The Qdrant MCP pattern: same tool, different behaviors via env vars
- Effect-TS - Proper error handling, resource management, composable services
Install
# npm/bun/pnpm
npm install semantic-memory
# Need Ollama for embeddings
brew install ollama
ollama pull mxbai-embed-large
CLI
# Via npx
npx semantic-memory store "The auth flow uses JWT tokens stored in httpOnly cookies"
npx semantic-memory find "how does authentication work"
# Or install globally
npm install -g semantic-memory
semantic-memory store "React component patterns" --collection code
semantic-memory find "components" --collection code
# Full-text search (no embeddings)
semantic-memory find "JWT" --fts
# Add metadata
semantic-memory store "API rate limits are 100 req/min" --metadata '{"source":"docs","priority":"high"}'
# List, get, delete
semantic-memory list
semantic-memory get <id>
semantic-memory delete <id>
# Validate a memory (refresh its relevance timestamp)
semantic-memory validate <id>
# Stats
semantic-memory stats
Collections for Context
Collections let you organize memories by purpose. The collection name carries semantic meaning:
# Codebase analysis - store patterns, architecture notes, API quirks
semantic-memory store "Auth uses httpOnly JWT cookies with 7-day refresh" --collection codebase
semantic-memory store "The useOptimistic hook requires a reducer pattern" --collection codebase
semantic-memory find "authentication" --collection codebase
# Research/learning - concepts, connections, questions
semantic-memory store "Effect-TS uses generators for async, not Promises" --collection research
semantic-memory find "effect async patterns" --collection research
# Project onboarding - gotchas, tribal knowledge, "why is it like this"
semantic-memory store "Don't use React.memo on components with children - causes stale closures" --collection gotchas
semantic-memory find "performance issues" --collection gotchas
# Personal knowledge - decisions, preferences, breakthroughs
semantic-memory store "Prefer composition over inheritance for React components" --collection decisions
semantic-memory find "react patterns" --collection decisions
Search across all collections or within one:
# Search everything
semantic-memory find "authentication"
# Search specific collection
semantic-memory find "authentication" --collection codebase
The Qdrant Pattern
The killer feature: tool descriptions are configurable.
Same semantic memory, different agent behaviors:
# Codebase assistant - searches before generating, stores patterns found
TOOL_STORE_DESCRIPTION="Store code patterns, architecture decisions, and API quirks discovered while analyzing the codebase. Include file paths and context." \
TOOL_FIND_DESCRIPTION="Search codebase knowledge. Query BEFORE making changes to understand existing patterns." \
semantic-memory find "auth patterns"
# Research assistant - accumulates and connects ideas
TOOL_STORE_DESCRIPTION="Store concepts, insights, and connections between ideas. Include source references." \
TOOL_FIND_DESCRIPTION="Search research notes. Use to find related concepts and prior findings." \
semantic-memory find "async patterns"
# Onboarding assistant - captures tribal knowledge
TOOL_STORE_DESCRIPTION="Store gotchas, workarounds, and 'why is it like this' explanations. Future devs will thank you." \
TOOL_FIND_DESCRIPTION="Search for known issues and gotchas. Check BEFORE debugging to avoid known pitfalls." \
semantic-memory find "common mistakes"
The description tells the LLM when and how to use the tool. Change the description, change the behavior. No code changes.
OpenCode Integration
Drop this in ~/.config/opencode/tool/semantic-memory.ts:
import { tool } from "@opencode-ai/plugin";
import { $ } from "bun";

// Rich descriptions that shape agent behavior
// Override via env vars for different contexts
const STORE_DESC =
  process.env.TOOL_STORE_DESCRIPTION ||
  "Persist important discoveries, decisions, and learnings for future sessions. Use for: architectural decisions, debugging breakthroughs, user preferences, project-specific patterns. Include context about WHY something matters.";
const FIND_DESC =
  process.env.TOOL_FIND_DESCRIPTION ||
  "Search your persistent memory for relevant context. Query BEFORE making architectural decisions, when hitting familiar-feeling bugs, or when you need project history. Returns semantically similar memories ranked by relevance.";

async function run(args: string[]): Promise<string> {
  const result = await $`npx semantic-memory ${args}`.text();
  return result.trim();
}

export const store = tool({
  description: STORE_DESC,
  args: {
    information: tool.schema.string().describe("The information to store"),
    collection: tool.schema
      .string()
      .optional()
      .describe("Collection name (e.g., 'codebase', 'research', 'gotchas')"),
    metadata: tool.schema
      .string()
      .optional()
      .describe("Optional JSON metadata"),
  },
  async execute({ information, collection, metadata }) {
    const args = ["store", information];
    if (collection) args.push("--collection", collection);
    if (metadata) args.push("--metadata", metadata);
    return run(args);
  },
});

export const find = tool({
  description: FIND_DESC,
  args: {
    query: tool.schema.string().describe("Natural language search query"),
    collection: tool.schema
      .string()
      .optional()
      .describe("Collection to search (omit for all)"),
    limit: tool.schema
      .number()
      .optional()
      .describe("Max results (default: 10)"),
  },
  async execute({ query, collection, limit }) {
    const args = ["find", query];
    if (collection) args.push("--collection", collection);
    if (limit) args.push("--limit", String(limit));
    return run(args);
  },
});
Per-Project Configuration
For project-specific behavior, create a wrapper script or use direnv:
# .envrc (with direnv)
export TOOL_STORE_DESCRIPTION="Store patterns found in this Next.js codebase. Include file paths."
export TOOL_FIND_DESCRIPTION="Search codebase patterns. Check before implementing new features."
Or create project-specific OpenCode tools that hardcode the collection:
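// .opencode/tool/codebase-memory.ts
import { tool } from "@opencode-ai/plugin";
import { $ } from "bun";

export const remember = tool({
  description: "Store a pattern or insight about this codebase",
  args: { info: tool.schema.string() },
  async execute({ info }) {
    // The working directory doubles as the collection name, so each project gets its own collection
    return $`npx semantic-memory store ${info} --collection ${process.cwd()}`.text();
  },
});
Configuration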
All via environment variables:
| Variable | Default | Description |
| ----------------------------- | ------------------------ | ----------------------------------- |
| SEMANTIC_MEMORY_PATH | ~/.semantic-memory | Where to store the database |
| OLLAMA_HOST | http://localhost:11434 | Ollama API endpoint |
| OLLAMA_MODEL | mxbai-embed-large | Embedding model (1024 dims) |
| COLLECTION_NAME | default | Default collection |
| MEMORY_DECAY_HALF_LIFE_DAYS | 90 | Days for confidence decay half-life |
| TOOL_STORE_DESCRIPTION | (see code) | MCP tool description for store |
| TOOL_FIND_DESCRIPTION | (see code) | MCP tool description for find |
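As a rough illustration of how the variables and defaults in the table fit together, the sketch below resolves a configuration object from an environment map. The `MemoryConfig` shape and the `loadConfig` name are assumptions made for this example, not the package's actual internals, and the fallback for an invalid half-life is a guess at sensible behavior.

```typescript
// Illustrative only: the names and shape here are assumptions, not the package's code.
interface MemoryConfig {
  path: string;
  ollamaHost: string;
  ollamaModel: string;
  collection: string;
  decayHalfLifeDays: number;
}

type Env = Record<string, string | undefined>;

function loadConfig(env: Env): MemoryConfig {
  // Use the documented default when the value is unset, empty, or not a positive number.
  const halfLife = Number(env["MEMORY_DECAY_HALF_LIFE_DAYS"]);
  return {
    path: env["SEMANTIC_MEMORY_PATH"] || "~/.semantic-memory",
    ollamaHost: env["OLLAMA_HOST"] || "http://localhost:11434",
    ollamaModel: env["OLLAMA_MODEL"] || "mxbai-embed-large",
    collection: env["COLLECTION_NAME"] || "default",
    decayHalfLifeDays: Number.isFinite(halfLife) && halfLife > 0 ? halfLife : 90,
  };
}
```

In a Bun or Node script this could be called as `loadConfig(process.env)`. Taking the environment as a parameter keeps the function easy to test.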
Confidence Decay
Memories decay in relevance over time unless validated. This helps surface fresh, actively used knowledge over stale information.
How it works:
- Uses a half-life algorithm: decay = 0.5 ^ (age_in_days / half_life)
- Default half-life is 90 days (configurable via MEMORY_DECAY_HALF_LIFE_DAYS)
- Search scores are multiplied by the decay factor
- Stale memories (>90 days) show a warning
Decay examples:
| Age      | Decay Factor | Effect         |
| -------- | ------------ | -------------- |
| Today    | 1.0          | Full weight    |
| 90 days  | 0.5          | Half weight    |
| 180 days | 0.25         | Quarter weight |
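The half-life formula is small enough to check by hand. The following is a minimal sketch of it in TypeScript; the function names `decayFactor` and `decayedScore` are hypothetical and chosen for this example, and the package's own implementation may be organized differently.

```typescript
// Hypothetical helper names; only the formula itself comes from this README.
const DEFAULT_HALF_LIFE_DAYS = 90;

// decay = 0.5 ^ (age_in_days / half_life)
function decayFactor(ageInDays: number, halfLifeDays: number = DEFAULT_HALF_LIFE_DAYS): number {
  return Math.pow(0.5, ageInDays / halfLifeDays);
}

// Search scores are multiplied by the decay factor.
function decayedScore(
  rawScore: number,
  ageInDays: number,
  halfLifeDays: number = DEFAULT_HALF_LIFE_DAYS,
): number {
  return rawScore * decayFactor(ageInDays, halfLifeDays);
}
```

With the default half-life, `decayFactor(90)` is 0.5 and `decayFactor(180)` is 0.25, matching the table. A 120-day-old memory gets a factor of roughly 0.40.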
Example search output:
Results (decay half-life: 90 days):
1. [score: 0.82, age: 3d, decay: 0.98] JWT tokens should use httpOnly cookies
Collection: codebase | ID: mem_abc123
2. [score: 0.45, age: 120d, decay: 0.40] Use localStorage for auth tokens
Collection: codebase | ID: mem_ghi789
⚠️ Stale (120 days) - consider validating or removing
Refreshing memories:
# Validate a memory to reset its decay (marks it as still relevant)
semantic-memory validate <id>
Use validate when you confirm a memory is still accurate and useful. This resets the decay clock.
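One way to picture "resetting the decay clock" is to measure a memory's age from the last time it was validated rather than from when it was created. The sketch below models that in memory. The `MemoryRecord` shape, the `lastValidatedAt` field, and the function names are all assumptions for illustration; how the package actually stores its timestamp is not documented here.

```typescript
// Hypothetical in-memory model: field and function names are illustrative assumptions.
interface MemoryRecord {
  id: string;
  content: string;
  lastValidatedAt: Date; // assumed to start out equal to the creation time
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Age is counted from the last validation, not from creation.
function ageInDays(memory: MemoryRecord, now: Date): number {
  return (now.getTime() - memory.lastValidatedAt.getTime()) / MS_PER_DAY;
}

function currentDecay(memory: MemoryRecord, now: Date, halfLifeDays: number = 90): number {
  return Math.pow(0.5, ageInDays(memory, now) / halfLifeDays);
}

// Validating returns a copy whose clock starts again at `now`.
function validate(memory: MemoryRecord, now: Date): MemoryRecord {
  return { ...memory, lastValidatedAt: now };
}
```

Under this model, a memory left untouched for 180 days has a decay factor of 0.25, and validating it brings the factor back to 1.0.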
How It Works
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ Ollama │────▶│ PGlite │────▶│ pgvector │
│ (embeddings)│ │ (WASM PG) │ │ (HNSW idx) │
└─────────────┘ └─────────────┘ └─────────────┘
│ │ │
│ memories table memory_embeddings
│ - id - memory_id (FK)
│ - content - embedding vector(1024)
│ - metadata (JSONB)
│ - collection
└──────────────────────────────────────────┘
              cosine similarity search
- Ollama generates embeddings locally with mxbai-embed-large (1024 dimensions)
- PGlite is Postgres compiled to WASM - no server, runs in your process
- pgvector provides real vector operations with HNSW indexes for fast approximate nearest neighbor search
- Effect-TS handles errors, retries, and resource cleanup properly
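To make the diagram more concrete, here is a sketch of what the two tables, the HNSW index, and a cosine search could look like in pgvector SQL, held as TypeScript string constants. The column types, index name, and query shape are assumptions reconstructed from the diagram, not the package's actual schema. `<=>` is pgvector's cosine distance operator, so `1 - distance` gives cosine similarity. `toVectorLiteral` is a hypothetical helper that formats an embedding in pgvector's text form.

```typescript
// Assumed schema, reconstructed from the diagram; column types and names are guesses.
const SCHEMA_SQL = `
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE IF NOT EXISTS memories (
  id TEXT PRIMARY KEY,
  content TEXT NOT NULL,
  metadata JSONB,
  collection TEXT NOT NULL
);

CREATE TABLE IF NOT EXISTS memory_embeddings (
  memory_id TEXT PRIMARY KEY REFERENCES memories(id) ON DELETE CASCADE,
  embedding vector(1024) NOT NULL
);

-- HNSW index using cosine distance for approximate nearest neighbor search
CREATE INDEX IF NOT EXISTS memory_embeddings_hnsw_idx
  ON memory_embeddings USING hnsw (embedding vector_cosine_ops);
`;

// $1 = query embedding as a vector literal, $2 = collection, $3 = max results
const FIND_SQL = `
SELECT m.id, m.content, m.collection,
       1 - (e.embedding <=> $1::vector) AS score
FROM memory_embeddings e
JOIN memories m ON m.id = e.memory_id
WHERE m.collection = $2
ORDER BY e.embedding <=> $1::vector
LIMIT $3;
`;

// pgvector accepts vectors in text form, e.g. '[0.1,0.2,0.3]'.
function toVectorLiteral(embedding: number[]): string {
  return `[${embedding.join(",")}]`;
}
```

Ordering by the same `<=>` expression that the index was built on is what lets Postgres use the HNSW index instead of scanning every row.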
Use Cases
Codebase Analysis
Store patterns, architecture decisions, and API quirks as you explore a new codebase. Query before making changes.
Session Memory
Remember facts across AI sessions. No more re-explaining context every conversation.
Documentation Cache
Pre-load docs into a collection, search before hallucinating answers.
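A minimal sketch of pre-loading, assuming docs are split on blank lines and each chunk is stored with its source file as metadata. Both helpers are hypothetical, and the chunking strategy is an assumption rather than something the package prescribes. Only the `store`, `--collection`, and `--metadata` arguments come from the CLI shown earlier.

```typescript
// Hypothetical helpers for pre-loading docs; splitting on blank lines is an assumption.
function splitIntoChunks(doc: string): string[] {
  return doc
    .split(/\n\s*\n/) // paragraph boundaries
    .map((chunk) => chunk.trim())
    .filter((chunk) => chunk.length > 0);
}

// Builds the argument list for one `semantic-memory store` call.
function buildStoreArgs(chunk: string, collection: string, source: string): string[] {
  return [
    "store",
    chunk,
    "--collection",
    collection,
    // JSON.stringify keeps the metadata valid JSON even if the source contains quotes
    "--metadata",
    JSON.stringify({ source }),
  ];
}
```

Each argument list can then be passed to the CLI, for example through a `run(args)` helper like the one in the OpenCode integration.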
Research Assistant
Accumulate findings, connect ideas across sources, build up domain knowledge.
Onboarding Knowledge Base
Capture the "why" behind decisions, known gotchas, and tribal knowledge for future team members.
License
MIT
