semantic-memory

v0.3.1, published 2 months ago

Local semantic memory with PGlite + pgvector - budget Qdrant for AI agents

Maintainer: joelhooks

Keywords: semantic-memory, vector-search, embeddings, pglite, pgvector, mcp, ai-agents, effect-ts

Local semantic memory with PGlite + pgvector. Budget Qdrant that runs anywhere Bun runs.

Why

You want semantic search for your AI agents but don't want to run a vector database server. This gives you:

Zero infrastructure - PGlite is Postgres compiled to WASM, runs in-process
Real vector search - pgvector with HNSW indexes, not some janky cosine similarity loop
Collection-based organization - Different collections for different contexts (codebase, research, notes)
Configurable tool descriptions - The Qdrant MCP pattern: same tool, different behaviors via env vars
Effect-TS - Proper error handling, resource management, composable services

Install

# npm/bun/pnpm
npm install semantic-memory

# Need Ollama for embeddings
brew install ollama
ollama pull mxbai-embed-large

CLI

# Via npx
npx semantic-memory store "The auth flow uses JWT tokens stored in httpOnly cookies"
npx semantic-memory find "how does authentication work"

# Or install globally
npm install -g semantic-memory
semantic-memory store "React component patterns" --collection code
semantic-memory find "components" --collection code

# Full-text search (no embeddings)
semantic-memory find "JWT" --fts

# Add metadata
semantic-memory store "API rate limits are 100 req/min" --metadata '{"source":"docs","priority":"high"}'

# List, get, delete
semantic-memory list
semantic-memory get <id>
semantic-memory delete <id>

# Validate a memory (refresh its relevance timestamp)
semantic-memory validate <id>

# Stats
semantic-memory stats

Collections for Context

Collections let you organize memories by purpose. The collection name carries semantic meaning:

# Codebase analysis - store patterns, architecture notes, API quirks
semantic-memory store "Auth uses httpOnly JWT cookies with 7-day refresh" --collection codebase
semantic-memory store "The useOptimistic hook requires a reducer pattern" --collection codebase
semantic-memory find "authentication" --collection codebase

# Research/learning - concepts, connections, questions
semantic-memory store "Effect-TS uses generators for async, not Promises" --collection research
semantic-memory find "effect async patterns" --collection research

# Project onboarding - gotchas, tribal knowledge, "why is it like this"
semantic-memory store "Don't use React.memo on components with children - causes stale closures" --collection gotchas
semantic-memory find "performance issues" --collection gotchas

# Personal knowledge - decisions, preferences, breakthroughs
semantic-memory store "Prefer composition over inheritance for React components" --collection decisions
semantic-memory find "react patterns" --collection decisions

Search across all collections or within one:

# Search everything
semantic-memory find "authentication"

# Search specific collection
semantic-memory find "authentication" --collection codebase

The Qdrant Pattern

The killer feature: tool descriptions are configurable.

Same semantic memory, different agent behaviors:

# Codebase assistant - searches before generating, stores patterns found
TOOL_STORE_DESCRIPTION="Store code patterns, architecture decisions, and API quirks discovered while analyzing the codebase. Include file paths and context." \
TOOL_FIND_DESCRIPTION="Search codebase knowledge. Query BEFORE making changes to understand existing patterns." \
semantic-memory find "auth patterns"

# Research assistant - accumulates and connects ideas
TOOL_STORE_DESCRIPTION="Store concepts, insights, and connections between ideas. Include source references." \
TOOL_FIND_DESCRIPTION="Search research notes. Use to find related concepts and prior findings." \
semantic-memory find "async patterns"

# Onboarding assistant - captures tribal knowledge
TOOL_STORE_DESCRIPTION="Store gotchas, workarounds, and 'why is it like this' explanations. Future devs will thank you." \
TOOL_FIND_DESCRIPTION="Search for known issues and gotchas. Check BEFORE debugging to avoid known pitfalls." \
semantic-memory find "common mistakes"

The description tells the LLM when and how to use the tool. Change the description, change the behavior. No code changes.

OpenCode Integration

Drop this in ~/.config/opencode/tool/semantic-memory.ts:

import { tool } from "@opencode-ai/plugin";
import { $ } from "bun";

// Rich descriptions that shape agent behavior
// Override via env vars for different contexts
const STORE_DESC =
  process.env.TOOL_STORE_DESCRIPTION ||
  "Persist important discoveries, decisions, and learnings for future sessions. Use for: architectural decisions, debugging breakthroughs, user preferences, project-specific patterns. Include context about WHY something matters.";
const FIND_DESC =
  process.env.TOOL_FIND_DESCRIPTION ||
  "Search your persistent memory for relevant context. Query BEFORE making architectural decisions, when hitting familiar-feeling bugs, or when you need project history. Returns semantically similar memories ranked by relevance.";

async function run(args: string[]): Promise<string> {
  const result = await $`npx semantic-memory ${args}`.text();
  return result.trim();
}

export const store = tool({
  description: STORE_DESC,
  args: {
    information: tool.schema.string().describe("The information to store"),
    collection: tool.schema
      .string()
      .optional()
      .describe("Collection name (e.g., 'codebase', 'research', 'gotchas')"),
    metadata: tool.schema
      .string()
      .optional()
      .describe("Optional JSON metadata"),
  },
  async execute({ information, collection, metadata }) {
    const args = ["store", information];
    if (collection) args.push("--collection", collection);
    if (metadata) args.push("--metadata", metadata);
    return run(args);
  },
});

export const find = tool({
  description: FIND_DESC,
  args: {
    query: tool.schema.string().describe("Natural language search query"),
    collection: tool.schema
      .string()
      .optional()
      .describe("Collection to search (omit for all)"),
    limit: tool.schema
      .number()
      .optional()
      .describe("Max results (default: 10)"),
  },
  async execute({ query, collection, limit }) {
    const args = ["find", query];
    if (collection) args.push("--collection", collection);
    if (limit) args.push("--limit", String(limit));
    return run(args);
  },
});

Per-Project Configuration

For project-specific behavior, create a wrapper script or use direnv:

# .envrc (with direnv)
export TOOL_STORE_DESCRIPTION="Store patterns found in this Next.js codebase. Include file paths."
export TOOL_FIND_DESCRIPTION="Search codebase patterns. Check before implementing new features."

Or create project-specific OpenCode tools that hardcode the collection:

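// .opencode/tool/codebase-memory.ts
import { tool } from "@opencode-ai/plugin";
import { $ } from "bun";

// Pins every stored memory to one collection, named after the working directory
export const remember = tool({
  description: "Store a pattern or insight about this codebase",
  args: { info: tool.schema.string().describe("The pattern or insight to store") },
  async execute({ info }) {
    const output = await $`npx semantic-memory store ${info} --collection ${process.cwd()}`.text();
    return output.trim();
  },
});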

Configuration

All via environment variables:

| Variable                    | Default                | Description                         |
| --------------------------- | ---------------------- | ----------------------------------- |
| SEMANTIC_MEMORY_PATH        | ~/.semantic-memory     | Where to store the database         |
| OLLAMA_HOST                 | http://localhost:11434 | Ollama API endpoint                 |
| OLLAMA_MODEL                | mxbai-embed-large      | Embedding model (1024 dims)         |
| COLLECTION_NAME             | default                | Default collection                  |
| MEMORY_DECAY_HALF_LIFE_DAYS | 90                     | Days for confidence decay half-life |
| TOOL_STORE_DESCRIPTION      | (see code)             | MCP tool description for store      |
| TOOL_FIND_DESCRIPTION       | (see code)             | MCP tool description for find       |
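
As a rough illustration of how these variables and defaults fit together, here is a minimal sketch of a config resolver. The `MemoryConfig` shape and the `loadConfig` name are hypothetical and not taken from the package source; only the variable names and default values come from the table above.

```typescript
// Hypothetical sketch: resolve settings from environment variables,
// falling back to the documented defaults. Not the package's actual code.
interface MemoryConfig {
  path: string;
  ollamaHost: string;
  ollamaModel: string;
  collection: string;
  halfLifeDays: number;
}

type Env = Record<string, string | undefined>;

function loadConfig(env: Env): MemoryConfig {
  const halfLife = Number(env.MEMORY_DECAY_HALF_LIFE_DAYS);
  return {
    path: env.SEMANTIC_MEMORY_PATH || "~/.semantic-memory",
    ollamaHost: env.OLLAMA_HOST || "http://localhost:11434",
    ollamaModel: env.OLLAMA_MODEL || "mxbai-embed-large",
    collection: env.COLLECTION_NAME || "default",
    // Unset, non-numeric, or non-positive values fall back to 90 days.
    halfLifeDays: Number.isFinite(halfLife) && halfLife > 0 ? halfLife : 90,
  };
}
```

In a Bun or Node process this would typically be called as `loadConfig(process.env)`. How the real CLI treats an invalid half-life value is not documented here, so the fallback to 90 is an assumption.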

Confidence Decay

Memories decay in relevance over time unless validated. This helps surface fresh, actively used knowledge over stale information.

How it works:

Uses a half-life algorithm: decay = 0.5 ^ (age_in_days / half_life)
Default half-life is 90 days (configurable via MEMORY_DECAY_HALF_LIFE_DAYS)
Search scores are multiplied by the decay factor
Stale memories (>90 days) show a warning

Decay examples:

| Age      | Decay Factor | Effect         |
| -------- | ------------ | -------------- |
| Today    | 1.0          | Full weight    |
| 90 days  | 0.5          | Half weight    |
| 180 days | 0.25         | Quarter weight |
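
The half-life formula is short enough to write out directly. The sketch below is a plain TypeScript rendering of `0.5 ^ (age_in_days / half_life)`. The `decayFactor` name is hypothetical, and clamping negative ages to zero is an assumption rather than documented behavior.

```typescript
// Minimal sketch of the half-life decay formula described above.
// decayFactor is an illustrative name, not part of the package API.
function decayFactor(ageInDays: number, halfLifeDays = 90): number {
  // Assumption: a negative age (clock skew) is treated as zero.
  const age = Math.max(0, ageInDays);
  return Math.pow(0.5, age / halfLifeDays);
}

const today = decayFactor(0);        // 1
const oneHalfLife = decayFactor(90); // 0.5
const twoHalfLives = decayFactor(180); // 0.25
```

A shorter half-life makes the store forget faster: with `halfLifeDays = 30`, a 90-day-old memory is weighted at 0.125 instead of 0.5.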

Example search output:

Results (decay half-life: 90 days):
1. [score: 0.82, age: 3d, decay: 0.98] JWT tokens should use httpOnly cookies
   Collection: codebase | ID: mem_abc123
2. [score: 0.45, age: 120d, decay: 0.40] Use localStorage for auth tokens
   Collection: codebase | ID: mem_ghi789
   ⚠️ Stale (120 days) - consider validating or removing
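
To make the ranking step concrete, here is a hedged sketch of how raw similarity, decay, and the stale flag could combine. The `Hit` and `RankedHit` types and the `rankHits` function are hypothetical. It also assumes the stale threshold is a fixed 90 days, which the text above suggests but does not spell out.

```typescript
// Hypothetical sketch: weight each hit by its decay factor, flag stale
// memories, and sort by the decayed score. Not the package's actual code.
interface Hit {
  id: string;
  similarity: number; // raw similarity from the vector search
  ageInDays: number;  // days since the memory was stored or last validated
}

interface RankedHit extends Hit {
  decay: number;
  score: number;
  stale: boolean;
}

function rankHits(hits: Hit[], halfLifeDays = 90, staleAfterDays = 90): RankedHit[] {
  return hits
    .map((hit) => {
      const decay = Math.pow(0.5, hit.ageInDays / halfLifeDays);
      return {
        ...hit,
        decay,
        score: hit.similarity * decay,
        stale: hit.ageInDays > staleAfterDays,
      };
    })
    // Highest decayed score first.
    .sort((a, b) => b.score - a.score);
}
```

Under this scheme an old memory with a strong match can still rank below a fresh memory with a weaker one, which is the behavior the example output illustrates.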

Refreshing memories:

# Validate a memory to reset its decay (marks it as still relevant)
semantic-memory validate <id>

Use validate when you confirm a memory is still accurate and useful. This resets the decay clock.

How It Works

┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   Ollama    │────▶│   PGlite    │────▶│  pgvector   │
│ (embeddings)│     │ (WASM PG)   │     │ (HNSW idx)  │
└─────────────┘     └─────────────┘     └─────────────┘
      │                    │                   │
      │              memories table      memory_embeddings
      │              - id                - memory_id (FK)
      │              - content           - embedding vector(1024)
      │              - metadata (JSONB)
      │              - collection
      └──────────────────────────────────────────┘
                    cosine similarity search

Ollama generates embeddings locally with mxbai-embed-large (1024 dimensions)
PGlite is Postgres compiled to WASM - no server, runs in your process
pgvector provides real vector operations with HNSW indexes for fast approximate nearest neighbor search
Effect-TS handles errors, retries, and resource cleanup properly
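
For intuition about what the vector search computes, the sketch below is the exact, brute-force form of a cosine nearest-neighbor lookup. pgvector's `<=>` operator returns cosine distance (1 minus cosine similarity), and an HNSW index approximates this ranking without scanning every row. The `cosineDistance` and `nearest` functions and the `StoredVector` type are illustrative only; whether the package queries with `<=>` is an assumption based on the diagram.

```typescript
// Illustrative only: the exact computation that an HNSW index approximates.
// Zero-length vectors are not handled and would produce NaN.
function cosineDistance(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same number of dimensions");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

interface StoredVector {
  id: string;
  embedding: number[];
}

// Scan every stored vector and keep the k closest to the query.
function nearest(
  query: number[],
  rows: StoredVector[],
  k: number,
): { id: string; distance: number }[] {
  return rows
    .map((row) => ({ id: row.id, distance: cosineDistance(query, row.embedding) }))
    .sort((a, b) => a.distance - b.distance)
    .slice(0, k);
}
```

The scan costs time proportional to the number of stored memories on every query. The index trades a small amount of recall for much faster lookups as the table grows.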

Use Cases

Codebase Analysis

Store patterns, architecture decisions, and API quirks as you explore a new codebase. Query before making changes.

Session Memory

Remember facts across AI sessions. No more re-explaining context every conversation.

Documentation Cache

Pre-load docs into a collection and search it before answering, instead of hallucinating.

Research Assistant

Accumulate findings, connect ideas across sources, build up domain knowledge.

Onboarding Knowledge Base

Capture the "why" behind decisions, known gotchas, and tribal knowledge for future team members.

License

MIT
