@dex-ai/memory
v0.4.1
LanceDB-backed memory Extension for @dex-ai/sdk — episodic, semantic, and procedural memory in one package.
LanceDB-backed memory Extension for @dex-ai/sdk. Three memory types — episodic, semantic, procedural — in one extension, one LanceDB directory.
Install
npm install @dex-ai/memory
The package includes @xenova/transformers for the default local embedder (ONNX via Transformers.js). You can avoid loading it at runtime by passing your own embed() function.
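A custom embedder only needs to match the embed signature listed under Extension options. The sketch below wraps a remote call and checks that every returned vector has 384 dimensions, the size used in the schema. `fetchVectors`, `checkVectors`, and `makeEmbed` are hypothetical names for illustration and are not part of the package.

```typescript
// Signature expected by the `embed` option, per the options interface in this README.
type Embed = (texts: string[]) => Promise<number[][]>;

const EXPECTED_DIM = 384; // matches vector(FLOAT[384]) in the schema

// Throws if the batch size or any vector length is wrong; otherwise returns the vectors.
function checkVectors(texts: string[], vectors: number[][]): number[][] {
  if (vectors.length !== texts.length) {
    throw new Error(`expected ${texts.length} vectors, got ${vectors.length}`);
  }
  for (const v of vectors) {
    if (v.length !== EXPECTED_DIM) {
      throw new Error(`expected ${EXPECTED_DIM} dimensions, got ${v.length}`);
    }
  }
  return vectors;
}

// `fetchVectors` is a hypothetical stand-in for your own remote embedding call.
function makeEmbed(fetchVectors: (texts: string[]) => Promise<number[][]>): Embed {
  return async (texts) => checkVectors(texts, await fetchVectors(texts));
}
```

The resulting function could then be passed as the embed option, so a dimension mismatch fails at the embedder rather than at write time.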
Usage
import { Agent } from '@dex-ai/sdk';
import { openai } from '@dex-ai/openai';
import { memoryExtension } from '@dex-ai/memory';
const agent = await Agent.create({
provider: openai({ modelId: 'gpt-4.1' }),
extensions: [
memoryExtension({
path: '~/.dex/memory.lancedb',
userId: 'alice',
// Optional: a cheaper model for memory's summarize + extract calls.
llm: { model: 'gpt-4o-mini' },
// Optional: bring your own embedder (e.g. a remote endpoint).
// If omitted, a local Transformers.js embedder (all-MiniLM-L6-v2) is used.
// embed: async (texts) => await myRemoteEmbedder(texts),
}),
],
});
What lives where
Episodic memory — past turns, auto-summarized
On every generate() iteration, the extension fires a background task:
- Summarize the turn's new messages into 1-3 sentences via the LLM.
- Embed the summary and write both to LanceDB.
Background writes are tracked; agent.dispose() awaits them.
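One way to picture "tracked and awaited on dispose" is a small set of in-flight promises. This is a minimal sketch under that assumption, not the extension's actual implementation; `WriteTracker` is a hypothetical name.

```typescript
// Hypothetical helper: tracks fire-and-forget writes so a dispose step can await them.
class WriteTracker {
  private pending = new Set<Promise<void>>();

  // Register a background write; it removes itself from the set once settled.
  track(write: Promise<void>): void {
    const tracked: Promise<void> = write.then(
      () => {
        this.pending.delete(tracked);
      },
      (err) => {
        // Swallow the failure so flush() never rejects because of one bad write.
        this.pending.delete(tracked);
        console.warn('memory write failed:', err);
      },
    );
    this.pending.add(tracked);
  }

  // Number of writes still in flight.
  get size(): number {
    return this.pending.size;
  }

  // Await everything in flight, including writes registered while waiting.
  async flush(): Promise<void> {
    while (this.pending.size > 0) {
      await Promise.all(Array.from(this.pending));
    }
  }
}
```

A dispose step would call flush() before closing the database handle. Writes interrupted by a hard process kill are still lost, as noted under Requirements + gotchas.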
Episodic memories are scoped to the active project root so unrelated projects do not share turn history. The extension derives the scope from Dex coding-agent environment state (env.cwd/rootDirs), workspace state, agent metadata (root_cwd/rootCwd), or finally process.cwd(); when possible it resolves cwd to the containing Git root. Stored episodes include metadata.projectRoot and metadata.cwd.
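The "resolve cwd to the containing Git root" step can be sketched as an upward directory walk. The function below is illustrative only: `resolveProjectRoot` is a hypothetical name, checking for a `.git` entry is an assumption about how a Git root is detected, and the extension's real lookup may differ.

```typescript
import * as fs from 'node:fs';
import * as path from 'node:path';

// Walk upward from `cwd` until a directory containing `.git` is found.
// Falls back to `cwd` itself when no Git root exists.
// `exists` is injectable so the lookup can be exercised without touching disk.
function resolveProjectRoot(
  cwd: string,
  exists: (p: string) => boolean = fs.existsSync,
): string {
  const start = path.resolve(cwd);
  let dir = start;
  while (true) {
    if (exists(path.join(dir, '.git'))) return dir;
    const parent = path.dirname(dir);
    if (parent === dir) return start; // reached the filesystem root without a match
    dir = parent;
  }
}
```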
At recall time, the extension fetches:
- The most-recent episodes for the user within the current project scope.
- The most-similar episodes within the current project scope via LanceDB cosine vector search against the user's last message.
- Results from both queries are de-duplicated by id and sorted newest-first.
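The merge step described above can be sketched as follows. The `Episode` shape and the `createdAt` field name are assumptions made for the example; only the de-duplicate-then-sort behavior comes from this README.

```typescript
interface Episode {
  id: string;
  summary: string;
  createdAt: number; // epoch milliseconds (field name is an assumption)
}

// Combine the "recent" and "similar" result sets, drop duplicate ids,
// and return the remainder newest-first.
function mergeEpisodes(recent: Episode[], similar: Episode[]): Episode[] {
  const byId = new Map<string, Episode>();
  for (const ep of [...recent, ...similar]) {
    if (!byId.has(ep.id)) byId.set(ep.id, ep);
  }
  return Array.from(byId.values()).sort((a, b) => b.createdAt - a.createdAt);
}
```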
Semantic memory — durable facts
Facts are (subject, predicate, object) tuples. Each fact has a deterministic id derived from (userId, subject, predicate), so there is at most one fact per key. Facts are written in two ways:
Automatic extraction at each iteration stop: the LLM extracts durable claims from the turn and upserts them.
Model-driven via tools:
- memory_remember_fact({ subject, predicate, object }): upsert by key.
- memory_forget_fact({ subject, predicate }): delete by key.
At recall time, semantic facts are selected with LanceDB vector similarity when a query embedding is available. Without an embedding, recall falls back to the most-recent facts.
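To illustrate why a deterministic id makes writes behave as upserts, here is a minimal sketch. Hashing a JSON-encoded tuple with SHA-256 is an assumption chosen for the example; the package's real id scheme is not documented here. `factId` and `rememberFact` are hypothetical names.

```typescript
import { createHash } from 'node:crypto';

// Derive a stable id from the key fields. JSON-encoding the tuple avoids
// ambiguity between e.g. ("a", "bc") and ("ab", "c").
function factId(userId: string, subject: string, predicate: string): string {
  const key = JSON.stringify([userId, subject, predicate]);
  return createHash('sha256').update(key).digest('hex');
}

interface Fact {
  subject: string;
  predicate: string;
  object: string;
}

// Upsert into an in-memory map keyed by the deterministic id:
// a second write with the same (userId, subject, predicate) replaces the first.
function rememberFact(store: Map<string, Fact>, userId: string, fact: Fact): void {
  store.set(factId(userId, fact.subject, fact.predicate), fact);
}
```

Because the object is not part of the key, remembering "user prefers Rust" after "user prefers TypeScript" replaces the earlier fact instead of adding a second one.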
Procedural memory — runbooks
Long-form how-to content. Stored by unique title, tagged, with an embedding over title + body.
Tools:
- memory_store_procedure({ title, body, tags? }): upsert by title.
- memory_forget_procedure({ title }): delete by exact title.
- memory_list_procedures({ query?, tag?, limit? }): with query, returns vector-ranked results; with tag, filters; with neither, returns most-recently-updated.
- memory_get_procedure({ title }): fetch full body.
Auto-inject: at recall time, the extension does a vector similarity lookup against the user's last message. Matching procedures above the threshold (default 0.5) are prepended to the prompt as memory context.
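The threshold check can be pictured with plain cosine similarity over in-memory vectors. This is a sketch only: in the extension the search runs inside LanceDB, and treating "above the threshold" as a strict comparison is an assumption. `cosine` and `selectProcedures` are hypothetical names.

```typescript
// Cosine similarity between two equal-length vectors; 0 if either is all zeros.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

interface Procedure {
  title: string;
  vector: number[];
}

// Keep procedures whose similarity to the query exceeds the threshold,
// ordered from most to least similar.
function selectProcedures(
  query: number[],
  procedures: Procedure[],
  threshold = 0.5, // documented default for proceduralThreshold
): { title: string; similarity: number }[] {
  return procedures
    .map((p) => ({ title: p.title, similarity: cosine(query, p.vector) }))
    .filter((p) => p.similarity > threshold)
    .sort((x, y) => y.similarity - x.similarity);
}
```

Raising proceduralThreshold makes injection stricter; lowering it surfaces more loosely related runbooks.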
LLM configuration
memoryExtension({
// ...
llm: {
provider: customProvider, // optional: overrides the agent's provider entirely
model: 'gpt-4o-mini', // optional: passed via providerOptions.model on each call
},
});
Resolution rule:
- If llm.provider is set, use it.
- Otherwise use the agent's provider.
- If llm.model is set, it's passed via providerOptions.model on every memory-internal request.
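The resolution rule above can be written as a small pure function. The `Provider` and `ResolvedLlm` types here are simplified stand-ins invented for the example, and `resolveLlm` is a hypothetical name; the SDK's real types are richer.

```typescript
// Minimal stand-ins for the SDK types (assumptions for illustration).
interface Provider {
  name: string;
}

interface LlmOptions {
  provider?: Provider;
  model?: string;
}

interface ResolvedLlm {
  provider: Provider;
  providerOptions: { model?: string };
}

// Pick the provider and per-request options for memory-internal calls.
function resolveLlm(agentProvider: Provider, llm: LlmOptions = {}): ResolvedLlm {
  return {
    // An explicit llm.provider wins; otherwise fall back to the agent's provider.
    provider: llm.provider ?? agentProvider,
    // llm.model, when set, rides along on every request as providerOptions.model.
    providerOptions: llm.model ? { model: llm.model } : {},
  };
}
```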
Extension options
interface MemoryExtensionOptions {
path: string; // LanceDB directory or URI
userId: string; // owner for episodic + semantic (procedural is global)
llm?: { provider?: Provider; model?: string };
embed?: (texts: string[]) => Promise<number[][]>; // 384-dim
episodicRecent?: number; // default 5
episodicSimilar?: number; // default 3
semanticLimit?: number; // default 10
proceduralThreshold?: number; // default 0.5
autoWrite?: boolean; // default true; set false to disable auto-summarize+extract
autoWriteMinMessages?: number; // default 2
}
Injected prompt shape
When memory is available, the extension injects a single synthetic memory message before the latest user message. Example:
<memory>
Recent context:
- 5m ago: discussed the auth flow; chose JWT over session cookies.
Relevant facts:
- user prefers TypeScript
- project uses PostgreSQL 15
Relevant procedures (use memory_get_procedure to fetch details):
- [deploy-dex] (id: abc123, similarity: 0.71)
</memory>
This message is not persisted to actx.messages; it is a per-turn request rewrite.
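The per-turn rewrite described above amounts to building a new message array for the request while leaving the stored history untouched. The sketch below assumes a simple `Message` shape and gives the memory message a `system` role; both are assumptions, since this README does not specify them. `injectMemory` is a hypothetical name.

```typescript
interface Message {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

// Return a copy of `messages` with a memory message placed just before the
// latest user message. The input array is never mutated.
function injectMemory(messages: Message[], memoryBlock: string): Message[] {
  let idx = -1;
  for (let i = messages.length - 1; i >= 0; i--) {
    if (messages[i].role === 'user') {
      idx = i;
      break;
    }
  }
  if (idx === -1) return messages.slice(); // no user message: nothing to anchor on
  const memory: Message = { role: 'system', content: memoryBlock }; // role is an assumption
  return [...messages.slice(0, idx), memory, ...messages.slice(idx)];
}
```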
Requirements + gotchas
- LanceDB native package: @lancedb/lancedb ships platform-specific optional packages. Install dependencies on the deployment platform so the matching native package is present.
- Transformers.js first-run download: ~25 MB ONNX model, cached in ~/.cache/huggingface. Initial embed() takes 10-30s; subsequent calls are fast.
- Background writes are fire-and-forget. If the process is killed mid-turn, that turn's memory may not be persisted. agent.dispose() awaits in-flight writes normally.
- Prompt quality of auto-extraction depends on the configured model. Cheap models work but may produce noisier facts. Review with memory tools occasionally and use memory_forget_fact to prune.
- Procedural is global, not user-scoped.
Schema
LanceDB tables:
episodic    id | user_id | summary | metadata_json | created_at | vector(FLOAT[384])
semantic    id | user_id | subject | predicate | object | source | created_at | updated_at | has_vector | vector(FLOAT[384])
procedural  id | title | body | tags[] | created_at | updated_at | vector(FLOAT[384])
Testing
npm test