@lumetra/engram
v0.6.0
Official TypeScript client for Engram — durable, explainable memory for AI agents.
- Zero runtime dependencies (uses built-in fetch).
- ESM + CommonJS, full .d.ts typings.
- Node 18+, Bun, Deno, edge runtimes.
Install
npm install @lumetra/engram
# or
yarn add @lumetra/engram
# or
pnpm add @lumetra/engram
Server-side only. Don't import @lumetra/engram in browser code: your eng_live_... API key would be visible in client JS, where any user could steal it. Always call the SDK from a server route, edge function, API handler, or worker. The Engram API key is a Bearer token; treat it like any backend secret.
Quickstart
import { EngramClient } from '@lumetra/engram';
const engram = new EngramClient({
apiKey: process.env.ENGRAM_API_KEY, // or set ENGRAM_API_KEY and omit
});
// Store a fact
await engram.storeMemory('User prefers dark mode.', 'user-123');
// Recall — returns a synthesized answer plus the memories that contributed
const result = await engram.query('What are this user\'s UI preferences?', {
buckets: ['user-123'],
});
console.log(result.answer);
console.log(result.explanation?.retrieved_memories);
Automatic 429 retry
The Engram API enforces a per-tenant concurrent-request cap and returns 429 Too Many Requests with a Retry-After header when you exceed it. The client honors that header automatically (up to maxRetriesOn429 attempts, default 3, capped at 30s per sleep) so bursty workloads don't fail on the first contention spike. Pass maxRetriesOn429: 0 in the constructor to opt out and surface 429 as EngramError immediately.
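To make the retry behaviour concrete, here is a minimal sketch of how a sleep duration could be derived from a Retry-After header with the 30s cap described above. It is not the client's actual implementation: the helper name retryDelayMs, the 1s fallback for a missing header, and support for both header forms (seconds and HTTP date) are assumptions for illustration.

```typescript
// Hypothetical helper, not part of @lumetra/engram: computes how long to
// sleep before retrying a 429, based on the Retry-After header value.
const MAX_SLEEP_MS = 30_000; // per-sleep cap stated in the text above

function retryDelayMs(retryAfter: string | null, nowMs: number = Date.now()): number {
  const fallbackMs = 1_000; // assumption: used when the header is missing or unparseable
  if (retryAfter === null || retryAfter.trim() === '') return fallbackMs;

  const trimmed = retryAfter.trim();
  // Retry-After is either a non-negative number of seconds...
  if (/^\d+$/.test(trimmed)) {
    return Math.min(Number(trimmed) * 1000, MAX_SLEEP_MS);
  }
  // ...or an HTTP date, in which case we wait until that moment.
  const dateMs = Date.parse(trimmed);
  if (Number.isNaN(dateMs)) return fallbackMs;
  return Math.min(Math.max(dateMs - nowMs, 0), MAX_SLEEP_MS);
}
```

A caller that sets maxRetriesOn429: 0 and handles 429 itself could use a helper like this to decide how long to back off before retrying.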
Configuration
new EngramClient({
apiKey: 'eng_live_...', // or ENGRAM_API_KEY env var
baseUrl: 'https://api.lumetra.io', // or ENGRAM_BASE_URL env var
timeoutMs: 30_000, // optional, default 30s
fetch: customFetch, // optional, defaults to globalThis.fetch
});
BYOK reminder. Engram is bring-your-own-key end-to-end. Configure an OpenAI / Anthropic / Groq / Together / Fireworks key on the Lumetra portal before your first call, or storeMemory/query will return HTTP 412.
API surface
Memories
- storeMemory(content, bucket?, { dedup? }): store a single fact. bucket defaults to "default". dedup is "off" | "loose" | "strict"; omit to use the server's default policy. See Dedup below.
- storeMemories(contents[], bucket?): batched store. bucket defaults to "default".
- listMemories(bucket?, { limit?, offset? }): paginated list (limit defaults to 20, offset to 0).
- deleteMemory(memoryId, bucket?): delete one memory. bucket defaults to "default".
- clearMemories(bucket): delete every memory in a bucket. No default; an explicit bucket is required (prevents accidental wipes).
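Because listMemories is paginated by limit and offset, reading a whole bucket means walking the pages. The sketch below shows one way to do that. The README does not document the return shape of listMemories, so the sketch assumes a page is a plain array and takes a hypothetical fetchPage callback in place of the real call; collectAllPages, FetchPage, and the in-memory fake are illustrative names only.

```typescript
// Hypothetical page fetcher: stands in for a call such as
// (limit, offset) => engram.listMemories('user-123', { limit, offset }).
// Assumption: each page resolves to an array of items.
type FetchPage<T> = (limit: number, offset: number) => Promise<T[]>;

async function collectAllPages<T>(fetchPage: FetchPage<T>, limit = 20): Promise<T[]> {
  if (limit <= 0) throw new RangeError('limit must be positive');
  const all: T[] = [];
  for (let offset = 0; ; offset += limit) {
    const page = await fetchPage(limit, offset);
    all.push(...page);
    if (page.length < limit) break; // a short (or empty) page means the end was reached
  }
  return all;
}

// In-memory stand-in so the sketch runs without a server.
const fakeStore = Array.from({ length: 45 }, (_, i) => `memory-${i}`);
const fakeFetch: FetchPage<string> = async (limit, offset) =>
  fakeStore.slice(offset, offset + limit);
```

Stopping on the first short page avoids a separate count request, at the cost of one extra (empty) fetch when the total is an exact multiple of limit.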
Query knobs
query and queryStream accept these tuning options (all optional):
| Field | Type | What it does |
|---|---|---|
| maxTokens | number | Cap synthesis output. Lower for agent loops / cost control. |
| minSimilarityThreshold | number | Drop retrieved chunks below this raw cosine similarity. Citation-grade precision. |
| topKPerBucket | number \| Record<string,number> | Per-bucket retrieval depth. { edgar_AAPL: 20, prices_AAPL: 4 } lets you express "deep here, shallow there." |
| returnFormat | 'prose' \| 'json' | When 'json', server returns JSON; result includes parsed answer_json. |
| responseSchema | Record<string, unknown> (JSON Schema) | Hint the model with a target shape. Best-effort; validate client-side if you need strict conformance. |
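Since responseSchema is only a hint, a caller that needs a guaranteed shape has to check answer_json itself. A minimal sketch of such a check follows, assuming the target is an array of objects with string fields. The guard isArrayOfStringRecords is a hypothetical helper, not part of the SDK, and a full JSON Schema validator would be the more general choice.

```typescript
// Hypothetical type guard: checks that a parsed JSON value is an array of
// objects in which every listed key holds a string.
function isArrayOfStringRecords<K extends string>(
  value: unknown,
  keys: readonly K[],
): value is Array<Record<K, string>> {
  if (!Array.isArray(value)) return false;
  return value.every(
    (item) =>
      typeof item === 'object' &&
      item !== null &&
      keys.every((k) => typeof (item as Record<string, unknown>)[k] === 'string'),
  );
}
```

Passing the parsed answer_json through a guard like this narrows its type without an unchecked cast, and lets the caller fall back to the prose answer when the model did not follow the schema.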
Example:
const r = await engram.query("Apple's active legal proceedings", {
buckets: ['edgar_AAPL', 'patents_AAPL'],
topKPerBucket: { edgar_AAPL: 20, patents_AAPL: 5 },
maxTokens: 400,
returnFormat: 'json',
responseSchema: {
type: 'array',
items: {
properties: {
case_name: { type: 'string' },
jurisdiction: { type: 'string' },
status: { type: 'string' },
},
},
},
});
const cases = r.answer_json as Array<{case_name: string; jurisdiction: string; status: string}> | undefined;
for (const c of cases ?? []) console.log(c);
Query
query(question, { buckets?, topK?, skipSynthesis?, returnExplanation? })
- buckets fuses across multiple buckets in one call. Defaults to ["default"].
- topK defaults to 8.
- skipSynthesis: true returns retrieval-only results, with no server-side LLM call. Defaults to false.
- returnExplanation defaults to true.
- Response shape: { answer, memories_found, explanation: { retrieved_memories, graph_facts, entity_matches, context_tokens, profile }, usage }. Each graph_facts[i] includes memory_id so you can match it against retrieved_memories[].memory_id and render the citing memory.
queryStream(question, options?): same args, returns an AsyncIterable<QueryStreamEvent> that streams the answer.
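The memory_id join between graph_facts and retrieved_memories described above can be sketched as a simple lookup. Only memory_id is documented in this README; the content and fact fields, the local interfaces, and the citeGraphFacts helper are assumptions made for illustration.

```typescript
// Minimal local types. Only memory_id is documented above; the other
// fields (content, fact) are assumed for the sake of the example.
interface RetrievedMemory { memory_id: string; content: string }
interface GraphFact { memory_id: string; fact: string }

// Hypothetical helper: pairs each graph fact with the text of the memory
// that cites it, or undefined when that memory was not retrieved.
function citeGraphFacts(
  facts: GraphFact[],
  memories: RetrievedMemory[],
): Array<{ fact: string; source: string | undefined }> {
  const byId = new Map(memories.map((m) => [m.memory_id, m] as const));
  return facts.map((f) => ({ fact: f.fact, source: byId.get(f.memory_id)?.content }));
}
```

Building the Map once keeps the join linear in the number of facts plus memories, rather than scanning retrieved_memories for every fact.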
Dedup
The server runs a similarity check before storing. By default ("loose", similarity ≥ 0.95) it collapses near-duplicate writes into the existing memory so re-ingesting the same source doesn't bloat the bucket. For most narrative content this is what you want.
For templated time-series content (financial filings, daily metrics, log rows) where rows are structurally similar but each carries unique values, the default collapses real data. Use dedup: 'off' to disable.
Every store response now includes a status field. When status === 'merged', the write was absorbed into an existing memory and three extra fields are present:
const r = await engram.storeMemory('Acme Q1 revenue: $245M', 'finance');
if (r.status === 'merged') {
console.log(`merged into ${r.deduped_into} (${r.merge_reason}, sim=${r.similarity_score?.toFixed(3)})`);
}
merge_reason is one of content_hash, embedding_similarity, conflict_keep_existing, concurrent_insert_race.
Opt out for time-series ingest:
for (const row of monthlyPrices) {
await engram.storeMemory(row, 'prices_AAPL', { dedup: 'off' });
}
'strict' is a middle ground: it only collapses near-identical content (≥ 0.99).
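To compare the three policies side by side, the sketch below models only the similarity thresholds quoted above (0.95 for "loose", 0.99 for "strict", never for "off"). It is a client-side illustration, not the server's logic; wouldMerge is a hypothetical name, and the other merge reasons such as content_hash are not modelled.

```typescript
type DedupPolicy = 'off' | 'loose' | 'strict';

// Thresholds quoted in the text above; 'off' has no threshold because it never merges.
const SIMILARITY_THRESHOLD: Record<Exclude<DedupPolicy, 'off'>, number> = {
  loose: 0.95,
  strict: 0.99,
};

// Hypothetical client-side model of the similarity check.
function wouldMerge(similarity: number, policy: DedupPolicy): boolean {
  if (policy === 'off') return false;
  return similarity >= SIMILARITY_THRESHOLD[policy];
}
```

Under this model, two templated rows with a similarity of 0.97 would be collapsed by "loose" but kept apart by "strict", which is why "strict" can be enough for content that differs in only a few values.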
Streaming
For broad questions, synthesis can take 10–25 seconds. queryStream yields the answer incrementally so you can render it as it's produced instead of waiting for the full response:
for await (const event of engram.queryStream('Summarize what I worked on this week', { buckets: ['work'] })) {
if (event.type === 'delta') {
process.stdout.write(event.content);
} else if (event.type === 'done') {
console.log();
console.log(`Used ${event.usage?.output_tokens} tokens`);
}
}
Two frame types (discriminated by type):
type QueryStreamEvent =
| { type: 'delta'; content: string }
| { type: 'done'; usage?: QueryUsage; synthesis_usage?: unknown; explanation?: QueryExplanation };
- delta frames carry incremental synthesis output, in order. Zero or more.
- done is emitted exactly once at the end with final usage and explanation.
Break out of the for await loop to abort the request — the iterator's return() cancels the upstream fetch.
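The early-exit behaviour can be tried without a server by feeding a stand-in stream to a small consumer. In the sketch below, StreamEvent is a trimmed local copy of the event type, and collectAnswer, fakeStream, and the maxChars cut-off are hypothetical names introduced for illustration. The finally block in the generator plays the role that cancelling the upstream fetch plays in the real client.

```typescript
// Trimmed local copy of the event shape shown above.
type StreamEvent =
  | { type: 'delta'; content: string }
  | { type: 'done'; usage?: { output_tokens?: number } };

// Hypothetical consumer: accumulates deltas and stops early once maxChars
// is reached. Leaving the for await loop calls the iterator's return().
async function collectAnswer(
  stream: AsyncIterable<StreamEvent>,
  maxChars = Infinity,
): Promise<{ text: string; finished: boolean }> {
  let text = '';
  for await (const event of stream) {
    if (event.type === 'done') return { text, finished: true };
    text += event.content;
    if (text.length >= maxChars) return { text, finished: false };
  }
  return { text, finished: false }; // stream ended without a done frame
}

// Stand-in stream so the sketch runs without a server.
let cleanedUp = false;
async function* fakeStream(): AsyncGenerator<StreamEvent> {
  try {
    yield { type: 'delta', content: 'Hello, ' };
    yield { type: 'delta', content: 'world' };
    yield { type: 'done', usage: { output_tokens: 2 } };
  } finally {
    cleanedUp = true; // runs when the consumer exits the loop
  }
}
```

Calling collectAnswer(fakeStream(), 5) stops after the first delta, and the generator's finally block still runs, which mirrors how abandoning the loop releases the underlying request.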
Buckets
- listBuckets(): all buckets in your tenant.
- createBucket(name, description?)
- deleteBucket(bucket): no default; an explicit bucket is required (prevents accidental wipes).
Profile
- getProfile(bucket?): the canonical profile prepended to recall. bucket defaults to "default".
- regenerateProfile(bucket?): rebuild from current memories. bucket defaults to "default".
Errors
All HTTP failures throw EngramError:
import { EngramError } from '@lumetra/engram';
try {
await engram.storeMemory('...');
} catch (err) {
if (err instanceof EngramError) {
console.error(err.status, err.body);
}
}
Development
npm install
npm run typecheck
npm run build # emits dist/
npm run dev # watch mode
npm run prepublishOnly runs typecheck + clean build automatically when publishing.
License
MIT — Copyright (c) 2026 Lumetra.
