memory-runtime
Stateless context runtime for LLM applications.
memory-runtime is a stateless SDK that replaces the sliding-window chat history approach with structured state and snapshot-based context management. It reduces prompt sizes by roughly 80% while preserving high retrieval quality even in large sessions, with zero database dependencies.
Install
npm i memory-runtime
Quick Start
import { createSession } from "memory-runtime";
// Create or load session
let session = createSession({ sessionId: "user-123" });
// Ingest code, docs, or messages
session.ingest({
type: "snippet",
payload: {
source: "auth.ts",
content: "// authentication code...",
pinned: true // Won't be dropped when buffer fills
}
});
session.ingest({
type: "user_message",
payload: { content: "How does authentication work?" }
});
// Compile budget-aware prompt
const { messages, snapshot } = session.compile({
userMessage: "Explain the auth flow",
budgetTokens: 2000
});
// Call your LLM provider
const response = await openai.chat.completions.create({
model: "gpt-4o",
messages
});
// Observe response for state extraction
const { snapshot: updatedSnapshot } = session.observe({
assistantText: response.choices[0].message.content
});
// Store snapshot (your choice: DB, Redis, localStorage, etc.)
await storage.save("user-123", updatedSnapshot);
Stateless Architecture
No SQLite. No filesystem. Pure snapshots.
Every compile() and observe() call returns an updated snapshot—a JSON-serializable object containing:
- State: Extracted constraints, decisions, open threads, glossary
- Artifacts: Code snippets, diffs, doc chunks (with rolling buffer)
- Events: Message history (bounded)
- Meta: Custom metadata (e.g., KB ingestion tracking)
Your application owns storage. The library never touches the filesystem.
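To make the ownership model concrete, here is a minimal sketch of one stateless turn. The storage object and llm helper are hypothetical stand-ins for your own store and provider call:
import { createSession } from "memory-runtime";

async function handleTurn(sessionId: string, userMessage: string) {
  // Rehydrate from the previous turn's snapshot (undefined on the first turn)
  const previous = await storage.load(sessionId); // hypothetical store
  const session = createSession({ sessionId, snapshot: previous });

  // Compile a budget-aware prompt and call your provider
  const { messages } = session.compile({ userMessage, budgetTokens: 2000 });
  const assistantText = await llm(messages); // hypothetical provider call

  // Extract state, then persist the updated snapshot for the next turn
  const { snapshot } = session.observe({ assistantText });
  await storage.save(sessionId, snapshot);
  return assistantText;
}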
Snapshot Management
Where to Store Snapshots
- Server-side: PostgreSQL, Redis, or any database (a Redis sketch follows this list)
- Client-side: localStorage (compact mode recommended)
- Serverless: Pass snapshots in request/response payloads
- Encrypted cookies: For small sessions with compact mode
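As one server-side option, a Redis-backed store can be as small as this. A sketch assuming the ioredis client; the key prefix and 24-hour TTL are illustrative choices:
import Redis from "ioredis";

const redis = new Redis(); // connection details omitted

const storage = {
  async save(sessionId: string, snapshot: object) {
    // Snapshots are plain JSON, so they serialize directly
    await redis.set(`mr:snapshot:${sessionId}`, JSON.stringify(snapshot), "EX", 60 * 60 * 24);
  },
  async load(sessionId: string) {
    const raw = await redis.get(`mr:snapshot:${sessionId}`);
    return raw ? JSON.parse(raw) : undefined;
  },
};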
Security Note
⚠️ Snapshots contain user content — treat them as sensitive data. Encrypt at rest and in transit.
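For example, you could encrypt a serialized snapshot before it leaves your process. A minimal sketch using Node's built-in crypto module with AES-256-GCM; key management is up to you:
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

function encryptSnapshot(snapshot: object, key: Buffer): string {
  // key must be 32 bytes for aes-256-gcm; a 12-byte IV is the GCM standard
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([
    cipher.update(JSON.stringify(snapshot), "utf8"),
    cipher.final(),
  ]);
  // Pack iv + auth tag + ciphertext into one portable string
  return Buffer.concat([iv, cipher.getAuthTag(), ciphertext]).toString("base64");
}

function decryptSnapshot(blob: string, key: Buffer): object {
  const buf = Buffer.from(blob, "base64");
  const decipher = createDecipheriv("aes-256-gcm", key, buf.subarray(0, 12));
  decipher.setAuthTag(buf.subarray(12, 28));
  const plaintext = Buffer.concat([decipher.update(buf.subarray(28)), decipher.final()]);
  return JSON.parse(plaintext.toString("utf8"));
}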
Size Optimization
Use compact mode for smaller payloads:
const { snapshot } = session.compile({
userMessage: "...",
budgetTokens: 2000,
returnSnapshot: "compact" // Strips artifact content, keeps only IDs
});
Compact snapshots include:
- Full state (constraints, decisions, etc.)
- Artifact metadata (IDs, sources, timestamps) — content removed
- Last 10 events only
Typical sizes (see the measurement sketch after this list):
- Full snapshot: 50-200 KB (depending on artifacts)
- Compact snapshot: 5-20 KB (~90% smaller)
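These numbers vary by workload. You can measure your own sessions with exportSnapshot, as in this quick sketch:
const full = session.exportSnapshot("full");
const compact = session.exportSnapshot("compact");
console.log("full bytes:", Buffer.byteLength(JSON.stringify(full)));
console.log("compact bytes:", Buffer.byteLength(JSON.stringify(compact)));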
KB Ingestion Pattern
Track knowledge base changes via content hashing:
import { createHash } from 'crypto';
const kbHash = createHash('sha256').update(kbContent).digest('hex').slice(0, 16);
// Check if KB needs re-ingestion
if (snapshot.meta?.kbHash !== kbHash) {
// Ingest KB artifacts
session.ingest(makeSnippetArtifact(kbContent, 'kb.txt', { pinned: true }));
// Update metadata
snapshot.meta = { ...snapshot.meta, kbHash, kbIngested: true };
}
API Reference
createSession(options?)
Factory for creating sessions:
createSession({
sessionId?: string, // Optional ID (auto-generated if omitted)
snapshot?: Snapshot, // Load from existing snapshot
limits?: {
maxEvents?: number, // Default: 50
maxArtifacts?: number // Default: 50
}
})
Session.fromSnapshot(snapshot, limits?)
DX-friendly alternative:
import { Session } from "memory-runtime";
const session = Session.fromSnapshot(previousSnapshot);
session.ingest(event)
Ingest events with compile-time type safety:
// User message
session.ingest({
type: "user_message",
payload: { content: string }
});
// Assistant response
session.ingest({
type: "assistant_response",
payload: { content: string }
});
// Artifacts (code, docs, diffs)
session.ingest({
type: "snippet" | "doc_chunk" | "repo_diff",
payload: {
source: string,
content: string,
meta?: object,
pinned?: boolean // Pinned artifacts survive buffer churn
}
});
Helper functions for common artifacts:
import { makeSnippetArtifact, makeRepoDiffArtifact } from "memory-runtime";
const snippet = makeSnippetArtifact(code, "file.ts", { startLine: 10, endLine: 20 });
session.ingest(snippet);
const diff = makeRepoDiffArtifact(gitDiffOutput, { repoPath: "/path/to/repo" });
session.ingest(diff);
session.compile(options)
Generate messages within token budget:
const result = session.compile({
userMessage: string,
budgetTokens: number,
stablePrefix?: string, // Optional system prompt prefix
returnSnapshot?: "full" | "compact"
});
// Returns:
{
messages: Message[], // Ready for LLM API
debug: {
includedArtifacts: string[], // IDs of included artifacts
droppedArtifacts: string[], // IDs of dropped artifacts
tokenEstimate: number, // Estimated tokens (never exceeds budget)
rationale: string // Explanation of selection
},
snapshot: Snapshot // Updated snapshot
}
session.observe(input)
Extract structured state from assistant responses:
const result = session.observe({
assistantText: string,
returnSnapshot?: "full" | "compact"
});
// Returns:
{
snapshot: Snapshot // State updated with extracted constraints/decisions/etc.
}
Extraction markers (optional, for structured updates; see the example after this list):
Decision: Use JWT for authentication
Constraint: Tokens must expire after 30 minutes
Open: How should we handle refresh tokens?
Glossary: JWT - JSON Web Token for stateless authentication
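For instance, an assistant reply containing marker lines is folded into structured state by observe(). A sketch; the exact field names inside the snapshot's state are assumed to mirror the categories listed under Stateless Architecture:
const { snapshot } = session.observe({
  assistantText: [
    "We'll go with JWTs for now.",
    "Decision: Use JWT for authentication",
    "Constraint: Tokens must expire after 30 minutes",
    "Open: How should we handle refresh tokens?"
  ].join("\n")
});
// The decision, constraint, and open thread above now live in the
// snapshot's state and will be considered by future compile() calls.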
session.exportSnapshot(mode?)
Export current snapshot:
const fullSnapshot = session.exportSnapshot("full"); // Complete snapshot
const compactSnapshot = session.exportSnapshot("compact"); // Minimal payload
session.clear()
Reset state/artifacts, preserve sessionId and meta:
session.clear();
Not Summarization
This is not an LLM-based summarization script. Instead, memory-runtime uses a deterministic compilation engine:
- State: Structured records of decisions, constraints, glossary terms
- Artifacts: Content-addressed storage for file snapshots and diffs
- Budgeting: Deterministic truncation and prioritization that fits your token limit
Determinism guarantee: Same snapshot + same inputs = identical output.
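Because compilation is deterministic, you can assert this property directly in a test. A sketch, assuming savedSnapshot is a previously exported snapshot:
import { Session } from "memory-runtime";

const a = Session.fromSnapshot(savedSnapshot)
  .compile({ userMessage: "Explain the auth flow", budgetTokens: 2000 });
const b = Session.fromSnapshot(savedSnapshot)
  .compile({ userMessage: "Explain the auth flow", budgetTokens: 2000 });

// Same snapshot + same inputs must produce byte-identical messages
console.assert(JSON.stringify(a.messages) === JSON.stringify(b.messages));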
Examples
See examples/stateless-conversation.ts for a complete 3-turn demo showing:
- Snapshot serialization (JSON.stringify/parse)
- KB ingestion with hash tracking
- Compact vs full snapshot modes
- State persistence across turns
Run it:
npm run build
tsx examples/stateless-conversation.ts
Testing
# Determinism: same snapshot + inputs = same output
tsx scripts/test-determinism.ts
# Budget enforcement: never exceed budgetTokens
tsx scripts/test-budget.ts
# Pinning: pinned artifacts survive buffer churn
tsx scripts/test-pinning.ts
# Compact mode: verify size reduction
tsx scripts/test-snapshot-size.ts
License: MIT
