
@munesoft/memoryx

Memory for AI agents in one line. Short-term + long-term memory, semantic recall, prompt-ready context, automatic compression. No embeddings. No vector DB. No setup. Zero dependencies.


import memoryx from "@munesoft/memoryx";

const memory = memoryx();

await memory.remember("User prefers dark mode");
const context = await memory.context("user");

That's it. Six verbs total. Works in Node, the browser, any JS runtime. Production-ready.


TL;DR (for humans and LLMs)

  • What it is: A universal memory layer for AI agents and JavaScript apps.
  • Why use it: Give stateless LLMs continuity in one line. No vector DB. No embedding setup. No token-counting boilerplate.
  • What's special: Real auto-compression — old low-signal entries fold into digests instead of overflowing your prompt budget. 78–84% token savings on typical agent logs.
  • Install: npm install @munesoft/memoryx
  • API: remember() · recall() · context() · forget() · clear() · compress()
  • Storage: in-memory (default) · file · Redis · custom adapter

Why memoryx?

AI agents are stateless. Every prompt is a blank slate. memoryx gives them continuity — without the complexity tax that vector DBs and embedding pipelines bring.

Three verbs you'll use 100x a day:

await memory.remember(data);              // store anything
await memory.recall("topic");              // search and rank
await memory.context("scope");             // prompt-ready, budget-aware

Two more for cleanup:

await memory.forget("topic");              // remove matching entries
await memory.clear();                      // wipe everything

One more for scale:

await memory.compress({ olderThan: "7d" }); // fold old entries into digests

That's the entire API. No BufferMemory vs VectorStoreMemory vs SummaryMemory. No embedding setup. No token-counting boilerplate. You write remember and recall because that's what you mean.


Quick start

npm install @munesoft/memoryx

import memoryx from "@munesoft/memoryx";

const memory = memoryx();

// Store anything — strings, objects, arrays, primitives
await memory.remember("User prefers dark mode");
await memory.remember({ user: "alice", plan: "pro" });
await memory.remember("Meeting at 3pm Tuesday", { tags: ["calendar"] });

// Recall — substring + keyword + fuzzy, ranked by relevance + recency + importance
const prefs = await memory.recall("user preferences");

// Build prompt-ready context — recency-pinned, budget-aware
const context = await memory.context("user");

// Use it in any LLM call
const response = await llm.chat({
  system: `Relevant memory:\n${context}`,
  user: userMessage,
});

The killer feature: real compression

Most memory libraries grow until they overflow your prompt budget. memoryx automatically folds old, low-signal entries into compressed digests that preserve the high-importance facts.

const memory = memoryx({
  longTermLimit: 1000,    // hit this and compression triggers automatically
  compress: true,         // default
});

// ...500 agent turns later...

const result = await memory.compress({ olderThan: "7d" });
// → { compressed: 423, digests: 6, tokensSaved: 11700 }

Real benchmark numbers (benchmarks/compression-bench.mjs, 500 typical agent log entries):

| Mode | Entries | Tokens | Saved |
|---|---|---|---|
| No compression | 500 | ~15,000 | — |
| Manual compress() | 101 | ~3,300 | 78% |
| Auto-compression at limit | 61 | ~2,388 | 84% |

All 10 high-importance entries remained recallable after compression.
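
To see what "recallable" means in practice, here is a minimal sketch (the entry text and importance values are illustrative, not taken from the benchmark):

await memory.remember("Production deploys are frozen on Fridays", { importance: 0.95 });
// ...hundreds of low-importance entries later...
await memory.compress({ olderThan: "7d" });

// The high-importance fact should still surface, either as the original
// entry or folded into a searchable digest.
const hits = await memory.recall("deploy freeze");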

Compression is extractive by default — no LLM call, no API key, no latency hit. Plug in your own summarizer when you want richer compression:

const memory = memoryx({
  summarizer: async (entries) => {
    const response = await openai.chat.completions.create({
      model: "gpt-4o-mini",
      messages: [{
        role: "user",
        content: `Summarize:\n${entries.map(e => e.text).join("\n")}`,
      }],
    });
    return response.choices[0].message.content;
  },
});

Features

  • One-line setup — memoryx() and you're done
  • 🧠 Multi-layer memory — short-term, long-term, session
  • 🔍 Semantic recall — substring + Jaccard + fuzzy (no embeddings required)
  • 📝 Context builder — context(scope) returns prompt-ready, budget-aware text
  • 🗜️ Auto compression — old entries fold into digests, not silent eviction
  • TTL — entries expire automatically
  • 🏷 Namespaces + tags — slice memory cleanly
  • Importance scoring — pin critical facts so they survive compression
  • 🔁 Auto-deduplication — duplicate or near-duplicate entries get merged
  • 🪝 Hooks — onRemember, onRecall, onCompress, onForget, onClear
  • 🔌 Pluggable storage — memory (default), file, Redis, or your own adapter
  • 🔌 Pluggable summarizer — extractive default, plug in any LLM
  • 📦 ESM + CJS — tree-shakable, fully typed
  • 📦 Zero dependencies — runs in Node, the browser, anywhere
  • 🚀 Fast — sub-3ms recall on 1k-entry in-memory stores
  • 🤖 AI-friendly API — intent-named verbs that LLMs generate correctly

API

memoryx(options?)

const memory = memoryx({
  ai: true,                    // prompt-overflow-aware context building
  shortTerm: true,             // enable short-term layer (default true)
  longTerm: true,              // enable long-term layer (default true)
  session: true,               // enable session layer (default true)
  store: "memory",             // "memory" | "file" | { type: "redis", client }
  namespace: "default",        // default namespace
  shortTermLimit: 100,         // promotes to long-term beyond this
  longTermLimit: 1000,         // compresses (or evicts) beyond this
  contextBudget: 2000,         // approximate tokens for context()
  dedupe: true,                // auto-merge near-duplicates
  dedupeThreshold: 0.85,       // similarity for dedupe (0–1)
  compress: true,              // auto-compress on overflow (default true)
  summarizer: extractive,      // pluggable — see "compression"
  hooks: { onRemember, onRecall, onCompress, onForget, onClear },
});

remember(data, options?)

Store anything. Returns the entry id.

await memory.remember(data, {
  ttl: 60_000,                 // expire after 60 seconds
  tags: ["user", "preference"],
  importance: 0.9,             // 0–1 — high-importance survives compression
  namespace: "ecommerce",
  layer: "long",               // "short" | "long" | "session"
});

recall(query, options?)

Search and rank. Returns scored results.

const results = await memory.recall("user preferences", {
  limit: 10,
  minScore: 0.2,
  namespace: "ecommerce",
  recentFirst: false,
  tags: ["preference"],
  layer: "long",
});

// [{ entry, data, score, scores: { relevance, recency, importance } }]

Scoring blends substring containment, Jaccard token overlap, and fuzzy token similarity for relevance, then combines with recency (7-day half-life) and importance.
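
As a rough mental model, the blend looks something like the sketch below. This is illustrative only: the weights, the entry fields (text, createdAt, importance), and the omission of the fuzzy term are assumptions, not the library's actual internals.

function tokenize(text) {
  return text.toLowerCase().split(/\W+/).filter(Boolean);
}

function jaccard(a, b) {
  const A = new Set(tokenize(a));
  const B = new Set(tokenize(b));
  const shared = [...A].filter(t => B.has(t)).length;
  return shared / (A.size + B.size - shared || 1);
}

const HALF_LIFE_MS = 7 * 24 * 60 * 60 * 1000; // 7-day recency half-life, per the docs

function score(query, entry, now = Date.now()) {
  const substring = entry.text.toLowerCase().includes(query.toLowerCase()) ? 1 : 0;
  const relevance = Math.max(substring, jaccard(query, entry.text)); // fuzzy term elided
  const recency = 0.5 ** ((now - entry.createdAt) / HALF_LIFE_MS);
  return 0.6 * relevance + 0.2 * recency + 0.2 * entry.importance;   // hypothetical weights
}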

context(scope?, options?)

Build prompt-ready text under a token budget.

const ctx = await memory.context("user preferences", {
  budget: 500,
  namespace: "user",
  recentCount: 3,              // always include the N most-recent entries
});

When scope is given, entries relevant to it rank higher. The output is plain text, one entry per line, ready to splice into a prompt.

compress(options?)

Manually compress old entries into digests.

const result = await memory.compress({
  olderThan: "7d",             // number (ms) or duration: "60s", "1h", "7d"
  namespace: "chat",
  minBatch: 3,
});
// → { compressed: 423, digests: 6, tokensSaved: 11700 }

Compression also runs automatically when longTermLimit is exceeded.
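
A small usage sketch: you can observe automatic compression firing through the documented onCompress hook (see Hooks below).

const memory = memoryx({
  longTermLimit: 1000,
  compress: true, // default
  hooks: {
    onCompress: (result) =>
      console.log(`folded ${result.compressed} entries into ${result.digests} digests, saving ~${result.tokensSaved} tokens`),
  },
});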

forget(query) / clear(namespace?)

await memory.forget("temporary scratch notes");
await memory.clear();              // wipe all
await memory.clear("ecommerce");   // wipe one namespace

stream(chunk, options?)

Capture streaming AI output without losing partials:

for await (const chunk of llmStream) {
  await memory.stream(chunk, { streamId: "response-42" });
}
const full = await memory.recall("...", { tags: ["stream:response-42"] });

For AI agents

This section is written so an LLM agent (Claude, GPT, Cursor, Claude Code, etc.) generating code that uses memoryx will produce correct, idiomatic calls on the first try.

The canonical pattern:

import memoryx from "@munesoft/memoryx";

const memory = memoryx({ ai: true });

// At the start of every turn:
const context = await memory.context(userMessage, { budget: 500 });

// After every turn:
await memory.remember(`User: ${userMessage}`, { tags: ["chat"] });
await memory.remember(`Assistant: ${reply}`, { tags: ["chat"], importance: 0.7 });

When to set importance (see the sketch after this list):

  • 0.9–1.0 — explicit user preferences, system facts, critical decisions
  • 0.5 (default) — normal observations
  • 0.1–0.3 — low-signal logs, intermediate scratch notes
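
A quick sketch of those tiers in practice (the facts themselves are placeholders):

await memory.remember("User's name is Alice; always reply in French", { importance: 0.95 });
await memory.remember("User opened the settings page");                  // default 0.5
await memory.remember("GET /healthz returned 200", { importance: 0.2 }); // low-signal log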

When to use tags:

  • Group related entries (["chat"], ["task:onboarding"], ["error"])
  • Filter recall (memory.recall("...", { tags: ["chat"] }))

When to use namespace:

  • Per-user isolation (namespace: userId)
  • Per-feature isolation (namespace: "auth" vs namespace: "billing")

When to call compress():

  • After ~500 entries in a long-running agent, or
  • On a timer (setInterval(() => memory.compress({ olderThan: "1h" }), 60_000 * 60))

Common mistakes to avoid:

  • ❌ memory.recall(entireConversation) — pass a topic, not the full text
  • ❌ await memory.remember(JSON.stringify(obj)) — just pass obj; memoryx handles serialization
  • ❌ Building context manually from recall() results — use context(), it ranks and budgets for you

Storage adapters

In-memory (default)

const memory = memoryx();  // uses MemoryAdapter

File (Node.js)

const memory = memoryx({ store: "file" });
const memory = memoryx({ store: { type: "file", path: "./memory.json" } });

Writes are debounced (50ms) to avoid I/O thrashing.

Redis

memoryx is zero-dependency, so you bring your own client:

import Redis from "ioredis";

const memory = memoryx({
  store: {
    type: "redis",
    client: new Redis(),
    prefix: "myapp:memory",
  },
});

Works with ioredis, node-redis, Upstash, or anything implementing get / set / del / keys.
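
For instance, a Map-backed test double can stand in for a real Redis client during tests. A hedged sketch: whether memoryx calls anything beyond get / set / del / keys is an assumption here, and the glob handling is deliberately naive.

const mapClient = {
  data: new Map(),
  async get(key) { return this.data.get(key) ?? null; },
  async set(key, value) { this.data.set(key, value); return "OK"; },
  async del(key) { return this.data.delete(key) ? 1 : 0; },
  async keys(pattern) {
    const prefix = pattern.replace(/\*$/, ""); // supports trailing-* patterns only
    return [...this.data.keys()].filter(k => k.startsWith(prefix));
  },
};

const memory = memoryx({ store: { type: "redis", client: mapClient, prefix: "test" } });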

Custom

import type { StorageAdapter, MemoryEntry } from "@munesoft/memoryx";

class MyAdapter implements StorageAdapter {
  // A minimal Map-backed example implementation, keyed by entry id.
  private entries = new Map<string, MemoryEntry>();

  async get(id: string): Promise<MemoryEntry | undefined> { return this.entries.get(id); }
  async set(entry: MemoryEntry): Promise<void> { this.entries.set(entry.id, entry); }
  async delete(id: string): Promise<boolean> { return this.entries.delete(id); }
  async clear(): Promise<void> { this.entries.clear(); }
  async all(): Promise<MemoryEntry[]> { return [...this.entries.values()]; }
  async size(): Promise<number> { return this.entries.size; }
}

const memory = memoryx({ store: new MyAdapter() });

Hooks

const memory = memoryx({
  hooks: {
    onRemember: (entry) => console.log("stored", entry.id),
    onRecall: (query, results) => console.log("recalled", query, results.length),
    onCompress: (result) => console.log("compressed", result.tokensSaved, "tokens"),
    onForget: (ids) => console.log("forgot", ids),
    onClear: () => console.log("cleared"),
  },
});

Hooks may be async. Errors thrown inside a hook are swallowed — they never crash your application.


Use cases

AI agents and chatbots

Give your agent memory across turns, sessions, and runs. Pull context(userMessage) at the start of every turn, remember() the result at the end. Done.

Personalization

Track user preferences, recent behavior, and patterns without standing up a database. Works in the browser too.

RAG augmentation

Use memoryx as your working memory alongside a vector DB. Vector DB for semantic doc retrieval, memoryx for short-term conversational state.
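
A hedged sketch of that split, where vectorStore is a hypothetical stand-in for your retrieval client and llm for your model client:

const docs = await vectorStore.search(userMessage, { topK: 5 });  // semantic doc retrieval
const state = await memory.context(userMessage, { budget: 500 }); // conversational state

const reply = await llm.chat({
  system: `Docs:\n${docs.map(d => d.text).join("\n\n")}\n\nMemory:\n${state}`,
  user: userMessage,
});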

Workflow memory

Store intermediate results in long-running pipelines. TTL + namespaces keep different stages cleanly separated.

Browser-based agents

Zero dependencies, no Node-only APIs in the default path. Drop it into a Chrome extension, an Electron app, or a frontend SPA.


memoryx vs alternatives

| | memoryx | LangChain memory | mem0 | Vector DB |
|---|---|---|---|---|
| Setup | 1 line | many classes | API key + setup | infra + embeddings |
| Embeddings required | ❌ no | ⚠️ usually | ✅ yes | ✅ yes |
| Vector DB required | ❌ no | ⚠️ for vector memory | ⚠️ managed | ✅ yes |
| Auto compression | ✅ built-in | ❌ manual | ⚠️ partial | ❌ |
| Browser support | ✅ yes | ⚠️ limited | ❌ no | ❌ no |
| Dependencies | 0 | many | several | many |
| Cost at small scale | $0 | $0 | API costs | infra costs |

When to use memoryx: AI agents, chatbots, personalization, browser memory, working memory in RAG pipelines, prototyping.

When to use a vector DB instead: Document retrieval over millions of chunks, semantic search across an entire knowledge base, RAG over static corpora.

You can use both. memoryx for working memory, a vector DB for knowledge retrieval.


Performance

Storing 1000 entries…
  remember × 1000     ~150ms total

Running 1000 recalls on a 1000-entry store…
  recall (limit=10)   ~2ms / op
  recall (limit=1)    ~2ms / op

Compression saves 78%–84% of tokens on typical agent logs while keeping all high-importance entries recallable.


TypeScript

Full types are bundled. Strict mode clean.

import memoryx, {
  Memory,
  MemoryAdapter,
  FileAdapter,
  RedisAdapter,
  extractiveSummarizer,
  type MemoryEntry,
  type MemoryxOptions,
  type MemoryOptions,
  type RecallOptions,
  type RecallResult,
  type CompressOptions,
  type CompressResult,
  type StorageAdapter,
  type Summarizer,
} from "@munesoft/memoryx";

FAQ

Do I need OpenAI / embeddings / a vector DB?

No. memoryx uses substring + Jaccard + fuzzy matching by default. It works offline, in the browser, and at $0 cost. You can plug in an LLM-backed summarizer if you want richer compression, but it's optional.

How is this different from LangChain memory?

LangChain has multiple memory classes (BufferMemory, VectorStoreMemory, SummaryMemory, CombinedMemory) and pushes you toward vector DBs and embeddings early. memoryx is one class, one mental model, and zero infrastructure by default. You scale up only when you need to.

How is this different from mem0?

mem0 is a hosted service with API costs and embedding requirements. memoryx is a local library with zero dependencies. Different tradeoffs — use mem0 if you want managed infrastructure, memoryx if you want full control.

Does it work in the browser?

Yes. The default in-memory adapter has no Node-specific APIs. The file adapter is Node-only and degrades gracefully in the browser.

How does compression actually work?

When long-term storage exceeds longTermLimit, the lowest-signal entries (low importance + low recency) are folded into a single digest entry that preserves the highest-importance content. The originals are removed; the digest stays searchable. You can also trigger this manually with memory.compress({ olderThan: "7d" }).

What if I want LLM-quality summaries instead of extractive?

Pass a summarizer option:

const memory = memoryx({
  summarizer: async (entries) => {
    // your LLM call here
    return summary;
  },
});

Is the data encrypted at rest?

No — memoryx is a memory layer, not a security product. If you're storing sensitive data, encrypt it before passing to remember(), or use a custom adapter that handles encryption.
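
A minimal sketch of the custom-adapter approach, wrapping any inner StorageAdapter with AES-256-GCM. Two assumptions here that are not confirmed API: that a MemoryEntry keeps its payload in a data field, and that MemoryAdapter can be constructed with no arguments.

import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";
import memoryx, { MemoryAdapter } from "@munesoft/memoryx";

class EncryptedAdapter {
  constructor(inner, key) {
    this.inner = inner; // any StorageAdapter
    this.key = key;     // 32-byte Buffer; derive and store it securely in practice
  }
  encrypt(text) {
    const iv = randomBytes(12);
    const cipher = createCipheriv("aes-256-gcm", this.key, iv);
    const body = Buffer.concat([cipher.update(text, "utf8"), cipher.final()]);
    return Buffer.concat([iv, cipher.getAuthTag(), body]).toString("base64");
  }
  decrypt(blob) {
    const raw = Buffer.from(blob, "base64");
    const decipher = createDecipheriv("aes-256-gcm", this.key, raw.subarray(0, 12));
    decipher.setAuthTag(raw.subarray(12, 28)); // 16-byte GCM auth tag
    return Buffer.concat([decipher.update(raw.subarray(28)), decipher.final()]).toString("utf8");
  }
  async set(entry) {
    await this.inner.set({ ...entry, data: this.encrypt(JSON.stringify(entry.data)) });
  }
  async get(id) {
    const entry = await this.inner.get(id);
    return entry && { ...entry, data: JSON.parse(this.decrypt(entry.data)) };
  }
  async delete(id) { return this.inner.delete(id); }
  async clear() { return this.inner.clear(); }
  async all() {
    return (await this.inner.all()).map(e => ({ ...e, data: JSON.parse(this.decrypt(e.data)) }));
  }
  async size() { return this.inner.size(); }
}

const memory = memoryx({ store: new EncryptedAdapter(new MemoryAdapter(), randomBytes(32)) });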


Philosophy

Memory turns AI from reactive to intelligent.

memoryx is opinionated about being unopinionated:

  • No required setup — useful in 1 line, scales when you need it
  • No external services — works offline, in the browser, anywhere JS runs
  • No vector DB required — keyword + fuzzy is enough for most agent memory
  • No silent data loss — compression preserves signal; eviction is the fallback, not the default
  • Drop-in scaling — swap the adapter, not the API

The future isn't frameworks controlling everything. It's small, composable primitives used everywhere.


License

MIT © munesoft