# @mzhub/cortex

v0.1.3
Persistent memory for AI agents - A tiered memory system with fact extraction and conflict resolution
## The Problem
AI agents forget.
Not sometimes. Always.
Every conversation starts from zero. Every user has to re-explain themselves. Every preference is lost the moment the session ends.
```text
Monday   User: "I'm allergic to peanuts"
         Bot:  "Noted!"

Friday   User: "What snack should I get?"
         Bot:  "Try our peanut butter cups!"
```

This is the default behavior of every LLM. They have no memory. Only context windows that reset.
## Why Current Memory Systems Fail
The common solution is a vector database. Store everything as embeddings. Retrieve by similarity.
This fails silently when facts change.
```text
March  User: "I work at Google"
       → Stored as embedding ✓

June   User: "I just joined Microsoft"
       → Also stored as embedding ✓

July   User: "Where do I work?"
       → Vector search returns BOTH
       → LLM sees contradictory information
       → Hallucinates or hedges
```

The core issue:
| What vectors do    | What memory requires   |
| ------------------ | ---------------------- |
| Find similar text  | Track current truth    |
| Retrieve matches   | Replace outdated facts |
| Rank by similarity | Resolve contradictions |
Vector databases answer: "What text matches this query?"
They cannot answer: "What is true about this user right now?"
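The difference can be sketched in a few lines of TypeScript. This is an illustrative toy, not the cortex API: an append-only store keeps every version of a changed fact, while a keyed store overwrites the old value.

```typescript
// Toy comparison (not the cortex API): append-only storage vs. keyed facts.

// Append-only: both versions of the fact survive and both match a query.
const vectorStore: string[] = [];
vectorStore.push("User works at Google");
vectorStore.push("User works at Microsoft");
const matches = vectorStore.filter((t) => t.includes("works at"));
// matches holds BOTH statements, so the LLM sees contradictory "truths"

// Keyed facts: setting the same key replaces the old value.
const facts = new Map<string, string>();
facts.set("employer", "Google");
facts.set("employer", "Microsoft"); // overwrite, not append
// facts.get("employer") is the single current truth
```

Real vector retrieval ranks by embedding similarity rather than substring match, but the failure mode is the same: nothing in the store knows that one statement supersedes the other.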
## The Solution: Brain-Inspired Architecture
cortex doesn't just store facts. It thinks like a brain.
```text
┌─────────────────────────────────────────────────────────────┐
│                        User Message                         │
└──────────────────────────┬──────────────────────────────────┘
                           │
            ┌──────────────▼──────────────┐
            │        🧠 FAST BRAIN        │
            │         (Your LLM)          │
            │                             │
            │  • Reasoning                │
            │  • Conversation             │
            │  • Immediate responses      │
            └──────────────┬──────────────┘
                           │
            ┌──────────────▼──────────────┐
            │      Response to User       │ ◄── Returns immediately
            └──────────────┬──────────────┘
                           │
                           │ (async, non-blocking)
                           ▼
            ┌─────────────────────────────┐
            │        🔄 SLOW BRAIN        │
            │           (cortex)          │
            │                             │
            │  • Extract facts            │
            │  • Detect contradictions    │
            │  • Synthesize patterns      │
            │  • Consolidate memories     │
            └─────────────────────────────┘
```

### Built-In Brain Components
| Component               | Biological Equivalent  | What It Does                                            |
| ----------------------- | ---------------------- | ------------------------------------------------------- |
| Importance Scoring      | Amygdala               | Safety-critical facts (allergies) are never forgotten   |
| Episodic Memory         | Hippocampus            | Links facts to conversations ("when did I learn this?") |
| Hebbian Learning        | Neural Plasticity      | Frequently accessed facts get stronger                  |
| Deep Sleep              | Sleep Consolidation    | Synthesizes patterns across conversations               |
| Memory Stages           | Short/Long-term Memory | Facts progress from temporary → permanent               |
| Contradiction Detection | Prefrontal Cortex      | Flags conflicting information in real time              |
| Knowledge Graph         | Associative Cortex     | Links related facts together                            |
| Behavioral Prediction   | Pattern Recognition    | Detects user habits and preferences                     |
Learn about the brain architecture →
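The "Hebbian Learning" and "Deep Sleep" components can be pictured with a small sketch. The constants and field names here are hypothetical, not cortex internals: retrieval reinforces a memory trace, and a consolidation pass fades traces that were not used.

```typescript
// Hypothetical sketch of Hebbian strengthening and decay (not cortex internals).
interface MemoryTrace {
  value: string;
  strength: number;   // 0..1, determines how strongly the fact competes for context
  accessCount: number;
}

// Each retrieval reinforces the trace, capped at 1.0.
function access(trace: MemoryTrace): MemoryTrace {
  return {
    ...trace,
    accessCount: trace.accessCount + 1,
    strength: Math.min(1, trace.strength + 0.1),
  };
}

// Consolidation ("deep sleep") fades traces that were not reinforced.
function decay(trace: MemoryTrace): MemoryTrace {
  return { ...trace, strength: trace.strength * 0.9 };
}

let diet: MemoryTrace = { value: "vegan", strength: 0.5, accessCount: 0 };
diet = decay(access(diet)); // accessed once, then one sleep cycle
```

The effect is that frequently referenced facts stay near the top of the context budget while stale ones gradually drop out instead of accumulating forever.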
## Quick Start

### Install

```bash
npm install @mzhub/cortex
```

### Use
```js
import { MemoryOS, JSONFileAdapter } from "@mzhub/cortex";

const memory = new MemoryOS({
  llm: { provider: "openai", apiKey: process.env.OPENAI_API_KEY },
  adapter: new JSONFileAdapter({ path: "./.cortex" }),
});

async function chat(userId, message) {
  // 1. Ask: "What do I know about this user?"
  const context = await memory.hydrate(userId, message);

  // 2. Include it in your LLM call
  const response = await yourLLM({
    system: context.compiledPrompt,
    user: message,
  });

  // 3. Learn from this conversation (non-blocking)
  memory.digest(userId, message, response);

  return response;
}
```

That's it. The agent now remembers.
## Optional: Hierarchical Memory (HMM)
For advanced use cases, enable the Memory Pyramid — compressing thousands of facts into wisdom.
```js
import { HierarchicalMemory } from "@mzhub/cortex";

const hmm = new HierarchicalMemory(adapter, provider, { enabled: true });

// Top-down retrieval: wisdom first, details only if needed
const { coreBeliefs, patterns, facts } = await hmm.hydrateHierarchical(userId);

// Compress facts into patterns ("User is health-conscious")
await hmm.synthesizePatterns(userId);
```

The Memory Pyramid:
```text
Level 4: Core Beliefs (BIOS)
────────────────────────────
• Allergies, identity, safety rules
• ALWAYS loaded, never forgotten

Level 3: Patterns (Wisdom)
────────────────────────────
• "User is health-conscious"
• Synthesized from many facts
• 1 token instead of 50

Level 2: Facts (Knowledge)
────────────────────────────
• "User ate salad on Tuesday"
• Standard discrete facts

Level 1: Raw Logs (Stream)
────────────────────────────
• Ephemeral conversation buffer
• Auto-flushed after extraction
```

## Before and After
### Without cortex
```text
User: "Recommend a restaurant"
Bot:  "What kind of food do you like?"
User: "I told you last week, I'm vegan"
Bot:  "Sorry, I don't have memory of previous conversations"
```

- Token-heavy prompts (full history)
- Repeated clarifications
- Inconsistent behavior
- User frustration
### With cortex
```text
User: "Recommend a restaurant"
Bot:  "Here are some vegan spots near Berlin..."
```

- Preferences remembered
- Facts updated when they change
- Critical info never forgotten
- Predictable behavior
## What Gets Stored
cortex stores facts, not chat logs.
```text
┌─────────────────────────────────────────────────────────────┐
│ User: [email protected]                                     │
├───────────────┬─────────────────────────────────────────────┤
│ name          │ John (importance: 5)                        │
│ diet          │ vegan (importance: 7)                       │
│ location      │ Berlin (importance: 5)                      │
│ allergies     │ peanuts (importance: 10)                    │
│ PATTERN       │ health-conscious (importance: 7)            │
├───────────────┴─────────────────────────────────────────────┤
│ Memory Stage: long-term │ Access Count: 47 │ Sentiment: +   │
└─────────────────────────────────────────────────────────────┘
```

When facts change, they are replaced, not appended. Critical facts (importance ≥ 9) are always included in context.
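The inclusion rule can be sketched as follows. The threshold and data shapes are assumptions for illustration, not cortex internals: facts at importance ≥ 9 bypass the context budget entirely, and the rest compete on score.

```typescript
// Hypothetical sketch of importance-gated context selection (not the cortex API).
interface StoredFact {
  key: string;
  value: string;
  importance: number; // 1-10
}

function selectForContext(facts: StoredFact[], budget: number): StoredFact[] {
  // Critical facts are always included, regardless of budget.
  const critical = facts.filter((f) => f.importance >= 9);
  // Remaining facts compete for whatever budget is left.
  const rest = facts
    .filter((f) => f.importance < 9)
    .sort((a, b) => b.importance - a.importance)
    .slice(0, Math.max(0, budget - critical.length));
  return [...critical, ...rest];
}

const stored: StoredFact[] = [
  { key: "allergies", value: "peanuts", importance: 10 },
  { key: "diet", value: "vegan", importance: 7 },
  { key: "name", value: "John", importance: 5 },
  { key: "location", value: "Berlin", importance: 5 },
];
const ctx = selectForContext(stored, 2);
// "allergies" survives even with a budget of 2
```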
## Safety and Cost Considerations

### Security
| Risk                        | Mitigation                            |
| --------------------------- | ------------------------------------- |
| Prompt injection via memory | Content scanning, XML safety wrapping |
| PII storage                 | Detection and optional redaction      |
| Cross-user leakage          | Strict user ID isolation              |
| Forgetting critical info    | Importance scoring (amygdala pattern) |
Built-in Protections:
```js
// Prompt injection is mitigated automatically.
// Memory content is XML-escaped and wrapped with safety instructions.
const context = await memory.hydrate(userId, message);
// context.compiledPrompt contains:
// <memory_context type="data" trusted="false">
//   [escaped content - injection patterns are neutered]
// </memory_context>

// PII detection warns in debug mode
const memory = new MemoryOS({
  llm: { provider: "openai", apiKey: "..." },
  options: { debug: true }, // Enables PII warnings
});

// Path traversal attacks are blocked:
// userId "../../../etc/passwd" becomes safe "______etc_passwd"
```

### Cost Control
| Risk                     | Mitigation                                |
| ------------------------ | ----------------------------------------- |
| Runaway extraction costs | Daily token/call budgets                  |
| Token bloat from memory  | Hierarchical retrieval (patterns > facts) |
| Stale data accumulation  | Memory consolidation + automatic decay    |
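The first row of the table amounts to a simple guard. The field names and limits below are assumptions for illustration, not the library's internals: before each extraction, check the user's running daily totals against the configured caps.

```typescript
// Hypothetical sketch of a per-user daily budget check (not cortex internals).
interface DailyUsage {
  tokens: number;
  extractions: number;
}

function allowExtraction(
  usage: DailyUsage,
  estimatedTokens: number,
  limits = { maxTokens: 100_000, maxExtractions: 100 }
): boolean {
  return (
    usage.tokens + estimatedTokens <= limits.maxTokens &&
    usage.extractions + 1 <= limits.maxExtractions
  );
}

const today: DailyUsage = { tokens: 99_500, extractions: 10 };
// A 400-token extraction still fits under the cap; a 600-token one does not.
```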
```js
// Built-in budget limits
const budget = new BudgetManager({
  maxTokensPerUserPerDay: 100000,
  maxExtractionsPerUserPerDay: 100,
});
```

### Reliability
Provider Resilience:
```js
// All LLM providers include automatic:
// - 30 second timeout (configurable)
// - 3 retry attempts with exponential backoff
// - Retry on 429, 500, 502, 503, 504 status codes
const memory = new MemoryOS({
  llm: {
    provider: "openai",
    apiKey: process.env.OPENAI_API_KEY,
    // Optional: customize retry behavior
    retry: {
      timeoutMs: 60000, // 60 second timeout
      maxRetries: 5, // 5 attempts
      retryDelayMs: 2000, // Start with 2s delay
    },
  },
});
```

Configuration Validation:
```js
// Invalid config is caught immediately, not at runtime
new MemoryOS({
  llm: { provider: "fake", apiKey: "" },
});
// Throws: "MemoryOS: config.llm.provider 'fake' is not supported.
//          Valid providers: openai, anthropic, gemini, groq, cerebras."

new MemoryOS({
  llm: { provider: "openai", apiKey: "" },
});
// Throws: "MemoryOS: config.llm.apiKey is required.
//          Get your API key from your LLM provider..."
```

PostgreSQL Race Condition Protection:
```js
// A unique constraint prevents duplicate facts from concurrent digest() calls.
// It is created automatically on PostgresAdapter initialization.
```

## Who This Is For
**Good fit:**
- AI agents with recurring users
- Support bots that need context
- Personal assistants
- Workflow automation (n8n, Zapier)
- Any system where users expect to be remembered
**Not a fit:**
- One-time chat interactions
- Document search / RAG
- Stateless demos
- Replacing vector databases entirely
cortex complements vectors. It does not replace them.
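That division of labor can be sketched as follows. Every name here is hypothetical, not the cortex API: a keyword filter stands in for vector retrieval over documents, a keyed map stands in for current user facts, and the system prompt combines both.

```typescript
// Hypothetical sketch: retrieval (documents) plus memory (facts) in one prompt.

// Stand-in for a real vector similarity search over a document corpus.
function searchDocs(query: string, docs: string[]): string[] {
  const terms = query.toLowerCase().split(/\s+/);
  return docs.filter((d) => terms.some((t) => d.toLowerCase().includes(t)));
}

const docs = ["Vegan restaurants in Berlin", "Steakhouse guide", "Tax law FAQ"];

// Current user truth, keyed so updates replace old values.
const userFacts = new Map<string, string>([
  ["diet", "vegan"],
  ["location", "Berlin"],
]);

const factLine = [...userFacts].map(([k, v]) => `${k}=${v}`).join(", ");
const docLine = searchDocs("vegan restaurant berlin", docs).join("; ");
const systemPrompt = `User facts: ${factLine}\nRelevant docs: ${docLine}`;
```

Retrieval answers "what text is relevant to this query"; the fact store answers "what is true about this user right now". Neither component can do the other's job.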
## Documentation
- Why Vector Databases Fail
- Brain Architecture
- Hierarchical Memory (HMM)
- Cost Guide
- API Reference
- Storage Adapters
- Security
## Philosophy
- Memory should be explicit, not inferred from similarity
- Facts should be overwriteable, not append-only
- Critical information should never be forgotten
- Agents should think like brains, not databases
- Infrastructure should be boring and reliable
## Changelog

### v0.1.2
- Security: XML escaping in prompt safety wrapper prevents injection via `</memory_context>`
- Security: PII detection warnings in debug mode
- Reliability: Runtime config validation with helpful error messages
- Reliability: Provider timeout (30s) and retry (3x with exponential backoff)
- Reliability: Unique constraint on PostgreSQL prevents duplicate facts from race conditions
- Data Integrity: Importance scores clamped to valid 1-10 range
- Data Integrity: Sentiment validation on extracted operations
## License
MIT — Built by MZ Hub
