@awareness-sdk/local

v0.11.6

Published

3 months ago

Local-first AI agent memory system. No account needed.

0High
0Medium
0Low

edwinhao

ai agent memory local mcp claude cursor windsurf awareness

@awareness-sdk/local

Local-first AI agent memory system. No account needed.

Benchmark: LongMemEval (ICLR 2025)

Awareness Memory is evaluated on LongMemEval — the industry standard benchmark for long-term conversational memory, published at ICLR 2025. 500 human-curated questions across 5 core capabilities.

╔══════════════════════════════════════════════════════════════╗
║                                                              ║
║   Awareness Memory — LongMemEval Benchmark Results           ║
║   ─────────────────────────────────────────────────           ║
║                                                              ║
║   Benchmark:  LongMemEval (ICLR 2025)                       ║
║   Dataset:    500 human-curated questions                    ║
║   Variant:    LongMemEval_S (~115k tokens per question)      ║
║                                                              ║
║   ┌─────────────────────────────────────────────────┐        ║
║   │                                                 │        ║
║   │   Recall@1    77.6%    (388 / 500)              │        ║
║   │   Recall@3    91.8%    (459 / 500)              │        ║
║   │   Recall@5    95.6%    (478 / 500)  ◀ PRIMARY   │        ║
║   │   Recall@10   97.4%    (487 / 500)              │        ║
║   │                                                 │        ║
║   └─────────────────────────────────────────────────┘        ║
║                                                              ║
║   Method:     Hybrid RRF (BM25 + Semantic Vector Search)     ║
║   Embedding:  all-MiniLM-L6-v2 (384d)                       ║
║   LLM Calls:  0  (pure retrieval, no generation cost)        ║
║   Hardware:   Apple M1, 8GB RAM — 14 min total               ║
║                                                              ║
╚══════════════════════════════════════════════════════════════╝

Leaderboard

┌─────────────────────────────────────────────────────────────┐
│          Long-Term Memory Retrieval — R@5 Leaderboard       │
│          LongMemEval (ICLR 2025, 500 questions)             │
├─────────────────────────────────┬───────────┬───────────────┤
│  System                         │  R@5      │  Note         │
├─────────────────────────────────┼───────────┼───────────────┤
│  MemPalace (ChromaDB raw)       │  96.6%    │  R@5 only *   │
│  ★ Awareness Memory (Hybrid)    │  95.6%    │  Hybrid RRF   │
│  OMEGA                          │  95.4%    │  QA Accuracy  │
│  Mastra (GPT-5-mini)            │  94.9%    │  QA Accuracy  │
│  Mastra (GPT-4o)                │  84.2%    │  QA Accuracy  │
│  Supermemory                    │  81.6%    │  QA Accuracy  │
│  Zep / Graphiti                 │  71.2%    │  QA Accuracy  │
│  GPT-4o (full context)          │  60.6%    │  QA Accuracy  │
├─────────────────────────────────┴───────────┴───────────────┤
│  * MemPalace 96.6% is Recall@5 only, not QA Accuracy.      │
│    Palace hierarchy was NOT used in the evaluation.         │
└─────────────────────────────────────────────────────────────┘

Accuracy by Question Type

┌─────────────────────────────────────────────────────────────┐
│     Awareness Memory — R@5 by Question Type                 │
│                                                             │
│  knowledge-update        ████████████████████████████ 100%  │
│  multi-session           ███████████████████████████▋  98.5%│
│  single-session-asst     ███████████████████████████▌  98.2%│
│  temporal-reasoning      █████████████████████████▊    94.7%│
│  single-session-user     ████████████████████████▎     88.6%│
│  single-session-pref     ███████████████████████▏      86.7%│
│                                                             │
│  Overall                 █████████████████████████▉    95.6%│
│                                                             │
│  ┌───────────────────────────────────────────────┐          │
│  │  Ablation Study                               │          │
│  │  ─────────────────────────────────────────    │          │
│  │  Vector-only:   92.6%  ▓▓▓▓▓▓▓▓▓▓▓▓▓░░░     │          │
│  │  BM25-only:     91.4%  ▓▓▓▓▓▓▓▓▓▓▓▓▓░░░     │          │
│  │  Hybrid RRF:    95.6%  ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓░  ★  │          │
│  │                        Hybrid = +3% over any  │          │
│  │                        single method alone    │          │
│  └───────────────────────────────────────────────┘          │
│                                                             │
│  arxiv.org/abs/2410.10813          awareness.market         │
└─────────────────────────────────────────────────────────────┘

Zero LLM calls. Runs on Apple M1 8GB in 14 minutes. Reproducible benchmark scripts →

Install

npm install -g @awareness-sdk/local

Quick Start

# Start the local daemon
awareness-local start

import { record, retrieve } from "@awareness-sdk/local/api";

await record({ content: "Refactored auth middleware." });
const result = await retrieve({ query: "What did we refactor?" });
console.log(result.results);

Perception (Record-Time Signals)

When you call record(), the response may include a perception array -- automatic signals the system surfaces without you asking. These are computed from pure DB queries (no LLM calls), adding less than 50ms of latency.

Signal types:

| Type | Description | |------|-------------| | contradiction | New content conflicts with an existing knowledge card | | resonance | Similar past experience found in memory | | pattern | Recurring theme detected (e.g., same category appearing often) | | staleness | A related knowledge card hasn't been updated in a long time | | related_decision | A past decision is relevant to what you just recorded |

const result = await awareness_record({
  action: "remember",
  content: "Decided to use RS256 for JWT signing",
  insights: { knowledge_cards: [{ title: "JWT signing", category: "decision", summary: "..." }] }
});

if (result.perception) {
  for (const signal of result.perception) {
    console.log(`[${signal.type}] ${signal.message}`);
    // [pattern] This is the 4th 'decision' card -- recurring theme
    // [resonance] Similar past experience: "JWT auth migration"
  }
}

What makes Awareness different

Most memory systems pick one extraction strategy. Awareness combines them:

Hybrid retrieval by default — BM25 full-text + vector cosine + knowledge-graph 1-hop expansion, fused with Reciprocal Rank Fusion. 95.6% R@5 on LongMemEval, zero LLM calls on the retrieval side.
Salience-aware extraction (v0.7.3+) — the client's own LLM self-scores every card on novelty / durability / specificity; cards scoring below 0.4 on either novelty or durability are dropped server-side. Framework metadata (Sender (untrusted metadata), turn_brief, [Operational context ...]) is filtered before extraction runs, so raw logs never leak into your knowledge base.
Project isolation — X-Awareness-Project-Dir header scopes memory per project. Your work memory doesn't leak into your personal memory, even on the same machine.
Learning over time — Ebbinghaus-style card decay, skill crystallization from repeated patterns (F-032 / F-034), workspace graph self-prune to keep index.db bounded (F-050).
Zero-LLM backend — all extraction runs on the client's LLM (Claude, GPT-4, Gemini, local Llama). The backend is a coordinator + storage layer; no inference costs pass through to you.
One memory, many clients — same daemon reachable via Claude Code skills, OpenClaw plugin, npm / pip / ClawHub, and a plain MCP server. Install any one surface and the rest just work against the same memory.

See docs/analysis/MEMPALACE_COMPARISON_2026-04-17.md for the honest side-by-side against MemPalace (96.6% R@5 via raw verbatim storage) — what we'd adopt from their approach and what we keep from ours.

License

Apache-2.0

Published

Vulnerabilities

Links

Maintainers

Keywords

Readme

@awareness-sdk/local

Benchmark: LongMemEval (ICLR 2025)

Leaderboard

Accuracy by Question Type

Install

Quick Start

Perception (Record-Time Signals)

What makes Awareness different

License