@awareness-sdk/local
v0.11.6
Published
Local-first AI agent memory system. No account needed.
Downloads
2,853
Maintainers
Readme
@awareness-sdk/local
Local-first AI agent memory system. No account needed.
Benchmark: LongMemEval (ICLR 2025)
Awareness Memory is evaluated on LongMemEval — the industry standard benchmark for long-term conversational memory, published at ICLR 2025. 500 human-curated questions across 5 core capabilities.
╔══════════════════════════════════════════════════════════════╗
║ ║
║ Awareness Memory — LongMemEval Benchmark Results ║
║ ───────────────────────────────────────────────── ║
║ ║
║ Benchmark: LongMemEval (ICLR 2025) ║
║ Dataset: 500 human-curated questions ║
║ Variant: LongMemEval_S (~115k tokens per question) ║
║ ║
║ ┌─────────────────────────────────────────────────┐ ║
║ │ │ ║
║ │ Recall@1 77.6% (388 / 500) │ ║
║ │ Recall@3 91.8% (459 / 500) │ ║
║ │ Recall@5 95.6% (478 / 500) ◀ PRIMARY │ ║
║ │ Recall@10 97.4% (487 / 500) │ ║
║ │ │ ║
║ └─────────────────────────────────────────────────┘ ║
║ ║
║ Method: Hybrid RRF (BM25 + Semantic Vector Search) ║
║ Embedding: all-MiniLM-L6-v2 (384d) ║
║ LLM Calls: 0 (pure retrieval, no generation cost) ║
║ Hardware: Apple M1, 8GB RAM — 14 min total ║
║ ║
╚══════════════════════════════════════════════════════════════╝Leaderboard
┌─────────────────────────────────────────────────────────────┐
│ Long-Term Memory Retrieval — R@5 Leaderboard │
│ LongMemEval (ICLR 2025, 500 questions) │
├─────────────────────────────────┬───────────┬───────────────┤
│ System │ R@5 │ Note │
├─────────────────────────────────┼───────────┼───────────────┤
│ MemPalace (ChromaDB raw) │ 96.6% │ R@5 only * │
│ ★ Awareness Memory (Hybrid) │ 95.6% │ Hybrid RRF │
│ OMEGA │ 95.4% │ QA Accuracy │
│ Mastra (GPT-5-mini) │ 94.9% │ QA Accuracy │
│ Mastra (GPT-4o) │ 84.2% │ QA Accuracy │
│ Supermemory │ 81.6% │ QA Accuracy │
│ Zep / Graphiti │ 71.2% │ QA Accuracy │
│ GPT-4o (full context) │ 60.6% │ QA Accuracy │
├─────────────────────────────────┴───────────┴───────────────┤
│ * MemPalace 96.6% is Recall@5 only, not QA Accuracy. │
│ Palace hierarchy was NOT used in the evaluation. │
└─────────────────────────────────────────────────────────────┘Accuracy by Question Type
┌─────────────────────────────────────────────────────────────┐
│ Awareness Memory — R@5 by Question Type │
│ │
│ knowledge-update ████████████████████████████ 100% │
│ multi-session ███████████████████████████▋ 98.5%│
│ single-session-asst ███████████████████████████▌ 98.2%│
│ temporal-reasoning █████████████████████████▊ 94.7%│
│ single-session-user ████████████████████████▎ 88.6%│
│ single-session-pref ███████████████████████▏ 86.7%│
│ │
│ Overall █████████████████████████▉ 95.6%│
│ │
│ ┌───────────────────────────────────────────────┐ │
│ │ Ablation Study │ │
│ │ ───────────────────────────────────────── │ │
│ │ Vector-only: 92.6% ▓▓▓▓▓▓▓▓▓▓▓▓▓░░░ │ │
│ │ BM25-only: 91.4% ▓▓▓▓▓▓▓▓▓▓▓▓▓░░░ │ │
│ │ Hybrid RRF: 95.6% ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓░ ★ │ │
│ │ Hybrid = +3% over any │ │
│ │ single method alone │ │
│ └───────────────────────────────────────────────┘ │
│ │
│ arxiv.org/abs/2410.10813 awareness.market │
└─────────────────────────────────────────────────────────────┘Zero LLM calls. Runs on Apple M1 8GB in 14 minutes. Reproducible benchmark scripts →
Install
npm install -g @awareness-sdk/localQuick Start
# Start the local daemon
awareness-local startimport { record, retrieve } from "@awareness-sdk/local/api";
await record({ content: "Refactored auth middleware." });
const result = await retrieve({ query: "What did we refactor?" });
console.log(result.results);Perception (Record-Time Signals)
When you call record(), the response may include a perception array -- automatic signals the system surfaces without you asking. These are computed from pure DB queries (no LLM calls), adding less than 50ms of latency.
Signal types:
| Type | Description |
|------|-------------|
| contradiction | New content conflicts with an existing knowledge card |
| resonance | Similar past experience found in memory |
| pattern | Recurring theme detected (e.g., same category appearing often) |
| staleness | A related knowledge card hasn't been updated in a long time |
| related_decision | A past decision is relevant to what you just recorded |
const result = await awareness_record({
action: "remember",
content: "Decided to use RS256 for JWT signing",
insights: { knowledge_cards: [{ title: "JWT signing", category: "decision", summary: "..." }] }
});
if (result.perception) {
for (const signal of result.perception) {
console.log(`[${signal.type}] ${signal.message}`);
// [pattern] This is the 4th 'decision' card -- recurring theme
// [resonance] Similar past experience: "JWT auth migration"
}
}What makes Awareness different
Most memory systems pick one extraction strategy. Awareness combines them:
- Hybrid retrieval by default — BM25 full-text + vector cosine + knowledge-graph 1-hop expansion, fused with Reciprocal Rank Fusion. 95.6% R@5 on LongMemEval, zero LLM calls on the retrieval side.
- Salience-aware extraction (v0.7.3+) — the client's own LLM self-scores every card on
novelty/durability/specificity; cards scoring below 0.4 on either novelty or durability are dropped server-side. Framework metadata (Sender (untrusted metadata),turn_brief,[Operational context ...]) is filtered before extraction runs, so raw logs never leak into your knowledge base. - Project isolation —
X-Awareness-Project-Dirheader scopes memory per project. Your work memory doesn't leak into your personal memory, even on the same machine. - Learning over time — Ebbinghaus-style card decay, skill crystallization from repeated patterns (F-032 / F-034), workspace graph self-prune to keep
index.dbbounded (F-050). - Zero-LLM backend — all extraction runs on the client's LLM (Claude, GPT-4, Gemini, local Llama). The backend is a coordinator + storage layer; no inference costs pass through to you.
- One memory, many clients — same daemon reachable via Claude Code skills, OpenClaw plugin, npm / pip / ClawHub, and a plain MCP server. Install any one surface and the rest just work against the same memory.
See docs/analysis/MEMPALACE_COMPARISON_2026-04-17.md for the honest side-by-side against MemPalace (96.6% R@5 via raw verbatim storage) — what we'd adopt from their approach and what we keep from ours.
License
Apache-2.0
