@jim80net/memex-core

v0.3.1

Shared core engine for memex skill routers — embeddings, indexing, caching, telemetry

Shared core engine for semantic skill, memory, and rule routing in AI agent systems. Provides local embedding generation (via ONNX), a vector-similarity search index, file caching, session tracking, telemetry, cross-device sync, and execution traces -- all with no external API keys required. This package is consumed by platform-specific routers like memex-claude (Claude Code hooks) and memex-openclaw (OpenClaw plugin).

Install

npm install @jim80net/memex-core
# or
pnpm add @jim80net/memex-core

For local embeddings (recommended), also install the optional ONNX dependency:

pnpm add @huggingface/transformers

Quick Start

import {
  LocalEmbeddingProvider,
  SkillIndex,
  resolveCoreConfig,
} from "@jim80net/memex-core";

// 1. Create an embedding provider
const provider = new LocalEmbeddingProvider("Xenova/all-MiniLM-L6-v2", "/tmp/models");

// 2. Resolve config (merges your overrides with defaults)
const config = resolveCoreConfig({ topK: 5, threshold: 0.35 });

// 3. Build the index
const index = new SkillIndex(config, provider, "/tmp/cache/memex-cache.json");
await index.build({
  skillDirs: ["./skills"],
  memoryDirs: ["./memory"],
  ruleDirs: ["./rules"],
});

// 4. Search
const results = await index.search("how do I deploy?", config.topK, config.threshold);
for (const { skill, score, bestQueryIndex } of results) {
  console.log(`${skill.name} (${skill.type}): ${score.toFixed(3)} [query #${bestQueryIndex}]`);
  const content = await index.readSkillContent(skill.location);
  console.log(content);
}

Architecture

| Module | Purpose |
|--------|---------|
| embeddings | EmbeddingProvider interface with two implementations: LocalEmbeddingProvider (ONNX via @huggingface/transformers) and OpenAIEmbeddingProvider. Also exports cosineSimilarity(). |
| skill-index | SkillIndex class -- the main engine. Scans directories for skills, rules, and memories; embeds their queries; caches embeddings; searches by cosine similarity. Also exports parseFrontmatter() and parseMemoryFile(). |
| cache | Persistent embedding cache (version 2). Loads/saves a JSON file keyed by file location and gated by mtime. |
| config | DEFAULT_CORE_CONFIG and resolveCoreConfig() for merging partial config with type-safe defaults. |
| session | SessionTracker interface and InMemorySessionTracker for tracking which rules have been shown per session (graduated disclosure). |
| telemetry | Match telemetry: records how often each skill/rule/memory is matched, across which sessions. Tracks per-query hit counts, observations (ASI from deep-sleep), and generates formatted reports. |
| sync | Git-based cross-device sync: pull with auto-conflict resolution, commit and push local changes. |
| traces | TraceAccumulator for recording execution traces (skills injected, tools called, outcome) per session. |
| file-lock | Advisory file locking via mkdir (atomic on all platforms). withFileLock() for safe concurrent writes. |
| path-encoder | encodeProjectPath() -- encodes absolute paths into safe directory names. |
| project-mapping | Resolves a working directory to a canonical project ID (git remote URL, manual mapping, or encoded path fallback). |
| project-registry | Tracks known project directories with lastSeen timestamps. |
| types | All TypeScript types and interfaces. |
| version | VERSION constant, injected at compile time or defaulting to "dev". |

Key Concepts

EmbeddingProvider

An interface with a single method:

interface EmbeddingProvider {
  embed(texts: string[]): Promise<number[][]>;
}

Two built-in implementations:

  • LocalEmbeddingProvider -- Runs ONNX models locally via @huggingface/transformers. No API key needed. Default model: Xenova/all-MiniLM-L6-v2. Lazily initializes the model on first call.
  • OpenAIEmbeddingProvider -- Calls the OpenAI embeddings API. Requires an API key and model name. Batches in groups of 2048.
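Because the interface is a single method, swapping in a custom backend is straightforward. As an illustration (not part of the package), here is a deterministic bag-of-characters provider, the kind of test double you might use to exercise index logic without loading an ONNX model:

```typescript
// The EmbeddingProvider interface as shown above, reproduced so this
// sketch is self-contained.
interface EmbeddingProvider {
  embed(texts: string[]): Promise<number[][]>;
}

// Hypothetical test double: a deterministic bag-of-characters "embedding".
class FakeEmbeddingProvider implements EmbeddingProvider {
  constructor(private readonly dims = 8) {}

  async embed(texts: string[]): Promise<number[][]> {
    return texts.map((text) => {
      const vec = new Array<number>(this.dims).fill(0);
      for (let i = 0; i < text.length; i++) {
        vec[text.charCodeAt(i) % this.dims] += 1;
      }
      // L2-normalize so cosine similarity behaves sensibly.
      const norm = Math.hypot(...vec) || 1;
      return vec.map((v) => v / norm);
    });
  }
}
```

Anything satisfying the interface can be passed wherever SkillIndex expects a provider.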

SkillIndex

The central class. Constructed with (config, provider, cachePath).

  • build(scanDirs) -- Scans skillDirs, memoryDirs, and ruleDirs for markdown files. Parses frontmatter, generates embeddings for queries, and caches results. Skips unchanged files (mtime-gated). The consumer constructs the ScanDirs object -- no paths are hardcoded.
  • search(query, topK, threshold, typeFilter?, scoringMode?, maxDropoff?) -- Embeds the query, computes cosine similarity against all indexed entries, applies per-skill boost, and returns the top matches. Each result includes bestQueryIndex (the index into skill.queries that matched best).
  • readSkillContent(location) -- Reads the body content of a matched skill, stripping frontmatter. Handles memory sections (locations like path#SectionName).
  • needsRebuild() -- Returns true if the cache TTL (cacheTimeMs) has expired.

ScanDirs

type ScanDirs = {
  skillDirs: string[];   // directories containing skill-name/SKILL.md subdirectories
  memoryDirs: string[];  // directories containing *.md memory files
  ruleDirs: string[];    // directories containing *.md rule files
};

The consumer builds this from platform-specific paths (e.g., ~/.claude/skills/, ~/.openclaw/skills/). This is how the core stays platform-agnostic.
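A minimal sketch of building a ScanDirs for one platform. The directory layout below is illustrative, not mandated by the core:

```typescript
import { homedir } from "node:os";
import { join } from "node:path";

// ScanDirs as defined above, reproduced so the sketch stands alone.
type ScanDirs = {
  skillDirs: string[];
  memoryDirs: string[];
  ruleDirs: string[];
};

// Hypothetical Claude Code layout under ~/.claude -- substitute whatever
// your platform uses.
const base = join(homedir(), ".claude");
const scanDirs: ScanDirs = {
  skillDirs: [join(base, "skills")],
  memoryDirs: [join(base, "memory")],
  ruleDirs: [join(base, "rules")],
};
```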

MemexPaths

type MemexPaths = {
  cacheDir: string;
  modelsDir: string;
  sessionsDir: string;
  syncRepoDir: string;
  projectsDir: string;
  telemetryPath: string;
  registryPath: string;
  tracesDir: string;
};

A descriptor for all filesystem paths the engine uses. The consumer constructs this and passes individual paths to the relevant functions. The core never assumes path locations.

Scoring Modes

  • "relative" (default) -- If the best match clears the threshold floor, include up to topK results that are within maxDropoff of the best score. Good for surfacing a cluster of related content.
  • "absolute" -- Each result must individually exceed the threshold. Stricter, but may return fewer results.
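The difference is easiest to see in code. This is a simplified model of the two modes, not the package's actual implementation:

```typescript
type Scored = { name: string; score: number };

// Simplified model: given results sorted by descending score, apply
// threshold, topK, and (in relative mode) maxDropoff.
function filterResults(
  sorted: Scored[],
  mode: "relative" | "absolute",
  topK: number,
  threshold: number,
  maxDropoff: number,
): Scored[] {
  if (mode === "absolute") {
    // Every result must individually clear the threshold.
    return sorted.filter((r) => r.score >= threshold).slice(0, topK);
  }
  // relative: the best match must clear the threshold floor, then keep
  // results within maxDropoff of the best score.
  const best = sorted[0];
  if (!best || best.score < threshold) return [];
  return sorted
    .filter((r) => best.score - r.score <= maxDropoff)
    .slice(0, topK);
}

const results = [
  { name: "deploy", score: 0.62 },
  { name: "release", score: 0.50 },
  { name: "ci", score: 0.34 },
];

filterResults(results, "relative", 3, 0.35, 0.1);
// → [deploy]: release sits 0.12 below the best score, past maxDropoff
filterResults(results, "absolute", 3, 0.35, 0.1);
// → [deploy, release]: each clears 0.35 on its own; ci does not
```

Note that the same inputs can yield fewer results in relative mode when scores are spread out, and fewer in absolute mode when the whole cluster sits just under the threshold.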

Frontmatter Extensions

Skills and rules use YAML frontmatter with these fields:

---
name: my-skill
description: What this skill does
type: skill          # skill | memory | rule | workflow | session-learning | tool-guidance | stop-rule
queries:
  - "how do I deploy"
  - "deployment steps"
keywords:
  - deploy
  - release
paths:
  - "src/**/*.ts"
hooks:
  - PreToolUse
one-liner: Short reminder text for repeated matches
boost: 0.05
---

queries and keywords are embedded and used for similarity search. one-liner is used for graduated disclosure (full content on first match, one-liner on subsequent matches in the same session). boost is an optional float added to the raw similarity score before threshold comparison -- use it to nudge skills that are consistently near the threshold boundary.
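The boost arithmetic is simple; this fragment (illustrative numbers, not package code) shows why a small boost matters for a skill hovering near the threshold:

```typescript
// Simplified illustration of boost: it is added to the raw cosine
// similarity before the threshold comparison.
const threshold = 0.35;
const rawScore = 0.32;  // consistently just under the threshold
const boost = 0.05;     // from the skill's frontmatter

const effective = rawScore + boost;      // ≈ 0.37
const matches = effective >= threshold;  // true: the skill now surfaces
```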

Configuration

resolveCoreConfig() merges a partial config with these defaults:

| Field | Default | Description |
|-------|---------|-------------|
| enabled | true | Master switch |
| embeddingModel | "Xenova/all-MiniLM-L6-v2" | Model name for embeddings |
| embeddingBackend | "local" | "local" (ONNX) or "openai" |
| cacheTimeMs | 300000 (5 min) | How long before needsRebuild() returns true |
| topK | 3 | Max results per search |
| threshold | 0.35 | Minimum similarity score |
| scoringMode | "relative" | "relative" or "absolute" |
| maxDropoff | 0.1 | Max score gap from best match (relative mode only) |
| maxInjectedChars | 8000 | Character budget for injected context |
| types | ["skill", "memory", "workflow", "session-learning", "rule"] | Which entry types to index |
| skillDirs | [] | Additional skill directories |
| memoryDirs | [] | Additional memory directories |

Consumers typically extend MemexCoreConfig with platform-specific fields (hooks config, sync config, sleep schedule, etc.) and handle file loading themselves.
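A sketch of one way a consumer might extend the config. The platform fields and the stand-in core type below are hypothetical; MemexCoreConfig's full shape lives in the package:

```typescript
// Minimal stand-in for MemexCoreConfig so this sketch is self-contained.
type MemexCoreConfig = {
  enabled: boolean;
  topK: number;
  threshold: number;
  // ...the remaining core fields
};

// Hypothetical platform extension: hooks and sync settings on top of core.
type MyPlatformConfig = MemexCoreConfig & {
  hooks: { preToolUse: boolean };
  syncRemote?: string;
};

const defaults: MyPlatformConfig = {
  enabled: true,
  topK: 3,
  threshold: 0.35,
  hooks: { preToolUse: true },
};

// The consumer handles file loading itself and merges overrides on top.
function loadConfig(overrides: Partial<MyPlatformConfig>): MyPlatformConfig {
  return { ...defaults, ...overrides };
}
```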

Development

pnpm install --ignore-scripts   # skip onnxruntime postinstall
pnpm test                       # run vitest
pnpm typecheck                  # tsc --noEmit
pnpm lint                       # biome check
pnpm lint:fix                   # biome check --write
pnpm check                      # lint + typecheck + test
pnpm build                      # compile to dist/

Note: pnpm install without --ignore-scripts may fail because onnxruntime-node tries to download CUDA binaries in its postinstall. The ONNX runtime loads at runtime, not install time, so this is safe to skip.

License

MIT