

@kognitivedev/memory-bench

Benchmark harness for long-term memory systems.

@kognitivedev/memory-bench evaluates memory-backed agents and runtimes against normalized datasets and LongMemEval-style fixtures without coupling the benchmark flow to one database or one app.

Quick Start · Why This Package Exists · How It Works · Adapters · Storage · Outputs · Latest Results

Why This Package Exists

Memory benchmarks are usually the first place architecture starts leaking:

  • benchmark code reaches directly into app tables
  • scoring logic gets mixed with storage cleanup
  • reports are hard to reproduce
  • migration/setup steps become destructive

This package splits those concerns cleanly:

  • benchmark core owns loading, execution, official evaluation, and report writing
  • integration adapters own app-specific cleanup, ingestion, and answering hooks
  • artifact persistence can run through the shared storage abstraction

What You Get

  • dataset loaders for normalized JSON and LongMemEval-style fixtures
  • official-style LongMemEval judging
  • KognitiveMemoryBenchAdapter for memory runtimes built on @kognitivedev/memory
  • Markdown, JSON, and JSONL report outputs
  • optional benchmark artifact persistence through @kognitivedev/storage

Quick Start

Install the package:

bun add @kognitivedev/memory-bench

Run a benchmark:

import { runMemoryBenchmark } from "@kognitivedev/memory-bench";

const report = await runMemoryBenchmark({
  projectId: "project-uuid", // your project identifier
  dataset, // loaded from a normalized JSON or LongMemEval-style fixture
  adapter, // an integration adapter, e.g. KognitiveMemoryBenchAdapter (see below)
  consolidationMode: "before-question",
});

How It Works

The runner executes this flow for each case:

  1. reset case state
  2. ingest each session
  3. optionally consolidate after each session
  4. optionally consolidate before the final question
  5. answer the benchmark question
  6. run official evaluation
  7. write reports and benchmark artifacts

That keeps the evaluation loop stable while letting each system supply its own ingestion and retrieval behavior.
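
For orientation, here is a minimal sketch of that per-case loop in TypeScript. The adapter interface, type names, and judge call below are assumptions made for the sketch, not the package's actual API; the real contracts are defined by @kognitivedev/memory-bench.

// Illustrative per-case loop; names and types are assumptions, not the real API.
type ConsolidationMode = "never" | "after-session" | "before-question";

interface SketchAdapter {
  resetCase(caseId: string): Promise<void>;
  ingestSession(caseId: string, session: unknown): Promise<void>;
  consolidate(caseId: string): Promise<void>;
  answer(caseId: string, question: string): Promise<string>;
}

async function runCaseSketch(
  benchCase: { id: string; sessions: unknown[]; question: string },
  adapter: SketchAdapter,
  mode: ConsolidationMode,
  judge: (question: string, answer: string) => Promise<boolean>,
) {
  await adapter.resetCase(benchCase.id); // 1. reset case state
  for (const session of benchCase.sessions) {
    await adapter.ingestSession(benchCase.id, session); // 2. ingest each session
    if (mode === "after-session") await adapter.consolidate(benchCase.id); // 3. optional consolidation
  }
  if (mode === "before-question") await adapter.consolidate(benchCase.id); // 4. optional consolidation
  const answer = await adapter.answer(benchCase.id, benchCase.question); // 5. answer the benchmark question
  const correct = await judge(benchCase.question, answer); // 6. official-style evaluation
  return { caseId: benchCase.id, answer, correct }; // 7. rolled into reports and artifacts by the runner
}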

Kognitive Adapter

KognitiveMemoryBenchAdapter is the reusable adapter for Kognitive-style runtimes.

It expects an injected runtime:

import { KognitiveMemoryBenchAdapter } from "@kognitivedev/memory-bench";

const adapter = new KognitiveMemoryBenchAdapter({
  runtime: {
    processMemoryJob: (userId, projectId, sessionId) =>
      memoryService.processMemoryJob(userId, projectId, sessionId),
    logConversation: (log) => memoryService.logConversation(log),
    getSnapshot: (userId, projectId) =>
      memoryService.getSnapshot(userId, projectId),
  },
  hooks: {
    resetCase: async ({ projectId, userId, caseId }) => {
      // clear memory and session artifacts for this case
    },
    persistSession: async ({ projectId, userId, caseId, session }) => {
      // optional session persistence for consolidation inputs
    },
    consolidate: async ({ projectId, userId, caseId }) => {
      // optional cross-session consolidation
    },
    extractTopicMemories: async ({ projectId, userId, caseId, session }) => {
      // optional topic-memory extraction
    },
    buildTopicContext: async ({ projectId, userId, caseId, question }) => {
      return "";
    },
  },
});

This is deliberate. The benchmark package owns the flow. The app owns environment-specific hooks.
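
Once constructed, the adapter is passed to the runner exactly as in Quick Start. A minimal wiring sketch, assuming the consolidationMode option accepts the same values as the --consolidation flag:

// Wiring sketch; project id and dataset are placeholders.
const report = await runMemoryBenchmark({
  projectId: "project-uuid",
  dataset,
  adapter,
  consolidationMode: "after-session",
});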

Artifact Storage

The runner can persist benchmark artifacts through @kognitivedev/storage.

import { InMemoryStorageBackend } from "@kognitivedev/storage";
import { runMemoryBenchmark } from "@kognitivedev/memory-bench";

const artifactStorage = new InMemoryStorageBackend();

const report = await runMemoryBenchmark({
  projectId: "project-uuid",
  dataset,
  adapter,
  artifactStorage,
  artifactRunId: "demo-run",
});

Collections written by the runner:

  • memory_bench_runs
  • memory_bench_case_results

This is the right benchmark boundary: storage-backed persistence without binding the benchmark core to one database implementation.

Outputs

Each benchmark run writes:

  • report.json for the full structured report
  • predictions.jsonl for official evaluator input
  • official-evaluation.json when official evaluation is enabled
  • README.md as a human-readable summary
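
Because these files are plain JSON and JSONL, post-processing a run needs nothing beyond standard file reads. A minimal sketch, assuming a ./benchmark-output directory; the fields inside report.json and predictions.jsonl should be checked against your own run:

// Post-processing sketch; the directory and any field names are assumptions.
import { readFileSync } from "node:fs";

const outDir = "./benchmark-output"; // wherever --output pointed

const report = JSON.parse(readFileSync(`${outDir}/report.json`, "utf8"));

const predictions = readFileSync(`${outDir}/predictions.jsonl`, "utf8")
  .split("\n")
  .filter(Boolean)
  .map((line) => JSON.parse(line)); // one prediction object per line

console.log(`loaded ${predictions.length} predictions`, report);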

Running From This Repo

Backend composition currently lives in apps/backend/scripts/run-memory-benchmark.ts.

From apps/backend:

bun run db:migrate
bun run benchmark:memory -- \
  --project <project-id-or-slug> \
  --dataset longmemeval \
  --input ../../benchmarks/memory/fixtures/longmemeval-sample.json

Useful flags:

  • --fast forces consolidation=never
  • --consolidation never|after-session|before-question
  • --concurrency <n>
  • --judge-concurrency <n>
  • --limit <n>
  • --judge-model <model-id>
  • --model <model-id>
  • --output <dir>
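
For example, a quick local smoke run that skips consolidation, caps the case count, and writes to a custom directory (project id and paths are placeholders):

bun run benchmark:memory -- \
  --project <project-id-or-slug> \
  --dataset longmemeval \
  --input ../../benchmarks/memory/fixtures/longmemeval-sample.json \
  --fast \
  --limit 5 \
  --output ./benchmark-output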

Publishable Results Guidance

  • official-evaluation.json is the primary benchmark artifact.
  • predictions.jsonl is the handoff format for external reproduction.
  • Official evaluation requires the configured judge model through OpenRouter.

Safe Setup

When you are running benchmarks against a real local database, prefer migrations:

bun run db:migrate

Do not use db:push as the default benchmark setup path when preserving existing data matters.

Latest Results

Latest checked-in smoke run:

| Dataset            | Adapter          | Model              | Consolidation   | Cases | Official Accuracy | Avg Latency |
| ------------------ | ---------------- | ------------------ | --------------- | ----: | ----------------: | ----------: |
| longmemeval-sample | kognitive-direct | x-ai/grok-4.1-fast | before-question |     2 |             1.000 |    438.0 ms |

Report:

Related Paths