@engram/core

Primitives for pattern-cached LLM decisions on Postgres.

You bring the brain, the prompt, the schema, and the cache key. Engram gives you storage, retrieval, and caching on any Postgres.

No opinions on what "allow" or "block" means. No hardcoded decision enum. No middleware that sends 403s behind your back. Just four functions and a migrator.

record()    feed any signal
retrieve()  pull context fast
decide()    cacheKey hit -> return; miss -> brain -> validate -> store
feedback()  raise or lower a cached decision's confidence

Install

npm install @engram/core pg

pg is a peer dependency so your app controls the driver version.


60-second example (Ollama)

import { Engram, OllamaAdapter } from "@engram/core";

const engram = new Engram({
  connectionString: process.env.DATABASE_URL,
  namespace: "my_app",
});
await engram.connect(); // runs migrations idempotently

const brain = new OllamaAdapter({ model: "llama3.2" });

const schema = {
  parse(x: unknown) {
    const o = x as { action?: string };
    if (o?.action !== "allow" && o?.action !== "block") throw new Error("bad");
    return { action: o.action };
  },
};

const result = await engram.decide({
  input: { userId: "u_1", route: "/api/charge", amount: 9.99 },
  brain,
  prompt: (_ctx, i) =>
    `Classify this API call. Return JSON {"action":"allow"|"block"}.\n${JSON.stringify(i)}`,
  schema,
  cacheKey: (i) => `${i.route}:${i.amount < 100 ? "small" : "big"}`,
});

// result.source === "brain"  on first call (slow, calls Ollama)
// result.source === "cache"  on second call (fast, no LLM)

Two calls with the same cacheKey hit the same cache row in the same namespace. That's the whole trick.
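A second call whose input lands on the same cacheKey skips the brain entirely. A sketch reusing the brain, schema, and cacheKey from the example above; the specific input values are illustrative:

// Same route, amount still under 100 -> same cacheKey "/api/charge:small"
const again = await engram.decide({
  input: { userId: "u_2", route: "/api/charge", amount: 42 },
  brain,
  prompt: (_ctx, i) =>
    `Classify this API call. Return JSON {"action":"allow"|"block"}.\n${JSON.stringify(i)}`,
  schema,
  cacheKey: (i) => `${i.route}:${i.amount < 100 ? "small" : "big"}`,
});
// again.source === "cache" -- the stored, validated decision comes back, no Ollama call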


The four primitives

record(input)

Append an event. Cheap, append-only. Used later by retrieve().

await engram.record({
  userId: "u_1",
  eventType: "api_call",
  metadata: { path: "/charge", status: 200 },
  sessionId: "sess_abc",      // optional
  ipRegion: "us-west",        // optional
  uaClass: "browser",         // optional
});

retrieve(input)

Pull recent events for a user, optionally with a rolling summary.

const ctx = await engram.retrieve({
  userId: "u_1",
  lookback: "7d",             // or ms number
  eventTypes: ["api_call"],   // optional filter
  limit: 200,
  aggregate: true,            // include per-event-type summary
});
// ctx.events   -> raw rows
// ctx.summary  -> { api_call: { count, avg_per_day, first_seen, ... } }

decide(input)

The full loop: lookup by cacheKey first; on a miss, call the brain, validate with the schema, and store the result under the cacheKey.

const r = await engram.decide({
  input: { userId, route, amount },
  brain,                          // any Brain
  prompt: (ctx, input) => "...",  // returns string
  schema,                         // any { parse(unknown): T }
  cacheKey: (input, ctx) => "...", // returns string
  // optional:
  namespace: "other_app",         // override instance default
  cacheThreshold: 0.5,            // min confidence for a hit
  context: customCtx,             // skip auto-retrieve
  autoContext: false,             // disable auto-retrieve
  brainOptions: { maxTokens: 256, temperature: 0 },
});
// -> { decision, source: "cache" | "brain", cacheId, cacheKey,
//      confidence, hitCount, latencyMs }

Auto-context: when input.userId is a string and you don't pass context, engram calls retrieve({ userId, lookback: "30d" }) for you and hands that to prompt().
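A sketch of a prompt that actually uses that auto-retrieved context, assuming prompt() receives the same { events, summary } shape that retrieve() returns:

const withHistory = await engram.decide({
  input: { userId: "u_1", route: "/api/charge", amount: 9.99 },
  brain,
  // ctx here is the result of the implicit retrieve({ userId: "u_1", lookback: "30d" })
  prompt: (ctx, i) =>
    `User activity summary: ${JSON.stringify(ctx?.summary ?? {})}\n` +
    `Classify this call. Return JSON {"action":"allow"|"block"}.\n${JSON.stringify(i)}`,
  schema,
  cacheKey: (i) => `${i.route}:${i.amount < 100 ? "small" : "big"}`,
});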

feedback(cacheId, wasCorrect)

Raise or lower a cached decision's confidence. Decisions below 0.2 get evicted. No external ML, just a confidence decay/reward scheme.

await engram.feedback(r.cacheId, true);   // worked as expected
await engram.feedback(r.cacheId, false);  // false positive/wrong

Bring your own schema

Any object with .parse(unknown) => T is accepted. This means Zod, Valibot, Yup, or a hand-rolled validator all work with no adapter.

// Zod
import { z } from "zod";
const schema = z.object({ action: z.enum(["allow", "block", "review"]) });

// Hand-rolled
const schema = {
  parse(x: unknown) {
    if (typeof (x as any)?.action !== "string") throw new Error("bad");
    return x as { action: string };
  },
};

The validated value is what gets cached, so next time it comes out of cache already shaped correctly — no re-parsing cost on hits.
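With the Zod schema above, that round-trip looks like this. A sketch: whether the TypeScript type flows from the schema into decide()'s return type is an assumption, but at runtime r.decision is the parsed object either way:

const r = await engram.decide({
  input: { userId: "u_1", route: "/api/charge", amount: 9.99 },
  brain,
  prompt: (_ctx, i) =>
    `Return JSON {"action":"allow"|"block"|"review"}.\n${JSON.stringify(i)}`,
  schema,   // the Zod object above: { action: "allow" | "block" | "review" }
  cacheKey: (i) => `charge:${i.amount < 100 ? "small" : "big"}`,
});

// On both the brain path and the cache path this is already validated:
r.decision.action;   // "allow" | "block" | "review"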


Bring your own brain

A Brain is anything with call(prompt, opts?) => Promise<string>. Three adapters ship in the package:

import { OllamaAdapter, AnthropicAdapter, OpenAIAdapter } from "@engram/core";

const local  = new OllamaAdapter({ model: "llama3.2" });
const claude = new AnthropicAdapter({
  model: "claude-haiku-4-5",
  // apiKey: "...",  // or ANTHROPIC_API_KEY env
});
const gpt    = new OpenAIAdapter({
  model: "gpt-4o-mini",
  // apiKey: "...",  // or OPENAI_API_KEY env
  jsonMode: true,   // default, forces response_format json_object
});

Point OpenAIAdapter.url at any OpenAI-compatible server (vLLM, Groq, LM Studio, OpenRouter, Azure) and it works.
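For example, a local vLLM server. A sketch: whether url expects the base path or the full chat-completions route is an assumption, and the model name is illustrative:

const vllm = new OpenAIAdapter({
  url: "http://localhost:8000/v1",             // any OpenAI-compatible endpoint
  model: "meta-llama/Llama-3.1-8B-Instruct",
  apiKey: "not-needed-for-local-vllm",
  jsonMode: true,
});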

Or just pass a function:

const brain = { async call(prompt: string) { return '{"action":"allow"}'; } };

Express middleware

Unopinionated: it attaches the decision to req.engram and calls next(). It does not send 403 or 429 for you. The route handler or the onDecision hook owns the response.

import express from "express";
import { Engram, OllamaAdapter } from "@engram/core";

const app = express();
const engram = new Engram({ connectionString: process.env.DATABASE_URL });
await engram.connect();

app.use(engram.express({
  buildInput: (req) => ({
    userId: req.headers["x-user-id"] as string,
    route: req.path,
    method: req.method,
  }),
  brain: new OllamaAdapter({ model: "llama3.2" }),
  prompt: (_ctx, i) => `Classify ${JSON.stringify(i)} as JSON {"action":"..."}`,
  schema: /* your schema */,
  cacheKey: (i) => `${i.route}:${i.method}`,
  recordAs: "api_call",       // optional: records an event first
  onDecision: (d, _req, res) => {
    if ((d.decision as any).action === "block") {
      res.status(403).json({ error: "blocked" });
    }
  },
  failOpen: true,             // default: on engram error, call next()
}));

app.get("/charge", (req, res) => {
  // req.engram is populated with the decision + metadata
  res.json({ ok: true, via: req.engram?.source });
});

Import from the subpath to skip bundling the Engram class if you already have an instance:

import { engramExpress } from "@engram/core/express";
app.use(engramExpress(engram, { /* options */ }));

Fastify middleware

Same shape as the Express one. It's a preHandler that attaches req.engram and returns. If your onDecision hook calls reply.code(...).send(...), the handler short-circuits.

import Fastify from "fastify";
import { Engram, OllamaAdapter } from "@engram/core";
import { engramFastify } from "@engram/core/fastify";

const app = Fastify();
const engram = new Engram({ connectionString: process.env.DATABASE_URL });
await engram.connect();

app.addHook("preHandler", engramFastify(engram, {
  buildInput: (req) => ({
    userId: req.headers["x-user-id"] as string,
    route: req.url,
    method: req.method,
  }),
  brain: new OllamaAdapter({ model: "llama3.2" }),
  prompt: (_ctx, i) => `Classify ${JSON.stringify(i)} as JSON {"action":"..."}`,
  schema: /* your schema */,
  cacheKey: (i) => `${i.route}:${i.method}`,
}));

app.get("/charge", (req, reply) => {
  reply.send({ ok: true, via: req.engram?.source });
});
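A sketch of the short-circuit described above; the (decision, req, reply) argument order mirrors the Express onDecision hook and is an assumption here:

app.addHook("preHandler", engramFastify(engram, {
  // ...same options as above, plus:
  onDecision: (d, _req, reply) => {
    if ((d.decision as any).action === "block") {
      reply.code(403).send({ error: "blocked" });   // route handler never runs
    }
  },
}));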

CacheKey helpers

cacheKey is the load-bearing function of the whole library. Get it wrong and every request pays the LLM cost. These helpers turn continuous / free-form signals into stable coarse labels that cluster similar inputs.

import { bucket, bucketEnum } from "@engram/core";

const AMOUNT_EDGES  = [50, 500, 5000] as const;
const AMOUNT_LABELS = ["tiny", "small", "medium", "large"] as const;

const AGE_EDGES     = [1, 30, 365] as const;
const AGE_LABELS    = ["brand_new", "new", "established", "veteran"] as const;

const PAYMENT_METHODS = ["card", "paypal", "bank"] as const;

function checkoutKey(i: {
  amount: number; account_age_days: number;
  payment_method: string; device_type: string;
}) {
  return [
    "checkout",
    bucket(i.amount, AMOUNT_EDGES, AMOUNT_LABELS),
    bucket(i.account_age_days, AGE_EDGES, AGE_LABELS),
    bucketEnum(i.payment_method, PAYMENT_METHODS),  // unknown -> "other"
    i.device_type,
  ].join(":");
}

checkoutKey({ amount: 9.99,  account_age_days: 800, payment_method: "card",   device_type: "web" });
// -> "checkout:tiny:veteran:card:web"
checkoutKey({ amount: 8.50,  account_age_days: 820, payment_method: "card",   device_type: "web" });
// -> "checkout:tiny:veteran:card:web"   (same row -- shared LLM decision)
checkoutKey({ amount: 6500,  account_age_days: 0,   payment_method: "crypto", device_type: "mobile" });
// -> "checkout:large:brand_new:other:mobile"

bucket() is (value, edges, labels) => label. labels.length must be edges.length + 1. bucketEnum() is (value, allowed, fallback?) => label — snaps unknown values to "other" (or any fallback you pass).
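Spelled out on the constants above, consistent with the checkoutKey outputs:

bucket(9.99, AMOUNT_EDGES, AMOUNT_LABELS);        // -> "tiny"    (below the 50 edge)
bucket(6500, AMOUNT_EDGES, AMOUNT_LABELS);        // -> "large"   (above the 5000 edge)
bucket(800,  AGE_EDGES, AGE_LABELS);              // -> "veteran" (above the 365 edge)
bucketEnum("card", PAYMENT_METHODS);              // -> "card"
bucketEnum("crypto", PAYMENT_METHODS);            // -> "other"   (not in the allowed list)
bucketEnum("crypto", PAYMENT_METHODS, "risky");   // -> "risky"   (custom fallback)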


Bound policies (skip the 5-arg call site)

Rebinding brain + prompt + schema + cacheKey on every decide() call gets old. engram.policy() captures them once and returns a function that takes only input (and an optional context).

const classifyCheckout = engram.policy({
  brain: new OllamaAdapter({ model: "llama3.2" }),
  prompt: (_ctx, i) => `... ${JSON.stringify(i)}`,
  schema: /* ... */,
  cacheKey: checkoutKey,
  namespace: "billing",     // all options from decide() are accepted here
});

// Everywhere else, the call site is just (input):
const r = await classifyCheckout({ userId, amount: 9.99, account_age_days: 800, ... });
// -> { decision, source: "cache" | "brain", cacheId, ... }

// Override context for a single call:
const r2 = await classifyCheckout(input, preBuiltContext);

Debugging a wrong decision

Set ENGRAM_DEBUG=true before your process starts. Engram will log the prompt it sent, the raw brain response, and the string extractJson is parsing. Use this when the LLM emits junk you didn't expect or the schema parser throws.

ENGRAM_DEBUG=true node server.js
[engram.debug] prompt sent to brain:
Classify this API call ...
[engram.debug] raw brain response:
<|channel>thought...<channel|>{"action":"allow"}
[engram.debug] extractJson cleaned input:
{"action":"allow"}

For a known-bad cache entry, evict it:

await engram.feedback(cacheId, false);   // decays confidence
await engram.feedback(cacheId, false);   // drops below 0.2 -> evicted

// or the hammer:
await pool.query("delete from engram.cache where id = $1", [cacheId]);

Cold-start warning

The first LLM call after a server boot or model swap is slow, because the model has to be loaded into memory before it can run. The numbers we saw for gemma4:e4b on an RTX 5060 Ti 16GB with Docker GPU passthrough:

| Situation | Latency |
|---|---|
| First call, model not in VRAM | ~120 seconds |
| Second call, model warm | ~1.2 seconds |
| Cache hit after brain miss | ~4 ms |

The warm-call latency is what your users actually feel on a miss. The cold-start hits only the very first request after a server boot or a model swap. Three ways to avoid waking up the first user with a 2-minute wait:

1. Pre-load at Ollama start. Tell Ollama which model to keep hot:

# docker-compose.yml
ollama:
  image: ollama/ollama:latest
  environment:
    OLLAMA_PRELOAD_MODELS: gemma4:e4b
    OLLAMA_KEEP_ALIVE: "24h"   # keep model loaded between requests
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia
            count: all
            capabilities: [gpu]

2. Warm-up health check. Have your app hit the model once on startup before accepting traffic, so the first real request is already warm:

# docker-compose.yml
ollama:
  healthcheck:
    test: ["CMD-SHELL", "curl -fsS -X POST http://localhost:11434/api/generate -d '{\"model\":\"gemma4:e4b\",\"prompt\":\"ok\",\"stream\":false}' || exit 1"]
    interval: 10s
    timeout: 180s   # generous for the initial model load
    retries: 3
    start_period: 180s

3. Skip the cold-start entirely. For production, point the brain at a hosted Anthropic / OpenAI endpoint — no model loading, consistent per-request latency.
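Because the brain is just a value you pass in, that swap can be a one-line environment switch. A sketch: the prompt, schema, and checkoutKey are assumed to be the ones defined earlier, and the model names are illustrative:

// Local model in dev, hosted model in prod -- nothing else about the policy changes.
const brain = process.env.NODE_ENV === "production"
  ? new AnthropicAdapter({ model: "claude-haiku-4-5" })   // uses ANTHROPIC_API_KEY
  : new OllamaAdapter({ model: "llama3.2" });

const classifyCheckout = engram.policy({ brain, prompt, schema, cacheKey: checkoutKey });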


Configuration

new Engram({
  connectionString: "postgresql://...",
  // OR:
  pool: existingPgPool,
  namespace: "my_app",    // default "default"
  autoMigrate: true,      // default true
  logger: (m, meta) => console.log(m, meta),
});
  • If you pass a pool, engram borrows it and does not close it.
  • If you pass a connectionString, engram creates the pool and will close it on engram.close().
  • Migrations live in the package's migrations/ directory. Each file is hashed; drift between your DB and the shipped file throws EngramMigrationDriftError on connect().
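Reusing a pool your app already owns (the first bullet above) looks like this; a sketch, with illustrative pool settings:

import { Pool } from "pg";
import { Engram } from "@engram/core";

const pool = new Pool({ connectionString: process.env.DATABASE_URL, max: 10 });
const engram = new Engram({ pool, namespace: "my_app" });
await engram.connect();   // migrations still run unless autoMigrate: false

// On shutdown: engram.close() will not end a borrowed pool, so end it yourself.
await engram.close();
await pool.end();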

Observability

await engram.stats();       // { total_cached, avg_confidence, total_hits }
await engram.listCache({ namespace: "my_app", limit: 50 });
await engram.dashboard();   // all-in-one snapshot
await engram.userBaseline("u_1");
await engram.detectAnomaly("u_1", { api_call: 80 });

What Engram is not

  • Not a bot-detection product. There is no classify(), no built-in fraud heuristics, no hardcoded decision types.
  • Not an LLM framework. No chains, no agents, no tools.
  • Not a rate-limiter. decide() returns whatever your brain returned; you decide what it means.

It is a durable, typed cache keyed on developer-supplied strings, sitting in front of an LLM call, on Postgres.


License

MIT