
memflux

v1.0.0


Intelligent AI response caching middleware — reduce costs and latency for LLM-powered applications


🤖 memflux

Semantic caching layer for OpenAI & Gemini APIs — save up to 70% on AI costs



The Problem

// Without memflux: every call costs money, even for questions that mean the same thing
const a = await openai.chat.completions.create({ model: 'gpt-4o-mini', messages: [{ role: 'user', content: 'What is TypeScript?' }] });       // $0.002
const b = await openai.chat.completions.create({ model: 'gpt-4o-mini', messages: [{ role: 'user', content: 'Explain TypeScript to me' }] });  // $0.002 (same answer!)
const c = await openai.chat.completions.create({ model: 'gpt-4o-mini', messages: [{ role: 'user', content: 'What is TS?' }] });               // $0.002 (again!)

Regular caches require exact string matches, so they can't help here: users phrase the same question in many different ways.
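To make the limitation concrete, here is a minimal TypeScript sketch of a conventional exact-match cache. It is a plain `Map` keyed on the prompt string, written for illustration only and not part of memflux. Only the byte-identical prompt finds the stored answer:

```typescript
// Illustrative exact-match cache (not memflux code): a Map keyed on the raw prompt.
// Only byte-identical prompts share an entry.
const exactCache = new Map<string, string>();

function lookupExact(prompt: string): string | undefined {
  return exactCache.get(prompt);
}

function storeExact(prompt: string, answer: string): void {
  exactCache.set(prompt, answer);
}

// Cache the answer to the first phrasing only.
storeExact('What is TypeScript?', 'TypeScript is a typed superset of JavaScript.');

// Three phrasings of the same question: only the first one is found.
const prompts = ['What is TypeScript?', 'Explain TypeScript to me', 'What is TS?'];
const hits = prompts.map((p) => lookupExact(p) !== undefined);
console.log(hits); // [ true, false, false ]
```

Lowercasing or stripping punctuation would not rescue the other two prompts either, because their words differ.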


The Solution

import OpenAI from 'openai';
import { aiCache } from 'memflux';

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const ai = aiCache(openai);  // ← one line wraps the client

const a = await ai.chat('What is TypeScript?');      // → API call ($0.002)
const b = await ai.chat('Explain TypeScript to me'); // → CACHE HIT (no completion call ✨)
const c = await ai.chat('What is TS?');              // → CACHE HIT (no completion call ✨)

const stats = ai.getStats();
console.log(`Hit rate: ${stats.hitRate}%`);          // → "Hit rate: 66.7%"
console.log(`Saved: $${stats.estimatedMoneySaved}`); // → "Saved: $0.004"

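memflux converts every question into an embedding vector and compares it with the vectors of cached questions using cosine similarity. When the best score is at or above the threshold (0.85 by default), it returns the cached answer without calling the completion API. The new question still needs an embedding request unless its embedding is already in memflux's internal embedding cache, but embedding calls typically cost far less than completions.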
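The hit/miss decision can be sketched in a few lines of TypeScript. This is a textbook cosine similarity plus a threshold check, not memflux's own code; the three-dimensional vectors are hand-made stand-ins for real embeddings, and `isCacheHit` is a hypothetical name.

```typescript
// Textbook cosine similarity: dot(a, b) / (|a| * |b|).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('Vectors must have the same length');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Cosine is undefined for a zero vector; treat it as "no match".
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

const THRESHOLD = 0.85; // memflux's documented default

// Hypothetical helper: a hit when the score reaches the threshold.
function isCacheHit(query: number[], cached: number[], threshold = THRESHOLD): boolean {
  return cosineSimilarity(query, cached) >= threshold;
}

// Toy vectors standing in for real (e.g. 1536-dimensional) embeddings.
const cachedQuestion = [0.9, 0.1, 0.2];
const rephrased = [0.85, 0.15, 0.25]; // points in almost the same direction
const unrelated = [0.1, 0.9, -0.3];   // points somewhere else

console.log(isCacheHit(rephrased, cachedQuestion)); // true
console.log(isCacheHit(unrelated, cachedQuestion)); // false
```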


Installation

npm install memflux

# Optional: Redis persistence (for multi-process / production)
npm install memflux ioredis

# Optional: SQLite persistence (for single-server deployments)
npm install memflux better-sqlite3

Requirements: Node.js ≥ 18, OpenAI SDK ≥ 4 or @google/generative-ai ≥ 0.1


Quick Start

OpenAI

import OpenAI from 'openai';
import { aiCache } from 'memflux';

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

const ai = aiCache(openai, {
  similarity: {
    threshold: 0.85,       // 85% semantic match = cache hit
    algorithm: 'cosine',   // cosine | euclidean | dot-product
    topK: 5,               // check top 5 candidates
  },
  ttl: 3_600,              // cache entries expire after 1 hour
  debug: false,            // set to true for verbose logs
});

// Single question
const answer = await ai.chat('What is machine learning?');

// With options
const answer2 = await ai.chat('Summarise this article', {
  model: 'gpt-4o',
  temperature: 0.3,
  systemPrompt: 'You are a helpful assistant.',
  bypassCache: false,      // set to true to force a fresh API call
  ttl: 7200,               // per-request TTL override
});

// Multi-turn conversation
const answer3 = await ai.chatWithMessages([
  { role: 'system', content: 'You are an expert in TypeScript.' },
  { role: 'user',   content: 'What are generics?' },
]);

// Full metadata (cache hit? similarity score? latency?)
const result = await ai.chatDetailed('What is TypeScript?');
console.log(result.cached);            // true / false
console.log(result.similarityScore);   // 0.9234
console.log(result.latencyMs);         // 38 (cache hit) or 2847 (API call)
console.log(result.tokensSaved);       // 512

Google Gemini

import { GoogleGenerativeAI } from '@google/generative-ai';
import { aiCacheGemini } from 'memflux/gemini';

const genAI = new GoogleGenerativeAI(process.env.GOOGLE_AI_API_KEY!);

const ai = aiCacheGemini(genAI, {
  defaultModel: 'gemini-2.0-flash',
  similarity: { threshold: 0.85, algorithm: 'cosine', topK: 3 },
  ttl: 86_400,
});

const answer = await ai.chat('Explain quantum computing');

Storage Backends

In-Memory (default — zero dependencies)

const ai = aiCache(openai);
// Data is lost on process restart. Perfect for development.

Redis (production — persistent, multi-process)

import { aiCache, createRedisStore } from 'memflux';

const ai = aiCache(openai, {
  store: createRedisStore({
    url: process.env.REDIS_URL,   // default: redis://localhost:6379
    ttl: 86_400,
    keyPrefix: 'myapp:',          // namespace to avoid key collisions
  }),
});

SQLite (single-server persistent)

import { aiCache, createSQLiteStore } from 'memflux';

const ai = aiCache(openai, {
  store: createSQLiteStore({
    path: './cache.db',
    ttl: 7 * 86_400,  // 7 days
  }),
});
// Cache survives process restarts!
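Each store accepts a `ttl` in seconds, the configuration reference documents `0` as "never expire", and individual requests can override it via the `ttl` option of `chat`. A minimal sketch of how such an expiry check could work, assuming each entry records its creation time; the `TtlEntry` shape and both helpers are hypothetical, not memflux's API:

```typescript
// Hypothetical entry shape; memflux's real entry type may differ.
interface TtlEntry {
  createdAtMs: number;
  ttlSeconds: number; // 0 = never expire
}

// A per-request TTL wins over the default. `??` (not `||`) keeps an explicit 0.
function resolveTtl(defaultTtl: number, perRequestTtl?: number): number {
  return perRequestTtl ?? defaultTtl;
}

function isExpired(entry: TtlEntry, nowMs = Date.now()): boolean {
  if (entry.ttlSeconds === 0) return false; // never expires
  return nowMs - entry.createdAtMs >= entry.ttlSeconds * 1000;
}

const created = 1_000_000;
const entry: TtlEntry = { createdAtMs: created, ttlSeconds: resolveTtl(3_600, 7_200) };

console.log(entry.ttlSeconds);                                               // 7200
console.log(isExpired(entry, created + 3_600 * 1000));                       // false
console.log(isExpired(entry, created + 7_200 * 1000));                       // true
console.log(isExpired({ createdAtMs: created, ttlSeconds: 0 }, created + 1e12)); // false
```

Using `??` rather than `||` matters here: with `||`, a per-request `ttl: 0` would silently fall back to the default.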

Express.js Middleware

Drop-in middleware for any Express route that accepts a question and returns an AI answer:

import express from 'express';
import OpenAI from 'openai';
import { aiCache, aiCacheMiddleware, aiCacheStatsHandler } from 'memflux';

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const ai = aiCache(openai);
const app = express();
app.use(express.json());

// POST /chat with body: { "message": "your question" }
app.post('/chat', aiCacheMiddleware({
  cache: ai,
  includeStats: true,           // include stats in every response
  onHit:  (req, res) => console.log('CACHE HIT!'),
  onMiss: (req, res) => console.log('Cache miss'),
}));

// GET /cache/stats
app.get('/cache/stats', aiCacheStatsHandler(ai));

Response headers automatically added:

X-Cache: HIT
X-Cache-Hit-Rate: 66.7%
X-Cache-Similarity: 0.9234
X-Cache-Latency-Ms: 38
X-Money-Saved: $0.0042

Fastify Plugin

import Fastify from 'fastify';
import OpenAI from 'openai';
import { aiCache } from 'memflux';
import { aiCacheFastifyPlugin } from 'memflux/middleware';

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const ai = aiCache(openai);
const app = Fastify({ logger: true });

await app.register(aiCacheFastifyPlugin, {
  cache: ai,
  prefix: '/api',              // registers POST /api/chat, GET /api/cache/stats
  exposeAdminRoutes: true,
});

await app.listen({ port: 3000 });

Configuration Reference

const ai = aiCache(openai, {
  // ── Model ──────────────────────────────────────────────────
  defaultModel: 'gpt-4o-mini',    // Default AI model for completions

  // ── Embedding ──────────────────────────────────────────────
  embedding: {
    model: 'text-embedding-3-small', // Embedding model
    dimensions: 1536,                // Vector dimensions (256–1536)
    batchSize: 100,                  // Texts per batch embedding request
    internalCacheSize: 5_000,        // In-memory embedding cache size
  },

  // ── Similarity ─────────────────────────────────────────────
  similarity: {
    algorithm: 'cosine',   // 'cosine' | 'euclidean' | 'dot-product'
    threshold: 0.85,       // 0.0 = match everything | 1.0 = exact match only
    topK: 5,               // Evaluate top K candidates
  },

  // ── Cache ──────────────────────────────────────────────────
  ttl: 86_400,             // Default TTL in seconds (0 = never expire)
  maxCacheSize: 10_000,    // Max entries before LRU eviction
  namespace: 'memflux',    // Key prefix for Redis/SQLite

  // ── Analytics ─────────────────────────────────────────────
  analytics: {
    enabled: true,
    trackCostSavings: true,
    costPerToken: 0.0000006,         // Adjust to match your model's pricing
    maxRecords: 10_000,
  },

  // ── Store ─────────────────────────────────────────────────
  store: createRedisStore({ url: process.env.REDIS_URL }), // Optional custom backing store

  // ── Debug ─────────────────────────────────────────────────
  debug: false,            // Enable verbose stderr logging
});
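`maxCacheSize` caps the number of entries before least-recently-used (LRU) eviction. The following is a minimal sketch of that policy that assumes nothing about memflux's `memory.store.ts` beyond the behaviour named above; `LruStore` is a hypothetical class that relies on a JavaScript `Map` iterating its keys in insertion order.

```typescript
// Minimal LRU sketch (hypothetical, not memflux's memory store).
// Map iterates in insertion order, so the first key is the least recently used
// as long as every access re-inserts its key at the end.
class LruStore<V> {
  private readonly entries = new Map<string, V>();
  private readonly maxSize: number;

  constructor(maxSize: number) {
    if (!Number.isInteger(maxSize) || maxSize < 1) {
      throw new RangeError('maxSize must be a positive integer');
    }
    this.maxSize = maxSize;
  }

  get(key: string): V | undefined {
    const value = this.entries.get(key);
    if (value === undefined) return undefined;
    this.entries.delete(key);
    this.entries.set(key, value); // move to the most-recently-used end
    return value;
  }

  set(key: string, value: V): void {
    if (this.entries.has(key)) this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.maxSize) {
      const oldest = this.entries.keys().next(); // least recently used key
      if (!oldest.done) this.entries.delete(oldest.value);
    }
  }

  has(key: string): boolean {
    return this.entries.has(key);
  }

  get size(): number {
    return this.entries.size;
  }
}

const store = new LruStore<string>(2);
store.set('a', 'answer A');
store.set('b', 'answer B');
store.get('a');             // 'a' becomes most recently used
store.set('c', 'answer C'); // over capacity: evicts 'b'
console.log(store.has('a'), store.has('b'), store.has('c')); // true false true
```

Both `get` and `set` are O(1) on average, which is why a `Map` is a common basis for small in-process LRU caches.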

Threshold Guide

| Threshold | Behaviour | Use case |
|-----------|-----------|----------|
| 0.70 | Very aggressive caching | High-volume FAQ bots |
| 0.80 | Loose matching | Customer support, search |
| 0.85 | Default, good balance | General use |
| 0.90 | Strict matching | Legal / medical accuracy |
| 0.95 | Near-exact only | Code generation |


Statistics & Analytics

const stats = ai.getStats();

stats.totalRequests       // 1000
stats.cacheHits           // 650
stats.cacheMisses         // 350
stats.hitRate             // 65.0 (%)
stats.totalTokensSaved    // 97500
stats.estimatedMoneySaved // 0.0585 ($)
stats.averageLatencyMs.hits   // 38 ms
stats.averageLatencyMs.misses // 2847 ms
stats.topQueries          // [{query, hitCount, savedCost}]
stats.since               // '2025-01-01T00:00:00.000Z'
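The sample figures above are consistent with two simple relationships: hit rate = hits ÷ total requests × 100 (650 ÷ 1000 = 65%), and money saved = tokens saved × `costPerToken` (97,500 × 0.0000006 = 0.0585). A hypothetical helper that makes those assumptions explicit (memflux's `stats.tracker.ts` may compute them differently):

```typescript
// Hypothetical raw counters; not memflux's internal representation.
interface RawCounts {
  cacheHits: number;
  cacheMisses: number;
  totalTokensSaved: number;
}

function deriveStats(counts: RawCounts, costPerToken: number) {
  const totalRequests = counts.cacheHits + counts.cacheMisses;
  // Percentage rounded to one decimal place, e.g. 2 of 3 → 66.7.
  const hitRate =
    totalRequests === 0 ? 0 : Math.round((counts.cacheHits / totalRequests) * 1000) / 10;
  const estimatedMoneySaved = counts.totalTokensSaved * costPerToken;
  return { totalRequests, hitRate, estimatedMoneySaved };
}

const s = deriveStats({ cacheHits: 650, cacheMisses: 350, totalTokensSaved: 97_500 }, 0.0000006);
console.log(s.totalRequests);                  // 1000
console.log(s.hitRate);                        // 65
console.log(s.estimatedMoneySaved.toFixed(4)); // 0.0585
```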

Cost Projection

import { CostCalculator } from 'memflux';

const calc = new CostCalculator();
const savings = calc.projectMonthlySavings(
  10_000,   // daily requests
  0.60,     // expected 60% hit rate
  0.002,    // $0.002 average cost per uncached request
);

console.log(calc.formatSavings(savings));
// Monthly spend (no cache): $600.00
// Monthly spend (with cache): $240.00
// Monthly savings: $360.00 (60%)
// Annual savings: $4320.00
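The projection is plain arithmetic: 10,000 requests/day × 30 days × $0.002 = $600/month without a cache, and only the 40% of requests that miss reach the API. A hypothetical re-derivation, assuming a 30-day month and annual savings = monthly savings × 12 (not necessarily how `CostCalculator` computes it). Like the example above, it ignores the cost of embedding requests.

```typescript
// Hypothetical projection helper; not memflux's CostCalculator.
interface SavingsProjection {
  monthlyWithoutCache: number;
  monthlyWithCache: number;
  monthlySavings: number;
  annualSavings: number;
}

function projectMonthlySavings(
  dailyRequests: number,
  hitRate: number,        // fraction between 0 and 1
  costPerRequest: number, // average cost of one uncached request, in dollars
  daysPerMonth = 30,      // assumption
): SavingsProjection {
  if (hitRate < 0 || hitRate > 1) throw new RangeError('hitRate must be between 0 and 1');
  const monthlyWithoutCache = dailyRequests * daysPerMonth * costPerRequest;
  const monthlyWithCache = monthlyWithoutCache * (1 - hitRate); // only misses are billed
  const monthlySavings = monthlyWithoutCache - monthlyWithCache;
  return { monthlyWithoutCache, monthlyWithCache, monthlySavings, annualSavings: monthlySavings * 12 };
}

const p = projectMonthlySavings(10_000, 0.6, 0.002);
console.log(p.monthlyWithoutCache.toFixed(2)); // 600.00
console.log(p.monthlyWithCache.toFixed(2));    // 240.00
console.log(p.monthlySavings.toFixed(2));      // 360.00
console.log(p.annualSavings.toFixed(2));       // 4320.00
```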

Cache Management

// Flush all entries + reset statistics
await ai.flush();

// Pre-populate embeddings for anticipated queries (no AI completion calls)
await ai.warmUp([
  'What are your business hours?',
  'How do I reset my password?',
  'What is your refund policy?',
]);

// Direct store access
const size = await ai.store.size();
await ai.store.delete('specific-entry-id');
await ai.store.clear();

How It Works

User question
    │
    ▼
 Normalise text
 (lowercase, trim, remove punctuation)
    │
    ▼
 Embed text via OpenAI text-embedding-3-small
 → 1536-dimensional float vector
    │
    ▼
 Compare against all cached vectors
 using Cosine Similarity
    │
    ├── Score ≥ threshold? ──→ Return cached response (FREE ✨)
    │                           ~38ms avg latency
    │
    └── Score < threshold? ──→ Call AI API (~2000ms)
                                Store response + embedding
                                Return fresh response
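The flow in the diagram can be sketched end to end in TypeScript. The embedding step is injected as a function so the sketch runs offline; the demo uses a toy letter-count embedder that is not semantic at all and only stands in for a real model such as text-embedding-3-small. `normalise`, `lookup` and `toyEmbed` are hypothetical names, and the single best-match scan simplifies memflux's top-K search.

```typescript
// Stand-in for a real embedding call (hypothetical signature).
type Embed = (text: string) => number[];

interface CachedEntry {
  vector: number[];
  response: string;
}

// Step 1: lowercase, strip Unicode punctuation, collapse whitespace, trim.
function normalise(text: string): string {
  return text
    .toLowerCase()
    .replace(/\p{P}+/gu, '')
    .replace(/\s+/g, ' ')
    .trim();
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Steps 2-4: embed, compare with every cached vector, apply the threshold.
function lookup(question: string, cached: CachedEntry[], embed: Embed, threshold = 0.85): string | undefined {
  const query = embed(normalise(question));
  let best: CachedEntry | undefined;
  let bestScore = -Infinity;
  for (const entry of cached) {
    const score = cosine(query, entry.vector);
    if (score > bestScore) {
      bestScore = score;
      best = entry;
    }
  }
  return best !== undefined && bestScore >= threshold ? best.response : undefined; // undefined = miss
}

// Toy embedder: counts of the letters a-z. NOT a semantic model.
const toyEmbed: Embed = (text) => {
  const v = new Array<number>(26).fill(0);
  for (const ch of text) {
    const i = ch.charCodeAt(0) - 97;
    if (i >= 0 && i < 26) v[i]++;
  }
  return v;
};

const entries: CachedEntry[] = [
  { vector: toyEmbed(normalise('What is TypeScript?')), response: 'cached answer' },
];
console.log(normalise('  What is TypeScript?! '));             // what is typescript
console.log(lookup('what is typescript', entries, toyEmbed)); // cached answer
```

On a miss, the real pipeline would call the completion API and append the new vector and response to the cache.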

Why cosine similarity?

Cosine similarity is the cosine of the angle between two vectors: it depends only on their direction, not on their length (magnitude). Embedding models place texts with similar meanings in similar directions, so two phrasings of the same question typically produce a high cosine score even when the wording differs.
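The snippet below contrasts the three algorithm names memflux accepts (`'cosine' | 'euclidean' | 'dot-product'`) using their textbook definitions; memflux's own implementations, and how it maps a euclidean distance onto a 0–1 score, are not documented here. Scaling a vector changes its dot product and euclidean distance but not its cosine similarity:

```typescript
// Textbook definitions, not memflux's implementations.
function dotProduct(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('Vectors must have the same length');
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

function cosineSimilarity(a: number[], b: number[]): number {
  const denom = Math.sqrt(dotProduct(a, a)) * Math.sqrt(dotProduct(b, b));
  return denom === 0 ? 0 : dotProduct(a, b) / denom;
}

function euclideanDistance(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('Vectors must have the same length');
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += (a[i] - b[i]) ** 2;
  return Math.sqrt(sum);
}

const v = [1, 2, 3];
const longer = v.map((x) => x * 10); // same direction, 10x the length

console.log(cosineSimilarity(v, longer).toFixed(3));  // 1.000 (direction only)
console.log(dotProduct(v, v), dotProduct(v, longer)); // 14 140 (grows with length)
console.log(euclideanDistance(v, longer).toFixed(2)); // 33.67 (far apart)
```

For unit-length vectors the three measures rank candidates identically: cosine equals the dot product, and squared euclidean distance equals 2 − 2·cos. OpenAI documents its embeddings as normalized to length 1, so with those embeddings the choice of algorithm mostly changes how the threshold value is interpreted.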


Performance

| Operation | Time | Notes |
|-----------|------|-------|
| Cache hit (memory store) | ~5–50ms | Embedding lookup + similarity search |
| Cache hit (Redis store) | ~10–80ms | Network round-trip included |
| Cache miss | ~1000–5000ms | Full AI API call |
| Embedding generation | ~50–150ms | Cached internally after first call |

Throughput (Apple M2, 10,000 cache entries, 1536 dimensions):

  • cosine similarity: ~150,000 comparisons/second
  • cosineSimilarityFast (Float32): ~200,000 comparisons/second

For very large caches (>100k entries), consider moving to a vector database such as Pinecone or Weaviate, which uses approximate nearest-neighbour (ANN) indexing for sub-10ms search.
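A linear scan like the one in the How It Works diagram costs O(n·d) per lookup (n cached entries, d dimensions), which is why ANN indexes pay off as n grows. Below is a brute-force top-K sketch using `Float32Array`, assuming vectors are normalised to unit length up front so a plain dot product equals cosine similarity. `topK` and `normaliseVector` are hypothetical names, and memflux's `cosineSimilarityFast` may work differently.

```typescript
interface IndexedVector {
  id: string;
  vector: Float32Array; // assumed unit length
}

// Scale a vector to unit length once, at insert time.
function normaliseVector(v: Float32Array): Float32Array {
  let norm = 0;
  for (let i = 0; i < v.length; i++) norm += v[i] * v[i];
  norm = Math.sqrt(norm);
  const out = new Float32Array(v.length);
  if (norm === 0) return out; // leave the zero vector as zeros
  for (let i = 0; i < v.length; i++) out[i] = v[i] / norm;
  return out;
}

// Brute force: score every entry (O(n*d)), sort, keep the best k.
function topK(query: Float32Array, items: IndexedVector[], k: number): { id: string; score: number }[] {
  const scored = items.map(({ id, vector }) => {
    let score = 0;
    for (let i = 0; i < query.length; i++) score += query[i] * vector[i]; // = cosine for unit vectors
    return { id, score };
  });
  scored.sort((a, b) => b.score - a.score);
  return scored.slice(0, k);
}

const items: IndexedVector[] = [
  { id: 'a', vector: normaliseVector(Float32Array.from([1, 0, 0])) },
  { id: 'b', vector: normaliseVector(Float32Array.from([0.9, 0.1, 0])) },
  { id: 'c', vector: normaliseVector(Float32Array.from([0, 1, 0])) },
];
const query = normaliseVector(Float32Array.from([1, 0.05, 0]));
console.log(topK(query, items, 2).map((r) => r.id)); // [ 'a', 'b' ]
```

Sorting all n scores costs O(n log n); a size-k heap would cut that to O(n log k), but the O(n·d) similarity work dominates either way.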


File Structure

memflux/
├── src/
│   ├── index.ts                       ← Public API entry point
│   ├── types/                         ← TypeScript interfaces
│   │   ├── cache.types.ts
│   │   ├── config.types.ts
│   │   └── provider.types.ts
│   ├── config/
│   │   ├── default.config.ts          ← Sensible defaults
│   │   └── config.validator.ts        ← Input validation
│   ├── cache/
│   │   └── cache.manager.ts           ← Core orchestration logic
│   ├── embeddings/
│   │   └── embedding.service.ts       ← OpenAI embeddings + internal LRU cache
│   ├── similarity/
│   │   ├── cosine.similarity.ts       ← Cosine algorithm (standard + fast)
│   │   ├── euclidean.similarity.ts    ← Euclidean distance algorithm
│   │   ├── dot-product.similarity.ts  ← Dot-product algorithm
│   │   └── similarity.engine.ts       ← Algorithm selector + top-K search
│   ├── storage/
│   │   ├── memory.store.ts            ← In-memory LRU store (default)
│   │   ├── redis.store.ts             ← Redis persistent store
│   │   └── sqlite.store.ts            ← SQLite persistent store
│   ├── analytics/
│   │   ├── stats.tracker.ts           ← Hit/miss statistics
│   │   └── cost.calculator.ts         ← Cost projections + model pricing
│   ├── middleware/
│   │   ├── express.middleware.ts      ← Express.js integration
│   │   └── fastify.middleware.ts      ← Fastify plugin
│   ├── adapters/
│   │   └── gemini.adapter.ts          ← Google Gemini support
│   └── utils/
│       ├── hash.utils.ts              ← SHA-256 ID generation
│       ├── logger.ts                  ← Zero-dependency logger
│       └── token.counter.ts           ← Lightweight token estimator
├── tests/
│   ├── unit/                          ← Pure unit tests (no API calls)
│   └── integration/                   ← Tests with mocked OpenAI client
├── examples/                          ← Runnable code examples
├── benchmarks/                        ← Performance benchmarks
└── .github/workflows/                 ← CI/CD pipelines
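The tree lists `hash.utils.ts` as "SHA-256 ID generation". One plausible scheme, sketched with Node's built-in `node:crypto` module, hashes the normalised question together with the model name so the same question asked of different models gets separate entries. The key layout and `entryId` are assumptions, not memflux's documented behaviour.

```typescript
import { createHash } from 'node:crypto';

// Hypothetical ID scheme: SHA-256 over "model NUL normalised question".
// The NUL separator keeps ("ab", "c") and ("a", "bc") from colliding.
function entryId(normalisedQuestion: string, model: string): string {
  return createHash('sha256').update(`${model}\u0000${normalisedQuestion}`).digest('hex');
}

const id1 = entryId('what is typescript', 'gpt-4o-mini');
const id2 = entryId('what is typescript', 'gpt-4o-mini');
const id3 = entryId('what is typescript', 'gpt-4o');

console.log(id1.length);  // 64 (hex-encoded 256-bit digest)
console.log(id1 === id2); // true: deterministic
console.log(id1 === id3); // false: model is part of the key
```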

Supported Models

| Provider | Completion Models | Embedding Model | |----------|-----------------|-----------------| | OpenAI | gpt-4o, gpt-4o-mini, gpt-4-turbo, gpt-3.5-turbo | text-embedding-3-small ✓ | | Google | gemini-2.0-flash, gemini-1.5-pro, gemini-1.5-flash | text-embedding-004 ✓ |


FAQ

Q: Does it work across different languages?
A: Yes. The embedding model captures semantic meaning across Arabic, English, French, Spanish, Chinese, and other languages, so a question in Arabic and the same question in English will often produce a cache hit.

Q: What happens if two questions have the same embedding by coincidence?
A: Exact embedding collisions are extremely unlikely with 1536-dimensional vectors. The practical risk is different: two questions can score above the threshold and still need different answers, for example when they differ only in a name, number, or date. Raise the threshold, or pass bypassCache: true for such queries.

Q: Can I use it with streaming responses?
A: Not yet. memflux currently buffers the full response before caching it; streaming is on the roadmap.

Q: Is it thread-safe?
A: Each Node.js process (and each worker thread) has its own in-memory store, so cached entries are not shared between them. Use Redis for multi-process deployments.

Q: Can I bring my own vector store (Pinecone, Weaviate)?
A: Yes. Implement the CacheStore interface and pass it as store in the config.
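The `CacheStore` interface itself is not documented in this README. The sketch below uses a hypothetical `MinimalCacheStore`: `size`, `delete` and `clear` mirror the calls shown under Cache Management, while `get`, `set` and `entries` are assumptions. A Pinecone or Weaviate adapter would presumably replace the full `entries()` scan with the database's own nearest-neighbour query.

```typescript
// Hypothetical shapes; memflux's real CacheStore interface may differ.
interface StoredEntry {
  id: string;
  vector: number[];
  response: string;
}

interface MinimalCacheStore {
  get(id: string): Promise<StoredEntry | undefined>;
  set(entry: StoredEntry): Promise<void>;
  entries(): Promise<StoredEntry[]>; // assumption: full scan for similarity search
  delete(id: string): Promise<boolean>;
  clear(): Promise<void>;
  size(): Promise<number>;
}

// In-process implementation backed by a Map; async to match remote stores.
class MapCacheStore implements MinimalCacheStore {
  private readonly data = new Map<string, StoredEntry>();

  async get(id: string) { return this.data.get(id); }
  async set(entry: StoredEntry) { this.data.set(entry.id, entry); }
  async entries() { return [...this.data.values()]; }
  async delete(id: string) { return this.data.delete(id); }
  async clear() { this.data.clear(); }
  async size() { return this.data.size; }
}

async function demo(): Promise<void> {
  const store = new MapCacheStore();
  await store.set({ id: 'q1', vector: [1, 0], response: 'answer' });
  console.log(await store.size());       // 1
  console.log(await store.delete('q1')); // true
  console.log(await store.size());       // 0
}

demo();
```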


Contributing

git clone https://github.com/Brah-Timo/memflux.git
cd memflux
npm install
npm test            # run tests
npm run build       # build the package
npm run benchmark   # run benchmarks

License

MIT — Copyright © 2026 TIMSoftDZ memflux contributors