npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2026 – Pkg Stats / Ryan Hefner

neurocache

v1.0.0

Published

Intelligent caching layer for LLM APIs - reduce costs and improve response times

Readme

NeuroCache

Intelligent caching layer for LLM APIs. Save money and improve response times by caching LLM completions.

npm version License: MIT TypeScript

Why?

LLM API calls are expensive and slow. If your application makes similar requests repeatedly, you're wasting money and time. NeuroCache sits between your app and the LLM provider, caching responses intelligently.

Features:

  • Context Intelligence Layer - Production-safe input optimization and deduplication
  • Deterministic caching with SHA-256 hashing
  • Automatic request deduplication for concurrent calls
  • Multiple storage backends (in-memory, file, Redis)
  • Full TypeScript support with strict typing
  • Provider-agnostic design (currently supports OpenAI)
  • Built-in metrics and cost tracking

Documentation

📖 Complete Documentation:


Installation

npm install neurocache

If you want to use Redis as your cache backend:

npm install redis

Quick Start

import { NeuroCache, OpenAIProvider, MemoryStore } from 'neurocache';

const cache = new NeuroCache({
  provider: new OpenAIProvider({
    apiKey: process.env.OPENAI_API_KEY
  }),
  store: new MemoryStore(),
  ttl: 3600
});

const response = await cache.generate({
  model: 'gpt-4',
  messages: [
    { role: 'user', content: 'What is the capital of France?' }
  ]
});

console.log(response.content); // "The capital of France is Paris."

// Same request again - served from cache instantly
const cached = await cache.generate({
  model: 'gpt-4',
  messages: [
    { role: 'user', content: 'What is the capital of France?' }
  ]
});

// Check your savings
const metrics = cache.getMetrics();
console.log(`Cache hit rate: ${metrics.cacheHits / metrics.totalRequests * 100}%`);
console.log(`Tokens saved: ${metrics.tokensSaved}`);
console.log(`Cost saved: $${metrics.estimatedCostSaved.toFixed(4)}`);

##rovider Options

OpenAI Provider

import { OpenAIProvider } from 'neurocache';

const provider = new OpenAIProvider({
  apiKey: 'your-api-key',
  organization: 'org-id',      // Optional
  baseURL: 'custom-url',       // Optional
  maxRetries: 3,               // Optional (default: 3)
  timeout: 60000               // Optional (default: 60000ms)
});

Store Options

Memory Store

Fast, in-memory cache. Perfect for development and short-lived processes.

import { MemoryStore } from 'neurocache';

const store = new MemoryStore();

File Sts

OpenAI (built-in):

const provider = new OpenAIProvider({
  apiKey: process.env.OPENAI_API_KEY,
  maxRetries: 3,
  timeout: 60000
});

Storage Backends

Memory - Fast, but doesn't persist:

const store = new MemoryStore();

File - Persists to disk:

const store = new FileStore('.cache/neurocache');

Redis - For distributed systems:

const store = new RedisStore({
  host: 'localhost',
  port: 6379,
  keyPrefix: 'neurocache:'
});

With NeuroCache (70% hit rate):

  • 30 API calls (70 cached)
  • Average latency: 600ms (70% faster)
  • Total cost: $0.18 (70% savings)
  • Total time: 60 seconds

Benchmark Results

Run the included benchmark:

npm run benchmark

Expected results:

  • Cache hit latency: <10ms In a typical scenario with repeated questions:
  • Without caching: 100 requests to GPT-4 = $0.60, ~200 seconds
  • With NeuroCache (70% hit rate): 30 API calls = $0.18, ~60 seconds

Cache hits are served in <10ms vs 500-2000ms for API calls.

##he.generate(sameRequest) );

// Only 1 API call is made, others wait for result const responses = await Promise.all(promises);


### 3. Development Testing

```typescript
const cache = new NeuroCache({
  provider: new OpenAIProvider({ apiKey }),
  store: new MemoryStore(),
  ttl: 300, // 5 minutes
  logging: true
});

// Iterate quickly without burning API credits

🧠 Context Intelligence Layer (Level 3)

Production-Safe: Optimizes INPUT context only, no partial response reuse.

Quick Enable

import { NeuroCache } from 'neurocache';

const cache = new NeuroCache({
  provider: new OpenAIProvider({ apiKey }),
  store: new MemoryStore(),
  
  // Enable Context Intelligence (production-safe)
  enableContextIntelligence: true,
  contextOptimizationStrategy: {
    enableDeduplication: true,      // Remove duplicate messages
    normalizeContent: true,         // Cleanup whitespace
    enableHistoryTrimming: false,   // Optional: trim old history
    preserveSystemMessages: true    // Never remove system messages
  },
  minOptimizationThreshold: 10      // Min tokens to trigger optimization
});

What It Does

Context Deduplication: Removes repeated messages automatically

// Input (5 messages with duplicates)
await cache.generate({
  messages: [
    { role: 'user', content: 'Hello' },
    { role: 'assistant', content: 'Hi' },
    { role: 'user', content: 'Hello' },     // Duplicate - removed!
    { role: 'assistant', content: 'Hi' },   // Duplicate - removed!
    { role: 'user', content: 'How are you?' }
  ]
});

// Actually sent to LLM (3 messages, duplicates removed)
// Result: Same quality response, fewer tokens used

Content Normalization: Cleans up whitespace for better cache hits

// These become equivalent after normalization:
"Hello   world"  →  "Hello world"
"Hello\n\nworld" →  "Hello world"

Safety Guarantees:

  • ✅ Deterministic (SHA-256 hashing)
  • ✅ No semantic changes to responses
  • ✅ Input optimization only (no response manipulation)
  • ✅ Graceful fallback on errors
  • ✅ System messages never removed

Results: 10-30% token savings on conversational workloads with duplicate context

👉 Full Documentation | Examples


📈 Metrics & Analytics

Get Metrics

const metrics = cache.getMetrics();

console.log(metrics.totalRequests);       // Total requests
console.log(metrics.cacheHits);           // Cache hits
console.log(metrics.cacheMisses);         // Cache misses
console.log(metrics.tokensSaved);         // Tokens saved
console.log(metrics.estimatedCostSaved);  // Money saved
console.log(metrics.averageLatencySaved); // Avg latency saved

// Context Intelligence metrics (if enabled)
console.log(metrics.duplicateMessagesRemoved);   // Messages deduplicated
console.log(metrics.totalInputTokensSaved);      // Tokens saved from optimization
console.log(metrics.optimizationsApplied);       // Number of optimizations applied

Get Summary

console.log(cache.getMetricsSummary());

Output:

NeuroCache Metrics Summary
==========================
Total Requests: 150
Cache Hits: 105 (70.00%)
Cache Misses: 45
Tokens Saved: 42,500
Cost Saved: $0.4250
Avg Latency Saved: 1,234.56ms
Provider Errors: 0
Store Errors: 0

Get Hit Rate

const hitRate = cache.getCacheHitRate(); // Returns percentage
console.log(`Hit rate: ${hitRate.toFixed(1)}%`);

🧪 Testing

npm test
npm run test:coverage

Advanced

Custom Provider - Implement LLMProvider:

class CustomProvider implements LLMProvider {
  async generate(request: GenerateRequest): Promise<GenerateResponse> {
    // your implementation
  }
  getProviderName(): string { return 'Custom'; }
  getModelName(request: GenerateRequest): string { return request.model; }
}

Custom Store - Implement CacheStore:

class CustomStore implements CacheStore {
  async get(key: string): Promise<CacheEntry | null> { /* ... */ }
  async set(key: string, value: CacheEntry, ttl?: number): Promise<void> { /* ... */ }
  async delete(key: string): Promise<void> { /* ... */ }
  async clear(): Promise<void> { /* ... */ }
  getName(): string { return 'Custom'; }
}

Cache Versioning - Invalidate all caches:

const cache = new NeuroCache({
  provider,
  store,
  version: 'v2' // bump version to invalidate old cache
});

API

Methods:

  • generate(request) - Generate completion with caching
  • getMetrics() - Get metrics snapshot
  • getMetricsSummary() - Formatted metrics string
  • getCacheHitRate() - Hit rate percentage
  • resetMetrics() - Reset all metrics
  • clearCache() - Clear all entries

Contributing

PRs welcome. Please add tests for new features.

License

MIT