@nodellmcache/embedding-cache

v1.0.0

Published

9 days ago

Vector embedding caching for NodeLLMCache with batch dedup and reuse tracking

mdmax007

llm cache embeddings vector openai rag nodejs

Vector embedding caching for NodeLLMCache. Embeddings are effectively deterministic for a given (text, model) pair, so re-embedding the same text is wasted spend and caching them pays off quickly. This cache eliminates the repeat calls, with first-class batch support.

The storage backend is injected — use @nodellmcache/memory, @nodellmcache/redis, or any StorageAdapter.

Install

npm install @nodellmcache/embedding-cache @nodellmcache/memory @nodellmcache/core

Quick start

import OpenAI from 'openai'
import { EmbeddingCache } from '@nodellmcache/embedding-cache'
import { MemoryAdapter } from '@nodellmcache/memory'

const openai = new OpenAI()
const cache = new EmbeddingCache({ adapter: new MemoryAdapter() })

// Single embedding
const embedding = await cache.getOrGenerate(
  'semantic search query',
  () =>
    openai.embeddings
      .create({ model: 'text-embedding-3-small', input: 'semantic search query' })
      .then((r) => r.data[0]!.embedding),
  { provider: 'openai', model: 'text-embedding-3-small' },
)

// Batch — the generator only sees texts not already cached, deduplicated.
const texts = ['dog', 'cat', 'dog', 'bird', 'cat']
const embeddings = await cache.getBatch(
  texts,
  (uncached) =>
    openai.embeddings
      .create({ model: 'text-embedding-3-small', input: uncached })
      .then((r) => r.data.map((d) => d.embedding)),
  { provider: 'openai', model: 'text-embedding-3-small' },
)
// API called once for ['dog', 'cat', 'bird']; results returned in input order.

`getBatch` contract

The generator receives only uncached inputs, deduplicated, in first-seen order, and must return embeddings in that same order.
The returned array is aligned to the original inputs (duplicates included).
A wrong-length generator result throws ValidationError.
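
The contract above can be sketched with two small pure functions. This is only an illustration of the dedup and alignment rules, not the package's implementation: `planBatch` and `alignBatch` are hypothetical names, a plain `Map` stands in for the storage adapter, and a generic `Error` stands in for `ValidationError`.

```typescript
// Hypothetical helpers illustrating the getBatch contract; not part of
// @nodellmcache/embedding-cache.

// Returns the texts the generator would receive: uncached, deduplicated,
// in first-seen order.
function planBatch(texts: string[], cached: Map<string, number[]>): string[] {
  const seen = new Set<string>()
  const uncached: string[] = []
  for (const text of texts) {
    if (cached.has(text) || seen.has(text)) continue
    seen.add(text)
    uncached.push(text)
  }
  return uncached
}

// Maps cached and freshly generated embeddings back onto the original
// inputs, duplicates included.
function alignBatch(
  texts: string[],
  cached: Map<string, number[]>,
  uncached: string[],
  generated: number[][],
): number[][] {
  if (generated.length !== uncached.length) {
    // The real cache is documented to throw ValidationError here.
    throw new Error(`expected ${uncached.length} embeddings, got ${generated.length}`)
  }
  const fresh = new Map<string, number[]>()
  uncached.forEach((text, i) => fresh.set(text, generated[i]!))
  return texts.map((text) => cached.get(text) ?? fresh.get(text)!)
}
```

With 'cat' already cached and the inputs ['dog', 'cat', 'dog', 'bird', 'cat'], `planBatch` yields ['dog', 'bird'], and `alignBatch` returns five embeddings in the original input order.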

Keys

Keys follow the pattern embedding:{provider}:{model}:{sha256(text)}. Pass dimensions to namespace models that support variable output sizes (e.g. text-embedding-3-*), so that a 256-dim and a 1536-dim embedding of the same text never collide.
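
As a rough illustration of the key scheme, the sketch below derives a key with Node's built-in crypto module. The function name `embeddingKey`, the hex digest encoding, and the `@{dimensions}` suffix on the model segment are all assumptions made for this example; the README does not specify how the package encodes the hash or where dimensions enters the key.

```typescript
import { createHash } from 'node:crypto'

// Hypothetical helper; not exported by @nodellmcache/embedding-cache.
function embeddingKey(
  provider: string,
  model: string,
  text: string,
  dimensions?: number,
): string {
  // Assumption: the digest is hex-encoded SHA-256 of the UTF-8 text.
  const digest = createHash('sha256').update(text, 'utf8').digest('hex')
  // Assumption: dimensions is folded into the model segment so that
  // different output sizes of the same model get distinct keys.
  const modelPart = dimensions === undefined ? model : `${model}@${dimensions}`
  return `embedding:${provider}:${modelPart}:${digest}`
}
```

Hashing the text keeps keys at a fixed length regardless of input size, and namespacing by dimensions means a 256-dim and a 1536-dim request for the same text resolve to different entries.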

Stats

const { hits, misses, hitRate, entryCount, embeddingsReused, apiCallsAvoided } = await cache.stats()

embeddingsReused — embeddings served from cache (hits).
apiCallsAvoided — items requested − embeddings generated, capturing both cache reuse and intra-batch dedup.
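
To make the apiCallsAvoided arithmetic concrete, here is a small worked example based on the five-item batch from the quick start. The standalone function is a hypothetical restatement of the formula above, not the package's code, and the cold/warm scenario assumes nothing is evicted between the two calls.

```typescript
// Hypothetical restatement of: items requested - embeddings generated.
function apiCallsAvoided(itemsRequested: number, embeddingsGenerated: number): number {
  return itemsRequested - embeddingsGenerated
}

// Cold cache: 5 inputs, 3 unique texts generated, so 2 calls are avoided
// purely through intra-batch dedup.
const coldAvoided = apiCallsAvoided(5, 3)

// Warm cache: the same 5 inputs again, nothing generated, so all 5 are avoided.
const warmAvoided = apiCallsAvoided(5, 0)

console.log({ coldAvoided, warmAvoided })
```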

Compact storage

Embeddings are stored as number[]. For a smaller footprint, give the cache a compression-enabled adapter — the 'embedding' cache type automatically selects the lz4 hint in @nodellmcache/memory:

new EmbeddingCache({ adapter: new MemoryAdapter({ compression: 'auto' }) })

License

MIT
