embed-cache
v0.1.2
Content-addressable embedding cache with deduplication and TTL
Content-addressable embedding cache with deduplication, LRU eviction, TTL support, and batch optimization. Zero external runtime dependencies -- caller supplies the embedder function.
Description
Embedding API calls are the dominant ongoing cost in most RAG (Retrieval-Augmented Generation) pipelines. The same text is routinely embedded multiple times: documents are re-indexed on restart, chunked text reappears across overlapping documents, periodic re-indexing jobs sweep all content even when most of it has not changed, and parallel ingestion workers independently embed the same source files.
embed-cache wraps any embedding function with a transparent, content-addressable cache. Cache keys are derived from the text content itself (SHA-256 of normalized text + model ID), so identical text always hits the cache regardless of what called it or when. When text has not changed, the API is never called. When it has changed, only the changed text is re-embedded.
Key properties:
- Content-addressable keys -- same text + same model always produces the same cache key.
- Batch optimization -- embedBatch() collects all cache misses and makes a single embedder call.
- Change detection -- track documents by ID and detect when content has changed before re-embedding.
- Cost tracking -- hit rate, estimated tokens saved, and estimated dollar cost avoided.
- LRU eviction -- configurable maximum cache size with least-recently-used eviction.
- TTL expiry -- entries expire after a configurable time-to-live, per-entry or globally.
- Zero runtime dependencies -- only uses the Node.js built-in node:crypto. You bring your own embedder.
Installation
```bash
npm install embed-cache
```
Requires Node.js 18 or later.
Quick Start
```ts
import { createCache } from 'embed-cache';

const cache = createCache({
  embedder: async (texts) => {
    // Call OpenAI, Cohere, or any embedding API
    const resp = await openai.embeddings.create({
      model: 'text-embedding-3-small',
      input: texts,
    });
    return resp.data.map((d) => d.embedding);
  },
  model: 'text-embedding-3-small',
  maxSize: 50_000,
  ttl: 60 * 60 * 1000, // 1 hour
});

// Single embed -- repeated calls never invoke the embedder twice for the same text
const vec = await cache.embed('Hello world');

// Batch embed -- collects all cache misses and makes ONE embedder call
const vecs = await cache.embedBatch(['Hello', 'World', 'Hello']);
// Only calls embedder with ['World'] if 'Hello' is already cached

// Check stats
const s = cache.stats();
console.log(s.hitRate); // 0-1
console.log(s.tokensEstimatedSaved); // estimated tokens saved via cache hits
console.log(s.costEstimatedSaved); // estimated USD saved
```
Features
Batch Optimization
embedBatch() separates hits from misses before calling the embedder:
1. Compute a content-addressable key for every text in the batch.
2. Look up all keys in the cache. Cached vectors are returned immediately.
3. Collect all misses into a single array.
4. Call embedder(missedTexts) once.
5. Store the new vectors and return all results in the original input order.
This minimizes API calls when a batch contains repeated or previously seen texts.
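The steps above can be sketched as a small self-contained function. The names here (`embedBatchSketch`, `keyOf`, the plain `Map` used as the cache) are illustrative stand-ins, not embed-cache's actual internals:

```ts
type Vec = number[];

// Illustrative sketch of embedBatch()'s hit/miss separation.
async function embedBatchSketch(
  texts: string[],
  cache: Map<string, Vec>,
  keyOf: (t: string) => string,
  embedder: (texts: string[]) => Promise<Vec[]>,
): Promise<Vec[]> {
  const keys = texts.map(keyOf);
  const missIdx: number[] = [];
  const missTexts: string[] = [];
  const seen = new Set<string>(); // dedupe repeats within the batch itself
  for (let i = 0; i < keys.length; i++) {
    if (!cache.has(keys[i]) && !seen.has(keys[i])) {
      seen.add(keys[i]);
      missIdx.push(i);
      missTexts.push(texts[i]);
    }
  }
  if (missTexts.length > 0) {
    const fresh = await embedder(missTexts); // ONE embedder call for all misses
    fresh.forEach((v, j) => cache.set(keys[missIdx[j]], v));
  }
  return keys.map((k) => cache.get(k)!); // original input order
}

// Demo: 'Hello' is already cached and repeated in the batch,
// so the embedder sees only ['World'].
const calls: string[][] = [];
const cache = new Map<string, Vec>([['Hello', [1, 0]]]);
const demo = await embedBatchSketch(
  ['Hello', 'World', 'Hello'],
  cache,
  (t) => t, // identity key function, just for the demo
  async (ts) => {
    calls.push(ts);
    return ts.map(() => [0, 1]);
  },
);
// calls now holds a single entry: ['World']
```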
Change Detection
Track documents by ID so you can skip re-embedding when content has not changed:
```ts
await cache.trackDocument('doc-42', content);

// Later, check if the document has changed
if (await cache.hasChanged('doc-42', newContent)) {
  // Content changed -- re-embed
  await cache.trackDocument('doc-42', newContent);
  const vecs = await cache.embedBatch(chunks);
}
```
hasChanged() computes a SHA-256 hash of the content and compares it to the stored hash. For untracked documents, it returns true.
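A minimal sketch of this hash-compare scheme, assuming a plain docId-to-SHA-256 map (the package's storage details may differ):

```ts
import { createHash } from 'node:crypto';

// docId -> sha256(content); a stand-in for embed-cache's tracked-document store
const docHashes = new Map<string, string>();
const sha256 = (s: string) => createHash('sha256').update(s).digest('hex');

function trackDocumentSketch(docId: string, content: string): void {
  docHashes.set(docId, sha256(content));
}

function hasChangedSketch(docId: string, content: string): boolean {
  const stored = docHashes.get(docId);
  return stored === undefined || stored !== sha256(content); // untracked => true
}

trackDocumentSketch('doc-42', 'hello');
console.log(hasChangedSketch('doc-42', 'hello'));  // false -- unchanged
console.log(hasChangedSketch('doc-42', 'hello!')); // true -- content differs
console.log(hasChangedSketch('doc-99', 'x'));      // true -- untracked
```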
Text Normalization
Before computing cache keys, text is normalized to collapse cosmetic variations that produce identical embeddings:
- Unicode NFC normalization
- Trim leading and trailing whitespace
- Collapse runs of internal whitespace to a single space
The normalized form is used only for key computation. The original text is passed to the embedder unchanged.
Normalization is enabled by default. Set normalizeText: false to disable it.
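The normalization steps and key derivation can be sketched as follows. The `'\u0000'` separator between model and text is an illustrative assumption, not the package's actual key format:

```ts
import { createHash } from 'node:crypto';

// Sketch of normalization + key derivation. The model/text separator
// ('\u0000') is an illustrative assumption, not the real key format.
function normalizeSketch(text: string): string {
  return text
    .normalize('NFC')      // Unicode NFC normalization
    .trim()                // strip leading/trailing whitespace
    .replace(/\s+/g, ' '); // collapse internal whitespace runs
}

function cacheKeySketch(text: string, model: string): string {
  return createHash('sha256')
    .update(`${model}\u0000${normalizeSketch(text)}`)
    .digest('hex');
}

// Cosmetic variants map to the same key:
const k1 = cacheKeySketch('Hello   world', 'openai/text-embedding-3-small');
const k2 = cacheKeySketch('  Hello world  ', 'openai/text-embedding-3-small');
console.log(k1 === k2); // true
```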
Model-Aware Keys
The model identifier is included in every cache key. Vectors from different models are never mixed. Changing the model option automatically separates the key namespace -- no explicit cache bust is required.
Known model aliases are canonicalized automatically:
| Input | Canonical form |
|---|---|
| text-embedding-3-small | openai/text-embedding-3-small |
| text-embedding-3-large | openai/text-embedding-3-large |
| text-embedding-ada-002 | openai/text-embedding-ada-002 |
| embed-english-v3.0 | cohere/embed-english-v3.0 |
| embed-multilingual-v3.0 | cohere/embed-multilingual-v3.0 |
Unknown model strings are lowercased and used as-is.
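The alias table behaves like a simple lookup with a lowercase fallback; a sketch (the actual alias list in the package may grow over time):

```ts
// Sketch of model-alias canonicalization per the table above.
const MODEL_ALIASES: Record<string, string> = {
  'text-embedding-3-small': 'openai/text-embedding-3-small',
  'text-embedding-3-large': 'openai/text-embedding-3-large',
  'text-embedding-ada-002': 'openai/text-embedding-ada-002',
  'embed-english-v3.0': 'cohere/embed-english-v3.0',
  'embed-multilingual-v3.0': 'cohere/embed-multilingual-v3.0',
};

function canonicalModelSketch(model: string): string {
  // Unknown strings are lowercased and used as-is.
  return MODEL_ALIASES[model] ?? model.toLowerCase();
}

console.log(canonicalModelSketch('text-embedding-3-small')); // openai/text-embedding-3-small
console.log(canonicalModelSketch('My-Custom-Model'));        // my-custom-model
```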
LRU Eviction
When the cache reaches maxSize, the least recently used entry is evicted to make room. Every cache hit promotes the accessed entry to the front of the LRU list. Eviction is O(1) via a doubly-linked list.
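The package implements this with a doubly-linked list; the same observable behavior can be sketched with a JavaScript Map, whose insertion order doubles as recency order (delete + re-insert promotes an entry):

```ts
// LRU behavior sketched with a Map (insertion order == recency order).
// embed-cache uses a doubly-linked list internally; the observable
// eviction behavior is the same.
class LruSketch<V> {
  private map = new Map<string, V>();
  constructor(private maxSize: number) {}

  get(key: string): V | undefined {
    const v = this.map.get(key);
    if (v !== undefined) {
      this.map.delete(key); // re-insert to mark as most recently used
      this.map.set(key, v);
    }
    return v;
  }

  set(key: string, value: V): void {
    this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.maxSize) {
      // oldest entry = first key in iteration order
      const oldest = this.map.keys().next().value as string;
      this.map.delete(oldest);
    }
  }
}

const lru = new LruSketch<number>(2);
lru.set('a', 1);
lru.set('b', 2);
lru.get('a');    // promotes 'a' to most recently used
lru.set('c', 3); // cache full: evicts 'b', the least recently used
console.log(lru.get('b')); // undefined
console.log(lru.get('a')); // 1
```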
TTL Expiry
Entries expire lazily on access. When a cached entry is read after its TTL has elapsed, it is deleted and treated as a cache miss. TTL can be set globally via the ttl option or overridden per-call via EmbedOptions.
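Lazy expiry can be sketched like this: each entry carries an absolute expiry timestamp that is checked on read (the entry layout is illustrative, not embed-cache's actual storage format):

```ts
// Sketch of lazy TTL expiry: entries store an absolute expiry time
// and are deleted on the first read after that time.
interface TtlEntry<V> {
  value: V;
  expiresAt?: number; // absolute ms timestamp; undefined = never expires
}

class TtlSketch<V> {
  private map = new Map<string, TtlEntry<V>>();

  set(key: string, value: V, ttlMs?: number): void {
    this.map.set(key, {
      value,
      expiresAt: ttlMs !== undefined ? Date.now() + ttlMs : undefined,
    });
  }

  get(key: string): V | undefined {
    const e = this.map.get(key);
    if (e === undefined) return undefined;
    if (e.expiresAt !== undefined && Date.now() >= e.expiresAt) {
      this.map.delete(key); // expired: remove lazily, report a miss
      return undefined;
    }
    return e.value;
  }
}

const ttlCache = new TtlSketch<string>();
ttlCache.set('fresh', 'v');         // no TTL: never expires
ttlCache.set('stale', 'v', -1);     // already-elapsed TTL, just for the demo
console.log(ttlCache.get('fresh')); // 'v'
console.log(ttlCache.get('stale')); // undefined -- lazily expired on access
```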
Cost Tracking
The cache estimates tokens saved on each hit using a character-to-token approximation (Math.ceil(text.length / 4)) and computes dollar cost avoided using the configured modelPricePerMillion.
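Concretely, the estimate works out as follows. The ~4 characters/token heuristic is a rough approximation, not a real tokenizer:

```ts
// Sketch of the savings estimate: ~4 characters per token,
// priced at modelPricePerMillion USD per 1M tokens.
function estimateSavingsSketch(hitTexts: string[], pricePerMillion: number) {
  const tokens = hitTexts.reduce((sum, t) => sum + Math.ceil(t.length / 4), 0);
  const usd = (tokens / 1_000_000) * pricePerMillion;
  return { tokens, usd };
}

// 400 characters ~= 100 tokens; at $0.02 per 1M tokens that is $0.000002
const est = estimateSavingsSketch(['a'.repeat(400)], 0.02);
console.log(est.tokens); // 100
```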
Serialization
Export the entire cache state as a JSON string for persistence or transfer:
```ts
const data = cache.serialize();
// data is a JSON string: { entries: [...], model: "...", version: 1 }
```
API Reference
createCache(options: EmbedCacheOptions): EmbedCache
Factory function. Creates and returns a new EmbedCache instance.
```ts
import { createCache } from 'embed-cache';

const cache = createCache({
  embedder: myEmbedderFn,
  model: 'text-embedding-3-small',
});
```
Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| options.embedder | EmbedderFn | Yes | -- | Function that accepts an array of texts and returns an array of embedding vectors. |
| options.model | string | Yes | -- | Model identifier. Included in cache keys to namespace entries by model. |
| options.ttl | number | No | undefined | Default time-to-live in milliseconds for all cache entries. |
| options.maxSize | number | No | 10000 | Maximum number of cached entries. LRU eviction kicks in when this limit is reached. |
| options.modelPricePerMillion | number | No | 0.1 | Price in USD per 1 million tokens. Used for cost savings estimation. |
| options.algorithm | 'sha256' \| 'sha1' \| 'md5' | No | 'sha256' | Hash algorithm for cache key derivation. |
| options.normalizeText | boolean | No | true | Whether to apply NFC normalization, trim, and whitespace collapsing before hashing. |
Returns: EmbedCache
EmbedCache.embed(text: string, options?: EmbedOptions): Promise<number[]>
Embed a single text string. Returns the embedding vector from the cache if available, otherwise calls the embedder, caches the result, and returns it.
```ts
const vector = await cache.embed('Hello world');
```
Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| text | string | Yes | -- | The text to embed. |
| options.ttl | number | No | global ttl | Override the default TTL for this specific entry. |
| options.bypassCache | boolean | No | false | When true, skip the cache lookup and always call the embedder. The result is not stored in the cache. |
Returns: Promise<number[]> -- the embedding vector.
EmbedCache.embedBatch(texts: string[], options?: EmbedOptions): Promise<number[][]>
Embed multiple texts in a single call. Looks up all texts in the cache, collects misses, calls the embedder once for all misses, caches the results, and returns all vectors in the original input order.
```ts
const vectors = await cache.embedBatch(['Hello', 'World', 'Hello']);
// vectors[0] and vectors[2] are the same (both from 'Hello')
// The embedder was only called with the uncached texts
```
Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| texts | string[] | Yes | -- | Array of texts to embed. |
| options.ttl | number | No | global ttl | Override the default TTL for entries created by this call. |
| options.bypassCache | boolean | No | false | When true, skip all cache lookups and call the embedder with all texts. |
Returns: Promise<number[][]> -- array of embedding vectors in the same order as the input texts.
EmbedCache.hasChanged(docId: string, content: string): Promise<boolean>
Check whether a tracked document's content has changed since it was last tracked.
```ts
const changed = await cache.hasChanged('doc-42', newContent);
// true if content differs from last trackDocument call, or if docId is untracked
```
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| docId | string | Yes | Unique identifier for the document. |
| content | string | Yes | Current content to compare against the stored hash. |
Returns: Promise<boolean> -- true if the content has changed or the document is untracked, false if the content matches.
EmbedCache.trackDocument(docId: string, content: string): Promise<void>
Record a document's content hash for future change detection via hasChanged().
```ts
await cache.trackDocument('doc-42', content);
```
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| docId | string | Yes | Unique identifier for the document. |
| content | string | Yes | Document content to hash and store. |
Returns: Promise<void>
EmbedCache.stats(): CacheStats
Return current cache statistics including hit rate, token savings, and cost savings.
```ts
const s = cache.stats();
console.log(s.hitRate); // 0.75
console.log(s.tokensEstimatedSaved); // 12500
console.log(s.costEstimatedSaved); // 0.0025
```
Returns: CacheStats object with the following fields:
| Field | Type | Description |
|---|---|---|
| totalRequests | number | Total number of embed/embedBatch lookups performed. |
| hits | number | Number of cache hits. |
| misses | number | Number of cache misses. |
| hitRate | number | Ratio of hits to total requests (0 to 1). Returns 0 when no requests have been made. |
| size | number | Current number of entries in the cache. |
| tokensEstimatedSaved | number | Estimated total tokens saved via cache hits. |
| costEstimatedSaved | number | Estimated USD saved, computed as tokensEstimatedSaved / 1_000_000 * modelPricePerMillion. |
| model | string | The model identifier this cache was created with. |
| createdAt | string | ISO 8601 timestamp of when the cache was created. |
EmbedCache.serialize(): string
Serialize the entire cache state to a JSON string. The output includes all cached entries, the model identifier, and a version field.
```ts
const json = cache.serialize();
// Store to disk, transfer to another environment, etc.
```
Returns: string -- JSON string with the structure:
```json
{
  "entries": [
    { "key": "abc123...", "vector": [0.1, 0.2, ...] }
  ],
  "model": "text-embedding-3-small",
  "version": 1
}
```
EmbedCache.clear(): void
Remove all cached entries and reset all statistics.
```ts
cache.clear();
console.log(cache.size); // 0
```
EmbedCache.size: number (read-only)
The current number of entries in the cache.
```ts
console.log(cache.size); // 42
```
Types
EmbedderFn
```ts
type EmbedderFn = (texts: string[]) => Promise<number[][]>;
```
A function that accepts an array of text strings and returns a promise resolving to an array of embedding vectors. Each vector is a number[]. The returned array must have the same length as the input array, with vectors in corresponding order.
EmbedCacheOptions
```ts
interface EmbedCacheOptions {
  embedder: EmbedderFn;
  model: string;
  ttl?: number;
  maxSize?: number;
  modelPricePerMillion?: number;
  algorithm?: 'sha256' | 'sha1' | 'md5';
  normalizeText?: boolean;
}
```
EmbedOptions
```ts
interface EmbedOptions {
  ttl?: number;
  bypassCache?: boolean;
}
```
CacheStats
```ts
interface CacheStats {
  totalRequests: number;
  hits: number;
  misses: number;
  hitRate: number;
  size: number;
  tokensEstimatedSaved: number;
  costEstimatedSaved: number;
  model: string;
  createdAt: string;
}
```
EmbedCache
```ts
interface EmbedCache {
  embed(text: string, options?: EmbedOptions): Promise<number[]>;
  embedBatch(texts: string[], options?: EmbedOptions): Promise<number[][]>;
  hasChanged(docId: string, content: string): Promise<boolean>;
  trackDocument(docId: string, content: string): Promise<void>;
  stats(): CacheStats;
  serialize(): string;
  clear(): void;
  readonly size: number;
}
```
Configuration
Hash Algorithms
The algorithm option controls which hash function is used for cache key derivation:
| Algorithm | Key length | Speed | Collision resistance |
|---|---|---|---|
| sha256 (default) | 64 hex chars | Fast | Excellent -- no known collisions |
| sha1 | 40 hex chars | Faster | Weak -- not recommended for adversarial inputs |
| md5 | 32 hex chars | Fastest | Broken -- use only when speed matters more than security |
For virtually all use cases, the default sha256 is recommended. Hash computation for a 2 KB text chunk takes under 0.05ms.
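The key lengths in the table follow directly from the digest sizes exposed by node:crypto (assuming md5 is available in your Node build; it is absent under FIPS mode):

```ts
import { createHash } from 'node:crypto';

// Hex digest length per algorithm (digest bits / 4):
const hexLen = (alg: string) =>
  createHash(alg).update('example').digest('hex').length;

console.log(hexLen('sha256')); // 64
console.log(hexLen('sha1'));   // 40
console.log(hexLen('md5'));    // 32
```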
Model Price Defaults
When modelPricePerMillion is not provided, it defaults to 0.1 USD per million tokens. For accurate cost tracking, provide the actual price for your model. Reference prices for common models:
| Model | Price per 1M tokens (USD) |
|---|---|
| text-embedding-3-small | $0.02 |
| text-embedding-3-large | $0.13 |
| text-embedding-ada-002 | $0.10 |
| embed-english-v3.0 | $0.10 |
| embed-multilingual-v3.0 | $0.10 |
LRU and TTL Interaction
When both maxSize and ttl are configured, both mechanisms are active independently. An entry can be evicted by LRU pressure (cache is full and the entry is the least recently used) or by TTL expiry (entry is older than its TTL). TTL expiry is lazy -- expired entries are only removed when accessed.
Error Handling
- Embedder errors propagate. If the embedder function throws during embed() or embedBatch(), the error is propagated to the caller. Nothing is written to the cache for the failed call.
- TTL expiry is transparent. Expired entries are silently removed on access and treated as cache misses. The embedder is called to produce a fresh vector.
- LRU eviction is silent. When the cache is full, the least recently used entry is evicted without notification.
Advanced Usage
Bypass Cache for Specific Calls
Force a fresh embedding even when the text is cached:
```ts
const fresh = await cache.embed('Hello', { bypassCache: true });
```
Per-Entry TTL Override
Set a custom TTL for a specific embed call, overriding the global default:
```ts
// This entry expires in 5 seconds, regardless of the global TTL
const vec = await cache.embed('time-sensitive query', { ttl: 5000 });
```
Document Re-Indexing Pipeline
Combine change detection with batch embedding for efficient document re-indexing:
```ts
const cache = createCache({
  embedder: myEmbedder,
  model: 'text-embedding-3-small',
  modelPricePerMillion: 0.02,
});

for (const doc of documents) {
  if (await cache.hasChanged(doc.id, doc.content)) {
    const chunks = chunkDocument(doc.content);
    const vectors = await cache.embedBatch(chunks);
    await vectorStore.upsert(doc.id, chunks, vectors);
    await cache.trackDocument(doc.id, doc.content);
  }
}

console.log(cache.stats().costEstimatedSaved); // USD saved
```
Export and Restore Cache State
Serialize the cache for persistence or transfer between environments:
```ts
import { writeFileSync, readFileSync } from 'fs';

// Export
const data = cache.serialize();
writeFileSync('embedding-cache.json', data);

// The serialized format is a JSON string containing all entries,
// the model identifier, and a version field for forward compatibility.
```
Custom Embedder Functions
Any function matching the EmbedderFn signature works as an embedder:
```ts
import { createCache, type EmbedderFn } from 'embed-cache';

// OpenAI
const openaiEmbedder: EmbedderFn = async (texts) => {
  const resp = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: texts,
  });
  return resp.data.map((d) => d.embedding);
};

// Cohere
const cohereEmbedder: EmbedderFn = async (texts) => {
  const resp = await cohere.embed({
    model: 'embed-english-v3.0',
    texts,
    inputType: 'search_document',
  });
  return resp.embeddings;
};

// Local model (e.g., via HTTP)
const localEmbedder: EmbedderFn = async (texts) => {
  const resp = await fetch('http://localhost:8080/embed', {
    method: 'POST',
    body: JSON.stringify({ texts }),
    headers: { 'Content-Type': 'application/json' },
  });
  const json = await resp.json();
  return json.embeddings;
};
```
TypeScript
embed-cache is written in TypeScript with strict mode enabled. All public types are exported from the package entry point:
```ts
import {
  createCache,
  type EmbedderFn,
  type EmbedCacheOptions,
  type EmbedOptions,
  type CacheStats,
  type EmbedCache,
} from 'embed-cache';
```
Type declarations are included in the published package (dist/index.d.ts).
License
MIT
