searchfn
v0.2.0
Published
IndexedDB-backed full-text search engine with hybrid in-memory caching and FlexSearch-compatible adapters.
Maintainers
Readme
searchfn
searchfn is a browser-first full-text search engine that keeps its index in IndexedDB while caching hot postings in memory. It ships a FlexSearch-compatible adapter so existing applications can migrate without rewriting worker pipelines.
Features
- 🗄️ Persistent Storage - IndexedDB-backed index with automatic persistence
- ⚡ In-Memory Cache - LRU caching for hot postings and vectors
- 🧠 In-Memory Variant - Lightweight
InMemorySearchFnfor ephemeral data - 🔄 Full CRUD Support - Add, update, remove, and retrieve documents
- 🔍 BM25 Scoring - Relevance-based ranking with configurable pipeline
- 🧵 Worker-Ready - Snapshot export/import for cross-thread usage
- 🔌 FlexSearch Compatible - Drop-in replacement for existing FlexSearch code
Installation
npm install searchfnQuick Start
import { SearchFn } from "searchfn";
const engine = new SearchFn({
name: "demo",
fields: ["title", "body"]
});
await engine.add({
id: "doc-1",
fields: {
title: "Hybrid search",
body: "IndexedDB persistence with in-memory caching"
},
store: { url: "/docs/hybrid-search" }
});
const hits = await engine.searchDetailed("hybrid", { includeStored: true });
console.log(hits[0]);
// { docId: "doc-1", score: 1.23, document: { url: "/docs/hybrid-search" } }
await engine.destroy();Core API
SearchFn
Constructor
const engine = new SearchFn({
name: "my-index", // Index name (required)
fields: ["title", "body"], // Searchable fields (required)
pipeline: { // Optional tokenization config
language: "en", // Language for stop words/stemming (en/es/fr, default: en)
enableStemming: true, // Enable Porter stemming (default: false)
stopWords: new Set([...]) // Custom stop words (overrides language-based defaults)
},
cache: { // Optional cache sizes
terms: 2048, // LRU cache for term postings (default: 2048)
vectors: 512 // LRU cache for document vectors (default: 512)
},
storage: { // Optional IndexedDB config
dbName: "custom-db", // Database name (default: searchfn-{name})
version: 1, // Schema version (default: 1)
chunkSize: 256 // Postings chunk size (default: 256)
}
});Adding Documents
await engine.add({
id: "doc-1", // string | number (required)
fields: { // Indexed fields (required)
title: "Getting Started",
body: "Full-text search guide"
},
store: { // Optional metadata (not indexed, but retrievable)
url: "/guide",
tags: ["tutorial", "docs"]
}
});Bulk Indexing (Performance Optimization)
For indexing large datasets, use batched persistence to achieve 15-30x speedup:
Option 1: Manual Flush (fine-grained control):
// Accumulate changes in memory
for (const doc of documents) {
await engine.add({
id: doc.id,
fields: doc.fields
}, { persist: false }); // Skip immediate persistence
}
// Single batch persist at the end
await engine.flush();Option 2: Bulk Add (convenience method):
await engine.addBulk(documents, {
batchSize: 1000, // Progress callback interval (optional)
onProgress: (indexed, total) => {
console.log(`Indexed ${indexed}/${total} documents`);
}
});Performance Comparison:
- Traditional:
await engine.add(doc)→ ~30-60 seconds for 10,000 docs - Batched:
await engine.flush()→ ~1-2 seconds for 10,000 docs
Examples:
- See
examples/bulk-indexing-worker.tsfor real-world usage patterns - Run
benchmarks/batched-indexing-benchmark.tsto measure performance on your system
Searching
Basic search (returns document IDs):
const ids = await engine.search("getting started", {
limit: 10, // Max results (optional)
fields: ["title"] // Limit search to specific fields (optional)
});
// ["doc-1", "doc-5", ...]Detailed search (returns scores and stored data):
const results = await engine.searchDetailed("getting started", {
limit: 10,
includeStored: true // Include stored metadata
});
// [
// { docId: "doc-1", score: 1.23, document: { url: "/guide", tags: [...] } },
// ...
// ]Removing Documents
await engine.remove("doc-1");Updating Documents
Update by removing and re-adding:
await engine.remove("doc-1");
await engine.add({
id: "doc-1",
fields: { title: "Updated Title", body: "New content" },
store: { url: "/new-url" }
});Retrieving Stored Data
const doc = await engine.getDocument("doc-1");
// { url: "/new-url" }Snapshots
Export and import index state (useful for persistence or worker transfer):
const snapshot = await engine.exportSnapshot();
await engine.importSnapshot(snapshot);Worker Snapshots
JSON-serializable format for postMessage:
const payload = await engine.exportWorkerSnapshot();
worker.postMessage(payload);
// In worker:
await engine.importWorkerSnapshot(payload);Cleanup
await engine.destroy(); // Deletes IndexedDB databaseInMemorySearchFn
For scenarios where data is already loaded in memory (e.g., links panel, global graph, collection items), use InMemorySearchFn - a lightweight variant without IndexedDB persistence.
Quick Start
import { InMemorySearchFn } from "searchfn";
const search = new InMemorySearchFn({
fields: ["title", "tags"]
});
// Add documents (synchronous)
search.add({
id: "link-1",
fields: { title: "Home Page", tags: "navigation main" },
store: { url: "/", visits: 42 }
});
// Search (synchronous)
const results = search.searchDetailed("home", { includeStored: true });
// [{ docId: "link-1", score: 1.23, document: { url: "/", visits: 42 } }]API
All methods are synchronous and match SearchFn API:
// Add document
search.add({
id: "doc-1",
fields: { title: "...", body: "..." },
store: { custom: "data" }
});
// Search
const ids = search.search("query", { limit: 10, fields: ["title"] });
const detailed = search.searchDetailed("query", { includeStored: true });
// Remove
search.remove("doc-1");
// Get stored data
const doc = search.getDocument("doc-1");
// Clear all
search.clear();
// Snapshot (for serialization/hydration)
const snapshot = search.exportSnapshot();
search.importSnapshot(snapshot);When to Use
- ✅ In-memory data (UI state, cached lists, navigation items)
- ✅ Temporary search (session-based, ephemeral collections)
- ✅ Fast initialization (no async setup)
- ❌ Large datasets requiring persistence
- ❌ Data that needs to survive page reloads
FlexSearch Compatibility
Index Adapter
Drop-in replacement for FlexSearch Index:
import { FlexSearchIndexAdapter } from "searchfn";
const index = new FlexSearchIndexAdapter({
name: "notes",
fields: ["text"],
cache: { term: 1024 }
});
// FlexSearch-compatible methods
await index.addAsync("doc-1", "Quick brown fox");
await index.addDocument({ id: "doc-2", fields: { text: "Lazy dog" } });
const ids = await index.searchAsync("quick");
const cached = await index.searchCacheAsync("fox", { limit: 5 });
await index.removeAsync("doc-1");
await index.clear();
// Worker snapshots
const snapshot = await index.exportWorkerSnapshot();
await index.importWorkerSnapshot(snapshot);Document Adapter
Multi-field search with schema:
import { FlexSearchDocumentAdapter } from "searchfn";
const index = new FlexSearchDocumentAdapter({
name: "articles",
fields: ["title", "body"],
document: {
id: "id",
index: ["title", "body"], // Fields to index
store: ["title", "url"] // Fields to store (not indexed)
}
});
await index.add({
id: "article-1",
fields: { title: "Hello", body: "World" },
store: { url: "/hello" }
});
const results = await index.search("hello");
// [{ result: ["article-1"], documents: [{ id: "article-1", score: 1.0, ... }] }]
await index.remove("article-1");Migrating from v0.1.x to v0.2.0
The main class has been renamed from SearchEngine to SearchFn for consistency. Backward-compatible exports are provided:
Old (still works, but deprecated):
import { SearchEngine } from "searchfn";
const engine = new SearchEngine({...});New (recommended):
import { SearchFn } from "searchfn";
const engine = new SearchFn({...});New Features in v0.2.0:
flush()- Explicit persistence controladdBulk()- Batch indexing with progress callbacks{ persist: false }option inadd()- Defer persistence- 15-30x performance improvement for bulk indexing
Migration Utilities
Migrate existing FlexSearch data from IndexedDB:
import { migrateFlexStoreToSearchFn, SearchFn } from "searchfn";
const engine = new SearchFn({
name: "migrated",
fields: ["title", "body"]
});
await migrateFlexStoreToSearchFn(engine, legacyFlexStore, {
indexFields: ["title", "body"],
storeFields: ["tags", "url"]
});See docs/migration-guide.md for a full walkthrough.
Examples
Basic CRUD Operations
import { SearchFn } from "searchfn";
const engine = new SearchFn({
name: "notes",
fields: ["content"]
});
// Add
await engine.add({
id: "note-1",
fields: { content: "Buy groceries" },
store: { createdAt: Date.now() }
});
// Search
const results = await engine.search("groceries");
// Update
await engine.remove("note-1");
await engine.add({
id: "note-1",
fields: { content: "Buy milk and bread" },
store: { createdAt: Date.now(), updatedAt: Date.now() }
});
// Retrieve
const note = await engine.getDocument("note-1");
console.log(note.updatedAt);
// Delete
await engine.remove("note-1");Multi-Field Search
const engine = new SearchFn({
name: "products",
fields: ["name", "description", "category"]
});
await engine.add({
id: "prod-1",
fields: {
name: "Wireless Mouse",
description: "Ergonomic design with USB receiver",
category: "Electronics"
},
store: { price: 29.99, sku: "MOUSE-001" }
});
// Search across all fields
const all = await engine.search("wireless");
// Search specific fields only
const nameOnly = await engine.search("mouse", {
fields: ["name"]
});Custom Pipeline
// English with stemming (default)
const engineEN = new SearchFn({
name: "english",
fields: ["text"],
pipeline: {
language: "en", // or "english"
enableStemming: true
}
});
// Spanish stop words
const engineES = new SearchFn({
name: "spanish",
fields: ["text"],
pipeline: {
language: "es" // or "spanish"
}
});
// French stop words
const engineFR = new SearchFn({
name: "french",
fields: ["text"],
pipeline: {
language: "fr" // or "french"
}
});
// Custom stop words (overrides language defaults)
const engineCustom = new SearchFn({
name: "custom",
fields: ["text"],
pipeline: {
stopWords: new Set(["the", "a", "an", "and", "or", "but"])
}
});Architecture
searchfn persists its inverted index in IndexedDB while maintaining hot data in LRU caches for low-latency queries.
Storage Layout
- metadata: Schema, field definitions, pipeline config
- terms: Posting lists keyed by
[field, term, chunk]with doc frequencies - vectors: Sparse document-field vectors for BM25 scoring
- documents: Stored metadata (not indexed, retrievable via
getDocument) - cacheState: Captured cache entries for warm start
Data Flow
Documents → Pipeline → Indexer → IndexedDB
↘ LRU Cache ↗
Query Engine ← Search queriesSee docs/spec.md for detailed architecture.
Testing
# Run all tests
npm test
# Watch mode
npm run test:watch
# Run specific test file
npx vitest run __tests__/search-engine.test.tsBuilding
# Build for production
npm run build
# Watch mode
npm run build:watch
# Lint and typecheck
npm run lint
npm run typecheckBenchmarks
npm run benchmarkDocumentation
License
MIT
Prefix Search & Autocomplete
Enable edge n-grams for prefix matching and autocomplete functionality:
import { InMemorySearchFn } from 'searchfn';
const autocomplete = new InMemorySearchFn({
fields: ['name', 'description'],
pipeline: {
enableEdgeNGrams: true,
edgeNGramMinLength: 2,
edgeNGramMaxLength: 15,
stopWords: [] // Keep all terms for better autocomplete
}
});
// Index data
autocomplete.add({
id: '1',
fields: { name: 'Anthropic Claude' },
store: { url: '/anthropic' }
});
// Progressive search as user types
autocomplete.search('an'); // Matches 'Anthropic'
autocomplete.search('anth'); // Still matches
autocomplete.search('anthropic'); // Exact match (higher score)Fuzzy Search
Handle typos and spelling variations with fuzzy matching:
const search = new InMemorySearchFn({
fields: ['title']
});
search.add({ id: '1', fields: { title: 'anthropic' } });
// Without fuzzy - no match
search.search('anthopric'); // []
// With fuzzy - matches within edit distance
search.search('anthopric', { fuzzy: 2 }); // ['1']
search.search('anthopric', { fuzzy: true }); // ['1'] (default distance=2)Fuzzy Options:
fuzzy: true- Use default edit distance of 2fuzzy: number- Custom Levenshtein distance (1-3 recommended)- Exact matches always rank higher than fuzzy matches
Combining Prefix + Fuzzy
Use both for powerful autocomplete with typo tolerance:
const search = new InMemorySearchFn({
fields: ['text'],
pipeline: {
enableEdgeNGrams: true,
edgeNGramMinLength: 2,
edgeNGramMaxLength: 10
}
});
// Prefix matching + fuzzy matching
search.search('antho', { fuzzy: 1 });