cry-search
v1.0.3
A fast, memory-efficient search library for large datasets with support for tokenized matching, linked collections, and per-field match modes.
Features
- Fast search - Pre-built metadata enables instant queries on large datasets
- Memory efficient - Numeric token storage uses ~45% less memory than string-based approaches
- Updatable - Add, update, or remove items without rebuilding metadata
- Augmented metadata - Add local/temporary searchable data without modifying global metadata
- Flexible matching - Per-field match modes and query prefixes for prefix, suffix, anywhere, exact, and negation
- Search prefixes - Use -word, --word, or ~word to exclude, ..word for suffix, =word for exact match
- Automatic normalization - Handles diacritics, case, dates, and number formatting
- Universal search - Search and update across linked collections (e.g., customers with their pets)
Installation
npm install cry-search
Examples
Single Table Search
import {
createSearchArrayMetadata,
searchInDataReturnObjects,
arrayToSearchableData,
type SearchableObject,
} from 'cry-search';
interface Product extends SearchableObject {
_id: string;
name: string;
description: string;
sku: string;
}
const products: Product[] = [
{ _id: '1', name: 'iPhone 15 Pro', description: 'Latest Apple smartphone', sku: 'APL-IP15P' },
{ _id: '2', name: 'Samsung Galaxy S24', description: 'Android flagship phone', sku: 'SAM-GS24' },
{ _id: '3', name: 'MacBook Pro 16"', description: 'Apple laptop for professionals', sku: 'APL-MBP16' },
];
// Convert array to searchable data (Map<id, item>)
const data = arrayToSearchableData(products);
// Build search metadata (do this once, reuse for all searches)
const metadata = createSearchArrayMetadata(data, {
searchInFields: ['name', 'description', 'sku'],
});
// Search
const results = searchInDataReturnObjects('apple', metadata, data);
// Returns: Map with products 1 and 3
const results2 = searchInDataReturnObjects('samsung galaxy', metadata, data);
// Returns: Map with product 2
// Search is diacritic- and case-insensitive
const results3 = searchInDataReturnObjects('IPHONE', metadata, data);
// Returns: Map with product 1
SearchUniverse with Linked Collections
For related data (like customers and their pets), use SearchUniverse to search across collections:
import { SearchUniverse, type SearchableObject } from 'cry-search';
interface Stranka extends SearchableObject {
_id: string;
name: string;
address: string;
}
interface Pacient extends SearchableObject {
_id: string;
stranka_id: string; // Foreign key to Stranka
name: string;
species: string;
}
// Create universe and register collections
const universe = new SearchUniverse();
universe.registerCollection<Stranka>('stranke', {
spec: { searchInFields: ['name', 'address'] },
});
universe.registerCollection<Pacient>('pacienti', {
spec: { dontSearchInFields: ['stranka_id'] },
linkedTo: {
collectionName: 'stranke',
foreignKeyGetter: (p) => p.stranka_id,
},
});
// Load data
universe.loadCollection('stranke', [
{ _id: 's1', name: 'Krajnik', address: 'Ljubljana' },
{ _id: 's2', name: 'Novak', address: 'Maribor' },
]);
universe.loadCollection('pacienti', [
{ _id: 'p1', stranka_id: 's1', name: 'Angie', species: 'cat' },
{ _id: 'p2', stranka_id: 's1', name: 'Rex', species: 'dog' },
{ _id: 'p3', stranka_id: 's2', name: 'Bella', species: 'cat' },
]);
// Search across all collections
const results = universe.searchUniversally('krajnik angie');
// Returns:
// {
// stranke: [{ _id: 's1', name: 'Krajnik', ... }],
// pacienti: [{
// primary: { _id: 's1', name: 'Krajnik', ... },
// linked: [{ _id: 'p1', name: 'Angie', ... }],
// matchedIn: 'both'
// }]
// }
// Query that matches only a linked item
const results2 = universe.searchUniversally('rex');
// Returns pacienti results with matchedIn: 'linked'
// Query that matches only a primary item - all of its linked items are returned
const results3 = universe.searchUniversally('novak');
// Returns Novak stranka with Bella in linked array
Updating Existing Data
When data changes, update the metadata to keep search in sync:
import {
createSearchArrayMetadata,
updateSearchMetadata,
updateSearchMetadataBatch,
arrayToSearchableData,
} from 'cry-search';
// Initial setup (products is the array from the Single Table Search example)
const data = arrayToSearchableData(products);
const metadata = createSearchArrayMetadata(data);
// Update a single item
const updatedProduct = { ...products[0], name: 'iPhone 16 Pro' };
data.set(updatedProduct._id, updatedProduct);
updateSearchMetadata(metadata, updatedProduct._id, updatedProduct);
// Update multiple items at once
const updates = [
{ _id: '1', name: 'iPhone 16 Pro Max', description: 'Newest Apple phone', sku: 'APL-IP16PM' },
{ _id: '4', name: 'iPad Pro', description: 'Apple tablet', sku: 'APL-IPAD' },
];
for (const item of updates) {
data.set(item._id, item);
}
updateSearchMetadataBatch(metadata, updates);
// Mark item as deleted (removes from search but keeps in data)
const deletedProduct = { ...products[1], _deleted: new Date() };
data.set(deletedProduct._id, deletedProduct);
updateSearchMetadata(metadata, deletedProduct._id, deletedProduct);
// With SearchUniverse (universe from the linked collections example) - handles linked indexes automatically
universe.updateSearchMetadata('pacienti', 'p1', {
_id: 'p1',
stranka_id: 's2', // Changed owner from s1 to s2
name: 'Angie',
species: 'cat',
});
Augmented Metadata - Local Search Context
Sometimes you need to search items combined with page-specific data (like match status, local flags, computed properties) without modifying the global metadata. AugmentedMetadata wraps existing metadata and allows adding temporary searchable data.
import {
createSearchArrayMetadata,
searchInDataReturnIds,
AugmentedMetadata,
arrayToSearchableData,
type SearchableObject,
} from 'cry-search';
interface Medicine extends SearchableObject {
_id: string;
name: string;
manufacturer: string;
}
// Global data - shared across application
const medicines: Medicine[] = [
{ _id: '1', name: 'Aspirin', manufacturer: 'Bayer' },
{ _id: '2', name: 'Paracetamol', manufacturer: 'Krka' },
{ _id: '3', name: 'Ibuprofen', manufacturer: 'Lek' },
];
const data = arrayToSearchableData(medicines);
const globalMetadata = createSearchArrayMetadata(data);
// Page-specific: matching medicines with external registry
// Create augmented metadata with local match status
const augmented = new AugmentedMetadata(
globalMetadata,
{ searchInFields: ['status', 'notes'] } // Control which fields to index
);
// User matches medicines with external registry
augmented.updateAugmented('1', { status: 'matched', notes: 'verified' });
augmented.updateAugmented('2', { status: 'unmatched', notes: '' });
augmented.updateAugmented('3', { status: 'matching in progress', notes: 'awaiting approval' });
// Search by base data + augmented status
const matchedAspirin = searchInDataReturnIds('aspirin matched', augmented);
// Returns: ['1']
const unmatchedItems = searchInDataReturnIds('unmatched', augmented);
// Returns: ['2']
const inProgress = searchInDataReturnIds('progress', augmented);
// Returns: ['3']
// Search with negation
const notMatched = searchInDataReturnIds('bayer -matched', augmented);
// Returns: [] (Bayer Aspirin is matched)
// Batch update for efficiency
augmented.updateAugmentedBatch(new Map([
['1', { status: 'matched', notes: 'final' }],
['2', { status: 'matched', notes: 'approved' }],
['3', undefined], // Remove augmentation
]));
// Clear specific item
augmented.clearAugmented('1'); // Reverts to global metadata only
// Clear all augmentations
augmented.clearAllAugmented(); // Reset to global state
// AugmentedMetadata is memory efficient:
// - Items without augmentation reuse original Uint32Array (no memory duplication)
// - Items with augmentation get merged sorted token array (base + augmented)
// - Global metadata remains unchanged
Use cases:
- Medicine matching with external registries (matched/unmatched status)
- Products with local pricing or availability flags
- Documents with review status or approval state
- Items with computed scores or temporary categories
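The note above says that augmented items get a merged, sorted token array while items without augmentation reuse the original Uint32Array. A minimal sketch of such a merge follows. The function name mergeSortedTokenIds is hypothetical and this is not the library's implementation; it only illustrates how two sorted ID arrays can be combined without duplicates.

```typescript
// Hypothetical helper: merge two ascending Uint32Arrays of token IDs.
// Assumption: both inputs are already sorted in ascending numeric order.
function mergeSortedTokenIds(base: Uint32Array, extra: Uint32Array): Uint32Array {
  // Nothing to add: hand back the original array, so no memory is duplicated.
  if (extra.length === 0) return base;

  const out = new Uint32Array(base.length + extra.length);
  let i = 0; // read position in base
  let j = 0; // read position in extra
  let n = 0; // write position in out

  while (i < base.length || j < extra.length) {
    let next: number;
    if (j >= extra.length || (i < base.length && base[i] <= extra[j])) {
      next = base[i++];
    } else {
      next = extra[j++];
    }
    // Skip IDs that are already present (token shared by base and augmentation).
    if (n === 0 || out[n - 1] !== next) {
      out[n++] = next;
    }
  }
  // Copy only the filled part into a right-sized array.
  return out.slice(0, n);
}
```

Because the base array is never mutated, dropping an augmentation can simply mean pointing the item back at its original array.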
Architecture
cry-search uses a two-phase approach: build metadata once, then search instantly.
Text Processing Pipeline
Both metadata building and search queries go through the same pipeline:
Input: "Čokolada 27.12.2025 SI123"
↓
1. Normalize dates → "Čokolada 20251227 SI123"
2. Preprocess → "Čokolada 20251227 SI 123" (split letter-digit)
3. Sanitize → "cokolada 20251227 si 123" (lowercase, remove diacritics)
4. Tokenize → ["cokolada", "20251227", "si", "123"]
5. Sort → ["123", "20251227", "cokolada", "si"]
6. Numeric IDs → Uint32Array [42, 891, 156, 7] (indices into global registry)
Metadata Building
Tokens are stored as numeric IDs in a global registry. Each item's tokens are stored in a sorted Uint32Array for memory efficiency and fast binary search.
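The pipeline and registry described above can be sketched roughly as follows. This is an illustrative reimplementation, not the library's code: every function name, the regular expressions, the assumption that dates are written day.month.year, and the choice to sort IDs numerically are assumptions made for the example.

```typescript
// Hypothetical global registry: token text → numeric ID.
const tokenRegistry = new Map<string, number>();

// Step 1 (assumed format): rewrite day.month.year dates as YYYYMMDD.
function normalizeDates(text: string): string {
  return text.replace(/\b(\d{1,2})\.(\d{1,2})\.(\d{4})\b/g,
    (_match: string, d: string, m: string, y: string) =>
      y + ('0' + m).slice(-2) + ('0' + d).slice(-2));
}

// Step 2: insert a space at letter/digit boundaries ("SI123" → "SI 123").
function splitLettersAndDigits(text: string): string {
  return text
    .replace(/([A-Za-z\u00C0-\u024F])(\d)/g, '$1 $2')
    .replace(/(\d)([A-Za-z\u00C0-\u024F])/g, '$1 $2');
}

// Step 3: lowercase and strip diacritics (combining marks after NFD).
function sanitize(text: string): string {
  return text.toLowerCase().normalize('NFD').replace(/[\u0300-\u036f]/g, '');
}

// Steps 4-5: split on non-alphanumerics, drop duplicates, sort.
function toTokens(input: string): string[] {
  const cleaned = sanitize(splitLettersAndDigits(normalizeDates(input)));
  const tokens = cleaned.split(/[^a-z0-9]+/).filter((t) => t.length > 0);
  return Array.from(new Set(tokens)).sort();
}

// Step 6: map each token to its registry ID, creating IDs on first sight.
function toTokenIds(input: string): Uint32Array {
  const ids = toTokens(input).map((token) => {
    let id = tokenRegistry.get(token);
    if (id === undefined) {
      id = tokenRegistry.size; // next free ID
      tokenRegistry.set(token, id);
    }
    return id;
  });
  // Assumption: IDs are kept in ascending order so lookups can binary-search.
  return new Uint32Array(ids.sort((a, b) => a - b));
}
```

Running the same functions over both indexed text and queries is what keeps the two sides comparable: "Čokolada" in the data and "cokolada" in a query end up as the same token and therefore the same ID.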
Searching
Query tokens are matched against metadata using binary search. All query tokens must match for an item to be returned. Match modes (start, end, startEnd, anywhere, whole) control how tokens are compared.
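To make the matching rule concrete, here is a small sketch of "all query tokens must match" combined with the five match modes. It works on sorted string tokens rather than numeric IDs to stay short, and the names (tokenMatches, itemMatches, lowerBound) are hypothetical. Treating startEnd as "starts with or ends with" is an assumption, as is limiting binary search to the start and whole modes.

```typescript
type MatchMode = 'start' | 'end' | 'startEnd' | 'anywhere' | 'whole';

// Compare one indexed token against one query token under a match mode.
function tokenMatches(token: string, query: string, mode: MatchMode): boolean {
  const starts = token.slice(0, query.length) === query;
  const ends = query.length <= token.length &&
    token.slice(token.length - query.length) === query;
  switch (mode) {
    case 'start': return starts;
    case 'end': return ends;
    case 'startEnd': return starts || ends; // assumed meaning
    case 'anywhere': return token.indexOf(query) !== -1;
    case 'whole': return token === query;
  }
}

// Index of the first token that is >= query in a sorted token list.
function lowerBound(sorted: string[], query: string): number {
  let lo = 0;
  let hi = sorted.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (sorted[mid] < query) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Does any token of the item satisfy this query token?
function itemHasToken(sorted: string[], query: string, mode: MatchMode): boolean {
  if (mode === 'start' || mode === 'whole') {
    // In a sorted list, a token with this prefix (if any) sits at the lower bound.
    const i = lowerBound(sorted, query);
    return i < sorted.length && tokenMatches(sorted[i], query, mode);
  }
  // Suffix and substring matches cannot use the sort order: scan linearly.
  return sorted.some((t) => tokenMatches(t, query, mode));
}

// An item is a hit only if every query token matches (logical AND).
function itemMatches(sorted: string[], queryTokens: string[], mode: MatchMode): boolean {
  return queryTokens.every((q) => itemHasToken(sorted, q, mode));
}
```

For an item with tokens apple, latest, and smartphone, the query tokens app and smart match in start mode, while app and phone do not, because phone only matches as a suffix.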
Memory Optimization
The numeric implementation stores tokens as Uint32Array indices instead of string arrays, reducing memory usage by ~45% compared to the string-based implementation.
Query Prefixes
Override match behavior per token at search time using prefixes:
| Prefix | Mode | Description |
|--------|------|-------------|
| word.. | start | Match tokens starting with "word" |
| ..word | end | Match tokens ending with "word" |
| =word | whole | Exact match only |
| ?word | anywhere | Match "word" anywhere in token |
| -word | negation | Exclude results (only when followed by a letter) |
| --word | negation | Exclude results (works for words and numbers) |
| ~word | negation | Exclude results (works for words and numbers) |
Note: -5 is NOT negation (it's a negative number). Use --5 or ~5 to negate numbers.
Detecting negation: Linked search results include a hasNegation: boolean flag indicating if the query contained negation tokens.
Alternative prefixes (for programmatic use): <word (start), >word (end), +word (startEnd), *word (anywhere), !word (whole)
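The prefix rules above can be read as a small parser that turns each raw query token into a term, an optional mode override, and a negation flag. The sketch below is one possible reading, not the library's parser: parseQueryToken and ParsedToken are hypothetical names, and combining negation with a mode prefix in the same token is deliberately left out because the table does not describe it.

```typescript
type Mode = 'start' | 'end' | 'startEnd' | 'anywhere' | 'whole';

interface ParsedToken {
  term: string;     // the word with its prefix/suffix marker removed
  mode?: Mode;      // set only when the token carries a mode marker
  negated: boolean; // true for -word, --word, ~word
}

// Single-character markers, including the alternative programmatic ones.
const prefixModes: { [prefix: string]: Mode | undefined } = {
  '=': 'whole', '?': 'anywhere',
  '<': 'start', '>': 'end', '+': 'startEnd', '*': 'anywhere', '!': 'whole',
};

function parseQueryToken(raw: string): ParsedToken {
  // Negation: "--" and "~" always negate; a single "-" only before a letter,
  // so "-5" stays a negative number.
  if (raw.length > 2 && raw.slice(0, 2) === '--') {
    return { term: raw.slice(2), negated: true };
  }
  if (raw.length > 1 && raw.charAt(0) === '~') {
    return { term: raw.slice(1), negated: true };
  }
  if (/^-[A-Za-z\u00C0-\u024F]/.test(raw)) {
    return { term: raw.slice(1), negated: true };
  }

  // Two-dot markers: "word.." means start, "..word" means end.
  if (raw.length > 2 && raw.slice(-2) === '..') {
    return { term: raw.slice(0, -2), mode: 'start', negated: false };
  }
  if (raw.length > 2 && raw.slice(0, 2) === '..') {
    return { term: raw.slice(2), mode: 'end', negated: false };
  }

  // Single-character markers.
  const mode = prefixModes[raw.charAt(0)];
  if (mode !== undefined && raw.length > 1) {
    return { term: raw.slice(1), mode, negated: false };
  }

  // Plain token: the field or global default mode applies.
  return { term: raw, negated: false };
}
```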
// Find "jana" but only exact match on "usenik"
searchInDataReturnObjects('jana =usenik', metadata, data);
// Matches "Jana Usenik" but not "Jana Useniker"
// Find "apple" but exclude results containing "iphone"
searchInDataReturnObjects('apple -iphone', metadata, data);
// Matches MacBook Pro but not iPhone
// Find products with SKU starting with "APL"
searchInDataReturnObjects('APL..', metadata, data);
// Matches APL-IP15P, APL-MBP16, etc.
// Find an exact barcode
searchInDataReturnObjects('=9001', metadata, data);
// Matches "9001" but NOT "035585249001" or "9001917"
Number Matching
Match mode for numbers is determined by the target token's length (not the query):
- Short numbers (≤6 digits): require exact (whole) match
- Long numbers (>6 digits): allow start or end match (for barcodes)
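The two rules above fit in a few lines. The following is a sketch with a hypothetical function name (numberMatches), assuming both arguments are already digit-only tokens. It covers only the default behavior and ignores query prefixes such as =word.

```typescript
// Default number matching, keyed on the length of the indexed (target) token.
function numberMatches(query: string, target: string): boolean {
  if (query.length === 0) return false;
  // Short targets (6 digits or fewer) must match exactly.
  if (target.length <= 6) return target === query;
  // Long targets (e.g. barcodes) may match at the start or at the end.
  return target.slice(0, query.length) === query ||
    target.slice(-query.length) === query;
}
```

Keying the rule on the target rather than the query means a short query such as 60731 can still find a 13-digit barcode, while the same query would not match a 6-digit code that merely contains it.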
// Query "60731" against barcode "3838989760731" (13 digits) → MATCH (ends with 60731)
// Query "123" against code "123" (3 digits) → MATCH (exact)
// Query "12" against code "123" (3 digits) → NO MATCH (short number needs exact)
Priority: query prefix > field match mode > global SearchOpts default.
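The priority order can be pictured as a simple fallback chain. The sketch below uses a hypothetical function name and parameter names and only illustrates the ordering, not how the library wires it up internally.

```typescript
type MatchMode = 'start' | 'end' | 'startEnd' | 'anywhere' | 'whole';

// The first defined value wins: query prefix, then field mode, then global default.
function resolveMatchMode(
  prefixMode: MatchMode | undefined, // from a query prefix such as =word
  fieldMode: MatchMode | undefined,  // from the per-field configuration
  globalDefault: MatchMode,          // from the global search options
): MatchMode {
  return prefixMode ?? fieldMode ?? globalDefault;
}
```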
Legacy String Implementation
A string-based implementation is available for backwards compatibility. Import from the /string subpath:
import {
createSearchArrayMetadataString,
updateSearchMetadataString,
findInArray, // String-based search function
} from 'cry-search/string';
The string implementation is excluded from the main bundle. The default import (cry-search) only includes the modern numeric implementation, keeping bundle sizes smaller.
Specification
See CLAUDE.md for the complete technical specification including:
- Input processing pipeline (date normalization, preprocessing, sanitization)
- Tokenization rules
- Matching behavior
- Linked search semantics
- Type definitions
License
See LICENSE.md for license terms.
