cry-search

v1.0.3

Published

8 days ago

A fast, memory-efficient search library for large datasets with support for tokenized matching, linked collections, and per-field match modes.

Vulnerabilities: 0 High, 0 Medium, 0 Low

primoz.krajnik

search fuzzy-search tokenize metadata fast-search linked-collections typescript

Features

Fast search - Pre-built metadata enables instant queries on large datasets
Memory efficient - Numeric token storage uses ~45% less memory than string-based approaches
Updatable - Add, update, or remove items without rebuilding metadata
Augmented metadata - Add local/temporary searchable data without modifying global metadata
Flexible matching - Per-field match modes and query prefixes for prefix, suffix, anywhere, exact, and negation
Search prefixes - Use -word, --word, or ~word to exclude, word.. for prefix, ..word for suffix, =word for exact match, ?word to match anywhere
Automatic normalization - Handles diacritics, case, dates, and number formatting
Universal search - Search and update across linked collections (e.g., customers with their pets)

Installation

npm install cry-search

Examples

Single Table Search

import {
  createSearchArrayMetadata,
  searchInDataReturnObjects,
  arrayToSearchableData,
  type SearchableObject,
} from 'cry-search';

interface Product extends SearchableObject {
  _id: string;
  name: string;
  description: string;
  sku: string;
}

const products: Product[] = [
  { _id: '1', name: 'iPhone 15 Pro', description: 'Latest Apple smartphone', sku: 'APL-IP15P' },
  { _id: '2', name: 'Samsung Galaxy S24', description: 'Android flagship phone', sku: 'SAM-GS24' },
  { _id: '3', name: 'MacBook Pro 16"', description: 'Apple laptop for professionals', sku: 'APL-MBP16' },
];

// Convert array to searchable data (Map<id, item>)
const data = arrayToSearchableData(products);

// Build search metadata (do this once, reuse for all searches)
const metadata = createSearchArrayMetadata(data, {
  searchInFields: ['name', 'description', 'sku'],
});

// Search
const results = searchInDataReturnObjects('apple', metadata, data);
// Returns: Map with products 1 and 3

const results2 = searchInDataReturnObjects('samsung galaxy', metadata, data);
// Returns: Map with product 2

// Search is diacritic- and case-insensitive
const results3 = searchInDataReturnObjects('IPHONE', metadata, data);
// Returns: Map with product 1

SearchUniverse with Linked Collections

For related data (like customers and their pets), use SearchUniverse to search across collections:

import { SearchUniverse, type SearchableObject } from 'cry-search';

interface Stranka extends SearchableObject {
  _id: string;
  name: string;
  address: string;
}

interface Pacient extends SearchableObject {
  _id: string;
  stranka_id: string;  // Foreign key to Stranka
  name: string;
  species: string;
}

// Create universe and register collections
const universe = new SearchUniverse();

universe.registerCollection<Stranka>('stranke', {
  spec: { searchInFields: ['name', 'address'] },
});

universe.registerCollection<Pacient>('pacienti', {
  spec: { dontSearchInFields: ['stranka_id'] },
  linkedTo: {
    collectionName: 'stranke',
    foreignKeyGetter: (p) => p.stranka_id,
  },
});

// Load data
universe.loadCollection('stranke', [
  { _id: 's1', name: 'Krajnik', address: 'Ljubljana' },
  { _id: 's2', name: 'Novak', address: 'Maribor' },
]);

universe.loadCollection('pacienti', [
  { _id: 'p1', stranka_id: 's1', name: 'Angie', species: 'cat' },
  { _id: 'p2', stranka_id: 's1', name: 'Rex', species: 'dog' },
  { _id: 'p3', stranka_id: 's2', name: 'Bella', species: 'cat' },
]);

// Search across all collections
const results = universe.searchUniversally('krajnik angie');
// Returns:
// {
//   stranke: [{ _id: 's1', name: 'Krajnik', ... }],
//   pacienti: [{
//     primary: { _id: 's1', name: 'Krajnik', ... },
//     linked: [{ _id: 'p1', name: 'Angie', ... }],
//     matchedIn: 'both'
//   }]
// }

// Search only in linked items
const results2 = universe.searchUniversally('rex');
// Returns pacienti results with matchedIn: 'linked'

// Search only in primary - returns all linked items
const results3 = universe.searchUniversally('novak');
// Returns Novak stranka with Bella in linked array
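
The three queries above differ in where their tokens match. The sketch below shows one way such a classification could work; it is an assumption-laden illustration, not the library's algorithm. The function names are hypothetical, matching is simplified to "token starts with query", and the 'primary' value is assumed by analogy with the 'both' and 'linked' values shown in the example.

```typescript
// Hypothetical sketch; not part of the cry-search API.
type MatchedIn = 'primary' | 'linked' | 'both';

// Simplified matching: does any token of the item start with the query token?
function anyTokenStartsWith(tokens: string[], query: string): boolean {
  return tokens.some((t) => t.startsWith(query));
}

// Classify a (primary, linked) pair against a query, or return null when
// some query token matches neither side.
function classifyMatch(
  queryTokens: string[],
  primaryTokens: string[],
  linkedTokens: string[],
): MatchedIn | null {
  let matchedInPrimary = 0;
  let matchedInLinked = 0;
  for (const q of queryTokens) {
    const inPrimary = anyTokenStartsWith(primaryTokens, q);
    const inLinked = anyTokenStartsWith(linkedTokens, q);
    if (!inPrimary && !inLinked) return null; // every query token must match somewhere
    if (inPrimary) matchedInPrimary++;
    if (inLinked) matchedInLinked++;
  }
  if (matchedInPrimary === queryTokens.length) return 'primary';
  if (matchedInLinked === queryTokens.length) return 'linked';
  return 'both'; // tokens were split between the primary and the linked item
}
```

With the sample data, `classifyMatch(['krajnik', 'angie'], ['krajnik', 'ljubljana'], ['angie', 'cat'])` yields 'both', while a query of `['rex']` against Krajnik and Rex yields 'linked'.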

Updating Existing Data

When data changes, update the metadata to keep search in sync:

import {
  createSearchArrayMetadata,
  updateSearchMetadata,
  updateSearchMetadataBatch,
  arrayToSearchableData,
} from 'cry-search';

// Initial setup (products is the array from the Single Table Search example)
const data = arrayToSearchableData(products);
const metadata = createSearchArrayMetadata(data);

// Update a single item
const updatedProduct = { ...products[0], name: 'iPhone 16 Pro' };
data.set(updatedProduct._id, updatedProduct);
updateSearchMetadata(metadata, updatedProduct._id, updatedProduct);

// Update multiple items at once
const updates = [
  { _id: '1', name: 'iPhone 16 Pro Max', description: 'Newest Apple phone', sku: 'APL-IP16PM' },
  { _id: '4', name: 'iPad Pro', description: 'Apple tablet', sku: 'APL-IPAD' },
];
for (const item of updates) {
  data.set(item._id, item);
}
updateSearchMetadataBatch(metadata, updates);

// Mark item as deleted (removes from search but keeps in data)
const deletedProduct = { ...products[1], _deleted: new Date() };
data.set(deletedProduct._id, deletedProduct);
updateSearchMetadata(metadata, deletedProduct._id, deletedProduct);

// With SearchUniverse (the universe from the previous example) - linked indexes are updated automatically
universe.updateSearchMetadata('pacienti', 'p1', {
  _id: 'p1',
  stranka_id: 's2',  // Changed owner from s1 to s2
  name: 'Angie',
  species: 'cat',
});

Augmented Metadata - Local Search Context

Sometimes you need to search items combined with page-specific data (like match status, local flags, computed properties) without modifying the global metadata. AugmentedMetadata wraps existing metadata and allows adding temporary searchable data.

import {
  createSearchArrayMetadata,
  searchInDataReturnIds,
  AugmentedMetadata,
  arrayToSearchableData,
  type SearchableObject,
} from 'cry-search';

interface Medicine extends SearchableObject {
  _id: string;
  name: string;
  manufacturer: string;
}

// Global data - shared across application
const medicines: Medicine[] = [
  { _id: '1', name: 'Aspirin', manufacturer: 'Bayer' },
  { _id: '2', name: 'Paracetamol', manufacturer: 'Krka' },
  { _id: '3', name: 'Ibuprofen', manufacturer: 'Lek' },
];

const data = arrayToSearchableData(medicines);
const globalMetadata = createSearchArrayMetadata(data);

// Page-specific: matching medicines with external registry
// Create augmented metadata with local match status
const augmented = new AugmentedMetadata(
  globalMetadata,
  { searchInFields: ['status', 'notes'] }  // Control which fields to index
);

// User matches medicines with external registry
augmented.updateAugmented('1', { status: 'matched', notes: 'verified' });
augmented.updateAugmented('2', { status: 'unmatched', notes: '' });
augmented.updateAugmented('3', { status: 'matching in progress', notes: 'awaiting approval' });

// Search by base data + augmented status
const matchedAspirin = searchInDataReturnIds('aspirin matched', augmented);
// Returns: ['1']

const unmatchedItems = searchInDataReturnIds('unmatched', augmented);
// Returns: ['2']

const inProgress = searchInDataReturnIds('progress', augmented);
// Returns: ['3']

// Search with negation
const notMatched = searchInDataReturnIds('bayer -matched', augmented);
// Returns: [] (Bayer Aspirin is matched)

// Batch update for efficiency
augmented.updateAugmentedBatch(new Map([
  ['1', { status: 'matched', notes: 'final' }],
  ['2', { status: 'matched', notes: 'approved' }],
  ['3', undefined], // Remove augmentation
]));

// Clear specific item
augmented.clearAugmented('1');  // Reverts to global metadata only

// Clear all augmentations
augmented.clearAllAugmented();  // Reset to global state

// AugmentedMetadata is memory efficient:
// - Items without augmentation reuse original Uint32Array (no memory duplication)
// - Items with augmentation get merged sorted token array (base + augmented)
// - Global metadata remains unchanged

Use cases:

Medicine matching with external registries (matched/unmatched status)
Products with local pricing or availability flags
Documents with review status or approval state
Items with computed scores or temporary categories
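
The memory notes at the end of the example describe two cases: items without augmentation reuse the original token array, and augmented items get a merged sorted array. A minimal sketch of that idea follows, assuming token IDs are sorted numerically. The function names are hypothetical and this is not the AugmentedMetadata implementation.

```typescript
// Hypothetical sketch; not the cry-search implementation.

// Merge two numerically sorted ID arrays into one sorted array without duplicates.
function mergeSortedUnique(a: Uint32Array, b: Uint32Array): Uint32Array {
  const out = new Uint32Array(a.length + b.length);
  let i = 0;
  let j = 0;
  let n = 0;
  while (i < a.length && j < b.length) {
    if (a[i] < b[j]) out[n++] = a[i++];
    else if (a[i] > b[j]) out[n++] = b[j++];
    else {
      out[n++] = a[i++]; // same ID on both sides: keep one copy
      j++;
    }
  }
  while (i < a.length) out[n++] = a[i++];
  while (j < b.length) out[n++] = b[j++];
  return out.slice(0, n); // trim the unused tail
}

// Items without augmentation return the base array itself (no copy).
function effectiveTokens(base: Uint32Array, augmented?: Uint32Array): Uint32Array {
  if (augmented === undefined || augmented.length === 0) return base;
  return mergeSortedUnique(base, augmented);
}
```

Because the base array is returned by reference when there is nothing to merge, the extra memory cost is limited to items that actually carry augmented data.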

Architecture

cry-search uses a two-phase approach: build metadata once, then search instantly.

Text Processing Pipeline

Both metadata building and search queries go through the same pipeline:

Input: "Čokolada 27.12.2025 SI123"
   ↓
1. Normalize dates     → "Čokolada 20251227 SI123"
2. Preprocess          → "Čokolada 20251227 SI 123"  (split letter-digit)
3. Sanitize            → "cokolada 20251227 si 123"  (lowercase, remove diacritics)
4. Tokenize            → ["cokolada", "20251227", "si", "123"]
5. Sort                → ["123", "20251227", "cokolada", "si"]
6. Numeric IDs         → Uint32Array [42, 891, 156, 7]  (indices into global registry)
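
Steps 1 to 5 above can be approximated in a few lines. The sketch below is an illustration under assumptions, not the library's code: all helper names are hypothetical, only the dd.mm.yyyy date format from the example is handled, and step 6 (mapping tokens to numeric IDs) is left out.

```typescript
// Illustrative sketch only: these helpers are hypothetical, not cry-search exports.

// Step 1: rewrite dd.mm.yyyy dates as yyyymmdd so each date becomes one numeric token.
function normalizeDates(text: string): string {
  return text.replace(
    /\b(\d{1,2})\.(\d{1,2})\.(\d{4})\b/g,
    (_match: string, d: string, m: string, y: string) =>
      `${y}${m.padStart(2, '0')}${d.padStart(2, '0')}`,
  );
}

// Step 2: insert a space at every letter-digit boundary ("SI123" -> "SI 123").
function splitLetterDigit(text: string): string {
  return text.replace(/(\p{L})(\d)/gu, '$1 $2').replace(/(\d)(\p{L})/gu, '$1 $2');
}

// Step 3: strip diacritics (decompose, then drop combining marks) and lowercase.
function sanitize(text: string): string {
  return text.normalize('NFD').replace(/[\u0300-\u036f]/g, '').toLowerCase();
}

// Step 4: split on anything that is not a letter or a number.
function tokenize(text: string): string[] {
  return text.split(/[^\p{L}\p{N}]+/u).filter((t) => t.length > 0);
}

// Steps 1-5 combined: returns the item's tokens in sorted order.
function toSortedTokens(input: string): string[] {
  return tokenize(sanitize(splitLetterDigit(normalizeDates(input)))).sort();
}
```

Running the same function over both indexed text and the query string is what keeps the two sides comparable: `toSortedTokens('Čokolada 27.12.2025 SI123')` gives `['123', '20251227', 'cokolada', 'si']`.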

Metadata Building

Tokens are stored as numeric IDs in a global registry. Each item's tokens are stored in a sorted Uint32Array for memory efficiency and fast binary search.
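
To make the registry idea concrete, here is a minimal sketch. The class and function names are hypothetical, and the sketch assumes each item's IDs are sorted numerically so an exact token can be found by binary search; the library's actual layout may differ.

```typescript
// Hypothetical sketch of a global token registry; not the cry-search implementation.
class TokenRegistry {
  private readonly idsByToken = new Map<string, number>();
  private readonly tokensById: string[] = [];

  // Return the existing ID for a token, or assign the next free one.
  idFor(token: string): number {
    const existing = this.idsByToken.get(token);
    if (existing !== undefined) return existing;
    const id = this.tokensById.length;
    this.tokensById.push(token);
    this.idsByToken.set(token, id);
    return id;
  }

  // Look up an ID without registering the token (useful at query time).
  lookup(token: string): number | undefined {
    return this.idsByToken.get(token);
  }
}

// Encode one item's tokens as a numerically sorted Uint32Array of registry IDs.
function encodeTokens(registry: TokenRegistry, tokens: string[]): Uint32Array {
  const unique = Array.from(new Set(tokens));
  const ids = Uint32Array.from(unique.map((t) => registry.idFor(t)));
  return ids.sort(); // typed arrays sort numerically by default
}

// Binary search for an exact token ID in an item's sorted array.
function hasTokenId(ids: Uint32Array, id: number): boolean {
  let lo = 0;
  let hi = ids.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >>> 1;
    if (ids[mid] === id) return true;
    if (ids[mid] < id) lo = mid + 1;
    else hi = mid - 1;
  }
  return false;
}
```

Each distinct token string is stored once in the registry, and every item holds only 4-byte IDs, which is where the memory saving over per-item string arrays comes from.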

Searching

Query tokens are matched against metadata using binary search. All query tokens must match for an item to be returned. Match modes (start, end, startEnd, anywhere, whole) control how tokens are compared.
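
The AND semantics and the binary search can be sketched on plain sorted string tokens. This is a simplified illustration with hypothetical names: it implements only the start mode, and it works on strings rather than on the numeric IDs the library stores.

```typescript
// Hypothetical sketch: "start" mode matching over sorted string tokens.

// Index of the first token that is >= query, found by binary search.
function lowerBound(sortedTokens: string[], query: string): number {
  let lo = 0;
  let hi = sortedTokens.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (sortedTokens[mid] < query) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// In a sorted list, all tokens sharing a prefix are adjacent and begin at the
// lower bound of that prefix, so checking one position is enough.
function hasTokenStartingWith(sortedTokens: string[], query: string): boolean {
  const i = lowerBound(sortedTokens, query);
  return i < sortedTokens.length && sortedTokens[i].startsWith(query);
}

// Return the IDs of items for which every query token matches (logical AND).
function searchIds(items: Map<string, string[]>, query: string): string[] {
  const queryTokens = query.toLowerCase().split(/\s+/).filter((t) => t.length > 0);
  const hits: string[] = [];
  items.forEach((sortedTokens, id) => {
    if (queryTokens.every((q) => hasTokenStartingWith(sortedTokens, q))) hits.push(id);
  });
  return hits;
}
```

The other match modes need different strategies. An end or anywhere match, for example, cannot use a prefix lower bound and would need a scan or a secondary index.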

Memory Optimization

The numeric implementation stores tokens as Uint32Array indices instead of string arrays, reducing memory usage by ~45% compared to the string-based implementation.

Query Prefixes

Override match behavior per token at search time using prefixes:

| Prefix | Mode | Description |
|--------|------|-------------|
| word.. | start | Match tokens starting with "word" |
| ..word | end | Match tokens ending with "word" |
| =word | whole | Exact match only |
| ?word | anywhere | Match "word" anywhere in token |
| -word | negation | Exclude results (only when followed by letter) |
| --word | negation | Exclude results (works for words and numbers) |
| ~word | negation | Exclude results (works for words and numbers) |

Note: -5 is NOT negation (it's a negative number). Use --5 or ~5 to negate numbers.

Detecting negation: Linked search results include a hasNegation: boolean flag indicating if the query contained negation tokens.

Alternative prefixes (for programmatic use): <word (start), >word (end), +word (startEnd), *word (anywhere), !word (whole)

// Find "jana" but only exact match on "usenik"
searchInDataReturnObjects('jana =usenik', metadata, data);
// Matches "Jana Usenik" but not "Jana Useniker"

// Find "apple" but exclude results containing "iphone"
searchInDataReturnObjects('apple -iphone', metadata, data);
// Matches MacBook Pro but not iPhone

// Find products with SKU starting with "APL"
searchInDataReturnObjects('APL..', metadata, data);
// Matches APL-IP15P, APL-MBP16, etc.

// Find an exact barcode
searchInDataReturnObjects('=9001', metadata, data);
// Matches "9001" but NOT "035585249001" or "9001917"
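
The prefix rules in this section can be read as a small parser. The sketch below is one hypothetical way to implement them; the names are not part of the library, and whether a negation marker may be combined with a mode prefix is an assumption here, since the text does not say.

```typescript
// Hypothetical sketch of query-token parsing; not the cry-search implementation.
type MatchMode = 'start' | 'end' | 'startEnd' | 'anywhere' | 'whole';

interface ParsedQueryToken {
  text: string; // token with prefix/suffix markers removed
  mode?: MatchMode; // undefined means no override was given
  negated: boolean;
}

// Single-character prefixes, including the alternative programmatic forms.
const SINGLE_CHAR_PREFIXES: Record<string, MatchMode> = {
  '=': 'whole',
  '?': 'anywhere',
  '<': 'start',
  '>': 'end',
  '+': 'startEnd',
  '*': 'anywhere',
  '!': 'whole',
};

function parseQueryToken(raw: string): ParsedQueryToken {
  let text = raw;
  let negated = false;

  // "--" and "~" always negate; a single "-" negates only before a letter,
  // so "-5" stays a negative number.
  if (text.startsWith('--')) {
    negated = true;
    text = text.slice(2);
  } else if (text.startsWith('~')) {
    negated = true;
    text = text.slice(1);
  } else if (/^-\p{L}/u.test(text)) {
    negated = true;
    text = text.slice(1);
  }

  let mode: MatchMode | undefined;
  if (text.length > 2 && text.endsWith('..')) {
    mode = 'start';
    text = text.slice(0, -2);
  } else if (text.length > 2 && text.startsWith('..')) {
    mode = 'end';
    text = text.slice(2);
  } else if (text.length > 1 && SINGLE_CHAR_PREFIXES[text.charAt(0)] !== undefined) {
    mode = SINGLE_CHAR_PREFIXES[text.charAt(0)];
    text = text.slice(1);
  }

  return { text, mode, negated };
}
```

For example, `parseQueryToken('APL..')` yields text 'APL' with mode 'start', and `parseQueryToken('-5')` leaves the token as '-5' without negation.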

Number Matching

Match mode for numbers is determined by the target token's length (not the query):

Short numbers (≤6 digits): require exact (whole) match
Long numbers (>6 digits): allow start or end match (for barcodes)

// Query "60731" against barcode "3838989760731" (13 digits) → MATCH (ends with 60731)
// Query "123" against code "123" (3 digits) → MATCH (exact)
// Query "12" against code "123" (3 digits) → NO MATCH (short number needs exact)
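
The length rule above fits in one small function. This is a sketch with a hypothetical name; it assumes both arguments are already digit-only tokens and that no query prefix overrides the mode.

```typescript
// Hypothetical sketch of the number-matching rule; not the cry-search implementation.
const SHORT_NUMBER_MAX_DIGITS = 6;

function numberTokenMatches(target: string, query: string): boolean {
  // The mode depends on the length of the indexed (target) token, not the query.
  if (target.length <= SHORT_NUMBER_MAX_DIGITS) {
    return target === query; // short numbers: whole match only
  }
  return target.startsWith(query) || target.endsWith(query); // long numbers: start or end
}
```

This reproduces the three cases listed: '60731' matches '3838989760731', '123' matches '123', and '12' does not match '123'.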

Priority: query prefix > field match mode > global SearchOpts default.
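
The priority chain amounts to taking the first mode that is defined. A minimal sketch, with a hypothetical function name and parameters:

```typescript
// Hypothetical sketch of mode resolution; not the cry-search implementation.
type MatchMode = 'start' | 'end' | 'startEnd' | 'anywhere' | 'whole';

function resolveMatchMode(
  prefixMode: MatchMode | undefined, // from a query prefix such as "=word"
  fieldMode: MatchMode | undefined, // configured for the field being searched
  globalDefault: MatchMode, // fallback from the global search options
): MatchMode {
  return prefixMode ?? fieldMode ?? globalDefault;
}
```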

Legacy String Implementation

A string-based implementation is available for backwards compatibility. Import from the /string subpath:

import {
  createSearchArrayMetadataString,
  updateSearchMetadataString,
  findInArray,  // String-based search function
} from 'cry-search/string';

The string implementation is excluded from the main bundle. The default import (cry-search) only includes the modern numeric implementation, keeping bundle sizes smaller.

Specification

See CLAUDE.md for the complete technical specification including:

Input processing pipeline (date normalization, preprocessing, sanitization)
Tokenization rules
Matching behavior
Linked search semantics
Type definitions

License

See LICENSE.md for license terms.
