npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2026 – Pkg Stats / Ryan Hefner

@zakkster/lite-search

v1.0.1

Published

Fuzzy search over a @zakkster/lite-store array with an index that maintains itself incrementally: trigram candidate generation, bounded zero-allocation Levenshtein ranking, weighted multi-field scoring. Zero runtime dependencies.

Readme

@zakkster/lite-search

npm version Zero-GC sponsor npm bundle size npm downloads npm total downloads TypeScript lite-signal peer lite-store peer Dependencies license

Fuzzy search over a @zakkster/lite-store array, with an index that maintains itself incrementally instead of rebuilding from scratch.

  • Trigram index for fast substring candidate generation.
  • Bounded Levenshtein for ranking near-matches, with two reused Int32Array rows and early-exit: zero allocation per candidate.
  • Weighted multi-field scoring (boost the title over the body, and so on).
  • Optional stop-word filtering.
  • A derived, read-only view of your data. lite-search never mutates your array and exposes no add/remove/update of its own. You mutate the array; the index reconciles itself.

Why not Fuse.js?

Fuse rebuilds its entire index when the data changes. For a large collection that is expensive on every edit. lite-search keeps a live trigram index and, when the data changes, runs a reference-equality diff to find the items that actually changed and re-trigrams only those (and only their changed fields). Adding or removing an item is caught automatically; the rest of the index is untouched.

Install

npm install @zakkster/lite-search @zakkster/lite-store @zakkster/lite-signal

@zakkster/lite-store and @zakkster/lite-signal are peer dependencies.

Quick start

import { store } from "@zakkster/lite-store";
import { signal, effect } from "@zakkster/lite-signal";
import { createSearchIndex } from "@zakkster/lite-search";

const items = store([
    { id: 1, title: "Buy milk",            body: "whole milk from the store", tags: ["errand"] },
    { id: 2, title: "Milk frother review", body: "comparing frothers",        tags: ["kitchen"] },
    { id: 3, title: "Read a book",         body: "the milky way galaxy",       tags: ["leisure"] },
]);

const search = createSearchIndex(items, {
    fields: ["title", "body", "tags"],
    weights: { title: 3, body: 1, tags: 2 },
    fuzziness: 0.3,
    limit: 20,
});

// One-shot search (a plain array):
search.search("milk");        // -> [{ item: {id:1,...}, id:1, score: 4.0 }, ...]
search.search("frothr");      // typo -> still finds the frother (id 2)

// Reactive search: pass a query signal, get a results signal.
const query = signal("galaxy");
const results = search.query(query);
effect(() => render(results()));   // re-runs when the query OR the item set changes

items.push({ id: 4, title: "Galaxy poster", body: "", tags: [] }); // results() updates automatically
query.set("milk");                                                  // results() updates automatically

How it stays in sync

lite-search is a derived view of your array. It tracks changes through two channels.

flowchart LR
    A["your lite-store array"] -->|"add / remove<br/>(length changes)"| B["length signal fires"]
    A -.->|"in-place field edit<br/>(no length change)"| C["you call search.sync()"]
    B --> R["reconcile()"]
    C --> R
    R -->|"reference-equality diff"| D["re-trigram only<br/>changed items + fields"]
    D --> V["bump version signal"]
    V --> Q["query() results recompute"]

Adds and removes are automatic. lite-search subscribes to the lite-store length signal -- a single signal, so this scales to any collection size. Pushing, splicing, or popping items reconciles the index and refreshes any live query() results.

In-place property edits need a nudge. Editing an existing item's field (items[0].title = "...") does not change the array's length, so it is not caught automatically. Call search.sync() after such edits to reconcile and refresh live results. (A one-shot search.search(...) always reconciles first, so it always sees the latest data.)

Why length-tracking instead of watching every field?

lite-store fires a separate reactive signal per field, and those signals live in a fixed-capacity reactive pool. Subscribing to every field of a large collection would exhaust that pool. Length-tracking plus an explicit sync() is honest and pool-safe: it covers the overwhelming majority of mutations (adds and removes) automatically, and gives a clear escape hatch for in-place edits. It also keeps lite-search strictly unidirectional -- your array is the single source of truth, and the index only ever derives from it.

The query pipeline

flowchart TD
    Q["query string"] --> N["normalize<br/>(lowercase, strip diacritics)"]
    N --> T["tokenize + drop stop-words"]
    T --> G["rolling 3-grams"]
    G --> C["gather candidates<br/>from the trigram index"]
    C --> P["prune by trigram overlap<br/>(similarity threshold)"]
    P --> S["score each candidate"]
    S --> S1["exact / prefix / substring match"]
    S --> S2["bounded Levenshtein<br/>(within the fuzziness budget)"]
    S1 --> W["weight per field, sum"]
    S2 --> W
    W --> O["sort by score, take limit"]

The trigram index generates a small candidate set quickly; Levenshtein then ranks those candidates. Levenshtein is not a fallback that finds terms with no trigram overlap -- see the limitation below.

Incremental indexing in detail

Each item contributes its trigrams to a shared gram -> set of item ids index. Because two fields of the same item can produce the same gram, lite-search keeps a per-item refcount of each gram. Editing one field decrements that field's grams and increments the new ones; an item only leaves a gram's posting set when its refcount reaches zero. So editing the title of an item whose body still contains the term keeps the item correctly indexed under that term.

reconcile() walks the array once and compares each item (and each indexed field) by reference. New items are trigrammed and inserted, removed items are dropped, and a field whose value reference changed is re-trigrammed -- nothing else is touched.

Ranking

A candidate's score is summed across fields. Within a field, each query token takes its best match against the field's tokens; those per-token best scores are averaged over the query tokens to produce the per-field contribution, then multiplied by weights[field] (default 1) and added to the total.

Per query token's best match against a field token:

| match | score contribution | | --- | --- | | exact token | 1.0 | | query is a prefix of the token | 0.9 | | query is a substring of the token | 0.7 | | within the Levenshtein budget | 0.6 * (1 - distance / (maxDistance + 1)) |

maxDistance for a query token is round(fuzziness * token.length). With fuzziness: 0, only exact/prefix/substring matches count -- fuzzy matching is fully disabled.

The trigram limitation (read this)

Candidate generation is trigram-based, so a query only finds items that share some trigrams with it. This is the same behavior as PostgreSQL's pg_trgm:

  • Typos that preserve trigram overlap are found and ranked by Levenshtein -- frothr finds frother, galxy finds galaxy, kiten finds kitten.
  • A pathological transposition that destroys all trigram overlap is not found -- mlik shares no trigrams with milk, so it is never a candidate. Levenshtein never sees it, by design (scanning every item with Levenshtein would defeat the index).

If you need to match arbitrary transpositions of very short words, lite-search is not the right tool.

API

createSearchIndex(items, options)

items must be an array (or array-like with a numeric .length) -- a lite-store array, or a plain array. A SearchError with code "not_array" is thrown otherwise.

| option | type | default | description | | --- | --- | --- | --- | | fields | string[] | (required) | item fields to index; array fields are joined | | weights | Record<string, number> | 1 each | per-field score multipliers | | fuzziness | number | 0.3 | max Levenshtein ratio; 0 disables fuzzy matching | | limit | number | 20 | maximum results returned | | stopWords | Iterable<string> | none | words to drop from indexed text and queries | | identify | (item) => string \| number | object identity | stable id per item (required for primitive items) |

Returns a SearchIndex:

| member | description | | --- | --- | | query(signal) | pass a query signal, get a reactive computed results signal | | query(string) | pass a string, get a one-shot results array | | search(string) | imperative one-shot search (reconciles first) | | sync() | force a reconcile after in-place edits; returns whether anything changed | | size | number of indexed items | | stats() | { items, grams } | | dispose() | release the length subscription and internal signal; idempotent |

Each result is { item, id, score }, where item is the current object from your array and higher score is more relevant.

dispose() is a one-way door

After dispose(), every other method is a no-op: search() and query(string) return an empty array, query(signal) returns a stable accessor that yields the empty array, and sync() returns false. None of them will rebuild the index against the disposed internal signal. dispose() itself is idempotent.

SearchError

Thrown for programmer mistakes. error.code is "not_array" or "misconfigured".

STOP_WORDS_EN

A small default English stop-word set: createSearchIndex(items, { fields, stopWords: STOP_WORDS_EN }).

Performance characteristics

  • Query hits the trigram index, prunes by overlap, then ranks the bounded candidate set with allocation-free Levenshtein. The per-candidate path allocates nothing; the only query-path allocations are the query's own tokens/grams and the returned results array.
  • Add / remove is caught automatically but runs an O(N) reference scan to find what changed (the length signal says that something changed, not which item). It then re-trigrams only the changed item. For very large, very churny datasets, batch your mutations and call sync() once.
  • reconcile() is O(N) reference comparisons plus O(changed) trigram work.

The zero-allocation hot-path property is locked in by test/07-zero-gc.test.mjs (run via npm run test:gc, which exposes global.gc and asserts < 100 B/query retained across 20k+ iterations). Run the included benchmark with node --expose-gc bench.mjs.

Zero dependencies

No runtime dependencies. lite-store and lite-signal are peers you already have. Ships as a single ESM file plus its type definitions.

License

MIT (c) Zahary Shinikchiev