sentence-finder

v1.1.0

A lightweight library to find sentences matching user input based on shared words

SentenceFinder

A high-performance TypeScript library for searching and managing collections of sentences. It supports ranked searches, prefix suggestions, partial/substring matching, merge/deduplication, word-frequency inspection and a small event API.

Highlights

  • Smart search with configurable matching behavior (exact, prefix fallback, partial/substring).
  • Ranked results that prefer exact word matches, then prefixes, then substrings; ties broken by earliest match position and original insertion order.
  • Fast prefix suggestions (binary search over a cached sorted dictionary).
  • Merge multiple finders with optional deduplication.
  • Expose internal dictionary and word frequency maps for analysis.
  • Small event system for init, search, suggest, merge, and reset events.

Installation

npm install sentence-finder

Quick start

import { SentenceFinder } from 'sentence-finder';

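// Create a finder. The default min_match_count (3) cannot be reached by the
// two-token queries below, so lower it for this example.
const finder = new SentenceFinder({ min_match_count: 1 });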

// Initialize with sentences
finder.init([
  'The quick brown fox jumps over the lazy dog',
  'Quick foxes are known for jumping',
  'Dogs are usually lazy in the afternoon',
]);

// Simple search (non-ranked)
console.log(finder.searchArray('fox jump'));

// Ranked search
const { results } = finder.search('fox jump', { ranked: true });
console.log(results);

Constructor options

interface SentenceFinderOptions {
  min_match_count?: number; // Minimum number of matched tokens required (default: 3)
  case_sensitive?: boolean; // Whether token matching is case-sensitive (default: false)
  tokenizer?: (text: string) => string[]; // Provide a custom tokenizer
  strict_tokens?: boolean; // Use the built-in strict tokenizer (default: false)
}

Notes:

  • The default min_match_count (3) is intentionally conservative to avoid noisy single-word matches. A query with fewer than three tokens cannot reach it, so lower the value (for example to 1 for single-token searches) in the constructor or per call through the search options.
  • case_sensitive: false (default) means tokens are normalized to lower-case before indexing and searching.

Tokenizers

  • default tokenizer: splits on non-letter/non-number boundaries and (unless case_sensitive is true) lowercases tokens.
  • strict tokenizer: preserves hyphenated words and contractions better and trims extra spaces; it still respects the case_sensitive option when producing tokens.
  • custom tokenizer: pass a (text: string) => string[] to the constructor if you need special tokenization.

Example:

const finder = new SentenceFinder({ strict_tokens: true });
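
If neither built-in tokenizer fits, a custom one only needs the documented (text: string) => string[] shape. The following is a minimal sketch, not the library's own tokenizer: the function name and the ASCII-only regular expression are assumptions made for illustration.

```typescript
// Hypothetical custom tokenizer (name and regex are illustrative assumptions).
// It lowercases the text and keeps hyphenated words and contractions as one token.
// ASCII letters and digits only; extend the character classes for other scripts.
function keepHyphensTokenizer(text: string): string[] {
  // One or more alphanumerics, optionally joined by a single hyphen or apostrophe.
  const matches = text.toLowerCase().match(/[a-z0-9]+(?:['-][a-z0-9]+)*/g);
  return matches === null ? [] : matches.slice();
}

console.log(keepHyphensTokenizer("Well-known foxes don't   nap"));
// ["well-known", "foxes", "don't", "nap"]

// It could then be supplied through the documented `tokenizer` option:
// const finder = new SentenceFinder({ tokenizer: keepHyphensTokenizer });
```

Because this sketch lowercases unconditionally, it would only suit a finder that uses the default case_sensitive: false.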

Search

API:

search(text: string, options?: { ranked?: boolean; min_match_count?: number; partial?: boolean }): { results: string[]; finder: SentenceFinder }
searchArray(text: string, options?: { ranked?: boolean; min_match_count?: number; partial?: boolean }): string[]

Behavior:

  • Tokenizes the input text with the configured tokenizer and lowercases the tokens unless case_sensitive is true.
  • partial: false (default) performs exact word matching. If an exact token is not present in the dictionary, the search will fall back to prefix matching (dictionary words that start with the token).
  • partial: true performs substring matching across dictionary words.
  • Matches are counted per sentence; only sentences with at least min_match_count distinct token matches are returned.
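
The matching rules above can be illustrated with a small standalone sketch. It does not use the library and is not its implementation: the function names, the index layout and the tokenizer are assumptions chosen to mirror the documented behavior (exact match first, prefix fallback when the token is absent, substring matching when partial is set, then filtering by the number of distinct query tokens matched).

```typescript
// Illustrative sketch only; names and structure are assumptions, not library internals.
type WordIndex = Map<string, number[]>; // word -> indices of sentences containing it

function splitWords(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(w => w.length > 0);
}

function buildIndex(sentences: string[]): WordIndex {
  const index: WordIndex = new Map();
  sentences.forEach((sentence, i) => {
    // A Set keeps each sentence index at most once per word.
    new Set(splitWords(sentence)).forEach(word => {
      const list = index.get(word);
      if (list) list.push(i);
      else index.set(word, [i]);
    });
  });
  return index;
}

// Returns sentence index -> number of distinct query tokens that matched it.
function countMatches(index: WordIndex, query: string, partial = false): Map<number, number> {
  const counts = new Map<number, number>();
  const tokens = splitWords(query).filter((t, i, all) => all.indexOf(t) === i);
  for (const token of tokens) {
    const hits = new Set<number>();
    const exact = index.get(token);
    if (!partial && exact) {
      exact.forEach(i => hits.add(i)); // exact word match
    } else {
      // Prefix fallback, or substring matching when `partial` is true.
      index.forEach((ids, word) => {
        const at = word.indexOf(token);
        if (partial ? at !== -1 : at === 0) ids.forEach(i => hits.add(i));
      });
    }
    hits.forEach(i => counts.set(i, (counts.get(i) ?? 0) + 1));
  }
  return counts;
}

function filterByMatches(sentences: string[], index: WordIndex, query: string,
                         minMatchCount: number, partial = false): string[] {
  const counts = countMatches(index, query, partial);
  return sentences.filter((_, i) => (counts.get(i) ?? 0) >= minMatchCount);
}

const corpus = [
  'The quick brown fox jumps over the lazy dog',
  'Quick foxes are known for jumping',
  'Dogs are usually lazy in the afternoon',
];
const corpusIndex = buildIndex(corpus);
console.log(filterByMatches(corpus, corpusIndex, 'fox jump', 2)); // only the first sentence
console.log(filterByMatches(corpus, corpusIndex, 'fox jump', 3)); // [] since two tokens cannot reach 3
```

In this sketch 'fox' is found exactly, so 'foxes' is not considered for it, while 'jump' is absent from the index and falls back to the prefixes 'jumps' and 'jumping'.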

Ranking (when ranked: true):

  • A weighted score is computed per sentence based on occurrences of the search tokens inside the sentence tokens.
  • Preference order: exact word matches (strongest) > prefix matches > substring matches.
  • Ties are broken by earliest token position where a match appears in the sentence, then by original insertion order.
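
One way to picture the ranking rules is the standalone sketch below. The source only states the preference order and the tie-breakers, so the numeric weights (3, 2, 1), the function names and the tokenization are all assumptions, and the library's actual scoring may differ.

```typescript
// Hypothetical weights: the README only states exact > prefix > substring.
const EXACT_WEIGHT = 3;
const PREFIX_WEIGHT = 2;
const SUBSTRING_WEIGHT = 1;

interface ScoredSentence {
  sentence: string;
  order: number;      // original insertion order
  score: number;      // sum of weights over all matching (word, query token) pairs
  firstMatch: number; // position of the earliest matching word
}

function toWords(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(w => w.length > 0);
}

function scoreSentence(sentence: string, order: number, queryTokens: string[]): ScoredSentence {
  let score = 0;
  let firstMatch = Infinity;
  toWords(sentence).forEach((word, position) => {
    for (const token of queryTokens) {
      const at = word.indexOf(token);
      if (at === -1) continue; // this word does not contain the token
      score += word === token ? EXACT_WEIGHT : at === 0 ? PREFIX_WEIGHT : SUBSTRING_WEIGHT;
      firstMatch = Math.min(firstMatch, position);
    }
  });
  return { sentence, order, score, firstMatch };
}

function rankSentences(sentences: string[], query: string): string[] {
  const queryTokens = toWords(query);
  return sentences
    .map((sentence, i) => scoreSentence(sentence, i, queryTokens))
    .filter(r => r.score > 0)
    // Higher score first, then earliest match position, then insertion order.
    .sort((a, b) => b.score - a.score || a.firstMatch - b.firstMatch || a.order - b.order)
    .map(r => r.sentence);
}

console.log(rankSentences(
  ['Quick foxes are known for jumping', 'The quick brown fox jumps over the lazy dog'],
  'fox jump',
));
// The second sentence ranks first: an exact 'fox' plus a prefix match on 'jumps' (3 + 2)
// beats two prefix matches on 'foxes' and 'jumping' (2 + 2).
```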

Examples:

finder.searchArray('fox jump', { min_match_count: 1 }); // non-ranked
finder.search('fox jump', { ranked: true, min_match_count: 1 }); // ranked
finder.search('irr', { partial: true, min_match_count: 1 }); // substring matches

Suggestions

API:

suggest(prefix: string): { suggestions: string[]; finder: SentenceFinder }
  • Returns dictionary words that start with the provided prefix.
  • Uses a cached sorted array of dictionary keys and binary search for fast lookups.
  • Cache is invalidated when the dictionary is modified (via init, merge, or reset).

Merge and deduplication

API:

merge(finder: SentenceFinder, options?: { deduplicate?: boolean }): this
  • Merges another SentenceFinder instance into this one.
  • deduplicate: true will avoid adding duplicate sentences that already exist in the receiving finder.
  • When deduplicating, only sentences not already present are added; their tokens are indexed and word frequencies are updated accordingly.

Note: merging preserves the receiving finder's tokenizer/case-sensitivity configuration for how sentences are indexed after merge.
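
As a rough picture of what deduplication involves, the standalone sketch below merges one sentence list into another. The helper name is hypothetical, and it assumes that duplicates are exact string matches and that repeats inside the incoming list are also skipped; the source does not specify either point.

```typescript
// Hypothetical helper: appends `incoming` to `target` and returns how many were added.
// Assumption: two sentences are duplicates only when the strings are identical.
function mergeSentences(target: string[], incoming: string[], deduplicate: boolean): number {
  const seen = new Set<string>(deduplicate ? target : []);
  let added = 0;
  for (const sentence of incoming) {
    if (deduplicate) {
      if (seen.has(sentence)) continue; // already present, skip it
      seen.add(sentence);
    }
    target.push(sentence);
    added++;
  }
  return added;
}

const receiving = ['Dogs are lazy', 'Foxes are quick'];
const addedCount = mergeSentences(receiving, ['Foxes are quick', 'Cats nap often'], true);
console.log(addedCount, receiving);
// 1 ["Dogs are lazy", "Foxes are quick", "Cats nap often"]
```

The Set lookup keeps the duplicate check at roughly constant time per sentence, at the cost of holding every existing sentence in memory during the merge.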


Collection management

  • init(sentences: string[]): this — initialize or re-initialize the finder with a new collection. Clears previous indexes and caches.
  • reset(): this — clear collection, dictionary, frequencies and caches.

Analysis helpers

  • getDictionary(): Map<string, number[]> — returns the internal mapping of token -> sentence index list.
  • getWordFrequency(): Map<string, number> — returns a map of token -> occurrence count across the collection.

These are useful for debugging, exporting statistics, or building external visualizations.
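
For instance, a frequency map of the documented Map<string, number> shape can be turned into a top-N list. The helper below is a sketch with an assumed name and an assumed tie-break (alphabetical); it works on any such map and does not depend on the library.

```typescript
// Hypothetical helper: returns the n most frequent words as [word, count] pairs.
// Ties on count are broken alphabetically (an assumption made for stable output).
function topWords(frequency: Map<string, number>, n: number): Array<[string, number]> {
  const entries: Array<[string, number]> = [];
  frequency.forEach((count, word) => entries.push([word, count]));
  return entries
    .sort((a, b) => b[1] - a[1] || (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0))
    .slice(0, n);
}

// Stand-in for the map returned by getWordFrequency().
const sampleFrequency = new Map<string, number>([
  ['quick', 2],
  ['the', 3],
  ['dog', 1],
  ['lazy', 2],
]);
console.log(topWords(sampleFrequency, 3));
// [["the", 3], ["lazy", 2], ["quick", 2]]
```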


Events

Subscribe to lifecycle events with on(event, listener).

Supported events:

  • init — called after init completes with the number of sentences
  • search — called after each search with the number of results
  • suggest — called after each suggest with the number of suggestions
  • merge — called after merge with the number of sentences merged
  • reset — called after reset

Example:

finder.on('search', count => console.log(`Found ${count} matches`));

Performance notes

  • search (non-ranked) uses direct dictionary lookups where possible and only scans keys for prefix/substring fallbacks when needed.
  • search (ranked) computes per-sentence scores based on token occurrences; this is efficient for moderate collections but will perform more work than non-ranked searches.
  • suggest uses binary search on a cached sorted key array — first call may pay the sort cost; subsequent calls are fast.
  • merge with deduplication does additional work to avoid duplicates; for very large datasets consider batching or incremental updates.

Examples

See the examples/ folder for small runnable snippets demonstrating initialization, searching, suggestions, merging and tokenization options.


Contributing

Contributions welcome — open an issue or submit a pull request.

License

MIT