npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2026 – Pkg Stats / Ryan Hefner

@codemix/text-search

v0.0.2

Published

Fast, accurate text search with BM25 scoring and stemming.

Readme

@codemix/text-search

Fast, accurate text search with BM25-inspired scoring, English stemming, and phrase-aware ranking. Zero dependencies.

Features

  • BM25-inspired scoring — term frequency saturation and document length normalization
  • Porter stemmer — matches "running" to "run", "connections" to "connect"
  • Stopword filtering — ignores noise words like "the", "and", "is"
  • Phrase bonuses — consecutive term matches score higher than scattered terms
  • Exact and prefix bonuses — boosts documents containing the literal query string
  • Position-aware scoring — terms appearing earlier in a document score slightly higher
  • Detailed results — inspect each scoring component for debugging or tuning
  • Scores clamped to [0, 1] — always safe to compare or threshold

Installation

npm install @codemix/text-search

Quick start

import { createMatcher, rankDocuments } from "@codemix/text-search";

// Score a single document against a query
const match = createMatcher("quick brown fox");
match("The quick brown fox jumps over the lazy dog"); // ~0.85
match("A slow gray elephant"); // 0.0

// Rank an array of strings by relevance
const results = rankDocuments("database connection", [
  "How to connect to a database",
  "Database connection pooling best practices",
  "Unrelated article about cooking",
]);
// results[0].document === "Database connection pooling best practices"
// results[0].score    === 0.73 (example)

API

createMatcher(query, options?): MatcherFn

Prepares a query and returns a scoring function. Creating the matcher does the expensive tokenization work once; the returned function is cheap to call repeatedly.

const match = createMatcher("error handling");

match("Always handle errors gracefully"); // > 0
match("Cooking recipes for beginners"); // 0

Returns (text: string) => number — a score in [0, 1].


createDetailedMatcher(query, options?): (text: string) => MatchResult

Same as createMatcher but returns a full breakdown of each scoring component. Useful for debugging or building custom ranking logic on top.

const match = createDetailedMatcher("quick brown fox");
const result = match("The quick brown fox jumps");
// {
//   score: 0.87,
//   termScore: 0.52,        // BM25 term frequency contribution
//   exactBonus: 0.15,       // literal substring found in document
//   prefixBonus: 0.0,       // document did not start with query
//   consecutiveBonus: 0.1,  // "quick brown" and "brown fox" appeared in order
//   positionScore: 0.05,    // terms appeared near the start
//   matchedTerms: 3,        // all 3 query terms matched
//   totalTerms: 3,
// }

rankDocuments(query, documents, options?)

Scores an array of strings and returns them sorted by relevance, highest first.

const results = rankDocuments("react hooks", [
  "React hooks introduction and useState examples",
  "Understanding React component lifecycle methods",
  "Getting started with React Native development",
]);
// [
//   { document: "React hooks introduction...", score: 0.71 },
//   { document: "Getting started with React Native...", score: 0.28 },
//   { document: "Understanding React component...", score: 0.21 },
// ]

rankDocuments(query, config, documents, options?)

Scores an array of objects by extracting text via a function, and returns a key from each object alongside its score.

interface Article {
  slug: string;
  title: string;
  content: string;
}

const articles: Article[] = [
  { slug: "intro-react", title: "Introduction to React", content: "Learn React basics" },
  { slug: "vue-guide", title: "Vue.js Guide", content: "Getting started with Vue" },
];

const results = rankDocuments(
  "react",
  { key: "slug", text: (a) => `${a.title} ${a.content}` },
  articles,
);
// [
//   { key: "intro-react", score: 0.55 },
//   { key: "vue-guide",   score: 0.0  },
// ]

tokenize(text, options?): Token[]

Splits text into tokens, optionally applying stemming and stopword removal. Each token carries its original form, stemmed form, and word-index position.

import { tokenize } from "@codemix/text-search";

tokenize("The quick brown fox");
// [
//   { original: "quick", stemmed: "quick", position: 1 },
//   { original: "brown", stemmed: "brown", position: 2 },
//   { original: "fox",   stemmed: "fox",   position: 3 },
// ]

tokenize("running dogs", { stem: true });
// [
//   { original: "running", stemmed: "run", position: 0 },
//   { original: "dogs",    stemmed: "dog", position: 1 },
// ]

stem(word): string

Applies the Porter stemming algorithm to a single lowercase word.

import { stem } from "@codemix/text-search";

stem("running"); // "run"
stem("connections"); // "connect"
stem("happily"); // "happili"

extractTerms(text, options?): string[]

Convenience wrapper around tokenize that returns just the stemmed term strings.

import { extractTerms } from "@codemix/text-search";

extractTerms("Database connection pooling");
// ["databas", "connect", "pool"]

buildTermFrequency(tokens): Map<string, number>

Builds a term → count map from a token array.


buildPositionMap(tokens): Map<string, number[]>

Builds a term → positions map from a token array.


STOPWORDS

A Set<string> of the English stopwords that are filtered by default.

Options

All main functions accept a shared options object.

Tokenization options (TokenizeOptions)

| Option | Type | Default | Description | | ----------------- | --------- | ------- | ------------------------------- | | stem | boolean | true | Apply Porter stemming | | removeStopwords | boolean | true | Filter common English stopwords | | minLength | number | 1 | Minimum token length to keep |

Matcher options (MatcherOptions)

Extends TokenizeOptions with BM25 tuning and bonus weights.

| Option | Type | Default | Description | | ------------------ | -------- | ------- | ----------------------------------------------------------------- | | k1 | number | 1.2 | BM25 term-frequency saturation. Higher = more weight on frequency | | b | number | 0.75 | BM25 length normalization. 0 = none, 1 = full | | exactMatchBonus | number | 0.15 | Added when query appears as a literal substring | | prefixMatchBonus | number | 0.1 | Added when document starts with query | | consecutiveBonus | number | 0.1 | Added proportionally for consecutive term matches | | positionWeight | number | 0.05 | Weight for position-based scoring |

Disabling all bonuses gives you a pure BM25-style score capped at ~0.6:

const match = createMatcher("error", {
  exactMatchBonus: 0,
  prefixMatchBonus: 0,
  consecutiveBonus: 0,
  positionWeight: 0,
});

How scoring works

The final score is the sum of five components, clamped to [0, 1]:

score = termScore + exactBonus + prefixBonus + consecutiveBonus + positionScore

| Component | Max contribution | Description | | ------------------ | ---------------- | --------------------------------------------------------------- | | termScore | 0.60 | BM25-style term frequency + query term coverage | | exactBonus | 0.15 | Literal query string found in document | | prefixBonus | 0.10 | Document starts with query string | | consecutiveBonus | 0.10 | Fraction of adjacent query term pairs that appear consecutively | | positionScore | 0.05 | Earlier term positions score higher |

Term score detail

Because there is no corpus to compute IDF from, the library uses a simplified BM25 formula:

  1. For each unique stemmed query term, compute the saturated term frequency in the document using the standard BM25 TF formula with the configured k1/b parameters and an assumed average document length of 50 words.
  2. Normalize the total by the number of query terms.
  3. Apply a coverage multiplier: matching 3/3 terms scores higher than matching 1/3.
  4. Scale to a maximum of 0.6 to leave headroom for bonuses.

Stemming

The library ships a complete Porter stemmer (Martin Porter, 1980). It handles all five steps of the algorithm:

  • Step 1a/b/c — plurals, -ed/-ing endings, -y-i
  • Step 2 — suffix reductions with measure > 0 (-ational-ate, etc.)
  • Step 3 — additional suffix reductions (-icate-ic, etc.)
  • Step 4 — suffix removal with measure > 1
  • Step 5a/b — trailing -e and -ll removal

Words shorter than 3 characters are returned unchanged.

Note: the Porter stemmer is a rule-based algorithm and does not handle irregular forms. ran will not stem to run.

Performance

Benchmarks on 1000-word documents (Node.js, M-series Mac):

| Operation | Time | | -------------------------------------------- | --------- | | Tokenize 1 document × 100 | < 500 ms | | Score 1 document × 100 | < 500 ms | | Rank 100 × 1000-word documents | < 1000 ms | | 500 scoring operations (50 docs × 10 passes) | < 1000 ms |

The matcher compiles the query once — reuse the function returned by createMatcher when scoring many documents against the same query.

Examples

Search a documentation index

const docs = [
  "How to connect to a PostgreSQL database using Node.js",
  "Database connection pooling best practices",
  "Introduction to NoSQL databases",
  "Setting up a web server with Express",
  "Understanding database transactions and ACID properties",
];

const results = rankDocuments("database connection", docs);
// results[0].document === "Database connection pooling best practices"

Autocomplete / prefix search

const faq = [
  "How do I reset my password?",
  "What payment methods are accepted?",
  "How to cancel my subscription",
];

const match = createMatcher("cancel subscription");
faq.filter((q) => match(q) > 0.3);
// ["How to cancel my subscription"]

Search structured objects with a custom key

const users = [
  { id: 1, name: "John Doe", bio: "Software engineer at TechCorp" },
  { id: 2, name: "Jane Smith", bio: "Product manager with engineering background" },
  { id: 3, name: "Bob Wilson", bio: "Marketing specialist" },
];

const results = rankDocuments(
  "software engineer",
  { key: "id", text: (u) => `${u.name} ${u.bio}` },
  users,
);
// results[0].key === 1  (John Doe)

Debug scoring

const match = createDetailedMatcher("machine learning");
const result = match("Introduction to machine learning algorithms");

console.log(`Score: ${result.score}`);
console.log(`Terms matched: ${result.matchedTerms}/${result.totalTerms}`);
console.log(`Consecutive bonus: ${result.consecutiveBonus}`);

Disable stemming for exact keyword matching

const match = createMatcher("engineering", { stem: false });
// Only matches the literal word "engineering", not "engineer" or "engineered"

License

MIT