npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2026 – Pkg Stats / Ryan Hefner

@stll/aho-corasick

v1.0.1

Published

Exact many-pattern string search for Node.js and Bun via Rust's aho-corasick engine exposed through NAPI-RS.

Readme

@stll/aho-corasick

NAPI-RS bindings to the Rust aho-corasick crate for Node.js and Bun.

Multi-pattern string searching in linear time. Built on BurntSushi's aho-corasick (the same engine that powers ripgrep), exposed to JavaScript via NAPI-RS.

Install

npm install @stll/aho-corasick
# or
bun add @stll/aho-corasick

The companion @stll/aho-corasick-wasm package is available for browser builds.

If you use the browser package with Vite, import the bundled plugin so the generated WASM loader is not pre-bundled into broken asset URLs:

import { defineConfig } from "vite";
import stllAhoCorasickWasm from "@stll/aho-corasick-wasm/vite";

export default defineConfig({
  plugins: [stllAhoCorasickWasm()],
});

GitHub releases include npm tarballs, an SBOM, and third-party notices.

Prebuilts are available for:

| Platform | Architecture | | ------------- | ------------ | | macOS | x64, arm64 | | Linux (glibc) | x64, arm64 | | WASM | browser |

Usage

import { AhoCorasick } from "@stll/aho-corasick";

const ac = new AhoCorasick(["foo", "bar", "baz"]);

// Check for any match
ac.isMatch("hello foo world"); // true

// Find all non-overlapping matches
ac.findIter("foo bar baz");
// [
//   { pattern: 0, start: 0, end: 3, text: "foo" },
//   { pattern: 1, start: 4, end: 7, text: "bar" },
//   { pattern: 2, start: 8, end: 11, text: "baz" },
// ]

// Find overlapping matches
ac.findOverlappingIter("foobar");

// Replace matches
ac.replaceAll("foo bar", ["FOO", "BAR", "BAZ"]);
// "FOO BAR"

Options

const ac = new AhoCorasick(patterns, {
  // Match semantics (default: "leftmost-first")
  matchKind: "leftmost-longest",
  // Unicode simple case folding (default: false)
  caseInsensitive: true,
  // Only match whole words (default: false)
  // Unicode-aware; CJK always passes
  wholeWords: true,
});

Streaming

For processing large files or streams chunk by chunk:

import { StreamMatcher } from "@stll/aho-corasick";

const sm = new StreamMatcher(["needle", "haystack"]);

for await (const chunk of readableStream) {
  const matches = sm.write(chunk);
  for (const m of matches) {
    console.log(
      `Pattern ${m.pattern} ` + `at ${m.start}..${m.end}`,
    );
  }
}

sm.flush(); // finalize stream state
sm.reset(); // reuse for another stream

StreamMatcher automatically handles overlap between chunks so that matches spanning chunk boundaries are found.

Groups

To organize patterns into named groups (e.g., for entity recognition), use the pattern index as a lookup key into a parallel array:

const GROUPS = {
  LEGAL_FORM: ["s.r.o.", "GmbH", "LLC"],
  CURRENCY: ["EUR", "USD", "CZK"],
};

const patterns: string[] = [];
const tag: string[] = [];

for (const [group, terms] of Object.entries(GROUPS)) {
  for (const term of terms) {
    patterns.push(term);
    tag.push(group);
  }
}

const ac = new AhoCorasick(patterns, {
  wholeWords: true,
});

for (const m of ac.findIter(text)) {
  console.log(m.text, tag[m.pattern]);
  // "GmbH" "LEGAL_FORM"
  // "EUR"  "CURRENCY"
}

tag[m.pattern] is a single array index lookup (O(1), no hashing).

Benchmarks

The repository includes a checked-in benchmark harness for ASCII, Unicode, and WASM cases. The inputs are public and the scripts are reproducible from the repo. Run it locally:

bun run bench:install
bun run bench:download
bun run bench:all

The harness compares multiple JS/TS implementations on public corpora and verifies equal match counts across libraries. The speed suite is anchored in the many-pattern exact-search workloads where Aho-Corasick is meant to win, rather than single-call toy regex cases.

Representative baseline from the checked-in public harness on this machine:

  • runtime: Bun 1.3.12
  • platform: macOS 26.4.1 (Darwin arm64)

| Scenario | @stll/aho-corasick | Best compared JS/TS result | Relative | | ----------------------------------- | -------------------- | -------------------------- | -------- | | bible.txt, 4.0 MB, 20 terms | 2.27 ms | 69.25 ms | 30.5x | | world192.txt, 2.5 MB, 20 terms| 1.55 ms | 41.88 ms | 26.9x | | E.coli, 4.6 MB, 16 codons | 0.83 ms | 11.00 ms | 13.3x | | bible.txt, single-pattern baseline| 0.67 ms | 20.10 ms | 29.9x |

API

AhoCorasick

| Method | Returns | Description | | ------------------------------------- | ------------- | --------------------------------------------------- | | new AhoCorasick(patterns, options?) | instance | Build automaton | | .findIter(haystack) | Match[] | Non-overlapping matches | | .findOverlappingIter(haystack) | Match[] | All overlapping matches | | .isMatch(haystack) | boolean | Any pattern matches? | | .replaceAll(haystack, replacements) | string | Replace matched patterns | | .findIterBuf(haystack) | ByteMatch[] | Matches in a Buffer / Uint8Array (byte offsets) | | .isMatchBuf(haystack) | boolean | Any pattern matches in a Buffer / Uint8Array? | | .patternCount | number | Number of patterns |

StreamMatcher

| Method | Returns | Description | | --------------------------------------- | ------------- | ------------------------------------------ | | new StreamMatcher(patterns, options?) | instance | Build streaming matcher | | .write(chunk) | ByteMatch[] | Feed chunk, get global byte-offset matches | | .flush() | ByteMatch[] | Finalize stream | | .reset() | void | Reset for reuse |

Types

type MatchKind = "leftmost-first" | "leftmost-longest";

type Options = {
  matchKind?: MatchKind;
  caseInsensitive?: boolean;
  wholeWords?: boolean;
  dfa?: boolean;
};

// Returned by string methods (findIter, etc.)
type Match = {
  pattern: number; // index into patterns array
  start: number; // UTF-16 code unit offset
  end: number; // exclusive
  text: string; // matched substring
};

// Returned by Buffer/streaming methods
type ByteMatch = {
  pattern: number; // index into patterns array
  start: number; // byte offset
  end: number; // exclusive
};

Error handling

  • new AhoCorasick([...]) throws if the automaton cannot be built (e.g. patterns exceed internal size limits).
  • replaceAll(haystack, replacements) throws if replacements.length !== patternCount.
  • Empty patterns arrays are valid; all search methods return no matches.

Limitations

  • Case insensitivity uses Unicode simple case folding, not locale-specific collation. It handles one-to-one folds like Turkish İ -> i and ẞ -> ß, but it does not perform full multi-character expansions such as ß -> ss.
  • WASM requires SharedArrayBuffer. Browser builds need Cross-Origin-Opener-Policy: same-origin and Cross-Origin-Embedder-Policy: require-corp headers. Edge runtimes without WASM support (some Cloudflare Workers configurations) are not supported.

Acknowledgements

This package is a thin binding layer. The hard work is done by:

  • aho-corasick by Andrew Gallant (BurntSushi) — the Rust implementation of the Aho-Corasick algorithm. MIT licensed.
  • NAPI-RS — the Rust framework for building Node.js native addons. MIT licensed.

Development

# Install dependencies
bun install

# Build native module (requires Rust toolchain)
bun run build

# Run tests (113 tests, including Unicode edge cases)
bun test

# Download benchmark corpora
bun run bench:download

# Install benchmark dependencies (alternatives)
bun run bench:install

# Run benchmarks
bun run bench:speed       # Canterbury corpus
bun run bench:unicode     # Leipzig corpora
bun run bench:correctness # cross-library comparison
bun run bench:all         # all three

# Lint & format
bun run lint
bun run format

License

MIT