hypgrep

v0.2.0

[![mit license](https://img.shields.io/badge/License-MIT-orange.svg)](https://opensource.org/licenses/MIT) ![coverage](https://img.shields.io/badge/Coverage-95-darkred)

HypGrep

Build a compact n-gram search index for a Parquet file using hyparquet and hyparquet-writer. Queries are case-insensitive substring matches — grep semantics over a precomputed index.
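To make the idea of an n-gram index concrete, here is a minimal in-memory sketch. It is not HypGrep's actual index format: the names `ngrams`, `buildIndex` and `candidateBlocks` are hypothetical, and row blocks are modelled as plain arrays of strings. It only shows how a map from n-grams to block ids can narrow a query down to a few candidate blocks.

```javascript
// Hypothetical sketch of an n-gram index; not HypGrep's on-disk format.
const N = 5 // n-gram length (the README gives 5 as the default)

// Split lowercased text into its set of overlapping n-grams.
function ngrams(text, n = N) {
  const lower = text.toLowerCase()
  const grams = new Set()
  for (let i = 0; i + n <= lower.length; i++) {
    grams.add(lower.slice(i, i + n))
  }
  return grams
}

// Map each n-gram to the set of row-block ids whose rows contain it.
function buildIndex(blocks, n = N) {
  const index = new Map()
  blocks.forEach((rows, blockId) => {
    for (const row of rows) {
      for (const gram of ngrams(row, n)) {
        if (!index.has(gram)) index.set(gram, new Set())
        index.get(gram).add(blockId)
      }
    }
  })
  return index
}

// Blocks containing every n-gram of the query are candidates.
// Returns null when the query is too short to use the index.
function candidateBlocks(index, query, n = N) {
  const grams = [...ngrams(query, n)]
  if (grams.length === 0) return null
  let result = null
  for (const gram of grams) {
    const ids = index.get(gram) ?? new Set()
    result = result === null
      ? new Set(ids)
      : new Set([...result].filter(id => ids.has(id)))
  }
  return [...result].sort((a, b) => a - b)
}

const blocks = [
  ['Serverless computing on S3', 'Parquet is columnar'],
  ['Rhythm and blues'],
  ['A serverless search index'],
]
const index = buildIndex(blocks)
console.log(candidateBlocks(index, 'SERVERLESS')) // [ 0, 2 ]
console.log(candidateBlocks(index, 'blue')) // null (shorter than 5 characters)
```

Candidate blocks can be false positives (the n-grams may come from different rows of the same block), so each row still has to be checked against the query after the block is read.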

Why?

Enable efficient grep-style search on large Parquet datasets from any client without a server. Store your Parquet dataset on S3, generate a compact index file, and query it directly from a browser or other clients using HTTP range requests. The index tells you exactly which row blocks to fetch, so you only download the data you need.
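For readers unfamiliar with HTTP range requests, the sketch below shows the general mechanism using the standard `fetch` API. The helper names `rangeHeader` and `fetchRange` are hypothetical and are not part of HypGrep, which handles this internally.

```javascript
// Build an HTTP Range header value for an inclusive byte span.
function rangeHeader(start, end) {
  if (!Number.isInteger(start) || !Number.isInteger(end) || start < 0 || end < start) {
    throw new RangeError(`invalid byte range ${start}-${end}`)
  }
  return `bytes=${start}-${end}`
}

// Fetch only the requested bytes. A server that honors the range replies
// with 206 Partial Content; anything else is treated as an error here.
async function fetchRange(url, start, end) {
  const res = await fetch(url, { headers: { Range: rangeHeader(start, end) } })
  if (res.status !== 206) {
    throw new Error(`expected 206 Partial Content, got ${res.status}`)
  }
  return res.arrayBuffer()
}

console.log(rangeHeader(0, 1023)) // bytes=0-1023
```

Both ends of the range are inclusive, so `bytes=0-1023` asks for the first 1024 bytes.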

Perfect for serverless architectures where you want to offer search capabilities without managing infrastructure.

CLI usage

Build an index:

npx hypgrep dataset.parquet [dataset.index.parquet]

Grep against the indexed file:

npx hypgrep search dataset.parquet 'serverless'          # literal substring
npx hypgrep search dataset.parquet '/eigen.+value/i'      # regex
npx hypgrep search dataset.parquet 'rhythm' --limit 5     # first N matches
npx hypgrep search dataset.parquet 'rhythm' -c            # count only
npx hypgrep search dataset.parquet 'rhythm' -i            # case-insensitive literal

To install as a system-wide CLI tool:

npm install -g hypgrep
hypgrep search dataset.parquet 'pattern'

Find rows in a Parquet file in JavaScript

Use parquetFind to find rows containing the query as a substring while preserving natural row order (like Ctrl+F):

import { parquetFind } from 'hypgrep'

for await (const row of parquetFind({
  query: 'serverless',
  url: 'https://s3.hyperparam.app/hypgrep/wiki_en.parquet',
})) {
  console.log(row) // { title: '...', text: '...' }
}

The query matches as a contiguous substring (grep semantics): 'speed of light' matches rows containing that exact phrase, not rows where the words merely co-occur. Queries shorter than the indexed n-gram length (default 5) fall back to a full scan but still return correct results.
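The difference between phrase matching and word co-occurrence, and the short-query fallback, can be illustrated with plain strings. This is a sketch only: the helper names are hypothetical, matching is assumed to be case-insensitive as described at the top of this README, and the length check is a simplified reading of "shorter than the indexed n-gram length".

```javascript
const NGRAM_LENGTH = 5 // default n-gram length per the README

// Grep semantics: the whole query must appear as one contiguous substring.
function containsPhrase(text, query) {
  return text.toLowerCase().includes(query.toLowerCase())
}

// For contrast only: every word appears somewhere, in any order.
function containsAllWords(text, query) {
  const lower = text.toLowerCase()
  return query.toLowerCase().split(/\s+/).filter(Boolean).every(w => lower.includes(w))
}

// Queries shorter than the n-gram length cannot be looked up in the index.
function canUseIndex(query, n = NGRAM_LENGTH) {
  return query.length >= n
}

const a = 'The speed of light is constant'
const b = 'Light travels at great speed, of course'

console.log(containsPhrase(a, 'speed of light')) // true
console.log(containsPhrase(b, 'speed of light')) // false
console.log(containsAllWords(b, 'speed of light')) // true
console.log(canUseIndex('light')) // true
console.log(canUseIndex('ray')) // false
```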

Regex queries

Pass a RegExp directly — mandatory literals are extracted from the pattern for index pruning, and regex.test runs against each row:

import { parquetFind } from 'hypgrep'

for await (const row of parquetFind({
  query: /eigen\w*value/i,
  url: '...',
})) {
  console.log(row) // rows where the regex matches
}

If the regex has no extractable literal (e.g. /./, /foo|bar/), the index can't prune and HypGrep does a full scan. The substring/regex filter still applies — results are correct, just unaccelerated.
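As a rough illustration of what "mandatory literals" means, the sketch below pulls out runs of plain characters that every match must contain. It is deliberately conservative and is not HypGrep's actual extractor: the function name `mandatoryLiterals` is hypothetical, and any pattern with alternation, groups, character classes or counted repeats simply yields no literals, which corresponds to a full scan.

```javascript
// Hypothetical, conservative extractor; HypGrep's real logic may differ.
function mandatoryLiterals(regex, minLength = 5) {
  const src = regex.source
  // Constructs this sketch does not analyze: give up and report no literals.
  if (/[|()[\]{}]/.test(src)) return []

  const literals = []
  let run = ''
  const flush = () => {
    if (run.length >= minLength) literals.push(run)
    run = ''
  }
  for (let i = 0; i < src.length; i++) {
    const ch = src[i]
    if (ch === '\\') {
      flush()
      i++ // skip the escaped character, treated as unknown
    } else if (ch === '*' || ch === '?') {
      run = run.slice(0, -1) // the previous character is optional
      flush()
    } else if ('.^$+'.includes(ch)) {
      flush() // metacharacter ends the current literal run
    } else {
      run += ch
    }
  }
  flush()
  return regex.ignoreCase ? literals.map(s => s.toLowerCase()) : literals
}

console.log(mandatoryLiterals(/eigen\w*value/i)) // [ 'eigen', 'value' ]
console.log(mandatoryLiterals(/./)) // []
console.log(mandatoryLiterals(/foo|bar/)) // []
```

Literals shorter than `minLength` are dropped because they cannot be looked up in an index of 5-character n-grams.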

If you want full control over the row predicate (e.g. a custom JS function), pass rowFilter. The string query is still used for index pruning while the callback decides which rows to keep:

for await (const row of parquetFind({
  query: 'eigen', // used for index pruning
  rowFilter: row => myCustomCheck(row), // myCustomCheck is your own predicate
  url: '...',
})) {
  console.log(row)
}
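The split between pruning and filtering can be sketched with in-memory data. This is an assumption-laden model, not HypGrep's implementation: `findRows` is a hypothetical helper, blocks are plain arrays of row objects, and pruning is simulated by scanning each block, where a real index would answer without reading it.

```javascript
// Hypothetical two-stage model: the query prunes blocks, rowFilter picks rows.
function* findRows(blocks, query, rowFilter) {
  const needle = query.toLowerCase()
  const containsQuery = row =>
    Object.values(row).some(v => String(v).toLowerCase().includes(needle))
  const keep = rowFilter ?? containsQuery

  for (const block of blocks) {
    // Stage 1: pruning. Skip blocks that cannot contain the query.
    if (!block.some(containsQuery)) continue
    // Stage 2: the predicate decides which rows of a surviving block to keep.
    for (const row of block) {
      if (keep(row)) yield row
    }
  }
}

const blocks = [
  [
    { title: 'Eigenvalue', text: 'scalar of a linear map' },
    { title: 'Matrix', text: 'rectangular array' },
  ],
  [{ title: 'Rhythm', text: 'pattern in time' }],
]

const shortTitles = [...findRows(blocks, 'eigen', row => row.title.length <= 6)]
console.log(shortTitles.map(r => r.title)) // [ 'Matrix' ]
```

In this model, 'Rhythm' would pass the filter but is never seen because its block was pruned, so the query should cover everything the filter is meant to find.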

Ranked search

Use parquetSearch for Google-style ranked search: whitespace-separated words are ANDed (every word must appear), and results are ranked by total occurrence count:

import { parquetSearch } from 'hypgrep'

for await (const row of parquetSearch({
  query: 'serverless',
  url: 'https://s3.hyperparam.app/hypgrep/wiki_en.parquet',
})) {
  console.log(row) // most matches first
}
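The ranking rule described above (AND across words, ordered by total occurrence count) can be sketched on plain strings. The helpers `countOccurrences` and `rankRows` are hypothetical, and case-insensitive matching with non-overlapping counts is an assumption; parquetSearch's exact scoring may differ.

```javascript
// Count non-overlapping occurrences of word in text (both already lowercased).
function countOccurrences(text, word) {
  let count = 0
  let pos = text.indexOf(word)
  while (pos !== -1) {
    count++
    pos = text.indexOf(word, pos + word.length)
  }
  return count
}

// Keep rows containing every query word, ordered by total occurrences.
function rankRows(rows, query) {
  const words = query.toLowerCase().split(/\s+/).filter(Boolean)
  const scored = []
  for (const row of rows) {
    const text = row.toLowerCase()
    const counts = words.map(w => countOccurrences(text, w))
    // AND semantics: drop the row if any word is missing.
    if (counts.length === 0 || counts.some(c => c === 0)) continue
    scored.push({ row, score: counts.reduce((a, b) => a + b, 0) })
  }
  return scored.sort((a, b) => b.score - a.score).map(s => s.row)
}

const rows = [
  'Serverless search',
  'Search, search, and more serverless search',
  'Serverless only',
]
console.log(rankRows(rows, 'serverless search'))
// [ 'Search, search, and more serverless search', 'Serverless search' ]
```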

Create an index in JavaScript

import { asyncBufferFromFile } from 'hyparquet'
import { fileWriter } from 'hyparquet-writer'
import { createIndex } from 'hypgrep'

// Generate dataset.index.parquet from dataset.parquet
const sourceFile = await asyncBufferFromFile('dataset.parquet')
const indexFile = fileWriter('dataset.index.parquet')
await createIndex({ sourceFile, indexFile })
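If you index many files, a small helper can derive the index file name from the source name. This sketch only mirrors the file names used in this README (dataset.parquet and dataset.index.parquet); `indexPathFor` is a hypothetical helper, and HypGrep itself may or may not require this naming.

```javascript
// Derive "<name>.index.parquet" from "<name>.parquet" (naming taken from the README examples).
function indexPathFor(sourcePath) {
  const suffix = '.parquet'
  if (!sourcePath.toLowerCase().endsWith(suffix)) {
    throw new Error(`expected a ${suffix} file, got ${sourcePath}`)
  }
  return sourcePath.slice(0, -suffix.length) + '.index.parquet'
}

console.log(indexPathFor('dataset.parquet')) // dataset.index.parquet
```

The result can be passed to fileWriter in place of the hard-coded path above.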

Local Parquet files

To search local Parquet files, provide an asyncBufferFactory that loads the file from the local filesystem:

import { asyncBufferFromFile } from 'hyparquet'
import { parquetFind } from 'hypgrep'

// Loads parquet file from local filesystem
function asyncBufferFactory({ url }) {
  return asyncBufferFromFile(url)
}

for await (const row of parquetFind({
  query: 'serverless',
  url: 'dataset.parquet',
  asyncBufferFactory,
})) {
  console.log(row)
}
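If the same code has to handle both remote URLs and local paths, the factory can dispatch on the shape of the url. The sketch below injects the two loaders so it runs without any dependencies; `isRemote` and `makeAsyncBufferFactory` are hypothetical names, and the stub loaders stand in for real ones such as hyparquet's asyncBufferFromFile.

```javascript
// True for http(s) URLs, false for anything treated as a local path.
function isRemote(url) {
  return /^https?:\/\//i.test(url)
}

// Build an asyncBufferFactory from two injected loaders (hypothetical wiring).
function makeAsyncBufferFactory({ fromUrl, fromFile }) {
  return function asyncBufferFactory({ url }) {
    return isRemote(url) ? fromUrl({ url }) : fromFile(url)
  }
}

// Stub loaders so the sketch runs standalone; swap in real loaders in practice.
const factory = makeAsyncBufferFactory({
  fromUrl: async ({ url }) => `remote:${url}`,
  fromFile: async path => `local:${path}`,
})

factory({ url: 'dataset.parquet' }).then(console.log) // local:dataset.parquet
```

In real use, `fromFile` would be asyncBufferFromFile as shown above, and `fromUrl` a URL-based loader of your choice; check that loader's signature before wiring it in.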