@picosearch/picosearch

v3.0.0-rc.8 · 233 downloads

Minimalistic full-text search, zero dependencies, local-first, browser-compatible.

picosearch

Minimalistic full-text search implemented in TypeScript.

  • 🔎 Full text search using the BM25F algorithm for multi-field matching
  • 🈯 Fully typed with TypeScript
  • 🧐 Benchmark tests in CI/CD
  • ♻️ JSON-serializable indexes

Installation

yarn add @picosearch/picosearch
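
Or, with npm:

npm install @picosearch/picosearch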

Quick Start

import { Picosearch } from '@picosearch/picosearch';

type MyDoc = {
  id: string;
  text: string;
  additionalText: string;
};

const documents: MyDoc[] = [
  { id: '1', text: 'The quick brown fox', additionalText: 'A speedy canine' },
  { id: '2', text: 'Jumps over the lazy dog', additionalText: 'High leap' },
  { id: '3', text: 'Bright blue sky', additionalText: 'Clear and sunny day' },
];

const pico = new Picosearch<MyDoc>();
pico.insertMultipleDocuments(documents);
console.log(pico.searchDocuments('fox'));
// returns
//[
//  {
//    "id": "1",
//    "score": 0.5406145489041012,
//    "doc": {
//      "id": "1",
//      "text": "The quick brown fox",
//      "additionalText": "A speedy canine"
//    }
//  }
//]

Please note that, currently, a document must be flat, may contain only string values, and must include an id field (also a string)!
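
For illustration, a sketch of the accepted document shape (the field names other than id are made up):

type ValidDoc = {
  id: string;    // required, and must be a string
  title: string; // every other field must be a string as well
  body: string;
};

// Shapes like these are NOT accepted:
// { id: 1, title: 'x' }             (id is not a string)
// { id: '1', views: 42 }            (non-string value)
// { id: '1', meta: { lang: 'en' } } (nested object, not flat)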

Syncing

Picosearch natively supports syncing with local storage and with a remote file server (read-only). Both of these components are optional.

TODO: add docs

Language-specific Preprocessing

By default, only generic preprocessing is applied (a simple regex tokenizer plus lowercasing). It is highly recommended to replace this with language-specific options. Currently, the following languages have an additional preprocessing package:

  • English (@picosearch/language-english)
  • German (@picosearch/language-german)

After installing the package for your language, use it like this:

import { Picosearch } from '@picosearch/picosearch';
import * as englishOptions from '@picosearch/language-english';

// MyDoc is the document type defined in the Quick Start example
const pico = new Picosearch<MyDoc>({ ...englishOptions });

Create an issue if you need another language!

Custom Preprocessing

You can also provide a custom tokenizer (which splits a document into words/tokens) and a custom analyzer (which processes a single token before it is indexed). Just implement the types Tokenizer and Analyzer and pass your implementations to the constructor. Example:

import {
  Picosearch,
  type Analyzer,
  type Tokenizer,
} from '@picosearch/picosearch';

const myTokenizer: Tokenizer = (doc: string): string[] => doc.split(' ');

const myAnalyzer: Analyzer = (token: string): string =>
  // when the analyzer returns '', it is removed
  ['and', 'I'].includes(token) ? '' : token.toLowerCase();

const pico = new Picosearch({
  tokenizer: myTokenizer,
  analyzer: myAnalyzer,
});
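
A short usage sketch with the document API from the Quick Start (the document and query here are made up):

// 'and' is dropped by the analyzer above, and all tokens are lowercased,
// so a lowercase query term matches regardless of the original casing
pico.insertMultipleDocuments([{ id: '1', text: 'Jack and Jill' }]);
console.log(pico.searchDocuments('jack')); // matches document '1'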

JSON Serialization

Indexes can be exported to and imported from JSON. This is useful, for example, for performing the more compute-heavy indexing offline when the search runtime is in the browser. It is very important that you pass the same tokenizer and analyzer to the new instance and don't change any other constructor options. Here's an example:

import { Picosearch } from '@picosearch/picosearch';
import * as englishOptions from '@picosearch/language-english';

const pico = new Picosearch<MyDoc>({ ...englishOptions, keepDocuments: true });
// ...index documents

const jsonIndex = pico.toJSON();

const fromSerialized = new Picosearch<MyDoc>({ ...englishOptions, jsonIndex });

Beware of the keepDocuments option! You might want to set it to false if you only need the index for search and can fetch individual documents at runtime by their ID in some other way.
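
As a quick sanity check of the round trip, the restored instance should answer queries exactly like the original:

// same hits as pico.searchDocuments('fox') from the Quick Start
console.log(fromSerialized.searchDocuments('fox'));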

Benchmark

The CI/CD pipeline includes a benchmarking step to ensure there are no performance regressions. It currently validates against three datasets from the BEIR benchmark [1]. Retrieval quality is checked to be comparable to the BM25 baseline, and in some cases slightly higher thanks to multi-field matching.

|                            | scidocs | nfcorpus | scifact |
| -------------------------- | ------- | -------- | ------- |
| Picosearch+English (BM25F) | 15.6%   | 32.9%    | 69.0%   |
| Baseline (BM25) [1]        | 15.8%   | 32.5%    | 66.5%   |

[1] Thakur, N., Reimers, N., Rücklé, A., Srivastava, A., & Gurevych, I. (2021). BEIR: A heterogeneous benchmark for zero-shot evaluation of information retrieval models. arXiv:2104.08663. https://arxiv.org/pdf/2104.08663