albex

v0.1.0

Local full-text search engine — documents never leave the browser

Albex

Local full-text search for documents. Runs entirely in the browser — no server, no upload, no network request after the initial load.

Drop a DOCX, PDF, XLSX, TXT or XML file, start typing, get results in milliseconds.


Features

  • Zero server — all text stays on the user's machine.
  • Fuzzy matching — finds "contrato" even if you type "conttrato" (adaptive edit distance).
  • Accent-insensitive — "accion" matches "acción", "espana" matches "España".
  • Multi-format — DOCX, XLSX, PDF (text-based), TXT, XML.
  • Phrase search: "contrato marco" requires the words to appear together.
  • OR search: contrato | acuerdo returns the union of two independent searches.
  • No dependencies — one TypeScript file, two WASM binaries, nothing else.
  • Tiny footprint — main WASM is ~14 KB on disk; PDF module (~1 MB) loads on demand.
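
The accent-insensitive matching listed above can be illustrated with Unicode normalization. This is a minimal sketch, assuming a fold built on String.prototype.normalize; foldAccents is a hypothetical helper, not part of the Albex API, and the engine's actual normalization inside the WASM module may differ.

```typescript
// Hypothetical helper: illustrates accent folding, not Albex's internal code.
function foldAccents(text: string): string {
  return text
    .normalize('NFD')                  // split "ó" into "o" + combining acute accent
    .replace(/[\u0300-\u036f]/g, '')   // drop the combining diacritical marks
    .toLowerCase();                    // fold case so "España" and "espana" compare equal
}

console.log(foldAccents('acción') === foldAccents('accion')); // true
console.log(foldAccents('España') === foldAccents('espana')); // true
```

Note that this fold also maps "ñ" to "n", which is consistent with the "espana" example above but may be too aggressive for some applications.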

Installation

npm install albex

Alternatively, copy dist/albex.js and wasm/pkg/albex_wasm_bg.wasm (and optionally wasm/pkg/albex_pdf.wasm, for PDF support) into your project.


Quick start

import { AlbexEngine } from 'albex';

const engine = new AlbexEngine({
  wasmUrl:    '/assets/albex_wasm_bg.wasm',
  pdfWasmUrl: '/assets/albex_pdf.wasm',   // only needed for PDFs
});

await engine.init();

// Index a file from an <input type="file"> or drag-and-drop
// (inputElement is assumed to be a reference to your page's file input)
const file = inputElement.files[0];
const doc  = await engine.indexFile(file);
console.log(`Indexed ${doc.chunks} chunks in ${doc.indexTimeMs.toFixed(0)} ms`);

// Search
const results = engine.search('contrato marco');
for (const r of results) {
  console.log(`[${r.score}] ${r.documentName} — ${r.snippet}`);
}

Supported formats

| Extension | How text is extracted |
|-----------|----------------------|
| .docx | Native Rust/WASM XML parser; reads word/document.xml directly |
| .xlsx | Native Rust/WASM XML parser; reads shared strings + inline strings |
| .pdf | Separate albex_pdf.wasm (pure Rust, loaded on demand) |
| .txt | Plain text split on double newlines |
| .xml | Tag-stripped, entity-decoded |


Query syntax

| Input | Behaviour |
|-------|-----------|
| contrato | Fuzzy match, accent-insensitive |
| contrato marco | Both words must appear in the same chunk |
| "contrato marco" | Both words AND they must be adjacent (phrase) |
| contrato \| acuerdo | OR: returns results matching either term |

Up to 4 space-separated tokens per simple/phrase query. OR branches are unlimited.
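
To see how a query breaks down under the syntax above, a small pre-check can be useful before calling engine.search. The following is a sketch only: describeQuery and the Branch type are hypothetical names, and this is not Albex's parser. The README does not say what the engine does with tokens beyond the fourth, so the sketch only flags such branches instead of guessing.

```typescript
// Hypothetical pre-check mirroring the documented syntax; not Albex's parser.
type Branch = {
  kind: 'simple' | 'phrase';
  tokens: string[];
  withinLimit: boolean;   // false when a branch has more than MAX_TOKENS tokens
};

const MAX_TOKENS = 4;     // documented limit per simple/phrase query

function describeQuery(query: string): Branch[] {
  return query
    .split('|')                              // each "|" starts a new OR branch
    .map((part) => part.trim())
    .filter((part) => part.length > 0)
    .map((part): Branch => {
      // A branch wrapped in double quotes is treated as a phrase query.
      const quoted = part.length >= 2 && part.startsWith('"') && part.endsWith('"');
      const body = quoted ? part.slice(1, -1) : part;
      const tokens = body.split(/\s+/).filter((t) => t.length > 0);
      return {
        kind: quoted ? 'phrase' : 'simple',
        tokens,
        withinLimit: tokens.length <= MAX_TOKENS,
      };
    });
}

console.log(describeQuery('"contrato marco" | acuerdo'));
```

A UI could use withinLimit to warn the user before running a query that exceeds the documented token limit.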


API reference

new AlbexEngine(opts)

interface AlbexOptions {
  wasmUrl:     string;   // required
  pdfWasmUrl?: string;   // required only for PDF indexing
}

engine.init(): Promise<void>

Fetches and initialises the main WASM module. Must be called before anything else.

engine.indexFile(file: File): Promise<IndexedDocument>

Detects the file format by extension, extracts text, and adds it to the search index. Throws for unsupported extensions or parse errors.

interface IndexedDocument {
  name:        string;
  ext:         string;
  chunks:      number;   // number of indexed text chunks
  indexTimeMs: number;
  textBytes:   number;   // raw UTF-8 text indexed
}
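
Because indexFile throws for unsupported extensions and parse errors, indexing several files usually needs per-file error handling. A minimal sketch follows; indexAll and IndexerLike are hypothetical names introduced here, and the IndexedDocument interface is repeated only so the block is self-contained. Files are indexed one at a time, which is a design choice of this sketch rather than a documented requirement.

```typescript
// Shape copied from the README so the sketch is self-contained.
interface IndexedDocument {
  name: string;
  ext: string;
  chunks: number;
  indexTimeMs: number;
  textBytes: number;
}

// Hypothetical structural type: anything with an indexFile method fits.
interface IndexerLike<F> {
  indexFile(file: F): Promise<IndexedDocument>;
}

interface IndexAllResult {
  indexed: IndexedDocument[];
  failed: { name: string; reason: string }[];
}

async function indexAll<F extends { name: string }>(
  engine: IndexerLike<F>,
  files: F[],
): Promise<IndexAllResult> {
  const result: IndexAllResult = { indexed: [], failed: [] };
  for (const file of files) {
    try {
      result.indexed.push(await engine.indexFile(file));
    } catch (err) {
      // Record the failure and continue with the remaining files.
      const reason = err instanceof Error ? err.message : String(err);
      result.failed.push({ name: file.name, reason });
    }
  }
  return result;
}
```

With a real engine this would be called as indexAll(engine, Array.from(inputElement.files)), where inputElement is assumed to be a file input on your page.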

engine.search(query: string): SearchResult[]

Returns results sorted by score (0–1000, descending).

interface SearchResult {
  documentName: string;
  location:     number;   // paragraph (DOCX/TXT) or page (PDF, 1-based)
  score:        number;   // 0–1000
  snippet:      string;   // full chunk text (original, with accents)
  matchStart:   number;   // byte offset of match in snippet
  matchEnd:     number;   // exclusive
}
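
Since matchStart and matchEnd are byte offsets while JavaScript strings are indexed in UTF-16 code units, slicing the snippet directly can misplace a highlight when accented characters come before the match. The sketch below assumes the offsets refer to the UTF-8 encoding of snippet (the README says "byte offset" without naming the encoding); splitAtMatch is a hypothetical helper.

```typescript
// Hypothetical helper; assumes matchStart/matchEnd are UTF-8 byte offsets.
function splitAtMatch(
  snippet: string,
  matchStart: number,
  matchEnd: number,
): { before: string; match: string; after: string } {
  const bytes = new TextEncoder().encode(snippet);   // string to UTF-8 bytes
  const decoder = new TextDecoder();                 // UTF-8 bytes back to string
  return {
    before: decoder.decode(bytes.subarray(0, matchStart)),
    match: decoder.decode(bytes.subarray(matchStart, matchEnd)),
    after: decoder.decode(bytes.subarray(matchEnd)),
  };
}

// "ó" takes two bytes in UTF-8, so "acción" spans bytes 3 to 10 (exclusive).
const parts = splitAtMatch('la acción legal', 3, 10);
console.log(parts.match); // "acción"
```

The three parts can then be rendered with the middle one wrapped in a highlight element.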

engine.getStats(): EngineStats

interface EngineStats {
  documents:       number;
  chunks:          number;
  textUsed:        number;   // bytes
  textCapacity:    number;   // 16 MB hard cap
  wasmMemoryBytes: number;
}

engine.getLastSearchStats(): SearchStats | null

Bloom/Bitap pipeline counters from the most recent search — useful for debugging and UI dashboards.

interface SearchStats {
  query:        string;
  timeMs:       number;
  results:      number;
  bloomTested:  number;   // chunks tested
  bloomPassed:  number;   // passed bloom pre-filter
  bitapMatched: number;   // confirmed by Bitap
}
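
For a debug panel, these counters can be reduced to a one-line summary. This is a sketch under assumptions: summarizeSearch is a hypothetical helper, and the README does not say when getLastSearchStats returns null, so the null case is simply reported as unavailable.

```typescript
// Shape copied from the README so the sketch is self-contained.
interface SearchStats {
  query: string;
  timeMs: number;
  results: number;
  bloomTested: number;
  bloomPassed: number;
  bitapMatched: number;
}

// Hypothetical formatter for a dashboard or console log.
function summarizeSearch(stats: SearchStats | null): string {
  if (stats === null) return 'no search stats available';
  // Share of tested chunks that survived the bloom pre-filter.
  const passRate = stats.bloomTested > 0 ? stats.bloomPassed / stats.bloomTested : 0;
  return (
    `"${stats.query}": ${stats.results} results in ${stats.timeMs.toFixed(1)} ms, ` +
    `bloom passed ${(passRate * 100).toFixed(1)}% of ${stats.bloomTested} chunks, ` +
    `Bitap confirmed ${stats.bitapMatched} of ${stats.bloomPassed}`
  );
}
```

A call such as summarizeSearch(engine.getLastSearchStats()) after each search keeps the pipeline's selectivity visible while tuning.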

Tuning

engine.setMaxErrors(n);     // 0–3  (default 2, auto-scaled by query length)
engine.setThreshold(n);     // 0–1000 minimum score (default 250)
engine.setMaxResults(n);    // 1–200 (default 50)
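
The README does not state what happens when a setter receives a value outside its documented range, so a caller may prefer to clamp values first. A minimal sketch, where applyTuning, Tuning, and TunableEngine are hypothetical names and rounding to an integer is an assumption:

```typescript
// Hypothetical structural type listing only the three documented setters.
interface TunableEngine {
  setMaxErrors(n: number): void;
  setThreshold(n: number): void;
  setMaxResults(n: number): void;
}

interface Tuning {
  maxErrors?: number;    // documented range 0 to 3
  threshold?: number;    // documented range 0 to 1000
  maxResults?: number;   // documented range 1 to 200
}

// Round to an integer (an assumption) and clamp into [lo, hi].
function clampInt(n: number, lo: number, hi: number): number {
  return Math.min(hi, Math.max(lo, Math.round(n)));
}

function applyTuning(engine: TunableEngine, t: Tuning): void {
  if (t.maxErrors !== undefined) engine.setMaxErrors(clampInt(t.maxErrors, 0, 3));
  if (t.threshold !== undefined) engine.setThreshold(clampInt(t.threshold, 0, 1000));
  if (t.maxResults !== undefined) engine.setMaxResults(clampInt(t.maxResults, 1, 200));
}
```

For example, applyTuning(engine, { maxErrors: 1, threshold: 400 }) tightens matching while leaving the result limit at its default.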

engine.reset()

Clears all indexed documents. The engine is ready to index new files immediately after.


Capacity

| Resource | Limit |
|----------|-------|
| Documents | 128 |
| Chunks | 100 000 |
| Total text | 16 MB |
| Query length | 64 characters (longer queries are truncated) |
| Results | 200 (configurable, default 50) |

These limits are hard-coded in the WASM module as statically allocated (BSS) buffers. Exceeding them fails silently: the engine stops indexing additional content without raising an error.
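
Because nothing is thrown when a limit is hit, callers may want to inspect getStats() after each indexFile call. The following is a sketch: capacityWarnings is a hypothetical helper, the document and chunk limits are copied from the table above, and the 90% warning ratio is an arbitrary choice.

```typescript
// Shape copied from the README so the sketch is self-contained.
interface EngineStats {
  documents: number;
  chunks: number;
  textUsed: number;
  textCapacity: number;
  wasmMemoryBytes: number;
}

// Limits taken from the capacity table; text capacity comes from the stats.
const MAX_DOCUMENTS = 128;
const MAX_CHUNKS = 100000;

// Hypothetical helper: lists every resource at or above `ratio` of its limit.
function capacityWarnings(stats: EngineStats, ratio: number = 0.9): string[] {
  const usage: [string, number, number][] = [
    ['documents', stats.documents, MAX_DOCUMENTS],
    ['chunks', stats.chunks, MAX_CHUNKS],
    ['text bytes', stats.textUsed, stats.textCapacity],
  ];
  return usage
    .filter(([, used, limit]) => limit > 0 && used / limit >= ratio)
    .map(([name, used, limit]) => `${name}: ${used} of ${limit} used`);
}
```

An empty array means there is still headroom; any entries are worth surfacing to the user, since further content may be dropped without an error.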


Browser requirements

  • WebAssembly (all modern browsers since 2017)
  • DecompressionStream for DOCX/XLSX (Chrome 80+, Firefox 113+, Safari 16.4+)
  • String.prototype.normalize for phrase search (all modern browsers)

PDF support additionally requires the albex_pdf.wasm module to be served with the correct MIME type (application/wasm).
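
These requirements can be checked at runtime before calling engine.init(). A minimal sketch, where detectSupport is a hypothetical helper and the scope parameter exists only so the check can be exercised outside a browser:

```typescript
// Hypothetical feature check for the three browser requirements listed above.
function detectSupport(
  scope: Record<string, unknown> = globalThis as unknown as Record<string, unknown>,
): { webAssembly: boolean; decompressionStream: boolean; normalize: boolean } {
  return {
    // WebAssembly is exposed as a namespace object on the global scope.
    webAssembly: typeof scope['WebAssembly'] === 'object' && scope['WebAssembly'] !== null,
    // DecompressionStream is a constructor; needed for DOCX/XLSX.
    decompressionStream: typeof scope['DecompressionStream'] === 'function',
    normalize: typeof String.prototype.normalize === 'function',
  };
}

const support = detectSupport();
if (!support.decompressionStream) {
  console.warn('DecompressionStream is missing: DOCX and XLSX indexing will not work.');
}
```

An application could use the result to disable the affected file types in its upload UI instead of failing during indexing.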


Building from source

# Requires the Rust toolchain (rustup); add the WebAssembly compile target
rustup target add wasm32-unknown-unknown

# Build main WASM
cd wasm && cargo build --target wasm32-unknown-unknown --release
cp ../target/wasm32-unknown-unknown/release/albex_wasm.wasm pkg/albex_wasm_bg.wasm

# Build PDF WASM
cd ../pdf-wasm && cargo build --target wasm32-unknown-unknown --release
cp ../target/wasm32-unknown-unknown/release/albex_pdf.wasm ../wasm/pkg/albex_pdf.wasm

# Build TypeScript
cd .. && npm install && npm run build

Privacy

Albex does not transmit any document content. Text extraction, indexing, and search all happen inside the browser's WASM sandbox. The only network requests are the initial fetch of the .wasm binary files.


License

MIT