indexer-web

v1.0.0, published 12 days ago

Search engine core for indexing text files with high performance.

inverted-index inverted-index-core ft-index text-index-core term-index ft-search-core search-core text-search-core ir-core ir-engine

indexer-web

High-performance inverted index for large text files, powered by Rust + WebAssembly and designed to run efficiently in browsers, Web Workers, and PWAs.

The indexer processes data incrementally (streaming) and handles files from a few megabytes to several gigabytes without blocking the main thread.

Features

Very fast text indexing (Rust + WASM)
Runs inside Web Workers (non-blocking UI)
Supports URL, Blob, and Uint8Array
Handles files from hundreds of MB up to GB scale
Inverted index (term → documents + frequency)
High-level JS API and low-level WASM access
Ready for PWA usage
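
To make the "term → documents + frequency" idea concrete, here is a minimal in-memory sketch of an inverted index. It is illustrative only and does not show how indexer-web is implemented: the class name TinyInvertedIndex, the sequential document ids, and the count-descending result order are all assumptions made for this example.

```typescript
// Illustrative sketch only, not the library's internals.
type Posting = { doc_id: number; count: number };

class TinyInvertedIndex {
  // term -> (doc_id -> number of occurrences of the term in that document)
  private postings = new Map<string, Map<number, number>>();
  private titles: string[] = [];

  // Registers a document and returns its id (assumed sequential here).
  addDocument(title: string, terms: string[]): number {
    const docId = this.titles.length;
    this.titles.push(title);
    for (const term of terms) {
      let perDoc = this.postings.get(term);
      if (!perDoc) {
        perDoc = new Map<number, number>();
        this.postings.set(term, perDoc);
      }
      perDoc.set(docId, (perDoc.get(docId) ?? 0) + 1);
    }
    return docId;
  }

  // Returns up to `limit` postings for the term, highest count first.
  search(term: string, limit = 10): Posting[] {
    const perDoc = this.postings.get(term);
    if (!perDoc) return [];
    return Array.from(perDoc, ([doc_id, count]) => ({ doc_id, count }))
      .sort((a, b) => b.count - a.count || a.doc_id - b.doc_id)
      .slice(0, limit);
  }
}
```

A lookup touches only the postings of the queried term, which is why search stays cheap even when the indexed text is large.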

Installation

npm install indexer-web

Quick Start (High-level API)

import Indexer from "indexer-web";

const indexer = new Indexer();

// Index a document
const docId = await indexer.read(
  "my-worker",          // worker name (internal)
  "Large document",     // document title (stored in index)
  "/huge-text-file.txt" // URL / Blob / Uint8Array
);

// Search
const results = await indexer.search("my-worker", "example", 10);

console.log(results);
/*
[
  { doc_id: 0, count: 42 },
  ...
]
*/

// Get document title
const title = await indexer.getTitleDocument("my-worker", docId);
console.log(title);

High-level API

read(
  workerName: string,
  title: string,
  data: string | Uint8Array | Blob,
  onProgress?: (workerName: string, processedBytes: number) => void
): Promise<number>

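Indexes a document and resolves to its document ID.

Parameters

workerName – worker name (internal)
title – document title (stored in the index)
data – URL string, Blob, or Uint8Array to index
onProgress – optional callback that receives the worker name and the processed byte count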
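
As a sketch of how the optional onProgress callback might be used, the helper below turns byte counts into percentage messages. makeProgressReporter is a hypothetical helper, not part of indexer-web, and it assumes that processedBytes is a running total for the document; the README does not say whether the value is cumulative or per chunk.

```typescript
// Hypothetical helper: builds a callback matching read()'s onProgress signature.
type ProgressCallback = (workerName: string, processedBytes: number) => void;

function makeProgressReporter(
  totalBytes: number,
  report: (message: string) => void
): ProgressCallback {
  let lastPercent = -1;
  return (workerName, processedBytes) => {
    // Assumes processedBytes is cumulative; clamp in case it overshoots.
    const percent =
      totalBytes > 0
        ? Math.min(100, Math.floor((processedBytes / totalBytes) * 100))
        : 100;
    // Report only when the whole-number percentage changes.
    if (percent !== lastPercent) {
      lastPercent = percent;
      report(`${workerName}: ${percent}%`);
    }
  };
}

// Possible usage with a Blob named `blob` (not executed here):
// await indexer.read("my-worker", "Large document", blob,
//   makeProgressReporter(blob.size, (m) => console.log(m)));
```

Throttling to whole percentages keeps UI updates infrequent when the callback fires often.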

search(
  workerName: string,
  query: string,
  limit?: number
): Promise<EngineValue[]>

Searches for a term in the indexed documents.

getTitleDocument(
  workerName: string,
  docId: number
): Promise<string>

Returns the document title.

clear(workerName: string): void

Terminates the worker and frees memory.

clearAll(): void

Terminates all workers.

EngineValue

interface EngineValue {
  doc_id: number;
  count: number;
}

doc_id – document identifier
count – number of occurrences in that document
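
Because search takes a single term, one way to approximate a multi-term query is to run one search per term and combine the lists. The sketch below sums counts per doc_id. mergeResults is a hypothetical helper, not a library feature, and it assumes all result lists come from the same worker so that doc_id values refer to the same documents.

```typescript
interface EngineValue {
  doc_id: number;
  count: number;
}

// Sums counts per doc_id across several result lists (one list per searched term)
// and returns the totals ordered by count, highest first.
function mergeResults(resultSets: EngineValue[][]): EngineValue[] {
  const totals = new Map<number, number>();
  for (const results of resultSets) {
    for (const { doc_id, count } of results) {
      totals.set(doc_id, (totals.get(doc_id) ?? 0) + count);
    }
  }
  return Array.from(totals, ([doc_id, count]) => ({ doc_id, count })).sort(
    (a, b) => b.count - a.count || a.doc_id - b.doc_id
  );
}
```

Note that if each search was called with a small limit, a document cut off from one list contributes nothing for that term, so the totals are only as complete as the individual result lists.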

Low-level API (WASM)

For advanced use cases you can access the WASM engine directly (no workers, no streaming wrapper).

import init, { Engine } from "indexer-web/dist/indexer_web";

await init();

const engine = new Engine();

// `data` is assumed to hold the document's raw bytes (e.g. an ArrayBuffer)
const docId = engine.begin_document("Title");
engine.add_content(new Uint8Array(data));
engine.flush();

const results = engine.search("example", 10);

When to use low-level API?

Custom worker orchestration
Node.js native usage
Full control over memory & streaming
Advanced experimentation

The low-level API runs synchronously and may block the main thread.
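
For custom streaming on top of the low-level API, one possible approach is to feed the engine in fixed-size chunks. The sketch below is written against a minimal EngineLike interface that mirrors the three methods shown above. It assumes that add_content may be called several times per document and that the engine copes with tokens split across chunk boundaries; neither is confirmed by this README. indexInChunks, EngineLike, and the 1 MiB default are hypothetical names and values chosen for the example.

```typescript
// Minimal shape of the engine methods used above; return types are assumptions.
interface EngineLike {
  begin_document(title: string): number;
  add_content(chunk: Uint8Array): void;
  flush(): void;
}

// Feeds `bytes` to the engine in chunks of at most `chunkSize` bytes.
function indexInChunks(
  engine: EngineLike,
  title: string,
  bytes: Uint8Array,
  chunkSize: number = 1 << 20 // 1 MiB, an arbitrary default
): number {
  if (chunkSize <= 0) throw new RangeError("chunkSize must be positive");
  const docId = engine.begin_document(title);
  for (let offset = 0; offset < bytes.length; offset += chunkSize) {
    // subarray creates a view, so no bytes are copied on the JS side.
    engine.add_content(bytes.subarray(offset, offset + chunkSize));
  }
  engine.flush();
  return docId;
}
```

Inside a worker, the same loop could yield between chunks or post progress messages, which is roughly the kind of orchestration the high-level API handles for you.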

Why Web Workers?

Indexing large files can take seconds. Workers ensure:

No UI freeze
Smooth progress reporting
Safe execution in browsers and PWAs

PWA Support

Indexer works in:

Browser
Web Workers
Progressive Web Apps (PWA)

Service Worker integration is possible and planned.

Notes & Limitations

Text is tokenized using ASCII rules
Tokens shorter than 3 characters are ignored
Case-insensitive (ASCII lowercase)
Designed for text files, not binary formats
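
The rules above can be approximated with a short tokenizer sketch, which helps predict which query terms can match. This is not the engine's actual tokenizer: it assumes that tokens are runs of ASCII letters and digits and that every other character, including non-ASCII letters, acts as a separator.

```typescript
// Approximation of the stated rules: ASCII tokens, lowercased, minimum length 3.
function tokenize(text: string, minLength: number = 3): string[] {
  const tokens: string[] = [];
  let current = "";
  // Iterate one step past the end so the final token is flushed.
  for (let i = 0; i <= text.length; i++) {
    const code = i < text.length ? text.charCodeAt(i) : -1;
    const isDigit = code >= 48 && code <= 57;
    const isUpper = code >= 65 && code <= 90;
    const isLower = code >= 97 && code <= 122;
    if (isDigit || isLower) {
      current += text[i];
    } else if (isUpper) {
      current += String.fromCharCode(code + 32); // ASCII lowercase
    } else {
      // Separator (assumption: anything that is not an ASCII letter or digit).
      if (current.length >= minLength) tokens.push(current);
      current = "";
    }
  }
  return tokens;
}

// tokenize("The Quick-brown fox, in WASM at 10x!")
// → ["the", "quick", "brown", "fox", "wasm", "10x"]
```

Under these assumptions, short words such as "in" and "at" are never indexed, and a word containing non-ASCII letters is split at those letters, so searching for such terms returns nothing.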
