indexer-web
v1.0.0
Search engine core for indexing text files with high performance.
High-performance inverted index for large text files, powered by Rust + WebAssembly and designed to work efficiently in browsers, Web Workers, and PWAs.
Indexer processes data incrementally (streaming), so it can handle files ranging from a few megabytes to multiple gigabytes without blocking the main thread.
Features
- Very fast text indexing (Rust + WASM)
- Runs inside Web Workers (non-blocking UI)
- Accepts input as a URL, Blob, or Uint8Array
- Handles hundreds of MB / GB-scale files
- Inverted index (term → documents + frequency)
- High-level JS API and low-level WASM access
- Ready for PWA usage
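To make the "term → documents + frequency" idea concrete, here is a minimal toy inverted index in TypeScript. It is only an illustration of the data structure, not the package's Rust implementation; the class name `ToyInvertedIndex`, its methods, and the result ordering (highest count first) are assumptions made for this sketch.

```typescript
// Illustrative only: a toy inverted index, not the code shipped in indexer-web.
interface Posting {
  doc_id: number;
  count: number;
}

class ToyInvertedIndex {
  // term -> (doc_id -> number of occurrences of the term in that document)
  private postings = new Map<string, Map<number, number>>();

  // Record every occurrence of every term for one document.
  addDocument(docId: number, terms: string[]): void {
    for (const term of terms) {
      let perDoc = this.postings.get(term);
      if (!perDoc) {
        perDoc = new Map<number, number>();
        this.postings.set(term, perDoc);
      }
      perDoc.set(docId, (perDoc.get(docId) ?? 0) + 1);
    }
  }

  // Return up to `limit` postings for a term.
  // Sorting by count (descending) is an assumption of this sketch.
  search(term: string, limit = 10): Posting[] {
    const perDoc = this.postings.get(term);
    if (!perDoc) return [];
    return Array.from(perDoc, ([doc_id, count]) => ({ doc_id, count }))
      .sort((a, b) => b.count - a.count || a.doc_id - b.doc_id)
      .slice(0, limit);
  }
}

const index = new ToyInvertedIndex();
index.addDocument(0, ["example", "text", "example"]);
index.addDocument(1, ["example"]);
const hits = index.search("example");
```

A lookup costs one map access per term regardless of how many documents were indexed, which is why this layout suits repeated searches over large inputs.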
Installation
npm install indexer-web
Quick Start (High-level API)
import Indexer from "indexer-web";
const indexer = new Indexer();
// Index a document
const docId = await indexer.read(
"my-worker", // worker name (internal)
"Large document", // document title (stored in index)
"/huge-text-file.txt" // URL / Blob / Uint8Array
);
// Search
const results = await indexer.search("my-worker", "example", 10);
console.log(results);
/*
[
{ doc_id: 0, count: 42 },
...
]
*/
// Get document title
const title = await indexer.getTitleDocument("my-worker", docId);
console.log(title);
High-level API
read(
workerName: string,
title: string,
data: string | Uint8Array | Blob,
onProgress?: (workerName: string, processedBytes: number) => void
): Promise<number>
Parameters
| Name | Description |
| :----------: | :---------------------------------------: |
| workerName | Identifier of the worker instance |
| title | Document title (required by core indexer) |
| data | URL (string), Blob, or Uint8Array |
| onProgress | Optional progress callback |
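As a rough illustration of how the `onProgress` callback might be used, the sketch below builds a callback that turns byte counts into percentage messages. The helper name `makeProgressReporter` is hypothetical, and the sketch assumes that `processedBytes` is a running total and that the caller knows the total size (for example `blob.size`); neither is confirmed by this README.

```typescript
// Matches the onProgress signature documented for read().
type OnProgress = (workerName: string, processedBytes: number) => void;

// Hypothetical helper: converts byte counts into "<worker>: <n>%" messages.
// Assumes processedBytes is cumulative, which this README does not state.
function makeProgressReporter(
  totalBytes: number,
  report: (message: string) => void
): OnProgress {
  let lastPercent = -1;
  return (workerName, processedBytes) => {
    if (totalBytes <= 0) return; // total size unknown, nothing to compute
    const percent = Math.min(100, Math.floor((processedBytes / totalBytes) * 100));
    if (percent !== lastPercent) {
      // Only report when the whole-number percentage changes.
      lastPercent = percent;
      report(`${workerName}: ${percent}%`);
    }
  };
}

// Possible usage with the high-level API (not executed here):
// await indexer.read("my-worker", "Large document", blob,
//   makeProgressReporter(blob.size, console.log));
```

Reporting only on percentage changes keeps UI updates cheap even if the callback fires for every chunk.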
search(
workerName: string,
query: string,
limit?: number
): Promise<EngineValue[]>
Searches for a term in the indexed documents.
getTitleDocument(
workerName: string,
docId: number
): Promise<string>
Returns the document title.
clear(workerName: string): void
Terminates the worker and frees memory.
clearAll(): void
Terminates all workers.
EngineValue
interface EngineValue {
doc_id: number;
count: number;
}
- doc_id: document identifier
- count: number of occurrences in that document
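Search results carry only `doc_id` and `count`, so a common follow-up step is to look up each title. The sketch below shows one way this could be done; `attachTitles` and `TitledResult` are hypothetical names, and the title lookup is passed in as a function so the example does not depend on the package itself.

```typescript
interface EngineValue {
  doc_id: number;
  count: number;
}

// Hypothetical shape: a search result with its title attached.
interface TitledResult extends EngineValue {
  title: string;
}

// Resolve the title of every result concurrently, preserving result order.
async function attachTitles(
  results: EngineValue[],
  getTitle: (docId: number) => Promise<string>
): Promise<TitledResult[]> {
  return Promise.all(
    results.map(async (r) => ({ ...r, title: await getTitle(r.doc_id) }))
  );
}

// Possible usage with the high-level API (not executed here):
// const results = await indexer.search("my-worker", "example", 10);
// const titled = await attachTitles(results, (id) =>
//   indexer.getTitleDocument("my-worker", id));
```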
Low-level API (WASM)
For advanced use cases you can access the WASM engine directly (no workers, no streaming wrapper).
import init, { Engine } from "indexer-web/dist/indexer_web";
await init();
const engine = new Engine();
const docId = engine.begin_document("Title");
engine.add_content(new Uint8Array(data));
engine.flush();
const results = engine.search("example", 10);
When to use low-level API?
- Custom worker orchestration
- Node.js native usage
- Full control over memory & streaming
- Advanced experimentation
The low-level API runs synchronously and may block the main thread.
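For callers who want control over streaming, one possible pattern is to feed a large buffer to the engine in fixed-size chunks. The sketch below relies only on the three methods shown above (`begin_document`, `add_content`, `flush`) through a local `EngineLike` interface. It assumes that `add_content` may be called several times per document, that the engine copes with tokens split across chunk boundaries, and that `begin_document` returns a numeric id; none of this is confirmed here. The helper `indexInChunks` is hypothetical.

```typescript
// Minimal structural type for the methods used in the low-level example.
// The numeric return type of begin_document is an assumption.
interface EngineLike {
  begin_document(title: string): number;
  add_content(chunk: Uint8Array): void;
  flush(): void;
}

// Hypothetical helper: feeds `bytes` to the engine in chunks of `chunkSize`.
function indexInChunks(
  engine: EngineLike,
  title: string,
  bytes: Uint8Array,
  chunkSize = 64 * 1024
): number {
  if (chunkSize <= 0) throw new RangeError("chunkSize must be positive");
  const docId = engine.begin_document(title);
  for (let offset = 0; offset < bytes.length; offset += chunkSize) {
    // subarray returns a view, so no bytes are copied on the JS side.
    engine.add_content(bytes.subarray(offset, offset + chunkSize));
  }
  engine.flush();
  return docId;
}
```

Because each call is still synchronous, running a loop like this inside a worker (or yielding between chunks) would be needed to keep a UI responsive.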
Why Web Workers?
Indexing large files can take seconds. Workers ensure:
- No UI freeze
- Smooth progress reporting
- Safe execution in browsers and PWAs
PWA Support
Indexer works in:
- Browser
- Web Workers
- Progressive Web Apps (PWA)
Service Worker integration is possible and planned.
Notes & Limitations
- Text is tokenized using ASCII rules
- Tokens shorter than 3 characters are ignored
- Case-insensitive (ASCII lowercase)
- Designed for text files, not binary formats
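The tokenization rules above can be approximated with a short sketch. The exact delimiter set used by the engine is not documented here, so this version assumes that any character other than an ASCII letter or digit separates tokens; the function name `tokenizeAscii` is hypothetical.

```typescript
// Approximation of the documented rules, not the engine's actual tokenizer:
// - split on non-alphanumeric ASCII characters (assumed delimiter rule)
// - drop tokens shorter than `minLength` (3 per the notes above)
// - lowercase what remains (tokens contain only ASCII at this point)
function tokenizeAscii(text: string, minLength = 3): string[] {
  const tokens: string[] = [];
  for (const raw of text.split(/[^A-Za-z0-9]+/)) {
    if (raw.length >= minLength) tokens.push(raw.toLowerCase());
  }
  return tokens;
}

const sampleTokens = tokenizeAscii("The Quick-brown fox, at 9am!");
// sampleTokens is ["the", "quick", "brown", "fox", "9am"]; "at" is too short.
```

Under this assumption, non-ASCII letters act as separators, so a word such as "café" would be indexed as "caf"; the real engine may treat such input differently.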
License
MIT © Mateusz Krasuski
