seshat-trie

v1.0.2

Published

3 months ago

A trie library for JavaScript with C++ bindings.

0High
0Medium
0Low

grml

trie radix trie seshat seshat-trie c++javascript typescript

Seshat Trie

Image source: Wikipedia - Seshat

A high-performance radix trie (prefix tree) for Node.js with a C++ core and TypeScript API.

What is this?

A radix trie implementation with C++ performance for fast prefix-based searches. Useful for autocomplete, spell checking, and dictionary lookups where you need to quickly find words by their prefixes.

Install

Prerequisites: a working C/C++ toolchain for node-gyp (Python, make, compiler). On Linux, install your distro's build tools. This happened because of the way a few of my accounts are set up (some of my accounts are for school and whatnot, which have 2FA mandatorily enabled, breaking github actions and more)

npm install seshat-trie

Performance-Notes

Due to N-API overhead when crossing the JavaScript/C++ boundary, individual operations (especially small batch inserts) may be slower than expected on some systems (higher single-core performance is better for this library). For bulk insertions, use insertFromFile() or insertFromFileAsync() which handle file I/O entirely in C++, avoiding per-word marshalling costs.

Benchmarks

Honestly a little nervous about this, but there is a Benchmarks results file. Hope you enjoy this repo. (I felt like this project may have under performed in some ways I did not expect after working on the C++ version)

JSON import sample schema

{
	"words": [
		"aa",
		"aah",
		"aahed",
		"aahing",
		"aahs",
		"aal",
		"aalii",
		"aaliis",
		"aals"
	],
	"options": {
		"ignoreCase": false
	}
}

OS prerequisites for local compilation

Linux
- Python 3, make, C/C++ compiler and headers
- Debian/Ubuntu: sudo apt-get install -y build-essential python3
- Fedora/RHEL: sudo dnf install -y @development-tools python3
- Arch: sudo pacman -S --needed base-devel python
macOS
- Xcode Command Line Tools and Python 3
- Install: xcode-select --install and brew install python
Windows
- Visual Studio Build Tools (workload: Desktop development with C++) and Python 3
- Python: winget install Python.Python.3.11 or download from python.org
- If multiple VS versions are installed: npm config set msvs_version 2019 (or 2022)

Supported Node versions: Node 18, 20, or 22 (see engines field). npm install will build the native addon locally via node-gyp.

Quick start

import { Seshat } from "seshat-trie";

const trie = new Seshat({ ignoreCase: false });

trie.insert("hello");
trie.insertBatch(["world", "help"]);

console.log(trie.search("hello")); // true
console.log(trie.startsWith("he")); // true
console.log(trie.getWordsWithPrefix("he")); // ["help", "hello"]

// Insert from a file (one word per line)
// Default buffer size is 1MB; you can pass a custom number of bytes as 2nd arg
// const count = trie.insertFromFile("./words.txt", 1024);

// Async variant (Node-style callback)
// trie.insertFromFileAsync("./words.txt", 1024, (err, count) => {
//   if (err) return console.error(err);
//   console.log("inserted", count);
// });

API

Methods are synchronous unless noted.

constructor(options?)
- options.words?: string[] initial words to insert
- options.ignoreCase?: boolean default false
insert(word: string): void
insertBatch(words: string[]): number returns count inserted
insertFromFile(filePath: string, bufferSize?: number): number words per line
insertFromFileAsync(filePath: string, bufferSize?: number, cb: (err: Error | null, count?: number) => void): void
search(word: string): boolean
searchBatch(words: string[]): boolean[]
startsWith(prefix: string): boolean
getWordsWithPrefix(prefix: string): string[]
remove(word: string): boolean
removeBatch(words: string[]): boolean[]
removeMany(words: string[]): boolean[] (helper built on remove)
isEmpty(): boolean
size(): number
clear(): void
getStats(): { wordCount: number; isEmpty: boolean; allWords: string[] }
getHeightStats(): { minHeight: number; maxHeight: number; averageHeight: number; modeHeight: number; allHeights: number[] }
getMemoryStats(): { totalBytes: number; nodeCount: number; stringBytes: number; overheadBytes: number; bytesPerWord: number }
getWordMetrics(): { minLength: number; maxLength: number; averageLength: number; modeLength: number; lengthDistribution: number[]; totalCharacters: number }
patternSearch(pattern: string): string[] supports * and ? wildcards
toJSON(): { words: string[]; options: { ignoreCase: boolean } }
static fromJSON(json): Seshat
static fromWords(words: string[], options?): Seshat

Errors and validation

Empty or whitespace-only words throw on insert.
Non-string inputs throw where a string is required.
insertFromFile throws if bufferSize is not a positive number or file read fails.
insertFromFileAsync reports errors via the callback err parameter.

Case handling

When ignoreCase is true, inputs are lowercased internally. Returned words reflect stored (normalized) casing.

File input format

insertFromFile reads a UTF-8 text file.
One word per line.
Default buffer size is 1MB; pass bufferSize in bytes to override.

Benchmarks (optional)

Prepare word lists:

npm run setupTextFiles

Run benchmarks:

npm run benchmark
npm run benchmark:filestream

Development

Build native addon and TypeScript:

npm run build

Run tests:

npm test

Entry points:

Runtime: dist/index.js
Types: dist/index.d.ts
Native addon: build/Release/seshat.node

License

MIT