@samyok/annoy
v1.1.0
Native Node.js bindings for Spotify's Annoy — fast approximate nearest neighbor search
Native Node.js bindings for Spotify's Annoy (Approximate Nearest Neighbors Oh Yeah).
Annoy is a C++ library for searching for points in space that are close to a given query point. It creates read-only, file-based data structures that are memory-mapped, making it possible to share indexes across processes. It's used at Spotify for music recommendations.
Features
- Fast — native C++ addon via Node-API (N-API), no FFI overhead
- ABI-stable — built on N-API for compatibility across Node.js versions without recompilation
- Full API — all Annoy operations: build, save, load, query by item, query by vector, on-disk build
- TypeScript — first-class type definitions with overloaded signatures
- Multiple metrics — Angular, Euclidean, Manhattan, and Dot Product distance
- Multi-threaded builds — leverages Annoy's parallel tree construction
Installation
npm install @samyok/annoy
Requires a C++ compiler (Xcode Command Line Tools on macOS, build-essential on Linux, MSVC on Windows).
Quick Start
import { AnnoyIndex } from "@samyok/annoy";
// Create an index with 40 dimensions using angular distance
const index = new AnnoyIndex(40, "angular");
// Add some vectors
for (let i = 0; i < 1000; i++) {
  const vector = Array.from({ length: 40 }, () => Math.random());
  index.addItem(i, vector);
}
// Build 10 trees — more trees = better accuracy, more memory
index.build(10);
// Find 10 nearest neighbors of item 0
const neighbors = index.getNnsByItem(0, 10);
console.log(neighbors); // [0, 234, 581, ...]
// Save to disk
index.save("index.ann");
API Reference
new AnnoyIndex(dimensions, metric?)
Create a new index.
| Parameter | Type | Default | Description |
|---|---|---|---|
| dimensions | number | | Number of dimensions in the vectors |
| metric | Metric | "angular" | Distance metric to use |
Metrics:
| Metric | Description |
|---|---|
| "angular" | Angular distance, which upstream Annoy defines as sqrt(2(1 - cos(u, v))), i.e. the Euclidean distance between the normalized vectors |
| "euclidean" | Euclidean distance (L2) |
| "manhattan" | Manhattan distance (L1) |
| "dot" | Dot product (higher = more similar) |
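To make the metrics concrete, here is a minimal sketch that computes each one in plain TypeScript. It assumes the binding follows upstream Annoy's definitions (in particular, angular distance as sqrt(2(1 - cos(u, v)))). The function names are illustrative and not part of this package, and zero-length vectors are not handled.

```typescript
// Reference formulas for the four metrics, written for clarity rather than speed.
// Assumption: the binding matches upstream Annoy's definitions.

function dotProduct(u: number[], v: number[]): number {
  let sum = 0;
  for (let i = 0; i < u.length; i++) sum += u[i] * v[i];
  return sum;
}

function euclidean(u: number[], v: number[]): number {
  let sum = 0;
  for (let i = 0; i < u.length; i++) sum += (u[i] - v[i]) ** 2;
  return Math.sqrt(sum);
}

function manhattan(u: number[], v: number[]): number {
  let sum = 0;
  for (let i = 0; i < u.length; i++) sum += Math.abs(u[i] - v[i]);
  return sum;
}

function angular(u: number[], v: number[]): number {
  const cos = dotProduct(u, v) / Math.sqrt(dotProduct(u, u) * dotProduct(v, v));
  // Clamp at 0 so floating-point rounding cannot produce sqrt of a negative number.
  return Math.sqrt(Math.max(0, 2 * (1 - cos)));
}
```

Under this definition angular distance ranges from 0 (same direction) to 2 (opposite directions), and it ignores vector magnitude. For "dot", a larger value means a closer match, so results should be read in the opposite sense from the other three.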
Building an Index
addItem(index, vector)
Add a vector to the index under the given item ID, a non-negative integer.
index.addItem(0, [1.0, 0.5, -0.3, /* ... */]);
build(numTrees, numThreads?)
Build the search trees. Must be called before querying.
index.build(10); // 10 trees, auto threads
index.build(10, 4); // 10 trees, 4 threads
unbuild()
Remove the trees so you can add more items. The items are preserved.
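A typical use of unbuild() is appending items to an index that has already been built. The sketch below shows one way to do that. `BuildableIndex` and `appendAndRebuild` are hypothetical names, the interface lists only the methods documented in this README, and the code assumes item IDs are dense (0 through getNItems() - 1).

```typescript
// Minimal structural type covering only the methods used below.
// Signatures follow this README; the interface name is illustrative.
interface BuildableIndex {
  addItem(id: number, vector: number[]): void;
  build(numTrees: number, numThreads?: number): void;
  unbuild(): void;
  getNItems(): number;
}

// Drop the trees, append new vectors after the existing items, then rebuild.
// Returns the ID assigned to the first appended vector.
function appendAndRebuild(
  index: BuildableIndex,
  vectors: number[][],
  numTrees: number
): number {
  index.unbuild();
  const firstId = index.getNItems(); // assumes IDs are dense
  vectors.forEach((vector, offset) => index.addItem(firstId + offset, vector));
  index.build(numTrees);
  return firstId;
}
```

Rebuilding constructs every tree again, so batching several additions into one call is cheaper than rebuilding after each item. In upstream Annoy, an index that was loaded from a file cannot be unbuilt; if the binding behaves the same way, this pattern applies only to indexes still held in memory.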
onDiskBuild(filename)
Prepare the index to be built directly in the given file instead of in RAM, which is useful for indexes that exceed available memory. Call it before adding any items.
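index.onDiskBuild("large_index.ann");
for (let i = 0; i < 10_000_000; i++) {
  // Placeholder data; substitute your own vectors of the index's dimensionality
  const vector = Array.from({ length: 40 }, () => Math.random());
  index.addItem(i, vector);
}
index.build(10);
Querying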
getNnsByItem(item, n, searchK?, includeDistances?)
Find the n nearest neighbors of a given item.
// Just indices
const neighbors = index.getNnsByItem(0, 10);
// [0, 42, 108, ...]
// With distances
const result = index.getNnsByItem(0, 10, -1, true);
// { neighbors: [0, 42, 108, ...], distances: [0, 0.234, 0.312, ...] }
getNnsByVector(vector, n, searchK?, includeDistances?)
Find the n nearest neighbors of an arbitrary vector.
const query = [1.0, 0.5, -0.3, /* ... */];
const neighbors = index.getNnsByVector(query, 10);
| Parameter | Type | Default | Description |
|---|---|---|---|
| n | number | | Number of neighbors to return |
| searchK | number | -1 | Number of nodes to search. -1 uses the default (n * numTrees). Higher values give better accuracy at the cost of speed. |
| includeDistances | boolean | false | When true, returns { neighbors, distances } instead of just indices |
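When includeDistances is true, the result holds two parallel arrays. The sketch below pairs them into one list of hits and optionally drops the query item, which getNnsByItem returns as its own nearest neighbor. `NeighborsWithDistances`, `Hit`, and `toHits` are hypothetical names; only the `{ neighbors, distances }` shape comes from this README.

```typescript
// Result shape documented above for includeDistances = true.
interface NeighborsWithDistances {
  neighbors: number[];
  distances: number[];
}

interface Hit {
  id: number;
  distance: number;
}

// Zip the parallel arrays into { id, distance } pairs.
// If excludeId is given, the hit with that ID is removed from the output.
function toHits(result: NeighborsWithDistances, excludeId?: number): Hit[] {
  return result.neighbors
    .map((id, i) => ({ id, distance: result.distances[i] }))
    .filter((hit) => hit.id !== excludeId);
}
```

For example, `toHits(index.getNnsByItem(0, 11, -1, true), 0)` would request 11 neighbors and keep the 10 that are not item 0 itself.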
Persistence
save(filename, prefault?)
Save the index to disk.
index.save("my_index.ann");
load(filename, prefault?)
Load a previously saved index. The file is memory-mapped, so loading is near-instant and multiple processes can share the same file. Construct the AnnoyIndex with the same dimensions and metric that were used to build the file.
const index = new AnnoyIndex(40, "angular");
index.load("my_index.ann");
Utility Methods
| Method | Returns | Description |
|---|---|---|
| getItem(i) | number[] | Get the vector for item i |
| getDistance(i, j) | number | Get distance between items i and j |
| getNItems() | number | Number of items in the index |
| getNTrees() | number | Number of trees in the index |
| getF() | number | Number of dimensions |
| setSeed(seed) | void | Set random seed for reproducible builds |
| verbose(v) | void | Enable/disable verbose logging |
| unload() | void | Free the index from memory |
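As an illustration of how these methods combine, the sketch below reads every stored vector back out of an index and checks each against getF(). `ReadableIndex` and `exportVectors` are hypothetical names, and the loop assumes item IDs are dense (0 through getNItems() - 1).

```typescript
// Only the utility methods used below; signatures follow the table above.
interface ReadableIndex {
  getNItems(): number;
  getItem(i: number): number[];
  getF(): number;
}

// Collect all stored vectors, verifying that each has the index's dimensionality.
function exportVectors(index: ReadableIndex): number[][] {
  const dims = index.getF();
  const vectors: number[][] = [];
  for (let i = 0; i < index.getNItems(); i++) {
    const vector = index.getItem(i);
    if (vector.length !== dims) {
      throw new Error(`item ${i} has ${vector.length} dimensions, expected ${dims}`);
    }
    vectors.push(vector);
  }
  return vectors;
}
```

This copies every vector into the JavaScript heap, so it suits small indexes or debugging rather than indexes sized to exceed RAM.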
Choosing Parameters
numTrees — controls build time and index size vs. query accuracy. More trees means higher accuracy but uses more memory and takes longer to build. A good starting point is 10.
searchK — controls query time vs. accuracy at runtime. Default is n * numTrees. Increase for better accuracy, decrease for faster queries.
| Use Case | Trees | searchK |
|---|---|---|
| Quick prototype | 10 | default |
| Production (balanced) | 50-100 | 5000-10000 |
| Maximum accuracy | 100+ | 50000+ |
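One way to choose between these settings is to measure recall against exact results on a sample of your own data. The sketch below is a minimal, library-independent version of that check. All three function names are hypothetical, and the brute-force search uses Euclidean distance, so swap in the metric your index uses.

```typescript
// Search budget described above: n * numTrees when searchK is left at -1.
function effectiveSearchK(n: number, numTrees: number, searchK: number = -1): number {
  return searchK === -1 ? n * numTrees : searchK;
}

// Fraction of the exact neighbors that the approximate result recovered.
function recallAtK(approx: number[], exact: number[]): number {
  if (exact.length === 0) return 1;
  const found = new Set(approx);
  const matched = exact.filter((id) => found.has(id)).length;
  return matched / exact.length;
}

// Exact k nearest neighbors by brute force (Euclidean), used as ground truth.
// The ID of each vector is its position in the `vectors` array.
function exactNeighbors(vectors: number[][], query: number[], k: number): number[] {
  return vectors
    .map((v, id) => ({ id, d: Math.hypot(...v.map((x, i) => x - query[i])) }))
    .sort((a, b) => a.d - b.d)
    .slice(0, k)
    .map((hit) => hit.id);
}
```

To use it, pass the output of `index.getNnsByVector(query, k, searchK)` as `approx` and the output of `exactNeighbors` as `exact` for a few hundred sample queries, then raise numTrees or searchK until the average recall meets your target.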
Interop with Python
Indexes created with the Python annoy package can be loaded directly:
# Python
from annoy import AnnoyIndex
t = AnnoyIndex(40, 'angular')
# ... add items, build ...
t.save('index.ann')
// Node.js
import { AnnoyIndex } from "@samyok/annoy";
const index = new AnnoyIndex(40, "angular");
index.load("index.ann");
const neighbors = index.getNnsByItem(0, 10);
Requirements
- Node.js >= 16
- C++17 compiler
  - macOS: Xcode Command Line Tools (xcode-select --install)
  - Linux: build-essential (apt install build-essential)
  - Windows: Visual Studio Build Tools with C++ workload
License
Apache-2.0 — same as Spotify's Annoy.
