@samyok/annoy

v1.1.0

Published

5 days ago

Native Node.js bindings for Spotify's Annoy — fast approximate nearest neighbor search

annoy approximate-nearest-neighbors nearest-neighbor knn vector-search similarity-search embeddings native n-api node-addon

Native Node.js bindings for Spotify's Annoy (Approximate Nearest Neighbors Oh Yeah).

Annoy is a C++ library for searching for points in space that are close to a given query point. It creates read-only, file-based data structures that are memory-mapped, making it possible to share indexes across processes. It's used at Spotify for music recommendations.

Features

- Fast: native C++ addon via Node-API (N-API), no FFI overhead
- ABI-stable: built on N-API for compatibility across Node.js versions without recompilation
- Full API: all Annoy operations (build, save, load, query by item, query by vector, on-disk build)
- TypeScript: first-class type definitions with overloaded signatures
- Multiple metrics: Angular, Euclidean, Manhattan, and Dot Product distance
- Multi-threaded builds: leverages Annoy's parallel tree construction

Installation

npm install @samyok/annoy

Requires a C++17 compiler (Xcode Command Line Tools on macOS, build-essential on Linux, MSVC on Windows).

Quick Start

import { AnnoyIndex } from "@samyok/annoy";

// Create an index with 40 dimensions using angular distance
const index = new AnnoyIndex(40, "angular");

// Add some vectors
for (let i = 0; i < 1000; i++) {
  const vector = Array.from({ length: 40 }, () => Math.random());
  index.addItem(i, vector);
}

// Build 10 trees — more trees = better accuracy, more memory
index.build(10);

// Find 10 nearest neighbors of item 0
const neighbors = index.getNnsByItem(0, 10);
console.log(neighbors); // [0, 234, 581, ...]

// Save to disk
index.save("index.ann");

API Reference

`new AnnoyIndex(dimensions, metric?)`

Create a new index.

| Parameter | Type | Default | Description |
|---|---|---|---|
| dimensions | number | | Number of dimensions in the vectors |
| metric | Metric | "angular" | Distance metric to use |

Metrics:

| Metric | Description |
|---|---|
| "angular" | Angular distance (cosine distance) |
| "euclidean" | Euclidean distance (L2) |
| "manhattan" | Manhattan distance (L1) |
| "dot" | Dot product (higher = more similar) |
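For sanity-checking values returned by the index, the four metrics can be sketched in plain TypeScript. The angular formula follows upstream Annoy's definition, sqrt(2 * (1 - cos)). Whether `getDistance` in this binding returns exactly these values is an assumption, and `ReferenceMetric` and `referenceDistance` are hypothetical names that are not part of the package.

```typescript
// Local stand-in for the package's Metric type (hypothetical name).
type ReferenceMetric = "angular" | "euclidean" | "manhattan" | "dot";

function dotProduct(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

// Slow, non-native reference implementation; intended for tests only.
function referenceDistance(metric: ReferenceMetric, a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`dimension mismatch: ${a.length} vs ${b.length}`);
  }
  switch (metric) {
    case "angular": {
      // Euclidean distance between the two vectors after normalising them.
      const norms = Math.sqrt(dotProduct(a, a) * dotProduct(b, b));
      // Assumption: a zero vector is treated as orthogonal to everything.
      if (norms === 0) return Math.SQRT2;
      const cos = dotProduct(a, b) / norms;
      return Math.sqrt(Math.max(0, 2 * (1 - cos)));
    }
    case "euclidean":
      return Math.sqrt(a.reduce((s, x, i) => s + (x - b[i]) ** 2, 0));
    case "manhattan":
      return a.reduce((s, x, i) => s + Math.abs(x - b[i]), 0);
    case "dot":
      return dotProduct(a, b);
  }
}
```

Note that under this definition angular distance ranges from 0 (same direction) to 2 (opposite direction), rather than the 0 to 1 range of plain cosine distance.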

Building an Index

`addItem(index, vector)`

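Add a vector to the index under the given integer item ID. The vector must have exactly the number of dimensions passed to the constructor. Upstream Annoy allocates space for max(ID) + 1 items, so keep IDs dense and starting at 0.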

index.addItem(0, [1.0, 0.5, -0.3, /* ... */]);
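This README does not say how `addItem` reacts to a malformed ID or vector, so one option is to validate inputs before they reach the native layer. The sketch below is a hypothetical helper, not part of the package; it assumes IDs are non-negative integers and that every component must be a finite number.

```typescript
// Hypothetical pre-flight check to run before index.addItem(id, vector).
function assertValidItem(id: number, vector: readonly number[], dimensions: number): void {
  if (!Number.isInteger(id) || id < 0) {
    throw new RangeError(`item ID must be a non-negative integer, got ${id}`);
  }
  if (vector.length !== dimensions) {
    throw new RangeError(`expected ${dimensions} dimensions, got ${vector.length}`);
  }
  for (let i = 0; i < vector.length; i++) {
    if (!Number.isFinite(vector[i])) {
      throw new TypeError(`component ${i} is not a finite number`);
    }
  }
}
```

Calling `assertValidItem(i, vector, index.getF())` immediately before `index.addItem(i, vector)` turns a bad row in the input data into an ordinary JavaScript exception with a readable message.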

`build(numTrees, numThreads?)`

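Build the search trees. Must be called before querying. Once the index is built, call unbuild() before adding more items. When numThreads is omitted, the thread count is chosen automatically.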

index.build(10);      // 10 trees, auto threads
index.build(10, 4);   // 10 trees, 4 threads

`unbuild()`

Remove the trees so you can add more items. The items are preserved.
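The unbuild, add, rebuild cycle can be sketched as a small helper. `GrowableIndex` and `appendAndRebuild` are hypothetical names; the interface lists only methods documented in this README, and the real `AnnoyIndex` is assumed to satisfy it structurally. The helper also assumes existing IDs are dense and start at 0, so that `getNItems()` is the next free ID.

```typescript
// Only the methods this sketch needs, as documented in this README.
interface GrowableIndex {
  addItem(index: number, vector: number[]): void;
  build(numTrees: number, numThreads?: number): void;
  unbuild(): void;
  getNItems(): number;
}

// Appends vectors to an already-built index and rebuilds the trees.
// Returns the IDs assigned to the new vectors.
function appendAndRebuild(
  index: GrowableIndex,
  vectors: number[][],
  numTrees: number,
): number[] {
  index.unbuild(); // drop the trees; items are preserved
  const firstId = index.getNItems(); // assumes IDs 0..n-1 are in use
  const ids = vectors.map((vector, offset) => {
    const id = firstId + offset;
    index.addItem(id, vector);
    return id;
  });
  index.build(numTrees); // trees must be rebuilt before querying again
  return ids;
}
```

Rebuilding recomputes every tree, so batching new vectors into one call is likely cheaper than rebuilding after each item.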

`onDiskBuild(filename)`

Prepare the index to be built directly in the given file instead of in RAM, which is useful for indexes that exceed available memory. Call it before adding items.

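const index = new AnnoyIndex(40, "angular");
index.onDiskBuild("large_index.ann");
for (let i = 0; i < 10_000_000; i++) {
  // Placeholder data; substitute your own 40-dimensional vectors
  const vector = Array.from({ length: 40 }, () => Math.random());
  index.addItem(i, vector);
}
index.build(10);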
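An on-disk build only helps if the source vectors are not all held in JavaScript memory either. A minimal sketch of feeding the index from a lazy iterable follows; `AppendableIndex`, `addAllFrom`, and `randomVectors` are hypothetical names, and the random generator stands in for a real data source such as a file reader.

```typescript
// The single method this sketch needs from the index.
interface AppendableIndex {
  addItem(index: number, vector: number[]): void;
}

// Adds every vector from a (possibly lazy) iterable, assigning consecutive
// IDs starting at firstId. Returns the number of vectors added.
function addAllFrom(
  index: AppendableIndex,
  source: Iterable<number[]>,
  firstId: number = 0,
): number {
  let count = 0;
  for (const vector of source) {
    index.addItem(firstId + count, vector);
    count++;
  }
  return count;
}

// Placeholder source: yields one random vector at a time, so only a single
// vector is alive on the JavaScript side at any moment.
function* randomVectors(total: number, dimensions: number): Generator<number[]> {
  for (let i = 0; i < total; i++) {
    yield Array.from({ length: dimensions }, () => Math.random());
  }
}
```

With the real class this would read `index.onDiskBuild("large_index.ann")`, then `addAllFrom(index, randomVectors(10_000_000, 40))`, then `index.build(10)`.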

Querying

`getNnsByItem(item, n, searchK?, includeDistances?)`

Find the n nearest neighbors of a given item.

// Just indices
const neighbors = index.getNnsByItem(0, 10);
// [0, 42, 108, ...]

// With distances
const result = index.getNnsByItem(0, 10, -1, true);
// { neighbors: [0, 42, 108, ...], distances: [0, 0.234, 0.312, ...] }
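When `includeDistances` is true, the result holds two parallel arrays. If one record per neighbor is more convenient, they can be zipped with a small helper. `toNeighborList` and both interface names are hypothetical; the input shape is the `{ neighbors, distances }` object shown above.

```typescript
// Shape of the result when includeDistances is true, per this README.
interface NeighborsWithDistances {
  neighbors: number[];
  distances: number[];
}

interface Neighbor {
  id: number;
  distance: number;
}

// Pairs each neighbor ID with its distance, preserving result order.
function toNeighborList(result: NeighborsWithDistances): Neighbor[] {
  if (result.neighbors.length !== result.distances.length) {
    throw new Error("neighbors and distances have different lengths");
  }
  return result.neighbors.map((id, i) => ({ id, distance: result.distances[i] }));
}
```

For example, `toNeighborList(index.getNnsByItem(0, 10, -1, true))` would yield an array that can be filtered by a distance threshold in one pass.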

`getNnsByVector(vector, n, searchK?, includeDistances?)`

Find the n nearest neighbors of an arbitrary vector.

const query = [1.0, 0.5, -0.3, /* ... */];
const neighbors = index.getNnsByVector(query, 10);

| Parameter | Type | Default | Description |
|---|---|---|---|
| n | number | | Number of neighbors to return |
| searchK | number | -1 | Number of nodes to search. -1 uses the default (n * numTrees). Higher values give better accuracy at the cost of speed. |
| includeDistances | boolean | false | When true, returns { neighbors, distances } instead of just indices |
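To make the `searchK` default concrete, the rule in the table can be written out as arithmetic. `effectiveSearchK` is a hypothetical helper for reasoning about budgets; it only restates the documented rule and does not call into the library.

```typescript
// Restates the documented default: -1 means n * numTrees nodes are searched.
function effectiveSearchK(n: number, numTrees: number, searchK: number = -1): number {
  return searchK === -1 ? n * numTrees : searchK;
}

// 10 neighbors from a 50-tree index: the default budget is 500 nodes.
const defaultBudget = effectiveSearchK(10, 50);
// An explicit value overrides the default regardless of n and numTrees.
const explicitBudget = effectiveSearchK(10, 50, 5000);
```

Because the default scales with `n`, asking for fewer neighbors also shrinks the search budget, which can lower accuracy unless `searchK` is set explicitly.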

Persistence

`save(filename, prefault?)`

Save the index to disk.

index.save("my_index.ann");

`load(filename, prefault?)`

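Load a previously saved index. The file is memory-mapped, so loading is near-instant and multiple processes can share the same file. In upstream Annoy, setting prefault to true pre-reads the whole file into memory instead of paging it in on demand; this binding presumably passes the flag through.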

const index = new AnnoyIndex(40, "angular");
index.load("my_index.ann");

Utility Methods

| Method | Returns | Description |
|---|---|---|
| getItem(i) | number[] | Get the vector for item i |
| getDistance(i, j) | number | Get distance between items i and j |
| getNItems() | number | Number of items in the index |
| getNTrees() | number | Number of trees in the index |
| getF() | number | Number of dimensions |
| setSeed(seed) | void | Set random seed for reproducible builds |
| verbose(v) | void | Enable/disable verbose logging |
| unload() | void | Free the index from memory |

Choosing Parameters

numTrees — controls build time and index size vs. query accuracy. More trees means higher accuracy but uses more memory and takes longer to build. A good starting point is 10.

searchK — controls query time vs. accuracy at runtime. Default is n * numTrees. Increase for better accuracy, decrease for faster queries.

| Use Case | Trees | searchK |
|---|---|---|
| Quick prototype | 10 | default |
| Production (balanced) | 50-100 | 5000-10000 |
| Maximum accuracy | 100+ | 50000+ |
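These values are starting points; the usual way to tune them is to measure recall against exact results on a sample of queries. The sketch below computes exact neighbors by brute force and compares them with an approximate result. `exactNeighbors` and `recallAtK` are hypothetical helpers, and the brute-force ranking uses squared Euclidean distance, so it matches a "euclidean" index (for "angular", normalise the vectors first).

```typescript
// Exact k nearest neighbors by brute force (squared Euclidean distance).
// Ties are broken by lower ID so the result is deterministic.
function exactNeighbors(vectors: number[][], query: number[], k: number): number[] {
  const squaredDistance = (v: number[]): number =>
    v.reduce((sum, x, i) => sum + (x - query[i]) ** 2, 0);
  return vectors
    .map((v, id) => ({ id, d: squaredDistance(v) }))
    .sort((a, b) => a.d - b.d || a.id - b.id)
    .slice(0, k)
    .map((entry) => entry.id);
}

// Fraction of the exact neighbors that the approximate result recovered.
function recallAtK(approximate: number[], exact: number[]): number {
  if (exact.length === 0) return 1;
  const truth = new Set(exact);
  const hits = approximate.filter((id) => truth.has(id)).length;
  return hits / exact.length;
}
```

In practice, `approximate` would come from `index.getNnsByVector(query, k, searchK)`; averaging `recallAtK` over a few hundred queries for each candidate `numTrees` and `searchK` shows where extra trees or a larger search budget stop paying off.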

Interop with Python

Indexes created with the Python annoy package can be loaded directly:

# Python
from annoy import AnnoyIndex
t = AnnoyIndex(40, 'angular')
# ... add items, build ...
t.save('index.ann')

// Node.js
import { AnnoyIndex } from "@samyok/annoy";
const index = new AnnoyIndex(40, "angular");
index.load("index.ann");
const neighbors = index.getNnsByItem(0, 10);

Requirements

- Node.js >= 16
- C++17 compiler
  - macOS: Xcode Command Line Tools (xcode-select --install)
  - Linux: build-essential (apt install build-essential)
  - Windows: Visual Studio Build Tools with C++ workload

License

Apache-2.0 — same as Spotify's Annoy.
