
elid v0.4.0

Embedding Locality IDentifier - encode embeddings into sortable string IDs for vector search without vector stores, plus fast string similarity algorithms

Downloads: 449

ELID - Embedding Locality IDentifier


ELID enables vector search without a vector store by encoding high-dimensional embeddings into sortable string IDs that preserve locality. Similar vectors produce similar IDs, allowing you to use standard database indexes for similarity search.

ELID also includes a complete suite of fast string similarity algorithms.

Features

Embedding Encoding (Vector Search Without Vector Stores)

Convert embeddings from any ML model into compact, sortable identifiers:

| Profile | Output | Best For |
|---------|--------|----------|
| Mini128 | 26-char base32hex | Fast similarity via Hamming distance |
| Morton10x10 | 20-char base32hex | Database range queries (Z-order) |
| Hilbert10x10 | 20-char base32hex | Maximum locality preservation |

Key benefits:

  • Similar vectors produce similar IDs (locality preservation)
  • IDs are lexicographically sortable for database indexing
  • No vector store required - use any database with string indexes
  • Deterministic: same embedding always produces the same ID
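The locality-preservation idea can be sketched in a few lines: hash each vector against a set of deterministic pseudo-random hyperplanes and record the sign bits, then format the bits as a fixed-width string so codes sort lexicographically. This is an illustrative, language-agnostic sketch of the sign-hash technique behind profiles like Mini128, not elid's actual implementation (elid uses 128 bits and base32hex output; `sign_sketch` and `hamming_hex` here are hypothetical names):

```python
import hashlib
import struct

def sign_sketch(vec, bits=32, seed=0):
    """Toy locality-preserving encoder: random-hyperplane sign bits -> sortable hex string."""
    code = 0
    for i in range(bits):
        # Derive a pseudo-random hyperplane coefficient from (seed, i, j)
        # so the encoding is fully deterministic.
        dot = 0.0
        for j, x in enumerate(vec):
            h = hashlib.blake2b(struct.pack("<QQQ", seed, i, j), digest_size=8).digest()
            r = struct.unpack("<q", h)[0] / 2**63  # roughly uniform in [-1, 1)
            dot += r * x
        code = (code << 1) | (1 if dot >= 0 else 0)
    return format(code, f"0{bits // 4}x")  # fixed width -> lexicographically sortable

def hamming_hex(x, y):
    """Number of differing bits between two equal-width hex codes."""
    return bin(int(x, 16) ^ int(y, 16)).count("1")

a = sign_sketch([1.0, 0.0, 0.2])
b = sign_sketch([0.9, 0.1, 0.2])    # similar vector -> few bits flip
c = sign_sketch([-1.0, 0.5, -0.3])  # dissimilar vector -> many bits flip

assert hamming_hex(a, b) < hamming_hex(a, c)
```

Because nearby vectors agree on most hyperplane signs, their codes differ in few bits, which is exactly the property the Hamming-distance comparison below exploits.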

String Similarity Algorithms

| Algorithm | Type | Best For |
|-----------|------|----------|
| Levenshtein | Edit distance | General-purpose comparison, spell checking |
| Normalized Levenshtein | Similarity (0-1) | When you need a percentage match |
| Jaro | Similarity (0-1) | Short strings |
| Jaro-Winkler | Similarity (0-1) | Names and record linkage |
| Hamming | Distance | Fixed-length strings, DNA, error codes |
| OSA | Edit distance | Typo detection (counts transpositions) |
| SimHash | LSH fingerprint | Database-queryable similarity, near-duplicate detection |
| Best Match | Composite (0-1) | When unsure which algorithm fits |

Installation

Rust

# String similarity only (zero dependencies)
[dependencies]
elid = "0.1"

# Embedding encoding
[dependencies]
elid = { version = "0.1", features = ["embeddings"] }

# Both features
[dependencies]
elid = { version = "0.1", features = ["strings", "embeddings"] }

Python

pip install elid

JavaScript (WASM)

npm install elid-wasm

C/C++

Build with cargo build --release --features ffi to get libelid.so and elid.h.

Quick Start

Embedding Encoding (Rust)

use elid::embeddings::{encode, Profile, Elid};

// Get an embedding from your ML model (e.g., OpenAI, Cohere, sentence-transformers)
let embedding: Vec<f32> = model.embed("Hello, world!")?;

// Encode to a sortable ELID
let profile = Profile::default(); // Mini128
let elid: Elid = encode(&embedding, &profile)?;

println!("ELID: {}", elid); // e.g., "01a3f5g7h9jklmnopqrstuv012" (26 chars, base32hex)

// Similar texts produce similar ELIDs
let elid2 = encode(&model.embed("Hello, universe!")?, &profile)?;

// Compare similarity via Hamming distance
use elid::embeddings::hamming_distance;
let distance = hamming_distance(&elid, &elid2)?; // Lower = more similar

Encoding Profiles

use elid::embeddings::Profile;

// Mini128: 128-bit SimHash (default)
// Best for: Fast similarity search via Hamming distance
let mini = Profile::Mini128 {
    seed: 0x454c4944_53494d48, // Deterministic seed
};

// Morton10x10: Z-order curve encoding
// Best for: Database range queries
let morton = Profile::Morton10x10 {
    dims: 10,
    bits_per_dim: 10,
    transform_id: None,
};

// Hilbert10x10: Hilbert curve encoding
// Best for: Maximum locality preservation
let hilbert = Profile::Hilbert10x10 {
    dims: 10,
    bits_per_dim: 10,
    transform_id: None,
};
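The Morton profile's Z-order idea is simple to sketch: quantize each dimension to a fixed number of bits, then interleave the bits across dimensions so that points close in space tend to land close together in the sorted order of codes. The sketch below is illustrative only (the `morton_encode`/`quantize` helpers are hypothetical; elid's exact quantization, transform, and base32hex layout may differ):

```python
def morton_encode(coords, bits_per_dim=10):
    """Toy Z-order encoder: interleave the bits of quantized integer coordinates."""
    dims = len(coords)
    code = 0
    for b in range(bits_per_dim - 1, -1, -1):  # most significant bit first
        for x in coords:
            code = (code << 1) | ((x >> b) & 1)
    return format(code, f"0{(dims * bits_per_dim + 3) // 4}x")  # fixed-width hex

def quantize(v, bits=10):
    """Map a float in [0, 1) to a bits-wide integer."""
    return min(int(v * (1 << bits)), (1 << bits) - 1)

p = morton_encode([quantize(x) for x in [0.50, 0.50]])
q = morton_encode([quantize(x) for x in [0.51, 0.50]])  # nearby point
r = morton_encode([quantize(x) for x in [0.95, 0.05]])  # distant point
# p and q tend to sort near each other, enabling database range queries.
```

Hilbert-curve encoding follows the same quantize-then-map pattern but with a more complex bit mapping that avoids Z-order's large jumps, which is why it preserves locality better at the cost of more computation.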

String Similarity (Rust)

use elid::*;

// Edit distance
let distance = levenshtein("kitten", "sitting"); // 3

// Normalized similarity (0.0 to 1.0)
let similarity = normalized_levenshtein("hello", "hallo"); // 0.8

// Name matching
let similarity = jaro_winkler("Martha", "Marhta"); // 0.961

// SimHash for database queries
let hash = simhash("iPhone 14");
let sim = simhash_similarity("iPhone 14", "iPhone 15"); // ~0.92

// Find best match in a list
let candidates = vec!["apple", "application", "apply"];
let (idx, score) = find_best_match("app", &candidates);

Python

import elid

# String similarity
elid.levenshtein("kitten", "sitting")  # 3
elid.jaro_winkler("martha", "marhta")  # 0.961
elid.simhash_similarity("iPhone 14", "iPhone 15")  # 0.922

# Embedding encoding (with embeddings feature)
embedding = model.embed("Hello, world!")
elid_str = elid.encode_embedding(embedding)

JavaScript

import init, { levenshtein, jaroWinkler, simhashSimilarity } from 'elid-wasm';

await init();
levenshtein("kitten", "sitting");  // 3
jaroWinkler("martha", "marhta");   // 0.961
simhashSimilarity("iPhone 14", "iPhone 15");  // 0.922

Configuration

Use SimilarityOpts for case-insensitive or whitespace-trimmed comparisons:

use elid::{levenshtein_with_opts, SimilarityOpts};

let opts = SimilarityOpts {
    case_sensitive: false,
    trim_whitespace: true,
    ..Default::default()
};
let distance = levenshtein_with_opts("  HELLO  ", "hello", &opts); // 0

Feature Flags

| Feature | Description | Dependencies |
|---------|-------------|--------------|
| strings | String similarity algorithms (default) | None |
| embeddings | Embedding encoding (default) | rand, blake3, etc. |
| models | Base ONNX model support | tract-onnx |
| models-text | Text embedding (Model2Vec, 256-dim) | models |
| models-image | Image embedding (MobileNetV3, 1024-dim) | models, image |
| wasm | WebAssembly bindings (includes embeddings) | wasm-bindgen, js-sys, getrandom |
| python | Python bindings via PyO3 (includes embeddings) | pyo3, numpy, rayon |
| ffi | C FFI bindings | None (enables unsafe) |

Performance

  • Zero external dependencies for string-only use
  • O(min(m,n)) space-optimized Levenshtein
  • 1.4M+ string comparisons per second (Python benchmarks)
  • ~96KB WASM binary (strings only)
  • Embedding encoding: <1ms per vector
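The O(min(m,n)) space claim comes from the standard two-row dynamic-programming formulation of Levenshtein: only the previous row of the edit-distance matrix is needed, so the shorter string can be kept along the row. A minimal self-contained sketch of that optimization (not the crate's Rust code):

```python
def levenshtein(a, b):
    """Two-row Levenshtein: O(len(a)*len(b)) time, O(min(len(a), len(b))) space."""
    if len(a) < len(b):
        a, b = b, a                       # keep the shorter string along the row
    prev = list(range(len(b) + 1))        # distances from "" to each prefix of b
    for i, ca in enumerate(a, 1):
        curr = [i]                        # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,              # deletion
                curr[j - 1] + 1,          # insertion
                prev[j - 1] + (ca != cb)  # substitution (free if chars match)
            ))
        prev = curr
    return prev[-1]

print(levenshtein("kitten", "sitting"))  # 3
```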

Built-in Embedding Models

ELID includes optional ONNX models for generating embeddings directly, without external API calls. Models are bundled via separate packages:

| Package | Model | Dimensions | Size | |---------|-------|------------|------| | elid-text | Model2Vec potion-base-8M | 256 | ~8MB | | elid-image | MobileNetV3-Small | 1024 | ~5MB |

Text embeddings:

use elid::models::embed_text;

let embedding = embed_text("Hello, world!")?;
assert_eq!(embedding.len(), 256);

Image embeddings:

use elid::models::embed_image;

let bytes = std::fs::read("photo.jpg")?;
let embedding = embed_image(&bytes)?;
assert_eq!(embedding.len(), 1024);

LSH Bands for Database Querying

Convert embeddings to LSH bands for efficient database similarity search:

JavaScript:

import { embeddingToBands } from 'elid-wasm';

// Split embedding into 4 bands (32 bits each)
const bands = embeddingToBands(embedding, 4);

// Store bands in database columns and query with OR across bands
// for approximate nearest neighbors:
// SELECT * FROM embeddings WHERE band0 = ? OR band1 = ? OR band2 = ? OR band3 = ?

Rust:

use elid::embeddings::embedding_to_bands;

let bands = embedding_to_bands(&embedding, 4, 0x454c4944_53494d48);
// bands: Vec<String> with 4 base32hex-encoded band strings
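The retrieval side of banding can be sketched with an in-memory index standing in for the database columns: two items become candidates if any band matches, mirroring the OR query above. This is a conceptual sketch with hypothetical helper names (`to_bands`, `insert`, `candidates`), not elid's API:

```python
from collections import defaultdict

def to_bands(hash128, n_bands=4):
    """Split a 128-bit hash into equal-width hex band strings (LSH banding)."""
    width = 128 // n_bands
    mask = (1 << width) - 1
    return [format((hash128 >> (i * width)) & mask, f"0{width // 4}x")
            for i in range(n_bands)]

index = defaultdict(set)  # (band position, band value) -> doc ids

def insert(doc_id, hash128):
    for i, band in enumerate(to_bands(hash128)):
        index[(i, band)].add(doc_id)

def candidates(hash128):
    """Any shared band makes a document a candidate (OR across bands)."""
    out = set()
    for i, band in enumerate(to_bands(hash128)):
        out |= index[(i, band)]
    return out

insert("a", 0x0123456789abcdef0123456789abcdef)
insert("b", 0x0123456789abcdefffffffffffffffff)  # shares the two high bands with "a"
print(sorted(candidates(0x0123456789abcdef0123456789abcdef)))  # ['a', 'b']
```

Banding trades precision for recall: more, narrower bands catch more near-duplicates but return more false candidates to re-rank with the full Hamming distance.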

Use Cases

Vector Search Without Vector Stores

Store ELIDs directly in PostgreSQL, SQLite, or any database:

-- Create index on ELID column
CREATE INDEX idx_documents_elid ON documents(elid);

-- Find similar documents using string prefix matching
SELECT * FROM documents
WHERE elid LIKE 'abc%'  -- Prefix match for locality
ORDER BY elid;

Deduplication

Use SimHash to find near-duplicate content:

let hash1 = simhash("The quick brown fox");
let hash2 = simhash("The quick brown dog");
let similarity = simhash_similarity_from_hashes(hash1, hash2);
if similarity > 0.9 {
    println!("Likely duplicates!");
}
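Why SimHash catches near-duplicates: each token votes on every bit of the fingerprint, so documents sharing most tokens end up with fingerprints differing in few bits. A self-contained word-level sketch (illustrative only; the crate's tokenization and hash function may differ, and `similarity` here is a hypothetical helper):

```python
import hashlib

def simhash(text, bits=64):
    """Word-level SimHash: each token's hash bits vote +1/-1; keep the sign."""
    counts = [0] * bits
    for token in text.lower().split():
        h = int.from_bytes(
            hashlib.blake2b(token.encode(), digest_size=bits // 8).digest(), "big")
        for i in range(bits):
            counts[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i, c in enumerate(counts) if c > 0)

def similarity(h1, h2, bits=64):
    """Fraction of agreeing bits between two fingerprints."""
    return 1.0 - bin(h1 ^ h2).count("1") / bits

h1 = simhash("The quick brown fox")
h2 = simhash("The quick brown dog")                # 3 of 4 tokens shared
h3 = simhash("Completely unrelated sentence here") # no tokens shared
assert similarity(h1, h2) > similarity(h1, h3)
```

Unlike cryptographic hashes, where one changed word scrambles the whole digest, SimHash degrades gracefully with the amount of change, which is what makes a threshold like 0.9 meaningful.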

Fuzzy Search

Find matches with typo tolerance:

let candidates = vec!["apple", "application", "apply", "banana"];
let matches = find_matches_above_threshold("aple", &candidates, 0.7);
// Returns: [("apple", 0.8), ...]

Building

git clone https://github.com/ZachHandley/ELID.git
cd ELID

cargo build --release
cargo test
cargo bench
cargo run --example basic_usage

License

Dual-licensed under MIT or Apache-2.0 at your option.