@pixagram/pixahash
v0.1.0
Fast WASM-optimized non-cryptographic hash (fingerprint) tuned for 5-30 kB payloads. Flexible output: uint32, uint64, 128-bit, up to a 256-bit digest as raw bytes, hex, or base58btc. Sync + auto-init async API plus an isomorphic worker pool for data-paral
A fast, WebAssembly-optimized non-cryptographic hash, compiled from Rust. Built for fingerprinting medium payloads (≈5–30 kB) — content IDs, deduplication, hash tables, bloom filters, checksums.
- Flexible output: uint32, uint64, 128-bit, or a full 256-bit digest, as a number/BigInt, raw bytes, hex, or base58btc (any bit width 1–256).
- Async API: await ready() once, then call synchronously; or use the auto-initializing *Async helpers.
- Parallel: an isomorphic worker pool (browser Workers / Node worker_threads) for data-parallel hashing across many items.
- Tiny & portable: one ~34 kB .wasm, no native addons, runs in browsers, bundlers, and Node ≥18.
⚠️ Not cryptographic. PixaHash is a fingerprint. It is not collision-resistant against an adversary who can choose inputs, and must never be used for message authentication, password hashing, key derivation, or any security boundary. For those, use BLAKE3, SHA-3, or Ascon. PixaHash optimizes for speed and distribution quality on non-adversarial data.
Install
npm install @pixagram/pixahash

The package ships a prebuilt .wasm; no Rust toolchain is needed to consume it.
Quick start
import { ready, hash64, hashHex, hashBase58 } from '@pixagram/pixahash';
await ready(); // load the WASM module once
hash64('hello world'); // 64-bit BigInt
const bytes = new TextEncoder().encode('hello world'); // any Uint8Array works
hashHex(bytes, 0, 128); // 128-bit hex string
hashBase58(bytes, 0, 256); // 256-bit base58btc string

Inputs may be a string (hashed as UTF-8), Uint8Array, ArrayBuffer, any
TypedArray, or a DataView. The optional seed is a number or bigint
and spans the full 64 bits (default 0).
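The accepted input shapes can be pictured as a normalization step that runs before hashing. The following is a minimal sketch of such a step, not the package's actual code: toBytes and toSeed64 are hypothetical names, and rejecting unsafe numeric seeds is an assumption made here for illustration.

```javascript
// Hypothetical helpers for illustration; not part of @pixagram/pixahash.

// Normalize any accepted input to a Uint8Array view over its bytes.
function toBytes(data) {
  if (typeof data === 'string') return new TextEncoder().encode(data); // UTF-8
  if (data instanceof ArrayBuffer) return new Uint8Array(data);
  if (ArrayBuffer.isView(data)) {
    // Covers every TypedArray and DataView, respecting offset and length.
    return new Uint8Array(data.buffer, data.byteOffset, data.byteLength);
  }
  throw new TypeError('unsupported input type');
}

// Normalize a number or bigint seed to an unsigned 64-bit BigInt.
function toSeed64(seed = 0) {
  if (typeof seed === 'number' && !Number.isSafeInteger(seed)) {
    // ASSUMPTION: large seeds should be passed as bigint to avoid precision loss.
    throw new RangeError('numeric seeds must be safe integers; use a bigint');
  }
  return BigInt.asUintN(64, BigInt(seed));
}
```

A plain number can only represent integers exactly up to 2^53 − 1, which is why a bigint is needed to reach the upper part of the 64-bit seed range.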
Without the init dance
If you don't want to manage ready(), the *Async variants initialize on first
use and resolve on the calling thread:
import { hash64Async } from '@pixagram/pixahash';
const h = await hash64Async('hello world');

Streaming
For data you receive in chunks:
import { ready, createHasher } from '@pixagram/pixahash';
await ready();
const h = createHasher(/* seed */ 0);
h.update(chunkA).update(chunkB);
const id = h.digestBase58(128);
h.free(); // release WASM memory (or use `using`)

Streaming output is bit-for-bit identical to the one-shot functions.
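To see why a streaming hasher can match its one-shot form regardless of how the input is chunked, here is a toy sketch. It is not PixaHash: ToyHasher and its mixing steps are invented for illustration. The point is the buffering pattern, where partial words are held back until a full word is available, so chunk boundaries never influence the result.

```javascript
// Toy streaming hasher: NOT PixaHash, only a demo of chunk-split invariance.
const TOY_MASK = (1n << 64n) - 1n;
const TOY_PRIME = 0x9E3779B185EBCA87n; // odd 64-bit constant (xxHash64 PRIME64_1)

class ToyHasher {
  constructor(seed = 0n) {
    this.acc = BigInt.asUintN(64, seed);
    this.buf = [];   // bytes that do not yet fill an 8-byte word
    this.total = 0n; // total number of bytes seen so far
  }

  update(chunk) {
    for (const byte of chunk) {
      this.buf.push(byte);
      if (this.buf.length === 8) this.#absorb();
    }
    this.total += BigInt(chunk.length);
    return this; // allow chaining
  }

  // Consume one full little-endian 64-bit word from the buffer.
  #absorb() {
    let w = 0n;
    for (let i = 7; i >= 0; i--) w = (w << 8n) | BigInt(this.buf[i]);
    this.acc = ((this.acc ^ w) * TOY_PRIME) & TOY_MASK;
    this.buf.length = 0;
  }

  // Fold the leftover tail and the total length, then mix once more.
  digest() {
    let tail = 0n;
    for (let i = this.buf.length - 1; i >= 0; i--) tail = (tail << 8n) | BigInt(this.buf[i]);
    const h = ((this.acc ^ tail ^ this.total) * TOY_PRIME) & TOY_MASK;
    return h ^ (h >> 29n);
  }
}
```

Because the state after update depends only on the concatenated byte sequence, new ToyHasher().update(a).update(b).digest() equals the digest of a and b joined, for any split point.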
Parallel hashing
import { createPool } from '@pixagram/pixahash/pool';
const pool = await createPool(); // size = CPU count
const ids = await Promise.all(
files.map((bytes) => pool.hashHex(bytes, 0, 256))
);
pool.destroy();

Input is copied to the worker by default, so your buffers stay intact. For
large inputs you can move them instead with createPool({ transfer: true }),
which is faster but detaches the caller's ArrayBuffer (its byteLength becomes 0).
hashMany(items, { op }) runs a batch on a single worker when you'd rather avoid
one message per item.
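The difference between copying and moving a buffer can be observed without a worker. The sketch below uses the global structuredClone (available in Node ≥17 and modern browsers) as a stand-in, since it follows the same structured-clone and transfer semantics as postMessage; it does not show the pool's internal messaging.

```javascript
// Copy vs. move, demonstrated with structuredClone as a stand-in for postMessage.
const original = new Uint8Array([1, 2, 3, 4]).buffer;

// Copy: the clone gets its own memory and the original stays usable.
const copied = structuredClone(original);

// Move: ownership is transferred and the original is detached.
const moved = structuredClone(original, { transfer: [original] });

console.log(copied.byteLength);   // 4
console.log(moved.byteLength);    // 4
console.log(original.byteLength); // 0, the caller can no longer read it
```

If the caller still needs the bytes after hashing, the default copy mode is the safe choice.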
When does a pool actually help? A single 5–30 kB hash takes microseconds, so the worker round-trip costs more than the hash. Reach for a pool to (a) keep the UI thread responsive during a burst, or (b) parallelize across many items. For a one-off hash, call the sync/async API directly.
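One way to amortize the round-trip cost is to group many items into a few batches, one per worker. The helper below is a hypothetical sketch of that grouping only; splitIntoBatches is not part of the package.

```javascript
// Hypothetical helper: split items into at most workerCount contiguous,
// near-equal batches so each worker receives a single message.
function splitIntoBatches(items, workerCount) {
  if (items.length === 0) return [];
  const n = Math.max(1, Math.min(workerCount, items.length));
  const base = Math.floor(items.length / n);
  let extra = items.length % n; // the first `extra` batches get one more item
  const batches = [];
  let start = 0;
  for (let i = 0; i < n; i++) {
    const size = base + (extra > 0 ? 1 : 0);
    if (extra > 0) extra--;
    batches.push(items.slice(start, start + size));
    start += size;
  }
  return batches;
}

console.log(splitIntoBatches([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 4).map((b) => b.length));
// [ 3, 3, 2, 2 ]
```

Each batch could then be passed to hashMany; the value of op is left out here because this README does not spell it out.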
API
| Function | Returns |
| --- | --- |
| ready() / isReady() | Promise<void> / boolean |
| hash32(data, seed?) | number (u32) |
| hash64(data, seed?) | bigint (u64) |
| hash128(data, seed?) | bigint (u128) |
| hashHex(data, seed?, bits=128) | hex string |
| hashBase58(data, seed?, bits=128) | base58btc string |
| digest(data, seed?) | Uint8Array (32 bytes) |
| *Async(...) | the same, auto-initialized, as a Promise |
| createHasher(seed?) → Hasher | streaming: update, digest32/64/128/Hex/Base58/Bytes, free |
| createPool(opts?) → HashPool | parallel: same hash methods returning Promises |
bits is clamped to 1–256. Bits beyond 128 come from a finalizer expansion and
add output width, not extra collision resistance (see below).
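As a rough picture of how a variable-width output can be derived from a 32-byte digest, consider the sketch below. All function names are hypothetical, and two details are assumptions because this README does not specify them: that the leading bits of the digest are kept, and that the unused low bits of the last byte are zeroed. The base58btc encoder uses the standard Bitcoin alphabet.

```javascript
const B58_ALPHABET = '123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz';

// Clamp the requested width to the 1..256 range.
function clampBits(bits) {
  return Math.min(256, Math.max(1, Math.trunc(bits)));
}

// ASSUMPTION: keep the leading bits, zero the unused low bits of the last byte.
function truncateDigest(digest, bits) {
  const b = clampBits(bits);
  const out = digest.slice(0, Math.ceil(b / 8)); // copy, leaving the input intact
  const spare = out.length * 8 - b;              // unused bits in the last byte
  if (spare > 0) out[out.length - 1] &= 0xff << spare;
  return out;
}

// Lowercase hex, two characters per byte.
function toHex(bytes) {
  return Array.from(bytes, (b) => b.toString(16).padStart(2, '0')).join('');
}

// base58btc: treat the bytes as one big-endian integer, emit base-58 digits,
// and encode each leading zero byte as a leading '1'.
function toBase58btc(bytes) {
  let n = 0n;
  for (const b of bytes) n = (n << 8n) | BigInt(b);
  let s = '';
  while (n > 0n) {
    s = B58_ALPHABET[Number(n % 58n)] + s;
    n /= 58n;
  }
  for (const b of bytes) {
    if (b !== 0) break;
    s = '1' + s;
  }
  return s;
}

console.log(toHex(truncateDigest(new Uint8Array(32).fill(0xff), 12))); // fff0
```

Truncating to fewer bits shortens the identifier but raises the collision probability accordingly.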
Design
Both rapidhash and
museair are excellent modern hashes, and
both lean on a 64×64→128-bit multiply (umulh). On native CPUs that's a single
instruction — but WebAssembly has no high-multiply opcode, so each one lowers
to four 32×32 multiplies plus carry handling. PixaHash is designed around what
WASM does do in one instruction: 64-bit multiply-low, rotate, xor, and add. That
puts it in the lineage of xxHash64 rather than the umulh-based designs.
The core borrows the best ideas from each:
- xxHash64-style bulk loop: 4 lanes, 32-byte stripes, round(acc, w) = rotl(acc + w·P2, 31)·P1, using the xxHash primes. Multiply-low, rotate, add only.
- museair-style ring coupling: after each stripe the lanes are mixed in a ring (v[i] ^= rotl(v[i+1], 17)) so every input bit reaches the whole state, not just one lane.
- rapidhash-style tail + length injection: short tails fold round-robin into the lanes and the total length is mixed into all of them, killing length-extension-style collisions on similar inputs.
- All multipliers are odd constants, so each multiply is a bijection on u64 and can never collapse state to zero; museair's "blinding multiplication" failure mode is avoided structurally, without needing its additive workaround.
- A Moremur finalizer avalanches the result; for ≥32-byte inputs (the whole target range) both 64-bit halves carry full entropy, giving a genuine ~128-bit fingerprint. The 256-bit digest extends that width with two more Moremur rounds.
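The building blocks listed above can be sketched in a few lines. This is an illustration of the formulas as written, not the library's source: BigInt stands in for u64 arithmetic, the in-place, index-order update in ringMix is an assumption, and inverseOdd is a hypothetical helper that demonstrates why an odd multiplier is a bijection.

```javascript
const MASK64 = (1n << 64n) - 1n;
const P1 = 0x9E3779B185EBCA87n; // xxHash64 PRIME64_1
const P2 = 0xC2B2AE3D27D4EB4Fn; // xxHash64 PRIME64_2

// 64-bit left rotation, for 0 < r < 64.
const rotl = (x, r) => ((x << BigInt(r)) | (x >> BigInt(64 - r))) & MASK64;

// round(acc, w) = rotl(acc + w·P2, 31)·P1, all arithmetic modulo 2^64.
function round(acc, w) {
  return (rotl((acc + w * P2) & MASK64, 31) * P1) & MASK64;
}

// Ring coupling over 4 lanes: v[i] ^= rotl(v[i+1], 17), wrapping at the end.
// ASSUMPTION: lanes are updated in place, in index order.
function ringMix(v) {
  for (let i = 0; i < 4; i++) v[i] ^= rotl(v[(i + 1) % 4], 17);
  return v;
}

// Multiplicative inverse of an odd constant modulo 2^64 (Newton iteration).
// Its existence is what makes x -> x * a a bijection on u64.
function inverseOdd(a) {
  let inv = a; // a * a ≡ 1 (mod 8), so this is already correct to 3 bits
  for (let i = 0; i < 5; i++) inv = BigInt.asUintN(64, inv * (2n - a * inv));
  return inv; // precision doubles each step: 3, 6, 12, 24, 48, 96 bits
}

const x = 0x0123456789ABCDEFn;
console.log((((x * P1) & MASK64) * inverseOdd(P1) & MASK64) === x); // true
```

Since the multiply can be undone, two distinct states can never be mapped to the same value by it, which is the structural property the fourth bullet relies on.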
Quality (measured)
From the native test harness (cargo run --release --bin quality):
- Avalanche on a 10 kB input: worst single-bit bias 1.3%, average 0.4% (well within the 5σ noise band) — flipping any input bit flips ≈half the output bits.
- Collisions: 0 across 5M random 64-bit hashes; 32-bit collisions on 2M structured keys land within ~2% of the ideal birthday-bound rate.
- Streaming == one-shot across thousands of random chunk splits.
- Seed sensitivity: changing the seed flips ≈half the bits.
- Throughput: ≈9.3 GB/s on 20 kB inputs natively (algorithmic speed).
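For context on the collision figures, the ideal birthday-bound rate can be estimated with the usual pair-count approximation. The helper below is a small illustrative calculation (expectedCollisions is a hypothetical name) and is only accurate while the number of keys is far below 2^bits.

```javascript
// Expected number of colliding pairs among n uniformly random hashes of the
// given width: C(n, 2) / 2^bits.
function expectedCollisions(n, bits) {
  return (n * (n - 1)) / 2 / 2 ** bits;
}

console.log(expectedCollisions(2e6, 32).toFixed(1));       // 465.7
console.log(expectedCollisions(5e6, 64).toExponential(1)); // 6.8e-7
```

So an ideal 32-bit hash would show roughly 466 collisions on 2M keys, and an ideal 64-bit hash would almost never collide on 5M keys, which is the baseline the measurements above are compared against.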
The .wasm bundled here is built without wasm-opt (binaryen wasn't available in the build sandbox). It is correct and passes the full native↔WASM cross-check, but for production you should rebuild with wasm-opt enabled (see below) for a smaller, faster module.
Build from source
Requires the Rust toolchain, the wasm32-unknown-unknown
target, and wasm-pack:
rustup target add wasm32-unknown-unknown
cargo install wasm-pack
./build.sh # → pkg/pixahash.js + pkg/pixahash_bg.wasm

To enable size/speed optimization, install binaryen and remove the
wasm-opt = false line under [package.metadata.wasm-pack.profile.release] in
Cargo.toml, then rebuild.
Run the native correctness/quality suite and the JS cross-check:
cargo run --release --no-default-features --bin quality
node test-node.mjs
node test-wrapper.mjs

Threading roadmap
The worker pool is data parallelism: many independent items across many threads,
and it needs no special HTTP headers. Parallelizing a single hash (splitting one
large input across threads via wasm-bindgen-rayon + SharedArrayBuffer) is on the
roadmap; that path requires cross-origin isolation (Cross-Origin-Opener-Policy:
same-origin and Cross-Origin-Embedder-Policy: require-corp). A simd128 build of
the core (identical output, gated behind the simd feature) is also planned.
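For reference, the two headers named above could be applied by a server along these lines. This is a generic sketch, not something the package provides: applyIsolationHeaders and canUseSharedMemory are hypothetical names, and the response object only needs a setHeader(name, value) method, as Node's http.ServerResponse has.

```javascript
// Headers a page must be served with to become cross-origin isolated.
const ISOLATION_HEADERS = {
  'Cross-Origin-Opener-Policy': 'same-origin',
  'Cross-Origin-Embedder-Policy': 'require-corp',
};

// Apply the headers to any response-like object exposing setHeader().
function applyIsolationHeaders(res) {
  for (const [name, value] of Object.entries(ISOLATION_HEADERS)) {
    res.setHeader(name, value);
  }
  return res;
}

// Browser-side check: the crossOriginIsolated global reports whether
// isolation is in effect. It is absent outside browsers, hence the typeof guard.
function canUseSharedMemory() {
  return typeof crossOriginIsolated !== 'undefined' && crossOriginIsolated === true;
}
```

The current worker pool needs none of this, since it passes messages rather than sharing memory.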
License
MIT © Pixagram SA
