@pixagram/pixahash

v0.1.0

Published

a month ago



pixagram

hash hashing non-cryptographic fingerprint checksum wasm webassembly rust xxhash rapidhash museair base58 hex worker parallel pixagram

@pixagram/pixahash

A fast, WebAssembly-optimized non-cryptographic hash, compiled from Rust. Built for fingerprinting medium payloads (≈5–30 kB) — content IDs, deduplication, hash tables, bloom filters, checksums.

Flexible output — uint32, uint64, 128-bit, or a full 256-bit digest, as a number/BigInt, raw bytes, hex, or base58btc (any bit width 1–256).
Async API — await ready() once, then call synchronously; or use the auto-initializing *Async helpers.
Parallel — an isomorphic worker pool (browser Workers / Node worker_threads) for data-parallel hashing across many items.
Tiny & portable — one ~34 kB .wasm, no native addons, runs in browsers, bundlers, and Node ≥18.

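⚠️ Not cryptographic. PixaHash is a fingerprint. It is not collision-resistant against an adversary who can choose inputs, and must never be used for message authentication, password hashing, key derivation, or any security boundary. For those, use a primitive designed for the job: a cryptographic hash such as BLAKE3, SHA-3, or Ascon for integrity, a keyed MAC for authentication, and a dedicated password-hashing function such as Argon2 for passwords. PixaHash optimizes for speed and distribution quality on non-adversarial data.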

Install

npm install @pixagram/pixahash

The package ships a prebuilt .wasm; no Rust toolchain is needed to consume it.

Quick start

import { ready, hash64, hashHex, hashBase58 } from '@pixagram/pixahash';

await ready();                       // load the WASM module once

const bytes = new TextEncoder().encode('hello world');   // any Uint8Array works

hash64('hello world');               // 64-bit BigInt
hashHex(bytes, 0, 128);              // 128-bit hex string
hashBase58(bytes, 0, 256);           // 256-bit base58btc string

Inputs may be a string (hashed as UTF-8), Uint8Array, ArrayBuffer, any TypedArray, or a DataView. The optional seed is a number or bigint and spans the full 64 bits (default 0).
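To make the accepted input types concrete, here is a minimal sketch of how each one could be normalized to a byte view, and a seed to an unsigned 64-bit BigInt. The helpers toBytes and toSeed64 are hypothetical names, not part of the package, and the wrap-around for out-of-range seeds is an assumption rather than documented behavior.

```javascript
// Hypothetical helper (not part of @pixagram/pixahash): view any supported
// input as a Uint8Array without copying where possible.
function toBytes(data) {
  if (typeof data === 'string') return new TextEncoder().encode(data); // UTF-8
  if (data instanceof Uint8Array) return data;
  if (data instanceof ArrayBuffer) return new Uint8Array(data);
  if (ArrayBuffer.isView(data)) {
    // Other TypedArrays and DataView: respect the view's offset and length.
    return new Uint8Array(data.buffer, data.byteOffset, data.byteLength);
  }
  throw new TypeError('unsupported input type');
}

// Hypothetical helper: coerce a number or bigint seed to an unsigned 64-bit
// BigInt. Wrapping modulo 2^64 is an assumption made for this sketch.
function toSeed64(seed = 0) {
  return BigInt.asUintN(64, BigInt(seed));
}
```

A JavaScript number represents integers exactly only up to 2^53 − 1, so a seed that needs the full 64 bits has to be passed as a bigint.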

Without the init dance

If you don't want to manage ready(), the *Async variants initialize on first use and resolve on the calling thread:

import { hash64Async } from '@pixagram/pixahash';
const h = await hash64Async('hello world');
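"Initialize on first use" is usually done with a memoized promise. The sketch below shows that pattern in isolation; lazyInit, ensureReady, and makeAsync are hypothetical names, the loader is a stand-in, and nothing here claims to be how the package implements its *Async helpers.

```javascript
// Hypothetical helper: wrap an async loader so it runs at most once.
function lazyInit(load) {
  let pending;                          // shared promise, created on first call
  return () => (pending ??= load());
}

// Stand-in for a real WASM loader (an assumption for this sketch).
let loads = 0;
const ensureReady = lazyInit(async () => { loads += 1; });

// Turn any synchronous function into an auto-initializing async one.
function makeAsync(syncFn) {
  return async (...args) => {
    await ensureReady();                // no-op after the first completion
    return syncFn(...args);
  };
}
```

Concurrent first calls share the same promise, so the loader never runs twice.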

Streaming

For data you receive in chunks:

import { ready, createHasher } from '@pixagram/pixahash';
await ready();

const h = createHasher(/* seed */ 0);
h.update(chunkA).update(chunkB);
const id = h.digestBase58(128);
h.free();                            // release WASM memory (or use `using`)

Streaming output is bit-for-bit identical to the one-shot functions.
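The streaming guarantee can be checked with a small property test: hash the whole input once, then feed the same bytes through a streaming hasher using random chunk boundaries and compare. The sketch below is self-contained and uses 32-bit FNV-1a as a stand-in hash so it runs without the package; fnv1a, createToyHasher, and streamingMatchesOneShot are hypothetical names.

```javascript
// Stand-in hash for the sketch: 32-bit FNV-1a. This is NOT PixaHash.
function fnv1a(bytes, state = 0x811c9dc5) {
  let h = state;
  for (const b of bytes) h = Math.imul(h ^ b, 0x01000193) >>> 0;
  return h;
}

// Toy streaming hasher exposing a chainable update() and a digest().
function createToyHasher() {
  let state = 0x811c9dc5;
  return {
    update(chunk) { state = fnv1a(chunk, state); return this; },
    digest() { return state; },
  };
}

// Feed `bytes` through a fresh hasher in randomly sized chunks, `trials`
// times, and report whether every run equals the one-shot result.
function streamingMatchesOneShot(oneShot, makeHasher, bytes, trials = 100) {
  const expected = oneShot(bytes);
  for (let t = 0; t < trials; t++) {
    const h = makeHasher();
    let pos = 0;
    while (pos < bytes.length) {
      const len = 1 + Math.floor(Math.random() * (bytes.length - pos));
      h.update(bytes.subarray(pos, pos + len));
      pos += len;
    }
    if (h.digest() !== expected) return false;
  }
  return true;
}
```

To run the same check against PixaHash, one would presumably pass a one-shot function such as (b) => hashBase58(b, 0, 128) and a thin adapter around createHasher whose digest() calls digestBase58(128) and then free().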

Parallel hashing

import { createPool } from '@pixagram/pixahash/pool';

const pool = await createPool();                 // size = CPU count
const ids  = await Promise.all(
  files.map((bytes) => pool.hashHex(bytes, 0, 256))
);
pool.destroy();

Input is copied to the worker by default, so your buffers stay intact. For large inputs you can move them instead with createPool({ transfer: true }): this is faster, but it detaches the caller's ArrayBuffer, which becomes zero-length and unusable afterwards. hashMany(items, { op }) runs a whole batch on a single worker when you'd rather avoid one message per item.

When does a pool actually help? A single 5–30 kB hash takes microseconds, so the worker round-trip costs more than the hash. Reach for a pool to (a) keep the UI thread responsive during a burst, or (b) parallelize across many items. For a one-off hash, call the sync/async API directly.
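One way to amortize the round-trip is to split the work into one batch per worker and send each batch with a single hashMany call. The sketch below assumes that hashMany resolves to an array of results in input order and that op names the hash method to run; neither detail is spelled out above, so treat both as assumptions. partition and hashInBatches are hypothetical helpers.

```javascript
// Hypothetical helper: split `items` into at most `parts` contiguous batches.
function partition(items, parts) {
  const n = Math.max(1, Math.min(parts, items.length));
  const size = Math.ceil(items.length / n);
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Hypothetical helper: one hashMany message per batch instead of one message
// per item. ASSUMPTIONS: pool.hashMany(batch, { op }) resolves to an array of
// results in the same order as `batch`; valid `op` values are not shown here.
async function hashInBatches(pool, items, workers, op) {
  const batches = partition(items, workers);
  const results = await Promise.all(
    batches.map((batch) => pool.hashMany(batch, { op }))
  );
  return results.flat();
}
```

Contiguous batches keep the flattened results in the original item order, as long as the ordering assumption above holds.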

API

| Function | Returns |
| --- | --- |
| ready() / isReady() | Promise<void> / boolean |
| hash32(data, seed?) | number (u32) |
| hash64(data, seed?) | bigint (u64) |
| hash128(data, seed?) | bigint (u128) |
| hashHex(data, seed?, bits=128) | hex string |
| hashBase58(data, seed?, bits=128) | base58btc string |
| digest(data, seed?) | Uint8Array (32 bytes) |
| *Async(...) | the same, auto-initialized, as a Promise |
| createHasher(seed?) → Hasher | streaming: update, digest32/64/128/Hex/Base58/Bytes, free |
| createPool(opts?) → HashPool | parallel: same hash methods returning Promises |

bits is clamped to 1–256. Bits beyond 128 come from a finalizer expansion and add output width, not extra collision resistance (see below).
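To illustrate how a fixed 32-byte digest can yield any width from 1 to 256 bits, the sketch below clamps bits, keeps the leading bits of the digest, and encodes the result as hex or base58btc. Which bits the library keeps, and how it handles widths that are not a multiple of 8, is not stated above, so the "leading bits, zero-padded last byte" rule is an assumption, and truncateBits, toHex, and toBase58 are hypothetical helpers.

```javascript
// Hypothetical helper: keep the first `bits` bits of a 32-byte digest.
// ASSUMPTION: leading (most significant) bits are kept and unused low bits of
// the final byte are zeroed. The library may truncate differently.
function truncateBits(digest, bits) {
  const n = Math.min(256, Math.max(1, Math.trunc(bits)));   // clamp to 1..256
  const byteLen = Math.ceil(n / 8);
  const out = digest.slice(0, byteLen);
  const spare = byteLen * 8 - n;                             // 0..7 unused bits
  if (spare > 0) out[byteLen - 1] &= 0xff << spare;
  return out;
}

// Lowercase hex, two characters per byte.
function toHex(bytes) {
  return Array.from(bytes, (b) => b.toString(16).padStart(2, '0')).join('');
}

// base58btc: the Bitcoin alphabet, with one '1' per leading zero byte.
const B58 = '123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz';
function toBase58(bytes) {
  let n = 0n;
  for (const b of bytes) n = (n << 8n) | BigInt(b);          // big-endian integer
  let out = '';
  while (n > 0n) {
    out = B58[Number(n % 58n)] + out;
    n /= 58n;
  }
  for (const b of bytes) {
    if (b !== 0) break;
    out = '1' + out;
  }
  return out;
}
```

Truncating this way only narrows the output; it cannot add collision resistance beyond what the underlying digest carries.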

Design

Both rapidhash and museair are excellent modern hashes, and both lean on a 64×64→128-bit multiply (umulh). On native CPUs that's a single instruction — but WebAssembly has no high-multiply opcode, so each one lowers to four 32×32 multiplies plus carry handling. PixaHash is designed around what WASM does do in one instruction: 64-bit multiply-low, rotate, xor, and add. That puts it in the lineage of xxHash64 rather than the umulh-based designs.
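The cost of a missing high-multiply is easy to see when the operation is written out. The sketch below computes the upper 64 bits of a 64×64-bit product from four 32×32 partial products plus carry handling, using BigInt purely for illustration. mulhiEmulated is a hypothetical name, and the exact sequence a compiler emits for WASM may differ.

```javascript
const MASK32 = 0xffffffffn;
const MASK64 = (1n << 64n) - 1n;

// High 64 bits of a * b for unsigned 64-bit a and b, built the way a target
// without a high-multiply instruction has to do it.
function mulhiEmulated(a, b) {
  const aLo = a & MASK32, aHi = a >> 32n;
  const bLo = b & MASK32, bHi = b >> 32n;

  // Four 32x32 -> 64-bit partial products.
  const ll = aLo * bLo;
  const lh = aLo * bHi;
  const hl = aHi * bLo;
  const hh = aHi * bHi;

  // Carry out of the middle 32-bit column.
  const mid = (ll >> 32n) + (lh & MASK32) + (hl & MASK32);

  return (hh + (lh >> 32n) + (hl >> 32n) + (mid >> 32n)) & MASK64;
}
```

A multiply-low, by contrast, is a single 64-bit multiply with the upper half discarded, which is the operation the design above is built around.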

The core borrows the best ideas from each:

xxHash64-style bulk loop: 4 lanes, 32-byte stripes, round(acc, w) = rotl(acc + w·P2, 31)·P1, using the xxHash primes. Multiply-low, rotate, add only.
museair-style ring coupling: after each stripe the lanes are mixed in a ring (v[i] ^= rotl(v[i+1], 17), with the last lane wrapping around to the first) so every input bit reaches the whole state, not just one lane.
rapidhash-style tail + length injection: short tails fold round-robin into the lanes and the total length is mixed into all of them, so similar inputs that differ only in length or trailing padding do not collide trivially.
All multipliers are odd constants, so each multiply is a bijection on u64: distinct states stay distinct and a nonzero state can never collapse to zero. museair's "blinding multiplication" failure mode is therefore avoided structurally, without needing its additive workaround.
A Moremur finalizer avalanches the result; for ≥32-byte inputs (the whole target range) both 64-bit halves carry full entropy, giving a genuine ~128-bit fingerprint. The 256-bit digest extends that width with two more Moremur rounds.
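The round function and ring coupling described above can be sketched in a few lines of BigInt arithmetic, together with a check that multiplying by an odd constant really is invertible modulo 2^64. This is an illustration under assumptions, not the package's source: P1 and P2 are the standard xxHash64 primes, the ring step is applied sequentially with wrap-around, and round, unround, ringMix, and inverseOdd are hypothetical names. The finalizer and tail handling are left out.

```javascript
const u64 = (x) => BigInt.asUintN(64, x);
const P1 = 0x9E3779B185EBCA87n;        // xxHash64 PRIME64_1
const P2 = 0xC2B2AE3D27D4EB4Fn;        // xxHash64 PRIME64_2

// Rotate left on 64 bits, for 0 < r < 64.
const rotl = (x, r) => u64((x << BigInt(r)) | (x >> BigInt(64 - r)));

// round(acc, w) = rotl(acc + w*P2, 31) * P1, all modulo 2^64.
function round(acc, w) {
  return u64(rotl(u64(acc + w * P2), 31) * P1);
}

// Ring coupling: v[i] ^= rotl(v[i+1], 17), last lane wrapping to the first.
// ASSUMPTION: lanes are updated one after another, on a copy of the input.
function ringMix(v) {
  const out = v.slice();
  for (let i = 0; i < out.length; i++) {
    out[i] ^= rotl(out[(i + 1) % out.length], 17);
  }
  return out;
}

// Multiplicative inverse of an odd x modulo 2^64 by Newton iteration.
// x*x ≡ 1 (mod 8) for odd x, so the start is correct to 3 bits; each step
// doubles the number of correct bits (3, 6, 12, 24, 48, 96).
function inverseOdd(x) {
  let inv = x;
  for (let i = 0; i < 5; i++) inv = u64(inv * (2n - x * inv));
  return inv;
}

// Because P1 is odd, round() can be undone for a known w: it is a bijection
// on the accumulator and cannot map two states to the same value.
function unround(out, w) {
  const rotated = u64(out * inverseOdd(P1));
  return u64(rotl(rotated, 33) - w * P2);   // rotl by 33 undoes rotl by 31
}
```

The existence of unround is the structural point: no input word can force the accumulator into a fixed value the way a multiply by attacker-influenced data can.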

Quality (measured)

From the native test harness (cargo run --release --bin quality):

Avalanche on a 10 kB input: worst single-bit bias 1.3%, average 0.4% (well within the 5σ noise band) — flipping any input bit flips ≈half the output bits.
Collisions: 0 across 5M random 64-bit hashes; 32-bit collisions on 2M structured keys land within ~2% of the ideal birthday-bound rate.
Streaming == one-shot across thousands of random chunk splits.
Seed sensitivity: changing the seed flips ≈half the bits.
Throughput: ≈9.3 GB/s on 20 kB inputs in the native build. This measures the algorithm itself, not the WASM module.
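For context on the collision figures, the ideal birthday-bound rate can be estimated with the usual pair-count approximation: n keys hashed uniformly into 2^b values give about n(n − 1) / 2^(b+1) colliding pairs. The sketch below applies that formula to the two experiments listed above; expectedCollisions is a hypothetical helper, and the formula is the standard approximation, not necessarily the exact statistic the test harness uses.

```javascript
// Expected number of colliding pairs among n uniformly random b-bit hashes
// (first-order birthday approximation).
function expectedCollisions(n, bits) {
  return (n * (n - 1)) / (2 * 2 ** bits);
}

// 2M keys at 32 bits: roughly 466 colliding pairs expected by chance.
const ideal32 = expectedCollisions(2e6, 32);

// 5M keys at 64 bits: far below one expected pair, so observing 0 is the
// expected outcome for a well-distributed 64-bit hash.
const ideal64 = expectedCollisions(5e6, 64);
```

A result "within ~2%" of the ideal therefore corresponds to a deviation of about nine or ten pairs on the 32-bit test.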

The .wasm bundled here is built without wasm-opt (binaryen wasn't available in the build sandbox). It is correct and passes the full native↔WASM cross-check, but for production you should rebuild with wasm-opt enabled (see below) for a smaller, faster module.

Build from source

Requires the Rust toolchain, the wasm32-unknown-unknown target, and wasm-pack:

rustup target add wasm32-unknown-unknown
cargo install wasm-pack
./build.sh          # → pkg/pixahash.js + pkg/pixahash_bg.wasm

To enable size/speed optimization, install binaryen and remove the wasm-opt = false line under [package.metadata.wasm-pack.profile.release] in Cargo.toml, then rebuild.

Run the native correctness/quality suite and the JS cross-check:

cargo run --release --no-default-features --bin quality
node test-node.mjs
node test-wrapper.mjs

Threading roadmap

The worker pool is data parallelism: many independent items across many threads, and it needs no special HTTP headers. Parallelizing a single hash (splitting one large input across threads via wasm-bindgen-rayon + SharedArrayBuffer) is on the roadmap; that path requires cross-origin isolation (Cross-Origin-Opener-Policy: same-origin and Cross-Origin-Embedder-Policy: require-corp). A simd128 build of the core (identical output, gated behind the simd feature) is also planned.
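For the planned shared-memory path, a page can check whether it is cross-origin isolated before trying to use SharedArrayBuffer. The sketch below lists the two response headers named above and a feature check; ISOLATION_HEADERS and canUseSharedMemory are hypothetical names, and how a future multi-threaded build would be loaded is not covered.

```javascript
// Response headers that opt a document into cross-origin isolation.
const ISOLATION_HEADERS = {
  'Cross-Origin-Opener-Policy': 'same-origin',
  'Cross-Origin-Embedder-Policy': 'require-corp',
};

// True only when SharedArrayBuffer exists and the current context reports
// cross-origin isolation. In browsers `crossOriginIsolated` is a global
// boolean; where it is undefined this returns false.
function canUseSharedMemory() {
  return typeof SharedArrayBuffer !== 'undefined' &&
         globalThis.crossOriginIsolated === true;
}
```

The existing worker pool needs neither the headers nor this check, since each worker hashes its own copy of the data.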
