ruvector-attention-wasm
WebAssembly bindings for the ruvector-attention package, providing high-performance attention mechanisms for browser and Node.js environments.
Features
Multiple Attention Mechanisms:
- Scaled Dot-Product Attention
- Multi-Head Attention
- Hyperbolic Attention (for hierarchical data)
- Linear Attention (Performer-style)
- Flash Attention (memory-efficient)
- Local-Global Attention
- Mixture of Experts (MoE) Attention
Training Utilities:
- InfoNCE contrastive loss
- Adam optimizer
- AdamW optimizer (with decoupled weight decay)
- Learning rate scheduler (warmup + cosine decay)
TypeScript Support: Full type definitions and modern API
Installation
npm install ruvector-attention-wasm
Usage
TypeScript/JavaScript
import { initialize, MultiHeadAttention, utils } from 'ruvector-attention-wasm';
// Initialize WASM module
await initialize();
// Create multi-head attention
const attention = new MultiHeadAttention({ dim: 64, numHeads: 8 });
// Prepare inputs
const query = new Float32Array(64);
const keys = [new Float32Array(64), new Float32Array(64)];
const values = [new Float32Array(64), new Float32Array(64)];
// Compute attention
const output = attention.compute(query, keys, values);
// Use utilities
const similarity = utils.cosineSimilarity(query, keys[0]);
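Under the hood, each of these mechanisms builds on scaled dot-product attention: the query is scored against every key, the scores are passed through a softmax, and the result weights a sum over the values. The reference sketch below shows the single-query computation in plain TypeScript; it is for illustration only and is not the package's WASM code (the helper name is made up here).
// Illustration only: single-query scaled dot-product attention,
// weights = softmax(q . k_i / sqrt(d)), output = sum_i weights_i * v_i.
function scaledDotProductSketch(
  query: Float32Array,
  keys: Float32Array[],
  values: Float32Array[]
): Float32Array {
  const d = query.length;
  // Score each key against the query, scaled by sqrt(d)
  const scores = keys.map((k) => {
    let dot = 0;
    for (let i = 0; i < d; i++) dot += query[i] * k[i];
    return dot / Math.sqrt(d);
  });
  // Softmax over the scores (subtract the max for numerical stability)
  const max = Math.max(...scores);
  const exps = scores.map((s) => Math.exp(s - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  const weights = exps.map((e) => e / sum);
  // Weighted sum of the value vectors
  const output = new Float32Array(values[0].length);
  values.forEach((v, i) => {
    for (let j = 0; j < v.length; j++) output[j] += weights[i] * v[j];
  });
  return output;
}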
Advanced Examples
Hyperbolic Attention
import { HyperbolicAttention } from 'ruvector-attention-wasm';
const hyperbolic = new HyperbolicAttention({
dim: 128,
curvature: 1.0
});
const output = hyperbolic.compute(query, keys, values);
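The curvature option sets the curvature of the hyperbolic space, which is what makes this variant well suited to hierarchical data. The README does not spell out the exact scoring function, but hyperbolic attention is typically computed from distances in the Poincare ball; a minimal sketch of that distance, assuming unit curvature, looks like this:
// Poincare-ball distance often used by hyperbolic attention (illustrative;
// inputs are assumed to lie strictly inside the unit ball).
function poincareDistance(u: Float32Array, v: Float32Array): number {
  let uu = 0, vv = 0, diff = 0;
  for (let i = 0; i < u.length; i++) {
    uu += u[i] * u[i];
    vv += v[i] * v[i];
    diff += (u[i] - v[i]) * (u[i] - v[i]);
  }
  // d(u, v) = arcosh(1 + 2 * ||u - v||^2 / ((1 - ||u||^2) * (1 - ||v||^2)))
  return Math.acosh(1 + (2 * diff) / ((1 - uu) * (1 - vv)));
}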
MoE Attention with Expert Stats
import { MoEAttention } from 'ruvector-attention-wasm';
const moe = new MoEAttention({
dim: 64,
numExperts: 4,
topK: 2
});
const output = moe.compute(query, keys, values);
// Get expert utilization
const stats = moe.getExpertStats();
console.log('Load balance:', stats.loadBalance);
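numExperts and topK configure a gating step: a router scores all experts for the incoming query, keeps only the topK best, and mixes their outputs using the renormalized gate weights. The sketch below illustrates that routing idea; it is not the crate's actual router.
// Illustrative top-k gating: softmax the router scores, keep the topK
// highest-scoring experts, and renormalize their weights to sum to 1.
function topKGate(
  routerScores: number[],
  topK: number
): { expert: number; weight: number }[] {
  const max = Math.max(...routerScores);
  const exps = routerScores.map((s) => Math.exp(s - max));
  const total = exps.reduce((a, b) => a + b, 0);
  const selected = exps
    .map((e, expert) => ({ expert, weight: e / total }))
    .sort((a, b) => b.weight - a.weight)
    .slice(0, topK);
  const kept = selected.reduce((a, g) => a + g.weight, 0);
  return selected.map((g) => ({ expert: g.expert, weight: g.weight / kept }));
}
The loadBalance value from getExpertStats() gives a view of how evenly queries end up spread across the experts.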
Training with InfoNCE Loss
import { InfoNCELoss, Adam } from 'ruvector-attention-wasm';
const loss = new InfoNCELoss(0.07);
const optimizer = new Adam(paramCount, {
learningRate: 0.001,
beta1: 0.9,
beta2: 0.999,
});
// Training loop
const lossValue = loss.compute(anchor, positive, negatives);
optimizer.step(params, gradients);
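The 0.07 passed to InfoNCELoss is the temperature value commonly used with this loss. InfoNCE pulls the anchor toward its positive and away from the negatives by treating the positive as the correct class in a softmax over similarities. Here is a sketch of the computation, assuming cosine similarity as the score (the package may score differently):
// Sketch of InfoNCE with cosine similarity as the score:
// loss = -log( exp(s(a,p)/t) / (exp(s(a,p)/t) + sum_i exp(s(a,n_i)/t)) )
function infoNCESketch(
  anchor: Float32Array,
  positive: Float32Array,
  negatives: Float32Array[],
  temperature = 0.07
): number {
  const cosine = (a: Float32Array, b: Float32Array) => {
    let dot = 0, na = 0, nb = 0;
    for (let i = 0; i < a.length; i++) {
      dot += a[i] * b[i];
      na += a[i] * a[i];
      nb += b[i] * b[i];
    }
    return dot / (Math.sqrt(na) * Math.sqrt(nb));
  };
  const pos = Math.exp(cosine(anchor, positive) / temperature);
  const neg = negatives.reduce((s, n) => s + Math.exp(cosine(anchor, n) / temperature), 0);
  return -Math.log(pos / (pos + neg));
}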
Learning Rate Scheduling
import { LRScheduler, AdamW } from 'ruvector-attention-wasm';
const scheduler = new LRScheduler({
initialLR: 0.001,
warmupSteps: 1000,
totalSteps: 10000,
});
const optimizer = new AdamW(paramCount, {
learningRate: scheduler.getLR(),
weightDecay: 0.01,
});
// Training loop
for (let step = 0; step < 10000; step++) {
optimizer.learningRate = scheduler.getLR();
optimizer.step(params, gradients);
scheduler.step();
}
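The schedule ramps the learning rate linearly from zero over warmupSteps and then decays it along a cosine curve toward zero at totalSteps. The sketch below shows that shape; the scheduler's exact minimum value and edge handling may differ.
// Illustrative warmup + cosine decay: linear ramp over `warmupSteps`,
// then a cosine curve from `initialLR` down to 0 at `totalSteps`.
function warmupCosineLR(
  step: number,
  initialLR: number,
  warmupSteps: number,
  totalSteps: number
): number {
  if (step < warmupSteps) {
    return (initialLR * step) / warmupSteps;
  }
  const progress = Math.min((step - warmupSteps) / (totalSteps - warmupSteps), 1);
  return 0.5 * initialLR * (1 + Math.cos(Math.PI * progress));
}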
Building from Source
Prerequisites
- Rust 1.70+
- wasm-pack
Build Commands
# Build for web (ES modules)
wasm-pack build --target web --out-dir pkg
# Build for Node.js
wasm-pack build --target nodejs --out-dir pkg-node
# Build for bundlers (webpack, vite, etc.)
wasm-pack build --target bundler --out-dir pkg-bundler
# Run tests
wasm-pack test --headless --firefox
API Reference
Attention Mechanisms
- MultiHeadAttention - Standard multi-head attention
- HyperbolicAttention - Attention in hyperbolic space
- LinearAttention - Linear complexity attention (Performer)
- FlashAttention - Memory-efficient attention
- LocalGlobalAttention - Combined local and global attention
- MoEAttention - Mixture of Experts attention
- scaledDotAttention() - Functional API for basic attention
Training
- InfoNCELoss - Contrastive loss function
- Adam - Adam optimizer
- AdamW - AdamW optimizer with weight decay
- LRScheduler - Learning rate scheduler
Utilities
- utils.cosineSimilarity() - Cosine similarity between vectors
- utils.l2Norm() - L2 norm of a vector
- utils.normalize() - Normalize vector to unit length
- utils.softmax() - Apply softmax transformation
- utils.attentionWeights() - Compute attention weights from scores
- utils.batchNormalize() - Batch normalization
- utils.randomOrthogonalMatrix() - Generate random orthogonal matrix
- utils.pairwiseDistances() - Compute pairwise distances
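A rough illustration of combining these helpers is shown below. cosineSimilarity matches the usage example earlier; the other call signatures are assumptions, so verify them against the shipped type definitions.
import { initialize, utils } from 'ruvector-attention-wasm';
await initialize();
const a = new Float32Array([1, 2, 3, 4]);
const b = new Float32Array([4, 3, 2, 1]);
// Matches the usage example above
const sim = utils.cosineSimilarity(a, b);
// Assumed signatures (single vector in, scalar or vector out) - check the .d.ts
const length = utils.l2Norm(a);
const unit = utils.normalize(a);
const probs = utils.softmax(new Float32Array([0.2, 1.3, -0.5]));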
Performance
The WASM bindings provide near-native performance for attention computations:
- Optimized with opt-level = "s" and LTO
- SIMD acceleration where available
- Efficient memory management
- Zero-copy data transfer where possible
License
MIT OR Apache-2.0
