# WebInfer

High-performance LLM inference kernels for WebGPU.
## Install

```bash
npm install webinfer
```

## Usage
```js
import { WebInferDevice, attention } from 'webinfer';

const device = await WebInferDevice.create();

// Single decode attention
const q = new Float32Array(32 * 128);        // [num_qo_heads, head_dim]
const k = new Float32Array(2048 * 32 * 128); // [kv_len, num_kv_heads, head_dim]
const v = new Float32Array(2048 * 32 * 128); // [kv_len, num_kv_heads, head_dim]

const output = await attention(device, { q, k, v });
```

## API
| Category | Exports |
|----------|---------|
| Attention | `attention`, `BatchAttention`, `AttentionKernel`, `cascadedAttention` |
| KV Cache | `PagedKVCache`, `BlockManager`, `pagedAttention` |
| Patterns | `buildCausalMask`, `buildSlidingWindowMask`, `buildBlockSparseCSR` |
| Sampling | `topKSamplingFromProbs`, `topPSamplingFromProbs`, `minPSamplingFromProbs`, `topKTopPSamplingFromLogits` |
| Normalization | `rmsNorm`, `layerNorm`, `fusedAddRmsNorm`, `gemmaRmsNorm` |
| Core | `matmul`, `rope`, `gelu`, `silu`, `softmax` |
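
The sampling exports take a distribution over the vocabulary and return a sampled token id. Below is a minimal sketch of top-k sampling at the end of a decode step; the exact argument shapes for `softmax` and `topKSamplingFromProbs` are assumptions here (modeled on the `attention` call above), not documented API, so check the library's typings:

```js
import { WebInferDevice, softmax, topKSamplingFromProbs } from 'webinfer';

const device = await WebInferDevice.create();

// Logits over a 32000-token vocabulary, e.g. from the model's final matmul.
const logits = new Float32Array(32000);

// Convert logits to probabilities, then sample among the 40 most likely tokens.
// NOTE: these call signatures are assumptions; verify against the typings.
const probs = await softmax(device, logits);
const tokenId = await topKSamplingFromProbs(device, { probs, topK: 40 });
```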
## Release

```bash
bun run build
npm publish
```

## License
Apache-2.0
