
@goodganglabs/lipsync-wasm-v2

v0.4.10


WASM LipSync V2 - Student model 52-dim ARKit blendshape engine


@goodganglabs/lipsync-wasm-v2

WebAssembly-based real-time audio-to-blendshape lip sync engine. Converts 16kHz PCM audio into 52-dimensional ARKit-compatible blendshape frames at 30fps using a student distillation model.

Website · Start Building · GitHub

Which Version?

| | V2 (this package) | V1 |
|---|---|---|
| Dimensions | 52-dim ARKit | 111-dim ARKit |
| Model | Student distillation | Phoneme classification |
| Emotion | 5-dim emotion conditioning (neutral, joy, anger, sadness, surprise) | Not available |
| Post-processing | Model-integrated | Manual |
| Idle expression | Built-in IdleExpressionGenerator | Built-in IdleExpressionGenerator |
| VAD | Not included | Built-in VoiceActivityDetector |
| ONNX fallback | None (ONNX required) | Heuristic fallback |
| Recommendation | Most use cases | Full expression control needed |

Features

  • 52-dim ARKit blendshape output (direct prediction, no intermediate phoneme step)
  • VRM 18-dim blendshape output (automatic ARKit→VRM conversion)
  • Emotion-conditioned inference — 5-dim vector: neutral, joy, anger, sadness, surprise
  • Real-time emotion switching — reInferWithEmotion() re-runs inference without re-uploading audio
  • Streaming ONNX model with LSTM state carry (chunk_size=5, ~167ms latency)
  • Built-in idle expression generator (eye blinks + micro expressions)
  • Batch and real-time streaming processing
  • Built-in expression preset blending
  • Embedded VRMA bone animation data (idle + speaking)
  • 30-day free trial (no license key required)
  • Runs entirely in the browser via WebAssembly

Requirements

  • onnxruntime-web >=1.17.0 (required — V2 has no heuristic fallback)
<script src="https://cdn.jsdelivr.net/npm/[email protected]/dist/ort.min.js"></script>

Installation

npm install @goodganglabs/lipsync-wasm-v2

Quick Start

Minimal Example (Batch Processing)

import { LipSyncWasmWrapper } from '@goodganglabs/lipsync-wasm-v2';

const lipsync = new LipSyncWasmWrapper();
await lipsync.init();

// Process an audio file
const result = await lipsync.processFile(audioFile);

// Each frame is a number[52] array of ARKit blendshape weights
for (let i = 0; i < result.frame_count; i++) {
  const frame = lipsync.getFrame(result, i);
  applyToAvatar(frame); // your rendering code
}

lipsync.dispose();

Complete Working Example (Three.js + VRM)

Copy-paste ready. This example handles everything: VRM loading, VRMA bone animations (idle/speaking crossfade), blendshape application, 30fps frame consumption, and audio-synced playback.

<!DOCTYPE html>
<html>
<head>
  <meta charset="UTF-8">
  <script type="importmap">
  { "imports": {
      "three": "https://cdn.jsdelivr.net/npm/[email protected]/build/three.module.js",
      "three/addons/": "https://cdn.jsdelivr.net/npm/[email protected]/examples/jsm/",
      "@pixiv/three-vrm": "https://cdn.jsdelivr.net/npm/@pixiv/[email protected]/lib/three-vrm.module.min.js",
      "@pixiv/three-vrm-animation": "https://cdn.jsdelivr.net/npm/@pixiv/[email protected]/lib/three-vrm-animation.module.min.js"
  }}
  </script>
  <script src="https://cdn.jsdelivr.net/npm/[email protected]/dist/ort.min.js"></script>
</head>
<body>

<canvas id="avatar-canvas" style="width:100%; height:500px;"></canvas>
<input type="file" id="audio-file" accept="audio/*">

<script type="module">
import * as THREE from 'three';
import { GLTFLoader } from 'three/addons/loaders/GLTFLoader.js';
import { OrbitControls } from 'three/addons/controls/OrbitControls.js';
import { VRMLoaderPlugin, VRMUtils } from '@pixiv/three-vrm';
import { VRMAnimationLoaderPlugin, createVRMAnimationClip } from '@pixiv/three-vrm-animation';
import { LipSyncWasmWrapper } from '@goodganglabs/lipsync-wasm-v2';

// ============================================================
// Step 1: ARKit Blendshape Mapping (52-dim)
// ============================================================
const ARKIT_NAMES = {
  0:'browDownLeft',1:'browDownRight',2:'browInnerUp',3:'browOuterUpLeft',4:'browOuterUpRight',
  5:'cheekPuff',6:'cheekSquintLeft',7:'cheekSquintRight',8:'eyeBlinkLeft',9:'eyeBlinkRight',
  10:'eyeLookDownLeft',11:'eyeLookDownRight',12:'eyeLookInLeft',13:'eyeLookInRight',
  14:'eyeLookOutLeft',15:'eyeLookOutRight',16:'eyeLookUpLeft',17:'eyeLookUpRight',
  18:'eyeSquintLeft',19:'eyeSquintRight',20:'eyeWideLeft',21:'eyeWideRight',
  22:'jawForward',23:'jawLeft',24:'jawOpen',25:'jawRight',
  26:'mouthClose',27:'mouthDimpleLeft',28:'mouthDimpleRight',
  29:'mouthFrownLeft',30:'mouthFrownRight',31:'mouthFunnel',
  32:'mouthLeft',33:'mouthLowerDownLeft',34:'mouthLowerDownRight',
  35:'mouthPressLeft',36:'mouthPressRight',37:'mouthPucker',
  38:'mouthRight',39:'mouthRollLower',40:'mouthRollUpper',
  41:'mouthShrugLower',42:'mouthShrugUpper',43:'mouthSmileLeft',44:'mouthSmileRight',
  45:'mouthStretchLeft',46:'mouthStretchRight',47:'mouthUpperUpLeft',48:'mouthUpperUpRight',
  49:'noseSneerLeft',50:'noseSneerRight',51:'tongueOut'
};

function applyBlendshapes(vrm, frame) {
  if (!vrm?.expressionManager) return;
  for (const [idx, name] of Object.entries(ARKIT_NAMES)) {
    vrm.expressionManager.setValue(name, frame[idx] || 0);
  }
}

// ============================================================
// Step 2: Three.js Scene
// ============================================================
const canvas = document.getElementById('avatar-canvas');
const scene = new THREE.Scene();
scene.background = new THREE.Color(0x1a1a2e);

const camera = new THREE.PerspectiveCamera(30, canvas.clientWidth / canvas.clientHeight, 0.1, 100);
camera.position.set(0, 1.25, 0.5);

const renderer = new THREE.WebGLRenderer({ canvas, antialias: true });
renderer.setSize(canvas.clientWidth, canvas.clientHeight);
renderer.setPixelRatio(Math.min(window.devicePixelRatio, 2));

const controls = new OrbitControls(camera, canvas);
controls.target.set(0, 1.25, 0);
controls.enableDamping = true;

scene.add(new THREE.AmbientLight(0xffffff, 2.0));
const dirLight = new THREE.DirectionalLight(0xffffff, 1.1);
dirLight.position.set(1, 3, 2);
scene.add(dirLight);

// ============================================================
// Step 3: Load VRM Avatar
// ============================================================
const loader = new GLTFLoader();
loader.register(p => new VRMLoaderPlugin(p));

const gltf = await new Promise((res, rej) => loader.load('your-avatar.vrm', res, undefined, rej));
const vrm = gltf.userData.vrm;
VRMUtils.removeUnnecessaryVertices(gltf.scene);
VRMUtils.removeUnnecessaryJoints(gltf.scene);
scene.add(vrm.scene);

const mixer = new THREE.AnimationMixer(vrm.scene);

// ============================================================
// Step 3.5: Detect VRM Mode (ARKit 52-dim vs VRM 18-dim)
// ============================================================
// VRoid Hub models use VRM expressions (aa, ih, ou, ee, oh, blink, etc.)
// instead of ARKit names (jawOpen, eyeBlinkLeft, etc.).
// Detect which format the model supports to apply the correct blendshapes.

const VRM_NAMES = [
  'aa','ih','ou','ee','oh',           // lip-sync (5)
  'happy','angry','sad','relaxed','surprised', // emotions (5)
  'blink','blinkLeft','blinkRight',   // blink (3)
  'lookUp','lookDown','lookLeft','lookRight', // gaze (4)
  'neutral'                            // base (1)
];

let useVrmMode = false;

function detectVrmMode() {
  if (!vrm?.expressionManager) return false;
  const exprMap = vrm.expressionManager.expressionMap || vrm.expressionManager._expressionMap || {};
  const names = Object.keys(exprMap);
  const arkitProbes = ['jawOpen','mouthFunnel','mouthPucker','eyeBlinkLeft','eyeBlinkRight'];
  const vrmProbes = ['aa','ih','ou','ee','oh'];
  const hasArkit = arkitProbes.filter(n => names.includes(n)).length >= 3;
  const hasVrm = vrmProbes.filter(n => names.includes(n)).length >= 3;
  return !hasArkit && hasVrm;
}

useVrmMode = detectVrmMode();
console.log('VRM mode:', useVrmMode);

function applyVrmBlendshapes(vrm, vrmFrame) {
  if (!vrm?.expressionManager) return;
  for (let i = 0; i < VRM_NAMES.length; i++) {
    vrm.expressionManager.setValue(VRM_NAMES[i], vrmFrame[i] || 0);
  }
}

// ============================================================
// Step 4: Init LipSync
// ============================================================
const lipsync = new LipSyncWasmWrapper();
// For production, pass your license key:
//   await lipsync.init({ licenseKey: 'ggl_your_key_here' });
await lipsync.init({
  onProgress: (stage, pct) => console.log(`Init: ${stage} ${pct}%`)
});

// ============================================================
// Step 5: Load VRMA Bone Animations (idle + speaking)
// ============================================================
// The package embeds two VRMA animations: idle pose and speaking pose.
// Use AnimationMixer to crossfade between them when audio plays.

const vrmaData = lipsync.getVrmaBytes();

async function loadVRMA(bytes) {
  const blob = new Blob([bytes], { type: 'application/octet-stream' });
  const url = URL.createObjectURL(blob);
  const vrmaLoader = new GLTFLoader();
  vrmaLoader.register(p => new VRMAnimationLoaderPlugin(p));
  const g = await new Promise((res, rej) => vrmaLoader.load(url, res, undefined, rej));
  URL.revokeObjectURL(url);
  return g.userData.vrmAnimations[0];
}

const idleAnim = await loadVRMA(vrmaData.idle);
const speakingAnim = await loadVRMA(vrmaData.speaking);

const idleClip = createVRMAnimationClip(idleAnim, vrm);
const speakingClip = createVRMAnimationClip(speakingAnim, vrm);

const idleAction = mixer.clipAction(idleClip);
const speakingAction = mixer.clipAction(speakingClip);

// LoopPingPong prevents visible seam when idle animation loops
idleAction.setLoop(THREE.LoopPingPong);
speakingAction.setLoop(THREE.LoopRepeat);
idleAction.setEffectiveWeight(1);
idleAction.play();
speakingAction.setEffectiveWeight(0);
speakingAction.play();

// Crossfade state
let isSpeaking = false;
let crossFadeProgress = 0; // 0 = idle, 1 = speaking

function transitionToSpeaking(instant) {
  isSpeaking = true;
  if (instant) crossFadeProgress = 1;
}
function transitionToIdle() {
  isSpeaking = false;
}
function updateBoneWeights(delta) {
  const target = isSpeaking ? 1 : 0;
  if (Math.abs(crossFadeProgress - target) > 0.001) {
    // Asymmetric crossfade: 0.8s into speaking, 1.0s back to idle
    const duration = isSpeaking ? 0.8 : 1.0;
    const step = delta / duration;
    crossFadeProgress = target > crossFadeProgress
      ? Math.min(crossFadeProgress + step, 1)
      : Math.max(crossFadeProgress - step, 0);
  }
  const t = crossFadeProgress;
  const w = t * t * (3 - 2 * t); // smoothstep
  speakingAction.setEffectiveWeight(w);
  idleAction.setEffectiveWeight(1 - w);
}

// ============================================================
// Step 5.5: Idle Expression Generator
// ============================================================
// Procedural eye blinks + micro expressions when no audio is playing.
const idle = new lipsync.wasmModule.IdleExpressionGenerator();
let elapsedSeconds = 0;
let prevFrame = null;

// ============================================================
// Step 6: Frame Queue + Render Loop
// ============================================================
// Frames are consumed at 30fps regardless of monitor refresh rate.
const frameQueue = [];
let streamTimeAccum = 0;
const FRAME_INTERVAL = 1 / 30;

const clock = new THREE.Clock();

function animate() {
  requestAnimationFrame(animate);
  const delta = clock.getDelta();
  elapsedSeconds += delta;
  controls.update();

  // Bone animation crossfade
  updateBoneWeights(delta);
  mixer.update(delta);

  // Consume blendshape frames at 30fps
  streamTimeAccum += delta;
  while (streamTimeAccum >= FRAME_INTERVAL) {
    streamTimeAccum -= FRAME_INTERVAL;
    if (frameQueue.length > 0) {
      prevFrame = frameQueue.shift();
      if (useVrmMode) {
        applyVrmBlendshapes(vrm, prevFrame);
      } else {
        applyBlendshapes(vrm, prevFrame);
      }
    }
  }

  // Idle expressions when queue is empty
  if (frameQueue.length === 0 && !isSpeaking) {
    const idleFrame = idle.get_frame(elapsedSeconds);
    let frame = idleFrame;
    if (prevFrame) {
      frame = prevFrame.map((v, i) => v + 0.15 * ((idleFrame[i] || 0) - v));
      prevFrame = frame;
    }
    if (useVrmMode) {
      const vrmFrame = lipsync.wasmModule.convert_arkit_to_vrm(frame);
      applyVrmBlendshapes(vrm, Array.from(vrmFrame));
    } else {
      applyBlendshapes(vrm, frame);
    }
  }

  vrm.update(delta);
  renderer.render(scene, camera);
}
animate();

// ============================================================
// Step 7: Audio File Playback (batch processing)
// ============================================================
document.getElementById('audio-file').addEventListener('change', async (e) => {
  const file = e.target.files[0];
  if (!file) return;

  // Process blendshapes
  const result = await lipsync.processFile(file);

  // Fill frame queue
  frameQueue.length = 0;
  for (let i = 0; i < result.frame_count; i++) {
    if (useVrmMode) {
      frameQueue.push(lipsync.getVrmFrame(result, i));
    } else {
      frameQueue.push(lipsync.getFrame(result, i));
    }
  }

  // Switch to speaking pose immediately
  transitionToSpeaking(true);

  // Play audio in sync
  const arrayBuffer = await file.arrayBuffer();
  const audioCtx = new AudioContext();
  const audioBuffer = await audioCtx.decodeAudioData(arrayBuffer);
  const source = audioCtx.createBufferSource();
  source.buffer = audioBuffer;
  source.connect(audioCtx.destination);
  source.start();
  source.onended = () => transitionToIdle();
});
</script>
</body>
</html>

Licensing

The first call to init() automatically starts a 30-day free trial (no signup, no API key). For production use, pass your license key:

await lipsync.init({ licenseKey: 'ggl_your_key_here' });

| | Free Trial | Licensed |
|---|---|---|
| Duration | 30 days from first use | Unlimited |
| Setup | None (automatic) | Pass licenseKey to init() |
| Domain restriction | None | Configurable per key |
| Features | Full access | Full access |

Contact GoodGang Labs for license keys.

API Reference

Constructor

new LipSyncWasmWrapper(options?: { wasmPath?: string })

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| wasmPath | string | './lipsync_wasm_v2.js' | Path to the WASM glue module |

Important: wasmPath is resolved relative to the HTML page, not the wrapper JS file.

  • With bundlers (Vite, Webpack): the default './lipsync_wasm_v2.js' works automatically.
  • Without a bundler (plain <script type="module">): use an absolute path:
new LipSyncWasmWrapper({
  wasmPath: '/node_modules/@goodganglabs/lipsync-wasm-v2/lipsync_wasm_v2.js'
})

init(options?): Promise<InitResult>

Initializes the WASM runtime, loads the ONNX model, and applies the expression preset. ONNX Runtime must be loaded before calling this method.

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| licenseKey | string | — | GoodGang Labs license key. Omit for 30-day free trial. |
| onProgress | (stage, percent) => void | — | Progress callback. Stages: 'wasm', 'license', 'decrypt', 'onnx' |
| preset | boolean \| string | true | true = built-in preset, URL string = custom preset JSON, false = disabled |

Returns { mode: 'v2-onnx' }. Throws if ONNX Runtime is not available.

Properties

| Property | Type | Description |
|----------|------|-------------|
| ready | boolean | true after init() completes |
| modelVersion | 'v2' | Always 'v2' |
| blendshapeDim | 52 | Output dimension per frame |
| wasmModule | object | Direct access to WASM exports (for IdleExpressionGenerator, convert_arkit_to_vrm) |

Processing Methods

| Method | Input | Output | Use Case |
|--------|-------|--------|----------|
| processFile(file) | File | Promise<ProcessResult> | Audio file upload |
| processAudio(audio) | Float32Array (16kHz) | Promise<ProcessResult> | Raw PCM buffer |
| processAudioBuffer(buf) | AudioBuffer | Promise<ProcessResult> | Web Audio API buffer |
| processAudioChunk(chunk, isLast?) | Float32Array | Promise<ProcessResult \| null> | Real-time streaming |
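processAudio() expects mono PCM at 16kHz, while browser capture typically runs at 44.1 or 48kHz. If you have raw samples at another rate and want to feed processAudio() directly, you need to resample first. The sketch below is an illustration, not part of the package: a simple linear-interpolation resampler (for higher quality, use an OfflineAudioContext or a windowed-sinc resampler).

```javascript
// Linear-interpolation resampler: convert mono Float32 PCM at srcRate
// down to 16 kHz before passing it to processAudio().
// Illustrative only — not shipped with the package.
function resampleTo16k(samples, srcRate, dstRate = 16000) {
  if (srcRate === dstRate) return samples;
  const ratio = srcRate / dstRate;
  const outLen = Math.floor(samples.length / ratio);
  const out = new Float32Array(outLen);
  for (let i = 0; i < outLen; i++) {
    const pos = i * ratio;              // fractional source position
    const i0 = Math.floor(pos);
    const i1 = Math.min(i0 + 1, samples.length - 1);
    const t = pos - i0;
    out[i] = samples[i0] * (1 - t) + samples[i1] * t; // lerp between neighbors
  }
  return out;
}
```

Note that processAudioBuffer() accepts a Web Audio AudioBuffer directly, so manual resampling is only needed for the raw Float32Array path.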

Emotion Control

| Method | Description |
|--------|-------------|
| setEmotion(vec) | Set emotion vector [neutral, joy, anger, sadness, surprise]. Each value 0–1. Default: [0,0,0,0,0] |
| getEmotion() | Returns current emotion vector as number[5] |
| reInferWithEmotion(vec?) | Re-run ONNX inference on cached audio features with new emotion. No audio re-upload needed. Requires prior processAudio() / processFile() call. |

// Set emotion before processing (applies to next inference)
lipsync.setEmotion([0, 0.8, 0, 0, 0]); // joy at 80%
const result = await lipsync.processFile(audioFile);

// Change emotion in real-time (re-infers without re-uploading audio)
const joyResult = await lipsync.reInferWithEmotion([0, 1.0, 0, 0, 0]);
const angryResult = await lipsync.reInferWithEmotion([0, 0, 0.8, 0, 0]);

getFrame(result, frameIndex): number[]

Extracts a single frame from ProcessResult. Returns number[52].

getVrmFrame(result, frameIndex): number[]

Extracts a single VRM 18-dim frame from ProcessResult. Returns number[18] with VRM expression weights. The WASM engine automatically converts ARKit 52-dim → VRM 18-dim with natural triangle blinks.

Available when result.vrm_blendshapes exists (always present in batch/streaming results).
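The actual ARKit→VRM conversion happens inside the WASM engine; the sketch below is only a naive approximation to build intuition for what such a mapping does (a few ARKit channels collapsed onto VRM expression slots). The index positions follow the order returned by getVrmExpressionNames(); the mapping choices here are this sketch's assumptions, not the engine's.

```javascript
// Naive illustration of an ARKit(52) → VRM(18) mapping for a few channels.
// NOT the engine's mapping — getVrmFrame() should be used in practice.
function naiveArkitToVrm(frame) {
  const vrm = new Array(18).fill(0);
  // VRM order: aa ih ou ee oh happy angry sad relaxed surprised
  //            blink blinkLeft blinkRight lookUp lookDown lookLeft lookRight neutral
  vrm[0]  = frame[24];                       // jawOpen            → 'aa'
  vrm[2]  = Math.max(frame[31], frame[37]);  // funnel / pucker    → 'ou'
  vrm[5]  = (frame[43] + frame[44]) / 2;     // smile left+right   → 'happy'
  vrm[11] = frame[8];                        // eyeBlinkLeft       → 'blinkLeft'
  vrm[12] = frame[9];                        // eyeBlinkRight      → 'blinkRight'
  return vrm;
}
```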

getVrmaBytes(): { idle: Uint8Array, speaking: Uint8Array }

Returns embedded VRMA bone animation data. Load with GLTFLoader + VRMAnimationLoaderPlugin (see Complete Example above).

getVrmExpressionNames(): string[]

Returns the 18 VRM expression names in order: ['aa', 'ih', 'ou', 'ee', 'oh', 'happy', 'angry', 'sad', 'relaxed', 'surprised', 'blink', 'blinkLeft', 'blinkRight', 'lookUp', 'lookDown', 'lookLeft', 'lookRight', 'neutral'].

reset(): void

Resets internal state and ends any active streaming session.

dispose(): void

Releases all WASM and ONNX resources.

ProcessResult

{
  blendshapes: number[];       // Flat array: frame_count * 52 values
  vrm_blendshapes?: number[];  // Flat array: frame_count * 18 VRM values (use getVrmFrame() to extract)
  frame_count: number;         // Number of 30fps frames
  fps: number;                 // Always 30
  mode: string;                // 'v2-onnx' | 'v2-streaming-onnx'
}
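ProcessResult stores all frames in one flat array. getFrame() and getVrmFrame() extract individual frames for you; the equivalent manual slicing, shown here for clarity, is a simple offset computation:

```javascript
// Extract frame `frameIndex` from a flat blendshape array of stride `dim`
// (52 for ARKit output, 18 for VRM output). Equivalent to what
// getFrame()/getVrmFrame() return, shown for clarity.
function sliceFrame(flat, frameIndex, dim) {
  const start = frameIndex * dim;
  return Array.from(flat.slice(start, start + dim));
}

// Usage (result from processFile/processAudio):
//   const arkit = sliceFrame(result.blendshapes, 10, 52);
//   const vrm   = sliceFrame(result.vrm_blendshapes, 10, 18);
```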

ARKit Blendshape Index

Full 52-element index mapping:

| Index | Name | Index | Name |
|-------|------|-------|------|
| 0 | browDownLeft | 26 | mouthClose |
| 1 | browDownRight | 27 | mouthDimpleLeft |
| 2 | browInnerUp | 28 | mouthDimpleRight |
| 3 | browOuterUpLeft | 29 | mouthFrownLeft |
| 4 | browOuterUpRight | 30 | mouthFrownRight |
| 5 | cheekPuff | 31 | mouthFunnel |
| 6 | cheekSquintLeft | 32 | mouthLeft |
| 7 | cheekSquintRight | 33 | mouthLowerDownLeft |
| 8 | eyeBlinkLeft | 34 | mouthLowerDownRight |
| 9 | eyeBlinkRight | 35 | mouthPressLeft |
| 10 | eyeLookDownLeft | 36 | mouthPressRight |
| 11 | eyeLookDownRight | 37 | mouthPucker |
| 12 | eyeLookInLeft | 38 | mouthRight |
| 13 | eyeLookInRight | 39 | mouthRollLower |
| 14 | eyeLookOutLeft | 40 | mouthRollUpper |
| 15 | eyeLookOutRight | 41 | mouthShrugLower |
| 16 | eyeLookUpLeft | 42 | mouthShrugUpper |
| 17 | eyeLookUpRight | 43 | mouthSmileLeft |
| 18 | eyeSquintLeft | 44 | mouthSmileRight |
| 19 | eyeSquintRight | 45 | mouthStretchLeft |
| 20 | eyeWideLeft | 46 | mouthStretchRight |
| 21 | eyeWideRight | 47 | mouthUpperUpLeft |
| 22 | jawForward | 48 | mouthUpperUpRight |
| 23 | jawLeft | 49 | noseSneerLeft |
| 24 | jawOpen | 50 | noseSneerRight |
| 25 | jawRight | 51 | tongueOut |
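When debugging output frames against this index table, it helps to see which channels are actually firing. A small helper (not part of the package) that ranks the most active blendshapes in a frame:

```javascript
// Debug helper: given a frame (number[52]) and a name table in the index
// order above, return the n most active blendshapes by weight.
function topBlendshapes(frame, names, n = 5) {
  return frame
    .map((w, i) => ({ name: names[i], weight: w }))
    .sort((a, b) => b.weight - a.weight)
    .slice(0, n);
}

// Usage in a render loop, e.g.:
//   console.table(topBlendshapes(lipsync.getFrame(result, i), arkitNames));
```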

Advanced Features

Bone Animation Tips

The package embeds idle and speaking VRMA bone animations. Two key recommendations for smooth results:

1. Use LoopPingPong for idle animation — The idle clip's first and last keyframes don't perfectly match, so LoopRepeat causes a visible jump at the loop boundary. LoopPingPong (forward→backward→forward) eliminates this seam.

2. Use asymmetric crossfade durations — A slower transition into speaking (0.8s) feels more natural than an instant snap. The return to idle can be slightly slower (1.0s) for a relaxed feel. Apply smoothstep to the linear progress for ease-in/ease-out.

// Idle: PingPong to avoid loop seam
idleAction.setLoop(THREE.LoopPingPong);
speakingAction.setLoop(THREE.LoopRepeat);

// Asymmetric crossfade: 0.8s into speaking, 1.0s back to idle
function updateBoneWeights(delta) {
  const target = isSpeaking ? 1 : 0;
  const duration = isSpeaking ? 0.8 : 1.0;
  const step = delta / duration;
  crossFadeProgress = target > crossFadeProgress
    ? Math.min(crossFadeProgress + step, 1)
    : Math.max(crossFadeProgress - step, 0);
  const w = crossFadeProgress * crossFadeProgress * (3 - 2 * crossFadeProgress); // smoothstep
  speakingAction.setEffectiveWeight(w);
  idleAction.setEffectiveWeight(1 - w);
}

IdleExpressionGenerator

Procedural idle animation: eye blinks (2.5–4.5s random interval, 15% double-blink), micro expressions (sinusoidal). See Step 5.5 in the Complete Example.

const idle = new lipsync.wasmModule.IdleExpressionGenerator();

// In render loop (when no audio is playing):
const frame = idle.get_frame(elapsedSeconds); // number[52] (ARKit)

// For VRM mode, convert to 18-dim:
if (useVrmMode) {
  const vrmFrame = lipsync.wasmModule.convert_arkit_to_vrm(frame);
  applyVrmBlendshapes(vrm, Array.from(vrmFrame));
} else {
  applyBlendshapes(vrm, frame);
}
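The package's generator lives in WASM, so for intuition only, here is a pure-JS sketch of the core idea behind a procedural blink: a short triangle-shaped weight curve repeated on an interval. The fixed period and duration are this sketch's assumptions; the real generator also randomizes blink intervals (2.5–4.5s, 15% double-blink) and layers sinusoidal micro expressions.

```javascript
// Pure-JS approximation of a procedural blink weight over time.
// NOT the package's IdleExpressionGenerator — an illustration only.
function blinkWeight(t, period = 3.5, blinkDur = 0.15) {
  const phase = t % period;               // one blink per period
  if (phase >= blinkDur) return 0;        // eyes open most of the time
  const p = phase / blinkDur;             // 0..1 across the blink
  return p < 0.5 ? p * 2 : (1 - p) * 2;   // triangle: close, then reopen
}

// e.g. drive ARKit indices 8/9 (eyeBlinkLeft/Right) in the render loop:
//   frame[8] = frame[9] = blinkWeight(elapsedSeconds);
```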

Real-time Microphone Streaming

Use AudioWorklet to batch 1600 samples (100ms @ 16kHz), feed to processAudioChunk(), push frames to the queue. The render loop (Step 6 in the Complete Example) consumes them at 30fps automatically.

const stream = await navigator.mediaDevices.getUserMedia({
  audio: { sampleRate: 16000, channelCount: 1, echoCancellation: true }
});
const audioCtx = new AudioContext({ sampleRate: 16000 });
const source = audioCtx.createMediaStreamSource(stream);

// AudioWorklet batches 128-sample inputs into 1600-sample chunks
const workletCode = `
class MicProcessor extends AudioWorkletProcessor {
  constructor() { super(); this.buf = []; this.len = 0; }
  process(inputs) {
    const d = inputs[0][0];
    if (d) { this.buf.push(new Float32Array(d)); this.len += d.length; }
    if (this.len >= 1600) {
      const out = new Float32Array(this.len);
      let off = 0;
      for (const b of this.buf) { out.set(b, off); off += b.length; }
      this.port.postMessage(out);
      this.buf = []; this.len = 0;
    }
    return true;
  }
}
registerProcessor('mic-processor', MicProcessor);
`;
const blob = new Blob([workletCode], { type: 'application/javascript' });
await audioCtx.audioWorklet.addModule(URL.createObjectURL(blob));
const worklet = new AudioWorkletNode(audioCtx, 'mic-processor');
source.connect(worklet);

transitionToSpeaking(false);

worklet.port.onmessage = async (e) => {
  const result = await lipsync.processAudioChunk(e.data);
  if (result) {
    for (let i = 0; i < result.frame_count; i++) {
      if (useVrmMode) {
        frameQueue.push(lipsync.getVrmFrame(result, i));
      } else {
        frameQueue.push(lipsync.getFrame(result, i));
      }
    }
  }
};

// To stop: stream.getTracks().forEach(t => t.stop());
//          audioCtx.close(); lipsync.reset(); transitionToIdle();

TTS Streaming Integration

When processing pre-generated TTS audio, slice into 100ms chunks and yield to the main thread periodically to prevent render freezes:

async function processTTSAudio(audioFloat32) {
  const chunkSize = 1600; // 100ms @ 16kHz
  const totalChunks = Math.ceil(audioFloat32.length / chunkSize);

  for (let i = 0; i < totalChunks; i++) {
    const start = i * chunkSize;
    const chunk = audioFloat32.slice(start, start + chunkSize);
    const isLast = (i === totalChunks - 1);

    const result = await lipsync.processAudioChunk(chunk, isLast);
    if (result) {
      for (let j = 0; j < result.frame_count; j++) {
        if (useVrmMode) {
          frameQueue.push(lipsync.getVrmFrame(result, j));
        } else {
          frameQueue.push(lipsync.getFrame(result, j));
        }
      }
    }

    // Yield every 3 chunks (~300ms) to keep rAF rendering smooth
    if ((i + 1) % 3 === 0) await new Promise(r => setTimeout(r, 0));
  }
}

Bundler Setup

Vite

Works out of the box.

Webpack

// webpack.config.js
module.exports = {
  experiments: { asyncWebAssembly: true },
};

Plain HTML (no bundler)

<script type="module">
  import { LipSyncWasmWrapper }
    from './node_modules/@goodganglabs/lipsync-wasm-v2/lipsync-wasm-wrapper.js';

  // IMPORTANT: wasmPath must be absolute (resolved from HTML page, not JS file)
  const lipsync = new LipSyncWasmWrapper({
    wasmPath: '/node_modules/@goodganglabs/lipsync-wasm-v2/lipsync_wasm_v2.js'
  });
  await lipsync.init();
</script>

CDN

<script type="importmap">
{ "imports": {
    "@goodganglabs/lipsync-wasm-v2": "https://your-cdn.com/lipsync-wasm-v2/lipsync-wasm-wrapper.js"
}}
</script>
<script src="https://cdn.jsdelivr.net/npm/[email protected]/dist/ort.min.js"></script>
<script type="module">
  import { LipSyncWasmWrapper } from '@goodganglabs/lipsync-wasm-v2';
  const lipsync = new LipSyncWasmWrapper({
    wasmPath: 'https://your-cdn.com/lipsync-wasm-v2/lipsync_wasm_v2.js'
  });
</script>

Deployment Notes

  • .wasm files must be served with Content-Type: application/wasm
  • CORS headers required for cross-origin WASM loading
  • ONNX Runtime Web must be loaded before init() is called

License

Proprietary — GoodGang Labs