@goodganglabs/lipsync-wasm-v2 (v0.4.10)
WebAssembly-based real-time audio-to-blendshape lip sync engine. Converts 16kHz PCM audio into 52-dimensional ARKit-compatible blendshape frames at 30fps using a student distillation model.
Which Version?
| | V2 (this package) | V1 |
|---|---|---|
| Dimensions | 52-dim ARKit | 111-dim ARKit |
| Model | Student distillation | Phoneme classification |
| Emotion | 5-dim emotion conditioning (neutral, joy, anger, sadness, surprise) | Not available |
| Post-processing | Model-integrated | Manual |
| Idle expression | Built-in IdleExpressionGenerator | Built-in IdleExpressionGenerator |
| VAD | Not included | Built-in VoiceActivityDetector |
| ONNX fallback | None (ONNX required) | Heuristic fallback |
| Recommendation | Most use cases | Full expression control needed |
Features
- 52-dim ARKit blendshape output (direct prediction, no intermediate phoneme step)
- VRM 18-dim blendshape output (automatic ARKit→VRM conversion)
- Emotion-conditioned inference — 5-dim vector: neutral, joy, anger, sadness, surprise
- Real-time emotion switching — reInferWithEmotion() re-runs inference without re-uploading audio
- Streaming ONNX model with LSTM state carry (chunk_size=5, ~167ms latency)
- Built-in idle expression generator (eye blinks + micro expressions)
- Batch and real-time streaming processing
- Built-in expression preset blending
- Embedded VRMA bone animation data (idle + speaking)
- 30-day free trial (no license key required)
- Runs entirely in the browser via WebAssembly
Requirements
- onnxruntime-web >= 1.17.0 (required — V2 has no heuristic fallback)

Load it before initializing, e.g. via CDN:

<script src="https://cdn.jsdelivr.net/npm/[email protected]/dist/ort.min.js"></script>

Installation

npm install @goodganglabs/lipsync-wasm-v2

Quick Start
Minimal Example (Batch Processing)
import { LipSyncWasmWrapper } from '@goodganglabs/lipsync-wasm-v2';
const lipsync = new LipSyncWasmWrapper();
await lipsync.init();
// Process an audio file
const result = await lipsync.processFile(audioFile);
// Each frame is a number[52] array of ARKit blendshape weights
for (let i = 0; i < result.frame_count; i++) {
const frame = lipsync.getFrame(result, i);
applyToAvatar(frame); // your rendering code
}
lipsync.dispose();

Complete Working Example (Three.js + VRM)
Copy-paste ready. This example handles everything: VRM loading, VRMA bone animations (idle/speaking crossfade), blendshape application, 30fps frame consumption, and audio-synced playback.
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<script type="importmap">
{ "imports": {
"three": "https://cdn.jsdelivr.net/npm/[email protected]/build/three.module.js",
"three/addons/": "https://cdn.jsdelivr.net/npm/[email protected]/examples/jsm/",
"@pixiv/three-vrm": "https://cdn.jsdelivr.net/npm/@pixiv/[email protected]/lib/three-vrm.module.min.js",
"@pixiv/three-vrm-animation": "https://cdn.jsdelivr.net/npm/@pixiv/[email protected]/lib/three-vrm-animation.module.min.js"
}}
</script>
<script src="https://cdn.jsdelivr.net/npm/[email protected]/dist/ort.min.js"></script>
</head>
<body>
<canvas id="avatar-canvas" style="width:100%; height:500px;"></canvas>
<input type="file" id="audio-file" accept="audio/*">
<script type="module">
import * as THREE from 'three';
import { GLTFLoader } from 'three/addons/loaders/GLTFLoader.js';
import { OrbitControls } from 'three/addons/controls/OrbitControls.js';
import { VRMLoaderPlugin, VRMUtils } from '@pixiv/three-vrm';
import { VRMAnimationLoaderPlugin, createVRMAnimationClip } from '@pixiv/three-vrm-animation';
import { LipSyncWasmWrapper } from '@goodganglabs/lipsync-wasm-v2';
// ============================================================
// Step 1: ARKit Blendshape Mapping (52-dim)
// ============================================================
const ARKIT_NAMES = {
0:'browDownLeft',1:'browDownRight',2:'browInnerUp',3:'browOuterUpLeft',4:'browOuterUpRight',
5:'cheekPuff',6:'cheekSquintLeft',7:'cheekSquintRight',8:'eyeBlinkLeft',9:'eyeBlinkRight',
10:'eyeLookDownLeft',11:'eyeLookDownRight',12:'eyeLookInLeft',13:'eyeLookInRight',
14:'eyeLookOutLeft',15:'eyeLookOutRight',16:'eyeLookUpLeft',17:'eyeLookUpRight',
18:'eyeSquintLeft',19:'eyeSquintRight',20:'eyeWideLeft',21:'eyeWideRight',
22:'jawForward',23:'jawLeft',24:'jawOpen',25:'jawRight',
26:'mouthClose',27:'mouthDimpleLeft',28:'mouthDimpleRight',
29:'mouthFrownLeft',30:'mouthFrownRight',31:'mouthFunnel',
32:'mouthLeft',33:'mouthLowerDownLeft',34:'mouthLowerDownRight',
35:'mouthPressLeft',36:'mouthPressRight',37:'mouthPucker',
38:'mouthRight',39:'mouthRollLower',40:'mouthRollUpper',
41:'mouthShrugLower',42:'mouthShrugUpper',43:'mouthSmileLeft',44:'mouthSmileRight',
45:'mouthStretchLeft',46:'mouthStretchRight',47:'mouthUpperUpLeft',48:'mouthUpperUpRight',
49:'noseSneerLeft',50:'noseSneerRight',51:'tongueOut'
};
function applyBlendshapes(vrm, frame) {
if (!vrm?.expressionManager) return;
for (const [idx, name] of Object.entries(ARKIT_NAMES)) {
vrm.expressionManager.setValue(name, frame[idx] || 0);
}
}
// ============================================================
// Step 2: Three.js Scene
// ============================================================
const canvas = document.getElementById('avatar-canvas');
const scene = new THREE.Scene();
scene.background = new THREE.Color(0x1a1a2e);
const camera = new THREE.PerspectiveCamera(30, canvas.clientWidth / canvas.clientHeight, 0.1, 100);
camera.position.set(0, 1.25, 0.5);
const renderer = new THREE.WebGLRenderer({ canvas, antialias: true });
renderer.setSize(canvas.clientWidth, canvas.clientHeight);
renderer.setPixelRatio(Math.min(window.devicePixelRatio, 2));
const controls = new OrbitControls(camera, canvas);
controls.target.set(0, 1.25, 0);
controls.enableDamping = true;
scene.add(new THREE.AmbientLight(0xffffff, 2.0));
const dirLight = new THREE.DirectionalLight(0xffffff, 1.1);
dirLight.position.set(1, 3, 2);
scene.add(dirLight);
// ============================================================
// Step 3: Load VRM Avatar
// ============================================================
const loader = new GLTFLoader();
loader.register(p => new VRMLoaderPlugin(p));
const gltf = await new Promise((res, rej) => loader.load('your-avatar.vrm', res, undefined, rej));
const vrm = gltf.userData.vrm;
VRMUtils.removeUnnecessaryVertices(gltf.scene);
VRMUtils.removeUnnecessaryJoints(gltf.scene);
scene.add(vrm.scene);
const mixer = new THREE.AnimationMixer(vrm.scene);
// ============================================================
// Step 3.5: Detect VRM Mode (ARKit 52-dim vs VRM 18-dim)
// ============================================================
// VRoid Hub models use VRM expressions (aa, ih, ou, ee, oh, blink, etc.)
// instead of ARKit names (jawOpen, eyeBlinkLeft, etc.).
// Detect which format the model supports to apply the correct blendshapes.
const VRM_NAMES = [
'aa','ih','ou','ee','oh', // lip-sync (5)
'happy','angry','sad','relaxed','surprised', // emotions (5)
'blink','blinkLeft','blinkRight', // blink (3)
'lookUp','lookDown','lookLeft','lookRight', // gaze (4)
'neutral' // base (1)
];
let useVrmMode = false;
function detectVrmMode() {
if (!vrm?.expressionManager) return false;
const exprMap = vrm.expressionManager.expressionMap || vrm.expressionManager._expressionMap || {};
const names = Object.keys(exprMap);
const arkitProbes = ['jawOpen','mouthFunnel','mouthPucker','eyeBlinkLeft','eyeBlinkRight'];
const vrmProbes = ['aa','ih','ou','ee','oh'];
const hasArkit = arkitProbes.filter(n => names.includes(n)).length >= 3;
const hasVrm = vrmProbes.filter(n => names.includes(n)).length >= 3;
return !hasArkit && hasVrm;
}
useVrmMode = detectVrmMode();
console.log('VRM mode:', useVrmMode);
function applyVrmBlendshapes(vrm, vrmFrame) {
if (!vrm?.expressionManager) return;
for (let i = 0; i < VRM_NAMES.length; i++) {
vrm.expressionManager.setValue(VRM_NAMES[i], vrmFrame[i] || 0);
}
}
// ============================================================
// Step 4: Init LipSync
// ============================================================
const lipsync = new LipSyncWasmWrapper();
// For production, pass your license key:
// await lipsync.init({ licenseKey: 'ggl_your_key_here' });
await lipsync.init({
onProgress: (stage, pct) => console.log(`Init: ${stage} ${pct}%`)
});
// ============================================================
// Step 5: Load VRMA Bone Animations (idle + speaking)
// ============================================================
// The package embeds two VRMA animations: idle pose and speaking pose.
// Use AnimationMixer to crossfade between them when audio plays.
const vrmaData = lipsync.getVrmaBytes();
async function loadVRMA(bytes) {
const blob = new Blob([bytes], { type: 'application/octet-stream' });
const url = URL.createObjectURL(blob);
const vrmaLoader = new GLTFLoader();
vrmaLoader.register(p => new VRMAnimationLoaderPlugin(p));
const g = await new Promise((res, rej) => vrmaLoader.load(url, res, undefined, rej));
URL.revokeObjectURL(url);
return g.userData.vrmAnimations[0];
}
const idleAnim = await loadVRMA(vrmaData.idle);
const speakingAnim = await loadVRMA(vrmaData.speaking);
const idleClip = createVRMAnimationClip(idleAnim, vrm);
const speakingClip = createVRMAnimationClip(speakingAnim, vrm);
const idleAction = mixer.clipAction(idleClip);
const speakingAction = mixer.clipAction(speakingClip);
// LoopPingPong prevents visible seam when idle animation loops
idleAction.setLoop(THREE.LoopPingPong, Infinity);
speakingAction.setLoop(THREE.LoopRepeat, Infinity);
idleAction.setEffectiveWeight(1);
idleAction.play();
speakingAction.setEffectiveWeight(0);
speakingAction.play();
// Crossfade state
let isSpeaking = false;
let crossFadeProgress = 0; // 0 = idle, 1 = speaking
function transitionToSpeaking(instant) {
isSpeaking = true;
if (instant) crossFadeProgress = 1;
}
function transitionToIdle() {
isSpeaking = false;
}
function updateBoneWeights(delta) {
const target = isSpeaking ? 1 : 0;
if (Math.abs(crossFadeProgress - target) > 0.001) {
// Asymmetric crossfade: 0.8s into speaking, 1.0s back to idle
const duration = isSpeaking ? 0.8 : 1.0;
const step = delta / duration;
crossFadeProgress = target > crossFadeProgress
? Math.min(crossFadeProgress + step, 1)
: Math.max(crossFadeProgress - step, 0);
}
const t = crossFadeProgress;
const w = t * t * (3 - 2 * t); // smoothstep
speakingAction.setEffectiveWeight(w);
idleAction.setEffectiveWeight(1 - w);
}
// ============================================================
// Step 5.5: Idle Expression Generator
// ============================================================
// Procedural eye blinks + micro expressions when no audio is playing.
const idle = new lipsync.wasmModule.IdleExpressionGenerator();
let elapsedSeconds = 0;
let prevFrame = null;
// ============================================================
// Step 6: Frame Queue + Render Loop
// ============================================================
// Frames are consumed at 30fps regardless of monitor refresh rate.
const frameQueue = [];
let streamTimeAccum = 0;
const FRAME_INTERVAL = 1 / 30;
const clock = new THREE.Clock();
function animate() {
requestAnimationFrame(animate);
const delta = clock.getDelta();
elapsedSeconds += delta;
controls.update();
// Bone animation crossfade
updateBoneWeights(delta);
mixer.update(delta);
// Consume blendshape frames at 30fps
streamTimeAccum += delta;
while (streamTimeAccum >= FRAME_INTERVAL) {
streamTimeAccum -= FRAME_INTERVAL;
if (frameQueue.length > 0) {
prevFrame = frameQueue.shift();
if (useVrmMode) {
applyVrmBlendshapes(vrm, prevFrame);
} else {
applyBlendshapes(vrm, prevFrame);
}
}
}
// Idle expressions when queue is empty
if (frameQueue.length === 0 && !isSpeaking) {
const idleFrame = idle.get_frame(elapsedSeconds);
let frame = idleFrame;
if (prevFrame) {
frame = prevFrame.map((v, i) => v + 0.15 * ((idleFrame[i] || 0) - v));
prevFrame = frame;
}
if (useVrmMode) {
const vrmFrame = lipsync.wasmModule.convert_arkit_to_vrm(frame);
applyVrmBlendshapes(vrm, Array.from(vrmFrame));
} else {
applyBlendshapes(vrm, frame);
}
}
vrm.update(delta);
renderer.render(scene, camera);
}
animate();
// ============================================================
// Step 7: Audio File Playback (batch processing)
// ============================================================
document.getElementById('audio-file').addEventListener('change', async (e) => {
const file = e.target.files[0];
if (!file) return;
// Process blendshapes
const result = await lipsync.processFile(file);
// Fill frame queue
frameQueue.length = 0;
for (let i = 0; i < result.frame_count; i++) {
if (useVrmMode) {
frameQueue.push(lipsync.getVrmFrame(result, i));
} else {
frameQueue.push(lipsync.getFrame(result, i));
}
}
// Switch to speaking pose immediately
transitionToSpeaking(true);
// Play audio in sync
const arrayBuffer = await file.arrayBuffer();
const audioCtx = new AudioContext();
const audioBuffer = await audioCtx.decodeAudioData(arrayBuffer);
const source = audioCtx.createBufferSource();
source.buffer = audioBuffer;
source.connect(audioCtx.destination);
source.start();
source.onended = () => transitionToIdle();
});
</script>
</body>
</html>

Licensing
The first call to init() automatically starts a 30-day free trial (no signup, no API key). For production use, pass your license key:
await lipsync.init({ licenseKey: 'ggl_your_key_here' });

| | Free Trial | Licensed |
|---|---|---|
| Duration | 30 days from first use | Unlimited |
| Setup | None (automatic) | Pass licenseKey to init() |
| Domain restriction | None | Configurable per key |
| Features | Full access | Full access |
Contact GoodGang Labs for license keys.
API Reference
Constructor
new LipSyncWasmWrapper(options?: { wasmPath?: string })

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| wasmPath | string | './lipsync_wasm_v2.js' | Path to the WASM glue module |
Important: wasmPath is resolved relative to the HTML page, not the wrapper JS file.
- With bundlers (Vite, Webpack): the default './lipsync_wasm_v2.js' works automatically.
- Without a bundler (plain <script type="module">): use an absolute path:
  new LipSyncWasmWrapper({ wasmPath: '/node_modules/@goodganglabs/lipsync-wasm-v2/lipsync_wasm_v2.js' })
init(options?): Promise<InitResult>
Initializes the WASM runtime, loads the ONNX model, and applies the expression preset. ONNX Runtime must be loaded before calling this method.
| Option | Type | Default | Description |
|--------|------|---------|-------------|
| licenseKey | string | — | GoodGang Labs license key. Omit for 30-day free trial. |
| onProgress | (stage, percent) => void | — | Progress callback. Stages: 'wasm', 'license', 'decrypt', 'onnx' |
| preset | boolean \| string | true | true = built-in preset, URL string = custom preset JSON, false = disabled |
Returns { mode: 'v2-onnx' }. Throws if ONNX Runtime is not available.
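Because there is no fallback path when ONNX Runtime is missing, it can help to fail fast with a clear message before calling init(). A minimal sketch, assuming the CDN build (which exposes a global `ort`); `assertOrtLoaded` is a hypothetical helper, not part of the package:

```javascript
// Hypothetical guard: throw a descriptive error if the ONNX Runtime
// global (`ort`, provided by ort.min.js) has not been loaded yet.
function assertOrtLoaded() {
  if (typeof globalThis.ort === 'undefined') {
    throw new Error(
      'onnxruntime-web >= 1.17.0 must be loaded before lipsync.init() - ' +
      'include ort.min.js via a <script> tag before initializing.'
    );
  }
}
```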
Properties
| Property | Type | Description |
|----------|------|-------------|
| ready | boolean | true after init() completes |
| modelVersion | 'v2' | Always 'v2' |
| blendshapeDim | 52 | Output dimension per frame |
| wasmModule | object | Direct access to WASM exports (for IdleExpressionGenerator, convert_arkit_to_vrm) |
Processing Methods
| Method | Input | Output | Use Case |
|--------|-------|--------|----------|
| processFile(file) | File | Promise<ProcessResult> | Audio file upload |
| processAudio(audio) | Float32Array (16kHz) | Promise<ProcessResult> | Raw PCM buffer |
| processAudioBuffer(buf) | AudioBuffer | Promise<ProcessResult> | Web Audio API buffer |
| processAudioChunk(chunk, isLast?) | Float32Array | Promise<ProcessResult \| null> | Real-time streaming |
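processAudio() expects 16 kHz mono samples, but decodeAudioData() usually yields 44.1 or 48 kHz. If you cannot create the AudioContext at 16 kHz, a rough linear-interpolation downsampler can bridge the gap. This is a sketch, not part of the package; resampling through an OfflineAudioContext gives better quality:

```javascript
// Minimal linear-interpolation resampler: converts a mono Float32Array
// from `fromRate` Hz to the 16 kHz expected by processAudio().
// Illustrative only - a proper resampler avoids aliasing artifacts.
function resampleTo16k(samples, fromRate) {
  const toRate = 16000;
  if (fromRate === toRate) return samples;
  const outLen = Math.round(samples.length * toRate / fromRate);
  const out = new Float32Array(outLen);
  for (let i = 0; i < outLen; i++) {
    const pos = i * fromRate / toRate;          // fractional source index
    const i0 = Math.floor(pos);
    const i1 = Math.min(i0 + 1, samples.length - 1);
    const frac = pos - i0;
    out[i] = samples[i0] * (1 - frac) + samples[i1] * frac;
  }
  return out;
}
```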
Emotion Control
| Method | Description |
|--------|-------------|
| setEmotion(vec) | Set emotion vector [neutral, joy, anger, sadness, surprise]. Each value 0–1. Default: [0,0,0,0,0] |
| getEmotion() | Returns current emotion vector as number[5] |
| reInferWithEmotion(vec?) | Re-run ONNX inference on cached audio features with new emotion. No audio re-upload needed. Requires prior processAudio() / processFile() call. |
// Set emotion before processing (applies to next inference)
lipsync.setEmotion([0, 0.8, 0, 0, 0]); // joy at 80%
const result = await lipsync.processFile(audioFile);
// Change emotion in real-time (re-infers without re-uploading audio)
const joyResult = await lipsync.reInferWithEmotion([0, 1.0, 0, 0, 0]);
const angryResult = await lipsync.reInferWithEmotion([0, 0, 0.8, 0, 0]);

getFrame(result, frameIndex): number[]
Extracts a single frame from ProcessResult. Returns number[52].
getVrmFrame(result, frameIndex): number[]
Extracts a single VRM 18-dim frame from ProcessResult. Returns number[18] with VRM expression weights. The WASM engine automatically converts ARKit 52-dim → VRM 18-dim with natural triangle blinks.
Available when result.vrm_blendshapes exists (always present in batch/streaming results).
getVrmaBytes(): { idle: Uint8Array, speaking: Uint8Array }
Returns embedded VRMA bone animation data. Load with GLTFLoader + VRMAnimationLoaderPlugin (see Complete Example above).
getVrmExpressionNames(): string[]
Returns the 18 VRM expression names in order: ['aa', 'ih', 'ou', 'ee', 'oh', 'happy', 'angry', 'sad', 'relaxed', 'surprised', 'blink', 'blinkLeft', 'blinkRight', 'lookUp', 'lookDown', 'lookLeft', 'lookRight', 'neutral'].
reset(): void
Resets internal state and ends any active streaming session.
dispose(): void
Releases all WASM and ONNX resources.
ProcessResult
{
blendshapes: number[]; // Flat array: frame_count * 52 values
vrm_blendshapes?: number[]; // Flat array: frame_count * 18 VRM values (use getVrmFrame() to extract)
frame_count: number; // Number of 30fps frames
fps: number; // Always 30
mode: string; // 'v2-onnx' | 'v2-streaming-onnx'
}

ARKit Blendshape Index
Full 52-element index mapping:
| Index | Name | Index | Name |
|-------|------|-------|------|
| 0 | browDownLeft | 26 | mouthClose |
| 1 | browDownRight | 27 | mouthDimpleLeft |
| 2 | browInnerUp | 28 | mouthDimpleRight |
| 3 | browOuterUpLeft | 29 | mouthFrownLeft |
| 4 | browOuterUpRight | 30 | mouthFrownRight |
| 5 | cheekPuff | 31 | mouthFunnel |
| 6 | cheekSquintLeft | 32 | mouthLeft |
| 7 | cheekSquintRight | 33 | mouthLowerDownLeft |
| 8 | eyeBlinkLeft | 34 | mouthLowerDownRight |
| 9 | eyeBlinkRight | 35 | mouthPressLeft |
| 10 | eyeLookDownLeft | 36 | mouthPressRight |
| 11 | eyeLookDownRight | 37 | mouthPucker |
| 12 | eyeLookInLeft | 38 | mouthRight |
| 13 | eyeLookInRight | 39 | mouthRollLower |
| 14 | eyeLookOutLeft | 40 | mouthRollUpper |
| 15 | eyeLookOutRight | 41 | mouthShrugLower |
| 16 | eyeLookUpLeft | 42 | mouthShrugUpper |
| 17 | eyeLookUpRight | 43 | mouthSmileLeft |
| 18 | eyeSquintLeft | 44 | mouthSmileRight |
| 19 | eyeSquintRight | 45 | mouthStretchLeft |
| 20 | eyeWideLeft | 46 | mouthStretchRight |
| 21 | eyeWideRight | 47 | mouthUpperUpLeft |
| 22 | jawForward | 48 | mouthUpperUpRight |
| 23 | jawLeft | 49 | noseSneerLeft |
| 24 | jawOpen | 50 | noseSneerRight |
| 25 | jawRight | 51 | tongueOut |
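Positional frames are hard to eyeball when debugging. A small helper (hypothetical, not part of the package API) can key a frame by the names above:

```javascript
// Hypothetical debugging helper - converts a positional blendshape frame
// into a { name: weight } record using an ordered name list, e.g. the
// 52 ARKit names above or getVrmExpressionNames() for 18-dim VRM frames.
function frameToRecord(frame, names) {
  const record = {};
  names.forEach((name, i) => { record[name] = frame[i] ?? 0; });
  return record;
}

// Usage sketch (ARKIT_NAMES as in the complete example):
// const record = frameToRecord(lipsync.getFrame(result, 0), Object.values(ARKIT_NAMES));
// console.log(record.jawOpen, record.mouthSmileLeft);
```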
Advanced Features
Bone Animation Tips
The package embeds idle and speaking VRMA bone animations. Two key recommendations for smooth results:
1. Use LoopPingPong for idle animation — The idle clip's first and last keyframes don't perfectly match, so LoopRepeat causes a visible jump at the loop boundary. LoopPingPong (forward→backward→forward) eliminates this seam.
2. Use asymmetric crossfade durations — A slower transition into speaking (0.8s) feels more natural than an instant snap. The return to idle can be slightly slower (1.0s) for a relaxed feel. Apply smoothstep to the linear progress for ease-in/ease-out.
// Idle: PingPong to avoid loop seam
idleAction.setLoop(THREE.LoopPingPong, Infinity);
speakingAction.setLoop(THREE.LoopRepeat, Infinity);
// Asymmetric crossfade: 0.8s into speaking, 1.0s back to idle
function updateBoneWeights(delta) {
const target = isSpeaking ? 1 : 0;
const duration = isSpeaking ? 0.8 : 1.0;
const step = delta / duration;
crossFadeProgress = target > crossFadeProgress
? Math.min(crossFadeProgress + step, 1)
: Math.max(crossFadeProgress - step, 0);
const w = crossFadeProgress * crossFadeProgress * (3 - 2 * crossFadeProgress); // smoothstep
speakingAction.setEffectiveWeight(w);
idleAction.setEffectiveWeight(1 - w);
}

IdleExpressionGenerator
Procedural idle animation: eye blinks (2.5–4.5s random interval, 15% double-blink), micro expressions (sinusoidal). See Step 5.5 in the Complete Example.
const idle = new lipsync.wasmModule.IdleExpressionGenerator();
// In render loop (when no audio is playing):
const frame = idle.get_frame(elapsedSeconds); // number[52] (ARKit)
// For VRM mode, convert to 18-dim:
if (useVrmMode) {
const vrmFrame = lipsync.wasmModule.convert_arkit_to_vrm(frame);
applyVrmBlendshapes(vrm, Array.from(vrmFrame));
} else {
applyBlendshapes(vrm, frame);
}

Real-time Microphone Streaming
Use AudioWorklet to batch 1600 samples (100ms @ 16kHz), feed to processAudioChunk(), push frames to the queue. The render loop (Step 6 in the Complete Example) consumes them at 30fps automatically.
const stream = await navigator.mediaDevices.getUserMedia({
audio: { sampleRate: 16000, channelCount: 1, echoCancellation: true }
});
const audioCtx = new AudioContext({ sampleRate: 16000 });
const source = audioCtx.createMediaStreamSource(stream);
// AudioWorklet batches 128-sample inputs into 1600-sample chunks
const workletCode = `
class MicProcessor extends AudioWorkletProcessor {
constructor() { super(); this.buf = []; this.len = 0; }
process(inputs) {
const d = inputs[0][0];
if (d) { this.buf.push(new Float32Array(d)); this.len += d.length; }
if (this.len >= 1600) {
const out = new Float32Array(this.len);
let off = 0;
for (const b of this.buf) { out.set(b, off); off += b.length; }
this.port.postMessage(out);
this.buf = []; this.len = 0;
}
return true;
}
}
registerProcessor('mic-processor', MicProcessor);
`;
const blob = new Blob([workletCode], { type: 'application/javascript' });
await audioCtx.audioWorklet.addModule(URL.createObjectURL(blob));
const worklet = new AudioWorkletNode(audioCtx, 'mic-processor');
source.connect(worklet);
transitionToSpeaking(false);
worklet.port.onmessage = async (e) => {
const result = await lipsync.processAudioChunk(e.data);
if (result) {
for (let i = 0; i < result.frame_count; i++) {
if (useVrmMode) {
frameQueue.push(lipsync.getVrmFrame(result, i));
} else {
frameQueue.push(lipsync.getFrame(result, i));
}
}
}
};
// To stop: stream.getTracks().forEach(t => t.stop());
// audioCtx.close(); lipsync.reset(); transitionToIdle();

TTS Streaming Integration
When processing pre-generated TTS audio, slice into 100ms chunks and yield to the main thread periodically to prevent render freezes:
async function processTTSAudio(audioFloat32) {
const chunkSize = 1600; // 100ms @ 16kHz
const totalChunks = Math.ceil(audioFloat32.length / chunkSize);
for (let i = 0; i < totalChunks; i++) {
const start = i * chunkSize;
const chunk = audioFloat32.slice(start, start + chunkSize);
const isLast = (i === totalChunks - 1);
const result = await lipsync.processAudioChunk(chunk, isLast);
if (result) {
for (let j = 0; j < result.frame_count; j++) {
if (useVrmMode) {
frameQueue.push(lipsync.getVrmFrame(result, j));
} else {
frameQueue.push(lipsync.getFrame(result, j));
}
}
}
// Yield every 3 chunks (~300ms) to keep rAF rendering smooth
if ((i + 1) % 3 === 0) await new Promise(r => setTimeout(r, 0));
}
}

Bundler Setup
Vite
Works out of the box.
Webpack
// webpack.config.js
module.exports = {
experiments: { asyncWebAssembly: true },
};

Plain HTML (no bundler)
<script type="module">
import { LipSyncWasmWrapper }
from './node_modules/@goodganglabs/lipsync-wasm-v2/lipsync-wasm-wrapper.js';
// IMPORTANT: wasmPath must be absolute (resolved from HTML page, not JS file)
const lipsync = new LipSyncWasmWrapper({
wasmPath: '/node_modules/@goodganglabs/lipsync-wasm-v2/lipsync_wasm_v2.js'
});
await lipsync.init();
</script>

CDN
<script type="importmap">
{ "imports": {
"@goodganglabs/lipsync-wasm-v2": "https://your-cdn.com/lipsync-wasm-v2/lipsync-wasm-wrapper.js"
}}
</script>
<script src="https://cdn.jsdelivr.net/npm/[email protected]/dist/ort.min.js"></script>
<script type="module">
import { LipSyncWasmWrapper } from '@goodganglabs/lipsync-wasm-v2';
const lipsync = new LipSyncWasmWrapper({
wasmPath: 'https://your-cdn.com/lipsync-wasm-v2/lipsync_wasm_v2.js'
});
</script>

Deployment Notes
- .wasm files must be served with Content-Type: application/wasm
- CORS headers required for cross-origin WASM loading
- ONNX Runtime Web must be loaded before init() is called
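For a hand-rolled Node static server, the Content-Type note boils down to an extension-to-MIME lookup. A sketch (hypothetical helper; most frameworks and recent express.static versions already map .wasm correctly):

```javascript
// Hypothetical MIME lookup for the file types this package ships.
// application/wasm is required for WebAssembly.instantiateStreaming().
const MIME_BY_EXT = {
  '.wasm': 'application/wasm',
  '.js':   'text/javascript',
  '.onnx': 'application/octet-stream',
  '.json': 'application/json',
};
function contentTypeFor(filePath) {
  const ext = filePath.slice(filePath.lastIndexOf('.'));
  return MIME_BY_EXT[ext] || 'application/octet-stream';
}
```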
License
Proprietary — GoodGang Labs
