@goodganglabs/lipsync-wasm-v2 (v0.4.10)
WebAssembly-based real-time audio-to-blendshape lip sync engine. Converts 16kHz PCM audio into 52-dimensional ARKit-compatible blendshape frames at 30fps using a student distillation model.
Which Version?
| | V2 (this package) | V1 |
|---|---|---|
| Dimensions | 52-dim ARKit | 111-dim ARKit |
| Model | Student distillation | Phoneme classification |
| Emotion | 5-dim emotion conditioning (neutral, joy, anger, sadness, surprise) | Not available |
| Post-processing | Model-integrated | Manual |
| Idle expression | Built-in IdleExpressionGenerator | Built-in IdleExpressionGenerator |
| VAD | Not included | Built-in VoiceActivityDetector |
| ONNX fallback | None (ONNX required) | Heuristic fallback |
| Recommendation | Most use cases | Full expression control needed |
Features
- 52-dim ARKit blendshape output (direct prediction, no intermediate phoneme step)
- VRM 18-dim blendshape output (automatic ARKit→VRM conversion)
- Emotion-conditioned inference — 5-dim vector: neutral, joy, anger, sadness, surprise
- Real-time emotion switching — reInferWithEmotion() re-runs inference without re-uploading audio
- Streaming ONNX model with LSTM state carry (chunk_size=5, ~167ms latency)
- Built-in idle expression generator (eye blinks + micro expressions)
- Batch and real-time streaming processing
- Built-in expression preset blending
- Embedded VRMA bone animation data (idle + speaking)
- 30-day free trial (no license key required)
- Runs entirely in the browser via WebAssembly
Requirements
- onnxruntime-web >= 1.17.0 (required — V2 has no heuristic fallback)

Load it before initializing, e.g. via CDN:

<script src="https://cdn.jsdelivr.net/npm/[email protected]/dist/ort.min.js"></script>

Installation

npm install @goodganglabs/lipsync-wasm-v2

Quick Start
Minimal Example (Batch Processing)
import { LipSyncWasmWrapper } from '@goodganglabs/lipsync-wasm-v2';
const lipsync = new LipSyncWasmWrapper();
await lipsync.init();
// Process an audio file
const result = await lipsync.processFile(audioFile);
// Each frame is a number[52] array of ARKit blendshape weights
for (let i = 0; i < result.frame_count; i++) {
const frame = lipsync.getFrame(result, i);
applyToAvatar(frame); // your rendering code
}
lipsync.dispose();

Complete Working Example (Three.js + VRM)
Copy-paste ready. This example handles everything: VRM loading, VRMA bone animations (idle/speaking crossfade), blendshape application, 30fps frame consumption, and audio-synced playback.
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<script type="importmap">
{ "imports": {
"three": "https://cdn.jsdelivr.net/npm/[email protected]/build/three.module.js",
"three/addons/": "https://cdn.jsdelivr.net/npm/[email protected]/examples/jsm/",
"@pixiv/three-vrm": "https://cdn.jsdelivr.net/npm/@pixiv/[email protected]/lib/three-vrm.module.min.js",
"@pixiv/three-vrm-animation": "https://cdn.jsdelivr.net/npm/@pixiv/[email protected]/lib/three-vrm-animation.module.min.js"
}}
</script>
<script src="https://cdn.jsdelivr.net/npm/[email protected]/dist/ort.min.js"></script>
</head>
<body>
<canvas id="avatar-canvas" style="width:100%; height:500px;"></canvas>
<input type="file" id="audio-file" accept="audio/*">
<script type="module">
import * as THREE from 'three';
import { GLTFLoader } from 'three/addons/loaders/GLTFLoader.js';
import { OrbitControls } from 'three/addons/controls/OrbitControls.js';
import { VRMLoaderPlugin, VRMUtils } from '@pixiv/three-vrm';
import { VRMAnimationLoaderPlugin, createVRMAnimationClip } from '@pixiv/three-vrm-animation';
import { LipSyncWasmWrapper } from '@goodganglabs/lipsync-wasm-v2';
// ============================================================
// Step 1: ARKit Blendshape Mapping (52-dim)
// ============================================================
const ARKIT_NAMES = {
0:'browDownLeft',1:'browDownRight',2:'browInnerUp',3:'browOuterUpLeft',4:'browOuterUpRight',
5:'cheekPuff',6:'cheekSquintLeft',7:'cheekSquintRight',8:'eyeBlinkLeft',9:'eyeBlinkRight',
10:'eyeLookDownLeft',11:'eyeLookDownRight',12:'eyeLookInLeft',13:'eyeLookInRight',
14:'eyeLookOutLeft',15:'eyeLookOutRight',16:'eyeLookUpLeft',17:'eyeLookUpRight',
18:'eyeSquintLeft',19:'eyeSquintRight',20:'eyeWideLeft',21:'eyeWideRight',
22:'jawForward',23:'jawLeft',24:'jawOpen',25:'jawRight',
26:'mouthClose',27:'mouthDimpleLeft',28:'mouthDimpleRight',
29:'mouthFrownLeft',30:'mouthFrownRight',31:'mouthFunnel',
32:'mouthLeft',33:'mouthLowerDownLeft',34:'mouthLowerDownRight',
35:'mouthPressLeft',36:'mouthPressRight',37:'mouthPucker',
38:'mouthRight',39:'mouthRollLower',40:'mouthRollUpper',
41:'mouthShrugLower',42:'mouthShrugUpper',43:'mouthSmileLeft',44:'mouthSmileRight',
45:'mouthStretchLeft',46:'mouthStretchRight',47:'mouthUpperUpLeft',48:'mouthUpperUpRight',
49:'noseSneerLeft',50:'noseSneerRight',51:'tongueOut'
};
function applyBlendshapes(vrm, frame) {
if (!vrm?.expressionManager) return;
for (const [idx, name] of Object.entries(ARKIT_NAMES)) {
vrm.expressionManager.setValue(name, frame[idx] || 0);
}
}
// ============================================================
// Step 2: Three.js Scene
// ============================================================
const canvas = document.getElementById('avatar-canvas');
const scene = new THREE.Scene();
scene.background = new THREE.Color(0x1a1a2e);
const camera = new THREE.PerspectiveCamera(30, canvas.clientWidth / canvas.clientHeight, 0.1, 100);
camera.position.set(0, 1.25, 0.5);
const renderer = new THREE.WebGLRenderer({ canvas, antialias: true });
renderer.setSize(canvas.clientWidth, canvas.clientHeight);
renderer.setPixelRatio(Math.min(window.devicePixelRatio, 2));
const controls = new OrbitControls(camera, canvas);
controls.target.set(0, 1.25, 0);
controls.enableDamping = true;
scene.add(new THREE.AmbientLight(0xffffff, 2.0));
const dirLight = new THREE.DirectionalLight(0xffffff, 1.1);
dirLight.position.set(1, 3, 2);
scene.add(dirLight);
// ============================================================
// Step 3: Load VRM Avatar
// ============================================================
const loader = new GLTFLoader();
loader.register(p => new VRMLoaderPlugin(p));
const gltf = await new Promise((res, rej) => loader.load('your-avatar.vrm', res, undefined, rej));
const vrm = gltf.userData.vrm;
VRMUtils.removeUnnecessaryVertices(gltf.scene);
VRMUtils.removeUnnecessaryJoints(gltf.scene);
scene.add(vrm.scene);
const mixer = new THREE.AnimationMixer(vrm.scene);
// ============================================================
// Step 3.5: Detect VRM Mode (ARKit 52-dim vs VRM 18-dim)
// ============================================================
// VRoid Hub models use VRM expressions (aa, ih, ou, ee, oh, blink, etc.)
// instead of ARKit names (jawOpen, eyeBlinkLeft, etc.).
// Detect which format the model supports to apply the correct blendshapes.
const VRM_NAMES = [
'aa','ih','ou','ee','oh', // lip-sync (5)
'happy','angry','sad','relaxed','surprised', // emotions (5)
'blink','blinkLeft','blinkRight', // blink (3)
'lookUp','lookDown','lookLeft','lookRight', // gaze (4)
'neutral' // base (1)
];
let useVrmMode = false;
function detectVrmMode() {
if (!vrm?.expressionManager) return false;
const exprMap = vrm.expressionManager.expressionMap || vrm.expressionManager._expressionMap || {};
const names = Object.keys(exprMap);
const arkitProbes = ['jawOpen','mouthFunnel','mouthPucker','eyeBlinkLeft','eyeBlinkRight'];
const vrmProbes = ['aa','ih','ou','ee','oh'];
const hasArkit = arkitProbes.filter(n => names.includes(n)).length >= 3;
const hasVrm = vrmProbes.filter(n => names.includes(n)).length >= 3;
return !hasArkit && hasVrm;
}
useVrmMode = detectVrmMode();
console.log('VRM mode:', useVrmMode);
function applyVrmBlendshapes(vrm, vrmFrame) {
if (!vrm?.expressionManager) return;
for (let i = 0; i < VRM_NAMES.length; i++) {
vrm.expressionManager.setValue(VRM_NAMES[i], vrmFrame[i] || 0);
}
}
// ============================================================
// Step 4: Init LipSync
// ============================================================
const lipsync = new LipSyncWasmWrapper();
// For production, pass your license key:
// await lipsync.init({ licenseKey: 'ggl_your_key_here' });
await lipsync.init({
onProgress: (stage, pct) => console.log(`Init: ${stage} ${pct}%`)
});
// ============================================================
// Step 5: Load VRMA Bone Animations (idle + speaking)
// ============================================================
// The package embeds two VRMA animations: idle pose and speaking pose.
// Use AnimationMixer to crossfade between them when audio plays.
const vrmaData = lipsync.getVrmaBytes();
async function loadVRMA(bytes) {
const blob = new Blob([bytes], { type: 'application/octet-stream' });
const url = URL.createObjectURL(blob);
const vrmaLoader = new GLTFLoader();
vrmaLoader.register(p => new VRMAnimationLoaderPlugin(p));
const g = await new Promise((res, rej) => vrmaLoader.load(url, res, undefined, rej));
URL.revokeObjectURL(url);
return g.userData.vrmAnimations[0];
}
const idleAnim = await loadVRMA(vrmaData.idle);
const speakingAnim = await loadVRMA(vrmaData.speaking);
const idleClip = createVRMAnimationClip(idleAnim, vrm);
const speakingClip = createVRMAnimationClip(speakingAnim, vrm);
const idleAction = mixer.clipAction(idleClip);
const speakingAction = mixer.clipAction(speakingClip);
// LoopPingPong prevents visible seam when idle animation loops
idleAction.setLoop(THREE.LoopPingPong, Infinity);
speakingAction.setLoop(THREE.LoopRepeat, Infinity);
idleAction.setEffectiveWeight(1);
idleAction.play();
speakingAction.setEffectiveWeight(0);
speakingAction.play();
// Crossfade state
let isSpeaking = false;
let crossFadeProgress = 0; // 0 = idle, 1 = speaking
function transitionToSpeaking(instant) {
isSpeaking = true;
if (instant) crossFadeProgress = 1;
}
function transitionToIdle() {
isSpeaking = false;
}
function updateBoneWeights(delta) {
const target = isSpeaking ? 1 : 0;
if (Math.abs(crossFadeProgress - target) > 0.001) {
// Asymmetric crossfade: 0.8s into speaking, 1.0s back to idle
const duration = isSpeaking ? 0.8 : 1.0;
const step = delta / duration;
crossFadeProgress = target > crossFadeProgress
? Math.min(crossFadeProgress + step, 1)
: Math.max(crossFadeProgress - step, 0);
}
const t = crossFadeProgress;
const w = t * t * (3 - 2 * t); // smoothstep
speakingAction.setEffectiveWeight(w);
idleAction.setEffectiveWeight(1 - w);
}
// ============================================================
// Step 5.5: Idle Expression Generator
// ============================================================
// Procedural eye blinks + micro expressions when no audio is playing.
const idle = new lipsync.wasmModule.IdleExpressionGenerator();
let elapsedSeconds = 0;
let prevFrame = null;
// ============================================================
// Step 6: Frame Queue + Render Loop
// ============================================================
// Frames are consumed at 30fps regardless of monitor refresh rate.
const frameQueue = [];
let streamTimeAccum = 0;
const FRAME_INTERVAL = 1 / 30;
const clock = new THREE.Clock();
function animate() {
requestAnimationFrame(animate);
const delta = clock.getDelta();
elapsedSeconds += delta;
controls.update();
// Bone animation crossfade
updateBoneWeights(delta);
mixer.update(delta);
// Consume blendshape frames at 30fps
streamTimeAccum += delta;
while (streamTimeAccum >= FRAME_INTERVAL) {
streamTimeAccum -= FRAME_INTERVAL;
if (frameQueue.length > 0) {
prevFrame = frameQueue.shift();
if (useVrmMode) {
applyVrmBlendshapes(vrm, prevFrame);
} else {
applyBlendshapes(vrm, prevFrame);
}
}
}
// Idle expressions when queue is empty
if (frameQueue.length === 0 && !isSpeaking) {
const idleFrame = idle.get_frame(elapsedSeconds);
let frame = idleFrame;
if (prevFrame) {
frame = prevFrame.map((v, i) => v + 0.15 * ((idleFrame[i] || 0) - v));
prevFrame = frame;
}
if (useVrmMode) {
const vrmFrame = lipsync.wasmModule.convert_arkit_to_vrm(frame);
applyVrmBlendshapes(vrm, Array.from(vrmFrame));
} else {
applyBlendshapes(vrm, frame);
}
}
vrm.update(delta);
renderer.render(scene, camera);
}
animate();
// ============================================================
// Step 7: Audio File Playback (batch processing)
// ============================================================
document.getElementById('audio-file').addEventListener('change', async (e) => {
const file = e.target.files[0];
if (!file) return;
// Process blendshapes
const result = await lipsync.processFile(file);
// Fill frame queue
frameQueue.length = 0;
for (let i = 0; i < result.frame_count; i++) {
if (useVrmMode) {
frameQueue.push(lipsync.getVrmFrame(result, i));
} else {
frameQueue.push(lipsync.getFrame(result, i));
}
}
// Switch to speaking pose immediately
transitionToSpeaking(true);
// Play audio in sync
const arrayBuffer = await file.arrayBuffer();
const audioCtx = new AudioContext();
const audioBuffer = await audioCtx.decodeAudioData(arrayBuffer);
const source = audioCtx.createBufferSource();
source.buffer = audioBuffer;
source.connect(audioCtx.destination);
source.start();
source.onended = () => transitionToIdle();
});
</script>
</body>
</html>

Licensing
The first call to init() automatically starts a 30-day free trial (no signup, no API key). For production use, pass your license key:
await lipsync.init({ licenseKey: 'ggl_your_key_here' });

| | Free Trial | Licensed |
|---|---|---|
| Duration | 30 days from first use | Unlimited |
| Setup | None (automatic) | Pass licenseKey to init() |
| Domain restriction | None | Configurable per key |
| Features | Full access | Full access |
Contact GoodGang Labs for license keys.
API Reference
Constructor
new LipSyncWasmWrapper(options?: { wasmPath?: string })

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| wasmPath | string | './lipsync_wasm_v2.js' | Path to the WASM glue module |
Important: wasmPath is resolved relative to the HTML page, not the wrapper JS file.
- With bundlers (Vite, Webpack): the default './lipsync_wasm_v2.js' works automatically.
- Without a bundler (plain <script type="module">): use an absolute path:
  new LipSyncWasmWrapper({ wasmPath: '/node_modules/@goodganglabs/lipsync-wasm-v2/lipsync_wasm_v2.js' })
init(options?): Promise<InitResult>
Initializes the WASM runtime, loads the ONNX model, and applies the expression preset. ONNX Runtime must be loaded before calling this method.
| Option | Type | Default | Description |
|--------|------|---------|-------------|
| licenseKey | string | — | GoodGang Labs license key. Omit for 30-day free trial. |
| onProgress | (stage, percent) => void | — | Progress callback. Stages: 'wasm', 'license', 'decrypt', 'onnx' |
| preset | boolean \| string | true | true = built-in preset, URL string = custom preset JSON, false = disabled |
Returns { mode: 'v2-onnx' }. Throws if ONNX Runtime is not available.
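Because there is no fallback path when ONNX Runtime is missing, it can help to fail fast with a clear message before calling init(). A minimal sketch, assuming the CDN build (which exposes a global `ort`); `assertOrtLoaded` is a hypothetical helper, not part of the package:

```javascript
// Hypothetical guard: throw a descriptive error if the ONNX Runtime
// global (`ort`, provided by ort.min.js) has not been loaded yet.
function assertOrtLoaded() {
  if (typeof globalThis.ort === 'undefined') {
    throw new Error(
      'onnxruntime-web >= 1.17.0 must be loaded before lipsync.init() - ' +
      'include ort.min.js via a <script> tag before initializing.'
    );
  }
}
```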
Properties
| Property | Type | Description |
|----------|------|-------------|
| ready | boolean | true after init() completes |
| modelVersion | 'v2' | Always 'v2' |
| blendshapeDim | 52 | Output dimension per frame |
| wasmModule | object | Direct access to WASM exports (for IdleExpressionGenerator, convert_arkit_to_vrm) |
Processing Methods
| Method | Input | Output | Use Case |
|--------|-------|--------|----------|
| processFile(file) | File | Promise<ProcessResult> | Audio file upload |
| processAudio(audio) | Float32Array (16kHz) | Promise<ProcessResult> | Raw PCM buffer |
| processAudioBuffer(buf) | AudioBuffer | Promise<ProcessResult> | Web Audio API buffer |
| processAudioChunk(chunk, isLast?) | Float32Array | Promise<ProcessResult \| null> | Real-time streaming |
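processAudio() expects 16 kHz mono samples, but decodeAudioData() usually yields 44.1 or 48 kHz. If you cannot create the AudioContext at 16 kHz, a rough linear-interpolation downsampler can bridge the gap. This is a sketch, not part of the package; resampling through an OfflineAudioContext gives better quality:

```javascript
// Minimal linear-interpolation resampler: converts a mono Float32Array
// from `fromRate` Hz to the 16 kHz expected by processAudio().
// Illustrative only - a proper resampler avoids aliasing artifacts.
function resampleTo16k(samples, fromRate) {
  const toRate = 16000;
  if (fromRate === toRate) return samples;
  const outLen = Math.round(samples.length * toRate / fromRate);
  const out = new Float32Array(outLen);
  for (let i = 0; i < outLen; i++) {
    const pos = i * fromRate / toRate;          // fractional source index
    const i0 = Math.floor(pos);
    const i1 = Math.min(i0 + 1, samples.length - 1);
    const frac = pos - i0;
    out[i] = samples[i0] * (1 - frac) + samples[i1] * frac;
  }
  return out;
}
```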
Emotion Control
| Method | Description |
|--------|-------------|
| setEmotion(vec) | Set emotion vector [neutral, joy, anger, sadness, surprise]. Each value 0–1. Default: [0,0,0,0,0] |
| getEmotion() | Returns current emotion vector as number[5] |
| reInferWithEmotion(vec?) | Re-run ONNX inference on cached audio features with new emotion. No audio re-upload needed. Requires prior processAudio() / processFile() call. |
// Set emotion before processing (applies to next inference)
lipsync.setEmotion([0, 0.8, 0, 0, 0]); // joy at 80%
const result = await lipsync.processFile(audioFile);
// Change emotion in real-time (re-infers without re-uploading audio)
const joyResult = await lipsync.reInferWithEmotion([0, 1.0, 0, 0, 0]);
const angryResult = await lipsync.reInferWithEmotion([0, 0, 0.8, 0, 0]);

getFrame(result, frameIndex): number[]
Extracts a single frame from ProcessResult. Returns number[52].
getVrmFrame(result, frameIndex): number[]
Extracts a single VRM 18-dim frame from ProcessResult. Returns number[18] with VRM expression weights. The WASM engine automatically converts ARKit 52-dim → VRM 18-dim with natural triangle blinks.
Available when result.vrm_blendshapes exists (always present in batch/streaming results).
getVrmaBytes(): { idle: Uint8Array, speaking: Uint8Array }
Returns embedded VRMA bone animation data. Load with GLTFLoader + VRMAnimationLoaderPlugin (see Complete Example above).
getVrmExpressionNames(): string[]
Returns the 18 VRM expression names in order: ['aa', 'ih', 'ou', 'ee', 'oh', 'happy', 'angry', 'sad', 'relaxed', 'surprised', 'blink', 'blinkLeft', 'blinkRight', 'lookUp', 'lookDown', 'lookLeft', 'lookRight', 'neutral'].
reset(): void
Resets internal state and ends any active streaming session.
dispose(): void
Releases all WASM and ONNX resources.
ProcessResult
{
blendshapes: number[]; // Flat array: frame_count * 52 values
vrm_blendshapes?: number[]; // Flat array: frame_count * 18 VRM values (use getVrmFrame() to extract)
frame_count: number; // Number of 30fps frames
fps: number; // Always 30
mode: string; // 'v2-onnx' | 'v2-streaming-onnx'
}

ARKit Blendshape Index
Full 52-element index mapping:
| Index | Name | Index | Name |
|-------|------|-------|------|
| 0 | browDownLeft | 26 | mouthClose |
| 1 | browDownRight | 27 | mouthDimpleLeft |
| 2 | browInnerUp | 28 | mouthDimpleRight |
| 3 | browOuterUpLeft | 29 | mouthFrownLeft |
| 4 | browOuterUpRight | 30 | mouthFrownRight |
| 5 | cheekPuff | 31 | mouthFunnel |
| 6 | cheekSquintLeft | 32 | mouthLeft |
| 7 | cheekSquintRight | 33 | mouthLowerDownLeft |
| 8 | eyeBlinkLeft | 34 | mouthLowerDownRight |
| 9 | eyeBlinkRight | 35 | mouthPressLeft |
| 10 | eyeLookDownLeft | 36 | mouthPressRight |
| 11 | eyeLookDownRight | 37 | mouthPucker |
| 12 | eyeLookInLeft | 38 | mouthRight |
| 13 | eyeLookInRight | 39 | mouthRollLower |
| 14 | eyeLookOutLeft | 40 | mouthRollUpper |
| 15 | eyeLookOutRight | 41 | mouthShrugLower |
| 16 | eyeLookUpLeft | 42 | mouthShrugUpper |
| 17 | eyeLookUpRight | 43 | mouthSmileLeft |
| 18 | eyeSquintLeft | 44 | mouthSmileRight |
| 19 | eyeSquintRight | 45 | mouthStretchLeft |
| 20 | eyeWideLeft | 46 | mouthStretchRight |
| 21 | eyeWideRight | 47 | mouthUpperUpLeft |
| 22 | jawForward | 48 | mouthUpperUpRight |
| 23 | jawLeft | 49 | noseSneerLeft |
| 24 | jawOpen | 50 | noseSneerRight |
| 25 | jawRight | 51 | tongueOut |
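Positional frames are hard to eyeball when debugging. A small helper (hypothetical, not part of the package API) can key a frame by the names above:

```javascript
// Hypothetical debugging helper - converts a positional blendshape frame
// into a { name: weight } record using an ordered name list, e.g. the
// 52 ARKit names above or getVrmExpressionNames() for 18-dim VRM frames.
function frameToRecord(frame, names) {
  const record = {};
  names.forEach((name, i) => { record[name] = frame[i] ?? 0; });
  return record;
}

// Usage sketch (ARKIT_NAMES as in the complete example):
// const record = frameToRecord(lipsync.getFrame(result, 0), Object.values(ARKIT_NAMES));
// console.log(record.jawOpen, record.mouthSmileLeft);
```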
Advanced Features
Bone Animation Tips
The package embeds idle and speaking VRMA bone animations. Two key recommendations for smooth results:
1. Use LoopPingPong for idle animation — The idle clip's first and last keyframes don't perfectly match, so LoopRepeat causes a visible jump at the loop boundary. LoopPingPong (forward→backward→forward) eliminates this seam.
2. Use asymmetric crossfade durations — A slower transition into speaking (0.8s) feels more natural than an instant snap. The return to idle can be slightly slower (1.0s) for a relaxed feel. Apply smoothstep to the linear progress for ease-in/ease-out.
// Idle: PingPong to avoid loop seam
idleAction.setLoop(THREE.LoopPingPong, Infinity);
speakingAction.setLoop(THREE.LoopRepeat, Infinity);
// Asymmetric crossfade: 0.8s into speaking, 1.0s back to idle
function updateBoneWeights(delta) {
const target = isSpeaking ? 1 : 0;
const duration = isSpeaking ? 0.8 : 1.0;
const step = delta / duration;
crossFadeProgress = target > crossFadeProgress
? Math.min(crossFadeProgress + step, 1)
: Math.max(crossFadeProgress - step, 0);
const w = crossFadeProgress * crossFadeProgress * (3 - 2 * crossFadeProgress); // smoothstep
speakingAction.setEffectiveWeight(w);
idleAction.setEffectiveWeight(1 - w);
}

IdleExpressionGenerator
Procedural idle animation: eye blinks (2.5–4.5s random interval, 15% double-blink), micro expressions (sinusoidal). See Step 5.5 in the Complete Example.
const idle = new lipsync.wasmModule.IdleExpressionGenerator();
// In render loop (when no audio is playing):
const frame = idle.get_frame(elapsedSeconds); // number[52] (ARKit)
// For VRM mode, convert to 18-dim:
if (useVrmMode) {
const vrmFrame = lipsync.wasmModule.convert_arkit_to_vrm(frame);
applyVrmBlendshapes(vrm, Array.from(vrmFrame));
} else {
applyBlendshapes(vrm, frame);
}

Real-time Microphone Streaming
Use AudioWorklet to batch 1600 samples (100ms @ 16kHz), feed to processAudioChunk(), push frames to the queue. The render loop (Step 6 in the Complete Example) consumes them at 30fps automatically.
const stream = await navigator.mediaDevices.getUserMedia({
audio: { sampleRate: 16000, channelCount: 1, echoCancellation: true }
});
const audioCtx = new AudioContext({ sampleRate: 16000 });
const source = audioCtx.createMediaStreamSource(stream);
// AudioWorklet batches 128-sample inputs into 1600-sample chunks
const workletCode = `
class MicProcessor extends AudioWorkletProcessor {
constructor() { super(); this.buf = []; this.len = 0; }
process(inputs) {
const d = inputs[0][0];
if (d) { this.buf.push(new Float32Array(d)); this.len += d.length; }
if (this.len >= 1600) {
const out = new Float32Array(this.len);
let off = 0;
for (const b of this.buf) { out.set(b, off); off += b.length; }
this.port.postMessage(out);
this.buf = []; this.len = 0;
}
return true;
}
}
registerProcessor('mic-processor', MicProcessor);
`;
const blob = new Blob([workletCode], { type: 'application/javascript' });
await audioCtx.audioWorklet.addModule(URL.createObjectURL(blob));
const worklet = new AudioWorkletNode(audioCtx, 'mic-processor');
source.connect(worklet);
transitionToSpeaking(false);
worklet.port.onmessage = async (e) => {
const result = await lipsync.processAudioChunk(e.data);
if (result) {
for (let i = 0; i < result.frame_count; i++) {
if (useVrmMode) {
frameQueue.push(lipsync.getVrmFrame(result, i));
} else {
frameQueue.push(lipsync.getFrame(result, i));
}
}
}
};
// To stop: stream.getTracks().forEach(t => t.stop());
// audioCtx.close(); lipsync.reset(); transitionToIdle();

TTS Streaming Integration
When processing pre-generated TTS audio, slice into 100ms chunks and yield to the main thread periodically to prevent render freezes:
async function processTTSAudio(audioFloat32) {
const chunkSize = 1600; // 100ms @ 16kHz
const totalChunks = Math.ceil(audioFloat32.length / chunkSize);
for (let i = 0; i < totalChunks; i++) {
const start = i * chunkSize;
const chunk = audioFloat32.slice(start, start + chunkSize);
const isLast = (i === totalChunks - 1);
const result = await lipsync.processAudioChunk(chunk, isLast);
if (result) {
for (let j = 0; j < result.frame_count; j++) {
if (useVrmMode) {
frameQueue.push(lipsync.getVrmFrame(result, j));
} else {
frameQueue.push(lipsync.getFrame(result, j));
}
}
}
// Yield every 3 chunks (~300ms) to keep rAF rendering smooth
if ((i + 1) % 3 === 0) await new Promise(r => setTimeout(r, 0));
}
}

Bundler Setup
Vite
Works out of the box.
Webpack
// webpack.config.js
module.exports = {
experiments: { asyncWebAssembly: true },
};

Plain HTML (no bundler)
<script type="module">
import { LipSyncWasmWrapper }
from './node_modules/@goodganglabs/lipsync-wasm-v2/lipsync-wasm-wrapper.js';
// IMPORTANT: wasmPath must be absolute (resolved from HTML page, not JS file)
const lipsync = new LipSyncWasmWrapper({
wasmPath: '/node_modules/@goodganglabs/lipsync-wasm-v2/lipsync_wasm_v2.js'
});
await lipsync.init();
</script>

CDN
<script type="importmap">
{ "imports": {
"@goodganglabs/lipsync-wasm-v2": "https://your-cdn.com/lipsync-wasm-v2/lipsync-wasm-wrapper.js"
}}
</script>
<script src="https://cdn.jsdelivr.net/npm/[email protected]/dist/ort.min.js"></script>
<script type="module">
import { LipSyncWasmWrapper } from '@goodganglabs/lipsync-wasm-v2';
const lipsync = new LipSyncWasmWrapper({
wasmPath: 'https://your-cdn.com/lipsync-wasm-v2/lipsync_wasm_v2.js'
});
</script>

Deployment Notes
- .wasm files must be served with Content-Type: application/wasm
- CORS headers required for cross-origin WASM loading
- ONNX Runtime Web must be loaded before init() is called
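For a hand-rolled Node static server, the Content-Type note boils down to an extension-to-MIME lookup. A sketch (hypothetical helper; most frameworks and recent express.static versions already map .wasm correctly):

```javascript
// Hypothetical MIME lookup for the file types this package ships.
// application/wasm is required for WebAssembly.instantiateStreaming().
const MIME_BY_EXT = {
  '.wasm': 'application/wasm',
  '.js':   'text/javascript',
  '.onnx': 'application/octet-stream',
  '.json': 'application/json',
};
function contentTypeFor(filePath) {
  const ext = filePath.slice(filePath.lastIndexOf('.'));
  return MIME_BY_EXT[ext] || 'application/octet-stream';
}
```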
License
Proprietary — GoodGang Labs
