three-vrm-lip-sync
v0.1.0
Ready-to-use lip sync for VRM avatars. Feed it an audio file or a live audio stream — get real-time mouth animation on top of @pixiv/three-vrm. No phoneme data, no timeline baking, no server: everything runs in the browser.
- 🎯 A complete solution, not a toolkit: one class takes you from audio to moving lips. Call playUrl(), useMicrophone() or useStream() and you're done.
- 🎙️ Files and live streams: an audio file URL, an AudioBuffer, an <audio> element, the microphone, or any MediaStream (WebRTC, TTS output, etc.).
- ⚡ Real-time and language-independent: MFCC vowel classification runs in an AudioWorklet (wlipsync, a port of the battle-tested uLipSync). It works with any language, with no speech recognition involved.
- 🤝 Plays nicely with your animations: writes only the five VRM viseme expressions (aa/ih/ou/ee/oh) and releases the mouth back to your AnimationMixer clips, VRMA or emotes while the voice is silent.
- 📱 Desktop and mobile: iOS/Android autoplay policies are accounted for; just create it from a tap handler.
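The mobile point comes down to one rule: browsers only let audio start inside a user-gesture handler. The helper below is a minimal sketch, not part of three-vrm-lip-sync; createOnFirstGesture is a hypothetical name, and it assumes you hand it a factory such as () => VRMLipSync.create(vrm). It calls the factory synchronously from a one-shot click listener, so the gesture is still active when the audio machinery is created.

```typescript
// Hypothetical helper (not a library export): defer an async factory until the
// first user gesture on `target`, then resolve with whatever the factory returns.
function createOnFirstGesture<T>(
  target: EventTarget,
  factory: () => Promise<T>,
  eventType = "click",
): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    target.addEventListener(
      eventType,
      () => {
        // Invoke the factory inside the handler itself, while the gesture is active.
        try {
          factory().then(resolve, reject);
        } catch (err) {
          // A factory that throws synchronously still rejects the promise.
          reject(err);
        }
      },
      // The listener removes itself after the first matching event.
      { once: true },
    );
  });
}

// Usage (browser, assumption):
// const lipSyncReady = createOnFirstGesture(startButton, () => VRMLipSync.create(vrm));
```

Waiting on the returned promise anywhere else in your code is fine; only the factory call itself has to happen inside the gesture.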
Install
npm install three-vrm-lip-sync @pixiv/three-vrm three
Quick start
import { VRMLipSync } from 'three-vrm-lip-sync';
// `vrm` is your loaded VRM instance (gltf.userData.vrm).
// Create from (or after) a user gesture — required by mobile autoplay policies.
const lipSync = await VRMLipSync.create(vrm);
// Audio file: plays through the speakers and drives the mouth.
await lipSync.playUrl('/voice.wav');
// ...or a live stream: analysis only, no audible output.
await lipSync.useMicrophone();
await lipSync.useStream(mediaStream); // WebRTC, TTS, anything
// Render loop — order matters:
mixer.update(delta); // your animations first
lipSync.update(); // writes viseme weights only
vrm.update(delta); // three-vrm applies everything
lipSync.stop(); // mouth eases shut
lipSync.dispose(); // teardown
That's the whole integration: VRMLipSync never touches bones, other expressions or the render loop, so it drops into any existing @pixiv/three-vrm setup.
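Why the render-loop order matters can be shown with a deliberately simplified model (an assumption, not the library's internals): expression weights live in a plain Map, the mixer stand-in plays an emote that also moves the mouth, and the lip-sync stand-in overwrites only the five visemes while voiced and writes nothing once auto-released. mixerUpdate, lipSyncUpdate and runFrame are hypothetical names.

```typescript
// Simplified model of one frame: each "system" writes expression weights into a Map.
const VISEMES = ["aa", "ih", "ou", "ee", "oh"] as const;
type Viseme = (typeof VISEMES)[number];
type Weights = Map<string, number>;
// Viseme weights while voiced, or null when auto-release has handed the mouth back.
type LipSyncFrame = Partial<Record<Viseme, number>> | null;

// Stand-in for mixer.update(delta): an emote clip drives "happy" and also opens the mouth.
function mixerUpdate(w: Weights): void {
  w.set("happy", 1);
  w.set("aa", 0.3);
}

// Stand-in for lipSync.update(): write only the five visemes, or nothing when released.
function lipSyncUpdate(w: Weights, voiced: LipSyncFrame): void {
  if (voiced === null) return; // released: keep whatever the mixer wrote
  for (const v of VISEMES) w.set(v, voiced[v] ?? 0);
}

// Returns the weights that vrm.update(delta) would then apply.
function runFrame(order: "mixer-first" | "lipsync-first", voiced: LipSyncFrame): Weights {
  const w: Weights = new Map();
  if (order === "mixer-first") {
    mixerUpdate(w);
    lipSyncUpdate(w, voiced);
  } else {
    lipSyncUpdate(w, voiced);
    mixerUpdate(w); // the clip overwrites the mouth written a moment ago
  }
  return w;
}
```

With the mixer first, the voice wins while it is active and non-viseme expressions such as "happy" are untouched; with lip sync first, the clip clobbers the mouth. In the released case the clip's own mouth pose comes through during silence.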
Options
const lipSync = await VRMLipSync.create(vrm, {
audioContext, // reuse an existing AudioContext
gain: 1.25, // overall mouth openness multiplier
smoothness: 0.05, // articulation smoothing, seconds
visemeGain: { ou: 0.8 }, // per-viseme multipliers for a specific model
autoRelease: true, // release the mouth to other animations when silent (default)
profile, // custom uLipSync MFCC calibration profile
});
Most settings are also tunable at runtime: lipSync.engine.gain, lipSync.engine.smoothness, lipSync.engine.minVolume, lipSync.engine.maxVolume, lipSync.visemeGain and lipSync.autoRelease. See the tuning panel in the example.
A default MFCC profile is bundled. For best accuracy with a specific voice, calibrate your own profile with uLipSync in Unity and pass it via the profile option.
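One plausible way gain, visemeGain and smoothness could combine per frame is sketched below. This is an assumption for illustration, not the engine's actual formula: raw weights are scaled by gain and by the per-viseme multiplier (default 1), clamped to the 0..1 range VRM expression weights use, and then eased toward that target with frame-rate-independent exponential smoothing that treats smoothness as a time constant in seconds. shapeWeights and ShapingOptions are hypothetical names.

```typescript
type Viseme = "aa" | "ih" | "ou" | "ee" | "oh";
type VisemeWeights = Record<Viseme, number>;
const VISEMES: readonly Viseme[] = ["aa", "ih", "ou", "ee", "oh"];

// Hypothetical option bag mirroring the create() options above.
interface ShapingOptions {
  gain: number; // overall openness multiplier
  smoothness: number; // assumed smoothing time constant, seconds (0 = no smoothing)
  visemeGain: Partial<Record<Viseme, number>>; // per-viseme multipliers, default 1
}

// Shape one frame of raw classifier output into the weights written to the model.
function shapeWeights(
  prev: VisemeWeights,
  raw: VisemeWeights,
  dt: number, // seconds since the previous frame
  opts: ShapingOptions,
): VisemeWeights {
  // Exponential smoothing independent of frame rate: alpha = 1 - exp(-dt / tau).
  const alpha = opts.smoothness > 0 ? 1 - Math.exp(-dt / opts.smoothness) : 1;
  const out = {} as VisemeWeights;
  for (const v of VISEMES) {
    const scaled = raw[v] * opts.gain * (opts.visemeGain[v] ?? 1);
    const target = Math.min(1, Math.max(0, scaled)); // clamp to a valid expression weight
    out[v] = prev[v] + (target - prev[v]) * alpha; // ease toward the target
  }
  return out;
}
```

With the values from the options example (gain 1.25, visemeGain { ou: 0.8 }), a raw aa of 0.5 targets 0.625 and a full ou targets 1.25 × 0.8 = 1. Tying the easing to elapsed time rather than to a per-frame factor keeps the mouth equally responsive at 30 and 120 fps.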
Lower-level API
For custom pipelines, the building blocks are also exported separately: WLipSyncEngine (audio → viseme weights) and the source factories createUrlSource, createBufferSource, createStreamSource, createMicrophoneSource and createMediaElementSource. All of them are typed against the small LipSyncEngine and LipSyncSource interfaces, so the analyzer is swappable.
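To illustrate what a swappable analyzer can mean, here is a toy replacement that skips MFCC classification entirely and maps loudness (the RMS of one sample frame) onto the aa viseme, the classic jaw-flap approach. The VisemeAnalyzer interface, the RmsJawFlapAnalyzer class and its floor/ceiling thresholds are hypothetical; the library's actual LipSyncEngine interface may look different, so check its type definitions before adapting this.

```typescript
type Viseme = "aa" | "ih" | "ou" | "ee" | "oh";
type VisemeWeights = Record<Viseme, number>;

// Hypothetical analyzer contract, for illustration only.
interface VisemeAnalyzer {
  analyze(samples: Float32Array): VisemeWeights;
}

// Toy fallback: loudness opens the jaw, no vowel shapes at all.
class RmsJawFlapAnalyzer implements VisemeAnalyzer {
  readonly floor: number; // RMS at or below which the mouth is closed
  readonly ceiling: number; // RMS at or above which the mouth is fully open

  constructor(floor = 0.01, ceiling = 0.2) {
    if (ceiling <= floor) throw new RangeError("ceiling must be greater than floor");
    this.floor = floor;
    this.ceiling = ceiling;
  }

  analyze(samples: Float32Array): VisemeWeights {
    // Root-mean-square amplitude of the frame (0 for an empty frame).
    let sumSquares = 0;
    for (let i = 0; i < samples.length; i++) sumSquares += samples[i] * samples[i];
    const rms = samples.length > 0 ? Math.sqrt(sumSquares / samples.length) : 0;
    // Linear ramp between floor and ceiling, clamped to [0, 1].
    const open = Math.min(1, Math.max(0, (rms - this.floor) / (this.ceiling - this.floor)));
    return { aa: open, ih: 0, ou: 0, ee: 0, oh: 0 };
  }
}
```

A volume-only analyzer like this cannot tell "ee" from "oh", which is exactly what the MFCC classifier adds; it mainly shows that the rest of the pipeline only needs viseme weights, whatever produces them.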
Notes
- Secure context required: AudioWorklet (and the microphone) work only on localhost or over HTTPS.
- Echo: live-stream sources are analysis-only by design; never route a microphone to the speakers.
- Missing visemes: expressions a model doesn't have are skipped silently.
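Because AudioWorklet and getUserMedia are only exposed in secure contexts, a pre-flight check lets you report the problem before creating the lip sync. The function below is a hypothetical helper, not part of the library; it reads isSecureContext, AudioWorkletNode and navigator.mediaDevices.getUserMedia, and accepts an injected environment object so it can be tested outside a browser.

```typescript
// Minimal shape of the browser globals this check looks at.
interface AudioEnv {
  isSecureContext?: boolean;
  AudioWorkletNode?: unknown;
  navigator?: { mediaDevices?: { getUserMedia?: unknown } };
}

// Hypothetical helper: returns human-readable problems; an empty array means "good to go".
function lipSyncSupportProblems(
  env: AudioEnv = globalThis as unknown as AudioEnv,
): string[] {
  const problems: string[] = [];
  if (env.isSecureContext !== true) {
    problems.push("Not a secure context: serve the page over HTTPS or from localhost.");
  }
  if (typeof env.AudioWorkletNode !== "function") {
    problems.push("AudioWorklet is unavailable in this browser or context.");
  }
  if (typeof env.navigator?.mediaDevices?.getUserMedia !== "function") {
    problems.push("getUserMedia is unavailable, so microphone input will not work.");
  }
  return problems;
}

// Usage (browser, assumption):
// const problems = lipSyncSupportProblems();
// if (problems.length > 0) showWarning(problems.join("\n")); // showWarning is hypothetical
```

The microphone check is reported separately because file playback can still work where getUserMedia is missing.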
Example / development
The example is a Vite + React app that consumes the library from source (no npm download) and deploys to GitHub Pages via a workflow.
npm install
npm --prefix example install
npm run dev # demo on http://localhost:5173
npm --prefix example run dev:https # HTTPS for testing from a phone over LAN
npm run build # build the package (tsup → dist/)
npm run typecheck
npm run lint
License
MIT © vlapky. The bundled default profile and the analysis engine come from wlipsync (MIT), a port of uLipSync (MIT).
