# talking-head-studio
Open-source avatar platform for Web, React Native, Unity, and Unreal. Any GLB model. Full lip-sync — with or without blend shapes.
## What this is
A drop-in avatar runtime and platform SDK built to be a self-hostable replacement for Ready Player Me. The core problem it solves: any arbitrary 3D model should be able to talk, emote, and respond to a voice pipeline — regardless of whether the artist baked in blend shapes, visemes, or any face rig at all.
The library ships a renderer (web iframe + React Native wgpu), a backend-agnostic face control contract, and a growing set of adapters that map TTS/audio/AI output onto whatever rendering mechanism the model actually supports.
## Lip-sync tiers (any model works)
| Model type | Lip-sync method | Quality |
|---|---|---|
| GLB with Oculus viseme morphs | Direct morph drive via `MorphTargetBackend` | Excellent |
| GLB with ARKit blend shapes | `remapArkitToOculus()` → morph drive | Good |
| GLB with only `jawOpen` / `mouthOpen` | Amplitude fallback | Acceptable |
| Any other GLB | Gaussian splat backend (roadmap) | Excellent |
The last row is the goal: scan any model into a Gaussian representation, generate per-viseme deltas via FLAME-based transfer, and drive it from the same `FaceControl` contract everything else uses. No blend shapes required. No artist work required.
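Until the splat backend lands, the amplitude tier is the zero-prerequisite path today. A minimal web-only sketch of driving it, assuming playback routed through the Web Audio API (the `driveJawFromAudio` helper and the ×4 gain constant are illustrative, not part of the library):

```ts
import type { TalkingHeadRef } from 'talking-head-studio';

// Illustrative helper: compute RMS loudness per frame and feed it to the
// documented sendAmplitude() ref method, which maps 0..1 onto the jaw.
function driveJawFromAudio(audio: HTMLAudioElement, head: TalkingHeadRef) {
  const ctx = new AudioContext();
  const source = ctx.createMediaElementSource(audio);
  const analyser = ctx.createAnalyser();
  analyser.fftSize = 512;
  source.connect(analyser);
  analyser.connect(ctx.destination);

  const samples = new Float32Array(analyser.fftSize);
  const tick = () => {
    analyser.getFloatTimeDomainData(samples);
    let sum = 0;
    for (const s of samples) sum += s * s;
    const rms = Math.sqrt(sum / samples.length);
    head.sendAmplitude(Math.min(1, rms * 4)); // gain is model-dependent tuning
    requestAnimationFrame(tick);
  };
  tick();
}
```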
## Architecture
```text
TTS / audio / face tracking
            ↓
AgentVisemePayload   ← canonical wire format for lip-sync schedules
            ↓
FaceControl          ← pose (HeadPose) + expression (ExpressionState) + gaze (EyeGaze)
            ↓
AvatarBackend        ←────────────── swap without changing anything upstream
  ├── MorphTargetBackend  ← Three.js morph targets (GLB with blend shapes)
  ├── GaussianBackend     ← [roadmap] Gaussian splat + FLAME delta transfer
  └── (your backend)      ← implement AvatarBackend, plug in
            ↓
Renderer
  ├── Web iframe          ← TalkingHead.web.tsx (any React app)
  ├── React Native wgpu   ← WgpuAvatar (native GPU, no WebView latency)
  └── Unity / Unreal      ← [roadmap] SDK plugins consuming same contracts
```

Everything above `AvatarBackend` is renderer-agnostic. Everything above `FaceControl` is model-agnostic.
## Installation
```bash
# React Native / Expo
npm install talking-head-studio react-native-webview

# Web (React, Next.js, Vite)
npm install talking-head-studio
```

## Quick start
```tsx
import { useRef } from 'react';
import { TalkingHead, type TalkingHeadRef } from 'talking-head-studio';

export default function Avatar() {
  const ref = useRef<TalkingHeadRef>(null);
  return (
    <TalkingHead
      ref={ref}
      avatarUrl="https://example.com/your-model.glb"
      mood="happy"
      cameraView="upper"
      hairColor="#1a1a2e"
      skinColor="#e0a370"
      accessories={[{
        id: 'sunglasses',
        url: 'https://example.com/sunglasses.glb',
        bone: 'Head',
        position: [0, 0.08, 0.12],
        rotation: [0, 0, 0],
        scale: 1.0,
      }]}
      style={{ width: 400, height: 600 }}
      onReady={() => console.log('ready')}
    />
  );
}
```

## FaceControl — the core contract
The `FaceControl` type is the single value that flows between your voice pipeline and any avatar backend. If you're building a custom backend or integrating with a game engine, this is what you implement against.
```ts
import type { FaceControl, ExpressionState, HeadPose, EyeGaze } from 'talking-head-studio';

// The shapes, for reference:
type HeadPose = {
  yaw: number;   // -1..1, left..right
  pitch: number; // -1..1, down..up
  roll: number;  // -1..1, tilt
};

type EyeGaze = {
  x: number; // -1..1, left..right
  y: number; // -1..1, down..up
};

type ExpressionState = {
  jawOpen: number; // 0..1, as are all scalar channels below
  mouthSmile: number;
  mouthFunnel: number;
  mouthPucker: number;
  mouthWide: number;
  upperLipRaise: number;
  lowerLipDepress: number;
  cheekRaise: number;
  blinkLeft: number;
  blinkRight: number;
  browInnerUp: number;
  browDownLeft: number;
  browDownRight: number;
  eyeGazeLeft: EyeGaze;
  eyeGazeRight: EyeGaze;
};
```
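Every value a backend receives is one of these. A hand-built frame looks like this (the local `neutral()` helper is illustrative and just zeroes every channel; the `{ pose, expr }` shape matches the hook output shown next):

```ts
import type { ExpressionState, FaceControl } from 'talking-head-studio';

// Illustrative helper: every scalar channel at 0, both eyes centered.
const neutral = (): ExpressionState => ({
  jawOpen: 0, mouthSmile: 0, mouthFunnel: 0, mouthPucker: 0, mouthWide: 0,
  upperLipRaise: 0, lowerLipDepress: 0, cheekRaise: 0,
  blinkLeft: 0, blinkRight: 0,
  browInnerUp: 0, browDownLeft: 0, browDownRight: 0,
  eyeGazeLeft: { x: 0, y: 0 }, eyeGazeRight: { x: 0, y: 0 },
});

// A hand-built control frame: slight head tilt, half smile, right-eye wink.
const wink: FaceControl = {
  pose: { yaw: 0, pitch: 0, roll: 0.15 },
  expr: { ...neutral(), mouthSmile: 0.5, blinkRight: 1 },
};
```

Any backend accepts such a value through `setControl()`, which is what makes the lip-sync tiers in the table above interchangeable.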
### Driving FaceControl from a viseme schedule

```ts
import { useFaceControlsFromVisemes } from 'talking-head-studio';

// schedule: AgentVisemePayload from your TTS backend
const faceControl = useFaceControlsFromVisemes(schedule);
// → { pose: { yaw: 0, pitch: 0, roll: 0 }, expr: { jawOpen: 0.7, ... } }
```
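One way to bridge the hook output to a backend is a small effect that forwards each sampled frame. This `useLipSync` helper is a sketch, assuming the hook returns a fresh value on every sampled frame:

```tsx
import { useEffect } from 'react';
import { useFaceControlsFromVisemes } from 'talking-head-studio';
import type { AvatarBackend, AgentVisemePayload } from 'talking-head-studio';

// Forward each rAF-sampled control frame to whichever backend is active.
// `backend` and `schedule` come from your own app state.
function useLipSync(backend: AvatarBackend, schedule: AgentVisemePayload) {
  const faceControl = useFaceControlsFromVisemes(schedule);
  useEffect(() => {
    backend.setControl(faceControl);
    backend.renderFrame();
  }, [backend, faceControl]);
}
```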
### Implementing a custom backend

```ts
import type { AvatarBackend, AvatarRenderTarget, FaceControl } from 'talking-head-studio';

class MyGaussianBackend implements AvatarBackend {
  initialize() { /* load splat data, FLAME weights */ }
  attach(target: AvatarRenderTarget) { /* bind to canvas/surface */ }
  setControl(control: FaceControl) { /* map ExpressionState → splat coefficients */ }
  renderFrame() { /* rasterize */ }
  dispose() { /* cleanup */ }
}
```
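The call order implied by the interface, continuing the skeleton above (`canvasTarget` stands in for however your app obtains an `AvatarRenderTarget`):

```ts
// Lifecycle: initialize once, attach to a render target,
// push control frames every tick, dispose on teardown.
const backend = new MyGaussianBackend();
backend.initialize();
backend.attach(canvasTarget);

function onFrame(control: FaceControl) {
  backend.setControl(control);
  backend.renderFrame();
}

// On unmount:
backend.dispose();
```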
## MorphTargetBackend — Three.js GLB adapter

The first concrete `AvatarBackend` implementation. Give it any loaded Three.js scene and it will find morph targets, build a lookup cache, and drive them from `FaceControl`.
```ts
import * as THREE from 'three';
import { GLTFLoader } from 'three/examples/jsm/loaders/GLTFLoader';
import { MorphTargetBackend } from 'talking-head-studio';

const loader = new GLTFLoader();
const gltf = await loader.loadAsync('/avatar.glb');

const backend = new MorphTargetBackend(gltf.scene, {
  mood: 'neutral',
  expressionScale: 1.0,
  calibration: {
    neutral: { pose: { yaw: 0, pitch: 0, roll: 0 }, expr: createNeutralExpression() },
    ranges: { jawOpen: { min: 0, max: 0.85 } }, // clamp jaw for this model
    gazeLimits: { x: { min: -0.6, max: 0.6 } },
  },
});

// Each frame:
backend.setControl(faceControl);
backend.renderFrame();

// Debug: what morphs does this model actually have?
console.log(backend.availableChannels);
// → { visemes: ['aa','PP','oh',...], expressions: ['jawOpen','blinkLeft',...], gaze: ['lookLeft','lookUp'] }
```
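Putting the per-frame calls in context: a minimal loop with your own Three.js renderer and camera, continuing the snippet above. This sketch assumes `renderFrame()` updates morph influences and leaves rasterization to your renderer:

```ts
// Camera placement and lighting are illustrative values.
const renderer = new THREE.WebGLRenderer({ antialias: true });
const camera = new THREE.PerspectiveCamera(35, 1, 0.1, 100);
camera.position.set(0, 1.6, 0.8);

const scene = new THREE.Scene();
scene.add(new THREE.AmbientLight(0xffffff, 1));
scene.add(gltf.scene);

renderer.setAnimationLoop(() => {
  backend.setControl(faceControl); // latest control frame
  backend.renderFrame();           // flush morph target influences
  renderer.render(scene, camera);
});
```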
## ARKit → Oculus remap

Models with ARKit blend shapes (52 facial action units) but no Oculus viseme morphs can be remapped analytically — no ML, no FLAME, no artist work.
```ts
import { remapArkitToOculus, getArkitWeightsForViseme } from 'talking-head-studio';

// Runtime: face tracking data → Oculus viseme weights
const oculusWeights = remapArkitToOculus({
  jawOpen: 0.7,
  mouthLowerDownLeft: 0.4,
  mouthLowerDownRight: 0.4,
});
// → { aa: 0.68, PP: 0.03, oh: 0.12, ... }

// Bake-time: get the ARKit recipe for a specific viseme
const recipe = getArkitWeightsForViseme('ou');
// → { mouthPucker: 0.9, mouthRollLower: 0.3 }
```

The full `ARKIT_TO_OCULUS` coefficient table is exported so you can build your own bake pipeline.
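A sketch of what such a bake step could look like: synthesize one Oculus viseme morph as a weighted sum of the ARKit morphs the mesh already carries. The `bakeViseme` helper is hypothetical and assumes relative morph targets (`geometry.morphTargetsRelative === true`):

```ts
import * as THREE from 'three';
import { getArkitWeightsForViseme } from 'talking-head-studio';

// Hypothetical bake step using standard Three.js morph conventions.
function bakeViseme(mesh: THREE.Mesh, viseme: string): void {
  const geometry = mesh.geometry;
  const dict = mesh.morphTargetDictionary!;          // morph name → index
  const targets = geometry.morphAttributes.position; // existing morph deltas
  const recipe = getArkitWeightsForViseme(viseme);   // e.g. { mouthPucker: 0.9 }

  const delta = new Float32Array(geometry.getAttribute('position').count * 3);
  for (const [arkitName, weight] of Object.entries(recipe)) {
    const index = dict[arkitName];
    if (index === undefined) continue;               // model lacks this morph
    const source = targets[index].array as Float32Array;
    for (let i = 0; i < delta.length; i++) delta[i] += weight * source[i];
  }

  // Register the baked viseme as a new morph target on the same mesh.
  targets.push(new THREE.BufferAttribute(delta, 3));
  dict[viseme] = targets.length - 1;
  mesh.morphTargetInfluences!.push(0);
}
```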
## TalkingHead component — props & ref
### Props
| Prop | Type | Default | Description |
|------|------|---------|-------------|
| `avatarUrl` | `string` | required | Any .glb. Rigged or not. |
| `authToken` | `string \| null` | `null` | Bearer token for authenticated GLB URLs. |
| `mood` | `TalkingHeadMood` | `'neutral'` | `neutral` \| `happy` \| `sad` \| `angry` \| `excited` \| `thinking` \| `concerned` \| `surprised` |
| `cameraView` | `'head' \| 'upper' \| 'full'` | `'upper'` | Framing preset. |
| `cameraDistance` | `number` | `-0.5` | Zoom offset. Negative = closer. |
| `hairColor` | `string` | — | Hex color. Applied to materials named `hair`, `fur`. |
| `skinColor` | `string` | — | Applied to `skin`, `body`, `face`. |
| `eyeColor` | `string` | — | Applied to `eye`, `iris`. |
| `accessories` | `TalkingHeadAccessory[]` | `[]` | Bone-attached GLB items. |
| `onReady` | `() => void` | — | Fired when fully loaded. |
| `onError` | `(msg: string) => void` | — | Fired on load failure. |
| `style` | `ViewStyle` / `CSSProperties` | — | Container style. |
### Ref methods
```ts
ref.current?.sendAmplitude(0.7);        // amplitude 0..1 → jaw
ref.current?.scheduleVisemes(payload);  // AgentVisemePayload → full lip-sync schedule
ref.current?.clearVisemes();
ref.current?.setMood('excited');
ref.current?.setHairColor('#ff0000');
ref.current?.setSkinColor('#8d5524');
ref.current?.setEyeColor('#2e86de');
ref.current?.setAccessories([...]);
ref.current?.dispatchMotion('nod');
```
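A typical round trip through these methods (the `/tts` endpoint and its `{ audioUrl, visemes }` response shape are placeholders for your own voice pipeline):

```ts
import type { TalkingHeadRef } from 'talking-head-studio';

// Fetch a viseme schedule from your TTS backend, hand it to the avatar,
// and play the matching audio. Clear the schedule when playback ends.
async function speak(text: string, head: TalkingHeadRef) {
  const res = await fetch('/tts', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ text }),
  });
  const { audioUrl, visemes } = await res.json(); // visemes: AgentVisemePayload

  head.scheduleVisemes(visemes);
  const audio = new Audio(audioUrl);
  audio.addEventListener('ended', () => head.clearVisemes());
  await audio.play();
}
```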
## Accessories

Any GLB attached to any skeleton bone. Placement is editable at runtime via the 3D editor.
```ts
interface TalkingHeadAccessory {
  id: string;
  url: string;
  bone: string; // 'Head' | 'Spine' | 'RightHand' | ...
  position: [number, number, number];
  rotation: [number, number, number]; // Euler, radians
  scale: number;
}
```

Common Mixamo bones: `Head`, `Neck`, `Spine`, `Spine1`, `Spine2`, `LeftHand`, `RightHand`, `LeftFoot`, `RightFoot`, `Hips`.
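Accessories can also be swapped at runtime through the ref. The bone names and transforms below are illustrative values, not library defaults:

```ts
// Replace the current accessory set in one call.
ref.current?.setAccessories([
  {
    id: 'wizard-hat',
    url: 'https://example.com/hat.glb',
    bone: 'Head',
    position: [0, 0.15, 0],
    rotation: [0, 0, 0],
    scale: 1.1,
  },
]);

// Remove everything:
ref.current?.setAccessories([]);
```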
The 3D editor (`talking-head-studio/editor`) provides a gizmo for live placement with front/top/side views. LLM-assisted placement is available via the companion backend.
## Packages
| Path | Description |
|------|-------------|
| `talking-head-studio` | Live avatar renderer + FaceControl contracts |
| `talking-head-studio/editor` | R3F-based 3D editor with gizmo (web only) |
| `talking-head-studio/appearance` | Material color system for any GLB |
| `talking-head-studio/voice` | Audio recording + WAV conversion hooks |
| `talking-head-studio/sketchfab` | Sketchfab search + download hooks |
| `talking-head-studio/api` | Studio API client (avatar CRUD, voice profiles) |
| `talking-head-studio/wardrobe` | Accessory + outfit state management |
| `talking-head-studio/wgpu` | React Native wgpu renderer |
| `packages/avatar-creator` | Embeddable avatar creator widget |
| `packages/agent-avatar` | LiveKit agent + MCP integration |
## Roadmap
### Now — shipped
- `FaceControl` — canonical face control space (pose + expression + gaze)
- `AvatarBackend` interface — swap renderers without changing upstream code
- `MorphTargetBackend` — Three.js GLB adapter with morph target discovery and mood layering
- ARKit → Oculus analytical remap (`remapArkitToOculus`, full coefficient table)
- `useFaceControlsFromVisemes` — rAF-sampled hook from `AgentVisemePayload`
- `AgentVisemePayload` — canonical TTS → lip-sync wire format
- `AvatarGlbParams` — typed API contract for quality/compression/morph group selection
- `CalibrationProfile` — per-avatar range remapping and gaze limits
- Platform type stubs: SDK (web/Unity/Unreal), marketplace catalog, avatar GLB API
- `packages/avatar-creator` — embeddable creator widget with preset catalog
- `packages/agent-avatar` — LiveKit agent + MCP tool integration
### Next
- GLB schema walker — scan any loaded GLB and report morph target coverage, skeleton bones, LODs, and viseme tier. Prerequisite for the validator and import pipeline.
- `GET /avatars/{id}.glb` with `AvatarGlbParams` — extend the companion backend to serve quality/compression/morph-group variants on the existing endpoint.
- Creator postMessage bridge — let partners embed the avatar creator in an iframe and receive avatar IDs back, like RPM's WebView creator.
### Medium term
- `GaussianBackend` — Gaussian splat renderer implementing `AvatarBackend`. Takes any model, scans it, and drives expression via FLAME-based per-viseme delta transfer. No artist work, no blend shapes required. This is the zero-prerequisite lip-sync path.
- FLAME viseme transfer pipeline (Python, companion backend) — fit FLAME to a face screenshot, generate Oculus viseme deltas, and bake them back into the GLB as morph targets. Runs as a background task on upload for any avatar missing viseme morphs.
- Unity SDK — C# plugin implementing the `AvatarBackend` contract, with an API for loading GLBs, driving morphs, and consuming `AgentVisemePayload`.
- Unreal plugin — UE5 plugin with a Blueprint-accessible `UAvatarDescriptor` and a sample Quickstart map.
### Longer term
- Avatar marketplace — `CatalogItem`, `AvatarAsset`, and `RarityLevel` types are already defined. Backend + web store + in-creator purchasing.
- RPM migration tools — import existing RPM avatars where technically possible.
- SLA + deprecation policy — for teams that need a reliability guarantee as they move off RPM.
## Contributing
```bash
git clone https://github.com/sitebay/talking-head-studio.git
cd talking-head-studio
npm install
npm run typecheck   # must be clean (excluding known expo-audio peer dep warnings)
npm test
```

The repo is a monorepo with `packages/*` as npm workspaces. The main library is the root package.
## License
MIT
