# talking-head-studio
Open-source avatar platform for Web, React Native, Unity, and Unreal. Any GLB model. Full lip-sync — with or without blend shapes.
## What this is
A drop-in avatar runtime and platform SDK built to be a self-hostable replacement for Ready Player Me. The core problem it solves: any arbitrary 3D model should be able to talk, emote, and respond to a voice pipeline — regardless of whether the artist baked in blend shapes, visemes, or any face rig at all.
The library ships a renderer (web iframe + React Native wgpu), a backend-agnostic face control contract, and a growing set of adapters that map TTS/audio/AI output onto whatever rendering mechanism the model actually supports.
## Lip-sync tiers (any model works)
| Model type | Lip-sync method | Quality |
|---|---|---|
| GLB with Oculus viseme morphs | Direct morph drive via `MorphTargetBackend` | Excellent |
| GLB with ARKit blend shapes | `remapArkitToOculus()` → morph drive | Good |
| GLB with only `jawOpen` / `mouthOpen` | Amplitude fallback | Acceptable |
| Any other GLB | Gaussian splat backend (roadmap) | Excellent |
The last row is the goal: scan any model into a Gaussian representation, generate per-viseme deltas via FLAME-based transfer, and drive it from the same `FaceControl` contract everything else uses. No blend shapes required. No artist work required.
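Until the splat backend lands, the amplitude tier is the zero-prerequisite path today. A minimal web-only sketch of driving it, assuming playback routed through the Web Audio API (the `driveJawFromAudio` helper and the ×4 gain constant are illustrative, not part of the library):

```ts
import type { TalkingHeadRef } from 'talking-head-studio';

// Illustrative helper: compute RMS loudness per frame and feed it to the
// documented sendAmplitude() ref method, which maps 0..1 onto the jaw.
function driveJawFromAudio(audio: HTMLAudioElement, head: TalkingHeadRef) {
  const ctx = new AudioContext();
  const source = ctx.createMediaElementSource(audio);
  const analyser = ctx.createAnalyser();
  analyser.fftSize = 512;
  source.connect(analyser);
  analyser.connect(ctx.destination);

  const samples = new Float32Array(analyser.fftSize);
  const tick = () => {
    analyser.getFloatTimeDomainData(samples);
    let sum = 0;
    for (const s of samples) sum += s * s;
    const rms = Math.sqrt(sum / samples.length);
    head.sendAmplitude(Math.min(1, rms * 4)); // gain is model-dependent tuning
    requestAnimationFrame(tick);
  };
  tick();
}
```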
## Architecture
```text
TTS / audio / face tracking
            ↓
AgentVisemePayload   ← canonical wire format for lip-sync schedules
            ↓
FaceControl          ← pose (HeadPose) + expression (ExpressionState) + gaze (EyeGaze)
            ↓
AvatarBackend        ←────────────── swap without changing anything upstream
  ├── MorphTargetBackend  ← Three.js morph targets (GLB with blend shapes)
  ├── GaussianBackend     ← [roadmap] Gaussian splat + FLAME delta transfer
  └── (your backend)      ← implement AvatarBackend, plug in
            ↓
Renderer
  ├── Web iframe          ← TalkingHead.web.tsx (any React app)
  ├── React Native wgpu   ← WgpuAvatar (native GPU, no WebView latency)
  └── Unity / Unreal      ← [roadmap] SDK plugins consuming same contracts
```

Everything above `AvatarBackend` is renderer-agnostic. Everything above `FaceControl` is model-agnostic.
## Installation
```bash
# React Native / Expo
npm install talking-head-studio react-native-webview

# Web (React, Next.js, Vite)
npm install talking-head-studio
```

## Quick start
```tsx
import { useRef } from 'react';
import { TalkingHead, type TalkingHeadRef } from 'talking-head-studio';

export default function Avatar() {
  const ref = useRef<TalkingHeadRef>(null);
  return (
    <TalkingHead
      ref={ref}
      avatarUrl="https://example.com/your-model.glb"
      mood="happy"
      cameraView="upper"
      hairColor="#1a1a2e"
      skinColor="#e0a370"
      accessories={[{
        id: 'sunglasses',
        url: 'https://example.com/sunglasses.glb',
        bone: 'Head',
        position: [0, 0.08, 0.12],
        rotation: [0, 0, 0],
        scale: 1.0,
      }]}
      style={{ width: 400, height: 600 }}
      onReady={() => console.log('ready')}
    />
  );
}
```

## FaceControl — the core contract
The `FaceControl` type is the single value that flows between your voice pipeline and any avatar backend. If you're building a custom backend or integrating with a game engine, this is what you implement against.
```ts
import type { FaceControl, ExpressionState, HeadPose, EyeGaze } from 'talking-head-studio';

// The shapes, for reference:
type HeadPose = {
  yaw: number;   // -1..1, left..right
  pitch: number; // -1..1, down..up
  roll: number;  // -1..1, tilt
};

type EyeGaze = {
  x: number; // -1..1, left..right
  y: number; // -1..1, down..up
};

type ExpressionState = {
  jawOpen: number; // 0..1, as are all scalar channels below
  mouthSmile: number;
  mouthFunnel: number;
  mouthPucker: number;
  mouthWide: number;
  upperLipRaise: number;
  lowerLipDepress: number;
  cheekRaise: number;
  blinkLeft: number;
  blinkRight: number;
  browInnerUp: number;
  browDownLeft: number;
  browDownRight: number;
  eyeGazeLeft: EyeGaze;
  eyeGazeRight: EyeGaze;
};
```
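Every value a backend receives is one of these. A hand-built frame looks like this (the local `neutral()` helper is illustrative and just zeroes every channel; the `{ pose, expr }` shape matches the hook output shown next):

```ts
import type { ExpressionState, FaceControl } from 'talking-head-studio';

// Illustrative helper: every scalar channel at 0, both eyes centered.
const neutral = (): ExpressionState => ({
  jawOpen: 0, mouthSmile: 0, mouthFunnel: 0, mouthPucker: 0, mouthWide: 0,
  upperLipRaise: 0, lowerLipDepress: 0, cheekRaise: 0,
  blinkLeft: 0, blinkRight: 0,
  browInnerUp: 0, browDownLeft: 0, browDownRight: 0,
  eyeGazeLeft: { x: 0, y: 0 }, eyeGazeRight: { x: 0, y: 0 },
});

// A hand-built control frame: slight head tilt, half smile, right-eye wink.
const wink: FaceControl = {
  pose: { yaw: 0, pitch: 0, roll: 0.15 },
  expr: { ...neutral(), mouthSmile: 0.5, blinkRight: 1 },
};
```

Any backend accepts such a value through `setControl()`, which is what makes the lip-sync tiers in the table above interchangeable.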
### Driving FaceControl from a viseme schedule

```ts
import { useFaceControlsFromVisemes } from 'talking-head-studio';

// schedule: AgentVisemePayload from your TTS backend
const faceControl = useFaceControlsFromVisemes(schedule);
// → { pose: { yaw: 0, pitch: 0, roll: 0 }, expr: { jawOpen: 0.7, ... } }
```
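One way to bridge the hook output to a backend is a small effect that forwards each sampled frame. This `useLipSync` helper is a sketch, assuming the hook returns a fresh value on every sampled frame:

```tsx
import { useEffect } from 'react';
import { useFaceControlsFromVisemes } from 'talking-head-studio';
import type { AvatarBackend, AgentVisemePayload } from 'talking-head-studio';

// Forward each rAF-sampled control frame to whichever backend is active.
// `backend` and `schedule` come from your own app state.
function useLipSync(backend: AvatarBackend, schedule: AgentVisemePayload) {
  const faceControl = useFaceControlsFromVisemes(schedule);
  useEffect(() => {
    backend.setControl(faceControl);
    backend.renderFrame();
  }, [backend, faceControl]);
}
```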
### Implementing a custom backend

```ts
import type { AvatarBackend, AvatarRenderTarget, FaceControl } from 'talking-head-studio';

class MyGaussianBackend implements AvatarBackend {
  initialize() { /* load splat data, FLAME weights */ }
  attach(target: AvatarRenderTarget) { /* bind to canvas/surface */ }
  setControl(control: FaceControl) { /* map ExpressionState → splat coefficients */ }
  renderFrame() { /* rasterize */ }
  dispose() { /* cleanup */ }
}
```
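The call order implied by the interface, continuing the skeleton above (`canvasTarget` stands in for however your app obtains an `AvatarRenderTarget`):

```ts
// Lifecycle: initialize once, attach to a render target,
// push control frames every tick, dispose on teardown.
const backend = new MyGaussianBackend();
backend.initialize();
backend.attach(canvasTarget);

function onFrame(control: FaceControl) {
  backend.setControl(control);
  backend.renderFrame();
}

// On unmount:
backend.dispose();
```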
## MorphTargetBackend — Three.js GLB adapter

The first concrete `AvatarBackend` implementation. Give it any loaded Three.js scene and it will find morph targets, build a lookup cache, and drive them from `FaceControl`.
```ts
import * as THREE from 'three';
import { GLTFLoader } from 'three/examples/jsm/loaders/GLTFLoader';
import { MorphTargetBackend } from 'talking-head-studio';

const loader = new GLTFLoader();
const gltf = await loader.loadAsync('/avatar.glb');

const backend = new MorphTargetBackend(gltf.scene, {
  mood: 'neutral',
  expressionScale: 1.0,
  calibration: {
    neutral: { pose: { yaw: 0, pitch: 0, roll: 0 }, expr: createNeutralExpression() },
    ranges: { jawOpen: { min: 0, max: 0.85 } }, // clamp jaw for this model
    gazeLimits: { x: { min: -0.6, max: 0.6 } },
  },
});

// Each frame:
backend.setControl(faceControl);
backend.renderFrame();

// Debug: what morphs does this model actually have?
console.log(backend.availableChannels);
// → { visemes: ['aa','PP','oh',...], expressions: ['jawOpen','blinkLeft',...], gaze: ['lookLeft','lookUp'] }
```
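Putting the per-frame calls in context: a minimal loop with your own Three.js renderer and camera, continuing the snippet above. This sketch assumes `renderFrame()` updates morph influences and leaves rasterization to your renderer:

```ts
// Camera placement and lighting are illustrative values.
const renderer = new THREE.WebGLRenderer({ antialias: true });
const camera = new THREE.PerspectiveCamera(35, 1, 0.1, 100);
camera.position.set(0, 1.6, 0.8);

const scene = new THREE.Scene();
scene.add(new THREE.AmbientLight(0xffffff, 1));
scene.add(gltf.scene);

renderer.setAnimationLoop(() => {
  backend.setControl(faceControl); // latest control frame
  backend.renderFrame();           // flush morph target influences
  renderer.render(scene, camera);
});
```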
## ARKit → Oculus remap

Models with ARKit blend shapes (52 facial action units) but no Oculus viseme morphs can be remapped analytically — no ML, no FLAME, no artist work.
```ts
import { remapArkitToOculus, getArkitWeightsForViseme } from 'talking-head-studio';

// Runtime: face tracking data → Oculus viseme weights
const oculusWeights = remapArkitToOculus({
  jawOpen: 0.7,
  mouthLowerDownLeft: 0.4,
  mouthLowerDownRight: 0.4,
});
// → { aa: 0.68, PP: 0.03, oh: 0.12, ... }

// Bake-time: get the ARKit recipe for a specific viseme
const recipe = getArkitWeightsForViseme('ou');
// → { mouthPucker: 0.9, mouthRollLower: 0.3 }
```

The full `ARKIT_TO_OCULUS` coefficient table is exported so you can build your own bake pipeline.
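A sketch of what such a bake step could look like: synthesize one Oculus viseme morph as a weighted sum of the ARKit morphs the mesh already carries. The `bakeViseme` helper is hypothetical and assumes relative morph targets (`geometry.morphTargetsRelative === true`):

```ts
import * as THREE from 'three';
import { getArkitWeightsForViseme } from 'talking-head-studio';

// Hypothetical bake step using standard Three.js morph conventions.
function bakeViseme(mesh: THREE.Mesh, viseme: string): void {
  const geometry = mesh.geometry;
  const dict = mesh.morphTargetDictionary!;          // morph name → index
  const targets = geometry.morphAttributes.position; // existing morph deltas
  const recipe = getArkitWeightsForViseme(viseme);   // e.g. { mouthPucker: 0.9 }

  const delta = new Float32Array(geometry.getAttribute('position').count * 3);
  for (const [arkitName, weight] of Object.entries(recipe)) {
    const index = dict[arkitName];
    if (index === undefined) continue;               // model lacks this morph
    const source = targets[index].array as Float32Array;
    for (let i = 0; i < delta.length; i++) delta[i] += weight * source[i];
  }

  // Register the baked viseme as a new morph target on the same mesh.
  targets.push(new THREE.BufferAttribute(delta, 3));
  dict[viseme] = targets.length - 1;
  mesh.morphTargetInfluences!.push(0);
}
```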
## TalkingHead component — props & ref
### Props
| Prop | Type | Default | Description |
|------|------|---------|-------------|
| `avatarUrl` | `string` | required | Any .glb. Rigged or not. |
| `authToken` | `string \| null` | `null` | Bearer token for authenticated GLB URLs. |
| `mood` | `TalkingHeadMood` | `'neutral'` | `neutral` \| `happy` \| `sad` \| `angry` \| `excited` \| `thinking` \| `concerned` \| `surprised` |
| `cameraView` | `'head' \| 'upper' \| 'full'` | `'upper'` | Framing preset. |
| `cameraDistance` | `number` | `-0.5` | Zoom offset. Negative = closer. |
| `hairColor` | `string` | — | Hex color. Applied to materials named `hair`, `fur`. |
| `skinColor` | `string` | — | Applied to `skin`, `body`, `face`. |
| `eyeColor` | `string` | — | Applied to `eye`, `iris`. |
| `accessories` | `TalkingHeadAccessory[]` | `[]` | Bone-attached GLB items. |
| `onReady` | `() => void` | — | Fired when fully loaded. |
| `onError` | `(msg: string) => void` | — | Fired on load failure. |
| `style` | `ViewStyle` / `CSSProperties` | — | Container style. |
### Ref methods
```ts
ref.current?.sendAmplitude(0.7);        // amplitude 0..1 → jaw
ref.current?.scheduleVisemes(payload);  // AgentVisemePayload → full lip-sync schedule
ref.current?.clearVisemes();
ref.current?.setMood('excited');
ref.current?.setHairColor('#ff0000');
ref.current?.setSkinColor('#8d5524');
ref.current?.setEyeColor('#2e86de');
ref.current?.setAccessories([...]);
ref.current?.dispatchMotion('nod');
```
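A typical round trip through these methods (the `/tts` endpoint and its `{ audioUrl, visemes }` response shape are placeholders for your own voice pipeline):

```ts
import type { TalkingHeadRef } from 'talking-head-studio';

// Fetch a viseme schedule from your TTS backend, hand it to the avatar,
// and play the matching audio. Clear the schedule when playback ends.
async function speak(text: string, head: TalkingHeadRef) {
  const res = await fetch('/tts', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ text }),
  });
  const { audioUrl, visemes } = await res.json(); // visemes: AgentVisemePayload

  head.scheduleVisemes(visemes);
  const audio = new Audio(audioUrl);
  audio.addEventListener('ended', () => head.clearVisemes());
  await audio.play();
}
```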
## Accessories

Any GLB attached to any skeleton bone. Placement is editable at runtime via the 3D editor.
```ts
interface TalkingHeadAccessory {
  id: string;
  url: string;
  bone: string; // 'Head' | 'Spine' | 'RightHand' | ...
  position: [number, number, number];
  rotation: [number, number, number]; // Euler, radians
  scale: number;
}
```

Common Mixamo bones: `Head`, `Neck`, `Spine`, `Spine1`, `Spine2`, `LeftHand`, `RightHand`, `LeftFoot`, `RightFoot`, `Hips`.
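Accessories can also be swapped at runtime through the ref. The bone names and transforms below are illustrative values, not library defaults:

```ts
// Replace the current accessory set in one call.
ref.current?.setAccessories([
  {
    id: 'wizard-hat',
    url: 'https://example.com/hat.glb',
    bone: 'Head',
    position: [0, 0.15, 0],
    rotation: [0, 0, 0],
    scale: 1.1,
  },
]);

// Remove everything:
ref.current?.setAccessories([]);
```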
The 3D editor (`talking-head-studio/editor`) provides a gizmo for live placement with front/top/side views. LLM-assisted placement is available via the companion backend.
## Packages
| Path | Description |
|------|-------------|
| `talking-head-studio` | Live avatar renderer + FaceControl contracts |
| `talking-head-studio/editor` | R3F-based 3D editor with gizmo (web only) |
| `talking-head-studio/appearance` | Material color system for any GLB |
| `talking-head-studio/voice` | Audio recording + WAV conversion hooks |
| `talking-head-studio/sketchfab` | Sketchfab search + download hooks |
| `talking-head-studio/api` | Studio API client (avatar CRUD, voice profiles) |
| `talking-head-studio/wardrobe` | Accessory + outfit state management |
| `talking-head-studio/wgpu` | React Native wgpu renderer |
| `packages/avatar-creator` | Embeddable avatar creator widget |
| `packages/agent-avatar` | LiveKit agent + MCP integration |
## Roadmap
### Now — shipped
- `FaceControl` — canonical face control space (pose + expression + gaze)
- `AvatarBackend` interface — swap renderers without changing upstream code
- `MorphTargetBackend` — Three.js GLB adapter with morph target discovery and mood layering
- ARKit → Oculus analytical remap (`remapArkitToOculus`, full coefficient table)
- `useFaceControlsFromVisemes` — rAF-sampled hook from `AgentVisemePayload`
- `AgentVisemePayload` — canonical TTS → lip-sync wire format
- `AvatarGlbParams` — typed API contract for quality/compression/morph group selection
- `CalibrationProfile` — per-avatar range remapping and gaze limits
- Platform type stubs: SDK (web/Unity/Unreal), marketplace catalog, avatar GLB API
- `packages/avatar-creator` — embeddable creator widget with preset catalog
- `packages/agent-avatar` — LiveKit agent + MCP tool integration
### Next
- GLB schema walker — scan any loaded GLB and report morph target coverage, skeleton bones, LODs, and viseme tier. Prerequisite for the validator and import pipeline.
- `GET /avatars/{id}.glb` with `AvatarGlbParams` — extend the companion backend to serve quality/compression/morph-group variants on the existing endpoint.
- Creator postMessage bridge — let partners embed the avatar creator in an iframe and receive avatar IDs back, like RPM's WebView creator.
### Medium term
- `GaussianBackend` — Gaussian splat renderer implementing `AvatarBackend`. Takes any model, scans it, and drives expression via FLAME-based per-viseme delta transfer. No artist work, no blend shapes required. This is the zero-prerequisite lip-sync path.
- FLAME viseme transfer pipeline (Python, companion backend) — fit FLAME to a face screenshot, generate Oculus viseme deltas, and bake them back into the GLB as morph targets. Runs as a background task on upload for any avatar missing viseme morphs.
- Unity SDK — C# plugin implementing the `AvatarBackend` contract, with an API for loading GLBs, driving morphs, and consuming `AgentVisemePayload`.
- Unreal plugin — UE5 plugin with a Blueprint-accessible `UAvatarDescriptor` and a sample Quickstart map.
### Longer term
- Avatar marketplace — `CatalogItem`, `AvatarAsset`, and `RarityLevel` types are already defined. Backend + web store + in-creator purchasing.
- RPM migration tools — import existing RPM avatars where technically possible.
- SLA + deprecation policy — for teams that need a reliability guarantee as they move off RPM.
## Contributing
```bash
git clone https://github.com/sitebay/talking-head-studio.git
cd talking-head-studio
npm install
npm run typecheck   # must be clean (excluding known expo-audio peer dep warnings)
npm test
```

The repo is a monorepo with `packages/*` as npm workspaces. The main library is the root package.
## License
MIT
