@omote/avatar
v0.1.2
Renderer-agnostic voice composition for the Omote AI Character SDK.
OmoteAvatarCore holds all voice state (TTSSpeaker, SpeechListener, VoiceOrchestrator) and exposes the imperative voice API. Renderer adapters (@omote/three, @omote/babylon, @omote/r3f) instantiate this class and delegate voice methods to it, keeping only renderer-specific code (animation, blendshape writes) in the adapter layer.
Installation

```sh
npm install @omote/avatar @omote/core
```

Quick Start
```ts
import { OmoteAvatarCore } from '@omote/avatar';
import { createKokoroTTS } from '@omote/core';

const core = new OmoteAvatarCore();

// Set frame handler (adapter writes these to its renderer)
core.onFrame = (frame) => {
  applyBlendshapes(frame.blendshapes); // 52 ARKit weights
};

// Set state handler
core.onStateChange = (state) => {
  console.log('State:', state); // 'idle' | 'listening' | 'thinking' | 'speaking'
};

// Connect full voice pipeline
await core.connectVoice({
  mode: 'local',
  tts: createKokoroTTS(),
  onTranscript: async (text) => {
    const res = await fetch('/api/chat', { method: 'POST', body: text });
    return await res.text();
  },
});
```

API
OmoteAvatarCore
Voice Pipeline
| Method | Description |
|--------|-------------|
| connectVoice(config) | Connect full voice pipeline (speaker + listener + interruption). Accepts VoiceOrchestratorConfig. |
| disconnectVoice() | Disconnect and dispose the voice orchestrator. |
| connectSpeaker(tts, config?) | Connect a TTS backend for speech output and lip sync. |
| disconnectSpeaker() | Disconnect and dispose the TTS speaker. |
| connectListener(config?) | Connect speech listener (mic + VAD + ASR). |
| disconnectListener() | Disconnect and dispose the speech listener. |
Speech
| Method | Description |
|--------|-------------|
| speak(text, options?) | Speak text with lip sync. Returns a Promise that resolves when playback completes. |
| streamText(options?) | Start streaming TTS. Returns a StreamTextSink for token-by-token input. |
| stopSpeaking() | Abort current speech playback. |
| warmup() | Warm up AudioContext for iOS/Safari autoplay policy. Call from a user gesture. |
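streamText() returns a StreamTextSink (see Types below) that accepts token-by-token input. Its contract can be sketched with a hypothetical in-memory sink; createBufferingSink is an illustrative stand-in, not part of the SDK, and the real sink forwards tokens to the TTS engine rather than buffering them:

```ts
// StreamTextSink interface copied from the Types section.
interface StreamTextSink {
  push: (token: string) => void;
  end: () => Promise<void>;
}

// Hypothetical stand-in: collects tokens and reports the full text on end().
function createBufferingSink(onDone: (text: string) => void): StreamTextSink {
  let buffer = '';
  return {
    push: (token: string) => { buffer += token; },
    end: async () => { onDone(buffer); },
  };
}

// Feed tokens as an LLM response stream would produce them.
let spoken = '';
const sink = createBufferingSink((text) => { spoken = text; });
for (const token of ['Hel', 'lo, ', 'world!']) sink.push(token);
void sink.end(); // this toy end() invokes onDone synchronously
```

In practice you would push() each token as it arrives from your LLM stream and call end() when the stream closes.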
Listening
| Method | Description |
|--------|-------------|
| startListening() | Start mic capture and speech recognition. |
| stopListening() | Stop mic capture. |
Frame Source
| Method | Description |
|--------|-------------|
| connectFrameSource(source) | Wire any FrameSource (PlaybackPipeline, MicLipSync, etc.). |
| disconnectFrameSource() | Disconnect the current frame source. |
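Whichever FrameSource is connected, its frames reach the adapter through the onFrame handler (type FrameHandler under Types). A hypothetical adapter-side handler, with an illustrative influences array standing in for a renderer's morph-target weights, might look like:

```ts
// FrameHandler type from the Types section.
type FrameHandler = (frame: { blendshapes: Float32Array; emotion?: string }) => void;

// Hypothetical renderer-side target: one influence per ARKit blendshape.
const influences = new Float32Array(52);

// Copy the 52 ARKit weights into the renderer's morph-target array each frame.
const onFrame: FrameHandler = ({ blendshapes }) => {
  influences.set(blendshapes.subarray(0, influences.length));
};

// Simulate one incoming frame with all weights at 0.25.
onFrame({ blendshapes: new Float32Array(52).fill(0.25) });
```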
State
| Property/Method | Description |
|--------|-------------|
| isSpeaking | boolean — whether TTS is currently playing. |
| state | Current ConversationalState ('idle', 'listening', 'thinking', 'speaking'). |
| speaker | The active TTSSpeaker instance, or null. |
| listener | The active SpeechListener instance, or null. |
| setState(state) | Manually set conversational state. |
| reset() | Reset all state to idle. |
| dispose() | Clean up all resources. |
Event Subscriptions
| Method | Description |
|--------|-------------|
| onTranscript(cb) | Subscribe to transcript results. Returns unsubscribe function. |
| onVoiceStateChange(cb) | Subscribe to conversational state changes. |
| onLoadingProgress(cb) | Subscribe to model loading progress events. |
| onError(cb) | Subscribe to error events. |
| onAudioLevel(cb) | Subscribe to audio level events ({ rms, peak }). |
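Each subscription method returns an unsubscribe function. A minimal sketch of that pattern, assuming only what the table states (createEmitter is hypothetical helper code, not the SDK's implementation):

```ts
type Unsubscribe = () => void;

// Tiny emitter: on(cb) registers a callback and returns its unsubscriber.
function createEmitter<T>() {
  const subscribers = new Set<(value: T) => void>();
  return {
    on(cb: (value: T) => void): Unsubscribe {
      subscribers.add(cb);
      return () => { subscribers.delete(cb); };
    },
    emit(value: T) { subscribers.forEach((cb) => cb(value)); },
  };
}

// Audio-level events carry { rms, peak } per the table above.
const audioLevel = createEmitter<{ rms: number; peak: number }>();
const seen: number[] = [];
const unsubscribe = audioLevel.on(({ rms }) => seen.push(rms));
audioLevel.emit({ rms: 0.2, peak: 0.6 });
unsubscribe();
audioLevel.emit({ rms: 0.9, peak: 1.0 }); // ignored after unsubscribe
```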
Types
SpeakOptions

```ts
interface SpeakOptions {
  signal?: AbortSignal;
  voice?: string;
  speed?: number;
  language?: string;
}
```

StreamTextSink
```ts
interface StreamTextSink {
  push: (token: string) => void;
  end: () => Promise<void>;
}
```

FrameHandler
```ts
type FrameHandler = (frame: { blendshapes: Float32Array; emotion?: string }) => void;
```

License
MIT
