@omote/avatar
v0.1.2
Renderer-agnostic voice composition for the Omote AI Character SDK.
OmoteAvatarCore holds all voice state (TTSSpeaker, SpeechListener, VoiceOrchestrator) and exposes the imperative voice API. Renderer adapters (@omote/three, @omote/babylon, @omote/r3f) instantiate this class and delegate voice methods to it, keeping only renderer-specific code (animation, blendshape writes) in the adapter layer.
Installation

```sh
npm install @omote/avatar @omote/core
```

Quick Start
```ts
import { OmoteAvatarCore } from '@omote/avatar';
import { createKokoroTTS } from '@omote/core';

const core = new OmoteAvatarCore();

// Set frame handler (adapter writes these to its renderer)
core.onFrame = (frame) => {
  applyBlendshapes(frame.blendshapes); // 52 ARKit weights
};

// Set state handler
core.onStateChange = (state) => {
  console.log('State:', state); // 'idle' | 'listening' | 'thinking' | 'speaking'
};

// Connect full voice pipeline
await core.connectVoice({
  mode: 'local',
  tts: createKokoroTTS(),
  onTranscript: async (text) => {
    const res = await fetch('/api/chat', { method: 'POST', body: text });
    return await res.text();
  },
});
```

API
OmoteAvatarCore
Voice Pipeline
| Method | Description |
|--------|-------------|
| connectVoice(config) | Connect full voice pipeline (speaker + listener + interruption). Accepts VoiceOrchestratorConfig. |
| disconnectVoice() | Disconnect and dispose the voice orchestrator. |
| connectSpeaker(tts, config?) | Connect a TTS backend for speech output and lip sync. |
| disconnectSpeaker() | Disconnect and dispose the TTS speaker. |
| connectListener(config?) | Connect speech listener (mic + VAD + ASR). |
| disconnectListener() | Disconnect and dispose the speech listener. |
Speech
| Method | Description |
|--------|-------------|
| speak(text, options?) | Speak text with lip sync. Returns a Promise that resolves when playback completes. |
| streamText(options?) | Start streaming TTS. Returns a StreamTextSink for token-by-token input. |
| stopSpeaking() | Abort current speech playback. |
| warmup() | Warm up AudioContext for iOS/Safari autoplay policy. Call from a user gesture. |
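streamText() returns a StreamTextSink (see Types below) that accepts token-by-token input. Its contract can be sketched with a hypothetical in-memory sink; createBufferingSink is an illustrative stand-in, not part of the SDK, and the real sink forwards tokens to the TTS engine rather than buffering them:

```ts
// StreamTextSink interface copied from the Types section.
interface StreamTextSink {
  push: (token: string) => void;
  end: () => Promise<void>;
}

// Hypothetical stand-in: collects tokens and reports the full text on end().
function createBufferingSink(onDone: (text: string) => void): StreamTextSink {
  let buffer = '';
  return {
    push: (token: string) => { buffer += token; },
    end: async () => { onDone(buffer); },
  };
}

// Feed tokens as an LLM response stream would produce them.
let spoken = '';
const sink = createBufferingSink((text) => { spoken = text; });
for (const token of ['Hel', 'lo, ', 'world!']) sink.push(token);
void sink.end(); // this toy end() invokes onDone synchronously
```

In practice you would push() each token as it arrives from your LLM stream and call end() when the stream closes.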
Listening
| Method | Description |
|--------|-------------|
| startListening() | Start mic capture and speech recognition. |
| stopListening() | Stop mic capture. |
Frame Source
| Method | Description |
|--------|-------------|
| connectFrameSource(source) | Wire any FrameSource (PlaybackPipeline, MicLipSync, etc.). |
| disconnectFrameSource() | Disconnect the current frame source. |
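Whichever FrameSource is connected, its frames reach the adapter through the onFrame handler (type FrameHandler under Types). A hypothetical adapter-side handler, with an illustrative influences array standing in for a renderer's morph-target weights, might look like:

```ts
// FrameHandler type from the Types section.
type FrameHandler = (frame: { blendshapes: Float32Array; emotion?: string }) => void;

// Hypothetical renderer-side target: one influence per ARKit blendshape.
const influences = new Float32Array(52);

// Copy the 52 ARKit weights into the renderer's morph-target array each frame.
const onFrame: FrameHandler = ({ blendshapes }) => {
  influences.set(blendshapes.subarray(0, influences.length));
};

// Simulate one incoming frame with all weights at 0.25.
onFrame({ blendshapes: new Float32Array(52).fill(0.25) });
```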
State
| Property/Method | Description |
|--------|-------------|
| isSpeaking | boolean — whether TTS is currently playing. |
| state | Current ConversationalState ('idle', 'listening', 'thinking', 'speaking'). |
| speaker | The active TTSSpeaker instance, or null. |
| listener | The active SpeechListener instance, or null. |
| setState(state) | Manually set conversational state. |
| reset() | Reset all state to idle. |
| dispose() | Clean up all resources. |
Event Subscriptions
| Method | Description |
|--------|-------------|
| onTranscript(cb) | Subscribe to transcript results. Returns unsubscribe function. |
| onVoiceStateChange(cb) | Subscribe to conversational state changes. |
| onLoadingProgress(cb) | Subscribe to model loading progress events. |
| onError(cb) | Subscribe to error events. |
| onAudioLevel(cb) | Subscribe to audio level events ({ rms, peak }). |
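Each subscription method returns an unsubscribe function. A minimal sketch of that pattern, assuming only what the table states (createEmitter is hypothetical helper code, not the SDK's implementation):

```ts
type Unsubscribe = () => void;

// Tiny emitter: on(cb) registers a callback and returns its unsubscriber.
function createEmitter<T>() {
  const subscribers = new Set<(value: T) => void>();
  return {
    on(cb: (value: T) => void): Unsubscribe {
      subscribers.add(cb);
      return () => { subscribers.delete(cb); };
    },
    emit(value: T) { subscribers.forEach((cb) => cb(value)); },
  };
}

// Audio-level events carry { rms, peak } per the table above.
const audioLevel = createEmitter<{ rms: number; peak: number }>();
const seen: number[] = [];
const unsubscribe = audioLevel.on(({ rms }) => seen.push(rms));
audioLevel.emit({ rms: 0.2, peak: 0.6 });
unsubscribe();
audioLevel.emit({ rms: 0.9, peak: 1.0 }); // ignored after unsubscribe
```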
Types
SpeakOptions

```ts
interface SpeakOptions {
  signal?: AbortSignal;
  voice?: string;
  speed?: number;
  language?: string;
}
```

StreamTextSink
```ts
interface StreamTextSink {
  push: (token: string) => void;
  end: () => Promise<void>;
}
```

FrameHandler
```ts
type FrameHandler = (frame: { blendshapes: Float32Array; emotion?: string }) => void;
```

License
MIT
