
@omote/core

v0.10.6

Renderer-agnostic core SDK for Omote AI Characters

Downloads

554

@omote/core

Client-side AI inference for real-time lip sync, speech recognition, and avatar animation — runs entirely in browser via WebGPU and WASM.

Features

  • Lip Sync (A2E) — Audio to 52 ARKit blendshapes via LAM, with automatic WebGPU/WASM platform detection
  • PlaybackPipeline — TTS audio playback to lip sync with ExpressionProfile scaling, gapless scheduling
  • Speech Recognition — SenseVoice ASR (ONNX), 15x faster than Whisper, progressive transcription
  • Voice Activity Detection — Silero VAD with Worker and main-thread modes
  • Text-to-Speech — Kokoro TTS (82M q8, offline) with TTSBackend interface for custom engines
  • CharacterController — Renderer-agnostic avatar composition (compositor + gaze + life layer)
  • TTSPlayback — Composes TTSBackend + PlaybackPipeline for text → lip sync
  • TTSSpeaker — High-level speak(text) with abort, queueing, and LLM streaming
  • SpeechListener — Mic → VAD → ASR orchestration with adaptive silence detection
  • createTTSPlayer() — Factory composing Kokoro TTS + TTSSpeaker for zero-config playback
  • VoiceOrchestrator — Full conversational agent loop with local TTS support (cloud or offline)
  • configureModelUrls() — Self-host model files from your own CDN
  • Animation Graph — State machine (idle/listening/thinking/speaking) with emotion blending
  • Emotion Controller — Preset-based emotion system with smooth transitions
  • Model Caching — IndexedDB with versioning, LRU eviction, and quota monitoring
  • Microphone Capture — Browser noise suppression, echo cancellation, AGC
  • Logging & Telemetry — Structured logging (6 levels) and OpenTelemetry-compatible tracing
  • Offline Ready — No cloud dependencies, works entirely without internet
  • WebGPU + WASM — WebGPU-first with automatic WASM fallback

Installation

npm install @omote/core

onnxruntime-web ships with the package; no additional installs are needed.

Quick Start

PlaybackPipeline (TTS Lip Sync)

The most common use case: feed TTS audio chunks and get back 52 ARKit blendshape frames at render rate.

import { PlaybackPipeline, createA2E } from '@omote/core';

// 1. Create A2E backend (auto-detects GPU vs CPU)
const lam = createA2E(); // auto-detects GPU vs CPU, fetches from HF CDN (192MB fp16)
await lam.load();

// 2. Create pipeline with expression profile
const pipeline = new PlaybackPipeline({
  lam,
  sampleRate: 16000,
  profile: { mouth: 1.0, jaw: 1.0, brows: 0.6, eyes: 0.0, cheeks: 0.5, nose: 0.3, tongue: 0.5 },
});

// 3. Listen for blendshape frames
pipeline.on('frame', (frame) => {
  applyToAvatar(frame.blendshapes); // ExpressionProfile-scaled, 52 ARKit weights
});

// 4. Feed TTS audio and play
pipeline.start();
pipeline.feedBuffer(ttsAudioChunk); // Uint8Array PCM16
pipeline.end(); // Flush remaining audio
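feedBuffer() takes raw PCM16 bytes. If your TTS engine produces Float32 audio instead, the conversion in both directions is mechanical; a sketch, assuming little-endian 16-bit PCM (these helpers are illustrative, not part of @omote/core's API):

```typescript
// Convert little-endian PCM16 bytes to normalized Float32 samples and back.
// Hypothetical helpers -- not part of @omote/core's public API.
function pcm16ToFloat32(bytes: Uint8Array): Float32Array {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const out = new Float32Array(bytes.byteLength / 2);
  for (let i = 0; i < out.length; i++) {
    out[i] = view.getInt16(i * 2, true) / 32768; // [-32768, 32767] -> [-1, 1)
  }
  return out;
}

function float32ToPcm16(samples: Float32Array): Uint8Array {
  const bytes = new Uint8Array(samples.length * 2);
  const view = new DataView(bytes.buffer);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i])); // clamp before quantizing
    view.setInt16(i * 2, Math.round(s * 32767), true);
  }
  return bytes;
}
```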

API Reference

A2E (Audio to Expression)

Factory API (Recommended)

Auto-detects platform: Chrome/Edge/Android use WebGPU, Safari/iOS use WASM CPU fallback.

import { createA2E } from '@omote/core';

const a2e = createA2E(); // auto-detects: WebGPU on Chrome/Edge, WASM on Safari/iOS/Firefox
await a2e.load();

const { blendshapes } = await a2e.infer(audioSamples); // Float32Array (16kHz)
// → 52 ARKit blendshape weights

Custom Configuration

import { createA2E, ARKIT_BLENDSHAPES } from '@omote/core';

const a2e = createA2E({ backend: 'wasm' }); // Force WASM for testing
await a2e.load();

const { blendshapes } = await a2e.infer(audioSamples);
const jawOpen = blendshapes[ARKIT_BLENDSHAPES.indexOf('jawOpen')];

PlaybackPipeline

End-to-end TTS playback with lip sync inference, audio scheduling, and ExpressionProfile scaling.

import { PlaybackPipeline } from '@omote/core';

const pipeline = new PlaybackPipeline({
  lam,                // A2E backend from createA2E()
  sampleRate: 16000,
  profile: { mouth: 1.0, jaw: 1.0, brows: 0.6, eyes: 0.0, cheeks: 0.5, nose: 0.3, tongue: 0.5 },
});

pipeline.on('frame', (frame) => {
  // frame.blendshapes    — ExpressionProfile-scaled
  // frame.rawBlendshapes — unscaled original values
  applyToAvatar(frame.blendshapes);
});

pipeline.start();
pipeline.feedBuffer(chunk); // feed TTS audio (Uint8Array PCM16)
pipeline.end();             // flush final partial chunk
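The difference between frame.blendshapes and frame.rawBlendshapes is the ExpressionProfile: each facial region's gain scales the raw weights for that region. A sketch of how such scaling could work; the region-to-blendshape grouping below is illustrative, not the SDK's internal mapping:

```typescript
// Sketch: apply per-region ExpressionProfile gains to raw blendshape weights.
// REGION_OF is a hypothetical grouping for illustration only.
type Profile = Record<string, number>;

const REGION_OF: Record<string, string> = {
  jawOpen: 'jaw',
  mouthSmileLeft: 'mouth',
  mouthSmileRight: 'mouth',
  browInnerUp: 'brows',
  eyeBlinkLeft: 'eyes',
};

function scaleBlendshapes(
  raw: Record<string, number>,
  profile: Profile,
): Record<string, number> {
  const scaled: Record<string, number> = {};
  for (const [name, weight] of Object.entries(raw)) {
    const gain = profile[REGION_OF[name] ?? ''] ?? 1.0; // unmapped names pass through
    scaled[name] = Math.min(1, weight * gain);          // keep weights in [0, 1]
  }
  return scaled;
}
```

With the profile from the example above (brows: 0.6), a raw browInnerUp of 0.5 would come out as 0.3 while jaw weights pass through at full strength.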

A2EProcessor

Engine-agnostic audio-to-blendshapes processor for custom integrations. Supports pull mode (timestamped frames for TTS) and push mode (drip-feed for live mic).

import { A2EProcessor } from '@omote/core';

const processor = new A2EProcessor({ backend: lam, chunkSize: 16000 });

// Pull mode: timestamp audio for later retrieval
processor.pushAudio(samples, audioContext.currentTime + delay);
const frame = processor.getFrameForTime(audioContext.currentTime);
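Pull mode works by tagging each frame with the audio-clock time it was scheduled for, then answering "which frame is current?" at render time. A minimal sketch of that lookup (not the SDK's internal implementation): keep frames in time order, return the latest frame at or before the requested time, and discard anything older.

```typescript
// Sketch of pull-mode frame retrieval: frames carry the audio-clock timestamp
// they were scheduled for; the consumer asks for the latest frame at or
// before "now". Illustrative only.
interface TimedFrame {
  time: number;              // audio-clock timestamp in seconds
  blendshapes: Float32Array; // 52 ARKit weights
}

class FrameBuffer {
  private frames: TimedFrame[] = []; // ascending time order

  push(frame: TimedFrame): void {
    this.frames.push(frame);
  }

  // Return the most recent frame at or before `t`, dropping older frames.
  getFrameForTime(t: number): TimedFrame | undefined {
    let latest: TimedFrame | undefined;
    while (this.frames.length > 0 && this.frames[0].time <= t) {
      latest = this.frames.shift();
    }
    return latest;
  }
}
```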

Speech Recognition (SenseVoice)

SenseVoice ASR — 15x faster than Whisper, with progressive transcription and emotion detection.

import { createSenseVoice } from '@omote/core';

const asr = createSenseVoice(); // Auto-detects platform, fetches from HF CDN
await asr.load();

const { text, emotion, language } = await asr.transcribe(audioSamples);

Platform-Aware ASR

import { shouldUseNativeASR, SafariSpeechRecognition, createSenseVoice } from '@omote/core';

const asr = shouldUseNativeASR()
  ? new SafariSpeechRecognition({ language: 'en-US' })
  : createSenseVoice();

Voice Activity Detection (Silero VAD)

import { createSileroVAD } from '@omote/core';

const vad = createSileroVAD({
  threshold: 0.5,
  // useWorker: true   // Force off-main-thread
  // useWorker: false  // Force main thread
});
await vad.load();

const { isSpeech, probability } = await vad.process(audioSamples);
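Per-chunk probabilities still need to be turned into speech segments, and a naive threshold would end a segment on every brief pause. A common fix, and one way to read the "adaptive silence detection" mentioned for SpeechListener, is a hangover: speech only ends after several consecutive low-probability chunks. A self-contained sketch (illustrative; not the SDK's logic):

```typescript
// Sketch: convert per-chunk VAD probabilities into a speech/no-speech state
// with a "hangover" -- speech ends only after `hangoverChunks` consecutive
// low-probability chunks, bridging brief pauses. Illustrative only.
class SpeechGate {
  private threshold: number;
  private hangoverChunks: number;
  private silent = 0;
  private speaking = false;

  constructor(threshold = 0.5, hangoverChunks = 8) {
    this.threshold = threshold;
    this.hangoverChunks = hangoverChunks;
  }

  // Feed one chunk's probability; returns whether we are inside speech.
  update(probability: number): boolean {
    if (probability >= this.threshold) {
      this.speaking = true;
      this.silent = 0;
    } else if (this.speaking && ++this.silent >= this.hangoverChunks) {
      this.speaking = false;
    }
    return this.speaking;
  }
}
```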

Animation Graph

State machine for avatar animation states with emotion blending and audio energy.

import { AnimationGraph, AudioEnergyAnalyzer, EmphasisDetector } from '@omote/core';

const graph = new AnimationGraph();

graph.on('state.change', ({ from, to, trigger }) => {
  console.log(`${from} → ${to}`);
});

graph.on('output.update', (output) => applyToAvatar(output));

// State transitions
graph.trigger('user_speech_start');  // idle → listening
graph.trigger('transcript_ready');   // listening → thinking
graph.trigger('ai_audio_start');     // thinking → speaking
graph.trigger('ai_audio_end');       // speaking → idle

// Blend emotion and audio energy into output
graph.setEmotion('happy', 0.8);
graph.setAudioEnergy(0.7);
graph.update(deltaTime); // call each frame

States: idle → listening → thinking → speaking → idle
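The graph is a plain trigger-driven state machine over those four states. A transition-table sketch of the same cycle (illustrative, not the SDK's implementation):

```typescript
// Transition-table sketch of the idle/listening/thinking/speaking cycle.
// Unknown triggers are ignored rather than throwing, which is typical for
// animation graphs. Illustrative only.
type State = 'idle' | 'listening' | 'thinking' | 'speaking';

const TRANSITIONS: Record<State, Partial<Record<string, State>>> = {
  idle:      { user_speech_start: 'listening' },
  listening: { transcript_ready: 'thinking' },
  thinking:  { ai_audio_start: 'speaking' },
  speaking:  { ai_audio_end: 'idle' },
};

class MiniGraph {
  state: State = 'idle';

  trigger(event: string): State {
    this.state = TRANSITIONS[this.state][event] ?? this.state; // no-op if invalid
    return this.state;
  }
}
```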

Emotion Controller

import { EmotionController, EmotionPresets } from '@omote/core';

const controller = new EmotionController();
controller.setPreset('happy');
controller.transitionTo({ joy: 0.8 }, 500); // 500ms smooth transition

// In animation loop
controller.update();
const current = controller.emotion;

Presets: neutral, happy, sad, angry, surprised, scared, disgusted, excited, tired, playful, pained, contemplative
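A timed transition like transitionTo({ joy: 0.8 }, 500) amounts to interpolating each emotion channel from its value at transition start toward the target over the duration. A sketch of the interpolation step (linear for simplicity; the SDK's actual easing may differ):

```typescript
// Sketch: linearly interpolate emotion channels toward a target.
// `t` is normalized transition progress in [0, 1]. Illustrative only.
type Emotion = Record<string, number>;

function lerpEmotion(from: Emotion, to: Emotion, t: number): Emotion {
  const clamped = Math.max(0, Math.min(1, t));
  const out: Emotion = { ...from };
  for (const [channel, target] of Object.entries(to)) {
    const start = from[channel] ?? 0; // channels absent in `from` start at 0
    out[channel] = start + (target - start) * clamped;
  }
  return out;
}
```

Halfway through a 500ms transition (t = 0.5), joy moving from 0 to 0.8 sits at 0.4.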

Model Caching

IndexedDB-based caching with versioning, LRU eviction, and storage quota monitoring.

import { getModelCache, fetchWithCache, preloadModels } from '@omote/core';

// Fetch with automatic caching
const data = await fetchWithCache('/models/model.onnx');

// Versioned caching for model updates
const versioned = await fetchWithCache('/models/model.onnx', {
  version: '1.0.0',
  validateStale: true,
});

// Cache quota monitoring
import { configureCacheLimit } from '@omote/core';

configureCacheLimit({
  maxSizeBytes: 500 * 1024 * 1024, // 500MB limit
  onQuotaWarning: (info) => console.warn(`Storage ${info.percentUsed}% used`),
});

// Cache stats
const cache = getModelCache();
const stats = await cache.getStats(); // { totalSize, modelCount, models }
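The eviction policy is worth spelling out: when the cache exceeds its size limit, least-recently-used models are dropped first. A sketch of that policy over plain entries (illustrative of LRU; not the SDK's IndexedDB implementation):

```typescript
// Sketch of LRU eviction: sort entries by recency, keep the most recent
// until the size budget is spent, evict the rest. Illustrative only.
interface Entry {
  key: string;
  size: number;     // bytes
  lastUsed: number; // monotonically increasing timestamp
}

function evictLRU(entries: Entry[], maxSizeBytes: number): Entry[] {
  const byRecency = [...entries].sort((a, b) => b.lastUsed - a.lastUsed);
  const kept: Entry[] = [];
  let total = 0;
  for (const entry of byRecency) {
    if (total + entry.size > maxSizeBytes) break; // this and everything older goes
    kept.push(entry);
    total += entry.size;
  }
  return kept;
}
```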

Microphone Capture

import { MicrophoneCapture } from '@omote/core';

const mic = new MicrophoneCapture({
  sampleRate: 16000,
  bufferSize: 4096,
});

mic.on('audio', ({ samples }) => {
  // Process 16kHz Float32Array samples
});

await mic.start();
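Browsers usually capture at 44.1 or 48 kHz, so a 16 kHz pipeline needs resampling somewhere. MicrophoneCapture takes a sampleRate option; if you build a custom capture path instead, a crude sketch of integer-factor decimation (e.g. 48 kHz → 16 kHz, factor 3), averaging each group as a simple low-pass:

```typescript
// Sketch: downsample by an integer factor by averaging each group of
// `factor` samples (a crude low-pass). For non-integer ratios like
// 44.1 kHz -> 16 kHz a proper resampler is needed. Illustrative only.
function downsample(input: Float32Array, factor: number): Float32Array {
  const out = new Float32Array(Math.floor(input.length / factor));
  for (let i = 0; i < out.length; i++) {
    let sum = 0;
    for (let j = 0; j < factor; j++) sum += input[i * factor + j];
    out[i] = sum / factor;
  }
  return out;
}
```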

Logging

import { configureLogging, createLogger } from '@omote/core';

configureLogging({ level: 'debug', format: 'pretty' });

const logger = createLogger('MyModule');
logger.info('Model loaded', { backend: 'webgpu', loadTimeMs: 1234 });

Telemetry

OpenTelemetry-compatible tracing and metrics.

import { configureTelemetry, getTelemetry } from '@omote/core';

configureTelemetry({
  enabled: true,
  serviceName: 'my-app',
  exporter: 'console', // or 'otlp' for production
});

const telemetry = getTelemetry();
const span = telemetry.startSpan('custom-operation');
// ... do work
span.end();

Text-to-Speech (Kokoro TTS)

import { createKokoroTTS } from '@omote/core';

const tts = createKokoroTTS({ defaultVoice: 'af_heart' });
await tts.load();

const audio = await tts.synthesize('Hello world!');
// audio: Float32Array @ 24kHz

Kokoro auto-detects the platform: mixed-fp16 WebGPU model (156MB) on Chrome/Edge, q8 WASM model (92MB) on Safari/iOS/Firefox.

Eager Load & Warmup

Use eagerLoad to preload models at construction time:

const tts = createKokoroTTS({ eagerLoad: true }); // Starts loading immediately

Use warmup() to prime AudioContext for iOS/Safari autoplay policy. Call from a user gesture handler:

button.onclick = async () => {
  await avatar.warmup(); // Primes AudioContext
  await avatar.connectVoice({ ... });
};

Observability

The SDK includes built-in OpenTelemetry-compatible tracing and metrics:

import { configureTelemetry, getTelemetry, MetricNames } from '@omote/core';

configureTelemetry({
  enabled: true,
  serviceName: 'my-app',
  exporter: 'console', // or OTLPExporter for production
});

All inference calls, model loads, cache operations, and voice turns are automatically instrumented.

Models

All models default to the HuggingFace CDN and are auto-downloaded on first use. Self-host with configureModelUrls():

import { configureModelUrls } from '@omote/core';

configureModelUrls({
  lam: 'https://your-cdn.com/models/lam.onnx',
  lamData: 'https://your-cdn.com/models/lam.onnx.data',
  senseVoice: 'https://your-cdn.com/models/sensevoice.onnx',
  sileroVad: 'https://your-cdn.com/models/silero_vad.onnx',
});

| Model | HuggingFace Repo | Size |
|-------|-------------------|------|
| LAM A2E | omote-ai/lam-a2e | lam.onnx (230KB) + lam.onnx.data (192MB) |
| SenseVoice | omote-ai/sensevoice-asr | 228MB |
| Silero VAD | deepghs/silero-vad-onnx | ~2MB |
| Kokoro TTS (WASM) | onnx-community/Kokoro-82M-v1.0-ONNX | 92MB q8 |
| Kokoro TTS (WebGPU) | omote-ai/kokoro-tts | 156MB mixed-fp16 |

Browser Compatibility

WebGPU-first with automatic WASM fallback.

| Browser | WebGPU | WASM | Recommended |
|---------|--------|------|-------------|
| Chrome 113+ (Desktop) | Yes | Yes | WebGPU |
| Chrome 113+ (Android) | Yes | Yes | WebGPU |
| Edge 113+ | Yes | Yes | WebGPU |
| Firefox 130+ | Flag only | Yes | WASM |
| Safari 18+ (macOS) | Limited | Yes | WASM |
| Safari (iOS) | No | Yes | WASM |

import { isWebGPUAvailable } from '@omote/core';
const webgpu = await isWebGPUAvailable();
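The decision the table encodes reduces to: prefer WebGPU when a usable adapter exists, otherwise fall back to WASM, with an explicit override winning over detection. A sketch of that selection as a pure function (hypothetical helper; createA2E() makes this choice internally):

```typescript
// Sketch of backend selection: explicit override > WebGPU if available > WASM.
// Hypothetical helper for illustration; not part of @omote/core's API.
type Backend = 'webgpu' | 'wasm';

function pickBackend(webgpuAvailable: boolean, forced?: Backend): Backend {
  if (forced) return forced; // e.g. createA2E({ backend: 'wasm' }) for testing
  return webgpuAvailable ? 'webgpu' : 'wasm';
}
```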

iOS Notes

All iOS browsers use WebKit under the hood. The SDK handles three platform constraints automatically:

  1. WASM binary selection — iOS crashes with the default JSEP/ASYNCIFY WASM binary. The SDK imports onnxruntime-web/wasm (non-JSEP) on iOS/Safari.
  2. A2E model routing — createA2E() routes all platforms through A2EInference via UnifiedInferenceWorker. WebGPU on Chrome/Edge, WASM on Safari/iOS/Firefox.
  3. Worker memory — Multiple Workers each load their own ORT WASM runtime, exceeding iOS tab memory (~1.5GB). The SDK defaults to main-thread inference on iOS.

Consumer requirement: do not send COEP/COOP headers to iOS clients, since they enable SharedArrayBuffer (which forces threaded WASM with 4GB of shared memory and crashes iOS). Desktop should keep COEP/COOP for multi-threaded performance.

| Feature | iOS Status | Notes |
|---------|------------|-------|
| Silero VAD | Works | 0.9ms latency |
| SenseVoice ASR | Works | WASM, ~200ms |
| A2E Lip Sync | Works | A2EInference (WASM) via createA2E(), ~45ms |

License

MIT