
@aituber-onair/voice v0.7.0

Voice synthesis library for AITuber OnAir

AITuber OnAir Voice


@aituber-onair/voice is an independent voice synthesis library that supports multiple TTS (Text-to-Speech) engines. While originally developed for the AITuber OnAir project, it can be used standalone for any voice synthesis needs.

A Japanese version of this README is also available.

This project is published as open-source software and is available as an npm package under the MIT License.


Overview

@aituber-onair/voice is a comprehensive voice synthesis library that provides a unified interface for multiple TTS engines. It specializes in emotion-aware speech synthesis, making it ideal for creating expressive virtual characters, AI assistants, and interactive applications.

Key design principles:

  • Engine Independence: Switch between TTS engines without changing your code
  • Emotion Support: Built-in emotion detection and synthesis
  • Browser Ready: Full support for web audio playback
  • TypeScript First: Complete type safety and excellent IDE support
  • Zero Dependencies: Minimal external dependencies for maximum compatibility

Installation

Install using npm:

npm install @aituber-onair/voice

Or using yarn:

yarn add @aituber-onair/voice

Or using pnpm:

pnpm add @aituber-onair/voice

Main Features

  • Multiple TTS Engine Support
    Compatible with VOICEVOX, VoicePeak, OpenAI TTS, NijiVoice, MiniMax, AivisSpeech, Aivis Cloud, and more
  • Unified Interface
    Single API for all supported TTS engines
  • Emotion-Aware Synthesis
    Automatically detects and applies emotions from text tags like [happy], [sad], etc.
  • Screenplay Conversion
    Transforms text with emotion tags into structured screenplay format
  • Browser Audio Support
    Direct playback in web browsers using HTMLAudioElement
  • Custom Endpoints
    Support for self-hosted TTS servers
  • Language Detection
    Automatic language recognition for multi-language engines
  • Flexible Configuration
    Runtime engine switching and parameter updates

Basic Usage

Simple Text-to-Speech

import { VoiceService, VoiceServiceOptions } from '@aituber-onair/voice';

// Configure the voice service
const options: VoiceServiceOptions = {
  engineType: 'voicevox',
  speaker: '1',
  // Optional: specify custom endpoint
  voicevoxApiUrl: 'http://localhost:50021'
};

// Create voice service instance
const voiceService = new VoiceService(options);

// Speak text
await voiceService.speak({ text: 'Hello, world!' });

Using VoiceEngineAdapter (Recommended)

import { VoiceEngineAdapter, VoiceServiceOptions } from '@aituber-onair/voice';

const options: VoiceServiceOptions = {
  engineType: 'openai',
  speaker: 'alloy',
  apiKey: 'your-openai-api-key',
  onPlay: async (audioBuffer) => {
    // Custom audio playback handler
    console.log('Playing audio...');
  }
};

const voiceAdapter = new VoiceEngineAdapter(options);

// Speak with emotion
await voiceAdapter.speak({ 
  text: '[happy] I am so excited to talk with you!' 
});

Supported TTS Engines

VOICEVOX

High-quality Japanese speech synthesis engine with multiple character voices.

const voiceService = new VoiceService({
  engineType: 'voicevox',
  speaker: '1', // Character ID
  voicevoxApiUrl: 'http://localhost:50021' // Optional custom endpoint
});

VoicePeak

Professional speech synthesis with rich emotional expression.

const voiceService = new VoiceService({
  engineType: 'voicepeak',
  speaker: 'f1',
  voicepeakApiUrl: 'http://localhost:20202',
  voicepeakEmotion: 'happy',
  voicepeakSpeed: 140,
  voicepeakPitch: 20
});

OpenAI TTS

OpenAI's text-to-speech API with multiple voice options.

const voiceService = new VoiceService({
  engineType: 'openai',
  speaker: 'alloy',
  apiKey: 'your-openai-api-key'
});

NijiVoice

AI-based Japanese voice synthesis service.

const voiceService = new VoiceService({
  engineType: 'nijivoice',
  speaker: 'speaker-id',
  apiKey: 'your-nijivoice-api-key'
});

MiniMax

Multi-language TTS supporting 24 languages with HD quality.

const voiceService = new VoiceService({
  engineType: 'minimax',
  speaker: 'male-qn-qingse',
  apiKey: 'your-minimax-api-key',
  groupId: 'your-group-id', // Required for MiniMax
  endpoint: 'global' // or 'china'
});

Note: MiniMax requires both API key and GroupId for authentication. The GroupId is used for user group management, usage tracking, and billing.

AivisSpeech

AI-powered speech synthesis with natural voice quality.

const voiceService = new VoiceService({
  engineType: 'aivisSpeech',
  speaker: '888753760',
  aivisSpeechApiUrl: 'http://localhost:10101'
});

Aivis Cloud

High-quality cloud-based TTS service with advanced SSML support and streaming capabilities.

const voiceService = new VoiceService({
  engineType: 'aivisCloud',
  speaker: 'unused', // Not used when model UUID is specified
  apiKey: 'your-aivis-cloud-api-key',
  aivisCloudModelUuid: 'a59cb814-0083-4369-8542-f51a29e72af7', // Required
  
  // Optional advanced settings
  aivisCloudSpeakerUuid: 'speaker-uuid', // For multi-speaker models
  aivisCloudStyleId: 0, // Or use aivisCloudStyleName: 'ノーマル'
  aivisCloudUseSSML: true, // Enable SSML tags
  aivisCloudSpeakingRate: 1.0, // 0.5-2.0
  aivisCloudEmotionalIntensity: 1.0, // 0.0-2.0
  aivisCloudOutputFormat: 'mp3', // wav, flac, mp3, aac, opus
  aivisCloudOutputSamplingRate: 44100, // Hz
});

Key Features:

  • SSML Support: Rich markup for prosody, breaks, aliases, and emotions (see the sketch after this list)
  • Streaming Audio: Real-time audio generation and delivery
  • Multiple Formats: WAV, FLAC, MP3, AAC, Opus output
  • Emotion Control: Fine-grained emotional intensity settings
  • High Quality: Professional-grade voice synthesis
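
With aivisCloudUseSSML enabled (as in the configuration above), SSML markup can be embedded directly in the text passed to speak(). This is a minimal sketch; the specific tags shown (break, prosody) are standard SSML and are assumed here rather than taken from the Aivis Cloud documentation:

// Assumes a voiceService configured with engineType: 'aivisCloud' and aivisCloudUseSSML: true
await voiceService.speak({
  text:
    'Hello.<break time="500ms"/>' +                     // pause between phrases (assumed SSML tag)
    '<prosody rate="slow">Nice to meet you.</prosody>', // slower delivery (assumed SSML tag)
});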

None (Silent Mode)

No audio output - useful for testing or text-only scenarios.

const voiceService = new VoiceService({
  engineType: 'none'
});

Emotion-Aware Speech

The library supports emotion tags in text for more expressive speech:

// Emotion tags are automatically detected and processed
await voiceService.speak({ 
  text: '[happy] Great to see you today!' 
});

await voiceService.speak({ 
  text: '[sad] I will miss you...' 
});

await voiceService.speak({ 
  text: '[angry] This is unacceptable!' 
});

// Supported emotions vary by engine
// Common emotions: happy, sad, angry, surprised, neutral

The emotion system works by:

  1. Extracting emotion tags from the text
  2. Converting text to screenplay format with emotion metadata
  3. Passing emotion information to engines that support it
  4. Falling back gracefully for engines without emotion support
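
Conceptually, a tagged input is split into a Screenplay object (see Screenplay Format in the API Reference) before synthesis. A minimal, illustrative sketch; the exact split between text and speechText is an assumption about how the metadata is carried:

// Illustrative only: shape matches the Screenplay interface from the API Reference
const screenplay = {
  emotion: 'happy',                        // extracted from the [happy] tag
  text: '[happy] Great to see you today!', // original input
  speechText: 'Great to see you today!',   // text assumed to be sent to the TTS engine
};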

Browser Compatibility

The library includes built-in browser audio playback support:

// Option 1: Default browser playback
const voiceService = new VoiceService({
  engineType: 'openai',
  speaker: 'alloy',
  apiKey: 'your-api-key'
  // Audio will play automatically in the browser
});

// Option 2: Custom audio handling
const voiceService = new VoiceService({
  engineType: 'voicevox',
  speaker: '1',
  onPlay: async (audioBuffer: ArrayBuffer) => {
    // Custom audio playback logic, e.g. via the Web Audio API
    const audioContext = new AudioContext();
    const decoded = await audioContext.decodeAudioData(audioBuffer);
    const audioBufferSource = audioContext.createBufferSource();
    audioBufferSource.buffer = decoded;
    audioBufferSource.connect(audioContext.destination);
    audioBufferSource.start();
  }
});

// Option 3: Specify HTML audio element
const voiceService = new VoiceService({
  engineType: 'nijivoice',
  speaker: 'speaker-id',
  apiKey: 'your-api-key',
  audioElementId: 'my-audio-player' // ID of <audio> element
});
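
Option 3 assumes an <audio> element with the matching id exists in the document. A minimal sketch of creating one before constructing the service (the element id is just the example value from above):

// Create (or reuse) the <audio> element referenced by audioElementId
const audioEl = document.createElement('audio');
audioEl.id = 'my-audio-player';
document.body.appendChild(audioEl);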

Advanced Configuration

Dynamic Engine Switching

const voiceAdapter = new VoiceEngineAdapter({
  engineType: 'voicevox',
  speaker: '1'
});

// Switch to a different engine at runtime
await voiceAdapter.updateOptions({
  engineType: 'openai',
  speaker: 'nova',
  apiKey: 'your-openai-api-key'
});

Custom Endpoints

// For self-hosted or custom TTS servers
const voiceService = new VoiceService({
  engineType: 'voicevox',
  speaker: '1',
  voicevoxApiUrl: 'https://my-custom-voicevox-server.com'
});

Engine Parameter Overrides

VoiceServiceOptions (see API Reference) now covers a consistent set of overrides for each engine (except NijiVoice). Below is a field-by-field summary to help you discover the right property without scanning the entire interface.

const voiceService = new VoiceService({
  engineType: 'voicevox',
  speaker: '1',
  openAiSpeed: 1.15,
  voicevoxSpeedScale: 1.1,
  voicevoxPitchScale: 0.05,
  voicevoxIntonationScale: 1.2,
  voicevoxQueryParameters: { pauseLength: 0.3, outputSamplingRate: 44100 },
  minimaxVoiceSettings: { speed: 1.05, vol: 1.1, pitch: 2 },
  minimaxAudioSettings: { sampleRate: 44100, format: 'mp3' },
  aivisSpeechSpeedScale: 1.05,
  aivisCloudSpeakingRate: 1.1,
  aivisCloudVolume: 1.05,
});

Tip: the React example in packages/voice/examples/react-basic exposes the same controls with collapsible cards + sliders, making it easy to try values before applying them in code.

Engine parameter reference

  • OpenAI TTS

    • openAiModel
    • openAiSpeed
  • VOICEVOX

    • Endpoint: voicevoxApiUrl
    • Scalars: voicevoxSpeedScale, voicevoxPitchScale, voicevoxIntonationScale, voicevoxVolumeScale
    • Timing: voicevoxPrePhonemeLength, voicevoxPostPhonemeLength, voicevoxPauseLength, voicevoxPauseLengthScale
    • Output: voicevoxOutputSamplingRate, voicevoxOutputStereo
    • Flags: voicevoxEnableKatakanaEnglish, voicevoxEnableInterrogativeUpspeak
    • Version: voicevoxCoreVersion
    • Low-level overrides: voicevoxQueryParameters
  • AivisSpeech

    • Endpoint: aivisSpeechApiUrl
    • Scalars: aivisSpeechSpeedScale, aivisSpeechPitchScale, aivisSpeechIntonationScale, aivisSpeechTempoDynamicsScale, aivisSpeechVolumeScale
    • Timing: aivisSpeechPrePhonemeLength, aivisSpeechPostPhonemeLength, aivisSpeechPauseLength, aivisSpeechPauseLengthScale
    • Output: aivisSpeechOutputSamplingRate, aivisSpeechOutputStereo
    • Low-level overrides: aivisSpeechQueryParameters
  • Aivis Cloud

    • Identity: aivisCloudModelUuid, aivisCloudSpeakerUuid, aivisCloudStyleId, aivisCloudStyleName, aivisCloudUserDictionaryUuid
    • Behaviour: aivisCloudUseSSML, aivisCloudLanguage, aivisCloudSpeakingRate, aivisCloudEmotionalIntensity, aivisCloudTempoDynamics, aivisCloudPitch, aivisCloudVolume
    • Silence: aivisCloudLeadingSilence, aivisCloudTrailingSilence, aivisCloudLineBreakSilence
    • Output: aivisCloudOutputFormat, aivisCloudOutputBitrate, aivisCloudOutputSamplingRate, aivisCloudOutputChannels
    • Logging: aivisCloudEnableBillingLogs
  • VoicePeak

    • Endpoint: voicepeakApiUrl
    • Emotion: voicepeakEmotion
    • Scalars: voicepeakSpeed, voicepeakPitch
  • MiniMax

    • Identity: groupId, endpoint, minimaxModel, minimaxLanguageBoost
    • Voice overrides: minimaxVoiceSettings or individual minimaxSpeed, minimaxVolume, minimaxPitch
    • Audio overrides: minimaxAudioSettings or individual minimaxSampleRate, minimaxBitrate, minimaxAudioFormat, minimaxAudioChannel
  • NijiVoice

    • Requires apiKey and speaker selection; no additional runtime parameters are currently exposed.

Error Handling

try {
  await voiceService.speak({ text: 'Hello!' });
} catch (error) {
  if (error.message.includes('API key')) {
    console.error('Invalid API key');
  } else if (error.message.includes('network')) {
    console.error('Network error - check your connection');
  } else {
    console.error('TTS error:', error);
  }
}

Engine-Specific Features

VOICEVOX Features

  • Multiple character voices with unique personalities
  • Adjustable speech parameters (speed, pitch, intonation)
  • Local server support for privacy

OpenAI TTS Features

  • High-quality multilingual support
  • Multiple voice personalities
  • Optimized for conversational AI

MiniMax Features

  • 24 language support with automatic detection
  • HD quality audio output
  • Dual-region endpoints (global/china)
  • Advanced emotion synthesis

NijiVoice Features

  • Japanese-specialized voices
  • Character-based voice models
  • Emotion-rich synthesis

Integration with AITuber OnAir Core

While this package can be used independently, it integrates seamlessly with @aituber-onair/core:

import { AITuberOnAirCore } from '@aituber-onair/core';

const core = new AITuberOnAirCore({
  apiKey: 'your-openai-key',
  voiceOptions: {
    engineType: 'voicevox',
    speaker: '1',
    voicevoxApiUrl: 'http://localhost:50021'
  }
});

// Voice synthesis is handled automatically
await core.processChat('Hello!');

API Reference

VoiceServiceOptions

interface VoiceServiceOptions {
  engineType: VoiceEngineType;
  speaker: string;
  apiKey?: string;
  groupId?: string; // For MiniMax
  endpoint?: 'global' | 'china'; // For MiniMax
  voicevoxApiUrl?: string;
  voicepeakApiUrl?: string;
  voicepeakEmotion?:
    | 'happy'
    | 'fun'
    | 'angry'
    | 'sad'
    | 'neutral'
    | 'surprised';
  voicepeakSpeed?: number; // 50-200 (integer)
  voicepeakPitch?: number; // -300 to 300 (integer)
  aivisSpeechApiUrl?: string;
  onPlay?: (audioBuffer: ArrayBuffer) => Promise<void>;
  onComplete?: () => void;
  audioElementId?: string;
}
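
The onPlay and onComplete callbacks hook into the playback lifecycle. A minimal sketch using only the options above; onComplete is assumed to fire once playback has finished:

import { VoiceEngineAdapter } from '@aituber-onair/voice';

const voiceService = new VoiceEngineAdapter({
  engineType: 'voicevox',
  speaker: '1',
  onPlay: async (audioBuffer) => {
    // Receives the synthesized audio as an ArrayBuffer
    console.log(`Received ${audioBuffer.byteLength} bytes of audio`);
  },
  onComplete: () => {
    // Assumed to fire after playback has finished
    console.log('Playback finished');
  },
});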

VoiceEngine Methods

interface VoiceEngine {
  speak(params: SpeakParams): Promise<ArrayBuffer | null>;
  isAvailable(): Promise<boolean>;
  getSpeakers?(): Promise<SpeakerInfo[]>;
  getEngineInfo(): VoiceEngineInfo;
}
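
Because every engine implements this interface, utility code can be written against it generically. A minimal sketch; it assumes the VoiceEngine type is exported and that you already hold an engine instance (obtaining one is not shown here):

import { VoiceEngine } from '@aituber-onair/voice';

// Works with any engine implementing the VoiceEngine interface
async function describeEngine(engine: VoiceEngine): Promise<void> {
  console.log('Engine info:', engine.getEngineInfo());

  if (!(await engine.isAvailable())) {
    console.warn('Engine is not reachable');
    return;
  }

  // getSpeakers is optional, so guard before calling it
  const speakers = engine.getSpeakers ? await engine.getSpeakers() : [];
  console.log(`Found ${speakers.length} speaker(s)`);
}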

Screenplay Format

interface Screenplay {
  emotion?: string;
  text: string;
  speechText?: string;
}

Examples

React Integration

See the React example for a complete implementation:

import { useState } from 'react';
import { VoiceService } from '@aituber-onair/voice';

function VoiceDemo() {
  const [voiceService] = useState(
    () => new VoiceService({
      engineType: 'openai',
      speaker: 'alloy',
      apiKey: 'your-api-key'
    })
  );

  const handleSpeak = async (text: string) => {
    await voiceService.speak({ text });
  };

  return (
    <button onClick={() => handleSpeak('[happy] Hello!')}>
      Speak with emotion
    </button>
  );
}

Node.js Usage

The voice package now fully supports Node.js environments with automatic environment detection:

import { VoiceEngineAdapter } from '@aituber-onair/voice';

const voiceService = new VoiceEngineAdapter({
  engineType: 'openai',
  speaker: 'nova',
  apiKey: process.env.OPENAI_API_KEY
});

// Audio will be played using available Node.js audio libraries
await voiceService.speak({ text: 'Hello from Node.js!' });

Audio Playback in Node.js

For audio playback in Node.js, install one of these optional dependencies:

# Option 1: speaker (native bindings, better quality)
npm install speaker

# Option 2: play-sound (uses system audio player, easier to install)
npm install play-sound

If neither is installed, the package will still work but won't play audio. You can still use the onPlay callback to handle audio data:

import { writeFileSync } from 'node:fs';

const voiceService = new VoiceEngineAdapter({
  engineType: 'voicevox',
  speaker: '1',
  voicevoxApiUrl: 'http://localhost:50021',
  onPlay: async (audioBuffer) => {
    // Save to file or process audio data
    writeFileSync('output.wav', Buffer.from(audioBuffer));
  }
});

The package automatically detects the environment and uses the appropriate audio player:

  • Browser: Uses HTMLAudioElement
  • Node.js: Uses speaker or play-sound if available, otherwise silent

Testing

Run the test suite:

# Run all tests
npm test

# Run tests in watch mode
npm run test:watch

# Generate coverage report
npm run test:coverage

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add some amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.