@garycraft/edge-tts

v0.0.15, published 6 months ago

Use Microsoft Edge's online text-to-speech service from JS code directly!

garycraft

Edge TTS

A TypeScript library for generating speech using Microsoft Edge's text-to-speech API

The library gives JavaScript and TypeScript code direct access to Edge's online TTS service, with optional subtitle generation and voice customization (voice, rate, pitch, volume, and output format).

Installation

npm install @echristian/edge-tts

CLI Usage

# List all available voices grouped by locale
npx @echristian/edge-tts voices

# Generate audio from text
npx @echristian/edge-tts synthesize "Hello world" --audio output.mp3 --voice en-US-AvaNeural

# Generate audio with subtitles
npx @echristian/edge-tts synthesize "Hello world" --audio output.mp3 --subtitle output.srt --voice en-US-AvaNeural

API Usage

import { synthesize, synthesizeStream, getVoices } from "@echristian/edge-tts";

// Get available voices
const voices = await getVoices();
console.log(voices); // Array of available voice options

// Basic usage with synthesize()
const { audio, subtitle } = await synthesize({
  text: "Hello, world!",
});

// Stream processing usage
const generator = synthesizeStream({ text: "Hello world" });
for await (const chunk of generator) {
  // chunk is a Uint8Array of raw audio data
  // Process or save each chunk as needed
}

// Collecting all streamed chunks
const chunks: Uint8Array[] = [];
for await (const chunk of synthesizeStream({ text: "Hello world" })) {
  chunks.push(chunk);
}
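The collected chunks usually need to be joined into one buffer before they can be saved or played. The helper below is a minimal sketch of that step; `concatChunks` is a hypothetical name and is not part of this library.

```typescript
// Hypothetical helper (not exported by the library): joins audio chunks
// into a single contiguous Uint8Array, preserving their order.
function concatChunks(chunks: Uint8Array[]): Uint8Array {
  // Compute the total size first so the output is allocated only once.
  const total = chunks.reduce((sum, chunk) => sum + chunk.length, 0);
  const out = new Uint8Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    out.set(chunk, offset); // copy this chunk at the current write position
    offset += chunk.length;
  }
  return out;
}
```

Calling `concatChunks(chunks)` after the loop above would give a single buffer that can be passed to, for example, Node's `fs.writeFile`.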

API

getVoices(): Promise<Array>

Returns an array of available voices with their properties.

Voice Object

| Property     | Type   | Description                    |
| ------------ | ------ | ------------------------------ |
| Name         | string | Full name of the voice         |
| ShortName    | string | Short identifier for the voice |
| Gender       | string | Voice gender (Male/Female)     |
| Locale       | string | Language code and region       |
| FriendlyName | string | Display name for the voice     |
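As an illustration of working with these properties, the sketch below groups a voice list by `Locale`, similar in spirit to what the `voices` CLI command prints. The `Voice` interface name and the `groupByLocale` helper are assumptions made for this example; the library may export its own type under a different name.

```typescript
// Shape of a voice entry, taken from the table above.
// The interface name `Voice` is an assumption for this sketch.
interface Voice {
  Name: string;
  ShortName: string;
  Gender: string;
  Locale: string;
  FriendlyName: string;
}

// Hypothetical helper: buckets voices by their Locale value.
function groupByLocale(voices: Voice[]): Map<string, Voice[]> {
  const groups = new Map<string, Voice[]>();
  for (const voice of voices) {
    const bucket = groups.get(voice.Locale) ?? [];
    bucket.push(voice);
    groups.set(voice.Locale, bucket);
  }
  return groups;
}
```

For example, `groupByLocale(await getVoices()).get("en-US")` would return the voices whose `Locale` is `"en-US"`, assuming the returned objects match the table above.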

synthesize(options): Promise

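Main function to generate speech from text. It takes a GenerateOptions object and resolves to a GenerateResult containing the audio and, when requested, the subtitles.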

synthesizeStream(options): AsyncGenerator

Creates an async generator that yields audio data in chunks, each a Uint8Array. Metadata headers are stripped from every chunk before it is yielded, so the chunks contain audio bytes only.

Uses the same options as synthesize(), but without subtitle support:

| Option       | Type   | Default                           | Description               |
| ------------ | ------ | --------------------------------- | ------------------------- |
| text         | string | (required)                        | Text to convert to speech |
| voice        | string | "en-US-AvaNeural"                 | Voice ID to use           |
| language     | string | "en-US"                           | Language code             |
| outputFormat | string | "audio-24khz-96kbitrate-mono-mp3" | Audio format              |
| rate         | string | "default"                         | Speaking rate             |
| pitch        | string | "default"                         | Voice pitch               |
| volume       | string | "default"                         | Audio volume              |

For detailed configuration options, refer to Microsoft's documentation:

Note: Some options may be limited by Microsoft Edge's service capabilities.

GenerateOptions

| Option       | Type            | Default                              | Description               |
| ------------ | --------------- | ------------------------------------ | ------------------------- |
| text         | string          | (required)                           | Text to convert to speech |
| voice        | string          | "en-US-AvaNeural"                    | Voice ID to use           |
| language     | string          | "en-US"                              | Language code             |
| outputFormat | string          | "audio-24khz-96kbitrate-mono-mp3"    | Audio format              |
| rate         | string          | "default"                            | Speaking rate             |
| pitch        | string          | "default"                            | Voice pitch               |
| volume       | string          | "default"                            | Audio volume              |
| subtitle     | SubtitleOptions | { splitBy: "word", wordsPerCue: 10 } | Subtitle options          |
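To make the defaults concrete, the sketch below fills in any omitted option with the value listed in the table. It only mirrors the documented defaults; `resolveOptions` and the `ResolvedOptions` type are hypothetical names, and the library's internal handling of options may differ.

```typescript
// Option shapes as documented in the tables of this README.
interface SubtitleOptions {
  splitBy?: "word" | "duration";
  wordsPerCue?: number;
  durationPerCue?: number;
}

interface GenerateOptions {
  text: string;
  voice?: string;
  language?: string;
  outputFormat?: string;
  rate?: string;
  pitch?: string;
  volume?: string;
  subtitle?: SubtitleOptions;
}

// Hypothetical type: every top-level option present after defaults are applied.
type ResolvedOptions = Required<Omit<GenerateOptions, "subtitle">> & {
  subtitle: SubtitleOptions;
};

// Hypothetical helper: applies the documented default for each missing option.
function resolveOptions(options: GenerateOptions): ResolvedOptions {
  return {
    text: options.text,
    voice: options.voice ?? "en-US-AvaNeural",
    language: options.language ?? "en-US",
    outputFormat: options.outputFormat ?? "audio-24khz-96kbitrate-mono-mp3",
    rate: options.rate ?? "default",
    pitch: options.pitch ?? "default",
    volume: options.volume ?? "default",
    // Caller-supplied subtitle fields override the documented defaults.
    subtitle: { splitBy: "word", wordsPerCue: 10, ...options.subtitle },
  };
}
```

Under these assumptions, `resolveOptions({ text: "Hello" })` describes the same request as passing only `text` to `synthesize()`.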

SubtitleOptions

| Option         | Type                 | Default | Description                          |
| -------------- | -------------------- | ------- | ------------------------------------ |
| splitBy        | "word" \| "duration" | "word"  | How to split subtitles               |
| wordsPerCue    | number               | 10      | Words per subtitle when using 'word' |
| durationPerCue | number               | 5000    | Duration (ms) when using 'duration'  |
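The two split modes can be pictured as two ways of grouping per-word timings into cues. The sketch below is one plausible reading of these options, not the library's actual implementation: the `WordTiming` input shape and the `splitByWord` / `splitByDuration` helpers are hypothetical, and the real cue boundaries may be computed differently.

```typescript
// Hypothetical input: one entry per spoken word, times in milliseconds.
interface WordTiming {
  text: string;
  start: number;
  end: number;
}

// Cue shape matching the fields documented for SubtitleResult.
interface Cue {
  text: string;
  start: number;
  end: number;
  duration: number;
}

// Builds one cue spanning a non-empty run of consecutive words.
function toCue(words: WordTiming[]): Cue {
  const start = words[0].start;
  const end = words[words.length - 1].end;
  return { text: words.map((w) => w.text).join(" "), start, end, duration: end - start };
}

// "word" mode sketch: a fixed number of words per cue.
function splitByWord(words: WordTiming[], wordsPerCue = 10): Cue[] {
  const step = Math.max(1, Math.floor(wordsPerCue)); // guard against a zero step
  const cues: Cue[] = [];
  for (let i = 0; i < words.length; i += step) {
    cues.push(toCue(words.slice(i, i + step)));
  }
  return cues;
}

// "duration" mode sketch: close a cue once the next word would push it
// past durationPerCue milliseconds.
function splitByDuration(words: WordTiming[], durationPerCue = 5000): Cue[] {
  const cues: Cue[] = [];
  let current: WordTiming[] = [];
  for (const word of words) {
    if (current.length > 0 && word.end - current[0].start > durationPerCue) {
      cues.push(toCue(current));
      current = [];
    }
    current.push(word);
  }
  if (current.length > 0) cues.push(toCue(current));
  return cues;
}
```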

GenerateResult

| Property | Type                    | Description          |
| -------- | ----------------------- | -------------------- |
| audio    | Blob                    | Generated audio data |
| subtitle | Array\<SubtitleResult\> | Generated subtitles  |

SubtitleResult

| Property | Type   | Description     |
| -------- | ------ | --------------- |
| text     | string | Subtitle text   |
| start    | number | Start time (ms) |
| end      | number | End time (ms)   |
| duration | number | Duration (ms)   |
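Because each subtitle entry carries millisecond timings, it can be turned into SubRip (SRT) text with a little formatting. The sketch below assumes the fields listed in the table above; `formatSrtTime` and `toSrt` are hypothetical helpers and are not part of this library, which may offer its own subtitle output through the CLI's `--subtitle` flag.

```typescript
// Subtitle entry shape as documented in the table above.
interface SubtitleResult {
  text: string;
  start: number; // ms
  end: number; // ms
  duration: number; // ms
}

// Formats milliseconds as the SRT timestamp HH:MM:SS,mmm.
function formatSrtTime(ms: number): string {
  const total = Math.max(0, Math.round(ms));
  const hours = Math.floor(total / 3600000);
  const minutes = Math.floor((total % 3600000) / 60000);
  const seconds = Math.floor((total % 60000) / 1000);
  const millis = total % 1000;
  const pad = (n: number, width: number): string => String(n).padStart(width, "0");
  return `${pad(hours, 2)}:${pad(minutes, 2)}:${pad(seconds, 2)},${pad(millis, 3)}`;
}

// Hypothetical helper: renders cues as numbered SRT blocks separated by blank lines.
function toSrt(cues: SubtitleResult[]): string {
  return cues
    .map((cue, i) => `${i + 1}\n${formatSrtTime(cue.start)} --> ${formatSrtTime(cue.end)}\n${cue.text}\n`)
    .join("\n");
}
```

With the result of `synthesize()`, `toSrt(subtitle)` would produce text suitable for an `.srt` file, assuming `subtitle` is an array of entries shaped like the table above.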