@andresaya/n8n-nodes-edgetts

v1.0.1

Published

a month ago

n8n node for Edge TTS - Text-to-Speech using Microsoft Edge capabilities

Vulnerabilities: 0 High, 0 Medium, 0 Low

andresaya

n8n-community-node-package tts text-to-speech edge-tts microsoft speech-synthesis

n8n-nodes-edgetts

This is an n8n community node. It lets you use Edge TTS in your n8n workflows.

Edge TTS is a free text-to-speech service that uses Microsoft Edge's neural voices to convert text into natural-sounding speech across 400+ voices in multiple languages.

n8n is a fair-code licensed workflow automation platform.

Installation

Operations

Compatibility

Usage

Resources

Installation

Follow the installation guide in the n8n community nodes documentation.

Operations

Synthesize

Text to Speech - Convert text or SSML to audio with customizable voice parameters (pitch, rate, volume)

Voice

List All Voices - Get all 400+ available voices
Filter by Language - Filter voices by language code (e.g., en-US, es-ES)
Filter by Gender - Filter voices by gender (Female or Male)

Compatibility

Minimum n8n version: 0.200.0

Tested against: n8n version 1.0.0+

Usage

Basic Text-to-Speech

Add the Edge TTS node to your workflow
Select Synthesize → Text to Speech
Enter text in the Input Text field
Choose a voice (default: en-US-AriaNeural)
Execute the workflow

The output includes an audio field with base64-encoded MP3 data.

Voice Parameters

Customize the voice output using Additional Options:

Pitch - Adjust voice pitch

Format: ±NHz (e.g., +10Hz, -15Hz)
Range: -100Hz to +100Hz
Higher values = younger/excited sound
Lower values = serious/authoritative sound

Rate - Control speaking speed

Format: ±N% (e.g., +50%, -20%)
Range: -100% to +200%
Positive values = faster speech
Negative values = slower speech

Volume - Adjust audio volume

Format: ±N% (e.g., +90%, -50%)
Range: -100% to +100%
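
The formats and ranges above can be produced from plain numbers with a small helper, for example in a Code node placed before Edge TTS. This is a minimal sketch: `formatSigned`, `formatPitch`, `formatRate`, and `formatVolume` are hypothetical names and not part of the node, and the clamping simply applies the ranges listed above.

```javascript
// Hypothetical helpers (not part of the node): clamp a number to the
// documented range and format it with an explicit sign and unit.
function formatSigned(value, min, max, unit) {
  const clamped = Math.min(max, Math.max(min, Math.round(value)));
  const sign = clamped < 0 ? '-' : '+';
  return `${sign}${Math.abs(clamped)}${unit}`;
}

// Ranges taken from the lists above.
const formatPitch = (hz) => formatSigned(hz, -100, 100, 'Hz');
const formatRate = (pct) => formatSigned(pct, -100, 200, '%');
const formatVolume = (pct) => formatSigned(pct, -100, 100, '%');
```

For instance, `formatRate(50)` yields a value in the same `+50%` form shown in the Rate examples, and out-of-range input is clamped to the nearest limit rather than rejected.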

Popular Voices by Language

English

en-US-AriaNeural - Female, American (friendly, natural)
en-US-GuyNeural - Male, American (professional)
en-GB-SoniaNeural - Female, British
en-AU-NatashaNeural - Female, Australian
en-IN-NeerjaNeural - Female, Indian

Spanish

es-ES-ElviraNeural - Female, Spain
es-MX-DaliaNeural - Female, Mexico
es-AR-ElenaNeural - Female, Argentina
es-CO-SalomeNeural - Female, Colombia

French

fr-FR-DeniseNeural - Female, France
fr-CA-SylvieNeural - Female, Canada

German

de-DE-KatjaNeural - Female
de-DE-ConradNeural - Male

Portuguese

pt-BR-FranciscaNeural - Female, Brazil
pt-PT-RaquelNeural - Female, Portugal

Italian

it-IT-ElsaNeural - Female
it-IT-DiegoNeural - Male

Chinese

zh-CN-XiaoxiaoNeural - Female, Mandarin
zh-HK-HiuGaaiNeural - Female, Cantonese
zh-TW-HsiaoChenNeural - Female, Taiwanese

Japanese

ja-JP-NanamiNeural - Female
ja-JP-KeitaNeural - Male

Korean

ko-KR-SunHiNeural - Female
ko-KR-InJoonNeural - Male

SSML Support

Set Input Type to ssml for advanced control:

Basic SSML Example:

```xml
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <voice name="en-US-AriaNeural">
    <prosody pitch="+10Hz" rate="+20%">
      Welcome to our service!
    </prosody>
    <break time="500ms"/>
    <prosody rate="-10%">
      Please listen carefully to the following options.
    </prosody>
  </voice>
</speak>
```

SSML with Multiple Voices:

```xml
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <voice name="en-US-GuyNeural">
    Hello, I'm the narrator.
  </voice>
  <break time="300ms"/>
  <voice name="en-US-AriaNeural">
    And I'm the assistant!
  </voice>
</speak>
```

SSML with Emphasis and Say-As:

<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <voice name="en-US-AriaNeural">
    Your order number is <say-as interpret-as="digits">12345</say-as>.
    <break time="500ms"/>
    The total is <say-as interpret-as="currency" language="en-US">$45.99</say-as>.
    <break time="500ms"/>
    <emphasis level="strong">Thank you for your purchase!</emphasis>
  </voice>
</speak>
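
When the text comes from an earlier node, the SSML can be assembled in a Code node instead of typed by hand. The sketch below is one possible approach, not something the node provides: `escapeXml` and `buildSsml` are hypothetical helpers, the default values are assumptions, and the language tag is simply taken from the first two parts of the voice name.

```javascript
// Escape the five XML special characters so user text cannot break the markup.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Hypothetical helper: wrap plain text in the speak/voice/prosody structure
// used in the SSML examples of this README. Defaults are assumptions.
function buildSsml({ text, voice = 'en-US-AriaNeural', pitch = '+0Hz', rate = '+0%' }) {
  // "en-US" from "en-US-AriaNeural"
  const lang = voice.split('-').slice(0, 2).join('-');
  return [
    `<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="${escapeXml(lang)}">`,
    `  <voice name="${escapeXml(voice)}">`,
    `    <prosody pitch="${escapeXml(pitch)}" rate="${escapeXml(rate)}">`,
    `      ${escapeXml(text)}`,
    '    </prosody>',
    '  </voice>',
    '</speak>',
  ].join('\n');
}
```

Escaping matters most for text that contains `&` or `<`, which would otherwise produce invalid SSML.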

Working with Audio Output

The node returns audio as base64-encoded data in this format:

data:audio/mp3;base64,<base64-data>

Save to File:

Add a Write Binary File node after Edge TTS
Set the file path (e.g., /tmp/speech.mp3)
The audio will be automatically saved

Use in HTTP Request:

Extract the base64 data from the audio field
Send it in an HTTP request to APIs that accept base64 audio
Or decode and send as binary data
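
Extracting the base64 payload comes down to splitting the data URI at the `base64,` marker. The following is a minimal sketch, assuming the audio field always has the `data:<mime>;base64,<data>` shape shown above; `parseAudioDataUri` is a hypothetical helper name.

```javascript
// Split a data URI such as "data:audio/mp3;base64,AAAA" into its parts.
// Hypothetical helper; assumes the format documented for the audio field.
function parseAudioDataUri(dataUri) {
  const match = /^data:([^;,]+);base64,(.*)$/s.exec(dataUri);
  if (!match) {
    throw new Error('Expected a base64 data URI');
  }
  const [, mimeType, base64] = match;
  return {
    mimeType,                              // e.g. "audio/mp3"
    base64,                                // payload for APIs that accept base64
    bytes: Buffer.from(base64, 'base64'),  // decoded audio for binary uploads
  };
}
```

The `base64` property can be sent as-is to an API that accepts base64 audio, while `bytes` holds the decoded data for endpoints that expect binary.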

Play in Browser:

<audio controls>
  <source src="{{ $json.audio }}" type="audio/mp3">
</audio>

Performance Metrics

The node includes performance data in the output:

{
  "success": true,
  "voice": "en-US-AriaNeural",
  "audio": "data:audio/mp3;base64,...",
  "performance": {
    "synthesizeMs": 1250,
    "conversionMs": 45,
    "totalMs": 1295
  }
}

Use these metrics to:

Monitor API response times
Identify network latency issues
Optimize workflow performance
Debug slow executions
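
As an illustration of how these metrics might be aggregated across several executions, the sketch below summarizes a list of outputs shaped like the JSON above. `summarizePerformance` is a hypothetical helper, and the 3000 ms default mirrors the threshold mentioned under Troubleshooting.

```javascript
// Summarize the "performance" objects from a list of node outputs.
// Hypothetical helper; expects objects shaped like the JSON example above.
function summarizePerformance(results, slowThresholdMs = 3000) {
  const timings = results
    .map((r) => r.performance)
    .filter((p) => p && typeof p.synthesizeMs === 'number');
  if (timings.length === 0) {
    return { count: 0, averageSynthesizeMs: 0, maxTotalMs: 0, slowCount: 0 };
  }
  const sum = timings.reduce((acc, p) => acc + p.synthesizeMs, 0);
  return {
    count: timings.length,
    averageSynthesizeMs: Math.round(sum / timings.length),
    maxTotalMs: Math.max(...timings.map((p) => p.totalMs ?? 0)),
    // Items whose synthesis time exceeds the threshold
    slowCount: timings.filter((p) => p.synthesizeMs > slowThresholdMs).length,
  };
}
```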

Batch Processing Example

Convert multiple texts at once using a loop:

Step 1: Create an array with a Set node

[
  { "text": "Hello world", "voice": "en-US-AriaNeural", "filename": "hello.mp3" },
  { "text": "Hola mundo", "voice": "es-ES-ElviraNeural", "filename": "hola.mp3" },
  { "text": "Bonjour monde", "voice": "fr-FR-DeniseNeural", "filename": "bonjour.mp3" }
]

Step 2: Add Loop Over Items node

Step 3: Add Edge TTS node inside loop

Input Text: {{ $json.text }}
Voice: {{ $json.voice }}

Step 4: Add Write Binary File node

File Path: /tmp/{{ $json.filename }}
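
The array from Step 1 could also be generated in a Code node rather than a Set node. This is a sketch under assumptions: `toBatchItems` is a hypothetical helper, and the rule for deriving a filename when none is given is invented for illustration.

```javascript
// Turn simple entries into n8n items ({ json: ... }) with the text, voice,
// and filename fields used in the steps above. Hypothetical helper; the
// fallback filename rule is an assumption.
function toBatchItems(entries, defaultVoice = 'en-US-AriaNeural') {
  return entries.map((entry, index) => {
    const slug = entry.text
      .toLowerCase()
      .replace(/[^a-z0-9]+/g, '-')
      .slice(0, 40)
      .replace(/^-+|-+$/g, '');
    return {
      json: {
        text: entry.text,
        voice: entry.voice || defaultVoice,
        filename: entry.filename || `${index + 1}-${slug || 'audio'}.mp3`,
      },
    };
  });
}
```

Each returned item exposes `text`, `voice`, and `filename`, so the expressions `{{ $json.text }}`, `{{ $json.voice }}`, and `{{ $json.filename }}` resolve as in Steps 3 and 4.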

Finding the Right Voice

Method 1: List All Voices

Use Voice → List All Voices
Browse the complete list of 400+ voices
Note the voice name you want to use

Method 2: Filter by Language

Use Voice → Filter by Language
Enter language code (e.g., es-MX for Mexican Spanish)
Get all voices for that language

Method 3: Filter by Gender

Use Voice → Filter by Gender
Select Female or Male
Get filtered list

Method 4: Combine Filters with Code Node

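// Filter Spanish female voices from the output of Voice → List All Voices.
// Assumes the first input item carries a `voices` array whose entries
// have `language` and `gender` fields.
return items[0].json.voices
  .filter(v => v.language.startsWith('es-') && v.gender === 'Female')
  .map(v => ({ json: v }));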

Common Use Cases

1. Automated Notifications

Convert alert messages to speech
Send audio notifications via Telegram/WhatsApp
Create voice announcements

2. E-Learning Content

Generate course narrations
Create language learning materials
Produce audiobook content

3. Accessibility

Convert articles to audio
Create voice versions of documents
Generate audio descriptions

4. Customer Service

IVR menu prompts
Automated voice responses
Call center announcements

5. Social Media

Create voiceovers for videos
Generate podcast intros
Produce audio for Instagram/TikTok

Tips and Best Practices

For Best Quality:

Use shorter text segments (under 500 words)
Choose appropriate voices for your language
Test different pitch/rate settings
Use SSML for precise control
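
One way to keep segments under the suggested length is to split long text at sentence boundaries before synthesis. The sketch below is an assumption about how that might be done, not a feature of the node; `splitIntoChunks` is a hypothetical helper and its sentence detection is deliberately simple.

```javascript
// Split text into chunks of at most maxWords words, breaking only after
// sentence-ending punctuation that is followed by whitespace.
// A single sentence longer than maxWords is kept whole.
function splitIntoChunks(text, maxWords = 500) {
  const sentences = text.split(/(?<=[.!?])\s+/);
  const chunks = [];
  let current = [];
  let count = 0;
  for (const sentence of sentences) {
    const words = sentence.trim().split(/\s+/).filter(Boolean);
    if (words.length === 0) continue;
    // Start a new chunk when adding this sentence would exceed the limit.
    if (count > 0 && count + words.length > maxWords) {
      chunks.push(current.join(' '));
      current = [];
      count = 0;
    }
    current.push(words.join(' '));
    count += words.length;
  }
  if (current.length > 0) chunks.push(current.join(' '));
  return chunks;
}
```

Each chunk can then be passed to Edge TTS as a separate item.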

For Better Performance:

Cache frequently used audio
Process in batches when possible
Monitor synthesizeMs metric
Use parallel processing for multiple items
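
Caching depends on a key that is identical whenever the same text would be synthesized with the same settings. The following is a minimal in-memory sketch: `cacheKey` and `getOrSynthesize` are hypothetical helpers, `synthesize` stands in for whatever produces the audio, and the default parameter values are assumptions used only to normalize the key.

```javascript
// Build a stable cache key from the fields that affect the audio.
// Defaults are assumptions used only to normalize missing options.
function cacheKey({ text, voice, pitch = '+0Hz', rate = '+0%', volume = '+0%' }) {
  return JSON.stringify([voice, pitch, rate, volume, text.trim()]);
}

// Return the cached result when available; otherwise call `synthesize`
// once and remember whatever it returns (a value or a promise).
// `synthesize` is a hypothetical function supplied by the caller.
function getOrSynthesize(cache, request, synthesize) {
  const key = cacheKey(request);
  if (!cache.has(key)) {
    cache.set(key, synthesize(request));
  }
  return cache.get(key);
}
```

A `Map` works as the cache for a single run; anything that should persist between executions would need external storage.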

For Multilingual Content:

Match voice language to text language
Use SSML to switch voices within the same audio
Consider regional voice variants (es-ES vs es-MX)

Troubleshooting

Audio not generating:

Check that the text is not empty
Verify voice name is correct (case-sensitive)
Ensure n8n has an internet connection

Slow performance:

Check performance.synthesizeMs value
High values (>3000ms) indicate network issues
Try shorter text segments
Check how far your server is from Microsoft's servers

Voice not found:

Use List All Voices to verify voice exists
Voice names are case-sensitive
Format: {language}-{region}-{name}Neural
Example: en-US-AriaNeural (not en-us-ariaNeural)
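
A quick format check can catch casing mistakes before a request is made. This sketch only tests the pattern described above and is an assumption about naming: some voices may not follow it, so List All Voices remains the authoritative source. `isWellFormedVoiceName` and `normalizeVoiceName` are hypothetical helpers.

```javascript
// Matches names shaped like {language}-{region}-{name}Neural,
// e.g. "en-US-AriaNeural". This is a format check only.
const VOICE_NAME_PATTERN = /^[a-z]{2,3}-[A-Z]{2}-[A-Z][A-Za-z]*Neural$/;

function isWellFormedVoiceName(name) {
  return VOICE_NAME_PATTERN.test(name);
}

// Fix the casing of the language, region, and first letter of the name.
// Names that do not have exactly three parts are returned trimmed but unchanged.
function normalizeVoiceName(name) {
  const parts = name.trim().split('-');
  if (parts.length !== 3) return name.trim();
  const [lang, region, voice] = parts;
  return `${lang.toLowerCase()}-${region.toUpperCase()}-${voice.charAt(0).toUpperCase()}${voice.slice(1)}`;
}
```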

SSML errors:

Validate SSML syntax
Ensure voice name in SSML matches selected voice
Check xml:lang attribute matches voice language
Use proper namespace declarations
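
Some of these checks can be automated with simple pattern matching before the SSML reaches the node. The sketch below is not an XML validator and is not part of the node; `checkSsml` is a hypothetical helper that only looks for the namespace, the `xml:lang` attribute, and the selected voice name.

```javascript
// Return a list of likely problems in an SSML string for the selected voice.
// Pattern-based checks only; this does not validate XML syntax.
function checkSsml(ssml, selectedVoice) {
  const problems = [];
  const speakTag = /<speak\b[^>]*>/.exec(ssml);
  if (!speakTag) {
    return ['no <speak> root element found'];
  }
  if (!speakTag[0].includes('xmlns="http://www.w3.org/2001/10/synthesis"')) {
    problems.push('<speak> is missing the synthesis namespace');
  }
  const lang = /xml:lang="([^"]+)"/.exec(speakTag[0]);
  if (!lang) {
    problems.push('<speak> is missing xml:lang');
  } else if (!selectedVoice.startsWith(`${lang[1]}-`)) {
    problems.push(`xml:lang "${lang[1]}" does not match voice "${selectedVoice}"`);
  }
  // Collect every voice name used in <voice name="..."> elements.
  const voices = [...ssml.matchAll(/<voice\b[^>]*\bname="([^"]+)"/g)].map((m) => m[1]);
  if (voices.length > 0 && !voices.includes(selectedVoice)) {
    problems.push(`selected voice "${selectedVoice}" is not used in any <voice> element`);
  }
  return problems;
}
```

An empty array means none of these particular checks failed; it does not guarantee that the service will accept the markup.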

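Resources

* [n8n community nodes documentation](https://docs.n8n.io/integrations/community-nodes/)
* [Microsoft SSML reference](https://learn.microsoft.com/en-us/azure/ai-services/speech-service/speech-synthesis-markup-voice)
* [GitHub repository](https://github.com/andresayac/n8n-nodes-edgetts)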