n8n-nodes-groq-speech-to-text
v1.0.0
n8n node for Groq Speech-to-Text API - works with any audio provider
This is an n8n community node for transcribing audio using Groq's Whisper API. It's specifically designed to work seamlessly with Telnyx call recordings but supports any audio source.
Features
- Fast & Affordable: Uses Groq's ultra-fast Whisper models (whisper-large-v3-turbo and whisper-large-v3)
- Multiple Input Types:
- Direct URL support (perfect for Telnyx recording_urls)
- Binary data from previous nodes
- Two Operations:
- Transcribe: Convert audio to text in original language
- Translate: Convert audio to English text
- Flexible Options:
- Multiple response formats (JSON, verbose JSON, text)
- Language specification for improved accuracy
- Custom prompts for context
- Timestamp granularities (segment and word-level)
- Supported Audio Formats: MP3, MP4, WAV, M4A, FLAC, OGG, WebM, MPEG, MPGA
Installation
Community Nodes (Recommended)
- Go to Settings > Community Nodes in your n8n instance
- Select Install
- Enter n8n-nodes-groq-speech-to-text
- Agree to the risks and click Install
Manual Installation
Navigate to your n8n installation and run:
npm install n8n-nodes-groq-speech-to-text
Restart n8n to load the node.
Prerequisites
- n8n instance (self-hosted or cloud)
- Groq API key from console.groq.com
Configuration
Credentials
- In n8n, go to Credentials
- Create new Groq API credentials
- Enter your API key from Groq Console
- Test and save
Node Parameters
Basic Settings
- Operation: Choose between Transcribe or Translate
- Input Type:
- Audio URL: Provide a direct URL to the audio file
- Binary Data: Use audio from a binary property
- Model:
- whisper-large-v3-turbo: Faster, more affordable ($0.04/hour)
- whisper-large-v3: Higher accuracy ($0.111/hour)
Optional Settings
- Language: Specify language code (e.g., 'en', 'es') for better accuracy
- Response Format: Choose output format (JSON, verbose JSON, or text)
- Additional Options:
- Prompt: Guide transcription with context (max 224 tokens)
- Temperature: Control randomness (0-1)
- Timestamp Granularities: Get timestamps at segment/word level
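To make the parameters above concrete, here is a minimal TypeScript sketch of how they might map onto a Groq request. The two endpoint paths are Groq's OpenAI-compatible audio routes. The `NodeParams` shape and the `buildRequestFields` helper are hypothetical names for illustration and are not taken from this node's source.

```typescript
// Hypothetical parameter shape; field names are illustrative, not the node's internals.
interface NodeParams {
  operation: "transcribe" | "translate";
  model: "whisper-large-v3-turbo" | "whisper-large-v3";
  language?: string; // e.g. "en", "es"
  responseFormat?: "json" | "verbose_json" | "text";
  prompt?: string; // context hint (max 224 tokens)
  temperature?: number; // 0 to 1
  timestampGranularities?: Array<"segment" | "word">;
}

const GROQ_AUDIO_BASE = "https://api.groq.com/openai/v1/audio";

function buildRequestFields(p: NodeParams): { url: string; fields: Array<[string, string]> } {
  const route = p.operation === "translate" ? "translations" : "transcriptions";
  const fields: Array<[string, string]> = [["model", p.model]];

  // Assumption: only send a language hint when transcribing,
  // since translation always produces English text.
  if (p.language && p.operation === "transcribe") fields.push(["language", p.language]);
  if (p.responseFormat) fields.push(["response_format", p.responseFormat]);
  if (p.prompt) fields.push(["prompt", p.prompt]);
  if (p.temperature !== undefined) {
    if (p.temperature < 0 || p.temperature > 1) {
      throw new RangeError("temperature must be between 0 and 1");
    }
    fields.push(["temperature", String(p.temperature)]);
  }
  for (const g of p.timestampGranularities ?? []) {
    fields.push(["timestamp_granularities[]", g]);
  }
  return { url: `${GROQ_AUDIO_BASE}/${route}`, fields };
}
```

The returned pairs would be appended to a multipart form alongside the audio file itself.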
Usage Examples
Example 1: Transcribe Telnyx Call Recordings
Perfect workflow for transcribing Telnyx call recordings:
Telnyx Webhook → Groq Speech-to-Text → Database/CRM
Telnyx Webhook Configuration:
- Listen for call.recording.saved events
- Extract recording_urls from webhook payload
Groq Speech-to-Text Node:
- Operation: Transcribe
- Input Type: Audio URL
- Audio URL: {{ $json.recording_urls.mp3 }} or {{ $json.recording_urls.wav }}
- Model: whisper-large-v3-turbo
- Response Format: verbose_json
The node will:
- Download the audio from Telnyx
- Send it to Groq for transcription
- Return the full transcript with metadata
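The three steps above can be sketched roughly as follows. This is an illustration of the general flow, not the node's actual implementation: `transcribeFromUrl`, `FetchLike`, and `HttpResponse` are hypothetical names, and the fetch function is injected so the sketch can be exercised with a stub. It assumes a runtime with global `FormData`, `Blob`, and `URL` (for example Node 18 or later).

```typescript
// Minimal response and fetch shapes so a stub can stand in for the real fetch.
interface HttpResponse {
  ok: boolean;
  status: number;
  arrayBuffer(): Promise<ArrayBuffer>;
  json(): Promise<unknown>;
}
type FetchLike = (
  url: string,
  init?: { method?: string; headers?: Record<string, string>; body?: FormData },
) => Promise<HttpResponse>;

async function transcribeFromUrl(
  audioUrl: string,
  apiKey: string,
  fetchFn: FetchLike,
  model = "whisper-large-v3-turbo",
): Promise<unknown> {
  // Step 1: download the audio from the recording URL.
  const audioRes = await fetchFn(audioUrl);
  if (!audioRes.ok) throw new Error(`Audio download failed with status ${audioRes.status}`);
  const audio = await audioRes.arrayBuffer();

  // Step 2: send it to Groq as multipart form data.
  const fileName = new URL(audioUrl).pathname.split("/").pop() || "audio.mp3";
  const form = new FormData();
  form.append("file", new Blob([audio]), fileName);
  form.append("model", model);
  form.append("response_format", "verbose_json");

  const groqRes = await fetchFn("https://api.groq.com/openai/v1/audio/transcriptions", {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}` },
    body: form,
  });
  if (!groqRes.ok) throw new Error(`Groq request failed with status ${groqRes.status}`);

  // Step 3: return the parsed transcript.
  return groqRes.json();
}
```

In real use the runtime's built-in `fetch` would typically be passed as `fetchFn`; injecting it keeps the download and upload steps easy to test in isolation.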
Example 2: Binary Data from File Upload
HTTP Request (get audio) → Groq Speech-to-Text → Process Text
Groq Speech-to-Text Node:
- Operation: Transcribe
- Input Type: Binary Data
- Binary Property: data
- Model: whisper-large-v3-turbo
Example 3: Translate Foreign Language Calls to English
Telnyx Webhook → Groq Speech-to-Text → Email/Slack
Groq Speech-to-Text Node:
- Operation: Translate
- Input Type: Audio URL
- Audio URL: {{ $json.recording_urls.mp3 }}
- Model: whisper-large-v3
Telnyx Integration Details
Webhook Event Structure
When Telnyx sends a call.recording.saved webhook, it includes:
{
"recording_urls": {
"mp3": "https://...",
"wav": "https://..."
},
"recording_id": "...",
"call_control_id": "...",
"recording_started_at": "...",
"recording_ended_at": "..."
}
You can directly use these URLs in the Groq Speech-to-Text node.
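If you prefer to pick the URL in a code step rather than an expression, a small helper along these lines may help. It assumes the payload shape shown above; in a live webhook these fields may be wrapped in additional envelope keys, so adjust the path accordingly. `RecordingSavedPayload` and `pickRecordingUrl` are hypothetical names.

```typescript
// Shape of the fields shown above; only recording_urls is required here.
interface RecordingSavedPayload {
  recording_urls: { mp3?: string; wav?: string };
  recording_id?: string;
  call_control_id?: string;
  recording_started_at?: string;
  recording_ended_at?: string;
}

// Return the preferred format if present, otherwise fall back to the other one.
function pickRecordingUrl(
  payload: RecordingSavedPayload,
  preferred: "mp3" | "wav" = "mp3",
): string {
  const fallback = preferred === "mp3" ? "wav" : "mp3";
  const url = payload.recording_urls?.[preferred] ?? payload.recording_urls?.[fallback];
  if (!url) throw new Error("No recording URL found in webhook payload");
  return url;
}
```

Failing loudly when neither URL is present makes a malformed webhook visible at this step instead of surfacing later as a download error.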
Recommended Workflow
- Telnyx Webhook Trigger: Listen for call.recording.saved
- Groq Speech-to-Text: Transcribe the recording
- Post-Processing:
- Store in database
- Send to CRM
- Analyze sentiment
- Generate summary
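As one possible post-processing step, the segments of a verbose JSON response (see Output Format) can be flattened into timestamped lines before storing them or sending them to a CRM. This is a sketch only; `VerboseTranscript`, `toTimestamp`, and `toTimestampedLines` are hypothetical names, and the interface covers just the fields used here.

```typescript
interface Segment {
  id: number;
  start: number; // seconds
  end: number; // seconds
  text: string;
}

interface VerboseTranscript {
  text: string;
  language?: string;
  duration?: number;
  segments?: Segment[];
}

// Format a number of seconds as mm:ss, dropping fractions.
function toTimestamp(seconds: number): string {
  const total = Math.floor(seconds);
  const mm = String(Math.floor(total / 60)).padStart(2, "0");
  const ss = String(total % 60).padStart(2, "0");
  return `${mm}:${ss}`;
}

// One line per segment; fall back to the plain text when no segments exist.
function toTimestampedLines(t: VerboseTranscript): string[] {
  if (!t.segments || t.segments.length === 0) return [t.text.trim()];
  return t.segments.map(
    (s) => `[${toTimestamp(s.start)} - ${toTimestamp(s.end)}] ${s.text.trim()}`,
  );
}
```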
Output Format
JSON Response Format
{
"text": "The transcribed text..."
}
Verbose JSON Response Format
{
"text": "The transcribed text...",
"language": "en",
"duration": 123.45,
"segments": [
{
"id": 0,
"start": 0.0,
"end": 5.2,
"text": "First segment..."
}
],
"metadata": {
"model": "whisper-large-v3-turbo",
"operation": "transcribe",
"language": "en",
"inputType": "url",
"fileName": "recording.mp3"
}
}
Pricing
Groq's Whisper API pricing (as of 2025):
- whisper-large-v3-turbo: $0.04 per audio hour
- whisper-large-v3: $0.111 per audio hour
Minimum billing is 10 seconds per request.
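For a rough budget, the rates and the 10-second minimum can be combined as in the sketch below. The prices are the ones listed above and may change; `estimateCostUsd` is a hypothetical helper, and it assumes the minimum applies to each request separately.

```typescript
// Rates from the list above (USD per audio hour); treat as a snapshot, not a guarantee.
const PRICE_PER_HOUR_USD = {
  "whisper-large-v3-turbo": 0.04,
  "whisper-large-v3": 0.111,
} as const;

const MIN_BILLED_SECONDS = 10;

// durationsSec holds one entry per request; each is billed for at least 10 seconds.
function estimateCostUsd(
  model: keyof typeof PRICE_PER_HOUR_USD,
  durationsSec: number[],
): number {
  const billedSeconds = durationsSec.reduce(
    (sum, d) => sum + Math.max(d, MIN_BILLED_SECONDS),
    0,
  );
  return (billedSeconds / 3600) * PRICE_PER_HOUR_USD[model];
}
```

Under these assumptions a single 3-minute call on whisper-large-v3-turbo comes to about $0.002, while many clips shorter than 10 seconds cost more than their combined length suggests because each one is rounded up.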
Rate Limits & File Size
- Free Tier: 25 MB max file size
- Dev Tier: 100 MB max file size
- Minimum audio length: 0.01 seconds
For large files, consider chunking audio before transcription.
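A chunking plan might look like the following sketch. It only computes time windows; actually cutting the audio requires an external tool, which is not shown. The names `needsChunking` and `planChunks` are hypothetical, and the byte limits assume 1 MB = 1024 × 1024 bytes, which may not match how the API counts.

```typescript
// Assumption: MB interpreted as 1024 * 1024 bytes.
const MAX_FILE_BYTES = {
  free: 25 * 1024 * 1024,
  dev: 100 * 1024 * 1024,
} as const;

function needsChunking(fileBytes: number, tier: keyof typeof MAX_FILE_BYTES): boolean {
  return fileBytes > MAX_FILE_BYTES[tier];
}

interface ChunkWindow {
  startSec: number;
  endSec: number;
}

// Split a recording into fixed-length windows, optionally overlapping
// so that words at a boundary appear in both neighbouring chunks.
function planChunks(durationSec: number, chunkSec: number, overlapSec = 0): ChunkWindow[] {
  if (chunkSec <= 0 || overlapSec < 0 || overlapSec >= chunkSec) {
    throw new RangeError("chunkSec must be positive and larger than overlapSec");
  }
  const windows: ChunkWindow[] = [];
  const step = chunkSec - overlapSec;
  for (let start = 0; start < durationSec; start += step) {
    const end = Math.min(start + chunkSec, durationSec);
    windows.push({ startSec: start, endSec: end });
    if (end >= durationSec) break; // last window reached the end of the audio
  }
  return windows;
}
```

A small overlap trades a little extra billed audio for fewer words lost at chunk boundaries; the overlapping text then has to be deduplicated when the transcripts are merged.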
Troubleshooting
Error: "Unable to locate package"
- Ensure you're using n8n version 0.200.0 or higher
- Try manual installation via npm
Error: "Invalid audio format"
- Supported formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, webm
- Check file is not corrupted
- Verify file size is within limits
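A quick pre-check on the file name or URL can catch unsupported formats before a request is sent. The sketch below looks only at the extension and does not inspect the file contents; `hasSupportedExtension` is a hypothetical helper built from the format list above.

```typescript
const SUPPORTED_EXTENSIONS = new Set([
  "flac", "mp3", "mp4", "mpeg", "mpga", "m4a", "ogg", "wav", "webm",
]);

// Accepts a bare file name or a URL; query strings and fragments are ignored.
function hasSupportedExtension(fileNameOrUrl: string): boolean {
  const path = fileNameOrUrl.replace(/[?#].*$/, "");
  const dot = path.lastIndexOf(".");
  if (dot === -1) return false;
  return SUPPORTED_EXTENSIONS.has(path.slice(dot + 1).toLowerCase());
}
```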
Error: "Invalid API key"
- Verify your Groq API key is correct
- Check credentials are properly configured in n8n
Poor Transcription Quality
- Specify the language parameter
- Use whisper-large-v3 instead of turbo for better accuracy
- Add a context prompt to guide transcription
- Ensure audio quality is good (16 kHz mono recommended)
Development
# Clone the repository
git clone <your-repo-url>
cd n8n-nodes-groq-speech-to-text
# Install dependencies
npm install
# Build the node
npm run build
# Development mode (watch for changes)
npm run dev
# Lint code
npm run lint
# Fix linting issues
npm run lintfix
# Format code
npm run format
Testing
To test the node locally:
- Link the package to your n8n installation:
npm run build
npm link
cd ~/.n8n/nodes
npm link n8n-nodes-groq-speech-to-text
- Restart n8n
- The node should appear in the nodes panel
Contributing
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch
- Make your changes
- Submit a pull request
License
MIT
Support
For issues and questions:
- GitHub Issues: [Your repo URL]
- n8n Community Forum: n8n.community
Built with ❤️ for the n8n community
