audio-to-text-node
v0.2.0
Backend audio file to text transcription using Web Speech API with Puppeteer
🎧 audio-to-text
A free and robust backend package for transcribing audio files to text using the Web Speech API.
Features
- ✅ Convert audio files to text
- 🎤 Supports multiple languages
- 🧠 Uses Web Speech API inside a headless browser (via Puppeteer)
- 🔊 Streams audio using a virtual microphone
- 💾 Supports all audio file formats supported by ffmpeg (e.g., .mp3, .wav, .ogg, .m4a)
- 🪄 Automatically sets up required audio routing using pactl and paplay
- ⚙️ Works in Linux environments with PipeWire or PulseAudio
🛠 Requirements
Before installing and using this package, please ensure the following dependencies are installed and properly configured on your system:
- ffmpeg — for audio format conversion and processing
- ffprobe — for audio validation (comes with ffmpeg)
- PipeWire — RECOMMENDED modern audio server
- PulseAudio — alternative audio server (older systems)
- pactl — Audio control tool
- paplay — Audio playback utility
- Microsoft Edge, Google Chrome, or Chromium (one Chromium-based browser is enough; see Browser Detection Priority)
- Node.js — version 18 or higher is recommended
- bun — optional, recommended for development and build tasks
- Internet connection (required for browser-based speech recognition)
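A quick preflight check can catch a missing tool before a transcription is attempted. The sketch below is not part of the package: `findOnPath` and `missingTools` are hypothetical helper names, and the tool list is simply copied from the requirements above.

```typescript
import { accessSync, constants } from "node:fs";
import { delimiter, join } from "node:path";

// Command-line tools named in the requirements list.
const REQUIRED_TOOLS = ["ffmpeg", "ffprobe", "pactl", "paplay"];

// Hypothetical helper: returns the full path of `tool` if an entry with that
// name and execute permission exists in one of the PATH directories, else null.
function findOnPath(tool: string, pathEnv: string = process.env.PATH ?? ""): string | null {
  for (const dir of pathEnv.split(delimiter)) {
    if (!dir) continue;
    const candidate = join(dir, tool);
    try {
      accessSync(candidate, constants.X_OK);
      return candidate;
    } catch {
      // Not present or not executable in this directory; keep looking.
    }
  }
  return null;
}

// Hypothetical helper: lists the required tools that could not be found.
function missingTools(tools: string[] = REQUIRED_TOOLS, pathEnv?: string): string[] {
  return tools.filter((tool) => findOnPath(tool, pathEnv) === null);
}
```

Calling `missingTools()` with no arguments checks the current PATH; an empty array means every tool was found.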
Install on Ubuntu/Debian:
PipeWire (Recommended - Modern Audio Server):
sudo apt update
sudo apt install ffmpeg pipewire pipewire-pulse wireplumber pulseaudio-utils
PulseAudio (Alternative - Older Systems):
sudo apt update
sudo apt install ffmpeg pulseaudio-utils pulseaudio
🔐 Permissions
- Make sure Node.js has permission to run pactl and paplay
- Puppeteer will launch a headless browser and use your virtual audio devices
📦 Installation
To install with Bun:
bun add audio-to-text-node
Or with npm:
npm install audio-to-text-node
🧼 Cleanup
The package creates temporary folders in /tmp/audio-to-text and cleans them up automatically after use.
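The create-then-clean-up pattern described here can be pictured with a small sketch. This is an illustration under assumptions, not the package's code: `withTempDir` is a hypothetical helper, and the `job-` prefix and the use of `os.tmpdir()` as the parent are choices made for the example.

```typescript
import { existsSync, mkdirSync, mkdtempSync, rmSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical helper: runs synchronous `work` with a fresh scratch directory
// and removes that directory afterwards, even if `work` throws.
function withTempDir<T>(
  work: (dir: string) => T,
  base: string = join(tmpdir(), "audio-to-text"),
): T {
  mkdirSync(base, { recursive: true });
  // mkdtempSync appends random characters, so concurrent jobs do not collide.
  const dir = mkdtempSync(join(base, "job-"));
  try {
    return work(dir);
  } finally {
    rmSync(dir, { recursive: true, force: true });
  }
}
```

The `finally` block is what guarantees cleanup on failure. An asynchronous variant would need to await the work before removing the directory.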
✨ Usage
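import { transcribeFromFile } from "audio-to-text-node";

async function main() {
  // Transcribe a local audio file; every option below is optional.
  const transcript = await transcribeFromFile("/path/to/audio.wav", {
    language: "en-US",
    executablePath: "/usr/bin/microsoft-edge",
    speakerDevice: "virtual_speaker",
    microphoneDevice: "virtual_microphone",
  });
  console.log(transcript);
}

// Report failures instead of leaving an unhandled promise rejection.
main().catch((error) => {
  console.error(error);
  process.exitCode = 1;
});
Tested Distributions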
| Distribution | Version | Status           |
| ------------ | ------- | ---------------- |
| Ubuntu       | 24.10   | ✅ Fully Tested  |
| macOS        | -       | ❌ Not Supported |
| Windows      | -       | ❌ Not Supported |
Note: This package is designed for Linux environments.
📚 API Reference
🧠 transcribeFromFile(filePath: string, options?: { language?: string; executablePath?: string; speakerDevice?: string; microphoneDevice?: string }): Promise<string>
| 🧩 Parameter | 📝 Type | 📖 Description | 🧵 Default |
| -------------------------- | -------- | ------------------------------------------------------ | ---------------------- |
| filePath | string | Path to the audio file (.wav, .mp3, .ogg, etc.) | — |
| options.language | string | Language code for transcription | 'en-US' |
| options.executablePath | string | Path to browser executable | Auto-detected |
| options.speakerDevice | string | Virtual speaker device name (PipeWire/PulseAudio) | 'virtual_speaker' |
| options.microphoneDevice | string | Virtual microphone device name (PipeWire/PulseAudio) | 'virtual_microphone' |
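To make the defaults in the table concrete, here is a minimal sketch of how omitted options could be filled in. `TranscribeOptions`, `DEFAULTS`, and `resolveOptions` are hypothetical names introduced for illustration; the package's own merging logic may differ.

```typescript
// Option shape copied from the transcribeFromFile signature.
interface TranscribeOptions {
  language?: string;
  executablePath?: string;
  speakerDevice?: string;
  microphoneDevice?: string;
}

// Defaults as listed in the table. executablePath has no fixed default
// because it is auto-detected.
const DEFAULTS = {
  language: "en-US",
  speakerDevice: "virtual_speaker",
  microphoneDevice: "virtual_microphone",
};

// Hypothetical helper: fills in any option the caller left out.
function resolveOptions(options: TranscribeOptions = {}) {
  return {
    language: options.language ?? DEFAULTS.language,
    executablePath: options.executablePath, // undefined means "auto-detect"
    speakerDevice: options.speakerDevice ?? DEFAULTS.speakerDevice,
    microphoneDevice: options.microphoneDevice ?? DEFAULTS.microphoneDevice,
  };
}
```

Using `??` rather than object spread keeps an explicitly passed `undefined` from overriding a default.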
Browser Detection Priority:
- Microsoft Edge - /usr/bin/microsoft-edge
- Google Chrome - /usr/bin/google-chrome
- Chromium - /usr/bin/chromium-browser
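The priority order above amounts to "take the first path that exists". A minimal sketch of that lookup follows, assuming a plain existence check is all that is needed. `detectBrowser` and `BROWSER_CANDIDATES` are hypothetical names, and the package may detect browsers differently.

```typescript
import { existsSync } from "node:fs";

// Candidate paths in the priority order listed above.
const BROWSER_CANDIDATES = [
  "/usr/bin/microsoft-edge",
  "/usr/bin/google-chrome",
  "/usr/bin/chromium-browser",
];

// Hypothetical helper: returns the first candidate that exists, or null.
// `exists` is injectable so the lookup can be exercised without real browsers.
function detectBrowser(
  candidates: string[] = BROWSER_CANDIDATES,
  exists: (path: string) => boolean = existsSync,
): string | null {
  return candidates.find((path) => exists(path)) ?? null;
}
```

When the result is `null`, passing `executablePath` explicitly is the fallback.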
🔁 Returns: Promise<string> — The transcribed text.
⚙️ How it works:
- ✅ Validates and splits the audio file into 5-second chunks
- 🎛 Sets up virtual audio devices for routing (PipeWire/PulseAudio)
- 🧭 Launches a headless browser and uses Web Speech API for transcription
- 🧹 Cleans up temporary files and restores audio routing
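The chunking step can be illustrated with a small planning function. This is a sketch under assumptions: it takes the total duration as given (for example, as reported by ffprobe) and only computes chunk boundaries. `planChunks` and `Chunk` are hypothetical names, and the package's actual splitting may work differently.

```typescript
interface Chunk {
  index: number;
  startSec: number;
  durationSec: number;
}

// Hypothetical helper: splits a duration into consecutive chunks of at most
// `chunkSec` seconds; the last chunk holds whatever remains.
function planChunks(totalSec: number, chunkSec: number = 5): Chunk[] {
  if (!(totalSec > 0) || !(chunkSec > 0)) return [];
  const chunks: Chunk[] = [];
  for (let start = 0, index = 0; start < totalSec; start += chunkSec, index++) {
    chunks.push({
      index,
      startSec: start,
      durationSec: Math.min(chunkSec, totalSec - start),
    });
  }
  return chunks;
}
```

For a 12-second file this yields three chunks starting at 0, 5, and 10 seconds, with the last one 2 seconds long.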
🎵 Supported Audio Formats
This package supports all audio formats supported by ffmpeg. For a full list, see the ffmpeg documentation.
Common formats include: .wav, .mp3, .ogg, .flac, .aac, .m4a, and more.
🌐 Supported Languages
You can use any language supported by the Web Speech API and Google Speech-to-Text. For a full list, see the Google Speech-to-Text language support documentation.
Specify the language code (e.g., en-US, fa-IR, fr-FR) in the language option.
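Language codes are easy to mistype (`en_us`, `EN-us`), so it can help to normalize them before passing them in. The sketch below only handles the simple language-REGION shape used in the examples; real language tags can be more complex, and `normalizeLanguageTag` is a hypothetical helper, not part of the package.

```typescript
// Hypothetical helper: normalizes a simple "language-REGION" tag such as
// "en-us" or "fa_IR" to "en-US" / "fa-IR". Returns null for inputs that do
// not have that shape.
function normalizeLanguageTag(input: string): string | null {
  const match = /^([a-z]{2,3})[-_]([a-z]{2})$/i.exec(input.trim());
  if (!match) return null;
  return `${match[1].toLowerCase()}-${match[2].toUpperCase()}`;
}
```

A `null` result only means the input does not match this simple pattern; it says nothing about whether the speech service supports the language.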
🛠️ Troubleshooting
- Ensure all prerequisites are installed and available in your PATH (which ffmpeg, which ffprobe, which pactl, which paplay)
- For best audio performance, use PipeWire (modern) over PulseAudio (legacy)
- For long audio files, ensure enough disk space in /tmp
- If you get permission errors, run with appropriate user rights
- For best results, use high-quality audio files (16kHz mono recommended)
- Make sure your connection is stable and not interrupted during transcription
- Only Linux with PipeWire or PulseAudio is supported
- If browser detection fails, explicitly set executablePath to your browser location
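For the 16kHz mono recommendation in the list above, a file can be converted with ffmpeg before transcription. The sketch below only builds the argument list, using the standard ffmpeg flags `-ac` (channel count), `-ar` (sample rate), and `-y` (overwrite output). `resampleArgs` is a hypothetical helper, and whether pre-converting improves results for a given file is an assumption to verify.

```typescript
// Hypothetical helper: ffmpeg arguments that convert `input` to a
// 16 kHz, single-channel file at `output`, overwriting any existing file.
function resampleArgs(input: string, output: string): string[] {
  return ["-y", "-i", input, "-ac", "1", "-ar", "16000", output];
}
```

The arguments can be passed to `execFileSync("ffmpeg", resampleArgs("in.mp3", "out.wav"))` from `node:child_process`, which avoids shell quoting problems with unusual file names.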
Common Browser Paths:
# Check if browsers are installed
which microsoft-edge
which google-chrome
which chromium-browser
Audio System Check:
# Check if PipeWire is running (recommended)
systemctl --user status pipewire pipewire-pulse
# Check if PulseAudio is running (alternative)
systemctl --user status pulseaudio
# Test audio commands
which pactl paplay # Should work with both PipeWire and PulseAudio
📝 Changelog
Version 0.2.0 (Latest)
- 🚀 BREAKING: Switched from puppeteer to puppeteer-core for better control
- ✨ Added multi-browser support with automatic detection (Edge, Chrome, Chromium)
- ⚙️ Added executablePath option to specify custom browser location
- 🎵 Added PipeWire support (recommended audio server)
💬 Contributing
Pull requests and issues are welcome! Please open issues for any bugs or feature requests. When contributing, please:
- Use clear commit messages
- Follow TypeScript best practices
📋 License
MIT © 2025 ErfanBahramali
