aiabm
v5.1.1
AI Audiobook Maker - Convert PDFs and text files to audiobooks using OpenAI TTS or Thorsten-Voice (native German)
🎧 AI Audiobook Maker (AIABM) v5.1.0
Transform your PDFs and text files into high-quality audiobooks using OpenAI TTS (cloud) or Thorsten-Voice (native German). Choose between premium cloud voices or run everything locally at no cost!
🆕 New in v5.1: Beautiful UI/UX overhaul, flexible output options, optimized Thorsten-Voice loading, and Downloads folder as default output.
✨ Features
🎙️ Dual TTS Providers
- ☁️ OpenAI TTS: Premium cloud voices with 6 voice options (requires API key)
- 🇩🇪 Thorsten-Voice: Native German TTS with authentic pronunciation (local/free)
🚀 Core Features
- 🚀 Zero Installation: Run directly with npx aiabm
- 📁 Smart File Handling: Supports PDF and TXT files with drag & drop
- 🎤 Voice Preview: Listen to voices before choosing (2 Thorsten + 6 OpenAI voices)
- 🔒 Enhanced Security: Input sanitization, API key validation, and secure storage
- 🧪 Comprehensive Testing: 55+ unit tests with 12.6% coverage and growing
- ⏸️ Resume & Pause: Continue interrupted conversions anytime
- 🔐 Secure API Key Management: Encrypted local storage
- 📊 Progress Tracking: Real-time conversion progress with estimates
- 🎛️ Advanced Controls: Adjust speed, quality, and output format
- 💰 Cost Transparency: See exact pricing (OpenAI) or run free (local providers)
- 🔧 Smart Installation: Automatic setup for local TTS providers
🚀 Quick Start
Method 1: Direct Usage (Recommended)
# Convert a specific file
npx aiabm mybook.pdf
# Interactive mode
npx aiabm
Method 2: Global Installation
npm install -g aiabm
aiabm mybook.pdf
📋 Prerequisites
Required
- Node.js 16+ (Download from nodejs.org)
- FFmpeg (for audio combining - auto-installed on most systems)
Optional (Choose One or Both)
For OpenAI TTS:
- OpenAI API key (get from platform.openai.com)
- Costs ~$0.015 per 1,000 characters
For Thorsten-Voice (German TTS):
- Python 3.9-3.11 (auto-installed)
- Coqui TTS (auto-installed)
- Completely FREE - runs locally
🎯 Usage Examples
CLI Mode
# Basic conversion
npx aiabm document.pdf
# With specific options (OpenAI)
npx aiabm book.txt --voice nova --speed 1.2 --model tts-1-hd
# Manage API key
npx aiabm --config
Interactive Mode
npx aiabm
Then follow the interactive prompts to:
- Select TTS Provider (OpenAI, Fish Speech, or Thorsten-Voice)
- Auto-install local providers if needed (one-time setup)
- Select your file (browse, drag & drop, or enter path)
- Preview and choose a voice
- Configure settings (speed, quality, output format)
- Monitor progress and resume if needed
🎤 Available Voices
🤖 OpenAI TTS (Cloud)
- Alloy: Neutral, versatile
- Echo: Clear, professional
- Fable: Warm, storytelling
- Onyx: Deep, authoritative
- Nova: Bright, engaging
- Shimmer: Gentle, soothing
🐟 Fish Speech (Local/Multilingual)
- 🇩🇪 German Female (Natural): High-quality German synthesis
- 🇩🇪 German Male (Clear): Professional German voice
- 🇩🇪 German Female (Expressive): Emotional German narration
- 🇺🇸 English Female (Warm): Natural English voice
- 🇺🇸 English Male (Professional): Business-quality English
- 🇺🇸 English Female (Energetic): Dynamic storytelling
- 🇫🇷 French Female (Elegant): Sophisticated French accent
- 🇫🇷 French Male (Sophisticated): Professional French voice
🇩🇪 Thorsten-Voice (Native German)
- 🇩🇪 Thorsten (Authentic German Male): High-quality native German voice
- 🇩🇪 Thorsten Emotional (German Male): German voice with emotional expression
💰 Pricing
OpenAI TTS
$0.015 per 1,000 characters
| Content Length | Estimated Cost | Example |
|----------------|----------------|---------|
| 10,000 characters | ~$0.15 | Short article |
| 50,000 characters | ~$0.75 | Small e-book |
| 100,000 characters | ~$1.50 | Average novel |
| 250,000 characters | ~$3.75 | Large book |
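The estimates in the table follow directly from the per-character rate. A minimal sketch of the arithmetic, assuming the $0.015 rate quoted above; `estimateCostUsd` is a hypothetical helper for illustration, not part of aiabm's API:

```javascript
// Rate taken from the pricing section above; adjust it if the provider's pricing changes.
const USD_PER_1K_CHARS = 0.015;

// Hypothetical helper: estimates the OpenAI TTS cost for a given character count.
function estimateCostUsd(charCount) {
  if (!Number.isInteger(charCount) || charCount < 0) {
    throw new RangeError("charCount must be a non-negative integer");
  }
  const cost = (charCount / 1000) * USD_PER_1K_CHARS;
  return Math.round(cost * 100) / 100; // round to whole cents
}

for (const chars of [10000, 50000, 100000, 250000]) {
  console.log(`${chars} characters: ~$${estimateCostUsd(chars).toFixed(2)}`);
}
```

Counting the characters of your extracted text before converting gives a quick upper bound on what a run will cost.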
Fish Speech & Thorsten-Voice
100% FREE - No API costs, runs entirely on your machine!
🔧 Local TTS Setup
Both Fish Speech and Thorsten-Voice run entirely on your machine - no API costs! Now with fully automated installation!
🚀 Smart Installation (Recommended)
npx aiabm
# Select "Fish Speech" or "Thorsten-Voice"
# Choose "Auto Install (recommended)"
# → System automatically downloads and configures everything!
🐟 Fish Speech Setup
What happens automatically:
- 📦 Repository Cloning - Downloads latest Fish Speech
- 🐍 Virtual Environment - Creates isolated Python environment
- ⚡ PyTorch Installation - Installs optimized CPU version
- 🤖 Model Download - Downloads Fish Speech 1.2 models (~1GB)
- ✅ Dependency Check - Verifies installation works
System Requirements:
- Python 3.8+ recommended
- ~2GB disk space for models and dependencies
- 4GB+ RAM recommended
- CPU or GPU (GPU faster but optional)
🇩🇪 Thorsten-Voice Setup
What happens automatically:
- 🐍 Compatible Python Detection - Finds Python 3.9-3.11
- 📦 Virtual Environment - Creates isolated environment
- 🎤 Coqui TTS Installation - Installs German TTS framework
- 🤖 Thorsten Model - Downloads German voice model (~500MB)
- ✅ Compatibility Check - Verifies everything works
System Requirements:
- Python 3.9-3.11 (3.12 and later are not supported)
- ~1GB disk space for models and dependencies
- 2GB+ RAM recommended
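If you want to check a Python interpreter yourself before running the installer, the version rule above can be expressed in a few lines. This is a sketch only: `isThorstenCompatible` is a hypothetical name, and it assumes the input looks like the output of `python3 --version`.

```javascript
// Hypothetical helper: returns true when the reported version is within 3.9-3.11.
function isThorstenCompatible(versionOutput) {
  // Matches strings such as "Python 3.11.4" or "3.10".
  const match = /(\d+)\.(\d+)(?:\.\d+)?/.exec(versionOutput);
  if (!match) return false;
  const major = Number(match[1]);
  const minor = Number(match[2]);
  return major === 3 && minor >= 9 && minor <= 11;
}

console.log(isThorstenCompatible("Python 3.11.4")); // true
console.log(isThorstenCompatible("Python 3.13.1")); // false
```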
Python Version Issues?
# Install compatible Python on macOS
brew install python@3.11
# On Ubuntu/Debian
sudo apt install python3.11 python3.11-venv
🔧 Installation Status Tracking
- ✅ Smart Detection: Avoids re-installation if already installed
- 📅 Version Tracking: Shows installation date and version
- 🔄 Update Suggestions: Recommends updates after 30+ days
- 🛠️ Installation Markers: Persistent installation state
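The 30-day update suggestion can be pictured as a simple age check on the installation marker. The marker shape below (`installedAt`, `version`) and the function name are assumptions made for illustration; the actual marker format is not documented here.

```javascript
const UPDATE_AFTER_DAYS = 30;
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Hypothetical marker shape: { installedAt: ISO date string, version: string }.
function shouldSuggestUpdate(marker, now = new Date()) {
  const installedAt = new Date(marker.installedAt);
  // An unreadable date is treated as stale so the user gets prompted.
  if (Number.isNaN(installedAt.getTime())) return true;
  const ageDays = (now.getTime() - installedAt.getTime()) / MS_PER_DAY;
  return ageDays >= UPDATE_AFTER_DAYS;
}

const marker = { installedAt: "2025-08-01T00:00:00Z", version: "4.0.1" };
console.log(shouldSuggestUpdate(marker, new Date("2025-09-05T00:00:00Z")));
```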
🔧 Advanced Features
Resume Interrupted Conversions
If conversion stops, simply run the tool again - it will automatically detect and offer to resume your previous session.
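Conceptually, resuming means skipping every chunk that already has audio and processing only the rest. A minimal sketch, assuming a session record that stores the total chunk count and the indexes already completed (both field names are hypothetical):

```javascript
// Hypothetical session shape: { totalChunks: number, completed: number[] }.
function pendingChunks(session) {
  const done = new Set(session.completed);
  const pending = [];
  for (let i = 0; i < session.totalChunks; i++) {
    if (!done.has(i)) pending.push(i);
  }
  return pending;
}

// Chunks 0, 1, and 3 finished before the interruption.
console.log(pendingChunks({ totalChunks: 5, completed: [0, 1, 3] }));
```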
Multiple Output Formats
- Single File: One complete audiobook MP3
- Chapter Files: Separate MP3 per chunk
- Both: Get both formats
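If you choose chapter files and later want a single MP3, FFmpeg's concat demuxer is one common way to join them. The sketch below only builds the list file that the demuxer reads; whether aiabm combines audio this way internally is an assumption, and `buildConcatList` is a hypothetical helper.

```javascript
// Builds the contents of a list file for FFmpeg's concat demuxer.
// Single quotes inside paths are escaped as '\'' per the demuxer's quoting rules.
function buildConcatList(chunkPaths) {
  const lines = chunkPaths.map((p) => `file '${p.replace(/'/g, "'\\''")}'`);
  return lines.join("\n") + "\n";
}

console.log(buildConcatList(["chunk_001.mp3", "chunk_002.mp3"]));
```

Saved as `list.txt`, the result can be passed to `ffmpeg -f concat -safe 0 -i list.txt -c copy audiobook.mp3`, which joins the files without re-encoding.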
Voice Preview Caching
Voice previews are cached locally to save API costs and improve performance.
Smart Text Chunking
- Respects sentence boundaries
- Preserves chapter structure for PDFs
- Configurable chunk sizes (default: 4000 characters)
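To illustrate what respecting sentence boundaries means in practice, here is a minimal greedy chunker. It is a sketch of the general technique, not aiabm's actual implementation: `chunkText` is a hypothetical name, sentence detection is a simple punctuation rule, and chapter structure is not handled.

```javascript
// Hypothetical helper: packs whole sentences into chunks of at most maxChars.
function chunkText(text, maxChars = 4000) {
  // Split after sentence-ending punctuation that is followed by whitespace.
  const sentences = text.split(/(?<=[.!?])\s+/).filter((s) => s.length > 0);
  const chunks = [];
  let current = "";

  for (const sentence of sentences) {
    // Fallback: a single sentence longer than the limit is hard-split.
    if (sentence.length > maxChars) {
      if (current) {
        chunks.push(current);
        current = "";
      }
      for (let i = 0; i < sentence.length; i += maxChars) {
        chunks.push(sentence.slice(i, i + maxChars));
      }
      continue;
    }
    const candidate = current ? `${current} ${sentence}` : sentence;
    if (candidate.length > maxChars) {
      // Adding this sentence would overflow, so close the current chunk.
      chunks.push(current);
      current = sentence;
    } else {
      current = candidate;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

console.log(chunkText("One. Two. Three.", 10));
```

Keeping sentences intact avoids audible cuts mid-sentence when the chunks are synthesized separately and joined afterwards.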
📂 File Support
PDF Files
- ✅ Up to 50MB
- ✅ Text extraction with structure preservation
- ✅ Automatic chapter detection
Text Files
- ✅ Up to 1M characters
- ✅ UTF-8 encoding
- ✅ Automatic formatting cleanup
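The limits above can be checked before starting a conversion. A minimal sketch, assuming "50MB" means 50 × 1024 × 1024 bytes; `checkFileLimits` and its argument shape are hypothetical and only illustrate the documented limits:

```javascript
const MAX_PDF_BYTES = 50 * 1024 * 1024; // assumption: binary megabytes
const MAX_TEXT_CHARS = 1000000;

// Hypothetical helper: validates a file against the documented limits.
function checkFileLimits({ extension, sizeBytes = 0, charCount = 0 }) {
  const ext = extension.toLowerCase();
  if (ext === ".pdf") {
    return sizeBytes <= MAX_PDF_BYTES
      ? { ok: true }
      : { ok: false, reason: "PDF exceeds 50MB" };
  }
  if (ext === ".txt") {
    return charCount <= MAX_TEXT_CHARS
      ? { ok: true }
      : { ok: false, reason: "Text exceeds 1M characters" };
  }
  return { ok: false, reason: `Unsupported file type: ${extension}` };
}

console.log(checkFileLimits({ extension: ".pdf", sizeBytes: 12 * 1024 * 1024 }));
```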
🆕 What's New in v5.0
🔒 Enhanced Security
- Input Sanitization: Prevents code injection and malicious input
- API Key Validation: Comprehensive security checks for OpenAI keys
- Secure Storage: Encrypted API key storage with multiple layers
- Environment Assessment: Automatic security environment analysis
🧪 Comprehensive Testing
- 55+ Unit Tests: Extensive test coverage for core functionality
- 12.6% Code Coverage: Growing test suite with focus on critical paths
- Mocked Services: Fast, reliable tests without external dependencies
- CI/CD Pipeline: Automated testing on every commit
🛡️ Better Error Handling
- Type-Safe Validation: Zod schemas for all configuration and data
- Graceful Failures: Better error messages and recovery mechanisms
- Logging & Monitoring: Detailed error tracking and user feedback
🎯 Developer Experience
- GitHub Actions: Automated CI/CD with security auditing
- ESLint Clean: Zero linting errors with consistent code style
- Documentation: Comprehensive inline documentation and examples
⚙️ Configuration
API Key Storage
Your OpenAI API key is encrypted and stored locally at:
- macOS/Linux: ~/.config/ai-audiobook-maker/config.json
- Windows: %APPDATA%\ai-audiobook-maker\config.json
Cache Location
Voice previews and temporary files:
- macOS/Linux: ~/.config/ai-audiobook-maker/cache/
- Windows: %APPDATA%\ai-audiobook-maker\cache\
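If you need to locate these files from a script (for example, to back up or clear them), the paths can be resolved per platform as sketched below. The function names are hypothetical; only the directory layout comes from the locations listed above.

```javascript
const path = require("path");

const APP_DIR = "ai-audiobook-maker";

// Hypothetical helper: resolves the base directory for a given platform.
// platform follows process.platform values ("win32", "darwin", "linux").
function baseDir(platform, env, homeDir) {
  if (platform === "win32") {
    return path.win32.join(env.APPDATA, APP_DIR);
  }
  return path.posix.join(homeDir, ".config", APP_DIR);
}

function configFile(platform, env, homeDir) {
  const join = platform === "win32" ? path.win32.join : path.posix.join;
  return join(baseDir(platform, env, homeDir), "config.json");
}

function cacheDir(platform, env, homeDir) {
  const join = platform === "win32" ? path.win32.join : path.posix.join;
  return join(baseDir(platform, env, homeDir), "cache");
}

console.log(configFile("linux", {}, "/home/ana"));
```

In a real script you would pass `process.platform`, `process.env`, and `os.homedir()` as the three arguments.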
Local TTS Installations
Local TTS providers are installed to:
- Fish Speech: ~/.aiabm/fish-speech/
- Thorsten-Voice: ~/.aiabm/thorsten-voice/
🛠️ Troubleshooting
Common Issues
"FFmpeg not found"
# macOS
brew install ffmpeg
# Ubuntu/Debian
sudo apt install ffmpeg
# Windows
# Download from https://ffmpeg.org/download.html
"API key invalid"
- Verify your key at OpenAI Platform
- Use npx aiabm --config to update your key
"File too large"
- PDFs: Maximum 50MB
- Text: Maximum 1M characters
- Split large files before conversion
"Fish Speech dependencies missing"
- Check Python version: python3 --version
- Try restarting the app
- Virtual environment issues usually resolve on restart
"Thorsten-Voice requires Python 3.9-3.11"
- Install compatible Python: brew install python@3.11
- App will automatically detect and use it
- Creates separate virtual environment
Voice preview not playing
- macOS: Uses built-in afplay
- Windows: Uses PowerShell media player
- Linux: Requires ffplay, mpv, vlc, or mplayer
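The platform rules above amount to a small lookup. The sketch below shows one way to pick a player, assuming the Linux players are tried in the order listed; the function name and the idea of passing in the list of installed players are illustrative assumptions.

```javascript
const LINUX_PLAYERS = ["ffplay", "mpv", "vlc", "mplayer"];

// Hypothetical helper: returns the player to use, or null if none is available.
// platform follows process.platform values ("darwin", "win32", "linux").
function pickPreviewPlayer(platform, installedPlayers = []) {
  if (platform === "darwin") return "afplay";
  if (platform === "win32") return "powershell";
  const found = LINUX_PLAYERS.find((p) => installedPlayers.includes(p));
  return found ?? null;
}

console.log(pickPreviewPlayer("linux", ["vlc", "mpv"]));
```

A `null` result on Linux corresponds to the "preview not playing" case: installing any one of the four players resolves it.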
Performance Tips
- Use tts-1 model for faster processing
- Use tts-1-hd for higher quality (slower)
- Local TTS providers are free but slower than cloud
- Cache clears automatically after 30 days
- Resume feature prevents re-processing completed chunks
🔒 Privacy & Security
- API keys are encrypted locally using AES-192
- No data is sent to servers when using local TTS
- OpenAI TTS sends only text chunks to OpenAI servers
- Cache files are stored locally only
- Session data helps resume interrupted conversions
- Local TTS models run entirely offline
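For readers curious what AES-192 encryption of a stored key can look like in Node.js, here is a minimal sketch using the built-in `crypto` module. The cipher mode (`aes-192-cbc`), the scrypt key derivation, and the stored record shape are all assumptions; this section only states that AES-192 is used.

```javascript
const crypto = require("crypto");

// Assumption: CBC mode. The README specifies only "AES-192".
const ALGORITHM = "aes-192-cbc";
const KEY_BYTES = 24; // 192 bits

// Hypothetical helper: encrypts an API key with a passphrase-derived key.
function encryptApiKey(apiKey, passphrase) {
  const salt = crypto.randomBytes(16);
  const iv = crypto.randomBytes(16);
  const key = crypto.scryptSync(passphrase, salt, KEY_BYTES);
  const cipher = crypto.createCipheriv(ALGORITHM, key, iv);
  const data = Buffer.concat([cipher.update(apiKey, "utf8"), cipher.final()]);
  return {
    salt: salt.toString("hex"),
    iv: iv.toString("hex"),
    data: data.toString("hex"),
  };
}

// Hypothetical helper: reverses encryptApiKey given the same passphrase.
function decryptApiKey(record, passphrase) {
  const key = crypto.scryptSync(passphrase, Buffer.from(record.salt, "hex"), KEY_BYTES);
  const decipher = crypto.createDecipheriv(ALGORITHM, key, Buffer.from(record.iv, "hex"));
  const plain = Buffer.concat([
    decipher.update(Buffer.from(record.data, "hex")),
    decipher.final(),
  ]);
  return plain.toString("utf8");
}

const record = encryptApiKey("sk-example-key", "local passphrase");
console.log(decryptApiKey(record, "local passphrase") === "sk-example-key");
```

A fresh random salt and IV per record means the same key encrypts to different bytes each time, so the stored file reveals nothing about whether the key has changed.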
📖 Examples
Converting a PDF Book with German Voice
npx aiabm "Mein Roman.pdf"
# Select "Thorsten-Voice"
# Choose German voice
# Enjoy authentic German pronunciation!
Interactive Multilingual Setup
npx aiabm
# Select "Fish Speech"
# Auto-install if needed
# Preview German, English, and French voices
# Choose your favorite for the content language
Quick OpenAI Conversion
npx aiabm document.pdf --voice nova --speed 1.1
🤝 Contributing
Issues and feature requests welcome at: GitHub Issues
📄 License
MIT License - see LICENSE file for details
🙏 Acknowledgments
- Built on OpenAI's TTS API, Fish Speech, and Thorsten-Voice/Coqui TTS
- Fish Speech: https://github.com/fishaudio/fish-speech
- Thorsten-Voice: https://github.com/thorstenMueller/Thorsten-Voice
- Coqui TTS: https://github.com/coqui-ai/TTS
- Uses FFmpeg for audio processing
📝 Changelog
v4.0.7 (2025-08-03) - 🐟 Fish Speech Fully Fixed & Operational
- 🐟 Fish Speech 100% Working - Complete resolution of all Fish Speech TTS issues
- 🔧 Fixed tokenizer.tiktoken - Proper base64 encoding of 32,000 tokens from Fish Speech
- ⚙️ Model Configuration Fixed - Created correct firefly_gan_vq.yaml matching model architecture
- 📐 Dimension Mismatch Resolved - Fixed 512-dim vs 1024-dim PyTorch tensor issues
- ✅ Parameter Validation Fixed - Corrected ServeTTSRequest use_memory_cache format
- 🎯 End-to-End Functionality - Text-to-semantic and decoder models load perfectly
- 🚀 Full Service Availability - Fish Speech now detected as available and operational
v4.0.6 (2025-08-03) - 🧪 Comprehensive Test Coverage & TTS Fixes
- 🧪 Major Test Coverage Improvement - 20% to 45.07% overall coverage (+125% improvement)
- 🎯 AudiobookMaker.js Tests - 0% to 42.58% coverage with integration tests
- 🔐 ConfigManager.js Tests - 0% to 98.03% coverage with security tests
- 📁 FileHandler.js Tests - 0% to 72.99% coverage with core functionality tests
- 🖥️ cli.js Tests - 0% to 75.75% coverage with end-to-end tests
- 🐟 Fish Speech Fixed - Installation detection and availability checking
- 🇩🇪 Thorsten Voice Fixed - Python 3.13 compatibility and installation issues
- 📊 207 Total Tests - 195 passing with comprehensive edge case coverage
- 🔧 Integration Tests - Real-world testing with actual TTS services and PDF processing
- 🛡️ Robust Error Handling - Enhanced service availability validation
v4.0.5 (2025-08-03) - 🎵 Unified Preview System
- 🎵 Unified Preview Texts - Consistent voice previews across all TTS providers
- 🌍 Language-Specific Previews - German, English, and French preview texts
- 💾 Smart Caching - Consistent cache filenames prevent preview regeneration
- 🎯 Voice Language Detection - Automatic language detection from voice names
- 🔄 Cache Optimization - Separate preview cache directories for each provider
- ⚙️ Better Performance - No more regenerating previews when switching providers
v4.0.4 (2025-08-03) - 🛠️ Fish Speech Engine Fix
- 🔧 Fixed TTSInferenceEngine initialization - Use proper ModelManager pattern
- 🏗️ Implemented correct model loading - Load LLaMA and DAC models separately
- 🎯 Auto-device detection - Support for MPS (Apple Silicon), CUDA, and CPU
- 📦 Better model management - Use launch_thread_safe_queue for text-to-semantic
- 🔄 Improved generation flow - Proper model initialization before inference
v4.0.3 (2025-08-03) - 🔧 Fish Speech Import Fix
- 🔧 Fixed MODDED_DAC import - Changed to correct DAC import from inference_engine
- ✅ Added missing torch import - Fixed undefined torch reference in generation script
- 🛠️ Simplified dependency check - Import DAC directly from inference_engine
- 📦 Better module verification - Check ServeTTSRequest schema availability
v4.0.2 (2025-08-03) - 🐟 Fish Speech API Update
- 🔧 Fixed Fish Speech dependency check - Updated to use current DAC-based architecture
- 🗑️ Removed deprecated VQGAN imports - Fish Speech now uses DAC (Descript Audio Codec)
- ✅ Updated generation script - Uses modern TTSInferenceEngine API
- 🔄 Better installation handling - Auto-removes incomplete installations
- 📦 Improved pip install - Installs Fish Speech package in development mode
- 🛠️ Enhanced error reporting - More detailed debugging information
v4.0.1 (2025-08-02) - 🔧 Installation & Compatibility Fixes
- 🔧 Fixed Fish Speech virtual environment usage - Proper dependency checking
- 🐍 Enhanced Python version detection - Blocks Thorsten-Voice on Python 3.13+
- ✅ Smart installation status tracking - Avoids unnecessary re-installations
- 📅 Installation markers - Persistent installation state with version info
- 🔄 Better error handling - More informative error messages and recovery
- 💡 Improved user guidance - Clear instructions for Python compatibility issues
v4.0.0 (2025-08-02) - 🌟 Major Refactoring
- 🗑️ REMOVED: Kyutai TTS (replaced due to Python 3.13 compatibility issues)
- 🐟 NEW: Fish Speech integration - State-of-the-art multilingual TTS
- 🇩🇪 NEW: Thorsten-Voice integration - Native German TTS
- 🎤 Enhanced Voice Selection: 16 total voices across 3 providers
- 🏗️ Automated Installation: One-click setup for local TTS providers
- 🔧 Improved Architecture: Better service abstraction and error handling
- 📊 Enhanced Testing: 80%+ test coverage with Jest
- 🛠️ Code Quality Tools: ESLint, Prettier, Snyk integration
- 🔄 Backward Compatibility: 100% compatibility with existing OpenAI workflows
v3.3.0 (2025-08-01) - 🚀 Kyutai Integration (Deprecated)
- 🆓 Kyutai TTS integration (now removed in v4.0.0)
- 🏗️ Automated installation system
- 🎤 15+ voice options
- 🔄 Provider selection system
Happy listening! 🎧 Turn any text into your personal audiobook library with the best TTS technology available.
