aiabm

v5.1.1

Published

5 months ago

AI Audiobook Maker - Convert PDFs and text files to audiobooks using OpenAI TTS or Thorsten-Voice (native German)


raccoova

audiobook tts text-to-speech openai fish-speech thorsten-voice local-tts german-tts pdf cli terminal ai

🎧 AI Audiobook Maker (AIABM) v5.1.0

Transform your PDFs and text files into high-quality audiobooks using OpenAI TTS (cloud) or Thorsten-Voice (native German). Choose between premium cloud voices or run everything locally at no cost!

🆕 New in v5.1: Beautiful UI/UX overhaul, flexible output options, optimized Thorsten-Voice loading, and Downloads folder as default output.

✨ Features

🎙️ Dual TTS Providers

☁️ OpenAI TTS: Premium cloud voices with 6 voice options (requires API key)
🇩🇪 Thorsten-Voice: Native German TTS with authentic pronunciation (local/free)

🚀 Core Features

🚀 Zero Installation: Run directly with npx aiabm
📁 Smart File Handling: Supports PDF and TXT files with drag & drop
🎤 Voice Preview: Listen to voices before choosing (2 Thorsten + 6 OpenAI voices)
🔒 Enhanced Security: Input sanitization, API key validation, and secure storage
🧪 Comprehensive Testing: 55+ unit tests with 12.6% coverage and growing
⏸️ Resume & Pause: Continue interrupted conversions anytime
🔐 Secure API Key Management: Encrypted local storage
📊 Progress Tracking: Real-time conversion progress with estimates
🎛️ Advanced Controls: Adjust speed, quality, and output format
💰 Cost Transparency: See exact pricing (OpenAI) or run free (local providers)
🔧 Smart Installation: Automatic setup for local TTS providers

🚀 Quick Start

Method 1: Direct Usage (Recommended)

# Convert a specific file
npx aiabm mybook.pdf

# Interactive mode
npx aiabm

Method 2: Global Installation

npm install -g aiabm
aiabm mybook.pdf

📋 Prerequisites

Required

Node.js 16+ (Download from nodejs.org)
FFmpeg (for audio combining - auto-installed on most systems)

Optional (Choose One or Both)

For OpenAI TTS:

OpenAI API key (get from platform.openai.com)
Costs ~$0.015 per 1,000 characters

For Thorsten-Voice (German TTS):

Python 3.9-3.11 (auto-installed)
Coqui TTS (auto-installed)
Completely FREE - runs locally

🎯 Usage Examples

CLI Mode

# Basic conversion
npx aiabm document.pdf

# With specific options (OpenAI)
npx aiabm book.txt --voice nova --speed 1.2 --model tts-1-hd

# Manage API key
npx aiabm --config

Interactive Mode

npx aiabm

Then follow the interactive prompts to:

Select TTS Provider (OpenAI, Fish Speech, or Thorsten-Voice)
Auto-install local providers if needed (one-time setup)
Select your file (browse, drag & drop, or enter path)
Preview and choose a voice
Configure settings (speed, quality, output format)
Monitor progress and resume if needed

🎤 Available Voices

🤖 OpenAI TTS (Cloud)

Alloy: Neutral, versatile
Echo: Clear, professional
Fable: Warm, storytelling
Onyx: Deep, authoritative
Nova: Bright, engaging
Shimmer: Gentle, soothing

🐟 Fish Speech (Local/Multilingual)

🇩🇪 German Female (Natural): High-quality German synthesis
🇩🇪 German Male (Clear): Professional German voice
🇩🇪 German Female (Expressive): Emotional German narration
🇺🇸 English Female (Warm): Natural English voice
🇺🇸 English Male (Professional): Business-quality English
🇺🇸 English Female (Energetic): Dynamic storytelling
🇫🇷 French Female (Elegant): Sophisticated French accent
🇫🇷 French Male (Sophisticated): Professional French voice

🇩🇪 Thorsten-Voice (Native German)

🇩🇪 Thorsten (Authentic German Male): High-quality native German voice
🇩🇪 Thorsten Emotional (German Male): German voice with emotional expression

💰 Pricing

OpenAI TTS

$0.015 per 1,000 characters

| Content Length | Estimated Cost | Example |
|----------------|----------------|---------|
| 10,000 characters | ~$0.15 | Short article |
| 50,000 characters | ~$0.75 | Small e-book |
| 100,000 characters | ~$1.50 | Average novel |
| 250,000 characters | ~$3.75 | Large book |
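
The estimates in the table follow directly from the per-character rate. As a rough sketch (the helper name `estimateCostUsd` is made up for illustration and is not part of the aiabm CLI), the arithmetic looks like this:

```javascript
// Rate taken from the pricing section above; pass a different rate if yours differs.
const USD_PER_1K_CHARS = 0.015;

// Hypothetical helper: estimates the OpenAI TTS cost for a text of charCount characters.
function estimateCostUsd(charCount, ratePer1k = USD_PER_1K_CHARS) {
  if (!Number.isFinite(charCount) || charCount < 0) {
    throw new RangeError("charCount must be a non-negative number");
  }
  // Round to whole cents to avoid floating-point noise.
  return Math.round((charCount / 1000) * ratePer1k * 100) / 100;
}

for (const chars of [10000, 50000, 100000, 250000]) {
  console.log(`${chars} characters: ~$${estimateCostUsd(chars).toFixed(2)}`);
}
```

The tool itself shows the cost before conversion, so this is only useful for back-of-the-envelope planning.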

Fish Speech & Thorsten-Voice

100% FREE - No API costs, runs entirely on your machine!

🔧 Local TTS Setup

Both Fish Speech and Thorsten-Voice run entirely on your machine - no API costs! Now with fully automated installation!

🚀 Smart Installation (Recommended)

npx aiabm
# Select "Fish Speech" or "Thorsten-Voice"
# Choose "Auto Install (recommended)"
# → System automatically downloads and configures everything!

🐟 Fish Speech Setup

What happens automatically:

📦 Repository Cloning - Downloads latest Fish Speech
🐍 Virtual Environment - Creates isolated Python environment
⚡ PyTorch Installation - Installs optimized CPU version
🤖 Model Download - Downloads Fish Speech 1.2 models (~1GB)
✅ Dependency Check - Verifies installation works

System Requirements:

Python 3.8+ recommended
~2GB disk space for models and dependencies
4GB+ RAM recommended
CPU or GPU (GPU faster but optional)

🇩🇪 Thorsten-Voice Setup

What happens automatically:

🐍 Compatible Python Detection - Finds Python 3.9-3.11
📦 Virtual Environment - Creates isolated environment
🎤 Coqui TTS Installation - Installs German TTS framework
🤖 Thorsten Model - Downloads German voice model (~500MB)
✅ Compatibility Check - Verifies everything works

System Requirements:

Python 3.9-3.11 (Python 3.12 and later are not supported)
~1GB disk space for models and dependencies
2GB+ RAM recommended
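
To check by hand whether an interpreter falls in the supported range, the version test can be sketched as follows. This is an illustration only: `isThorstenCompatible` is a hypothetical name, and the tool's real detection logic may differ. It expects the text printed by `python3 --version`.

```javascript
// Hypothetical helper: returns true if the output of `python3 --version`
// (for example "Python 3.11.4") names a version in the 3.9-3.11 range.
function isThorstenCompatible(versionOutput) {
  const match = /Python\s+(\d+)\.(\d+)/.exec(versionOutput);
  if (!match) return false; // unrecognized output: treat as incompatible
  const major = Number(match[1]);
  const minor = Number(match[2]);
  return major === 3 && minor >= 9 && minor <= 11;
}

console.log(isThorstenCompatible("Python 3.11.4")); // true
console.log(isThorstenCompatible("Python 3.12.0")); // false
```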

Python Version Issues?

# Install compatible Python on macOS
brew install python@3.11

# On Ubuntu/Debian
sudo apt install python3.11 python3.11-venv

🔧 Installation Status Tracking

✅ Smart Detection: Avoids re-installation if already installed
📅 Version Tracking: Shows installation date and version
🔄 Update Suggestions: Recommends updates after 30+ days
🛠️ Installation Markers: Persistent installation state
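
The 30-day update suggestion can be pictured as a simple age check on the installation marker. The sketch below assumes a marker object with an `installedAt` ISO timestamp; that field name and the function `shouldSuggestUpdate` are hypothetical, since the actual marker format is not documented here.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical helper: decides whether an installation is old enough
// to suggest an update. `marker.installedAt` is an assumed field name.
function shouldSuggestUpdate(marker, now = new Date(), maxAgeDays = 30) {
  const installed = new Date(marker.installedAt);
  // An unreadable date is treated as stale so the user is prompted.
  if (Number.isNaN(installed.getTime())) return true;
  return (now - installed) / DAY_MS >= maxAgeDays;
}

const marker = { installedAt: "2025-08-02T00:00:00Z" };
console.log(shouldSuggestUpdate(marker, new Date("2025-08-20T00:00:00Z"))); // false
console.log(shouldSuggestUpdate(marker, new Date("2025-09-01T00:00:00Z"))); // true
```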

🔧 Advanced Features

Resume Interrupted Conversions

If conversion stops, simply run the tool again - it will automatically detect and offer to resume your previous session.
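
Conceptually, resuming means skipping the chunks that already finished. A minimal sketch, assuming the saved session records the indexes of completed chunks (the function name and session shape are illustrative, not the tool's actual internals):

```javascript
// Hypothetical helper: given the total number of chunks and the indexes
// already converted, returns the indexes that still need processing.
function pendingChunks(totalChunks, completedIndexes) {
  const done = new Set(completedIndexes);
  const pending = [];
  for (let i = 0; i < totalChunks; i++) {
    if (!done.has(i)) pending.push(i);
  }
  return pending;
}

console.log(pendingChunks(5, [0, 1, 3])); // [ 2, 4 ]
```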

Multiple Output Formats

Single File: One complete audiobook MP3
Chapter Files: Separate MP3 per chunk
Both: Get both formats

Voice Preview Caching

Voice previews are cached locally to save API costs and improve performance.

Smart Text Chunking

Respects sentence boundaries
Preserves chapter structure for PDFs
Configurable chunk sizes (default: 4000 characters)
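
Sentence-aware chunking can be sketched as below. This is a simplified illustration, not the tool's actual implementation: `chunkText` is a hypothetical name, sentence ends are detected with a plain punctuation regex, and chapter structure is ignored.

```javascript
// Hypothetical helper: packs whole sentences into chunks of at most maxChars.
function chunkText(text, maxChars = 4000) {
  // Split after ., !, or ? followed by whitespace; punctuation stays with its sentence.
  const sentences = text.split(/(?<=[.!?])\s+/).filter((s) => s.length > 0);
  const chunks = [];
  let current = "";

  for (const sentence of sentences) {
    // A single oversized sentence is hard-split so no chunk exceeds the limit.
    if (sentence.length > maxChars) {
      if (current) {
        chunks.push(current);
        current = "";
      }
      for (let i = 0; i < sentence.length; i += maxChars) {
        chunks.push(sentence.slice(i, i + maxChars));
      }
      continue;
    }
    const candidate = current ? `${current} ${sentence}` : sentence;
    if (candidate.length > maxChars) {
      // Adding this sentence would overflow: close the chunk and start a new one.
      chunks.push(current);
      current = sentence;
    } else {
      current = candidate;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

console.log(chunkText("One. Two. Three.", 10)); // [ 'One. Two.', 'Three.' ]
```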

📂 File Support

PDF Files

✅ Up to 50MB
✅ Text extraction with structure preservation
✅ Automatic chapter detection

Text Files

✅ Up to 1M characters
✅ UTF-8 encoding
✅ Automatic formatting cleanup
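
If you want to pre-check files in a script before handing them to the tool, the limits above can be expressed roughly as follows. The function name, the input shape, and the reading of "50MB" as 50 × 1024 × 1024 bytes are all assumptions made for this sketch.

```javascript
const MAX_PDF_BYTES = 50 * 1024 * 1024; // assumes "50MB" means 50 MiB
const MAX_TEXT_CHARS = 1000000;

// Hypothetical helper: returns an error message, or null if the file is within limits.
// `file` is an assumed shape: { type: "pdf" | "txt", sizeBytes, charCount }.
function checkLimits(file) {
  if (file.type === "pdf" && file.sizeBytes > MAX_PDF_BYTES) {
    return "PDF exceeds 50MB";
  }
  if (file.type === "txt" && file.charCount > MAX_TEXT_CHARS) {
    return "Text exceeds 1M characters";
  }
  return null;
}

console.log(checkLimits({ type: "pdf", sizeBytes: 60 * 1024 * 1024 })); // PDF exceeds 50MB
console.log(checkLimits({ type: "txt", charCount: 120000 })); // null
```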

🆕 What's New in v5.0

🔒 Enhanced Security

Input Sanitization: Prevents code injection and malicious input
API Key Validation: Comprehensive security checks for OpenAI keys
Secure Storage: Encrypted API key storage with multiple layers
Environment Assessment: Automatic security environment analysis

🧪 Comprehensive Testing

55+ Unit Tests: Extensive test coverage for core functionality
12.6% Code Coverage: Growing test suite with focus on critical paths
Mocked Services: Fast, reliable tests without external dependencies
CI/CD Pipeline: Automated testing on every commit

🛡️ Better Error Handling

Type-Safe Validation: Zod schemas for all configuration and data
Graceful Failures: Better error messages and recovery mechanisms
Logging & Monitoring: Detailed error tracking and user feedback

🎯 Developer Experience

GitHub Actions: Automated CI/CD with security auditing
ESLint Clean: Zero linting errors with consistent code style
Documentation: Comprehensive inline documentation and examples

⚙️ Configuration

API Key Storage

Your OpenAI API key is encrypted and stored locally at:

macOS/Linux: ~/.config/ai-audiobook-maker/config.json
Windows: %APPDATA%\ai-audiobook-maker\config.json

Cache Location

Voice previews and temporary files:

macOS/Linux: ~/.config/ai-audiobook-maker/cache/
Windows: %APPDATA%\ai-audiobook-maker\cache\
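
For scripts that need to locate these files, the paths listed above can be derived from the platform and environment. The sketch below is illustrative only: `resolvePaths` is a hypothetical helper, not an aiabm API, and it takes the platform and environment as arguments so it can be tried with sample values.

```javascript
// Hypothetical helper: builds the config and cache paths documented above.
// `platform` follows Node's process.platform values; `env` is like process.env.
function resolvePaths(platform, env) {
  const isWindows = platform === "win32";
  const sep = isWindows ? "\\" : "/";
  const root = isWindows ? env.APPDATA : env.HOME;
  if (!root) {
    throw new Error(isWindows ? "APPDATA is not set" : "HOME is not set");
  }
  const base = isWindows ? root : `${root}/.config`;
  const appDir = [base, "ai-audiobook-maker"].join(sep);
  return {
    configFile: appDir + sep + "config.json",
    cacheDir: appDir + sep + "cache",
  };
}

console.log(resolvePaths("linux", { HOME: "/home/alice" }));
```

In a real script you would pass `process.platform` and `process.env` instead of the sample values.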

Local TTS Installations

Local TTS providers are installed to:

Fish Speech: ~/.aiabm/fish-speech/
Thorsten-Voice: ~/.aiabm/thorsten-voice/

🛠️ Troubleshooting

Common Issues

"FFmpeg not found"

# macOS
brew install ffmpeg

# Ubuntu/Debian
sudo apt install ffmpeg

# Windows
# Download from https://ffmpeg.org/download.html

"API key invalid"

Verify your key at OpenAI Platform
Use npx aiabm --config to update your key

"File too large"

PDFs: Maximum 50MB
Text: Maximum 1M characters
Split large files before conversion

"Fish Speech dependencies missing"

Check Python version: python3 --version
Try restarting the app
Virtual environment issues usually resolve on restart

"Thorsten-Voice requires Python 3.9-3.11"

Install a compatible Python version: brew install python@3.11
The app detects it automatically on the next run
It then creates a separate virtual environment for Thorsten-Voice

Voice preview not playing

macOS: Uses built-in afplay
Windows: Uses PowerShell media player
Linux: Requires ffplay, mpv, vlc, or mplayer
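
When debugging a silent preview, it can help to think of player selection as a per-platform lookup. The sketch below mirrors the list above; the function name and the `isInstalled` callback are assumptions for illustration, and the tool's actual selection order on Linux is not documented here.

```javascript
const LINUX_PLAYERS = ["ffplay", "mpv", "vlc", "mplayer"];

// Hypothetical helper: picks a playback command for the given platform.
// `isInstalled(name)` is an assumed callback that reports whether a command exists.
function pickPreviewPlayer(platform, isInstalled) {
  if (platform === "darwin") return "afplay"; // built into macOS
  if (platform === "win32") return "powershell"; // PowerShell media player
  // On Linux, take the first supported player found, or null if none is present.
  return LINUX_PLAYERS.find((name) => isInstalled(name)) ?? null;
}

console.log(pickPreviewPlayer("linux", (name) => name === "mpv")); // mpv
console.log(pickPreviewPlayer("linux", () => false)); // null
```

A `null` result corresponds to the case where one of the listed Linux players still needs to be installed.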

Performance Tips

Use tts-1 model for faster processing
Use tts-1-hd for higher quality (slower)
Local TTS providers are free but slower than cloud
Cache clears automatically after 30 days
Resume feature prevents re-processing completed chunks

🔒 Privacy & Security

API keys are encrypted locally using AES-192
No data is sent to servers when using local TTS
OpenAI TTS sends only text chunks to OpenAI servers
Cache files are stored locally only
Session data helps resume interrupted conversions
Local TTS models run entirely offline

📖 Examples

Converting a PDF Book with German Voice

npx aiabm "Mein Roman.pdf"
# Select "Thorsten-Voice"
# Choose German voice
# Enjoy authentic German pronunciation!

Interactive Multilingual Setup

npx aiabm
# Select "Fish Speech"
# Auto-install if needed
# Preview German, English, and French voices
# Choose your favorite for the content language

Quick OpenAI Conversion

npx aiabm document.pdf --voice nova --speed 1.1

🤝 Contributing

Issues and feature requests welcome at: GitHub Issues

📄 License

MIT License - see LICENSE file for details

🙏 Acknowledgments

Built on OpenAI's TTS API, Fish Speech, and Thorsten-Voice/Coqui TTS
Fish Speech: https://github.com/fishaudio/fish-speech
Thorsten-Voice: https://github.com/thorstenMueller/Thorsten-Voice
Coqui TTS: https://github.com/coqui-ai/TTS
Uses FFmpeg for audio processing

📝 Changelog

v4.0.7 (2025-08-03) - 🐟 Fish Speech Fully Fixed & Operational

🐟 Fish Speech 100% Working - Complete resolution of all Fish Speech TTS issues
🔧 Fixed tokenizer.tiktoken - Proper base64 encoding of 32,000 tokens from Fish Speech
⚙️ Model Configuration Fixed - Created correct firefly_gan_vq.yaml matching model architecture
📐 Dimension Mismatch Resolved - Fixed 512-dim vs 1024-dim PyTorch tensor issues
✅ Parameter Validation Fixed - Corrected ServeTTSRequest use_memory_cache format
🎯 End-to-End Functionality - Text-to-semantic and decoder models load perfectly
🚀 Full Service Availability - Fish Speech now detected as available and operational

v4.0.6 (2025-08-03) - 🧪 Comprehensive Test Coverage & TTS Fixes

🧪 Major Test Coverage Improvement - 20% to 45.07% overall coverage (+125% improvement)
🎯 AudiobookMaker.js Tests - 0% to 42.58% coverage with integration tests
🔐 ConfigManager.js Tests - 0% to 98.03% coverage with security tests
📁 FileHandler.js Tests - 0% to 72.99% coverage with core functionality tests
🖥️ cli.js Tests - 0% to 75.75% coverage with end-to-end tests
🐟 Fish Speech Fixed - Installation detection and availability checking
🇩🇪 Thorsten Voice Fixed - Python 3.13 compatibility and installation issues
📊 207 Total Tests - 195 passing with comprehensive edge case coverage
🔧 Integration Tests - Real-world testing with actual TTS services and PDF processing
🛡️ Robust Error Handling - Enhanced service availability validation

v4.0.5 (2025-08-03) - 🎵 Unified Preview System

🎵 Unified Preview Texts - Consistent voice previews across all TTS providers
🌍 Language-Specific Previews - German, English, and French preview texts
💾 Smart Caching - Consistent cache filenames prevent preview regeneration
🎯 Voice Language Detection - Automatic language detection from voice names
🔄 Cache Optimization - Separate preview cache directories for each provider
⚙️ Better Performance - No more regenerating previews when switching providers

v4.0.4 (2025-08-03) - 🛠️ Fish Speech Engine Fix

🔧 Fixed TTSInferenceEngine initialization - Use proper ModelManager pattern
🏗️ Implemented correct model loading - Load LLaMA and DAC models separately
🎯 Auto-device detection - Support for MPS (Apple Silicon), CUDA, and CPU
📦 Better model management - Use launch_thread_safe_queue for text-to-semantic
🔄 Improved generation flow - Proper model initialization before inference

v4.0.3 (2025-08-03) - 🔧 Fish Speech Import Fix

🔧 Fixed MODDED_DAC import - Changed to correct DAC import from inference_engine
✅ Added missing torch import - Fixed undefined torch reference in generation script
🛠️ Simplified dependency check - Import DAC directly from inference_engine
📦 Better module verification - Check ServeTTSRequest schema availability

v4.0.2 (2025-08-03) - 🐟 Fish Speech API Update

🔧 Fixed Fish Speech dependency check - Updated to use current DAC-based architecture
🗑️ Removed deprecated VQGAN imports - Fish Speech now uses DAC (Descript Audio Codec)
✅ Updated generation script - Uses modern TTSInferenceEngine API
🔄 Better installation handling - Auto-removes incomplete installations
📦 Improved pip install - Installs Fish Speech package in development mode
🛠️ Enhanced error reporting - More detailed debugging information

v4.0.1 (2025-08-02) - 🔧 Installation & Compatibility Fixes

🔧 Fixed Fish Speech virtual environment usage - Proper dependency checking
🐍 Enhanced Python version detection - Blocks Thorsten-Voice on Python 3.13+
✅ Smart installation status tracking - Avoids unnecessary re-installations
📅 Installation markers - Persistent installation state with version info
🔄 Better error handling - More informative error messages and recovery
💡 Improved user guidance - Clear instructions for Python compatibility issues

v4.0.0 (2025-08-02) - 🌟 Major Refactoring

🗑️ REMOVED: Kyutai TTS (replaced due to Python 3.13 compatibility issues)
🐟 NEW: Fish Speech integration - State-of-the-art multilingual TTS
🇩🇪 NEW: Thorsten-Voice integration - Native German TTS
🎤 Enhanced Voice Selection: 16 total voices across 3 providers
🏗️ Automated Installation: One-click setup for local TTS providers
🔧 Improved Architecture: Better service abstraction and error handling
📊 Enhanced Testing: 80%+ test coverage with Jest
🛠️ Code Quality Tools: ESLint, Prettier, Snyk integration
🔄 Backward Compatibility: 100% compatibility with existing OpenAI workflows

v3.3.0 (2025-08-01) - 🚀 Kyutai Integration (Deprecated)

🆓 Kyutai TTS integration (now removed in v4.0.0)
🏗️ Automated installation system
🎤 15+ voice options
🔄 Provider selection system

Happy listening! 🎧 Turn any text into your personal audiobook library with the best TTS technology available.