aiabm

v5.1.1

AI Audiobook Maker - Convert PDFs and text files to audiobooks using OpenAI TTS or Thorsten-Voice (native German)

🎧 AI Audiobook Maker (AIABM) v5.1.0

Transform your PDFs and text files into high-quality audiobooks using OpenAI TTS (cloud) or Thorsten-Voice (native German). Choose between premium cloud voices or run everything locally at no cost!

🆕 New in v5.1: Beautiful UI/UX overhaul, flexible output options, optimized Thorsten-Voice loading, and Downloads folder as default output.

✨ Features

🎙️ Dual TTS Providers

  • ☁️ OpenAI TTS: Premium cloud voices with 6 voice options (requires API key)
  • 🇩🇪 Thorsten-Voice: Native German TTS with authentic pronunciation (local/free)

🚀 Core Features

  • 🚀 Zero Installation: Run directly with npx aiabm
  • 📁 Smart File Handling: Supports PDF and TXT files with drag & drop
  • 🎤 Voice Preview: Listen to voices before choosing (2 Thorsten + 6 OpenAI voices)
  • 🔒 Enhanced Security: Input sanitization, API key validation, and secure storage
  • 🧪 Comprehensive Testing: 55+ unit tests with 12.6% coverage and growing
  • ⏸️ Resume & Pause: Continue interrupted conversions anytime
  • 🔐 Secure API Key Management: Encrypted local storage
  • 📊 Progress Tracking: Real-time conversion progress with estimates
  • 🎛️ Advanced Controls: Adjust speed, quality, and output format
  • 💰 Cost Transparency: See exact pricing (OpenAI) or run free (local providers)
  • 🔧 Smart Installation: Automatic setup for local TTS providers

🚀 Quick Start

Method 1: Direct Usage (Recommended)

# Convert a specific file
npx aiabm mybook.pdf

# Interactive mode
npx aiabm

Method 2: Global Installation

npm install -g aiabm
aiabm mybook.pdf

📋 Prerequisites

Required

  • Node.js 16+ (Download from nodejs.org)
  • FFmpeg (for audio combining - auto-installed on most systems)

Optional (Choose One or Both)

For OpenAI TTS:

  • OpenAI API key (required for the cloud voices)

For Thorsten-Voice (German TTS):

  • Python 3.9-3.11 (auto-installed)
  • Coqui TTS (auto-installed)
  • Completely FREE - runs locally

🎯 Usage Examples

CLI Mode

# Basic conversion
npx aiabm document.pdf

# With specific options (OpenAI)
npx aiabm book.txt --voice nova --speed 1.2 --model tts-1-hd

# Manage API key
npx aiabm --config

Interactive Mode

npx aiabm

Then follow the interactive prompts to:

  1. Select TTS Provider (OpenAI, Fish Speech, or Thorsten-Voice)
  2. Auto-install local providers if needed (one-time setup)
  3. Select your file (browse, drag & drop, or enter path)
  4. Preview and choose a voice
  5. Configure settings (speed, quality, output format)
  6. Monitor progress and resume if needed

🎤 Available Voices

🤖 OpenAI TTS (Cloud)

  • Alloy: Neutral, versatile
  • Echo: Clear, professional
  • Fable: Warm, storytelling
  • Onyx: Deep, authoritative
  • Nova: Bright, engaging
  • Shimmer: Gentle, soothing

🐟 Fish Speech (Local/Multilingual)

  • 🇩🇪 German Female (Natural): High-quality German synthesis
  • 🇩🇪 German Male (Clear): Professional German voice
  • 🇩🇪 German Female (Expressive): Emotional German narration
  • 🇺🇸 English Female (Warm): Natural English voice
  • 🇺🇸 English Male (Professional): Business-quality English
  • 🇺🇸 English Female (Energetic): Dynamic storytelling
  • 🇫🇷 French Female (Elegant): Sophisticated French accent
  • 🇫🇷 French Male (Sophisticated): Professional French voice

🇩🇪 Thorsten-Voice (Native German)

  • 🇩🇪 Thorsten (Authentic German Male): High-quality native German voice
  • 🇩🇪 Thorsten Emotional (German Male): German voice with emotional expression

💰 Pricing

OpenAI TTS

$0.015 per 1,000 characters

| Content Length | Estimated Cost | Example |
|----------------|----------------|---------|
| 10,000 characters | ~$0.15 | Short article |
| 50,000 characters | ~$0.75 | Small e-book |
| 100,000 characters | ~$1.50 | Average novel |
| 250,000 characters | ~$3.75 | Large book |
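The estimates in the table follow directly from the per-character rate. The sketch below reproduces that arithmetic in Python purely for illustration; it is not code from the tool (which is a Node.js CLI), and the name `estimate_cost_usd` is hypothetical. The only input taken from this README is the $0.015 per 1,000 characters rate.

```python
from decimal import Decimal

# Rate quoted in the pricing section: $0.015 per 1,000 characters.
RATE_PER_1K_CHARS = Decimal("0.015")


def estimate_cost_usd(char_count: int) -> Decimal:
    """Return the estimated OpenAI TTS cost in USD for char_count characters."""
    if char_count < 0:
        raise ValueError("char_count must be non-negative")
    return RATE_PER_1K_CHARS * Decimal(char_count) / Decimal(1000)


if __name__ == "__main__":
    # Print an estimate for each content length listed in the pricing table.
    for chars in (10_000, 50_000, 100_000, 250_000):
        print(f"{chars:>7,} characters -> ${estimate_cost_usd(chars):.2f}")
```

Decimal arithmetic is used so the small per-character amounts do not pick up floating-point rounding noise.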

Fish Speech & Thorsten-Voice

100% FREE - No API costs, runs entirely on your machine!

🔧 Local TTS Setup

Both Fish Speech and Thorsten-Voice run entirely on your machine - no API costs! Now with fully automated installation!

🚀 Smart Installation (Recommended)

npx aiabm
# Select "Fish Speech" or "Thorsten-Voice"
# Choose "Auto Install (recommended)"
# → System automatically downloads and configures everything!

🐟 Fish Speech Setup

What happens automatically:

  1. 📦 Repository Cloning - Downloads latest Fish Speech
  2. 🐍 Virtual Environment - Creates isolated Python environment
  3. ⚡ PyTorch Installation - Installs optimized CPU version
  4. 🤖 Model Download - Downloads Fish Speech 1.2 models (~1GB)
  5. ✅ Dependency Check - Verifies installation works

System Requirements:

  • Python 3.8+ recommended
  • ~2GB disk space for models and dependencies
  • 4GB+ RAM recommended
  • CPU or GPU (GPU faster but optional)

🇩🇪 Thorsten-Voice Setup

What happens automatically:

  1. 🐍 Compatible Python Detection - Finds Python 3.9-3.11
  2. 📦 Virtual Environment - Creates isolated environment
  3. 🎤 Coqui TTS Installation - Installs German TTS framework
  4. 🤖 Thorsten Model - Downloads German voice model (~500MB)
  5. ✅ Compatibility Check - Verifies everything works

System Requirements:

  • Python 3.9-3.11 (3.12 and later are not supported)
  • ~1GB disk space for models and dependencies
  • 2GB+ RAM recommended
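To check by hand whether a machine already has a suitable interpreter, something like the following Python sketch could be used. It is an illustration only, not the installer's actual detection logic: the function names are hypothetical, and it assumes interpreters are exposed on PATH under the conventional `python3.X` names.

```python
import shutil
from typing import Optional, Tuple

# Supported range stated in the requirements: Python 3.9 through 3.11.
MIN_VERSION = (3, 9)
MAX_VERSION = (3, 11)


def is_compatible(version: Tuple[int, ...]) -> bool:
    """Return True if (major, minor, ...) falls inside the supported range."""
    return MIN_VERSION <= tuple(version[:2]) <= MAX_VERSION


def find_compatible_python() -> Optional[str]:
    """Search PATH for python3.11, python3.10, python3.9, newest first."""
    for minor in range(MAX_VERSION[1], MIN_VERSION[1] - 1, -1):
        path = shutil.which(f"python3.{minor}")
        if path:
            return path
    return None


if __name__ == "__main__":
    found = find_compatible_python()
    print(found or "No Python 3.9-3.11 interpreter found on PATH")
```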

Python Version Issues?

# Install compatible Python on macOS
brew install python@3.11

# On Ubuntu/Debian
sudo apt install python3.11 python3.11-venv

🔧 Installation Status Tracking

  • ✅ Smart Detection: Avoids re-installation if already installed
  • 📅 Version Tracking: Shows installation date and version
  • 🔄 Update Suggestions: Recommends updates after 30+ days
  • 🛠️ Installation Markers: Persistent installation state
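One way to picture an installation marker is as a small JSON record holding the version and install date, which is then compared against the 30-day threshold. The Python sketch below is an assumption about how such a marker could work, not the tool's real file format; the field names `version` and `installedAt` and both function names are hypothetical.

```python
import json
from datetime import datetime, timedelta

# Threshold taken from the list above: updates are suggested after 30+ days.
UPDATE_AFTER = timedelta(days=30)


def make_marker(version: str, installed_at: datetime) -> str:
    """Serialize a marker recording what was installed and when."""
    return json.dumps({"version": version, "installedAt": installed_at.isoformat()})


def update_recommended(marker_text: str, now: datetime) -> bool:
    """Return True once the recorded installation is at least 30 days old."""
    marker = json.loads(marker_text)
    installed_at = datetime.fromisoformat(marker["installedAt"])
    return now - installed_at >= UPDATE_AFTER


if __name__ == "__main__":
    marker = make_marker("1.0.0", datetime(2025, 8, 1))
    print(update_recommended(marker, datetime(2025, 9, 15)))
```

Because the marker persists on disk, a later run can read it first and skip the installation entirely when a valid record is present.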

🔧 Advanced Features

Resume Interrupted Conversions

If conversion stops, simply run the tool again - it will automatically detect and offer to resume your previous session.
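Conceptually, resuming only requires remembering which chunks have already been converted. The following Python sketch shows that idea under stated assumptions: the session file layout (a JSON object with a `completed` list) and all function names are hypothetical and are not taken from the tool's source.

```python
import json
from pathlib import Path
from typing import List, Set


def load_completed(session_file: Path) -> Set[int]:
    """Return the indices of chunks already converted (empty if no session)."""
    if not session_file.exists():
        return set()
    data = json.loads(session_file.read_text(encoding="utf-8"))
    return set(data.get("completed", []))


def mark_completed(session_file: Path, index: int) -> None:
    """Record one finished chunk so a later run can skip it."""
    done = load_completed(session_file)
    done.add(index)
    session_file.write_text(
        json.dumps({"completed": sorted(done)}), encoding="utf-8"
    )


def pending_chunks(total: int, session_file: Path) -> List[int]:
    """List the chunk indices that still need conversion."""
    done = load_completed(session_file)
    return [i for i in range(total) if i not in done]
```

Writing the session file after every chunk means an interruption loses at most the chunk that was in progress.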

Multiple Output Formats

  • Single File: One complete audiobook MP3
  • Chapter Files: Separate MP3 per chunk
  • Both: Get both formats
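For the single-file output, the per-chunk MP3s have to be joined with FFmpeg. This README does not say how the tool invokes FFmpeg, so the Python sketch below simply assumes FFmpeg's concat demuxer with stream copy; the helper names are hypothetical, and the code only builds the list file text and the argument list without running anything.

```python
from pathlib import Path
from typing import List


def concat_list(chunk_files: List[Path]) -> str:
    """Return the text of an FFmpeg concat-demuxer list file."""
    lines = []
    for chunk in chunk_files:
        # Inside single quotes, a literal quote is written as '\''.
        escaped = str(chunk).replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"


def concat_command(list_file: Path, output: Path) -> List[str]:
    """Build the ffmpeg arguments that join the listed files without re-encoding."""
    return [
        "ffmpeg",
        "-f", "concat",   # use the concat demuxer
        "-safe", "0",     # accept absolute paths in the list file
        "-i", str(list_file),
        "-c", "copy",     # copy the audio streams as they are
        str(output),
    ]


if __name__ == "__main__":
    print(concat_list([Path("chunk_000.mp3"), Path("chunk_001.mp3")]), end="")
    print(" ".join(concat_command(Path("chunks.txt"), Path("audiobook.mp3"))))
```

Stream copy avoids a second lossy encode, which works when all chunks share the same codec settings.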

Voice Preview Caching

Voice previews are cached locally to save API costs and improve performance.
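A cache like this works when the same provider, voice, and preview text always map to the same file name. The Python sketch below illustrates one possible scheme; the directory layout, the hash-based file name, and the function name are assumptions made for illustration, not the tool's actual cache format.

```python
import hashlib
from pathlib import Path


def preview_cache_file(cache_dir: Path, provider: str, voice: str, text: str) -> Path:
    """Map (provider, voice, preview text) to a stable cache file path."""
    # Hash the preview text so a changed text produces a new file name.
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]
    # One subdirectory per provider keeps previews from different engines apart.
    return cache_dir / provider / f"{voice}-{digest}.mp3"


if __name__ == "__main__":
    path = preview_cache_file(Path("cache"), "openai", "nova", "Hello, listener.")
    print(path, "(cached)" if path.exists() else "(needs generating)")
```

If the file already exists, the preview can be played from disk instead of requesting it again.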

Smart Text Chunking

  • Respects sentence boundaries
  • Preserves chapter structure for PDFs
  • Configurable chunk sizes (default: 4000 characters)
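The sentence-boundary rule can be sketched as a greedy packer: split the text into sentences, then fill each chunk until the next sentence would exceed the limit. The Python code below is a simplified illustration and not the tool's implementation; the regular expression is a rough heuristic, chapter handling is left out, and `chunk_text` is a hypothetical name. Only the 4000-character default comes from this README.

```python
import re
from typing import List

DEFAULT_CHUNK_SIZE = 4000  # default chunk size stated in the list above


def chunk_text(text: str, max_chars: int = DEFAULT_CHUNK_SIZE) -> List[str]:
    """Group whole sentences into chunks of at most max_chars characters."""
    # Split after '.', '!' or '?' when followed by whitespace (simple heuristic).
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        # A single sentence longer than the limit has to be cut mid-sentence.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Keeping sentences whole avoids audible cuts in the middle of a phrase when the per-chunk audio files are joined.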

📂 File Support

PDF Files

  • ✅ Up to 50MB
  • ✅ Text extraction with structure preservation
  • ✅ Automatic chapter detection

Text Files

  • ✅ Up to 1M characters
  • ✅ UTF-8 encoding
  • ✅ Automatic formatting cleanup
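These limits can be checked before starting a conversion. The Python sketch below shows one way to do that; it is illustrative only, `validate_input` is a hypothetical name, and reading "50MB" as 50 × 1024 × 1024 bytes is an assumption, since the README does not say which definition of a megabyte the tool uses.

```python
from pathlib import Path

MAX_PDF_BYTES = 50 * 1024 * 1024  # "Up to 50MB"; MiB interpretation is an assumption
MAX_TEXT_CHARS = 1_000_000        # "Up to 1M characters"


def validate_input(path: Path) -> None:
    """Raise ValueError if the file type or size is outside the documented limits."""
    suffix = path.suffix.lower()
    if suffix == ".pdf":
        if path.stat().st_size > MAX_PDF_BYTES:
            raise ValueError(f"{path.name}: PDF is larger than 50MB")
    elif suffix == ".txt":
        # Text files are expected to be UTF-8 encoded.
        text = path.read_text(encoding="utf-8")
        if len(text) > MAX_TEXT_CHARS:
            raise ValueError(f"{path.name}: text exceeds 1M characters")
    else:
        raise ValueError(f"{path.name}: only PDF and TXT files are supported")
```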

🆕 What's New in v5.0

🔒 Enhanced Security

  • Input Sanitization: Prevents code injection and malicious input
  • API Key Validation: Comprehensive security checks for OpenAI keys
  • Secure Storage: Encrypted API key storage with multiple layers
  • Environment Assessment: Automatic security environment analysis

🧪 Comprehensive Testing

  • 55+ Unit Tests: Extensive test coverage for core functionality
  • 12.6% Code Coverage: Growing test suite with focus on critical paths
  • Mocked Services: Fast, reliable tests without external dependencies
  • CI/CD Pipeline: Automated testing on every commit

🛡️ Better Error Handling

  • Type-Safe Validation: Zod schemas for all configuration and data
  • Graceful Failures: Better error messages and recovery mechanisms
  • Logging & Monitoring: Detailed error tracking and user feedback

🎯 Developer Experience

  • GitHub Actions: Automated CI/CD with security auditing
  • ESLint Clean: Zero linting errors with consistent code style
  • Documentation: Comprehensive inline documentation and examples

⚙️ Configuration

API Key Storage

Your OpenAI API key is encrypted and stored locally at:

  • macOS/Linux: ~/.config/ai-audiobook-maker/config.json
  • Windows: %APPDATA%\ai-audiobook-maker\config.json
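To locate the file from a script, the two paths above can be derived from the platform name. The Python sketch below is an illustration and not the tool's own resolution logic; `config_file` is a hypothetical helper, and the platform strings follow the values of Python's `sys.platform`.

```python
import os
import sys
from pathlib import PurePosixPath, PureWindowsPath

APP_DIR = "ai-audiobook-maker"  # directory name taken from the paths listed above


def config_file(platform: str, home: str, appdata: str = "") -> str:
    """Build the config.json location for the given platform string."""
    if platform.startswith("win"):
        return str(PureWindowsPath(appdata) / APP_DIR / "config.json")
    return str(PurePosixPath(home) / ".config" / APP_DIR / "config.json")


if __name__ == "__main__":
    print(
        config_file(
            sys.platform,
            os.path.expanduser("~"),
            os.environ.get("APPDATA", ""),
        )
    )
```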

Cache Location

Voice previews and temporary files:

  • macOS/Linux: ~/.config/ai-audiobook-maker/cache/
  • Windows: %APPDATA%\ai-audiobook-maker\cache\

Local TTS Installations

Local TTS providers are installed to:

  • Fish Speech: ~/.aiabm/fish-speech/
  • Thorsten-Voice: ~/.aiabm/thorsten-voice/

🛠️ Troubleshooting

Common Issues

"FFmpeg not found"

# macOS
brew install ffmpeg

# Ubuntu/Debian
sudo apt install ffmpeg

# Windows
# Download from https://ffmpeg.org/download.html

"API key invalid"

  • Verify your key at OpenAI Platform
  • Use npx aiabm --config to update your key

"File too large"

  • PDFs: Maximum 50MB
  • Text: Maximum 1M characters
  • Split large files before conversion

"Fish Speech dependencies missing"

  • Check Python version: python3 --version
  • Try restarting the app
  • Virtual environment issues usually resolve on restart

"Thorsten-Voice requires Python 3.9-3.11"

  • Install compatible Python: brew install python@3.11
  • App will automatically detect and use it
  • Creates separate virtual environment

Voice preview not playing

  • macOS: Uses built-in afplay
  • Windows: Uses PowerShell media player
  • Linux: Requires ffplay, mpv, vlc, or mplayer
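If previews stay silent on Linux, it helps to confirm that one of the listed players is actually on PATH. The Python sketch below mirrors the per-platform choice described above; it is a hypothetical helper and not the tool's code, and returning `"powershell"` for Windows merely stands in for the PowerShell media player mentioned in the list.

```python
import shutil
import sys
from typing import Callable, Optional

# Players listed for Linux, tried in the order given above.
LINUX_PLAYERS = ("ffplay", "mpv", "vlc", "mplayer")


def pick_player(
    platform: str,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Return the name of the player to use, or None if none is available."""
    if platform == "darwin":
        return "afplay"
    if platform.startswith("win"):
        return "powershell"
    for name in LINUX_PLAYERS:
        if which(name):
            return name
    return None


if __name__ == "__main__":
    print(pick_player(sys.platform) or "No supported audio player found")
```

The `which` parameter exists only so the lookup can be replaced in tests; by default it uses `shutil.which`.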

Performance Tips

  • Use tts-1 model for faster processing
  • Use tts-1-hd for higher quality (slower)
  • Local TTS providers are free but slower than cloud
  • Cache clears automatically after 30 days
  • Resume feature prevents re-processing completed chunks

🔒 Privacy & Security

  • API keys are encrypted locally using AES-192
  • No data is sent to servers when using local TTS
  • OpenAI TTS sends only text chunks to OpenAI servers
  • Cache files are stored locally only
  • Session data helps resume interrupted conversions
  • Local TTS models run entirely offline

📖 Examples

Converting a PDF Book with German Voice

npx aiabm "Mein Roman.pdf"
# Select "Thorsten-Voice"
# Choose German voice
# Enjoy authentic German pronunciation!

Interactive Multilingual Setup

npx aiabm
# Select "Fish Speech"
# Auto-install if needed
# Preview German, English, and French voices
# Choose your favorite for the content language

Quick OpenAI Conversion

npx aiabm document.pdf --voice nova --speed 1.1

🤝 Contributing

Issues and feature requests are welcome on GitHub Issues.

📄 License

MIT License - see LICENSE file for details

🙏 Acknowledgments

  • Built on OpenAI's TTS API, Fish Speech, and Thorsten-Voice/Coqui TTS
  • Fish Speech: https://github.com/fishaudio/fish-speech
  • Thorsten-Voice: https://github.com/thorstenMueller/Thorsten-Voice
  • Coqui TTS: https://github.com/coqui-ai/TTS
  • Uses FFmpeg for audio processing

📝 Changelog

v4.0.7 (2025-08-03) - 🐟 Fish Speech Fully Fixed & Operational

  • 🐟 Fish Speech 100% Working - Complete resolution of all Fish Speech TTS issues
  • 🔧 Fixed tokenizer.tiktoken - Proper base64 encoding of 32,000 tokens from Fish Speech
  • ⚙️ Model Configuration Fixed - Created correct firefly_gan_vq.yaml matching model architecture
  • 📐 Dimension Mismatch Resolved - Fixed 512-dim vs 1024-dim PyTorch tensor issues
  • Parameter Validation Fixed - Corrected ServeTTSRequest use_memory_cache format
  • 🎯 End-to-End Functionality - Text-to-semantic and decoder models load perfectly
  • 🚀 Full Service Availability - Fish Speech now detected as available and operational

v4.0.6 (2025-08-03) - 🧪 Comprehensive Test Coverage & TTS Fixes

  • 🧪 Major Test Coverage Improvement - 20% to 45.07% overall coverage (+125% improvement)
  • 🎯 AudiobookMaker.js Tests - 0% to 42.58% coverage with integration tests
  • 🔐 ConfigManager.js Tests - 0% to 98.03% coverage with security tests
  • 📁 FileHandler.js Tests - 0% to 72.99% coverage with core functionality tests
  • 🖥️ cli.js Tests - 0% to 75.75% coverage with end-to-end tests
  • 🐟 Fish Speech Fixed - Installation detection and availability checking
  • 🇩🇪 Thorsten Voice Fixed - Python 3.13 compatibility and installation issues
  • 📊 207 Total Tests - 195 passing with comprehensive edge case coverage
  • 🔧 Integration Tests - Real-world testing with actual TTS services and PDF processing
  • 🛡️ Robust Error Handling - Enhanced service availability validation

v4.0.5 (2025-08-03) - 🎵 Unified Preview System

  • 🎵 Unified Preview Texts - Consistent voice previews across all TTS providers
  • 🌍 Language-Specific Previews - German, English, and French preview texts
  • 💾 Smart Caching - Consistent cache filenames prevent preview regeneration
  • 🎯 Voice Language Detection - Automatic language detection from voice names
  • 🔄 Cache Optimization - Separate preview cache directories for each provider
  • ⚙️ Better Performance - No more regenerating previews when switching providers

v4.0.4 (2025-08-03) - 🛠️ Fish Speech Engine Fix

  • 🔧 Fixed TTSInferenceEngine initialization - Use proper ModelManager pattern
  • 🏗️ Implemented correct model loading - Load LLaMA and DAC models separately
  • 🎯 Auto-device detection - Support for MPS (Apple Silicon), CUDA, and CPU
  • 📦 Better model management - Use launch_thread_safe_queue for text-to-semantic
  • 🔄 Improved generation flow - Proper model initialization before inference

v4.0.3 (2025-08-03) - 🔧 Fish Speech Import Fix

  • 🔧 Fixed MODDED_DAC import - Changed to correct DAC import from inference_engine
  • Added missing torch import - Fixed undefined torch reference in generation script
  • 🛠️ Simplified dependency check - Import DAC directly from inference_engine
  • 📦 Better module verification - Check ServeTTSRequest schema availability

v4.0.2 (2025-08-03) - 🐟 Fish Speech API Update

  • 🔧 Fixed Fish Speech dependency check - Updated to use current DAC-based architecture
  • 🗑️ Removed deprecated VQGAN imports - Fish Speech now uses DAC (Descript Audio Codec)
  • Updated generation script - Uses modern TTSInferenceEngine API
  • 🔄 Better installation handling - Auto-removes incomplete installations
  • 📦 Improved pip install - Installs Fish Speech package in development mode
  • 🛠️ Enhanced error reporting - More detailed debugging information

v4.0.1 (2025-08-02) - 🔧 Installation & Compatibility Fixes

  • 🔧 Fixed Fish Speech virtual environment usage - Proper dependency checking
  • 🐍 Enhanced Python version detection - Blocks Thorsten-Voice on Python 3.13+
  • Smart installation status tracking - Avoids unnecessary re-installations
  • 📅 Installation markers - Persistent installation state with version info
  • 🔄 Better error handling - More informative error messages and recovery
  • 💡 Improved user guidance - Clear instructions for Python compatibility issues

v4.0.0 (2025-08-02) - 🌟 Major Refactoring

  • 🗑️ REMOVED: Kyutai TTS (replaced due to Python 3.13 compatibility issues)
  • 🐟 NEW: Fish Speech integration - State-of-the-art multilingual TTS
  • 🇩🇪 NEW: Thorsten-Voice integration - Native German TTS
  • 🎤 Enhanced Voice Selection: 16 total voices across 3 providers
  • 🏗️ Automated Installation: One-click setup for local TTS providers
  • 🔧 Improved Architecture: Better service abstraction and error handling
  • 📊 Enhanced Testing: 80%+ test coverage with Jest
  • 🛠️ Code Quality Tools: ESLint, Prettier, Snyk integration
  • 🔄 Backward Compatibility: 100% compatibility with existing OpenAI workflows

v3.3.0 (2025-08-01) - 🚀 Kyutai Integration (Deprecated)

  • 🆓 Kyutai TTS integration (now removed in v4.0.0)
  • 🏗️ Automated installation system
  • 🎤 15+ voice options
  • 🔄 Provider selection system

Happy listening! 🎧 Turn any text into your personal audiobook library with the best TTS technology available.