
voice-assistant-widget

v3.2.4

Embeddable voice assistant widget for web applications

Downloads: 96

Fonada Voice Assistant

A complete voice assistant pipeline integrating:

  • Custom ASR (Automatic Speech Recognition)
  • Custom turn detection with a ReplyOnPause handler
  • LLM for conversational responses
  • Custom Fonada TTS for high-quality voice synthesis

Documentation

Detailed documentation is available in the docs/ folder.

Prerequisites

  • Python 3.8+
  • 4 CUDA-capable GPUs
  • 50 GB+ disk space
  • Microphone and speakers
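These prerequisites can be sanity-checked with a short script (a sketch using only the standard library; the torch import is optional since PyTorch may not be installed yet, and the thresholds mirror the list above):

```python
import shutil
import sys

MIN_PYTHON = (3, 8)
MIN_DISK_GB = 50

def check_prerequisites(path="."):
    """Return a dict of prerequisite checks for the voice assistant."""
    free_gb = shutil.disk_usage(path).free / 1e9
    results = {
        "python_ok": sys.version_info >= MIN_PYTHON,
        "disk_ok": free_gb >= MIN_DISK_GB,
        "free_gb": round(free_gb, 1),
    }
    # GPU detection is optional: torch may not be installed at this point.
    try:
        import torch
        results["gpus"] = torch.cuda.device_count()
    except ImportError:
        results["gpus"] = None  # install PyTorch to detect GPUs
    return results

if __name__ == "__main__":
    print(check_prerequisites())
```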

Setup

Building lmdeploy from source (Optional)

If you want to build lmdeploy from source instead of using the pre-built version:

pip install pybind11
sudo apt install cmake openmpi-bin libopenmpi-dev ninja-build

cd ~/lmdeploy

# Manual CMake build
mkdir -p build

# 120a-real is for RTX 5090
cmake -B build \
  -DGGML_CUDA=ON \
  -DCMAKE_CUDA_ARCHITECTURES="120a-real" \
  -Dpybind11_DIR=$(python3 -m pybind11 --cmakedir) \
  -GNinja

# Build with verbose output
cd build && ninja

# Install the extension to its final location and set RPATH
ninja install

# If successful, install the Python package
cd ..
pip install -e .

Installation

  1. Install the required dependencies (install NeMo from GitHub first):

pip install -r requirements.txt

  2. Run the LLM server:

lmdeploy serve api_server hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4 --server-port 23333 --quant-policy 4

or

export CUDA_VISIBLE_DEVICES=2
lmdeploy serve api_server sarvamai/sarvam-m \
  --server-port 8000 \
  --tp 1 \
  --backend turbomind \
  --quant-policy 4 \
  --cache-max-entry-count 0.9

  3. Run the TTS server from the models/ folder:

export CUDA_VISIBLE_DEVICES=1
lmdeploy serve api_server tts_hindi --server-port 23334 --quant-policy 4

Alternative: Docker deployment for RTX 5090

First, pull the Docker image:

docker pull lmsysorg/sglang:blackwell

Then run the TTS server:

docker run --gpus all -d --restart unless-stopped \
    -p 23334:23334 \
    --name sglang_blackwell \
    -v /home/fonada/voice_assistant/models/tts_hindi:/model \
    lmsysorg/sglang:blackwell \
    python3 -m sglang.launch_server --model-path /model --host 0.0.0.0 --port 23334

Running the Voice Assistant

Run the assistant with:

export LD_LIBRARY_PATH=/workspace/TensorRT-10.10.0.31/lib:$LD_LIBRARY_PATH
export OPENAI_API_ASR_KEY=
export SARVAM_API_KEY=
export DEEPGRAM_API_KEY=
export OPENAI_API_LLM_KEY=
export GROQ_API_LLM_KEY=
export GEMINI_API_KEY=
python app.py

Adjust the TensorRT library path to match your installation and set the API keys for the providers you use. This will start a web server and open a browser interface where you can interact with the voice assistant.

Usage

  1. Click the microphone button to start speaking
  2. The assistant will automatically detect when you've finished speaking
  3. It will transcribe your speech, generate a response with the configured LLM (e.g., Llama 3.1), and speak the response using Fonada TTS
  4. You can interrupt the assistant by speaking while it's responding

Customization

Voice Selection

To change the voice used by Fonada TTS, modify the options dictionary in the text_to_speech_sync method:

options = {"voice_id": "Ananya"}  # Change to your preferred voice

Available voices: "Rahul", "Vikram", "Arjun", "Dev", "Sanjay", "Jaya", "Meera", "Priya", "Ananya", "Divya"

System Prompt

To change how the LLM responds, customize the system prompt when initializing the VoiceAssistant:

assistant = VoiceAssistant(
    llm_model_path=llm_model_path,
    tts_model_path=tts_model_path,
    system_prompt="You are a helpful voice assistant. Keep your responses short and friendly."
)

Turn Detection Sensitivity

Adjust the turn detection parameters in the create_voice_assistant_stream() function to change how the assistant detects when you've finished speaking:

algo_options=AlgoOptions(
    audio_chunk_duration=0.5,  # Duration of audio chunks
    started_talking_threshold=0.2,  # Threshold to detect start of speech
    speech_threshold=0.1  # General speech detection threshold
)

Integration with FastAPI

To integrate the voice assistant with a FastAPI app:

from fastapi import FastAPI
from voice_assistant.app import create_voice_assistant_stream

app = FastAPI()
stream = create_voice_assistant_stream()
stream.mount(app)

Troubleshooting

Issue: Models fail to load
Solution: Verify the correct paths to your model files and ensure they're accessible.

Issue: Speech recognition is inaccurate
Solution: Try speaking clearly and ensure your microphone is properly configured.

Issue: High latency in responses
Solution: Consider using a more powerful GPU or reducing the model parameters.

Issue: High latency in WebSocket audio processing (1-2+ second delays)
Solution: Optimize the audio chunk size, as described below.

The voice assistant uses VAD (Voice Activity Detection) that requires specific chunk sizes for optimal performance. Mismatched chunk sizes between client and server cause significant accumulation delays.

Root Cause:

  • Server VAD requires chunks of VAD_CHUNK_SIZE_SEC * REQUIRED_SAMPLE_RATE samples
  • Default: 0.64 * 16000 = 10,240 samples (640ms chunks)
  • If client sends smaller chunks (e.g., 2048 samples = 128ms), server must accumulate 5+ chunks before processing
  • This causes up to 640ms delay per processing stage
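The arithmetic above can be sketched directly; chunks_to_fill is a hypothetical helper showing how many client chunks the server must buffer to fill one VAD window:

```python
import math

SAMPLE_RATE = 16_000  # REQUIRED_SAMPLE_RATE from the description above

def chunks_to_fill(client_chunk_samples, vad_chunk_sec=0.64, rate=SAMPLE_RATE):
    """How many client chunks must accumulate to fill one VAD chunk."""
    vad_samples = int(vad_chunk_sec * rate)                  # default: 10,240
    needed = math.ceil(vad_samples / client_chunk_samples)
    leftover = needed * client_chunk_samples - vad_samples   # carried to next window
    return needed, leftover

print(chunks_to_fill(2048))                       # (5, 0): five 128 ms chunks per window
print(chunks_to_fill(8192, vad_chunk_sec=0.512))  # (1, 0): perfectly aligned
```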

Client-Side Optimization:

// ❌ Small chunks cause accumulation delays
const bufferSize = 2048; // 128ms chunks → high latency

// ❌ Invalid: Not a power of 2 (Web Audio API requirement)
// const bufferSize = 10240; 

// ✅ Valid power of 2, significant latency reduction
const bufferSize = 8192; // 512ms chunks → low latency

// ✅ Alternative: Eliminates all accumulation delays  
// const bufferSize = 16384; // 1024ms chunks → minimal latency

Note: The Web Audio API requires buffer sizes to be powers of 2 between 256 and 16384.
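A small helper can pick the largest legal buffer that fits a target chunk duration (hypothetical, mirroring the 256-16384 power-of-2 constraint above):

```python
def pick_buffer_size(target_sec, sample_rate=16_000):
    """Largest Web Audio-legal buffer (power of 2 in [256, 16384])
    not exceeding the target chunk duration."""
    target_samples = target_sec * sample_rate
    size = 256
    while size * 2 <= min(target_samples, 16_384):
        size *= 2
    return size

print(pick_buffer_size(0.64))  # 8192 (512 ms at 16 kHz)
print(pick_buffer_size(1.2))   # 16384, the Web Audio maximum
```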

WebSocket Transmission Optimization:

# Send larger chunks aligned with VAD processing
chunk_size = 8192  # Valid power of 2, matches client buffer size (512ms at 16kHz)

Configuration Tuning:

# config.yaml - Optimize for 8192-sample client buffers
VAD_CHUNK_SIZE_SEC: 0.512  # 8192 samples at 16kHz (matches client buffers)
NUM_CONSECUTIVE_NON_SPEECH_CHUNKS_TO_END_SEGMENT: 1  # Improve segment detection

# Alternative: Even lower latency with smaller chunks
# VAD_CHUNK_SIZE_SEC: 0.32  # 5120 samples (requires 1.6 client chunks)

Expected Impact:

  • Before optimization: 2+ seconds end-to-end latency
  • After optimization: ~1 second end-to-end latency
  • Latency reduction: Up to 1 second improvement

Performance Testing: Use the included concurrency test to measure improvements:

python test/test_concurrency.py --max_concurrent 10 --direct

Note: For telephony-specific optimizations and Asterisk AudioSocket integration, see the 📞 Telephony Integration Guide.

Sharing Conversation Recordings Across Machines

The conversation_recordings folder can be shared across different machines using NFS (Network File System), which is the recommended approach for Linux environments.

NFS Setup

On the Source Server (Sharing the folder):

  1. Install NFS server:
# Ubuntu/Debian
sudo apt install nfs-kernel-server

# CentOS/RHEL
sudo yum install nfs-utils
  2. Create and configure the shared directory:
# Navigate to your voice assistant directory
cd /home/fonada/voice_assistant

# Set proper permissions for the conversation_recordings folder
sudo chown nobody:nogroup conversation_recordings
sudo chmod 755 conversation_recordings
  3. Configure NFS exports:
# Edit the exports file
sudo nano /etc/exports

# Add this line (replace 192.168.1.100 with your target server's IP):
/home/fonada/voice_assistant/conversation_recordings 192.168.1.100(rw,sync,no_subtree_check,no_root_squash)
  4. Apply changes and restart NFS:
sudo exportfs -ra
sudo systemctl restart nfs-kernel-server

On the Target Server (Mounting the folder):

  1. Install NFS client:
# Ubuntu/Debian
sudo apt install nfs-common

# CentOS/RHEL
sudo yum install nfs-utils
  2. Create mount point and mount the folder:
# Create a local mount point
sudo mkdir -p /home/fonada/voice_assistant/conversation_recordings

# Mount the remote folder (replace 192.168.1.50 with source server's IP)
sudo mount -t nfs 192.168.1.50:/home/fonada/voice_assistant/conversation_recordings /home/fonada/voice_assistant/conversation_recordings
  3. For permanent mounting, add to /etc/fstab:
echo "192.168.1.50:/home/fonada/voice_assistant/conversation_recordings /home/fonada/voice_assistant/conversation_recordings nfs defaults 0 0" | sudo tee -a /etc/fstab

Usage Notes

  • Replace IP addresses (192.168.1.50, 192.168.1.100) with your actual server IPs
  • The conversation recordings are stored once on the source server and are visible to every machine that mounts the share
  • Ensure proper firewall configuration to allow NFS traffic (port 2049)
  • For multiple target servers, add additional lines to /etc/exports on the source server

Troubleshooting NFS Permission Issues

If you encounter permission errors when trying to save conversation recordings on the target machine:

Error Example:

PermissionError: [Errno 13] Permission denied: 'conversation_recordings/...'

Root Cause: NFS preserves original user IDs (UIDs) from the source server. If the fonada user has different UIDs on source and target machines, permission conflicts occur.
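The mismatch can be checked from Python before editing any exports (a diagnostic sketch using only the standard library; the default path is an assumption):

```python
import os

def diagnose_nfs_uid(path="conversation_recordings"):
    """Compare the local user's UID with the UID that owns the NFS directory."""
    st = os.stat(path)
    return {
        "local_uid": os.getuid(),
        "dir_owner_uid": st.st_uid,
        "uid_match": os.getuid() == st.st_uid,
        "writable": os.access(path, os.W_OK),
    }
```

If uid_match is False and writable is False, you are hitting the UID-preservation problem described above.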

Solution 1: Configure NFS Export with UID Mapping (Recommended for Multiple Machines)

# On the source server, edit /etc/exports:
sudo nano /etc/exports

# Update the export line to include all_squash and UID mapping:
/home/fonada/voice_assistant/conversation_recordings TARGET_IP(rw,sync,no_subtree_check,all_squash,anonuid=1000,anongid=1000)

# Apply changes:
sudo exportfs -ra
sudo systemctl restart nfs-kernel-server

Solution 2: Fix Ownership on Source Server (Single Target Machine Only)

# WARNING: This approach only works if all machines have the same UID for fonada user
# On the source server, change ownership to match target machine's fonada user UID
# First check the target machine's fonada UID: id fonada
# Then on source server:
sudo chown -R TARGET_UID:TARGET_UID /home/fonada/voice_assistant/conversation_recordings

# Example: If target machine fonada user is UID 1000:
sudo chown -R 1000:1000 /home/fonada/voice_assistant/conversation_recordings

Note: Solution 2 will break access for other machines with different UIDs. Use Solution 1 for multiple machines.

Verification: After applying either solution, test write permissions:

# On target machine:
cd /home/fonada/voice_assistant/conversation_recordings
mkdir test_write_permission
# Should succeed without permission errors

License

This project uses the same license as the Fonada TTS system.

Voice Assistant Monitoring

This document describes how to set up monitoring for the Voice Assistant application. There are two options available:

Option 1: Streamlit Dashboard (Lightweight)

A lightweight, real-time monitoring dashboard built with Streamlit.

Installation

  1. Install required packages:
pip install streamlit pandas plotly
  2. Run the monitoring dashboard:
streamlit run monitor.py

The dashboard will be available at http://localhost:8501 and includes:

  • Real-time log viewing
  • Request timeline visualization
  • Log level distribution
  • Filtering by request ID and log level
  • Auto-refresh functionality

Option 2: Graylog (Enterprise-grade)

A more comprehensive logging and monitoring solution.

Installation

  1. Install Graylog prerequisites (MongoDB and Elasticsearch):
sudo apt-get install mongodb-org elasticsearch
  2. Download and install Graylog:
wget https://packages.graylog2.org/repo/packages/graylog-4.0-repository_latest.deb
sudo dpkg -i graylog-4.0-repository_latest.deb
sudo apt-get update
sudo apt-get install graylog-server

Features

Streamlit Dashboard

  • Real-time log viewing
  • Interactive visualizations
  • Request timeline
  • Log level distribution
  • Filter by request ID and log level
  • Auto-refresh capability
  • Lightweight and easy to set up

Graylog

  • Enterprise-grade log management
  • Advanced search capabilities
  • Custom dashboards
  • Alerts and notifications
  • Log retention policies
  • Role-based access control

Usage

  1. Start your voice assistant application:
python app.py
  2. Choose your preferred monitoring solution:

For Streamlit dashboard:

streamlit run monitor.py

For Graylog:

  • Access the Graylog web interface at http://your-server:9000
  • Default credentials: admin/admin (change on first login)

Monitoring Metrics

The monitoring solutions track:

  • Total number of requests
  • Active requests (last 5 minutes)
  • Error rates
  • Log levels distribution
  • Request timelines
  • Detailed log messages
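These metrics can be derived from the log file with a few lines of Python (a sketch; the "timestamp LEVEL request-id message" line format is an assumption about the application's log layout):

```python
import re
from collections import Counter

# Assumed log line shape: "<timestamp> <LEVEL> <request-id> <message>".
LOG_LINE = re.compile(r"^\S+ (?P<level>[A-Z]+) (?P<request_id>\S+) (?P<msg>.*)$")

def log_metrics(lines):
    """Count distinct requests, the log-level distribution, and the error rate."""
    levels, requests = Counter(), set()
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # skip lines that don't match the assumed format
        levels[m["level"]] += 1
        requests.add(m["request_id"])
    total = sum(levels.values())
    return {
        "requests": len(requests),
        "levels": dict(levels),
        "error_rate": levels["ERROR"] / total if total else 0.0,
    }

sample = [
    "2025-01-01T10:00:00 INFO req-1 started",
    "2025-01-01T10:00:01 ERROR req-1 ASR timeout",
    "2025-01-01T10:00:02 INFO req-2 started",
]
print(log_metrics(sample))
```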

Troubleshooting

If you encounter issues:

  1. Streamlit Dashboard:
  • Ensure the log file exists and is readable
  • Check if required packages are installed
  • Verify the correct Python version
  1. Graylog:
  • Verify MongoDB and Elasticsearch are running
  • Check Graylog service status
  • Review system logs for errors

TTS Text Normalizer for Indian Context

A comprehensive Python script to normalize text for Text-to-Speech (TTS) training, specifically designed for Indian languages and contexts.

Features

📞 Phone Number Normalization

Converts phone numbers in various formats to spoken digit sequences:

  • +919876543210 → plus nine one nine eight seven six five four three two one zero
  • 919-876-543-211 → nine one nine eight seven six five four three two one one
  • 9876543213 → nine eight seven six five four three two one three
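The digit-by-digit rule is simple to sketch (a hypothetical re-implementation; the real normalize_phone_number method may differ):

```python
DIGIT_WORDS = {"0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
               "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine"}

def normalize_phone(number):
    """Speak a phone number digit by digit; '+' becomes 'plus'."""
    words = []
    for ch in number:
        if ch == "+":
            words.append("plus")
        elif ch.isdigit():
            words.append(DIGIT_WORDS[ch])
        # separators such as '-' and spaces are simply dropped
    return " ".join(words)

print(normalize_phone("+919876543210"))
# plus nine one nine eight seven six five four three two one zero
```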

💰 Currency Normalization

Converts Indian currency amounts to spoken form using Indian numbering system:

  • ₹8,500 → rupees eight thousand five hundred
  • ₹2.5 lakh → rupees two point five lakh
  • ₹45 crore → rupees forty five crore
  • Rs. 1,00,000 → rupees one lakh

🌡️ Temperature Normalization

Converts temperature readings to spoken form:

  • 25°C → twenty five degrees celsius
  • 98.6°F → ninety eight point six degrees fahrenheit

📊 Percentage Normalization

Converts percentages to spoken form:

  • 85% → eighty five percent
  • 12.5% → twelve point five percent

⏰ Time Normalization

Converts time formats to spoken form:

  • 8:00 AM → eight o'clock AM
  • 2:30 PM → two thirty PM
  • 14:30 → fourteen thirty
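The rule behind these examples can be sketched as follows (hypothetical; the real normalizer likely handles more cases, such as "oh five" for 2:05):

```python
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty"]

def spoken(n):  # 0-59 is enough for hours and minutes
    if n < 20:
        return ONES[n]
    t, o = divmod(n, 10)
    return TENS[t] + (" " + ONES[o] if o else "")

def normalize_time(hour, minute, suffix=""):
    """On-the-hour times get "o'clock"; otherwise hour then minutes."""
    words = spoken(hour)
    words += " o'clock" if minute == 0 else " " + spoken(minute)
    return (words + " " + suffix).strip()

print(normalize_time(8, 0, "AM"))   # eight o'clock AM
print(normalize_time(2, 30, "PM"))  # two thirty PM
print(normalize_time(14, 30))       # fourteen thirty
```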

📅 Date Normalization

Converts ordinal dates to spoken form:

  • 31st March → thirty first March
  • 1st April → first April

📧 Email & URL Normalization

Converts digital addresses to spoken form:

🔢 Number Normalization

Converts numbers using Indian numbering system:

  • 1,00,000 → one lakh
  • 50,000 → fifty thousand
  • 25 → twenty five
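The Indian grouping (crore, lakh, thousand) can be sketched like this (a hypothetical simplification that assumes each group quotient stays below 100):

```python
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]

def two_digit(n):
    if n < 20:
        return ONES[n]
    t, o = divmod(n, 10)
    return TENS[t] + (" " + ONES[o] if o else "")

def three_digit(n):
    h, r = divmod(n, 100)
    parts = []
    if h:
        parts.append(ONES[h] + " hundred")
    if r or not parts:
        parts.append(two_digit(r))
    return " ".join(parts)

def indian_words(n):
    """Indian grouping: crore, lakh, thousand (group quotients assumed <= 99)."""
    parts = []
    for name, value in (("crore", 10_000_000), ("lakh", 100_000), ("thousand", 1_000)):
        q, n = divmod(n, value)
        if q:
            parts.append(two_digit(q) + " " + name)
    if n or not parts:
        parts.append(three_digit(n))
    return " ".join(parts)

print(indian_words(100_000))  # one lakh
print(indian_words(50_000))   # fifty thousand
print(indian_words(25))       # twenty five
```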

📏 Measurement Units

Converts measurement units to spoken form:

  • 500 GB → five hundred gigabytes
  • 2.5 km → two point five kilometers
  • 100 Mbps → one hundred megabits per second

🔤 Abbreviations

Abbreviations are kept as-is (not expanded) to maintain natural pronunciation:

  • GST → GST (unchanged)
  • EMI → EMI (unchanged)
  • SBI → SBI (unchanged)

Usage

Basic Usage

from tts_text_normalizer import TTSTextNormalizer

# Initialize normalizer
normalizer = TTSTextNormalizer()

# Normalize a single sentence
text = "Rajesh Kumar का mobile number +919876543210 है। Amount ₹12,500 pay करना है।"
normalized = normalizer.normalize_text(text)
print(normalized)
# Output: Rajesh Kumar का mobile number plus nine one nine eight seven six five four three two one zero है। Amount rupees twelve thousand five hundred pay करना है।

Batch Processing

sentences = [
    "Contact number +919876543210 है।",
    "Amount ₹12,500 pay करना है।",
    "Meeting 2:30 PM scheduled है।"
]

normalized_batch = normalizer.batch_normalize(sentences)
for original, normalized in zip(sentences, normalized_batch):
    print(f"Original: {original}")
    print(f"Normalized: {normalized}")
    print()

File Processing

# Process a file containing TTS training sentences
normalizer.save_normalized_text('input.txt', 'normalized_output.txt')

Individual Component Testing

# Test specific normalizations
phone = normalizer.normalize_phone_number("+919876543210")
currency = normalizer.normalize_currency("₹8,500")
temp = normalizer.normalize_temperature("25°C")
percentage = normalizer.normalize_percentage("85%")

Installation

No external dependencies required! Uses only Python standard library:

# Clone or download the files
# Run directly with Python 3.6+
python3 tts_text_normalizer.py

Example Script

Run the example to see all features in action:

python3 example_usage.py

Or test specific components:

python3 debug_test.py

Indian Context Features

Indian Numbering System

  • Supports lakh (1,00,000) and crore (1,00,00,000) properly
  • Handles Indian comma formatting (1,23,456)

Currency Formats

  • Indian Rupee symbol (₹)
  • Common Indian currency expressions
  • Decimal handling with paise

Phone Number Formats

  • Indian country code (+91)
  • Various formatting styles commonly used in India
  • Mobile number patterns (10 digits)

Regional Considerations

  • Preserves Hindi/Indian language text as-is
  • Maintains natural code-switching patterns
  • Handles common Indian abbreviations

Supported Input Formats

Phone Numbers

  • +919876543210
  • 919876543210
  • +91-9876543210
  • 919-876-543-210
  • 9876543210

Currency

  • ₹8,500
  • ₹2.5 lakh
  • ₹45 crore
  • Rs. 1,000
  • INR 50,000

Temperature

  • 25°C
  • 98.6°F

Time

  • 8:00 AM
  • 2:30 PM
  • 14:30

Dates

  • 31st March
  • 1st April
  • 25th December

Output Examples

Input:  "Dr. Suresh Gupta cardiologist हैं। Emergency contact +919876543210 है। Consultation fees ₹1,500 है।"
Output: "Dr. Suresh Gupta cardiologist हैं। Emergency contact plus nine one nine eight seven six five four three two one zero है। Consultation fees rupees one thousand five hundred है।"

Input:  "Property value ₹2.5 crore है। Registration 31st March तक करना है।"
Output: "Property value rupees two point five crore है। Registration thirty first March तक करना है।"

Input:  "Meeting 8:00 AM scheduled है। Success rate 95% है।"
Output: "Meeting eight o'clock AM scheduled है। Success rate ninety five percent है।"

File Structure

├── tts_text_normalizer.py    # Main normalizer class
├── example_usage.py          # Comprehensive examples
├── debug_test.py            # Debug and testing script
└── README.md               # This documentation

Customization

You can easily customize the normalizer by:

  1. Adding new abbreviations: Modify the abbreviations dictionary
  2. Changing number words: Update the ones, tens, and indian_units lists
  3. Adding new patterns: Extend the regex patterns in individual functions
  4. Custom units: Add new measurement units to the units dictionary
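As an illustration of the last point, a custom unit might be added like this (a hypothetical sketch; the UNITS table and normalize_units stand in for the real implementation, and spelling out the numeric part is a separate pass):

```python
import re

# Hypothetical units table mirroring the examples above; extend it freely.
UNITS = {
    "GB": "gigabytes",
    "km": "kilometers",
    "Mbps": "megabits per second",
}
UNITS["mAh"] = "milliampere hours"  # example of a custom unit (assumption)

def normalize_units(text, units=UNITS):
    pattern = re.compile(r"(\d+(?:\.\d+)?)\s*(" + "|".join(map(re.escape, units)) + r")\b")
    # The numeric part would be spelled out by the number normalizer in a later pass.
    return pattern.sub(lambda m: f"{m.group(1)} {units[m.group(2)]}", text)

print(normalize_units("500 GB and 2.5 km"))  # 500 gigabytes and 2.5 kilometers
```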

Error Handling

The normalizer includes robust error handling:

  • Invalid numbers fall back to original text
  • Malformed patterns are preserved as-is
  • File processing continues even if individual lines fail

Performance

  • Lightweight: Uses only Python standard library
  • Fast: Regex-based pattern matching
  • Memory efficient: Processes text line by line for files
  • Scalable: Handles large files through streaming
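The line-by-line streaming described above might look like this (a sketch of save_normalized_text; the normalize parameter stands in for the real normalize_text method, with str.strip as a placeholder):

```python
def save_normalized_text(src, dst, normalize=str.strip):
    """Stream src to dst one line at a time so memory use stays flat."""
    with open(src, encoding="utf-8") as fin, \
         open(dst, "w", encoding="utf-8") as fout:
        for line in fin:
            try:
                fout.write(normalize(line) + "\n")
            except Exception:
                fout.write(line)  # on failure, preserve the line as-is
```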

Perfect for TTS training data preparation with authentic Indian context and multilingual support!