phonemize

v1.1.0

Fast phonemizer with rule-based G2P prediction. Pure JavaScript implementation.

Phonemize

Fast phonemizer with rule-based G2P (Grapheme-to-Phoneme) prediction. Pure JavaScript implementation with no native dependencies.

Inspired by ttstokenizer

Features

  • Lightning fast - Pure rule-based processing, no ML overhead
  • 🎯 Intelligent compound word support - Automatic decomposition of complex words
  • 📚 Comprehensive dictionary - 125,000+ word pronunciations
  • 🧠 Smart rule-based G2P - Advanced phonetic rules for unknown words
  • 🌍 Multiple formats - IPA, ARPABET, and Zhuyin output
  • 🌐 Modular multilingual support - G2P models are loaded as separate modules
  • 💻 Pure JavaScript - No native dependencies, works everywhere
  • 🔧 Simple API - Easy to integrate and use

Installation

npm install phonemize

Quick Start

import { phonemize, toIPA, toARPABET } from 'phonemize'

// Default IPA output
console.log(phonemize('Hello world!'))
// Output: həˈɫoʊ ˈwɝɫd!

// ARPABET format
console.log(toARPABET('Hello world!'))
// Output: HH AX L OW1 W ER1 L D!

Presets

For different language support needs, you can use preset modules:

// Default: English only
import { phonemize } from 'phonemize'

// Chinese + English
import { phonemize } from 'phonemize/zh'

// All languages (English + Chinese + Japanese + Korean + Russian)
import { phonemize } from 'phonemize/all'

// Clean

API Reference

Basic Functions

phonemize(text, options?)

Convert text to phonemes.

phonemize('Hello world!')                    // IPA string
phonemize('Hello world!', { returnArray: true })  // IPA array

Options:

  • returnArray (boolean): Return array instead of string
  • format ('ipa' | 'arpabet'): Output format
  • stripStress (boolean): Remove stress markers
  • separator (string): Phoneme separator (default: ' ')
  • anyAscii (boolean): Enable multilingual support via anyAscii transliteration
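
To make the effect of stripStress concrete, here is a minimal sketch of what removing stress markers from an IPA string amounts to. It is not the package's implementation: the helper name stripStressMarks is hypothetical, and it assumes stress is written with the IPA primary (ˈ) and secondary (ˌ) stress marks.

```javascript
// Hypothetical helper, for illustration only. It mimics the documented effect
// of `stripStress: true` on an IPA string.
function stripStressMarks(ipa) {
  // U+02C8 is the primary stress mark, U+02CC the secondary stress mark.
  return ipa.replace(/[\u02C8\u02CC]/g, '')
}

console.log(stripStressMarks('h\u0259\u02C8\u026Bo\u028A \u02C8w\u025D\u026Bd!'))
```

The input above is the escaped form of the 'Hello world!' IPA output shown in this README, so the call prints the same string without its two stress marks.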

toIPA(text, options?)

Convert text to IPA phonemes.

toIPA('Hello world!')  // "həˈɫoʊ ˈwɝɫd!"

toARPABET(text, options?)

Convert text to ARPABET phonemes.

toARPABET('Hello world!')  // "HH AX L OW1 W ER1 L D!"

toZhuyin(text, options?)

Convert text to Zhuyin (Bopomofo / 注音) format.

This function is specifically designed for Chinese text. Non-Chinese text will be phonemized to IPA as a fallback.

Note: The output format is Zhuyin + tone number (e.g., ㄓㄨㄥ1 ㄨㄣ2), which is optimized for Kokoro.

import { toZhuyin } from 'phonemize';

toZhuyin('中文'); // "ㄓㄨㄥ1 ㄨㄣ2"
toZhuyin('你好世界'); // "ㄋㄧ3 ㄏㄠ3 ㄕ4 ㄐㄧㄝ4"
toZhuyin('中文 and English'); // "ㄓㄨㄥ1 ㄨㄣ2 ænd ˈɪŋɡlɪʃ"
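
If a downstream consumer needs the syllable and tone separately, the "Zhuyin + tone number" string can be split after the fact. The sketch below is an assumption about how one might do that, not part of the package: parseZhuyinOutput is a hypothetical name, and it assumes tone digits fall in the range 1 to 5 and that tokens are separated by whitespace.

```javascript
// Hypothetical helper: split "Zhuyin + tone number" output into structured pairs.
// Tokens without a trailing tone digit (for example the IPA fallback for
// non-Chinese text) are returned unchanged with tone: null.
function parseZhuyinOutput(output) {
  return output
    .split(/\s+/)
    .filter(Boolean)
    .map((token) => {
      // U+3105..U+312F covers the Bopomofo letters.
      const match = /^([\u3105-\u312F]+)([1-5])$/.exec(token)
      return match
        ? { syllable: match[1], tone: Number(match[2]) }
        : { syllable: token, tone: null }
    })
}

console.log(parseZhuyinOutput('ㄓㄨㄥ1 ㄨㄣ2'))
// [ { syllable: 'ㄓㄨㄥ', tone: 1 }, { syllable: 'ㄨㄣ', tone: 2 } ]
```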

useG2P(processor)

Register a G2P processor for multilingual support.

import { phonemize, useG2P } from 'phonemize'
import ChineseG2P from 'phonemize/zh-g2p'
import JapaneseG2P from 'phonemize/ja-g2p'

// Register G2P processors
useG2P(new ChineseG2P())
useG2P(new JapaneseG2P())

// Now phonemize can handle Chinese and Japanese text
phonemize('你好')  // → ni˧˥ xɑʊ˨˩˦
phonemize('こんにちは', { anyAscii: true })  // → konnitɕiwa

Custom Pronunciations

import { addPronunciation, phonemize } from 'phonemize'

// Add custom word pronunciation
addPronunciation('myword', 'ˈmaɪwərd') // Can be IPA or ARPABET
console.log(phonemize('myword'))  // "ˈmaɪwərd"

Advanced Tokenization

import { Tokenizer, createTokenizer } from 'phonemize'

// Create custom tokenizer
const tokenizer = createTokenizer({
  format: 'ipa',
  stripStress: true,
  separator: '-'
})

// Tokenize with detailed info
const tokens = tokenizer.tokenizeToTokens('Hello world!')
// [
//   { phoneme: "həɫoʊ", word: "Hello", position: 0 },
//   { phoneme: "wɝɫd", word: "world", position: 6 }
// ]
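
In the sample output, the position values (0 and 6) match the character offset of each word in the input string. Assuming that reading is correct, the sketch below reproduces just the word and position fields with a plain regular expression. wordsWithPositions is a hypothetical helper, not a package export, and it only recognises ASCII letters and apostrophes.

```javascript
// Hypothetical helper: find each word and the character offset where it starts.
function wordsWithPositions(text) {
  return Array.from(text.matchAll(/[A-Za-z']+/g), (match) => ({
    word: match[0],
    position: match.index,
  }))
}

console.log(wordsWithPositions('Hello world!'))
// [ { word: 'Hello', position: 0 }, { word: 'world', position: 6 } ]
```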

Text Processing Features

Number Expansion

Numbers are automatically converted to words:

phonemize('I have 123 apples')
// "ˈaɪ ˈhæv ˈwən ˈhəndɝd ˈtwɛni ˈθɹi ˈæpəɫz"
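
Number expansion can be pictured as a text-to-text rewrite applied before pronunciation lookup. The sketch below illustrates that general technique under stated assumptions and is not the package's code: numberToWords and expandNumbers are hypothetical names, and only whole numbers from 0 to 999 are handled.

```javascript
const ONES = [
  'zero', 'one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine',
  'ten', 'eleven', 'twelve', 'thirteen', 'fourteen', 'fifteen', 'sixteen',
  'seventeen', 'eighteen', 'nineteen',
]
const TENS = ['', '', 'twenty', 'thirty', 'forty', 'fifty', 'sixty', 'seventy', 'eighty', 'ninety']

// Hypothetical helper: spell out an integer in the range 0..999.
function numberToWords(n) {
  if (!Number.isInteger(n) || n < 0 || n > 999) {
    throw new RangeError('numberToWords only handles integers from 0 to 999')
  }
  if (n < 20) return ONES[n]
  if (n < 100) {
    const unit = n % 10
    const tens = TENS[Math.floor(n / 10)]
    return unit === 0 ? tens : `${tens} ${ONES[unit]}`
  }
  const rest = n % 100
  const head = `${ONES[Math.floor(n / 100)]} hundred`
  return rest === 0 ? head : `${head} ${numberToWords(rest)}`
}

// Hypothetical helper: replace standalone runs of 1 to 3 digits with words.
// Longer digit runs are left untouched.
function expandNumbers(text) {
  return text.replace(/\b\d{1,3}\b/g, (digits) => numberToWords(Number(digits)))
}

console.log(expandNumbers('I have 123 apples'))
// "I have one hundred twenty three apples"
```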

Abbreviation Expansion

Common abbreviations are expanded:

phonemize('Dr. Smith and Mr. Johnson')
// "ˈdɑktɝ ˈsmɪθ ˈænd ˈmɪstɝ ˈdʒɑnsən"
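
A table-driven rewrite is one simple way to get this behaviour. The sketch below is illustrative only: the ABBREVIATIONS table and expandAbbreviations are hypothetical, and the package's own abbreviation list is not documented here.

```javascript
// Hypothetical abbreviation table; the package's real list may differ.
const ABBREVIATIONS = new Map([
  ['Dr.', 'Doctor'],
  ['Mr.', 'Mister'],
  ['Mrs.', 'Missus'],
])

// Replace known abbreviations; anything not in the table is left as written.
function expandAbbreviations(text) {
  return text.replace(/\b[A-Z][a-z]{1,3}\./g, (match) => ABBREVIATIONS.get(match) ?? match)
}

console.log(expandAbbreviations('Dr. Smith and Mr. Johnson'))
// "Doctor Smith and Mister Johnson"
```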

Currency and Dates

Special handling for currency and dates:

phonemize('15 dollars in 2023')
// "ˈfɪfˈtin ˈdɑɫɝz ˈɪn ˈtwɛni ˈtwɛni ˈθɹi"
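
The output above reads 2023 as "twenty twenty three", that is, as two two-digit numbers. The sketch below shows that reading for four-digit years. It is an assumption about the technique rather than the package's logic: yearToWords is a hypothetical name, and years whose last two digits are below 10 (such as 2005 or 1900) are deliberately not handled because the README does not say how they are read.

```javascript
const SMALL = [
  'zero', 'one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine',
  'ten', 'eleven', 'twelve', 'thirteen', 'fourteen', 'fifteen', 'sixteen',
  'seventeen', 'eighteen', 'nineteen',
]
const DECADES = ['', '', 'twenty', 'thirty', 'forty', 'fifty', 'sixty', 'seventy', 'eighty', 'ninety']

// Spell out a number in the range 10..99.
function twoDigitWords(n) {
  if (n < 20) return SMALL[n]
  const unit = n % 10
  const decade = DECADES[Math.floor(n / 10)]
  return unit === 0 ? decade : `${decade} ${SMALL[unit]}`
}

// Hypothetical helper: read a four-digit year as two pairs of digits.
// Returns null for inputs this sketch does not cover.
function yearToWords(year) {
  if (!Number.isInteger(year)) return null
  const century = Math.floor(year / 100)
  const rest = year % 100
  if (century < 10 || century > 99 || rest < 10) return null
  return `${twoDigitWords(century)} ${twoDigitWords(rest)}`
}

console.log(yearToWords(2023)) // "twenty twenty three"
```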

Performance

  • Dictionary lookup: O(1) - Instant for known words
  • Rule-based processing: Extremely fast, no model loading
  • Compound decomposition: Efficient balanced search algorithm
  • Memory efficient: Compressed JSON dictionaries only
  • Zero startup time: No model initialization required

Typical performance: >10,000 words/second on modern hardware.
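
To check throughput on your own machine, a small timing harness is enough. The sketch below is a rough measurement aid, not a benchmark shipped with the package: measureWordsPerSecond is a hypothetical helper, and standIn is a placeholder workload that you would replace with phonemize.

```javascript
// Hypothetical helper: time `fn` over a list of words and report words/second.
function measureWordsPerSecond(fn, words, rounds = 5) {
  const text = words.join(' ')
  fn(text) // warm-up call, excluded from the timing
  const start = process.hrtime.bigint()
  for (let i = 0; i < rounds; i++) fn(text)
  const elapsedNs = Number(process.hrtime.bigint() - start)
  const elapsedSeconds = Math.max(elapsedNs, 1) / 1e9 // guard against a zero reading
  return (words.length * rounds) / elapsedSeconds
}

// Placeholder workload; swap in `phonemize` to measure the real function.
const standIn = (text) => text.toUpperCase()
const sample = Array.from({ length: 10000 }, (_, i) => `word${i}`)
console.log(`${Math.round(measureWordsPerSecond(standIn, sample))} words/second`)
```

Results depend heavily on the input: text made of dictionary words will be faster than text that falls through to compound decomposition or rule-based G2P.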

Processing Pipeline

  1. Language Detection - Detect language before anyAscii conversion (if enabled)
  2. anyAscii Transliteration - Convert non-Latin scripts to ASCII (if enabled)
  3. Dictionary Lookup - Check for exact word match
  4. Multilingual Processing - Handle Chinese, Japanese, Korean, etc.
  5. Compound Detection - Intelligent decomposition of compound words
  6. Multi-Compound Handling - Special processing for very long compounds
  7. Rule-Based G2P - Apply phonetic rules for unknown words
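
The fallback order of steps 3, 5, and 7 can be sketched in a few lines. This is a toy model under stated assumptions, not the package's implementation: DICTIONARY, splitCompound, and lookupWord are hypothetical, the dictionary holds only the two pronunciations shown earlier in this README, compound splitting is a plain left-to-right scan rather than the balanced search mentioned under Performance, and the rule stage is a placeholder that just marks the word. Language detection, anyAscii, and multilingual processing are left out.

```javascript
// Toy dictionary with pronunciations taken from the examples in this README.
const DICTIONARY = new Map([
  ['hello', 'həˈɫoʊ'],
  ['world', 'ˈwɝɫd'],
])

// Try every split point and accept the first one where both halves are known.
function splitCompound(word) {
  for (let i = 1; i < word.length; i++) {
    const left = word.slice(0, i)
    const right = word.slice(i)
    if (DICTIONARY.has(left) && DICTIONARY.has(right)) return [left, right]
  }
  return null
}

// Resolve one word and report which stage produced the result.
function lookupWord(word) {
  const key = word.toLowerCase()
  if (DICTIONARY.has(key)) {
    return { phonemes: DICTIONARY.get(key), stage: 'dictionary' }
  }
  const parts = splitCompound(key)
  if (parts) {
    return { phonemes: parts.map((part) => DICTIONARY.get(part)).join(''), stage: 'compound' }
  }
  // Placeholder for rule-based G2P: flag the word instead of guessing phonemes.
  return { phonemes: `<${key}>`, stage: 'rules' }
}

console.log(lookupWord('Hello').stage)      // "dictionary"
console.log(lookupWord('helloworld').stage) // "compound"
console.log(lookupWord('zzz').stage)        // "rules"
```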

Note: The rule-based G2P rules were generated by an LLM and may produce incorrect pronunciations. For unknown words, the best practice is to register a custom pronunciation.

Supported Phoneme Sets

IPA (International Phonetic Alphabet)

Standard IPA symbols for English phonemes with stress marks.

ARPABET

CMU ARPABET phoneme set with stress numbers (0,1,2).
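
In CMU-style ARPABET, the digit attached to a vowel marks its stress: 0 for no stress, 1 for primary, 2 for secondary. The sketch below separates symbols from stress digits in a space-separated string. parseArpabet is a hypothetical helper, not a package export, and it assumes punctuation has already been removed.

```javascript
// Hypothetical helper: split ARPABET output into symbols and stress levels.
// Symbols without a stress digit (consonants, unmarked vowels) get stress: null.
function parseArpabet(output) {
  return output
    .split(/\s+/)
    .filter(Boolean)
    .map((token) => {
      const match = /^([A-Z]+)([012])?$/.exec(token)
      if (!match) return { symbol: token, stress: null }
      return {
        symbol: match[1],
        stress: match[2] === undefined ? null : Number(match[2]),
      }
    })
}

console.log(parseArpabet('HH AX L OW1'))
// [
//   { symbol: 'HH', stress: null },
//   { symbol: 'AX', stress: null },
//   { symbol: 'L', stress: null },
//   { symbol: 'OW', stress: 1 }
// ]
```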

Building from Source

# Install dependencies
yarn

# Compile TypeScript and dictionaries
yarn build

# Run tests
yarn test

License

MIT