npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2026 – Pkg Stats / Ryan Hefner

charsiu-js

v0.2.0

Published

Neural phonetic forced aligner (wav2vec2) for Node and the browser — a JS port of charsiu. Aligns audio to a transcript at the phone and word level, fully client-side.

Readme

charsiu-js

Neural phonetic forced aligner for Node and the browser — a JavaScript port of charsiu. Give it audio and a transcript; get back phone- and word-level time alignments, fully client-side, no Python.

the quick brown fox …
0.00–0.10  the     DH AH
0.10–0.41  quick   K W IH K
0.41–0.76  brown   B R AW N
…

It runs a wav2vec2 frame-classification model (via onnxruntime) to get per-frame phone probabilities, converts text to phones with a ported g2p_en, and aligns the two with DTW — matching charsiu's Python output bit-for-bit (see Verification).

Status: English and Mandarin both work end-to-end. See Languages.

Install

npm install charsiu-js
# plus the onnxruntime for your runtime:
npm install onnxruntime-node    # Node
npm install onnxruntime-web     # browser bundlers
npm install tokana              # Japanese only (morphological analysis)

Written in TypeScript; ships compiled ESM + .d.ts, so it's type-safe out of the box.

What you need besides the package

The g2p data ships inside the package. You need an onnxruntime (peer dependency) — onnxruntime-node for Node, or onnxruntime-web for the browser (https://onnxruntime.ai/).

The acoustic ONNX model is not bundled (it's large), but in Node it's downloaded automatically on first use (and cached in ~/.cache/charsiu-js) from a hosted, INT8-quantized copy: https://huggingface.co/mnaoizyyy/charsiu-js-models (EN ~123 MB, ZH ~40 MB). So createNodeAligner() just works with no setup.

To use your own model, pass modelPath (local file) or modelUrl. The upstream PyTorch weights and tokenizers live at https://huggingface.co/charsiu — convert them to ONNX with the scripts (see Model setup).

Usage — Node

import { createNodeAligner } from 'charsiu-js';

// model auto-downloads from the Hugging Face Hub on first use, then caches
const aligner = await createNodeAligner();
// ...or bring your own:
//   createNodeAligner({ modelPath: './model_quantized.onnx' })
//   createNodeAligner({ modelUrl: 'https://…/model_quantized.onnx' })

// waveform: Float32Array, 16 kHz mono, samples in [-1, 1]
// (read a 16 kHz mono WAV with `loadWav16k` from 'charsiu-js/assets-node')
const { phones, words } = await aligner.align(waveform, 'the quick brown fox');
// phones: [[start, end, 'DH'], ...]   words: [[start, end, 'the'], ...]

Usage — browser

import { G2p, PhonemizerEn, ForcedAligner } from 'charsiu-js/core';
import { loadG2pAssets, loadPhoneVocab, decodeToMono16k } from 'charsiu-js/assets-web';
import * as ort from 'onnxruntime-web';

const phonemizer = new PhonemizerEn(new G2p(await loadG2pAssets('/assets/')),
                                    await loadPhoneVocab('/assets/'));
const session = await ort.InferenceSession.create('/model_quantized.onnx');
const aligner = new ForcedAligner({ session, ort, phonemizer });

const waveform = await decodeToMono16k(await file.arrayBuffer()); // any audio file
const { phones, words } = await aligner.align(waveform, transcript);

A runnable demo lives in web/ (the page imports the compiled dist/):

npm run build                         # compile src -> dist
npm run convert && npm run quantize   # produce a local model (needs the Python venv)
npm run serve                         # http://localhost:8080/web/index.html

(Or skip convert/quantize and edit MODEL_URL in web/demo.mjs to the hosted model on the Hub.)

API

  • createNodeAligner({ modelPath?, modelUrl?, cacheDir? }) → Promise<ForcedAligner> (Node) — wires onnxruntime-node + bundled assets; with no options it downloads and caches the hosted default model. createNodeAlignerZh(...) is the Mandarin equivalent.
  • new ForcedAligner({ session, ort, phonemizer, silThreshold?, resolution? }) — runtime-agnostic core.
  • aligner.align(waveform, text) → { phones, words, phoneIds } — segments are [startSec, endSec, label]; [SIL] marks silence.
  • toTextGrid([{ name, intervals }, …]) → string — Praat TextGrid (short format).
  • Lower-level building blocks are exported too: G2p, PhonemizerEn, normalize, softmaxRows, forcedAlign, seq2duration.

Input must be 16 kHz mono. In the browser, decodeToMono16k (from charsiu-js/assets-web) decodes any audio file and resamples it. In Node, loadWav16k(path) (from charsiu-js/assets-node) reads a 16 kHz mono 16-bit PCM WAV; it does not resample, so convert other sample rates / formats first.

Verification

Every stage is checked against the original Python, bit-for-bit:

npm test

| Test | Checks | |------|--------| | test:onnx | ONNX frame logits == PyTorch | | test:g2p | English g2p == g2p_en (15/15, incl. OOV, numbers, punctuation) | | test:g2pm | Mandarin g2p == g2pM (10/10, incl. polyphone disambiguation) | | test:standalone | English text→phones+words == charsiu oracle (Node) | | test:standalone-zh | Mandarin text→phones+words == charsiu oracle (Node) | | test:standalone-ja | Japanese text→phones+words == pyopenjtalk oracle (Node; skipped without the model + dict) | | test:browser | English + Mandarin alignment in headless Chrome via onnxruntime-web |

npm test also type-checks the public API against the built .d.ts as a consumer.

Languages

  • English — complete. g2p is a full port of g2p_en (CMUdict + GRU OOV predictor). Homographs (~371 words like "read") currently use their default pronunciation; the POS tagger g2p_en uses to disambiguate them isn't ported yet.
  • Mandarin (Standard Chinese) — complete. g2p is a full port of g2pM (CEDICT + a BiLSTM that disambiguates polyphonic characters, e.g. 长→cháng/zhǎng, 行→xíng/háng). Uses charsiu/zh_w2v2_tiny_fc_10ms (~40 MB quantized, 210 tonal phones).
import { createNodeAlignerZh } from 'charsiu-js';
const zh = await createNodeAlignerZh();   // model auto-downloads on first use
const { phones, words } = await zh.align(waveform, '快速的棕色狐狸');
// phones: [[0, .08,'k'], [.08,.22,'uai4'], …]   words: [[0,.22,'快'], …]
  • Japanese — works end-to-end, with a different stack from EN/ZH: morphology uses tokana (an optional peer dependency) over an IPADIC dictionary; readings are turned into OpenJTalk phonemes (g2p-ja.ts) and aligned with CTC forced alignment (align-ctc.ts, not the DTW used for EN/ZH) against prj-beatrice/japanese-hubert-base-phoneme-ctc (Apache-2.0, CTC, 20 ms, ~123 MB quantized). Vowel devoicing (OpenJTalk's U/I) is context-dependent and emitted as plain u/i — negligible for alignment.
import { createNodeAlignerJa } from 'charsiu-js';
// The model auto-downloads from the Hub on first use (like EN/ZH). You only need a
// tokana-compiled dictionary built once with `npm run setup-dict` (IPADIC; add
// `-- neologd` for a much larger lexicon — proper nouns, neologisms). Bring your
// own model with modelPath/modelUrl.
const ja = await createNodeAlignerJa({ dicPath: './models/ipadic-dict' });
const { phones, words } = await ja.align(waveform, '音声認識のテストです');
// phones: [[0,.04,'[SIL]'],[.04,.12,'o'],[.12,.28,'N'],…]   words: [[…,'音声'],…]

If you already have the reading, the phonemizer is dependency-free: kanaToPhonemes('オンセイニンシキ') (pure JS, no tokana/dictionary) gives the phone sequence directly.

Model setup (custom / your own models)

The default models are downloaded from the Hub automatically. To build your own (e.g. for another charsiu language), convert + quantize with the scripts, then pass the result via modelPath/modelUrl:

npm run convert charsiu/en_w2v2_fc_10ms    # PyTorch -> models/<name>/model.onnx
npm run quantize en_w2v2_fc_10ms           # -> model_quantized.onnx (single file)

Mandarin works in the browser too — same wiring with the zh building blocks:

import { G2pM, PhonemizerZh, ForcedAligner } from 'charsiu-js/core';
import { loadG2pmAssets, loadPhoneVocabZh, decodeToMono16k } from 'charsiu-js/assets-web';
import * as ort from 'onnxruntime-web';

const phonemizer = new PhonemizerZh(new G2pM(await loadG2pmAssets('/assets/')),
                                    await loadPhoneVocabZh('/assets/'));
const session = await ort.InferenceSession.create('/zh_model_quantized.onnx');
const aligner = new ForcedAligner({ session, ort, phonemizer });

The bundled demo (npm run serve) has an English / Mandarin / Japanese toggle. For Japanese it shows a download/load progress bar (the model and dictionary are large) and a dictionary selector — standard (IPADIC ~12 MB) or extended (NEologd ~190 MB, built with npm run setup-dict -- neologd).

How it works / internals

See FINDINGS.md for the full design notes: model architecture (Wav2Vec2ForFrameClassificationWav2Vec2ForCTC), the DTW setup (step_sizes [[1,1],[1,0]], silence handling), and the g2p port.

Development

Source is TypeScript in src/; npm run build compiles it to dist/ (ESM + .d.ts). npm test builds, runs the Python-parity tests (ONNX/g2p/alignment in Node and in headless Chrome), and type-checks the public API as a consumer would. npm run typecheck checks types without emitting.

The Python scripts in scripts/ (model conversion, quantization, asset export, oracle generation) need a virtualenv with the ML deps:

python -m venv .venv
.venv/bin/pip install torch transformers optimum onnx onnxruntime onnxscript \
  huggingface_hub numpy soundfile librosa nltk g2p_en g2pM praatio

The g2p assets in assets/ are generated from these (npm run export-g2p); the oracle JSONs the JS tests compare against are produced by scripts/*_oracle.py.

For Japanese, also build the dictionary with npm run setup-dict (downloads and compiles mecab-ipadic with tokana; add -- neologd for the larger NEologd lexicon). The Japanese g2p oracle uses pyopenjtalk-plus (pip install pyopenjtalk-plus).

License

MIT — see LICENSE. Bundled data/models retain their own licenses; see ATTRIBUTION.md (charsiu: MIT, g2p_en: Apache-2.0, CMUdict: BSD-style).