@keanu-thakalath/openjtalkjs
v0.1.0
Node.js TypeScript bindings for Open JTalk (pyopenjtalk-style API)
openjtalkjs
TypeScript/Node.js bindings for Open JTalk with a pyopenjtalk-style API.
Status
This repository includes:
- Native N-API integration of Open JTalk + HTS Engine.
- Typed TS API for g2p, runFrontend, extractFullContext, and synthesize (sync/async).
- Browser runtime with WebAssembly + Worker entrypoint.
- Asset bootstrap for dictionary + default voice.
- Golden parity tests against pinned pyopenjtalk fixtures.
- CLI demo executables for each API surface.
Install
git clone --recurse-submodules <your-repo-url>
cd openjtalkjs
npm install
npm run build
npm test
npm install runs a postinstall step that downloads dictionary/voice assets into assets/.
API (Node)
import {
configure,
g2p, g2pAsync,
runFrontend, runFrontendAsync,
extractFullContext, extractFullContextAsync,
synthesize, synthesizeAsync,
} from "openjtalkjs";
configure({
dicPath: "assets/dic",
voicePath: "assets/voice.htsvoice",
});
// Grapheme-to-phoneme
const phonemes = g2p("こんにちは", { kana: false }); // "k o N n i ch i w a"
const kana = g2p("こんにちは", { kana: true }); // "コンニチワ"
// Per-word NJD features (pitch accent, reading, POS, …)
const nodes = runFrontend("こんにちは");
// [{ string: "こんにちは", pron: "コンニチワ", acc: 0, mora_size: 5, chain_flag: -1, … }]
// Full-context HTS labels
const labels = extractFullContext("こんにちは");
// Speech synthesis — returns PCM in int16 range as Float32Array
const wav = synthesize("こんにちは");
// wav.pcm Float32Array
// wav.sampleRate number (48000)
All functions have an Async variant (g2pAsync, runFrontendAsync, …) that returns a Promise.
Note: synthesize() returns PCM-scaled samples (int16 range in a Float32Array), not normalized [-1, 1] floats.
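Because the samples are in int16 range, most consumers need a conversion step before playback or file output. The sketch below shows one way to do it. The helper names (normalizePcm, toInt16, encodeWav) are hypothetical and not part of openjtalkjs, and the WAV encoder assumes mono output, which the README does not state explicitly.

```typescript
// Hypothetical helpers, not part of openjtalkjs.

/** Scale int16-range samples to normalized floats in [-1, 1]. */
function normalizePcm(pcm: Float32Array): Float32Array {
  const out = new Float32Array(pcm.length);
  for (let i = 0; i < pcm.length; i++) {
    out[i] = Math.max(-1, Math.min(1, pcm[i] / 32768));
  }
  return out;
}

/** Round and clamp int16-range samples into an Int16Array. */
function toInt16(pcm: Float32Array): Int16Array {
  const out = new Int16Array(pcm.length);
  for (let i = 0; i < pcm.length; i++) {
    out[i] = Math.max(-32768, Math.min(32767, Math.round(pcm[i])));
  }
  return out;
}

/** Encode samples as a minimal 16-bit PCM WAV file (assumes mono). */
function encodeWav(pcm: Float32Array, sampleRate: number): Uint8Array {
  const samples = toInt16(pcm);
  const dataBytes = samples.length * 2;
  const buffer = new ArrayBuffer(44 + dataBytes);
  const view = new DataView(buffer);
  const ascii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };
  ascii(0, "RIFF");
  view.setUint32(4, 36 + dataBytes, true); // file size minus 8 bytes
  ascii(8, "WAVE");
  ascii(12, "fmt ");
  view.setUint32(16, 16, true);            // fmt chunk size
  view.setUint16(20, 1, true);             // format tag: PCM
  view.setUint16(22, 1, true);             // channels: mono (assumption)
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * 2, true); // byte rate
  view.setUint16(32, 2, true);             // block align
  view.setUint16(34, 16, true);            // bits per sample
  ascii(36, "data");
  view.setUint32(40, dataBytes, true);
  for (let i = 0; i < samples.length; i++) {
    view.setInt16(44 + i * 2, samples[i], true);
  }
  return new Uint8Array(buffer);
}
```

In Node, the bytes from encodeWav(wav.pcm, wav.sampleRate) could be passed to fs.writeFileSync; for APIs that expect normalized floats, normalizePcm(wav.pcm) is the likely starting point.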
NJDNode fields
runFrontend returns one NJDNode per word. Fields match pyopenjtalk exactly:
| Field | Type | Description |
|---|---|---|
| string | string | Surface form |
| pos | string | Part of speech |
| pos_group1/2/3 | string | POS sub-classifications |
| ctype | string | Conjugation type |
| cform | string | Conjugation form |
| orig | string | Dictionary form |
| read | string | Reading (katakana) |
| pron | string | Pronunciation (katakana) |
| acc | number | Accent nucleus position within the accentual phrase |
| mora_size | number | Mora count |
| chain_rule | string | Chain rule applied |
| chain_flag | number | -1 = phrase head (sentence start), 0 = phrase head, 1 = chained to previous |
acc is scoped to the accentual phrase, not the individual word. When chain_flag=1, the word joins the previous word's phrase and acc on the phrase head counts mora across the whole chain. acc=0 is heiban (no drop).
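To make the phrase scoping concrete, the sketch below groups nodes into accentual phrases using chain_flag and derives a coarse high/low pitch pattern per mora. The function and type names are hypothetical, and the H/L rule is the usual textbook simplification of Tokyo-style pitch accent (low-high start unless acc=1, drop after the nucleus), not something openjtalkjs provides.

```typescript
// Only the NJDNode fields this sketch needs.
interface AccentFields {
  string: string;
  acc: number;
  mora_size: number;
  chain_flag: number;
}

interface AccentPhrase {
  text: string;
  moraCount: number;
  acc: number; // taken from the phrase head
}

/** Merge nodes with chain_flag=1 into the preceding phrase. */
function groupAccentPhrases(nodes: AccentFields[]): AccentPhrase[] {
  const phrases: AccentPhrase[] = [];
  for (const node of nodes) {
    const last = phrases[phrases.length - 1];
    if (node.chain_flag === 1 && last !== undefined) {
      last.text += node.string;
      last.moraCount += node.mora_size;
    } else {
      // chain_flag of -1 or 0 starts a new phrase.
      phrases.push({ text: node.string, moraCount: node.mora_size, acc: node.acc });
    }
  }
  return phrases;
}

/** One "H" or "L" per mora, using the simplified textbook rule. */
function pitchPattern(phrase: AccentPhrase): string {
  let pattern = "";
  for (let mora = 1; mora <= phrase.moraCount; mora++) {
    const high =
      mora === 1
        ? phrase.acc === 1
        : phrase.acc === 0 || mora <= phrase.acc;
    pattern += high ? "H" : "L";
  }
  return pattern;
}
```

With the single node shown earlier (acc 0, mora_size 5), this yields one phrase with the pattern LHHHH, the heiban contour with no drop.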
API (Browser)
Browser runtime uses a Web Worker. Sync APIs are unavailable; use the Async variants:
import {
configure,
g2pAsync,
runFrontendAsync,
extractFullContextAsync,
synthesizeAsync,
} from "openjtalkjs/browser";
await configure({
dicUrl: "/assets/dic",
voiceUrl: "/assets/voice.htsvoice",
});
const phonemes = await g2pAsync("こんにちは");
const nodes = await runFrontendAsync("こんにちは");
const labels = await extractFullContextAsync("こんにちは");
const audio = await synthesizeAsync("こんにちは");
dicUrl must point to a directory containing the required dictionary files.
Browser Build
npm run build:wasm
npm run build:ts
Or run the end-to-end browser pipeline:
npm run build:browser
See BROWSER.md for full browser setup details.
Environment Overrides (Node)
- OPENJTALKJS_DIC_PATH
- OPENJTALKJS_VOICE_PATH
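The README does not say how these variables interact with an explicit configure() call. If the precedence matters to an application, one option is to resolve the paths in application code and pass the result to configure(). The sketch below is a hypothetical helper (pathsFromEnv is not part of openjtalkjs) that prefers a non-empty environment value and otherwise falls back to a supplied default.

```typescript
interface ResolvedPaths {
  dicPath: string;
  voicePath: string;
}

/**
 * Hypothetical helper: pick each path from the environment when set,
 * otherwise use the given default. Precedence here is this sketch's
 * choice, not documented library behavior.
 */
function pathsFromEnv(
  env: Record<string, string | undefined>,
  defaults: ResolvedPaths,
): ResolvedPaths {
  const pick = (value: string | undefined, fallback: string): string =>
    value !== undefined && value !== "" ? value : fallback;
  return {
    dicPath: pick(env["OPENJTALKJS_DIC_PATH"], defaults.dicPath),
    voicePath: pick(env["OPENJTALKJS_VOICE_PATH"], defaults.voicePath),
  };
}
```

In Node this could be called as pathsFromEnv(process.env, { dicPath: "assets/dic", voicePath: "assets/voice.htsvoice" }) and the result passed straight to configure().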
Demos
npm run demo # full pipeline → WAV file
npm run demo:g2p # g2p sync + async
npm run demo:labels # full-context labels
npm run demo:frontend # runFrontend — pitch accent table per word
npm run demo:synth # synthesis → WAV files
npm run demo:all # all of the above
Browser demo:
npm --prefix demo install
npm --prefix demo run dev
Parity Workflow
npm run parity:venv
npm run parity:install
npm run parity:generate
npm run parity:check
npm run test:parity
pyopenjtalk API coverage
| pyopenjtalk | openjtalkjs | Notes |
|---|---|---|
| g2p(text, kana, join) | g2p / g2pAsync | join=False (list) not supported |
| run_frontend(text) | runFrontend / runFrontendAsync | ✅ full parity |
| extract_fullcontext(text) | extractFullContext / extractFullContextAsync | ✅ |
| tts(text, speed, half_tone) | synthesize / synthesizeAsync | half_tone not supported |
| synthesize(labels, …) | — | labels-based synthesis not yet exposed |
| make_label(njd_features) | — | NJD → labels conversion not yet exposed |
| mecab_dict_index / update_global_jtalk_with_user_dict | — | user dictionary not yet supported |
| estimate_accent (marine) | — | neural accent estimation out of scope |
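Since the list form of g2p (join=False) is not supported, callers who want individual phonemes can split the space-separated string themselves. This is a minimal sketch with a hypothetical helper name; it applies only to phoneme output (kana: false), where the README example shows phonemes separated by single spaces.

```typescript
/** Hypothetical helper: split a space-joined phoneme string into a list. */
function splitPhonemes(joined: string): string[] {
  const trimmed = joined.trim();
  // An empty input yields an empty list rather than [""].
  return trimmed === "" ? [] : trimmed.split(/\s+/);
}
```

For example, splitPhonemes(g2p("こんにちは", { kana: false })) would give a nine-element array for the output shown in the Node API section.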
