@keanu-thakalath/openjtalkjs

v0.1.0

Published a month ago

Node.js TypeScript bindings for Open JTalk (pyopenjtalk-style API)

keanu-thakalath

openjtalk japanese tts text-to-speech node-addon-api napi speech-synthesis

openjtalkjs

TypeScript/Node.js bindings for Open JTalk with a pyopenjtalk-style API.

Status

This repository includes:

- Native N-API integration of Open JTalk + HTS Engine.
- Typed TS API for `g2p`, `runFrontend`, `extractFullContext`, and `synthesize` (sync/async).
- Browser runtime with WebAssembly + Worker entrypoint.
- Asset bootstrap for dictionary + default voice.
- Golden parity tests against pinned pyopenjtalk fixtures.
- CLI demo executables for each API surface.

Install

```sh
git clone --recurse-submodules <your-repo-url>
cd openjtalkjs
npm install
npm run build
npm test
```

`npm install` runs a postinstall step that downloads the dictionary and voice assets into `assets/`.

API (Node)

```typescript
import {
  configure,
  g2p, g2pAsync,
  runFrontend, runFrontendAsync,
  extractFullContext, extractFullContextAsync,
  synthesize, synthesizeAsync,
} from "openjtalkjs";

configure({
  dicPath: "assets/dic",
  voicePath: "assets/voice.htsvoice",
});

// Grapheme-to-phoneme
const phonemes = g2p("こんにちは", { kana: false }); // "k o N n i ch i w a"
const kana    = g2p("こんにちは", { kana: true });   // "コンニチワ"

// Per-word NJD features (pitch accent, reading, POS, …)
const nodes = runFrontend("こんにちは");
// [{ string: "こんにちは", pron: "コンニチワ", acc: 0, mora_size: 5, chain_flag: -1, … }]

// Full-context HTS labels
const labels = extractFullContext("こんにちは");

// Speech synthesis — returns PCM in int16 range as Float32Array
const wav = synthesize("こんにちは");
// wav.pcm        Float32Array
// wav.sampleRate number (48000)
```

Each of `g2p`, `runFrontend`, `extractFullContext`, and `synthesize` has an `Async` variant (`g2pAsync`, `runFrontendAsync`, …) that returns a Promise.
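The full-context labels also carry the phoneme sequence that `g2p` returns: in pyopenjtalk, `g2p` without `kana` takes the current phoneme of each HTS label (the field between `-` and `+`) and drops the leading and trailing `sil` labels. A minimal sketch of the same derivation, assuming `extractFullContext` returns the labels as a `string[]` in the standard HTS format; `phonemesFromLabels` is a hypothetical helper, not part of openjtalkjs:

```typescript
// Hypothetical helper, not part of openjtalkjs.
// Assumes HTS full-context labels shaped like "p1^p2-p3+p4=p5/A:...",
// where p3 is the current phoneme.
function phonemesFromLabels(labels: string[]): string {
  return labels
    .slice(1, -1) // drop the utterance-initial and utterance-final "sil" labels
    .map((label) => {
      const start = label.indexOf("-") + 1; // first "-" precedes p3
      const end = label.indexOf("+", start); // first "+" after it follows p3
      return label.slice(start, end);
    })
    .join(" ");
}
```

If those assumptions hold, `phonemesFromLabels(extractFullContext(text))` should agree with `g2p(text)`, which makes it a quick consistency check between the two APIs.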

Note: synthesize() returns PCM-scaled samples (int16 range in a Float32Array), not normalized [-1, 1] floats.
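Following the note above, a minimal sketch of the two conversions usually needed downstream: scaling to [-1, 1] (for example, for a Web Audio `AudioBuffer`) and rounding to an `Int16Array` (for 16-bit PCM files). It assumes only what the note states, int16-range samples in a `Float32Array`; `toNormalizedFloat32` and `toInt16` are hypothetical helpers:

```typescript
// Hypothetical helpers, not part of openjtalkjs.
// Input: samples already scaled to the int16 range, as synthesize() returns.

/** Scale int16-range samples to normalized floats in [-1, 1]. */
function toNormalizedFloat32(pcm: Float32Array): Float32Array {
  const out = new Float32Array(pcm.length);
  for (let i = 0; i < pcm.length; i++) {
    // Divide by 32768 and clamp in case a sample falls outside the int16 range.
    out[i] = Math.max(-1, Math.min(1, pcm[i] / 32768));
  }
  return out;
}

/** Round and clamp to signed 16-bit integers. */
function toInt16(pcm: Float32Array): Int16Array {
  const out = new Int16Array(pcm.length);
  for (let i = 0; i < pcm.length; i++) {
    out[i] = Math.max(-32768, Math.min(32767, Math.round(pcm[i])));
  }
  return out;
}
```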

`NJDNode` fields

runFrontend returns one NJDNode per word. Fields match pyopenjtalk exactly:

| Field | Type | Description |
|---|---|---|
| string | string | Surface form |
| pos | string | Part of speech |
| pos_group1/2/3 | string | POS sub-classifications |
| ctype | string | Conjugation type |
| cform | string | Conjugation form |
| orig | string | Dictionary form |
| read | string | Reading (katakana) |
| pron | string | Pronunciation (katakana) |
| acc | number | Accent nucleus position within the accentual phrase |
| mora_size | number | Mora count |
| chain_rule | string | Chain rule applied |
| chain_flag | number | -1 = phrase head (sentence start), 0 = phrase head, 1 = chained to previous |

`acc` is scoped to the accentual phrase, not the individual word. When `chain_flag` is 1, the word joins the previous word's accentual phrase, and the `acc` on the phrase head counts morae across the whole chain. `acc` = 0 means heiban (flat, no pitch drop).
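The grouping rule in the paragraph above can be sketched in code: walk the nodes, start a new accentual phrase at every word whose `chain_flag` is not 1, and take the phrase's `acc` from its head. The sketch then maps `acc` to a per-mora high/low pattern using the standard Tokyo-dialect rule (acc 0: low, then high; acc 1: high, then low; acc n > 1: low, high through mora n, then low). It ignores pauses, punctuation, and devoicing, and `AccentFields`, `AccentPhrase`, `groupAccentPhrases`, and `pitchPattern` are hypothetical names, not openjtalkjs exports:

```typescript
// Minimal subset of NJDNode fields used here (hypothetical local type).
interface AccentFields {
  string: string;
  acc: number;
  mora_size: number;
  chain_flag: number;
}

interface AccentPhrase {
  text: string;
  acc: number;   // accent nucleus position within the phrase (0 = heiban)
  moras: number; // total mora count across the chained words
}

// chain_flag === 1 appends a word to the previous phrase; anything else starts one.
function groupAccentPhrases(nodes: AccentFields[]): AccentPhrase[] {
  const phrases: AccentPhrase[] = [];
  for (const node of nodes) {
    const last = phrases[phrases.length - 1];
    if (node.chain_flag === 1 && last !== undefined) {
      last.text += node.string;
      last.moras += node.mora_size;
    } else {
      // Phrase head: its acc applies to the whole phrase.
      phrases.push({ text: node.string, acc: node.acc, moras: node.mora_size });
    }
  }
  return phrases;
}

// Tokyo-dialect H/L pitch per mora for a phrase with accent nucleus `acc`.
function pitchPattern(acc: number, moras: number): string {
  let pattern = "";
  for (let i = 1; i <= moras; i++) {
    if (acc === 0) pattern += i === 1 ? "L" : "H";      // heiban
    else if (acc === 1) pattern += i === 1 ? "H" : "L"; // atamadaka
    else pattern += i === 1 || i > acc ? "L" : "H";     // nakadaka / odaka
  }
  return pattern;
}
```

For the `runFrontend` example in the Node API section (`acc: 0`, `mora_size: 5`), `pitchPattern(0, 5)` returns `"LHHHH"`, the flat heiban contour of こんにちは.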

API (Browser)

The browser runtime runs Open JTalk in a Web Worker. The sync APIs are unavailable there; use the `Async` variants:

```typescript
import {
  configure,
  g2pAsync,
  runFrontendAsync,
  extractFullContextAsync,
  synthesizeAsync,
} from "openjtalkjs/browser";

await configure({
  dicUrl: "/assets/dic",
  voiceUrl: "/assets/voice.htsvoice",
});

const phonemes = await g2pAsync("こんにちは");
const nodes    = await runFrontendAsync("こんにちは");
const labels   = await extractFullContextAsync("こんにちは");
const audio    = await synthesizeAsync("こんにちは");
```

dicUrl must point to a directory containing the required dictionary files.

Browser Build

```sh
npm run build:wasm
npm run build:ts
```

Or run the end-to-end browser pipeline:

```sh
npm run build:browser
```

See BROWSER.md for full browser setup details.

Environment Overrides (Node)

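Set these environment variables to override the default asset locations:

- `OPENJTALKJS_DIC_PATH`: dictionary directory
- `OPENJTALKJS_VOICE_PATH`: `.htsvoice` voice file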
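A minimal startup check, assuming the variables hold a dictionary directory and a voice file path and take precedence over the `assets/` defaults used in the `configure` example (both assumptions, not documented library behavior). `resolveAssetPaths` is a hypothetical helper, not part of openjtalkjs:

```typescript
import { existsSync } from "node:fs";

// Hypothetical helper, not part of openjtalkjs. ASSUMPTION: environment
// variables win over the assets/ defaults from the configure() example.
function resolveAssetPaths(env: Record<string, string | undefined> = process.env) {
  // `||` treats an unset or empty variable as "use the default".
  const dicPath = env.OPENJTALKJS_DIC_PATH || "assets/dic";
  const voicePath = env.OPENJTALKJS_VOICE_PATH || "assets/voice.htsvoice";
  // Collect any path that does not exist on disk.
  const missing = [dicPath, voicePath].filter((p) => !existsSync(p));
  return { dicPath, voicePath, missing };
}
```

The resolved paths can then be passed to `configure({ dicPath, voicePath })`; a non-empty `missing` list may mean the postinstall asset download did not run.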

Demos

```sh
npm run demo           # full pipeline → WAV file
npm run demo:g2p       # g2p sync + async
npm run demo:labels    # full-context labels
npm run demo:frontend  # runFrontend — pitch accent table per word
npm run demo:synth     # synthesis → WAV files
npm run demo:all       # all of the above
```
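The demos write synthesis output to WAV files. A minimal sketch of one way to do that, assuming mono output and the `{ pcm, sampleRate }` shape shown in the Node API example; `encodeWav` is a hypothetical helper (not necessarily what the demos use) that builds a standard 44-byte RIFF header followed by 16-bit little-endian PCM:

```typescript
// Hypothetical helper, not part of openjtalkjs. ASSUMPTION: mono audio with
// int16-range samples, as synthesize() returns.
function encodeWav(pcm: Float32Array, sampleRate: number): Uint8Array {
  const dataBytes = pcm.length * 2; // 16-bit mono: 2 bytes per sample
  const buffer = new ArrayBuffer(44 + dataBytes);
  const view = new DataView(buffer);
  const writeAscii = (offset: number, text: string) => {
    for (let i = 0; i < text.length; i++) view.setUint8(offset + i, text.charCodeAt(i));
  };

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataBytes, true);  // RIFF chunk size
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true);             // fmt chunk size for PCM
  view.setUint16(20, 1, true);              // audio format 1 = integer PCM
  view.setUint16(22, 1, true);              // channels: mono
  view.setUint32(24, sampleRate, true);     // sample rate
  view.setUint32(28, sampleRate * 2, true); // byte rate = rate * channels * 2
  view.setUint16(32, 2, true);              // block align = channels * 2
  view.setUint16(34, 16, true);             // bits per sample
  writeAscii(36, "data");
  view.setUint32(40, dataBytes, true);      // data chunk size

  for (let i = 0; i < pcm.length; i++) {
    // Round and clamp each sample into the signed 16-bit range.
    const s = Math.max(-32768, Math.min(32767, Math.round(pcm[i])));
    view.setInt16(44 + i * 2, s, true);
  }
  return new Uint8Array(buffer);
}
```

For example, `writeFileSync("out.wav", encodeWav(wav.pcm, wav.sampleRate))`, with `writeFileSync` imported from `node:fs`.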

Browser demo:

```sh
npm --prefix demo install
npm --prefix demo run dev
```

Parity Workflow

```sh
npm run parity:venv
npm run parity:install
npm run parity:generate
npm run parity:check
npm run test:parity
```

pyopenjtalk API coverage

| pyopenjtalk | openjtalkjs | Notes |
|---|---|---|
| g2p(text, kana, join) | g2p / g2pAsync | join=False (list) not supported |
| run_frontend(text) | runFrontend / runFrontendAsync | ✅ full parity |
| extract_fullcontext(text) | extractFullContext / extractFullContextAsync | ✅ |
| tts(text, speed, half_tone) | synthesize / synthesizeAsync | half_tone not supported |
| synthesize(labels, …) | — | labels-based synthesis not yet exposed |
| make_label(njd_features) | — | NJD → labels conversion not yet exposed |
| mecab_dict_index / update_global_jtalk_with_user_dict | — | user dictionary not yet supported |
| estimate_accent (marine) | — | neural accent estimation out of scope |
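Until `join=False` is supported, a list form can be approximated from outputs openjtalkjs already exposes. The sketch assumes `g2p` joins phonemes with single spaces (as in the example output in the Node API section) and that `runFrontend` nodes carry `string`, `pos`, and `pron` as listed in the `NJDNode` table. The kana rule (symbols with `pos` 記号 keep their surface form, other words use `pron`, and the `’` mark is stripped) mirrors pyopenjtalk's kana path as we understand it. `phonemeList`, `KanaFields`, and `kanaList` are hypothetical names:

```typescript
// Hypothetical helpers, not part of openjtalkjs.

// Phoneme list: split the space-joined output of g2p(text, { kana: false }).
function phonemeList(joined: string): string[] {
  return joined.split(" ").filter((p) => p.length > 0);
}

// Minimal subset of NJDNode fields used below (hypothetical local type).
interface KanaFields {
  string: string;
  pos: string;
  pron: string;
}

// Per-word kana list from runFrontend(text): symbols keep their surface form,
// other words use pron, and any "’" mark is removed.
function kanaList(nodes: KanaFields[]): string[] {
  return nodes.map((n) => (n.pos === "記号" ? n.string : n.pron).replace(/’/g, ""));
}
```

Joining `kanaList(nodes)` with `""` should then reproduce the `kana: true` string, if the assumed rule matches the bindings.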
