g2pk-nodejs
v0.1.0
Node.js port of g2pK for Korean grapheme-to-phoneme conversion.
g2pK for Node.js
Node.js port of g2pK, a Korean grapheme-to-phoneme library for converting written Korean into pronunciation-oriented output.
What is included
- Pure JavaScript Hangul decomposition, composition, and pronunciation rule processing
- Number spelling rules
- Idiom and pronunciation rule tables reused from the original project data files
- Optional English-to-Hangul conversion when a CMU pronunciation dictionary is available
- Optional morphology-aware annotation when you provide a POS analyzer
- Small CLI for quick local use
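Number spelling turns digits into Hangul so that the pronunciation rules can apply to them. The sketch below covers only the simplest case: reading a non-negative integer with Sino-Korean numerals. spellSinoKorean and its helpers are hypothetical names, not exports of g2pk-nodejs. The original g2pK also switches to native Korean numerals before certain counters (for example 시 in 12시), which this sketch does not attempt.

```javascript
// Hypothetical sketch, not the g2pk-nodejs API: spells a non-negative
// safe integer with Sino-Korean numerals.
const DIGITS = ["", "일", "이", "삼", "사", "오", "육", "칠", "팔", "구"];
const SMALL_UNITS = ["", "십", "백", "천"]; // places inside a 4-digit group
const BIG_UNITS = ["", "만", "억", "조"]; // one per 4-digit group (safe integers stay below 10^16)

// Spell 1..9999. A leading 일 is dropped before 십, 백, and 천 (10 -> 십).
function spellGroup(n) {
  const digits = String(n).padStart(4, "0");
  let out = "";
  for (let i = 0; i < 4; i++) {
    const d = Number(digits[i]);
    if (d === 0) continue;
    const unit = SMALL_UNITS[3 - i];
    out += (d === 1 && unit !== "" ? "" : DIGITS[d]) + unit;
  }
  return out;
}

function spellSinoKorean(n) {
  if (!Number.isSafeInteger(n) || n < 0) {
    throw new RangeError("expected a non-negative safe integer");
  }
  if (n === 0) return "영";
  let out = "";
  for (let groupIndex = 0; n > 0; groupIndex++) {
    const group = n % 10000;
    if (group > 0) {
      // 10000 is read 만, not 일만; 억 and 조 keep their 일.
      const words = group === 1 && groupIndex === 1 ? "" : spellGroup(group);
      out = words + BIG_UNITS[groupIndex] + out;
    }
    n = Math.floor(n / 10000);
  }
  return out;
}

console.log(spellSinoKorean(12)); // 십이
console.log(spellSinoKorean(2019)); // 이천십구
console.log(spellSinoKorean(10000)); // 만
console.log(spellSinoKorean(100000000)); // 일억
```

Grouping by four digits mirrors how Korean reads large numbers (만, 억, 조), so each group can be spelled independently and then suffixed with its big unit.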
Install
If you are working from a clone of this repository, install its dependencies:
npm install
To use the published package in another project:
npm install g2pk-nodejs
If you also want English word conversion, install a CMU pronouncing dictionary package in the consuming project as well:
npm install g2pk-nodejs cmu-pronouncing-dictionary
Quick usage
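const { G2p } = require("g2pk-nodejs");
const g2p = new G2p();
// Default options
console.log(g2p.convert("어제는 날씨가 맑았는데, 오늘은 흐리다."));
// Mixed Korean and English; English words are converted only when a CMU dictionary is available
console.log(g2p.convert("그 사람은 좀, old school 같아"));
// Merge closely related vowels such as ㅐ and ㅔ
console.log(g2p.convert("저는 예전에 그 얘기를 들은 적이 있습니다", { groupVowels: true }));
// Return jamo instead of assembled syllables
console.log(g2p.convert("어제는 날씨가 맑았는데, 오늘은 흐리다.", { toSyllables: false }));
Use in another project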
Install the package:
npm install g2pk-nodejs
Use it from your application:
const { G2p } = require("g2pk-nodejs");
const g2p = new G2p();
const output = g2p.convert("지금 시각은 12시 12분입니다");
console.log(output);
With a morphology analyzer
Context-sensitive behavior, such as the particle 의, bound nouns, and some verb endings, works best when you pass a POS analyzer. The analyzer interface is intentionally simple: it should expose a pos(text) method that returns an array of [token, tag] pairs or { token, tag } objects.
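Because pos(text) may return either shape, code that consumes it has to accept both. Here is a minimal sketch of that normalization, assuming only the interface just described; normalizeTokens is a hypothetical helper, not part of the g2pk-nodejs API.

```javascript
// Hypothetical helper, not the g2pk-nodejs API: calls analyzer.pos(text)
// and returns a uniform array of { token, tag } objects, accepting either
// [token, tag] pairs or { token, tag } objects from the analyzer.
function normalizeTokens(analyzer, text) {
  if (analyzer == null || typeof analyzer.pos !== "function") {
    throw new TypeError("morphAnalyzer must expose pos(text)");
  }
  return analyzer.pos(text).map((item) => {
    if (Array.isArray(item) && item.length >= 2) {
      return { token: String(item[0]), tag: String(item[1]) };
    }
    if (item && typeof item.token === "string" && typeof item.tag === "string") {
      return { token: item.token, tag: item.tag };
    }
    throw new TypeError(`Unexpected analyzer output: ${JSON.stringify(item)}`);
  });
}

// Two stub analyzers that return the same analysis in different shapes.
const pairAnalyzer = { pos: () => [["나", "NP"], ["의", "JKG"]] };
const objectAnalyzer = {
  pos: () => [
    { token: "나", tag: "NP" },
    { token: "의", tag: "JKG" },
  ],
};

console.log(normalizeTokens(pairAnalyzer, "나의"));
console.log(normalizeTokens(objectAnalyzer, "나의"));
```

Throwing on an unexpected item makes a misconfigured analyzer fail at the call site rather than somewhere later in the conversion.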
const { G2p } = require("g2pk-nodejs");
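const morphAnalyzer = {
// Stub for this one sentence: it ignores its input and returns fixed tags.
// In real use, call a Korean POS tagger here.
pos(text) {
return [
["나", "NP"],
["의", "JKG"], // JKG: genitive (possessive) particle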
["친구", "NNG"],
["는", "JX"],
["계산", "NNG"],
["이", "JKS"],
["아주", "MAG"],
["빠르다", "VA"]
];
}
};
const g2p = new G2p({ morphAnalyzer });
console.log(g2p.convert("나의 친구는 계산이 아주 빠르다", { descriptive: true }));
With an English pronunciation dictionary
If cmu-pronouncing-dictionary is installed, G2p will try to load it automatically. You can also inject your own dictionary:
const { G2p } = require("g2pk-nodejs");
const g2p = new G2p({
englishDict: {
old: "OW1 L D",
school: "S K UW1 L"
}
});
console.log(g2p.convert("그 사람은 좀, old school 같아"));
API
new G2p(options?)
Available constructor options:
- morphAnalyzer: optional analyzer with a pos(text) method
- englishDict: optional pronunciation dictionary
- logger: optional function used when verbose: true
- dataPath: optional override for rules.txt, idioms.txt, and table.csv
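englishDict uses the shape shown in the English dictionary example: lowercase words mapped to ARPAbet strings, where a digit on a vowel marks stress (0 none, 1 primary, 2 secondary). A converter typically splits the string into phones and drops the stress digits before mapping phones to Hangul. The sketch below shows that preprocessing under those assumptions; toPhones is a hypothetical helper, and how g2pk-nodejs consumes the dictionary internally may differ.

```javascript
// Hypothetical preprocessing step, not the g2pk-nodejs API: looks up a word
// in a CMU-style dictionary ({ word: "PH1 PH2 ..." }) and returns its
// ARPAbet phones with the stress digits (0, 1, 2) removed.
function toPhones(englishDict, word) {
  const key = word.toLowerCase();
  // Own-property check so keys like "constructor" are not found on Object.prototype.
  if (!Object.prototype.hasOwnProperty.call(englishDict, key)) return null;
  return englishDict[key]
    .trim()
    .split(/\s+/)
    .map((phone) => phone.replace(/[012]$/, ""));
}

const englishDict = {
  old: "OW1 L D",
  school: "S K UW1 L",
};

console.log(toPhones(englishDict, "Old")); // [ 'OW', 'L', 'D' ]
console.log(toPhones(englishDict, "school")); // [ 'S', 'K', 'UW', 'L' ]
console.log(toPhones(englishDict, "new")); // null
```

Returning null for unknown words leaves the fallback (keeping the word as-is, or spelling it out) to the caller.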
g2p.convert(text, options?)
Available conversion options:
- descriptive: return colloquial, pronunciation-oriented output
- groupVowels: merge vowels that are closely related in pronunciation (for example ㅐ and ㅔ)
- toSyllables: when false, return jamo instead of assembled syllables
- verbose: print intermediate rule applications (through logger, if one is provided)
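Both toSyllables: false and groupVowels work at the level of individual jamo. The following minimal sketch uses the standard Unicode Hangul arithmetic, syllable = 0xAC00 + (initial × 21 + medial) × 28 + final, to split syllables into conjoining jamo, merge a vowel pair, and reassemble the result. The helper names are hypothetical, and the grouping table here covers only ㅐ to ㅔ and ㅒ to ㅖ as an assumption; the library's own table may include more pairs.

```javascript
// Hypothetical helpers, not exports of g2pk-nodejs. They use the standard
// Unicode Hangul syllable arithmetic:
//   syllable = 0xAC00 + (initial * 21 + medial) * 28 + final
const S_BASE = 0xac00; // 가
const L_BASE = 0x1100; // first conjoining initial (ㄱ)
const V_BASE = 0x1161; // first conjoining medial (ㅏ)
const T_BASE = 0x11a7; // final index 0 means "no final"
const L_COUNT = 19, V_COUNT = 21, T_COUNT = 28;
const S_COUNT = L_COUNT * V_COUNT * T_COUNT; // 11172 syllables

// Split precomposed syllables into conjoining jamo; other characters pass through.
function decomposeHangul(text) {
  let out = "";
  for (const ch of text) {
    const s = ch.codePointAt(0) - S_BASE;
    if (s < 0 || s >= S_COUNT) {
      out += ch;
      continue;
    }
    const l = Math.floor(s / (V_COUNT * T_COUNT));
    const v = Math.floor((s % (V_COUNT * T_COUNT)) / T_COUNT);
    const t = s % T_COUNT;
    out += String.fromCodePoint(L_BASE + l, V_BASE + v);
    if (t > 0) out += String.fromCodePoint(T_BASE + t);
  }
  return out;
}

// Reassemble initial + medial (+ optional final) runs into syllables.
function composeHangul(jamo) {
  const cps = Array.from(jamo, (ch) => ch.codePointAt(0));
  let out = "";
  for (let i = 0; i < cps.length; i++) {
    const l = cps[i] - L_BASE;
    const v = i + 1 < cps.length ? cps[i + 1] - V_BASE : -1;
    if (l < 0 || l >= L_COUNT || v < 0 || v >= V_COUNT) {
      out += String.fromCodePoint(cps[i]);
      continue;
    }
    let t = i + 2 < cps.length ? cps[i + 2] - T_BASE : 0;
    if (t <= 0 || t >= T_COUNT) t = 0;
    out += String.fromCodePoint(S_BASE + (l * V_COUNT + v) * T_COUNT + t);
    i += t > 0 ? 2 : 1; // skip the medial (and final) just consumed
  }
  return out;
}

// Assumed subset of the vowel grouping: medial ㅐ -> ㅔ and ㅒ -> ㅖ.
const VOWEL_GROUPS = { "\u1162": "\u1166", "\u1164": "\u1168" };
function groupSimilarVowels(jamo) {
  return jamo.replace(/[\u1162\u1164]/g, (ch) => VOWEL_GROUPS[ch]);
}

const jamo = decomposeHangul("얘기");
console.log(jamo.length); // 4 conjoining jamo
console.log(composeHangul(jamo)); // 얘기
console.log(composeHangul(groupSimilarVowels(jamo))); // 예기
```

Because the intermediate jamo string is ordinary Unicode text, a rule such as vowel grouping can be a plain string replacement applied before the syllables are reassembled.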
CLI
node bin/g2pk.js "어제는 날씨가 맑았는데, 오늘은 흐리다."
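node bin/g2pk.js --descriptive "나의 친구는 계산이 아주 빠르다"
When the package is installed in a project, the bin is also available as:
npx g2pk "그 사람은 좀, old school 같아"
Without a local install, name the package explicitly so that npx does not look up a different package called g2pk:
npx --package=g2pk-nodejs g2pk "그 사람은 좀, old school 같아"
Publish to npm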
To publish this package, or your own fork of it, to the npm registry:
- Pick an available package name in package.json. If g2pk-nodejs is taken, switch to your own scope, such as @your-scope/g2pk.
- Update version, name, author, repository, and any other metadata you want to publish.
- Run the tests and build a local tarball so you can check exactly which files will be published:
npm test
npm pack
- Log in:
npm login
- Publish:
npm publish --access public
Scoped packages are published with restricted (private) access by default, so --access public is required when you first publish a scoped package publicly.
Notes
- The original Python source is still in the repo as reference data for the Node.js port.
- English conversion works best when a CMU pronunciation dictionary is present.
- Context-sensitive pronunciation works best when a morphology analyzer is supplied.
Reference
If you use g2pK in academic work, please cite the original project:
@misc{park2019g2pk,
author = {Park, Kyubyong},
title = {g2pK},
year = {2019},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/Kyubyong/g2pk}}
}