lyric-romanizer
v0.1.1
Script detection and local romanization engine for lyrics — supports Japanese, Chinese, Korean, Cyrillic, Indic scripts, Tamil, Thai, and more
Script detection and local romanization engine for lyrics. Supports 13 scripts across Japanese, Chinese, Korean, Cyrillic, Indic, Tamil, Thai, and Latin — all running locally with zero API calls.
Extracted from Spotify Karaoke. Used by OpenKara.
Installation
npm install lyric-romanizer
yarn add lyric-romanizer
pnpm add lyric-romanizer
Quick Start
import { createRomanizer, detectScript } from 'lyric-romanizer';
const romanizer = createRomanizer();
// Auto-detect script and romanize
const result = await romanizer.romanizeLines(['你好世界', '你好']);
// { script: 'chinese', lines: ['nǐ hǎo shì jiè', 'nǐ hǎo'] }
// Romanize a single line
const line = await romanizer.romanizeLine('안녕하세요');
// 'annyeonghaseyo'
API
Imports
// Main entry — full romanization engine
import {
  createRomanizer,
  detectScript,
  isLatinScript,
  requiresExternalRomanization,
  UnsupportedRomanizationError,
} from 'lyric-romanizer';
// Detector-only subpath: lightweight, no romanization dependencies
import { detectScript, isLatinScript, NON_LATIN_SCRIPT_RE } from 'lyric-romanizer/detector';
Types
type ScriptType =
  | 'japanese' | 'chinese' | 'korean' | 'cyrillic'
  | 'devanagari' | 'gujarati' | 'gurmukhi' | 'telugu'
  | 'kannada' | 'odia' | 'tamil' | 'malayalam'
  | 'bengali' | 'arabic' | 'hebrew' | 'thai'
  | 'latin' | 'other';
interface Romanizer {
  romanizeLine(line: string, options?: RomanizeOptions): Promise<string>;
  romanizeLines(lines: readonly string[], options?: RomanizeOptions): Promise<RomanizeResult>;
}
type RomanizeOptions = { script?: ScriptType };
type RomanizeResult = { script: ScriptType; lines: string[] };
type RomanizerOptions = { japaneseDictPath?: string };
Functions
createRomanizer(options?)
Factory that returns a Romanizer instance. The Kuroshiro engine (Japanese) is lazily initialized on first use and cached.
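The "lazily initialized on first use and cached" behavior is a common pattern that can be sketched in a few lines. This is an illustration only, not the package's internals: the helper name `lazyOnce` is hypothetical, and clearing the cache after a failed load is a design choice made for this example.

```typescript
// Hypothetical helper (not part of lyric-romanizer): wraps an async
// initializer so that it runs at most once and its result is reused.
function lazyOnce<T>(init: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined;
  return () => {
    if (cached === undefined) {
      // The first caller triggers the load; concurrent callers share the promise.
      cached = init().catch((err) => {
        cached = undefined; // assumption: allow a retry after a failed load
        throw err;
      });
    }
    return cached;
  };
}
```

Caching the promise rather than the resolved value means that two lines romanized at the same time do not trigger two dictionary loads.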
const romanizer = createRomanizer();
// Override the Kuromoji dictionary CDN path (e.g. for self-hosting)
const selfHostedRomanizer = createRomanizer({
  japaneseDictPath: 'https://my-cdn.com/kuromoji/dict',
});
detectScript(lines)
Detects the dominant script in the given text lines. Checks for Japanese kana first (definitive), then scores all other scripts by character count.
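The two-stage order described here (kana check first, then per-script character counts) can be sketched as follows. This is a minimal illustration, not the package's actual implementation: the names `sketchDetectScript`, `SketchScript`, and `RANGES` are hypothetical, and the table covers only a handful of approximate Unicode block ranges.

```typescript
// Hypothetical sketch; not the library's code. Ranges are approximate.
type SketchScript =
  | 'japanese' | 'chinese' | 'korean' | 'cyrillic' | 'thai' | 'latin' | 'other';

// [script, first code unit, last code unit]
const RANGES: Array<[SketchScript, number, number]> = [
  ['chinese', 0x4e00, 0x9fff],  // CJK Unified Ideographs
  ['korean', 0xac00, 0xd7af],   // Hangul Syllables
  ['cyrillic', 0x0400, 0x04ff], // Cyrillic
  ['thai', 0x0e00, 0x0e7f],     // Thai
  ['latin', 0x0041, 0x005a],    // A-Z
  ['latin', 0x0061, 0x007a],    // a-z
  ['latin', 0x00c0, 0x024f],    // accented and extended Latin (approximate)
];

function isKana(code: number): boolean {
  // Hiragana (U+3040..U+309F) and Katakana (U+30A0..U+30FF)
  return code >= 0x3040 && code <= 0x30ff;
}

function sketchDetectScript(lines: readonly string[]): SketchScript {
  const text = lines.join('\n');
  const counts: { [script: string]: number } = {};

  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    // Stage 1: any kana is treated as definitive for Japanese.
    if (isKana(code)) return 'japanese';
    for (const [script, lo, hi] of RANGES) {
      if (code >= lo && code <= hi) {
        counts[script] = (counts[script] || 0) + 1;
        break;
      }
    }
  }

  // Stage 2: pick the script with the highest character count.
  let best: SketchScript = 'other';
  let bestCount = 0;
  for (const [script] of RANGES) {
    const n = counts[script] || 0;
    if (n > bestCount) {
      best = script;
      bestCount = n;
    }
  }
  return best;
}
```

Because the kana check returns early, a line that mixes kanji and kana (for example '漢字とかな') is reported as 'japanese' in this sketch even though its kanji also fall in the range counted as Chinese.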
detectScript(['こんにちは']); // 'japanese'
detectScript(['你好世界']); // 'chinese'
detectScript(['Привет']); // 'cyrillic'
detectScript(['Hello world']); // 'latin'
detectScript(['123 ???']); // 'other'
isLatinScript(lines)
Fast check that returns true only if the text contains Latin letters and no characters from other scripts (CJK, Cyrillic, Indic, etc.). Text with no letters at all returns false. Useful for skipping romanization entirely.
isLatinScript(['Hello world']); // true
isLatinScript(['안녕하세요']); // false
isLatinScript(['♪♪♪']); // false (no letters)
requiresExternalRomanization(script)
Returns true for scripts that cannot be romanized locally and require an external API.
requiresExternalRomanization('chinese'); // false
requiresExternalRomanization('arabic'); // true
requiresExternalRomanization('malayalam'); // true
Romanizer Interface
romanizer.romanizeLine(line, options?)
Romanizes a single line. If options.script is omitted, the script is auto-detected via detectScript. Latin text and content with no letters are returned unchanged.
Throws UnsupportedRomanizationError for scripts where requiresExternalRomanization returns true.
await romanizer.romanizeLine('你好世界');
// 'nǐ hǎo shì jiè'
await romanizer.romanizeLine('Привет мир');
// 'Privet mir'
await romanizer.romanizeLine('Hello world');
// 'Hello world' (no-op)
await romanizer.romanizeLine('مرحبا');
// throws UnsupportedRomanizationError { script: 'arabic' }
romanizer.romanizeLines(lines, options?)
Romanizes multiple lines in parallel. Returns the detected script and romanized lines.
const { script, lines } = await romanizer.romanizeLines([
  'สวัสดี',
  'ชาวโลก',
]);
// { script: 'thai', lines: ['sawatdi', 'chaolok'] }
UnsupportedRomanizationError
Thrown when attempting to romanize a script that requires an external API. Has a script property for programmatic handling.
try {
  await romanizer.romanizeLine('مرحبا');
} catch (err) {
  if (err instanceof UnsupportedRomanizationError) {
    console.log(err.script); // 'arabic'
    // fall back to external API
  }
}
Supported Scripts
Local (fully offline)
| Script | Engine | Example |
|--------|--------|---------|
| Japanese | kuroshiro + kuromoji | こんにちは → konnichiha |
| Chinese | pinyin-pro | 你好 → nǐ hǎo |
| Korean | @romanize/korean | 안녕 → annyeong |
| Cyrillic | cyrillic-to-translit-js | Привет → Privet |
| Devanagari | sanscript | नमस्ते → namaste |
| Gujarati | sanscript | નમસ્તે → namaste |
| Gurmukhi | sanscript | ਨਮਸਤੇ → namaste |
| Telugu | sanscript | నమస్తే → namaste |
| Kannada | sanscript | ನಮಸ್ತೆ → namaste |
| Odia | sanscript | ନମସ୍ତେ → namaste |
| Tamil | tamil-romanizer | வணக்கம் → vanakkam |
| Thai | @dehoist/romanize-thai | สวัสดี → sawatdi |
| Latin | (no-op) | Hello → Hello |
External (requires API)
malayalam, bengali, arabic, hebrew, and other. Use requiresExternalRomanization() to detect these and branch to your preferred API.
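One way to structure that branch is sketched below. To keep the example compilable on its own, the package functions are passed in as dependencies instead of being imported, and the types are simplified local copies of the documented ones. The names `romanizeWithFallback`, `FallbackDeps`, and the `external` callback are hypothetical; `external` stands in for whichever API you choose.

```typescript
// Simplified local stand-ins for the documented types (assumption: plain strings).
type ScriptName = string;
type BatchResult = { script: ScriptName; lines: string[] };

interface FallbackDeps {
  detectScript: (lines: readonly string[]) => ScriptName;
  requiresExternalRomanization: (script: ScriptName) => boolean;
  romanizer: {
    romanizeLines(
      lines: readonly string[],
      options?: { script?: ScriptName },
    ): Promise<BatchResult>;
  };
  // Hypothetical callback wrapping your preferred external API.
  external: (lines: readonly string[], script: ScriptName) => Promise<string[]>;
}

async function romanizeWithFallback(
  lines: readonly string[],
  deps: FallbackDeps,
): Promise<BatchResult & { source: 'local' | 'external' }> {
  // Detect once, then decide which path handles the whole batch.
  const script = deps.detectScript(lines);
  if (deps.requiresExternalRomanization(script)) {
    const remote = await deps.external(lines, script);
    return { script, lines: remote, source: 'external' };
  }
  const local = await deps.romanizer.romanizeLines(lines, { script });
  return { script: local.script, lines: local.lines, source: 'local' };
}
```

With the real package, `detectScript`, `requiresExternalRomanization`, and the result of `createRomanizer()` would be passed in as `deps`. Checking up front avoids relying on a thrown UnsupportedRomanizationError for control flow.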
Cyrillic Detection
Cyrillic auto-detects Ukrainian-specific characters (і, ї, є, ґ) and applies the Ukrainian transliteration preset. All other Cyrillic text defaults to Russian.
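The preset choice described above amounts to a single character-class test. The sketch below is illustrative and not taken from the package: `pickCyrillicPreset` and the 'uk' / 'ru' labels are hypothetical names, and matching the uppercase forms as well is an assumption.

```typescript
// Hypothetical sketch of the preset choice; not the library's code.
type CyrillicPreset = 'uk' | 'ru';

// Letters used in Ukrainian but not in Russian: і, ї, є, ґ.
// Uppercase forms (І, Ї, Є, Ґ) are included as an assumption.
const UKRAINIAN_ONLY = /[\u0456\u0457\u0454\u0491\u0406\u0407\u0404\u0490]/;

function pickCyrillicPreset(text: string): CyrillicPreset {
  // Any Ukrainian-specific letter selects the Ukrainian preset;
  // everything else falls back to Russian.
  return UKRAINIAN_ONLY.test(text) ? 'uk' : 'ru';
}
```

A consequence of this kind of rule is that other Cyrillic-script languages are transliterated with Russian rules unless they happen to contain one of the four letters.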
License
MIT
