@cloudrumbles/indic-transliterate
v1.2.2
High-accuracy transliteration for 21 Indian languages using AI4Bharat's IndicXlit models via ONNX.
Node.js transliteration for 21 Indian languages using AI4Bharat's IndicXlit models via ONNX Runtime.
This is an unofficial port. Models (~47MB) are bundled with the package.
Install
npm install @cloudrumbles/indic-transliterate
Usage
import { IndicTransliterator } from '@cloudrumbles/indic-transliterate'
const transliterator = new IndicTransliterator()
// Tamil
const tamil = await transliterator.transliterate('amma', 'ta')
// => ['அம்மா', 'அம்ம', 'ஆம்மா', ...]
// Hindi
const hindi = await transliterator.transliterate('namaste', 'hi')
// => ['नमस्ते', 'नमस्ती', ...]
// Get more candidates
const results = await transliterator.transliterate('chennai', 'ta', 10)
// With rescoring (improves accuracy by ~6%)
const t = new IndicTransliterator({ rescore: true })
const rescored = await t.transliterate('amma', 'ta', 4)
// => ['அம்மா', 'ஆம்மா', 'அம்ம', 'ஆமா']
// Dictionary auto-downloads on first use (~200MB per language)
API
new IndicTransliterator(options?)
Creates a new transliterator instance. Models are loaded lazily on first use.
const transliterator = new IndicTransliterator({
  beamWidth: 4,       // beam search width (default: 4)
  maxLen: 20,         // max output length (default: 20)
  rescore: true,      // use word frequency dict for better accuracy (default: false)
  rescoreAlpha: 0.9,  // model vs dict weight (default: 0.9)
  modelPath: '...'    // custom model path (default: bundled)
})
transliterate(word, langCode, count?): Promise<string[]>
Transliterates a romanized word to the target script. Returns count candidates (default 5) ranked by likelihood.
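Because transliterate() works on one word at a time, longer input has to be split first. The following is a minimal sketch of one way to do that, not part of the package: transliterateSentence is a hypothetical helper that splits on whitespace and keeps the top-ranked candidate for each word. It assumes only the transliterate(word, langCode, count) signature documented above, and it does no punctuation handling.

```javascript
// Hypothetical helper (not part of the package): transliterate a sentence
// word by word, keeping the top-ranked candidate for each word.
// `transliterator` is any object exposing transliterate(word, langCode, count),
// for example an IndicTransliterator instance.
async function transliterateSentence(transliterator, sentence, langCode) {
  // Split on runs of whitespace and drop empty pieces.
  const words = sentence.trim().split(/\s+/).filter(Boolean)
  const out = []
  for (const word of words) {
    // Request a single candidate; fall back to the input word if none is returned.
    const [best] = await transliterator.transliterate(word, langCode, 1)
    out.push(best ?? word)
  }
  return out.join(' ')
}
```

Words are processed sequentially here to keep the sketch simple. Whether concurrent calls on one instance are safe is not stated in this README, so that is left untested.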
getSupportedLanguages(): string[]
Returns array of supported language codes.
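How the package reacts to an unknown language code is not documented here, so it can be useful to check the code up front. A possible guard, sketched under that assumption: transliterateChecked is a hypothetical wrapper (not part of the package) that relies only on getSupportedLanguages() and transliterate() as described above.

```javascript
// Hypothetical guard (not part of the package): reject unknown language codes
// before calling transliterate().
async function transliterateChecked(transliterator, word, langCode, count = 5) {
  const supported = transliterator.getSupportedLanguages()
  if (!supported.includes(langCode)) {
    throw new RangeError(
      `Unsupported language code "${langCode}". Expected one of: ${supported.join(', ')}`
    )
  }
  return transliterator.transliterate(word, langCode, count)
}
```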
initialize(): Promise<void>
Pre-loads the ONNX models. Optional - models load automatically on first transliterate() call.
dispose(): Promise<void>
Releases the ONNX sessions to free memory. Call it when you are finished with the instance.
isInitialized: boolean
Whether models are currently loaded.
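The lifecycle members above (initialize(), dispose(), isInitialized) can be combined so that sessions are always released, even when an error occurs. This is a sketch of one such pattern, assuming only those three members: withTransliterator is a hypothetical helper, not something the package exports.

```javascript
// Hypothetical helper (not part of the package): preload the models, run some
// work, and always release the ONNX sessions afterwards.
async function withTransliterator(transliterator, work) {
  // Load the models up front unless they are already loaded.
  if (!transliterator.isInitialized) {
    await transliterator.initialize()
  }
  try {
    return await work(transliterator)
  } finally {
    // Runs on success and on failure, so memory is freed either way.
    await transliterator.dispose()
  }
}
```

A call might look like withTransliterator(new IndicTransliterator(), (t) => t.transliterate('amma', 'ta')). For a long-running server, keeping one instance alive and disposing of it at shutdown is likely the better fit, since reloading models on every call would be wasteful.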
Supported Languages
| Code | Language | Code | Language | Code | Language |
|------|----------|------|----------|------|----------|
| as | Assamese | hi | Hindi | ne | Nepali |
| bn | Bengali | kn | Kannada | or | Odia |
| brx | Bodo | ks | Kashmiri | pa | Punjabi |
| gom | Konkani | mai | Maithili | sa | Sanskrit |
| gu | Gujarati | ml | Malayalam | sd | Sindhi |
| mr | Marathi | mni | Manipuri | si | Sinhala |
| ta | Tamil | te | Telugu | ur | Urdu |
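For user interfaces that need display names, the table above can be mirrored as a lookup. The sketch below is illustrative only: LANGUAGE_NAMES and languageName are hypothetical names, and the data is copied from the table rather than read from the package.

```javascript
// Hypothetical lookup built from the table above (not exported by the package).
const LANGUAGE_NAMES = new Map([
  ['as', 'Assamese'], ['bn', 'Bengali'], ['brx', 'Bodo'],
  ['gom', 'Konkani'], ['gu', 'Gujarati'], ['hi', 'Hindi'],
  ['kn', 'Kannada'], ['ks', 'Kashmiri'], ['mai', 'Maithili'],
  ['ml', 'Malayalam'], ['mni', 'Manipuri'], ['mr', 'Marathi'],
  ['ne', 'Nepali'], ['or', 'Odia'], ['pa', 'Punjabi'],
  ['sa', 'Sanskrit'], ['sd', 'Sindhi'], ['si', 'Sinhala'],
  ['ta', 'Tamil'], ['te', 'Telugu'], ['ur', 'Urdu']
])

// Return the display name for a code, or the code itself if it is unknown.
function languageName(code) {
  return LANGUAGE_NAMES.get(code) ?? code
}
```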
Credits
Models and vocabulary from IndicXlit by AI4Bharat (MIT License).
@article{Madhani2022AksharantarTB,
title={Aksharantar: Towards building open transliteration tools for the next billion users},
author={Yash Madhani and Sushane Parthan and Priyanka A. Bedekar and Ruchi Khapra and Vivek Seshadri and Anoop Kunchukuttan and Pratyush Kumar and Mitesh M. Khapra},
journal={ArXiv},
year={2022},
volume={abs/2205.03018}
}
License
MIT
