@suhasdissa/singlish
v1.0.0
Sinhala to Singlish transliterator - Convert Sinhala Unicode text to romanized Singlish
Singlish
A comprehensive Sinhala to Singlish (romanized Sinhala) transliterator for Node.js.
Overview
This package converts Sinhala Unicode text (U+0D80 – U+0DFF) to romanized Singlish text, making it easier to read and type Sinhala using Latin characters.
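As a rough illustration of the range mentioned above, the following sketch checks whether a string contains any code point in the Sinhala block. The helpers `isSinhalaCodePoint` and `containsSinhala` are hypothetical names used only for this example; they are not part of the package's API.

```javascript
// Hypothetical helpers (not package exports): detect Sinhala block characters.
function isSinhalaCodePoint(codePoint) {
  // The Sinhala Unicode block spans U+0D80 through U+0DFF.
  return codePoint >= 0x0d80 && codePoint <= 0x0dff;
}

function containsSinhala(text) {
  // for...of iterates by code point, not by UTF-16 code unit.
  for (const ch of text) {
    if (isSinhalaCodePoint(ch.codePointAt(0))) return true;
  }
  return false;
}

console.log(containsSinhala('Hello'));        // false
console.log(containsSinhala('\u0DB8\u0DB8')); // true (මම)
```

A check like this could be used to skip transliteration for text that contains no Sinhala at all.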
Features
- Comprehensive character support: Handles all Sinhala vowels, consonants, vowel signs, and special characters
- Accurate transliteration: Properly handles consonant clusters, gemination, and complex linguistic features
- Zero-Width Joiner handling: Automatically removes invisible formatting characters
- Flexible options: Configurable output with lowercase and space preservation options
- Type-safe: Handles invalid inputs gracefully
- Well-tested: 132 comprehensive test cases covering edge cases
- No dependencies: Lightweight and standalone
Installation
npm install singlish
Usage
Basic Usage
const { singlish } = require('singlish');
// Simple transliteration
const result = singlish('සිංහල');
console.log(result); // Output: "sinhala"
// More examples
console.log(singlish('ආයුබෝවන්')); // "aayuboowan"
console.log(singlish('කොළඹ')); // "kolamba"
console.log(singlish('ශ්රී ලංකා')); // "shrii lankaa"
With Options
const { singlish } = require('singlish');
// Convert to lowercase
const lowercase = singlish('සිංහල', { lowercase: true });
console.log(lowercase); // "sinhala"
// Control space preservation
const text = 'මම  කන්නේ   බත්'; // Multiple spaces between words
const result = singlish(text, { preserveSpaces: false });
console.log(result); // "mama kannee bath" (normalized spaces)
Detailed Transliteration
For debugging or analysis purposes, you can get a character-by-character breakdown:
const { singlishDetailed } = require('singlish');
const detailed = singlishDetailed('සිංහල');
console.log(detailed);
// {
//   result: "sinhala",
//   breakdown: [
//     { original: "සි", transliterated: "si", position: 0 },
//     { original: "ං", transliterated: "n", position: 2 },
//     { original: "හ", transliterated: "ha", position: 3 },
//     { original: "ල", transliterated: "la", position: 4 }
//   ]
// }
API Reference
singlish(text, options)
Transliterates Sinhala text to Singlish.
Parameters:
- text (string): The Sinhala text to transliterate
- options (object, optional):
  - preserveSpaces (boolean, default: true): Whether to preserve multiple spaces
  - lowercase (boolean, default: false): Whether to convert output to lowercase
Returns: string - The romanized Singlish text
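To make the two options concrete, here is a minimal sketch of how such options could be applied to an already romanized string. `applyOptions` is a hypothetical helper written for this illustration; the package's internal handling may differ, and only the option names and defaults are taken from the parameter list above.

```javascript
// Hypothetical post-processing step, not the package's actual implementation.
function applyOptions(romanized, options = {}) {
  // Defaults follow the documented ones: preserveSpaces true, lowercase false.
  const { preserveSpaces = true, lowercase = false } = options;
  let out = romanized;
  if (!preserveSpaces) {
    // Collapse runs of two or more spaces into a single space.
    out = out.replace(/ {2,}/g, ' ');
  }
  if (lowercase) {
    out = out.toLowerCase();
  }
  return out;
}

console.log(applyOptions('mama  kannee   bath', { preserveSpaces: false })); // "mama kannee bath"
console.log(applyOptions('mama  kannee'));                                   // "mama  kannee" (unchanged)
```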
Example:
singlish('ආයුබෝවන්', { lowercase: true })
// Returns: "aayuboowan"
singlishDetailed(text)
Provides detailed transliteration with character-by-character breakdown.
Parameters:
text(string): The Sinhala text to transliterate
Returns: object - An object containing:
- result (string): The transliterated text
- breakdown (array): Array of objects with original, transliterated, and position properties
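The sketch below shows one way to consume this return shape. It does not call the package; the object literal is copied from the සිංහල sample in the Detailed Transliteration section. In that sample the transliterated pieces join back into result and each position matches the running UTF-16 offset of original; whether both properties hold for every input is an assumption, not something this README states.

```javascript
// Sample copied from the README output for සිංහල (escapes used for clarity).
const detailed = {
  result: 'sinhala',
  breakdown: [
    { original: '\u0DC3\u0DD2', transliterated: 'si', position: 0 }, // සි
    { original: '\u0D82', transliterated: 'n', position: 2 },        // ං
    { original: '\u0DC4', transliterated: 'ha', position: 3 },       // හ
    { original: '\u0DBD', transliterated: 'la', position: 4 },       // ල
  ],
};

// Joining the pieces reproduces the full result for this sample.
const rejoined = detailed.breakdown.map((e) => e.transliterated).join('');
console.log(rejoined === detailed.result); // true

// Each position equals the summed length of the preceding originals.
let offset = 0;
const positionsMatch = detailed.breakdown.every((e) => {
  const ok = e.position === offset;
  offset += e.original.length;
  return ok;
});
console.log(positionsMatch); // true
```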
Example:
const details = singlishDetailed('මම');
// Returns:
// {
//   result: "mama",
//   breakdown: [
//     { original: "ම", transliterated: "ma", position: 0 },
//     { original: "ම", transliterated: "ma", position: 1 }
//   ]
// }
Transliteration Mapping
Vowels (Independent)
| Sinhala | Singlish | Unicode |
|---------|----------|---------|
| අ | a | U+0D85 |
| ආ | aa | U+0D86 |
| ඇ | ae | U+0D87 |
| ඈ | aae | U+0D88 |
| ඉ | i | U+0D89 |
| ඊ | ii | U+0D8A |
| උ | u | U+0D8B |
| ඌ | uu | U+0D8C |
| එ | e | U+0D91 |
| ඒ | ee | U+0D92 |
| ඔ | o | U+0D94 |
| ඕ | oo | U+0D95 |
Consonants (with inherent 'a')
| Sinhala | Singlish | Unicode |
|---------|----------|---------|
| ක | ka | U+0D9A |
| ග | ga | U+0D9C |
| ච | cha | U+0DA0 |
| ජ | ja | U+0DA2 |
| ට | ta | U+0DA7 |
| ඩ | da | U+0DA9 |
| ත | tha | U+0DAD |
| ද | dha | U+0DAF |
| න | na | U+0DB1 |
| ප | pa | U+0DB4 |
| බ | ba | U+0DB6 |
| ම | ma | U+0DB8 |
| ය | ya | U+0DBA |
| ර | ra | U+0DBB |
| ල | la | U+0DBD |
| ව | wa | U+0DC0 |
| ස | sa | U+0DC3 |
| හ | ha | U+0DC4 |
Vowel Signs (Dependent)
| Sinhala | Singlish | Unicode |
|---------|----------|---------|
| ා | aa | U+0DCF |
| ැ | ae | U+0DD0 |
| ෑ | aae | U+0DD1 |
| ි | i | U+0DD2 |
| ී | ii | U+0DD3 |
| ු | u | U+0DD4 |
| ූ | uu | U+0DD6 |
| ෙ | e | U+0DD9 |
| ේ | ee | U+0DDA |
| ො | o | U+0DDC |
| ෝ | oo | U+0DDD |
Special Characters
| Sinhala | Singlish | Description | Unicode |
|---------|----------|-------------|---------|
| ං | n | Anusvara | U+0D82 |
| ඃ | h | Visarga | U+0D83 |
| ් | - | Halanta (removes inherent vowel) | U+0DCA |
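One way to picture these tables in code is as per-category lookup maps. The sketch below encodes only a few rows from each table; the `TABLES` structure and the `classify` and `codePointLabel` helpers are hypothetical and are not claimed to mirror how the package stores its mappings.

```javascript
// A few rows from each mapping table, keyed by Sinhala character.
// Structure and helper names are illustrative assumptions.
const TABLES = {
  vowel: new Map([['\u0D85', 'a'], ['\u0D86', 'aa'], ['\u0D89', 'i']]),       // අ ආ ඉ
  consonant: new Map([['\u0D9A', 'ka'], ['\u0DB8', 'ma'], ['\u0DC3', 'sa']]), // ක ම ස
  vowelSign: new Map([['\u0DCF', 'aa'], ['\u0DD2', 'i'], ['\u0DDA', 'ee']]),  // signs for aa, i, ee
  special: new Map([['\u0D82', 'n'], ['\u0D83', 'h']]),                       // anusvara, visarga
};

// Find which table a character belongs to and its romanized form.
function classify(ch) {
  for (const [category, table] of Object.entries(TABLES)) {
    if (table.has(ch)) return { category, roman: table.get(ch) };
  }
  return { category: 'other', roman: ch };
}

// Format a character's code point the way the Unicode column does.
function codePointLabel(ch) {
  return 'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0');
}

console.log(codePointLabel('\u0DB8'), classify('\u0DB8'));
// U+0DB8 { category: 'consonant', roman: 'ma' }
```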
Examples
Common Words
singlish('අම්මා') // "ammaa" (mother)
singlish('තාත්තා') // "thaaththaa" (father)
singlish('ගෙදර') // "gedhara" (home)
singlish('පාසල') // "paasala" (school)
Sentences
singlish('සුභ දවසක්') // "subha dhawasak" (good day)
singlish('මම කන්නේ බත්') // "mama kannee bath" (I eat rice)
Mixed Content
The transliterator preserves non-Sinhala characters:
singlish('Hello මගේ නම John')
// "Hello magee nama John"
singlish('මම 10 යි')
// "mama 10 yi"
How It Works
- Input Processing: Removes Zero-Width Joiner (ZWJ) and other invisible formatting characters
- Character Analysis: Processes text character by character, identifying vowels, consonants, vowel signs, and special characters
- Smart Handling:
  - Consonants with halanta (්) remove the inherent 'a' sound
  - Consonants with vowel signs replace the inherent 'a' with the appropriate vowel
  - Standalone consonants retain the inherent 'a'
- Post-Processing: Cleans up any duplicate characters and normalizes spacing
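The steps above can be sketched in a few lines. This is a simplified illustration built from a small subset of the mapping tables, not the package's source: `transliterateSketch` is a hypothetical name, consonants are stored without the inherent 'a' for convenience, only U+200D is stripped, and the post-processing step is omitted.

```javascript
// Simplified sketch of the described algorithm; covers only a few characters.
const CONSONANT_BASES = new Map([
  ['\u0D9A', 'k'], ['\u0DAD', 'th'], ['\u0DB1', 'n'], ['\u0DB6', 'b'],
  ['\u0DB8', 'm'], ['\u0DBD', 'l'], ['\u0DC3', 's'], ['\u0DC4', 'h'],
]);
const VOWEL_SIGNS = new Map([['\u0DCF', 'aa'], ['\u0DD2', 'i'], ['\u0DDA', 'ee']]);
const STANDALONE = new Map([['\u0D85', 'a'], ['\u0D86', 'aa'], ['\u0D82', 'n'], ['\u0D83', 'h']]);
const HALANTA = '\u0DCA';

function transliterateSketch(text) {
  // Input processing: drop the Zero-Width Joiner.
  const chars = Array.from(text.replace(/\u200D/g, ''));
  let out = '';
  for (let i = 0; i < chars.length; i++) {
    const base = CONSONANT_BASES.get(chars[i]);
    if (base === undefined) {
      // Vowels and special signs are mapped; anything else passes through.
      out += STANDALONE.get(chars[i]) ?? chars[i];
      continue;
    }
    const next = chars[i + 1];
    if (next === HALANTA) {
      out += base;                         // halanta removes the inherent 'a'
      i++;
    } else if (VOWEL_SIGNS.has(next)) {
      out += base + VOWEL_SIGNS.get(next); // vowel sign replaces the inherent 'a'
      i++;
    } else {
      out += base + 'a';                   // standalone consonant keeps 'a'
    }
  }
  return out;
}

// මම කන්නේ බත්
console.log(transliterateSketch('\u0DB8\u0DB8 \u0D9A\u0DB1\u0DCA\u0DB1\u0DDA \u0DB6\u0DAD\u0DCA'));
// "mama kannee bath"
```

Looking one character ahead keeps the loop to a single pass; characters missing from the small maps above simply pass through unchanged, which is also how non-Sinhala text such as Latin letters and digits is preserved in this sketch.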
Testing
The package includes 132 comprehensive test cases covering:
- Independent vowels
- Consonants with inherent vowels
- Consonants with halanta (virama)
- Consonants with vowel signs
- Special characters (anusvara, visarga)
- Common Sinhala words
- Sentences and phrases
- Consonant clusters
- Edge cases (empty strings, null values, non-Sinhala text)
- Real-world examples
- Unicode edge cases
Run tests with:
npm test
Limitations
- The transliteration is phonetic and may not always match exact pronunciation
- Some consonant clusters produce multiple consonant letters in sequence (e.g., බුද්ධ → "budhdhha")
- The package focuses on standard Sinhala; archaic or rarely-used characters may not be fully supported
Contributing
Contributions are welcome! Please feel free to submit issues or pull requests.
License
MIT
Author
SuhasDissa
Keywords
sinhala, singlish, transliteration, romanization, sri lanka, unicode, language
