tamil-romanizer

v1.0.2

Published

a month ago

A robust, context-aware rule-based Tamil-to-English romanization library

0High
0Medium
0Low

tamil tamil_extended tamil_superscripted phonetic romanizer romanization transliteration indic-scripts nlp unicode iso15919 ala-lc practical

tamil-romanizer

A completely context-aware, highly accurate Tamil-to-English romanization library for Node.js and the browser.

Unlike naive character-replacement scripts that turn "சிங்கம்" into cinkam, tamil-romanizer understands Tamil phonology. It natively handles intervocalic softening, post-nasal voicing, and word-boundaries to produce natural, readable Tanglish (e.g., "singam").

It is fast, rigorously tested (100% ISO compliance), and built for real-world text.

Live Demo

Try the engine instantly in your browser: Tamil Romanizer Live Demo

Demo GIF

Why this library?

Most Tamil transliteration tools fail because they treat the language as a 1-to-1 character map. Tamil doesn't work that way. tamil-romanizer analyzes the context of every letter:

| Tamil Input | Naive approach | tamil-romanizer | Why? | |-------------|----------------|-------------------|------| | பம்பரம் | pamparam | pambaram | Identifies word-initial p vs post-nasal b | | சட்டம் | satam | sattam | Detects geminate (double) consonant clusters | | ஞானம் | nyanam | gnanam | Uses practical Tanglish conventions for word-initials | | ஃபேன் | akpaen | fan | Analyzes Aytham lookaheads and cross-references an internal proper-noun dictionary |

Installation

npm install tamil-romanizer

Quick Start

import { romanize } from 'tamil-romanizer';

// 1. Basic usage maps to highly accurate practical phonetics
const text = romanize("தமிழ்நாடு");
console.log(text); // "tamilnadu" (detected via built-in dictionary)

const text2 = romanize("பம்பரம்");
console.log(text2); // "pambaram" (context-aware mapping)

Advanced Options

Provide an options object as the second argument to control the output format, scheme, or dictionary usage.

1. Capitalization Formatting

Romanize targets English letters (which have case), while Tamil does not. You can enforce casing rules natively:

const sentence = "சென்னை ஒரு அழகான நகரம்";

console.log(romanize(sentence)); 
// "chennai oru azhagana nagaram" (Default: 'none' - strict lowercase)

console.log(romanize(sentence, { capitalize: 'sentence' })); 
// "Chennai oru azhagana nagaram"

console.log(romanize(sentence, { capitalize: 'words' })); 
// "Chennai Oru Azhagana Nagaram"

2. Scholarly Translating (ISO 15919)

If you are building an academic tool or require strict, lossless character-level transliteration, use the iso15919 scheme.

// ISO 15919 enforces direct diacritic mapping without contextual softening
const text = romanize("பம்பரம்", { scheme: 'iso15919', exceptions: false });
console.log(text); // "pamparam"

const strict = romanize("தமிழ்", { scheme: 'iso15919' });
console.log(strict); // "tamiḻ"

(Also supports ala-lc schema via { scheme: 'ala-lc' })

3. Turning off the Exception Dictionary

The library ships with a fast exception trie that automatically corrects common loan words and proper nouns (e.g. பஸ் -> bus, சென்னை -> Chennai).

If you want the raw, algorithmic output of the underlying state machine, disable the exceptions flag:

// With dictionary (Default)
romanize("பஸ்"); // "bus"

// Algorithmic output
romanize("பஸ்", { exceptions: false }); // "bas"

Mixed-language Safe

Don't worry about sanitizing your inputs. If you pass a string containing English, numbers, emojis, or punctuation, tamil-romanizer surgically transliterates only the Tamil characters and leaves everything else perfectly intact.

const mixed = "The ticket price is ௫௦௦ rupees (ரூபாய்) 🤯!";
console.log(romanize(mixed)); 
// "The ticket price is 500 rupees (roobaay) 🤯!"

(Notice how it also safely converts native Tamil numerals natively!)

API Reference

romanize(text: string, options?: Object) => string

| Option | Type | Default | Description | |---|---|---|---| | scheme | 'practical' \| 'iso15919' \| 'ala-lc' | 'practical' | Determines the transliteration ruleset. | | exceptions | boolean | true | Enables/disables the internal dictionary for loan words. | | capitalize | 'none' \| 'sentence' \| 'words' | 'none' | Controls the casing of the returned string. |

Built for Tamil by Harold Alan.

Published

Vulnerabilities

Links

Maintainers

Keywords

Readme

tamil-romanizer

Live Demo

Why this library?

Installation

Quick Start

Advanced Options

1. Capitalization Formatting

2. Scholarly Translating (ISO 15919)

3. Turning off the Exception Dictionary

Mixed-language Safe

API Reference