amharic-normalizer
v1.0.2
Normalize Amharic text by mapping phonetically equivalent characters (homophones) to a canonical form. Useful for search, NLP, games, and text processing.
Amharic Normalizer
Normalize Amharic text by mapping phonetically equivalent characters (homophones) to a canonical form.
Why?
Amharic script has multiple characters that represent the same sound. For example:
| Sound | Characters | Normalized |
|-------|------------|------------|
| /h/ | ሀ ሐ ኀ | ሀ |
| /s/ | ሰ ሠ | ሰ |
| /ʔ/ | አ ዐ | አ |
| /ts'/ | ጸ ፀ | ጸ |
This causes problems when:
- Searching text - "ሰላም" won't match "ሠላም"
- Comparing strings - Two identical-sounding words appear different
- NLP preprocessing - Models treat homophones as different tokens
- Building games/quizzes - User input might use different character variants
This package solves these problems by normalizing all homophones to a single canonical form.
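The search problem can be sketched as follows. To keep the snippet runnable on its own, `normalize` is a small local stand-in that covers only the first-order letters of each family; with the package installed you would import `normalizeAmharic` instead. The `matches` helper and the sample strings are hypothetical and only for illustration.

```typescript
// Local stand-in (assumption): covers only the base letters of each family.
// With the package installed, use normalizeAmharic from "amharic-normalizer".
const BASE_FORMS: Record<string, string> = {
  "ሐ": "ሀ",
  "ኀ": "ሀ",
  "ሠ": "ሰ",
  "ዐ": "አ",
  "ፀ": "ጸ",
};

// Replace each Ethiopic character that has an entry; leave the rest as-is.
const normalize = (text: string): string =>
  text.replace(/[\u1200-\u137F]/g, (ch) => BASE_FORMS[ch] ?? ch);

// Hypothetical helper: normalize both sides before the substring test.
function matches(haystack: string, query: string): boolean {
  return normalize(haystack).indexOf(normalize(query)) !== -1;
}

const entries = ["ሠላም", "ሐበሻ"];
const hits = entries.filter((entry) => matches(entry, "ሰላም"));
console.log(hits.length); // 1
```

Normalizing both the stored text and the query means neither side has to know which variant the other used.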
Installation
npm install amharic-normalizer
Usage
import { normalizeAmharic } from "amharic-normalizer";
// Normalize text
normalizeAmharic("ሠላም"); // → "ሰላም"
normalizeAmharic("ሐበሻ"); // → "ሀበሻ"
normalizeAmharic("ዐማርኛ"); // → "አማርኛ"
// Mixed text works too
normalizeAmharic("Hello ሠላም!"); // → "Hello ሰላም!"
// Compare normalized strings
const input1 = normalizeAmharic("ሰላም");
const input2 = normalizeAmharic("ሠላም");
console.log(input1 === input2); // true
Access the Normalization Map
import { AMHARIC_NORMALIZATION_MAP } from "amharic-normalizer";
// Check if a character has a normalized form
console.log(AMHARIC_NORMALIZATION_MAP["ሠ"]); // "ሰ"
console.log(AMHARIC_NORMALIZATION_MAP["ሰ"]); // "ሰ"
Supported Character Families
| Family | Variants | Canonical | Description |
|--------|----------|-----------|-------------|
| H | ሀ ሐ ኀ (+ all vowel forms) | ሀ series | Glottal fricative /h/ |
| S | ሰ ሠ (+ all vowel forms) | ሰ series | Voiceless alveolar fricative /s/ |
| A | አ ዐ (+ all vowel forms) | አ series | Glottal stop /ʔ/ |
| TS | ጸ ፀ (+ all vowel forms) | ጸ series | Ejective alveolar affricate /ts'/ |
Each family includes all 7 vowel forms (ə, u, i, a, e, ɨ, o).
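One way such a mapping could be generated, shown here only as a sketch and not as the package's actual implementation, relies on the Unicode Ethiopic block storing the seven vowel orders of each series at consecutive code points. The names `FAMILIES` and `buildHomophoneMap` are hypothetical. Unlike the package's exported map, this sketch stores no identity entries for canonical characters.

```typescript
// First-order code points in the Unicode Ethiopic block.
const FAMILIES: Array<{ canonical: number; variants: number[] }> = [
  { canonical: 0x1200, variants: [0x1210, 0x1280] }, // ሀ <- ሐ, ኀ
  { canonical: 0x1230, variants: [0x1220] }, // ሰ <- ሠ
  { canonical: 0x12a0, variants: [0x12d0] }, // አ <- ዐ
  { canonical: 0x1338, variants: [0x1340] }, // ጸ <- ፀ
];

const VOWEL_ORDERS = 7;

// Hypothetical helper: expand every variant series across all seven orders.
function buildHomophoneMap(): Record<string, string> {
  const map: Record<string, string> = {};
  for (const { canonical, variants } of FAMILIES) {
    for (let order = 0; order < VOWEL_ORDERS; order++) {
      const target = String.fromCharCode(canonical + order);
      for (const variant of variants) {
        map[String.fromCharCode(variant + order)] = target;
      }
    }
  }
  return map;
}

const homophoneMap = buildHomophoneMap();
console.log(homophoneMap["ሡ"]); // "ሱ"
console.log(Object.keys(homophoneMap).length); // 35
```

Five variant series with seven orders each give the 35 entries.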
API
normalizeAmharic(text: string): string
Normalizes Amharic text by replacing phonetically equivalent characters with their canonical form.
Parameters:
- text - The Amharic text to normalize
Returns:
- The normalized text
AMHARIC_NORMALIZATION_MAP: Record<string, string>
A map of Amharic characters to their canonical forms. Characters not in the map are left unchanged.
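A map of this shape can also be used to report which characters in a string are non-canonical, for example to highlight them in an editor. The sketch below uses a small local `sampleMap` as a stand-in with the same `Record<string, string>` shape; `findNonCanonical` and `Replacement` are hypothetical names, not part of the package.

```typescript
// Stand-in (assumption): a small subset shaped like AMHARIC_NORMALIZATION_MAP.
const sampleMap: Record<string, string> = {
  "ሠ": "ሰ",
  "ሰ": "ሰ",
  "ሐ": "ሀ",
  "ሀ": "ሀ",
};

interface Replacement {
  index: number;
  from: string;
  to: string;
}

// Hypothetical helper: list characters whose canonical form differs from the input.
function findNonCanonical(
  text: string,
  map: Record<string, string>
): Replacement[] {
  const found: Replacement[] = [];
  Array.from(text).forEach((ch, index) => {
    const to = map[ch];
    // Characters missing from the map, or mapped to themselves, are skipped.
    if (to !== undefined && to !== ch) {
      found.push({ index, from: ch, to });
    }
  });
  return found;
}

console.log(findNonCanonical("ሰላም ሠላም", sampleMap));
// → [{ index: 4, from: "ሠ", to: "ሰ" }]
```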
License
MIT
Author
Eyasu Lingerih
