normalize-vietnamese
v1.0.2
Normalize Vietnamese
A TypeScript library for Vietnamese text processing including accent normalization, text masking, and string utilities.
Features
- ✅ Vietnamese Accent Normalization: Automatically correct Vietnamese accent placement according to grammar rules
- ✅ Text Masking: Mask sensitive information in strings
- ✅ Text Normalization: Convert text to lowercase, remove special characters
- ✅ TypeScript Support: Full TypeScript definitions included
- ✅ Minimal Dependencies: Only requires slugify for text normalization
Installation
# use npm
npm install normalize-vietnamese
# use yarn
yarn add normalize-vietnamese
Usage
Import
import Str from "normalize-vietnamese";
// or
import { Str } from "normalize-vietnamese";
Vietnamese Accent Normalization
The normalizeVietnameseAccent method corrects Vietnamese accent placement according to Vietnamese grammar rules:
// Correct diphthongs (2 vowels) - accent on first vowel
Str.normalizeVietnameseAccent("toà"); // returns 'tòa'
Str.normalizeVietnameseAccent("thuỷ"); // returns 'thủy'
// Correct triphthongs (3 vowels) and vowel pairs before a final consonant - accent on second vowel
Str.normalizeVietnameseAccent("tòan"); // returns 'toàn'
Str.normalizeVietnameseAccent("khủyu"); // returns 'khuỷu'
// Exception: ê and ơ have priority regardless of position
Str.normalizeVietnameseAccent("thủơ"); // returns 'thuở'
Str.normalizeVietnameseAccent("chuỵên"); // returns 'chuyện'
// Handle special consonant clusters (gi, qu)
Str.normalizeVietnameseAccent("gìa"); // returns 'già'
Str.normalizeVietnameseAccent("qủa"); // returns 'quả'
// Process multiple words
Str.normalizeVietnameseAccent("tòa nhà toàn"); // returns 'tòa nhà toàn'
Vietnamese Accent Rules
- Single vowel: Accent stays on the vowel
- Two vowels (diphthong): Accent on first vowel
- Three vowels (triphthong): Accent on second vowel
- Exception: ê and ơ have priority regardless of position
- Special consonants: gi and qu are treated as single consonants
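The rules above can be sketched in a few lines of TypeScript. This is an illustrative sketch only, not the library's actual implementation: the names placeToneSketch and normalizeAccentSketch are hypothetical, and the handling of a vowel pair followed by a final consonant is an assumption inferred from the 'toàn' example rather than a rule stated in the list.

```typescript
// Tone marks as combining characters: grave, acute, tilde, hook above, dot below.
const TONES = /[\u0300\u0301\u0303\u0309\u0323]/;
const VOWELS = new Set(Array.from("aăâeêioôơuưy".normalize("NFC")));
const E_CIRC = "\u00ea"; // ê
const O_HORN = "\u01a1"; // ơ

// Hypothetical helper: move the tone mark of one word to the vowel the rules select.
function placeToneSketch(word: string): string {
  const decomposed = word.normalize("NFD");
  const tone = decomposed.match(TONES)?.[0];
  if (!tone) return word; // no tone mark, nothing to move

  // Strip the tone, then recompose so ă, â, ê, ô, ơ, ư stay single characters.
  const chars = Array.from(decomposed.replace(TONES, "").normalize("NFC"));
  const lower = chars.map((c) => c.toLowerCase());

  // Skip the initial consonants; "gi" and "qu" count as consonants when a vowel follows.
  let start = 0;
  while (start < lower.length && !VOWELS.has(lower[start])) start++;
  const head = lower.slice(0, 2).join("");
  if ((head === "gi" || head === "qu") && lower.length > 2 && VOWELS.has(lower[2])) {
    start = 2;
  }

  // Collect the run of vowels (the nucleus).
  let end = start;
  while (end < lower.length && VOWELS.has(lower[end])) end++;
  const count = end - start;
  if (count === 0) return word;

  const nucleus = lower.slice(start, end);
  const priority = Math.max(nucleus.lastIndexOf(E_CIRC), nucleus.lastIndexOf(O_HORN));
  // Assumption: a letter after the nucleus is a final consonant.
  const hasFinalConsonant = end < lower.length && /[a-zđ]/.test(lower[end]);

  let offset: number;
  if (priority >= 0) offset = priority;                      // ê and ơ win
  else if (count === 1) offset = 0;                          // single vowel
  else if (count === 2) offset = hasFinalConsonant ? 1 : 0;  // assumption for the consonant case
  else offset = 1;                                           // three vowels: second vowel

  const i = start + offset;
  chars[i] = (chars[i] + tone).normalize("NFC");
  return chars.join("");
}

// Hypothetical wrapper: apply the rule to each whitespace-separated word.
function normalizeAccentSketch(text: string): string {
  return text.split(/(\s+)/).map((part) => placeToneSketch(part)).join("");
}
```

Decomposing with NFD separates the tone mark from the vowel, so the tone can be moved without disturbing the circumflex, breve, or horn that belongs to the vowel itself.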
Text Masking
// Mask entire string
Str.mask("hello"); // returns '*****'
// Mask with start and end positions
Str.mask("hello", 1, 4); // returns 'h***o'
// Negative positions (from end)
Str.mask("hello", -2, 4); // returns 'hel*o'
Str.mask("hello", 1, -1); // returns 'h***o'
Text Normalization
// Convert to lowercase, remove special characters
Str.normalize("Hello World!"); // returns 'hello world'
// Handles Vietnamese characters
Str.normalize("Xin chào thế giới!"); // returns 'xin chao the gioi'
API Reference
Str.normalizeVietnameseAccent(text: string): string
Normalizes Vietnamese accent marks according to Vietnamese grammar rules.
- Parameters:
  - text - The text to normalize
- Returns: The normalized text with proper accent placement
- Throws: Nothing; returns the original input if it is not a string
Str.mask(text: string, start?: number, end?: number): string
Masks part of a string with asterisks.
- Parameters:
  - text - The text to mask
  - start - Start position (default: 0, supports negative values)
  - end - End position (default: 0, supports negative values)
- Returns: The masked string
- Throws: Nothing; returns the original text for invalid parameters
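To make the start and end semantics concrete, here is a minimal sketch that reproduces the documented examples. It is not the library's source: maskSketch is a hypothetical name, and two behaviours are assumptions, namely that an end of 0 means "through the end of the string" and that out-of-range or empty spans count as invalid parameters.

```typescript
// Hypothetical re-implementation of the documented masking behaviour.
function maskSketch(text: string, start: number = 0, end: number = 0): string {
  if (typeof text !== "string") return text;
  const len = text.length;

  // Negative positions count back from the end of the string.
  const from = start < 0 ? len + start : start;
  // Assumption: end = 0 (the documented default) masks through the end of the string.
  const to = end === 0 ? len : end < 0 ? len + end : end;

  // Assumption: out-of-range or empty spans are "invalid" and leave the text unchanged.
  if (from < 0 || to > len || from >= to) return text;

  return text.slice(0, from) + "*".repeat(to - from) + text.slice(to);
}

console.log(maskSketch("hello"));        // *****
console.log(maskSketch("hello", 1, 4));  // h***o
console.log(maskSketch("hello", -2, 4)); // hel*o
console.log(maskSketch("hello", 1, -1)); // h***o
```

Under these assumptions the span is half-open: the character at start is masked and the character at end is not.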
Str.normalize(text: string): string
Normalizes text by converting to lowercase and removing special characters.
- Parameters:
  - text - The text to normalize
- Returns: The normalized text
- Throws: Nothing; returns the original input if it is not a string
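The effect of this method can be approximated without any dependency. The sketch below is an assumption-laden stand-in, not the library's code: the library relies on slugify, whereas this version uses Unicode NFD decomposition, and normalizeSketch is a hypothetical name.

```typescript
// Hypothetical approximation of the documented normalization behaviour.
function normalizeSketch(text: string): string {
  if (typeof text !== "string") return text;
  return text
    .normalize("NFD")                 // split letters from their combining marks
    .replace(/[\u0300-\u036f]/g, "")  // drop tone and vowel-quality marks
    .replace(/đ/g, "d")               // đ has no Unicode decomposition, so map it by hand
    .replace(/Đ/g, "D")
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, "")      // remove special characters
    .replace(/\s+/g, " ")             // collapse runs of whitespace
    .trim();
}

console.log(normalizeSketch("Hello World!"));       // hello world
console.log(normalizeSketch("Xin chào thế giới!")); // xin chao the gioi
```

The explicit đ mapping matters because NFD leaves that letter intact, so it would otherwise be deleted by the special-character filter.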
Development
Setup
git clone https://github.com/nvminh461/normalize-vietnamese
cd normalize-vietnamese
npm install
Scripts
# Build the library
npm run build
# Run tests
npm test
# Run tests in watch mode
npm run test:watch
# Generate test coverage
npm run test:coverage
# Development mode (watch for changes)
npm run dev
Testing
The library includes comprehensive tests covering all functionality:
npm test
Requirements
- Node.js >= 14.0.0
- TypeScript >= 4.0.0 (for development)
License
MIT
Contributing
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add some amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
Changelog
1.0.0
- Initial release
- Vietnamese accent normalization
- Text masking functionality
- Text normalization utilities
- Full TypeScript support
