georgian-hyphenation
v2.2.7
Georgian Language Hyphenation Library with 17+ utility functions - Browser + Node.js compatible
Georgian Language Hyphenation
Georgian Language Hyphenation Library - Fast, accurate syllabification for Georgian (ქართული) text with support for both browser and Node.js environments.
Features
- ✅ Accurate Georgian syllabification based on phonetic rules
- ✅ Harmonic consonant clusters recognition (ბრ, გრ, კრ, etc.)
- ✅ Gemination handling (double consonant splitting)
- ✅ Exception dictionary for irregular words (148 words)
- ✅ HTML-aware hyphenation - preserves tags and code blocks (new in v2.2.7)
- ✅ 17+ utility functions for advanced text processing (new in v2.2.7)
- ✅ Configurable settings - adjust margins and hyphen character (new in v2.2.7)
- ✅ Browser + Node.js compatible (ESM & CommonJS)
- ✅ Zero dependencies
- ✅ Lightweight (~12KB)
Installation
npm install georgian-hyphenation
Quick Start
ES Modules (Modern)
import GeorgianHyphenator from 'georgian-hyphenation';
const hyphenator = new GeorgianHyphenator();
// Basic hyphenation
console.log(hyphenator.hyphenate('საქართველო'));
// Output: საქართველო
// Get syllables as array
console.log(hyphenator.getSyllables('თბილისი'));
// Output: ['თბი', 'ლი', 'სი']
// Count syllables (NEW in v2.2.7)
console.log(hyphenator.countSyllables('გამარჯობა'));
// Output: 4
// Hyphenate HTML (NEW in v2.2.7)
const html = '<p>ქართული ენა <code>console.log()</code> პროგრამირება</p>';
console.log(hyphenator.hyphenateHTML(html));
// Code tags are preserved!
CommonJS (Node.js)
const GeorgianHyphenator = require('georgian-hyphenation');
const hyphenator = new GeorgianHyphenator();
console.log(hyphenator.hyphenate('კომპიუტერი'));
Browser (CDN)
<script type="module">
import GeorgianHyphenator from 'https://cdn.jsdelivr.net/npm/[email protected]/src/javascript/index.js';
const hyphenator = new GeorgianHyphenator();
console.log(hyphenator.hyphenate('პროგრამირება'));
</script>
API Reference
Constructor
const hyphenator = new GeorgianHyphenator(hyphenChar = '\u00AD');
Parameters:
hyphenChar (optional): Character to use for hyphenation. Default is the soft hyphen (\u00AD)
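Because the default separator is the invisible soft hyphen, hyphenated output can look identical to the input when printed. The sketch below does not call the library; it joins the syllables shown in this README by hand to illustrate the effect, and the constant and variable names are only for illustration.

```javascript
// Illustration only: plain string operations, no library calls.
const SOFT_HYPHEN = '\u00AD';

// Syllables of 'თბილისი' as listed in this README.
const syllables = ['თბი', 'ლი', 'სი'];
const hyphenated = syllables.join(SOFT_HYPHEN);

console.log(hyphenated.length);         // 9 (7 letters + 2 soft hyphens)
console.log(hyphenated === 'თბილისი');  // false, although it prints the same

// Stripping the soft hyphens gives back the original word.
const restored = hyphenated.split(SOFT_HYPHEN).join('');
console.log(restored === 'თბილისი');    // true
```

When debugging, a visible separator such as '-' makes the break points easier to see.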
Core Methods
hyphenate(word)
Hyphenates a single word.
hyphenator.hyphenate('საქართველო');
// Returns: 'საქართველო'
getSyllables(word)
Returns an array of syllables.
hyphenator.getSyllables('თბილისი');
// Returns: ['თბი', 'ლი', 'სი']
hyphenateText(text)
Hyphenates all words in a text string.
hyphenator.hyphenateText('საქართველო არის ლამაზი ქვეყანა');
// Returns: 'საქართველო არის ლამაზი ქვეყანა'
New in v2.2.7: Utility Functions
countSyllables(word)
Get the number of syllables in a word.
hyphenator.countSyllables('გამარჯობა');
// Returns: 4
getHyphenationPoints(word)
Get the number of hyphenation points (hyphens) in a word.
hyphenator.getHyphenationPoints('გამარჯობა');
// Returns: 3 (four syllables = three hyphens)
isGeorgian(text)
Check if text contains only Georgian characters.
hyphenator.isGeorgian('გამარჯობა'); // true
hyphenator.isGeorgian('hello'); // false
hyphenator.isGeorgian('გამარჯობა123'); // false
canHyphenate(word)
Check if a word meets minimum length requirements for hyphenation.
hyphenator.canHyphenate('გა'); // false (too short)
hyphenator.canHyphenate('გამარ'); // true
unhyphenate(text)
Remove all hyphenation from text.
const hyphenated = hyphenator.hyphenate('გამარჯობა');
hyphenator.unhyphenate(hyphenated);
// Returns: 'გამარჯობა'
hyphenateWords(words)
Hyphenate multiple words at once (batch processing).
const words = ['ქართული', 'ენა', 'მშვენიერია'];
hyphenator.hyphenateWords(words);
// Returns: ['ქართული', 'ენა', 'მშვენიერია']
hyphenateHTML(html) ⭐ Most Useful!
Hyphenate HTML content while preserving tags and skipping code blocks.
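As a rough illustration of the idea (not the library's actual implementation), tag-preserving hyphenation can be sketched by splitting the markup into tags and text, tracking whether the current position is inside a skipped element, and transforming only Georgian words in the remaining text. The function name `hyphenateHtmlSketch`, the `hyphenateWord` callback, and the regular expressions are assumptions made for this example; a real implementation would need a proper HTML parser for edge cases.

```javascript
// Elements whose content is left untouched, as listed in this README.
const SKIP_TAGS = new Set(['code', 'pre', 'script', 'style', 'textarea']);

// Hypothetical sketch: `hyphenateWord` is any function that maps a word to its hyphenated form.
function hyphenateHtmlSketch(html, hyphenateWord) {
  // Splitting on a capturing group keeps the tags; they land at odd indexes.
  const parts = html.split(/(<[^>]+>)/);
  let skipDepth = 0;

  return parts.map((part, i) => {
    if (i % 2 === 1) {
      // A tag: update the skip depth if it opens or closes a skipped element.
      const tag = /^<\s*(\/?)\s*([a-zA-Z0-9]+)/.exec(part);
      if (tag && SKIP_TAGS.has(tag[2].toLowerCase())) {
        skipDepth = Math.max(0, skipDepth + (tag[1] ? -1 : 1));
      }
      return part; // tags are always returned unchanged
    }
    if (skipDepth > 0) return part; // text inside <code>, <pre>, ...
    // Transform runs of Georgian (Mkhedruli) letters only.
    return part.replace(/[\u10D0-\u10FA]+/g, (word) => hyphenateWord(word));
  }).join('');
}

// Usage with a stand-in transformer that just brackets each word.
const sample = '<p>აბგ <code>დევ</code> ზთი</p>';
console.log(hyphenateHtmlSketch(sample, (w) => `[${w}]`));
// <p>[აბგ] <code>დევ</code> [ზთი]</p>
```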
const html = `
<article>
<h1>ქართული ენა</h1>
<p>პროგრამირება და კომპიუტერული მეცნიერება</p>
<code>console.log('skip me')</code>
<pre>this won't be hyphenated</pre>
</article>
`;
const result = hyphenator.hyphenateHTML(html);
// Text inside <h1> and <p> gets hyphenated
// <code>, <pre>, <script>, <style>, <textarea> are preserved
New in v2.2.7: Configuration Methods
All configuration methods support method chaining:
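Chaining works when each setter returns the instance it was called on. The class below is a minimal stand-alone sketch of that pattern, not the library's source; `HyphenatorConfig` is a hypothetical name, and the defaults mirror the ones documented in this README.

```javascript
// Hypothetical stand-in class showing the chaining pattern only.
class HyphenatorConfig {
  constructor() {
    this.leftMin = 2;          // default documented for setLeftMin
    this.rightMin = 2;         // default documented for setRightMin
    this.hyphenChar = '\u00AD'; // soft hyphen
  }
  setLeftMin(value) { this.leftMin = value; return this; }
  setRightMin(value) { this.rightMin = value; return this; }
  setHyphenChar(char) { this.hyphenChar = char; return this; }
}

// Each call returns the same object, so calls can be strung together.
const config = new HyphenatorConfig()
  .setLeftMin(3)
  .setRightMin(3)
  .setHyphenChar('-');

console.log(config.leftMin, config.rightMin, config.hyphenChar); // 3 3 -
```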
setLeftMin(value)
Set minimum characters before the first hyphen (default: 2).
hyphenator.setLeftMin(3);
// Now requires at least 3 characters before first hyphen
setRightMin(value)
Set minimum characters after the last hyphen (default: 2).
hyphenator.setRightMin(3);
// Now requires at least 3 characters after last hyphen
setHyphenChar(char)
Change the hyphen character.
// Use visible hyphen for debugging
hyphenator.setHyphenChar('-');
console.log(hyphenator.hyphenate('გამარჯობა'));
// Output: 'გა-მარ-ჯო-ბა'
// Use custom separator
hyphenator.setHyphenChar('•');
console.log(hyphenator.hyphenate('საქართველო'));
// Output: 'სა•ქარ•თვე•ლო'
Method Chaining
const hyphenator = new GeorgianHyphenator()
.setLeftMin(3)
.setRightMin(3)
.setHyphenChar('-');
console.log(hyphenator.hyphenate('გამარჯობა'));
New in v2.2.7: Dictionary Management
loadLibrary(data)
Load custom exception dictionary.
const customWords = {
'განათლება': 'გა-ნათ-ლე-ბა',
'უნივერსიტეტი': 'უ-ნი-ვერ-სი-ტე-ტი'
};
hyphenator.loadLibrary(customWords);
async loadDefaultLibrary()
Load the built-in exception dictionary (148 words).
await hyphenator.loadDefaultLibrary();
// Dictionary loaded with tech terms, places, political terms
addException(word, hyphenated)
Add a single custom hyphenation exception.
hyphenator.addException('ტესტი', 'ტეს-ტი');
console.log(hyphenator.hyphenate('ტესტი'));
// Returns: 'ტესტი' (uses your custom hyphenation)
removeException(word)
Remove an exception from the dictionary.
hyphenator.removeException('ტესტი');
// Returns: true (if word was removed)
exportDictionary()
Export the entire dictionary as a JSON object.
const dict = hyphenator.exportDictionary();
console.log(dict);
// { "გამარჯობა": "გა-მარ-ჯო-ბა", ... }
getDictionarySize()
Get the number of words in the dictionary.
await hyphenator.loadDefaultLibrary();
console.log(hyphenator.getDictionarySize());
// Output: 148
New in v2.2.7: Advanced Features
Harmonic Cluster Management
For advanced users who need to customize consonant cluster recognition:
// Add a custom harmonic cluster
hyphenator.addHarmonicCluster('ტვ');
// Remove a cluster
hyphenator.removeHarmonicCluster('ტვ');
// Get all clusters
const clusters = hyphenator.getHarmonicClusters();
console.log(clusters);
// ['ბლ', 'ბრ', 'ბღ', ... (70+ clusters)]
CSS Integration
Use soft hyphens for automatic line breaking:
.georgian-text {
hyphens: auto;
-webkit-hyphens: auto;
-ms-hyphens: auto;
}
const hyphenator = new GeorgianHyphenator('\u00AD'); // soft hyphen
document.querySelector('.georgian-text').innerHTML =
hyphenator.hyphenateText('თქვენი ტექსტი აქ');
Built-in Dictionary
The library includes 148 pre-hyphenated words including:
Tech Terms: კომპიუტერი, ფეისბუქი, იუთუბი, ინსტაგრამი
Places: საქართველო, თბილისი
Political: პარლამენტი, დემოკრატია, რესპუბლიკა
Compound Words: სახელმწიფო, გულმავიწყი, თავდადებული
await hyphenator.loadDefaultLibrary();
console.log(hyphenator.hyphenate('კომპიუტერი'));
// Uses dictionary: 'კომპიუტერი'
Algorithm
The library uses a phonetic algorithm based on Georgian syllable structure:
- Vowel Detection: Identifies vowels (ა, ე, ი, ო, უ)
- Consonant Cluster Analysis: Recognizes 70+ harmonic clusters
- Gemination Rules: Splits double consonants (კკ → კ-კ)
- Orphan Prevention: Ensures minimum syllable length (2 characters by default)
- Dictionary Lookup: Checks exceptions first for accuracy
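The steps above can be sketched as a small stand-alone function. This is a simplified approximation written for illustration, not the library's actual algorithm: the name `syllabifySketch`, its options object, the exact rule order, and the five-entry cluster subset are all assumptions. It checks exceptions first, then places one break between each pair of neighbouring vowels, and finally drops breaks that violate the left/right minimums.

```javascript
const VOWELS = new Set(['ა', 'ე', 'ი', 'ო', 'უ']);
// Small subset of the harmonic clusters listed in this README.
const CLUSTERS = new Set(['ბრ', 'გრ', 'კრ', 'პრ', 'ტრ']);

// Hypothetical simplified syllabifier (assumed rules, see lead-in).
function syllabifySketch(word, { leftMin = 2, rightMin = 2, exceptions = {} } = {}) {
  // 1. Dictionary lookup: exceptions are stored with visible '-' separators.
  if (Object.prototype.hasOwnProperty.call(exceptions, word)) {
    return exceptions[word].split('-');
  }
  const chars = Array.from(word);
  // 2. Vowel detection.
  const vowelIdx = [];
  chars.forEach((ch, i) => { if (VOWELS.has(ch)) vowelIdx.push(i); });

  const breaks = [];
  for (let k = 0; k + 1 < vowelIdx.length; k++) {
    const start = vowelIdx[k] + 1; // first character after this vowel
    const end = vowelIdx[k + 1];   // index of the next vowel
    let cut = start;               // 0 or 1 consonants: break right after the vowel
    if (end - start >= 2) {
      cut = start + 1;             // default: break after the first consonant
      let gem = -1;
      for (let i = start; i + 1 < end; i++) {
        if (chars[i] === chars[i + 1]) { gem = i + 1; break; }
      }
      if (gem !== -1) {
        cut = gem;                 // 3. gemination: split the doubled consonant
      } else if (CLUSTERS.has(chars[end - 2] + chars[end - 1])) {
        cut = end - 2;             // 4. keep a harmonic cluster together
      }
    }
    // 5. Orphan prevention: enforce minimum characters on both sides.
    if (cut >= leftMin && chars.length - cut >= rightMin) breaks.push(cut);
  }

  const syllables = [];
  let prev = 0;
  for (const b of breaks) {
    syllables.push(chars.slice(prev, b).join(''));
    prev = b;
  }
  syllables.push(chars.slice(prev).join(''));
  return syllables;
}

console.log(syllabifySketch('პროგრამირება').join('-')); // პრო-გრა-მი-რე-ბა
console.log(syllabifySketch('გამარჯობა').join('-'));    // გა-მარ-ჯო-ბა
console.log(syllabifySketch('განათლება', {
  exceptions: { 'განათლება': 'გა-ნათ-ლე-ბა' },
}).join('-'));                                          // გა-ნათ-ლე-ბა
```

Joining the returned array with '\u00AD' instead of '-' would give soft-hyphenated output. The real library recognizes far more clusters and may order its rules differently, so results can differ on other words.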
Supported Harmonic Clusters
ბლ, ბრ, ბღ, ბზ, გდ, გლ, გმ, გნ, გვ, გზ, გრ, დრ, თლ, თრ, თღ,
კლ, კმ, კნ, კრ, კვ, მტ, პლ, პრ, ჟღ, რგ, რლ, რმ, სწ, სხ, ტკ,
ტპ, ტრ, ფლ, ფრ, ფქ, ფშ, ქლ, ქნ, ქვ, ქრ, ღლ, ღრ, ყლ, ყრ, შთ,
შპ, ჩქ, ჩრ, ცლ, ცნ, ცრ, ცვ, ძგ, ძვ, ძღ, წლ, წრ, წნ, წკ, ჭკ,
ჭრ, ჭყ, ხლ, ხმ, ხნ, ხვ, ჯგ
Use Cases & Examples
E-book Reader
const hyphenator = new GeorgianHyphenator();
await hyphenator.loadDefaultLibrary();
function formatBook(htmlContent) {
return hyphenator.hyphenateHTML(htmlContent);
}
document.getElementById('content').innerHTML = formatBook(bookHTML);
Text Justification
const hyphenator = new GeorgianHyphenator('\u00AD');
const justified = hyphenator.hyphenateText(
'საქართველო არის ერთ-ერთი უძველესი ქვეყანა მსოფლიოში'
);
Blog/CMS Integration
const hyphenator = new GeorgianHyphenator();
await hyphenator.loadDefaultLibrary();
// Hyphenate all articles
document.querySelectorAll('article p').forEach(p => {
p.innerHTML = hyphenator.hyphenateHTML(p.innerHTML);
});
Form Validation
const hyphenator = new GeorgianHyphenator();
function validateGeorgianInput(text) {
if (!hyphenator.isGeorgian(text)) {
alert('გთხოვთ შეიყვანოთ მხოლოდ ქართული ტექსტი');
return false;
}
return true;
}
Syllable-based Animation
const hyphenator = new GeorgianHyphenator();
const syllables = hyphenator.getSyllables('პროგრამირება');
syllables.forEach((syllable, i) => {
setTimeout(() => {
console.log(syllable);
}, i * 200);
});
// Displays: პრო... გრა... მი... რე... ბა
Browser Support
- ✅ Chrome/Edge 90+
- ✅ Firefox 88+
- ✅ Safari 14+
- ✅ Node.js 14+
Performance
- Average hyphenation speed: ~0.05ms per word
- HTML hyphenation: ~2ms for 1000 words
- Memory usage: ~100KB with dictionary loaded
- Optimized with Set for O(1) cluster lookups
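To illustrate the last point, a JavaScript Set answers membership queries in roughly constant time, whereas Array.prototype.includes scans the array. The snippet below is a stand-alone sketch using a few clusters from this README; the helper `isHarmonicPair` is a hypothetical name, not part of the library.

```javascript
// A few of the clusters listed in this README, stored in a Set.
const harmonicClusters = new Set(['ბლ', 'ბრ', 'გრ', 'კრ', 'პრ']);

// Hypothetical helper: do the two characters starting at `index` form a known cluster?
function isHarmonicPair(word, index) {
  return harmonicClusters.has(word.slice(index, index + 2));
}

console.log(isHarmonicPair('პროგრამირება', 3)); // true  ('გრ')
console.log(isHarmonicPair('გამარჯობა', 4));    // false ('რჯ')

// Adding and removing entries are also constant-time Set operations.
harmonicClusters.add('ტვ');
console.log(harmonicClusters.has('ტვ'));        // true
harmonicClusters.delete('ტვ');
console.log(harmonicClusters.has('ტვ'));        // false
```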
Changelog
v2.2.7 (2026-02-13) 🎉
New Features (17 functions added):
✨ Utility Functions:
- countSyllables(word) - Get syllable count
- getHyphenationPoints(word) - Get hyphen count
- isGeorgian(text) - Validate Georgian text
- canHyphenate(word) - Check if word can be hyphenated
- unhyphenate(text) - Remove all hyphens
- hyphenateWords(words) - Batch processing
- hyphenateHTML(html) - HTML-aware hyphenation 🌟
✨ Configuration (Chainable):
- setLeftMin(value) - Configure left margin
- setRightMin(value) - Configure right margin
- setHyphenChar(char) - Change hyphen character
✨ Dictionary Management:
- addException(word, hyphenated) - Add custom word
- removeException(word) - Remove exception
- exportDictionary() - Export as JSON
- getDictionarySize() - Get word count
✨ Advanced:
- addHarmonicCluster(cluster) - Add custom cluster
- removeHarmonicCluster(cluster) - Remove cluster
- getHarmonicClusters() - List all clusters
Improvements:
- 🔧 All configuration methods support method chaining
- 📚 JSDoc documentation for all methods
- ✅ 100% backwards compatible
- 🎯 No breaking changes
v2.2.6 (2026-01-30)
- ✨ Preserves regular hyphens in compound words
- 🐛 Fixed hyphen stripping to only remove soft hyphens and zero-width spaces
- 📝 Improved documentation
v2.2.4 (2026-01-27)
- 🌐 Browser Fix: Fixed CDN URL for reliable dictionary loading
- 📦 NPM Files: Added data/ folder to published package
- 🔧 Error Handling: Improved fallback when dictionary unavailable
- 📝 Documentation: Corrected examples
v2.2.1 (2026-01-26)
- 🧹 Sanitization: Added _stripHyphens for automatic input cleaning
- ⚡ Performance: Converted harmonicClusters to Set (O(1) lookup)
- 📦 ESM: Full ES Modules support
- 📚 Dictionary: Added loadDefaultLibrary() method
v2.0.1 (2026-01-22)
- 🎓 Academic Rewrite: Phonological distance analysis
- 🛡️ Anti-Orphan: Minimum 2 characters on each side
- 🎼 Harmonic Clusters: Georgian-specific consonant groups
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
License
MIT © Guram Zhgamadze
Author
Guram Zhgamadze
- GitHub: @guramzhgamadze
- Email: [email protected]
Made with ❤️ for the Georgian language community
