@cadriva/greek-stemmer-ts
v0.1.1
TypeScript implementation of the Ntais (2006) Greek stemmer (fork of [email protected] by Apmats, MIT)
greek-stemmer-ts
A dependency-free TypeScript implementation of the Ntais (2006) Greek stemmer. Fork of [email protected] by Apmats (2015) — MIT License.
Install
npm install @cadriva/greek-stemmer-ts
Usage
import { stem } from "@cadriva/greek-stemmer-ts";
// Greek input must be pre-normalised: NFD + strip combining marks + uppercase.
const result = stem("ΤΙΜΟΛΟΓ"); // => 'ΤΙΜΟΛΟΓ'
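function normaliseGreek(word: string): string {
  return word
    .normalize("NFD") // decompose accented letters into base letter + combining mark
    .replace(/[\u0300-\u036f]/g, "") // strip combining diacritical marks
    .replace(/ς/g, "σ") // fold final sigma
    .toUpperCase();
}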
console.log(stem(normaliseGreek("τιμολόγια"))); // => 'ΤΙΜΟΛΟΓ'
console.log(stem("invoice")); // => 'invoice' (non-Greek passes through)
console.log(stem("")); // => ''
Notes
- Operates on monotonic uppercase Greek text only.
- Polytonic Greek is not supported; normalise polytonic input to monotonic before calling stem().
- No language detection, tokenisation, or normalisation: callers pre-normalise. Non-Greek inputs (Latin, numbers, etc.) pass through unchanged.
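Because tokenisation and normalisation are left to the caller, a small wrapper is usually needed before stemming running text. The following is a minimal sketch, not part of this package: `normaliseText`, `tokenise`, `stemText`, and the punctuation set are hypothetical, and the stemmer is passed in as a parameter so the helper stays independent of any particular import.

```typescript
// Hypothetical helpers for illustration; none of these are exported by the package.
type Stemmer = (word: string) => string;

// NFD-decompose, strip combining marks, then uppercase the whole text.
// Note: this also uppercases non-Greek tokens such as Latin words.
function normaliseText(text: string): string {
  return text
    .normalize("NFD")
    .replace(/[\u0300-\u036f]/g, "")
    .toUpperCase();
}

// Assumption: whitespace and this small punctuation set are the only separators.
function tokenise(text: string): string[] {
  return text
    .split(/[\s.,;:!?()\[\]"'«»·]+/)
    .filter((token) => token.length > 0);
}

// Normalise first so that decomposed accents cannot split a word, then stem each token.
function stemText(text: string, stem: Stemmer): string[] {
  return tokenise(normaliseText(text)).map((token) => stem(token));
}
```

With the package installed, the call would look like `stemText(someSentence, stem)`, using the `stem` function imported as in the Usage section.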
Algorithm
Ntais (2006) Greek stemmer, implemented as a Snowball-style cascade of regex-based suffix-stripping rules. Pure TypeScript, zero runtime dependencies, zero network/disk/eval surface.
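To make the "cascade of regex-based suffix-stripping rules" concrete, here is a minimal sketch of the general technique. The two rules, the `minStem` guard, and the names `Step`, `illustrativeSteps`, and `runCascade` are assumptions chosen for illustration; they are not the Ntais rule set and not this package's source.

```typescript
// Illustrative only: a generic suffix-stripping cascade, not the actual Ntais rules.
interface Step {
  name: string;
  pattern: RegExp; // group 1 captures the candidate stem, group 2 the suffix
  minStem: number; // hypothetical guard against over-stripping short words
}

const illustrativeSteps: Step[] = [
  { name: "plural", pattern: /^(.+?)(ΑΔΕΣ|ΑΔΩΝ)$/, minStem: 2 },
  { name: "ending", pattern: /^(.+?)(ΙΑ|ΙΩΝ)$/, minStem: 2 },
];

// Each step sees the output of the previous one; a step that does not match is a no-op.
function runCascade(word: string, steps: Step[]): string {
  let current = word;
  for (const step of steps) {
    const match = step.pattern.exec(current);
    if (match !== null && match[1].length >= step.minStem) {
      current = match[1];
    }
  }
  return current;
}

console.log(runCascade("ΤΙΜΟΛΟΓΙΑ", illustrativeSteps)); // => 'ΤΙΜΟΛΟΓ'
console.log(runCascade("invoice", illustrativeSteps)); // => 'invoice' (no rule matches)
```

Because the patterns only mention uppercase Greek letters, non-Greek input matches no step and falls through unchanged, which is the same pass-through behaviour the package documents.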
Two rule differences from [email protected] upstream:
| Step | Rule | Upstream | This package |
| ------- | -------------------------------- | -------------------------------------------------- | ------------------------------- |
| Step 2a | ΑΔΕΣ/ΑΔΩΝ exception list | includes ΓΑΛΑΤ, ΦΑΦΛΑΤ | does not include them |
| Step 5h | ΟΥΣΑ/ΟΥΣΕΣ/ΟΥΣΕ retention list | missing ΔΕ, ΔΕΥΤΕΡΕΥ, ΚΑΘΑΡΕΥ, ΠΛΕ, ΤΣΑ | includes all five |
See src/index.ts comments for source citations.
License
MIT — see LICENSE.
