trigrams
v6.0.0
Trigram files for 500+ languages
trigrams
Trigrams for 500+ languages.
What is this?
This package exposes all trigrams for natural languages. The trigrams are based on the most translated copyright-free document on this planet: the UDHR (Universal Declaration of Human Rights).
When should I use this?
When you are dealing with natural language detection.
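As a rough illustration of how ranked trigram lists can drive detection, the sketch below scores a text against per-language lists shaped like the ones this package provides (sorted from least to most common). The helpers `countTrigrams`, `score`, and `detect` are hypothetical names, the score is a simple rank-weighted overlap rather than the algorithm of any particular detector, and the tiny models are truncated excerpts of the sample output in this readme, not usable data.

```javascript
// Hypothetical helper: count every 3-character window in already cleaned text.
function countTrigrams(value) {
  const counts = new Map()
  for (let index = 0; index + 3 <= value.length; index++) {
    const trigram = value.slice(index, index + 3)
    counts.set(trigram, (counts.get(trigram) || 0) + 1)
  }
  return counts
}

// Hypothetical helper: sum the rank of each known trigram found in the text.
// `model` is an array sorted from least to most common, so a later position
// means a higher weight. Unknown trigrams contribute nothing.
function score(text, model) {
  const weights = new Map(model.map((trigram, index) => [trigram, index + 1]))
  let total = 0
  for (const [trigram, count] of countTrigrams(text)) {
    total += (weights.get(trigram) || 0) * count
  }
  return total
}

// Hypothetical helper: pick the code whose model scores highest.
function detect(text, models) {
  let best
  let bestScore = -1
  for (const [code, model] of Object.entries(models)) {
    const current = score(text, model)
    if (current > bestScore) {
      best = code
      bestScore = current
    }
  }
  return best
}

// Truncated excerpts for illustration only; real lists hold 300 trigrams each.
const sampleModels = {
  nld: [' ar', 'eer', 'tij', 'de ', 'an ', 'en '],
  pam: ['isa', 'upa', 'i k', 'ang', 'ing', 'ng ']
}

console.log(detect(' de man en de ', sampleModels))
```

With the package installed, the object returned by `await min()` could be passed as `models` in place of `sampleModels`.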
Install
This package is ESM only. In Node.js (version 18+), install with npm:
npm install trigrams
In Deno with esm.sh:
import {min, top} from 'https://esm.sh/trigrams@6'
In browsers with esm.sh:
<script type="module">
import {min, top} from 'https://esm.sh/trigrams@6?bundle'
</script>
Use
import {min, top} from 'trigrams'
console.log((await min()).nld)
console.log((await top()).pam)
Yields:
[ // 300 top trigrams.
' ar',
'eer',
'tij',
// …
'de ',
'an ',
'en ' // Most common trigram.
]
{ // 300 top trigrams.
'isa': 6,
'upa': 6,
'i k': 6,
// …
'ang': 273,
'ing': 282,
'ng ': 572 // Most common trigram with how often it was found.
}
API
This package exports the identifiers min and top.
It exports no TypeScript types.
There is no default export.
min()
Get top trigrams.
Returns
Returns a promise resolving to an object mapping UDHR in Unicode codes to
arrays containing the top 300 trigrams sorted from least occurring to most
occurring (Promise<Record<string, Array<string>>>).
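Because each list runs from least to most common, the most common trigrams sit at the end. A minimal sketch of reading them off, assuming a list shaped like one value of the object described above; `mostCommon` is a hypothetical helper, not part of this package, and `sampleList` is a truncated excerpt of the sample output in this readme.

```javascript
// Hypothetical helper: return the `count` most common trigrams, most common
// first, from a list sorted from least to most common.
function mostCommon(trigrams, count) {
  // `slice(-0)` would return the whole list, so handle that case explicitly.
  if (count <= 0) return []
  return trigrams.slice(-count).reverse()
}

// Truncated excerpt for illustration; a real list holds 300 trigrams.
const sampleList = [' ar', 'eer', 'tij', 'de ', 'an ', 'en ']

console.log(mostCommon(sampleList, 3)) // [ 'en ', 'an ', 'de ' ]
```

With the package installed, `(await min()).nld` could be passed instead of `sampleList`.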
top()
Get top trigrams to occurrence counts.
Returns
Returns a promise resolving to an object mapping UDHR in Unicode codes to
objects mapping the top 300 trigrams to occurrence counts
(Promise<Record<string, Record<string, number>>>).
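The counts make it possible to rank trigrams or compare their relative weight. A small sketch, assuming an object shaped like one value of the result described above; `rankByCount` and `share` are hypothetical helpers, not part of this package, and `sampleCounts` is a truncated excerpt of the sample output in this readme.

```javascript
// Hypothetical helper: list [trigram, count] pairs, highest count first.
function rankByCount(counts) {
  return Object.entries(counts).sort((left, right) => right[1] - left[1])
}

// Hypothetical helper: fraction of all counted occurrences that belong to
// one trigram (0 when the trigram is absent or the object is empty).
function share(counts, trigram) {
  const total = Object.values(counts).reduce((sum, count) => sum + count, 0)
  return total === 0 ? 0 : (counts[trigram] || 0) / total
}

// Truncated excerpt for illustration; a real object holds 300 trigrams.
const sampleCounts = {
  isa: 6,
  upa: 6,
  'i k': 6,
  ang: 273,
  ing: 282,
  'ng ': 572
}

console.log(rankByCount(sampleCounts)[0]) // [ 'ng ', 572 ]
console.log(share(sampleCounts, 'ng '))
```

With the package installed, `(await top()).pam` could be passed instead of `sampleCounts`.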
Data
The trigrams are based on the Unicode versions of the Universal Declaration of Human Rights.
The files are created from all paragraphs made available by
wooorm/udhr and do not include headings and such.
Before creating trigrams,
- the Unicode characters from \u0021 to \u0040 (both inclusive) are removed
- one or more white space characters (\s+) are replaced with a single space
- alphabetic characters are lower cased ([A-Z])
Additionally, the input is padded with two spaces on both sides.
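The steps above can be sketched as follows. This is an illustration of the description in this section, not the code used to build the data files; `clean`, `pad`, and `trigramsOf` are hypothetical names, and the order of the steps is assumed to match the order in which they are listed.

```javascript
// Hypothetical helper: apply the three cleaning steps in the listed order.
function clean(value) {
  return value
    // Remove the characters from U+0021 to U+0040, both inclusive.
    .replace(/[\u0021-\u0040]/g, '')
    // Replace one or more white space characters with a single space.
    .replace(/\s+/g, ' ')
    // Lower case the alphabetic characters A to Z.
    .replace(/[A-Z]/g, (character) => character.toLowerCase())
}

// Hypothetical helper: pad with two spaces on both sides.
function pad(value) {
  return '  ' + value + '  '
}

// Hypothetical helper: list every 3-character window of the cleaned,
// padded input, in order of appearance.
function trigramsOf(value) {
  const padded = pad(clean(value))
  const result = []
  for (let index = 0; index + 3 <= padded.length; index++) {
    result.push(padded.slice(index, index + 3))
  }
  return result
}

console.log(clean('Hi, You')) // 'hi you'
console.log(trigramsOf('Hi, You'))
```

To get counts like those in the data files, the resulting list would still need to be tallied per trigram and cut to the 300 most frequent entries.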
| Code | Name |
| - | - |
| 007 | Sãotomense |
| 008 | Crioulo, Upper Guinea (008) |
| 009 | Mbundu (009) |
| 010 | Tetun Dili |
| 011 | Umbundu (011) |
| 013 | (Mijisa) |
| 014 | (Maiunan) |
| 016 | (Minjiang, spoken) |
| 017 | (Minjiang, written) |
| 020 | Drung |
| 021 | (Muzzi) |
| 022 | (Klau) |
| 025 | (Bizisa) |
| 026 | (Yeonbyeon) |
| 027 | Gumuz |
| 028 | Kafa |
| 029 | Sidamo |
| 030 | Kituba (2) |
| 032 | South Azerbaijani |
| 041 | Latvian (2) |
| 042 | Spanish (resolution) |
| 043 | Zarma |
| 044 | Mirandese |
| 045 | Maasai |
| 046 | Malay, Papuan |
| 047 | Malay, Ambonese |
| 048 | Minangkabau (2) |
| 049 | Banjar |
| 050 | (Bataknese) |
| 052 | Morisyen |
| 053 | Hausa (2) |
| 054 | Catalan (2) |
| 055 | Jamaican Creole English |
| 056 | Saint Lucian Creole French |
| 057 | Maay |
| 058 | Somali (Af Marka) |
| 059 | North Saami (2) |
| 060 | Inari Saami |
| 061 | Skolt Saami |
| 062 | Swahili (Chimwiini) |
| 063 | Swahili (Kibajuni) |
| 064 | Dabarre |
| 065 | Garre |
| 066 | Jiiddu |
| 067 | Finnish (2) |
| 068 | French (Welche) |
| 069 | Maori (2) |
| 071 | Kabyle |
| aar | Afar |
| abk | Abkhaz |
| ace | Aceh |
| acu | Achuar-Shiwiar |
| acu_1 | Achuar-Shiwiar (1) |
| ada | Dangme |
| ady | Adyghe |
| afr | Afrikaans |
| agr | Aguaruna |
| aii | Assyrian Neo-Aramaic |
| ajg | Aja |
| aka_akuapem | Twi (Akuapem) |
| aka_asante | Twi (Asante) |
| aka_fante | Fante |
| als | Albanian, Tosk |
| alt | Altai, Southern |
| amc | Amahuaca |
| ame | Yaneshaʼ |
| amh | Amharic |
| ami | Amis |
| amr | Amarakaeri |
| arb | Arabic, Standard |
| arl | Arabela |
| arn | Mapudungun |
| ast | Asturian |
| auc | Waorani |
| auv | Occitan (Auvergnat) |
| ayo | Ayoreo |
| ayr | Aymara, Central |
| azj_cyrl | Azerbaijani, North (Cyrillic) |
| azj_latn | Azerbaijani, North (Latin) |
| bam | Bamanankan |
| ban | Bali |
| bax | Bamun |
| bba | Baatonum |
| bci | Baoulé |
| bcl | Bicolano, Central |
| bel | Belarusan |
| bem | Bemba |
| ben | Bengali |
| bfa | Bari |
| bho | Bhojpuri |
| bin | Edo |
| bis | Bislama |
| blt | Tai Dam |
| blu | Hmong Njua |
| boa | Bora |
| bod | Tibetan, Central |
| bos_cyrl | Bosnian (Cyrillic) |
| bos_latn | Bosnian (Latin) |
| bre | Breton |
| btb | Bulu |
| buc | Bushi |
| bug | Bugis |
| bul | Bulgarian |
| bvi | Belanda Viri |
| cab | Garifuna |
| cak | Kaqchikel, Central |
| cas | Tsimané |
| cat | Catalan |
| cbi | Chachi |
| cbr | Cashibo-Cacataibo |
| cbs | Cashinahua |
| cbt | Chayahuita |
| cbu | Candoshi-Shapra |
| ccx | Zhuang, Yongbei |
| ceb | Cebuano |
| ces | Czech |
| cha | Chamorro |
| chj | Chinantec, Ojitlán |
| chk | Chuukese |
| chr_cased | Cherokee (cased) |
| chr_uppercase | Cherokee (uppercase) |
| chv | Chuvash |
| cic | Chickasaw |
| cjk | Chokwe |
| cjk_AO | Chokwe (Angola) |
| cjs | Shor |
| ckb | Kurdish, Central |
| cnh | Chin, Haka |
| cni | Asháninka |
| cnr | Montenegrin |
| cof | Colorado |
| cos | Corsican |
| cot | Caquinte |
| cpu | Ashéninka, Pichis |
| crh | Crimean Tatar |
| crs | Seselwa Creole French |
| csa | Chinantec, Chiltepec |
| csw | Cree, Swampy |
| ctd | Chin, Tedim |
| cym | Welsh |
| dag | Dagbani |
| dan | Danish |
| ddn | Dendi |
| deu_1901 | German, Standard (1901) |
| deu_1996 | German, Standard (1996) |
| dga | Dagaare, Southern |
| dip | Dinka, Northeastern |
| div | Maldivian |
| dyo | Jola-Fonyi |
| dyu | Jula |
| dzo | Dzongkha |
| ell_monotonic | Greek (monotonic) |
| ell_polytonic | Greek (polytonic) |
| emk | Maninkakan, Eastern |
| eml | Romagnolo |
| eng | English |
| epo | Esperanto |
| ese | Ese Ejja |
| est | Estonian |
| eus | Basque |
| eve | Even |
| evn | Evenki |
| ewe | Éwé |
| fao | Faroese |
| fij | Fijian |
| fin | Finnish |
| fkv | Finnish, Kven |
| flm | Chin, Falam |
| fon | Fon |
| fra | French |
| fri | Frisian, Western |
| fuf | Pular |
| fur | Friulian |
| fuv | Fulfulde, Nigerian |
| fuv2 | Fulfulde, Nigerian (2) |
| fvr | Fur |
| gaa | Ga |
| gag | Gagauz |
| gax | Oromo, Borana-Arsi-Guji |
| gjn | Gonja |
| gkp | Kpelle, Guinea |
| gla | Gaelic, Scottish |
| gld | Nanai |
| gle | Gaelic, Irish |
| glg | Galician |
| glv | Manx |
| gnw | Guarani, Western Bolivian |
| gsw1 | Alemannisch (Elsassisch) |
| guc | Wayuu |
| gug | Guaraní, Paraguayan |
| guj | Gujarati |
| guu | Yanomamö |
| gyr | Guarayu |
| hat_kreyol | Haitian Creole French (Kreyol) |
| hat_popular | Haitian Creole French (Popular) |
| hau_NE | Hausa (Niger) |
| hau_NG | Hausa (Nigeria) |
| hau_3 | Hausa |
| haw | Hawaiian |
| hea | Hmong, Northern Qiandong |
| heb | Hebrew |
| hil | Hiligaynon |
| hin | Hindi |
| hlt | Chin, Matu |
| hms | Hmong, Southern Qiandong |
| hna | Gen |
| hni | Hani |
| hns | Hindustani, Sarnami |
| hrv | Croatian |
| hsb | Sorbian, Upper |
| hsf | Huastec (Sierra de Otontepec) |
| hun | Hungarian |
| hus | Huastec (Veracruz) |
| huu | Huitoto, Murui |
| hva | Huastec (San Luís Potosí) |
| hye | Armenian |
| ibb | Ibibio |
| ibo | Igbo |
| ido | Ido |
| idu | Idoma |
| ijs | Ijo, Southeast |
| ike | Inuktitut, Eastern Canadian |
| ilo | Ilocano |
| ina | Interlingua |
| ind | Indonesian |
| isl | Icelandic |
| ita | Italian |
| jav | Javanese (Latin) |
| jav_java | Javanese (Javanese) |
| jiv | Shuar |
| jpn | Japanese |
| jpn_osaka | Japanese (Osaka) |
| jpn_tokyo | Japanese (Tokyo) |
| kaa | Karakalpak |
| kal | Inuktitut, Greenlandic |
| kan | Kannada |
| kat | Georgian |
| kaz | Kazakh |
| kbd | Kabardian |
| kbp | Kabiyé |
| kde | Makonde |
| kdh | Tem |
| kea | Kabuverdianu |
| kek | Q'eqchi' |
| kha | Khasi |
| khk | Mongolian, Halh (Cyrillic) |
| khm | Khmer, Central |
| kin | Rwanda |
| kir | Kirghiz |
| kjh | Khakas |
| kkh_lana | Khün |
| kmb | Mbundu |
| kmr | Kurdish, Northern |
| knc | Kanuri, Central |
| kng | Koongo |
| kng_AO | Koongo (Angola) |
| koi | Komi-Permyak |
| koo | Konjo |
| kor | Korean |
| kqn | Kaonde |
| kqs | Kissi, Northern |
| kri | Krio |
| krl | Karelian |
| ktu | Kituba |
| kwi | Awa-Cuaiquer |
| lad | Ladino |
| lao | Lao |
| lat | Latin |
| lat_1 | Latin (1) |
| lav | Latvian |
| lia | Limba, West-Central |
| lij | Ligurian |
| lin | Lingala |
| lin_tones | Lingala (tones) |
| lit | Lithuanian |
| lld | Ladin |
| lnc | Occitan (Languedocien) |
| lns | Lamnso' |
| lob | Lobi |
| lot | Otuho |
| loz | Lozi |
| ltz | Luxembourgeois |
| lua | Luba-Kasai |
| lue | Luvale |
| lug | Ganda |
| lun | Lunda |
| lus | Mizo |
| mad | Madura |
| mag | Magahi |
| mah | Marshallese |
| mai | Maithili |
| mal | Malayalam |
| mal_chillus | Malayalam |
| mam | Mam, Northern |
| mar | Marathi |
| maz | Mazahua Central |
| mcd | Sharanahua |
| mcf | Matsés |
| men | Mende |
| mfq | Moba |
| mic | Micmac |
| min | Minangkabau |
| miq | Mískito |
| mkd | Macedonian |
| mlt | Maltese |
| mly_arab | Malay (Arabic) |
| mly_latn | Malay (Latin) |
| mnw | Mon |
| mor | Moro |
| mos | Mòoré |
| mri | Maori |
| mto | Mixe, Totontepec |
| mtp | Wichí Lhamtés Nocten |
| mxi | Mozarabic |
| mxv | Mixtec, Metlatónoc |
| mya | Burmese |
| mzi | Mazatec, Ixcatlán |
| nav | Navajo |
| nba | Nyemba |
| nbl | Ndebele |
| ndo | Ndonga |
| nds | Saxon, Low |
| nep | Nepali |
| nhn | Nahuatl, Central |
| nio | Nganasan |
| niu | Niue |
| niv | Gilyak |
| njo | Naga, Ao |
| nku | Kulango, Bouna |
| nld | Dutch |
| nno | Norwegian, Nynorsk |
| nob | Norwegian, Bokmål |
| not | Nomatsiguenga |
| nso | Sotho, Northern |
| nya_chechewa | Nyanja (Chechewa) |
| nya_chinyanja | Nyanja (Chinyanja) |
| nym | Nyamwezi |
| nyn | Nyankore |
| nzi | Nzema |
| oaa | Orok |
| oci_1 | Francoprovençal (Fribourg) |
| oci_2 | Francoprovençal (Savoie) |
| oci_3 | Francoprovençal (Vaud) |
| oci_4 | Francoprovençal (Valais) |
| ojb | Ojibwa, Northwestern |
| oki | Okiek |
| orh | Oroqen |
| oss | Osetin |
| ote | Otomi, Mezquital |
| pam | Pampangan |
| pan | Panjabi, Eastern |
| pap | Papiamentu |
| pau | Palauan |
| pbb | Páez |
| pbu | Pashto, Northern |
| pcd | Picard |
| pcm | Pidgin, Nigerian |
| pes_1 | Farsi, Western |
| pes_2 | Dari |
| pis | Pijin |
| piu | Pintupi-Luritja |
| plt | Malagasy, Plateau |
| pnb | Panjabi, Western |
| pol | Polish |
| pon | Pohnpeian |
| por_BR | Portuguese (Brazil) |
| por_PT | Portuguese (Portugal) |
| pov | Crioulo, Upper Guinea |
| ppl | Pipil |
| prv | Occitan |
| quc | K'iche', Central |
| qud | Quechua (Unified Quichua, old Hispanic orthography) |
| qug | Quichua, Chimborazo Highland |
| qul | Quechua, North Bolivian |
| quy | Quechua, Ayacucho |
| quz | Quechua, Cusco |
| qva | Quechua, Ambo-Pasco |
| qvc | Quechua, Cajamarca |
| qvh | Quechua, Huamalíes-Dos de Mayo Huánuco |
| qvm | Quechua, Margos-Yarowilca-Lauricocha |
| qvn | Quechua, North Junín |
| qwh | Quechua, Huaylas Ancash |
| qxa | Quechua, South Bolivian |
| qxn | Quechua, Northern Conchucos Ancash |
| qxu | Quechua, Arequipa-La Unión |
| rar | Rarotongan |
| rmn | Romani, Balkan |
| rmn_1 | Romani, Balkan (1) |
| rmy | Aromanian |
| roh | Romansch |
| roh_puter | Romansch (Puter) |
| roh_rumgr | Romansch (Grischun) |
| roh_surmiran | Romansch (Surmiran) |
| roh_sursilv | Romansch (Sursilvan) |
| roh_sutsilv | Romansch (Sutsilvan) |
| roh_vallader | Romansch (Vallader) |
| ron_1953 | Romanian (1953) |
| ron_1993 | Romanian (1993) |
| ron_2006 | Romanian (2006) |
| run | Rundi |
| rus | Russian |
| sag | Sango |
| sah | Yakut |
| san | Sanskrit |
| sco | Scots |
| sey | Secoya |
| shk | Shilluk |
| shn | Shan |
| shp | Shipibo-Conibo |
| sin | Sinhala |
| skr | Seraiki |
| slk | Slovak |
| slr | Salar |
| slv | Slovenian |
| sme | North Saami |
| smo | Samoan |
| sna | Shona |
| snk | Soninke |
| snn | Siona |
| som | Somali |
| sot | Sotho, Southern |
| spa | Spanish |
| src | Sardinian, Logudorese |
| srp_cyrl | Serbian (Cyrillic) |
| srp_latn | Serbian (Latin) |
| srq | Sirionó |
| srr | Serer-Sine |
| ssw | Swati |
| suk | Sukuma |
| sun | Sunda |
| sus | Susu |
| swb | Comorian, Maore |
| swe | Swedish |
| swh | Swahili |
| tah | Tahitian |
| tam | Tamil |
| tam_LK | Tamil (Sri Lanka) |
| tat | Tatar |
| tbz | Ditammari |
| tca | Ticuna |
| tel | Telugu |
| tem | Themne |
| tet | Tetun |
| tgk | Tajiki |
| tgl | Tagalog |
| tha | Thai |
| tha2 | Thai (2) |
| tir | Tigrigna |
| tiv | Tiv |
| tji | Tujia, Northern |
| tly | Talysh |
| tna | Tacana |
| tob | Toba |
| toi | Tonga |
| toj | Tojolabal |
| ton | Tongan |
| top | Totonac, Papantla |
| tpi | Tok Pisin |
| trn | Trinitario |
| tsn | Tswana |
| tso_MZ | Tsonga (Mozambique) |
| tso_ZW | Tsonga (Zimbabwe) |
| tsz | Purepecha |
| tuk_cyrl | Turkmen (Cyrillic) |
| tuk_latn | Turkmen (Latin) |
| tur | Turkish |
| tyv | Tuva |
| tzc | Tzotzil (Chamula) |
| tzh | Tzeltal, Oxchuc |
| tzm | Tamazight, Central Atlas |
| udu | Uduk |
| uig_arab | Uyghur (Arabic) |
| uig_latn | Uyghur (Latin) |
| ukr | Ukrainian |
| umb | Umbundu |
| ura | Urarina |
| urd | Urdu |
| urd_2 | Urdu (2) |
| uzn_cyrl | Uzbek, Northern (Cyrillic) |
| uzn_latn | Uzbek, Northern (Latin) |
| vai | Vai |
| vec | Venetian |
| ven | Venda |
| ven2 | Venda |
| vep | Veps |
| vie | Vietnamese |
| vmw | Makhuwa |
| war | Waray-Waray |
| wln | Walloon |
| wol | Wolof |
| wwa | Waama |
| xho | Xhosa |
| xsm | Kasem |
| yad | Yagua |
| yao | Yao |
| yap | Yapese |
| ydd | Yiddish, Eastern |
| ykg | Yukaghir, Northern |
| yor | Yoruba |
| yrk | Nenets |
| yua | Maya, Yucatán |
| yuz | Yuracare |
| zam | Zapotec, Miahuatlán |
| zdj | Comorian, Ngazidja |
| zgh | Tamazight, Standard Moroccan |
| zro | Záparo |
| ztu | Zapotec, Güilá |
| zul | Zulu |
Compatibility
This package is at least compatible with all maintained versions of Node.js. As of now, that is Node.js 18+. It also works in Deno and modern browsers.
Contribute
Yes please! See How to Contribute to Open Source.
Security
This package is safe.
