@polyglot-bundles/th-word-lists
v0.2.0
Thai CEFR-aligned word lists - vocabulary collections by proficiency level
Thai Word Lists (@syllst/th/word-lists)
CEFR-aligned Thai vocabulary collections for language learning.
Overview
This package provides 60+ Thai word lists organized by CEFR proficiency levels (Pre-A1 through C1) and thematic categories.
Installation
pnpm add @syllst/th/word-lists
Usage
Import Full Corpus
import { thaiWordListSets, getWordListSetById } from "@syllst/th/word-lists";
// Get all word lists (60+ lists, ~2600 words)
const allLists = thaiWordListSets;
// Get specific list by ID
const greetings = getWordListSetById("a1-basic-greetings");
Import by CEFR Level (Tree-Shaking)
// Import only A1 vocabulary (~15 lists)
import { a1WordListSets } from "@syllst/th/word-lists/a1";
// Import B1 vocabulary
import { b1WordListSets } from "@syllst/th/word-lists/b1";
// Available levels: pre-a1, a1, a2, b1, b2, c1
Import by Category (Coming Soon)
// Import only greetings vocabulary
import { greetingsLists } from "@syllst/th/word-lists/categories/greetings";
// Available categories: greetings, numbers, colors, food, travel, etc.
Data Structure
Word List Set
interface WordListSet {
  id: string; // Unique identifier (e.g., "a1-basic-greetings")
  name: string; // Display name (e.g., "Basic Greetings")
  description?: string; // Description of the list
  examGrade?: ExamGrade; // CEFR level: "Pre-A1" | "A1" | "A2" | "B1" | "B2" | "C1"
  difficulty?: "beginner" | "intermediate" | "advanced";
  category?: string; // Thematic category (e.g., "greetings", "food")
  words: WordListItem[];
}
Word List Item
interface WordListItem {
  word: string; // Thai word (e.g., "สวัสดี")
  translation?: string; // English translation (e.g., "hello")
  transcriptions?: Record<string, string>; // Multiple transcription schemes
  transliteration?: string; // @deprecated Use transcriptions instead
  ipa?: string; // @deprecated Use transcriptions["ipa"] instead
  partOfSpeech?: string; // "noun", "verb", "adjective", etc.
  exampleSentence?: string; // Example usage
  notes?: string; // Usage notes
  difficulty?: "beginner" | "intermediate" | "advanced";
  examGrade?: "Pre-A1" | "A1" | "A2" | "B1" | "B2" | "C1" | "C2";
  category?: string; // Thematic category
  tags?: string[]; // Search/filter tags
  id?: string; // Unique word ID (e.g., "th:vocab:greetings:hello")
  ciliId?: string; // CILI concept ID for cross-language linking
  frequency?: number; // Frequency rank (lower = more common)
  usedInLessons?: string[]; // Lesson IDs that teach this word
  usedInStories?: string[]; // Story IDs that contain this word
}
Transcription Schemes
Thai words support multiple romanization systems:
| Scheme | Description | Example (สวัสดี) |
|--------|-------------|------------------|
| paiboon+ | Learner-friendly with tone marks | sà-wàt-dii |
| aua | AUA phonetic system | sawàtdii |
| rtgs | Royal Thai General System of Transcription | sawatdi |
| ipa | International Phonetic Alphabet | /sa˨˩.wat˨˩.diː˧/ |
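Because `transliteration` and `ipa` are deprecated in favor of `transcriptions`, consumers may want a single lookup that prefers the new map and falls back to the old fields. The following is a minimal sketch of that idea, not part of the package API: `pickTranscription`, `TranscribedItem`, and the default scheme order are hypothetical names and choices made for illustration.

```typescript
// Subset of the WordListItem fields documented above (illustrative type).
interface TranscribedItem {
  word: string;
  transcriptions?: Record<string, string>;
  transliteration?: string; // deprecated
  ipa?: string; // deprecated
}

// Hypothetical helper: returns the first available transcription among the
// preferred schemes, then falls back to the deprecated top-level fields.
function pickTranscription(
  item: TranscribedItem,
  preferred: string[] = ["paiboon+", "rtgs"],
): string | undefined {
  for (const scheme of preferred) {
    const value = item.transcriptions?.[scheme];
    if (value) return value;
    // Older data may only carry the deprecated top-level `ipa` field.
    if (scheme === "ipa" && item.ipa) return item.ipa;
  }
  // Last resort: the deprecated free-form transliteration, if any.
  return item.transliteration;
}

const hello: TranscribedItem = {
  word: "สวัสดี",
  transcriptions: { "paiboon+": "sà-wàt-dii", rtgs: "sawatdi" },
};
const legacy: TranscribedItem = { word: "สวัสดี", transliteration: "sawatdi" };

const shown = pickTranscription(hello); // uses the "paiboon+" entry
const shownLegacy = pickTranscription(legacy); // uses `transliteration`
```

Passing a different `preferred` order (for example `["aua", "rtgs"]`) selects the first scheme that the entry actually provides.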
{
  "word": "สวัสดี",
  "translation": "hello",
  "transcriptions": {
    "paiboon+": "sà-wàt-dii",
    "aua": "sawàtdii",
    "rtgs": "sawatdi",
    "ipa": "/sa˨˩.wat˨˩.diː˧/"
  }
}
JSON Format
Word lists are stored as JSON files in src/json/:
{
  "id": "a1-basic-greetings",
  "name": "Basic Greetings",
  "desc": "Essential greetings and polite expressions",
  "level": "A1",
  "cat": "greetings",
  "difficulty": "beginner",
  "words": ["ดี", "ไป", "มา", "สวัสดี", "ลาก่อน", "ขอบคุณ", "ขอโทษ"]
}
Compact format: Words are stored as string arrays when metadata is uniform. List-level metadata (pos, difficulty, examGrade) is inherited by all words.
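The inheritance rule can be sketched as a small expansion step. This is only an illustration under stated assumptions: `RawWordList`, `RawWord`, and `expandWords` are hypothetical names, the mapping of the short JSON keys (`level`, `cat`, `pos`) onto `examGrade`, `category`, and `partOfSpeech` is inferred from the examples in this README, and letting per-word fields override inherited ones is an assumed precedence.

```typescript
type Difficulty = "beginner" | "intermediate" | "advanced";

// Per-word entry when a word carries its own metadata (illustrative type).
interface RawWord {
  word: string;
  translation?: string;
  partOfSpeech?: string;
  difficulty?: Difficulty;
  examGrade?: string;
  category?: string;
}

// On-disk list shape, inferred from the JSON examples (assumption).
interface RawWordList {
  id: string;
  name: string;
  desc?: string;
  level?: string;
  cat?: string;
  pos?: string;
  difficulty?: Difficulty;
  words: Array<string | RawWord>;
}

// Hypothetical helper: turns every entry into an object that carries the
// list-level metadata; fields set on the word itself take precedence.
function expandWords(list: RawWordList): RawWord[] {
  const inherited = {
    partOfSpeech: list.pos,
    difficulty: list.difficulty,
    examGrade: list.level,
    category: list.cat,
  };
  return list.words.map((entry) =>
    typeof entry === "string"
      ? { ...inherited, word: entry }
      : { ...inherited, ...entry },
  );
}

const greetingsList: RawWordList = {
  id: "a1-basic-greetings",
  name: "Basic Greetings",
  level: "A1",
  cat: "greetings",
  difficulty: "beginner",
  words: ["สวัสดี", { word: "ขอบคุณ", translation: "thank you" }],
};

// Both entries end up with examGrade "A1" and category "greetings".
const expanded = expandWords(greetingsList);
```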
Extended format: Words can include individual metadata:
{
  "id": "a1-colors",
  "name": "Colors",
  "level": "A1",
  "cat": "colors",
  "words": [
    {"word": "แดง", "translation": "red", "partOfSpeech": "adjective"},
    {"word": "เขียว", "translation": "green", "partOfSpeech": "adjective"}
  ]
}
API Reference
thaiWordListSets
Array of all word list sets.
import { thaiWordListSets } from "@syllst/th/word-lists";
// Filter by category
const greetings = thaiWordListSets.filter(set => set.category === "greetings");
// Filter by CEFR level
const a1Lists = thaiWordListSets.filter(set => set.examGrade === "A1");
getWordListSetById(id: string)
Get a specific word list by ID.
const colors = getWordListSetById("a1-colors");
getWordListSetsByDifficulty(difficulty: string)
Filter by difficulty level.
const beginner = getWordListSetsByDifficulty("beginner");
getWordListSetsByExamGrade(grade: ExamGrade)
Filter by CEFR exam grade.
const a1 = getWordListSetsByExamGrade("A1");
getWordListSetsByCategory(category: string)
Filter by thematic category.
const food = getWordListSetsByCategory("food");
getWordListCategories()
Get all available categories.
const categories = getWordListCategories();
// ["greetings", "numbers", "colors", "food", "travel", ...]
getAllWordListItems()
Get all words from all lists as a flat array.
const allWords = getAllWordListItems();
// Useful for: frequency analysis, search indexing, flashcard pools
Word List Inventory
Pre-A1 (Most Common Words)
| List ID | Category | Word Count |
|---------|----------|------------|
| most-common-adjectives | adjectives | 500 |
| most-common-adverbs | adverbs | 50 |
| most-common-nouns | nouns | 500 |
| most-common-verbs | verbs | 200 |
A1 (Beginner)
| List ID | Category | Word Count |
|---------|----------|------------|
| basic-greetings | greetings | 7 |
| body-parts | body | 20 |
| colors | colors | 12 |
| common-adjectives | adjectives | 50 |
| common-verbs | verbs | 50 |
| daily-activities | activities | 30 |
| family-members | family | 15 |
| food-basics | food | 40 |
| numbers-1-20 | numbers | 20 |
| places-basics | places | 20 |
| time-basics | time | 15 |
A2 (Elementary)
| List ID | Category | Word Count |
|---------|----------|------------|
| animals | animals | 40 |
| clothing | clothing | 30 |
| emotions | emotions | 25 |
| health | health | 30 |
| numbers-advanced | numbers | 30 |
| restaurant | food | 40 |
| school | education | 35 |
| shopping | daily life | 30 |
| sports | sports | 30 |
| travel-phrases | travel | 40 |
| weather | weather | 20 |
B1 (Intermediate)
| List ID | Category | Word Count |
|---------|----------|------------|
| communication | communication | 50 |
| descriptions | adjectives | 40 |
| entertainment | entertainment | 40 |
| food-advanced | food | 50 |
| housing | housing | 40 |
| nature | nature | 40 |
| relationships | social | 40 |
| technology | technology | 40 |
| time-advanced | time | 30 |
| transportation | transport | 40 |
| work | work | 50 |
B2 (Upper Intermediate)
| List ID | Category | Word Count |
|---------|----------|------------|
| abstract-concepts | abstract | 50 |
| arts | arts | 40 |
| business | business | 50 |
| education-advanced | education | 40 |
| health-advanced | health | 40 |
| idioms | idioms | 40 |
| media | media | 40 |
| politics | politics | 40 |
| science | science | 40 |
| travel-advanced | travel | 40 |
| verbs-advanced | verbs | 50 |
C1 (Advanced)
| List ID | Category | Word Count |
|---------|----------|------------|
| academic-vocabulary | academic | 100 |
| economics | economics | 50 |
| formal-language | formal | 50 |
| legal | legal | 50 |
| literature | literature | 50 |
| nuanced-verbs | verbs | 50 |
| philosophy | philosophy | 50 |
| professional | professional | 50 |
| psychology | psychology | 50 |
| sociology | sociology | 50 |
| technology-advanced | technology | 50 |
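The per-level counts in the tables above could be checked against the shipped data with a short tally. The sketch below is an illustration only: `tallyByLevel` and `ListSummary` are hypothetical names, and the sample array stands in for `thaiWordListSets` so the snippet runs on its own.

```typescript
// Only the fields needed for counting (illustrative subset of WordListSet).
interface ListSummary {
  examGrade?: string;
  words: unknown[];
}

// Hypothetical helper: number of lists and total words per CEFR level.
function tallyByLevel(
  sets: ListSummary[],
): Record<string, { lists: number; words: number }> {
  const totals: Record<string, { lists: number; words: number }> = {};
  for (const set of sets) {
    // Lists without a level are grouped under an assumed "ungraded" key.
    const level = set.examGrade ?? "ungraded";
    const entry = totals[level] ?? { lists: 0, words: 0 };
    entry.lists += 1;
    entry.words += set.words.length;
    totals[level] = entry;
  }
  return totals;
}

// Stand-in data sized like three rows of the A1 and A2 tables.
const sampleSets: ListSummary[] = [
  { examGrade: "A1", words: new Array(7) }, // basic-greetings
  { examGrade: "A1", words: new Array(12) }, // colors
  { examGrade: "A2", words: new Array(20) }, // weather
];

const levelTotals = tallyByLevel(sampleSets);
```

With the real package, passing `thaiWordListSets` in place of `sampleSets` should yield one entry per level, assuming every set exposes `examGrade` and `words` as documented.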
Ingesting New Word Lists
From Anki Decks
import { convertAnkiCardsToWordList } from "@syllst/content-shared";
const ankiCards = await extractAnkiCards("thai-4000.apkg");
const wordList = convertAnkiCardsToWordList(ankiCards, {
  id: "anki-thai-4000",
  name: "Thai 4000 Words",
  examGrade: "B1",
});
From CSV/Spreadsheets
import { convertTabularRowsToWordList } from "@syllst/content-shared";
const rows = await parseCSV("thai-vocab.csv");
const wordList = convertTabularRowsToWordList(rows, {
  word: "thai_word",
  translation: "english",
  pos: "pos",
}, {
  id: "a1-thai-vocab",
  name: "Thai Vocabulary",
  examGrade: "A1",
});
From Frequency Lists
import { convertFrequencyListToWordList } from "@syllst/content-shared";
const freqData = await parseFrequencyList("thai-frequency.tsv");
const wordList = convertFrequencyListToWordList(freqData, {
  id: "thai-frequency-2000",
  name: "Thai 2000 Most Common",
  difficulty: "beginner",
});
Related Packages
- @syllst/th - Thai syllabi (MDX lessons)
- @polyglot-bundles/th-lang - Thai alphabet data (consonants, vowels, tones)
- @syllst/content-shared - Shared utilities and types
License
MIT
