@polyglot-bundles/th-word-lists

v0.2.0

Thai CEFR-aligned word lists - vocabulary collections by proficiency level

Thai Word Lists (@syllst/th/word-lists)

CEFR-aligned Thai vocabulary collections for language learning.

Overview

This package provides 60+ Thai word lists organized by CEFR proficiency levels (Pre-A1 through C1) and thematic categories.

Installation

pnpm add @syllst/th/word-lists

Usage

Import Full Corpus

import { thaiWordListSets, getWordListSetById } from "@syllst/th/word-lists";

// Get all word lists (60+ lists, ~2600 words)
const allLists = thaiWordListSets;

// Get specific list by ID
const greetings = getWordListSetById("a1-basic-greetings");

Import by CEFR Level (Tree-Shaking)

// Import only A1 vocabulary (~15 lists)
import { a1WordListSets } from "@syllst/th/word-lists/a1";

// Import B1 vocabulary
import { b1WordListSets } from "@syllst/th/word-lists/b1";

// Available levels: pre-a1, a1, a2, b1, b2, c1

Import by Category (Coming Soon)

// Import only greetings vocabulary
import { greetingsLists } from "@syllst/th/word-lists/categories/greetings";

// Available categories: greetings, numbers, colors, food, travel, etc.

Data Structure

Word List Set

interface WordListSet {
  id: string;           // Unique identifier (e.g., "a1-basic-greetings")
  name: string;         // Display name (e.g., "Basic Greetings")
  description?: string; // Description of the list
  examGrade?: ExamGrade; // CEFR level: "Pre-A1" | "A1" | "A2" | "B1" | "B2" | "C1"
  difficulty?: "beginner" | "intermediate" | "advanced";
  category?: string;    // Thematic category (e.g., "greetings", "food")
  words: WordListItem[];
}

Word List Item

interface WordListItem {
  word: string;                    // Thai word (e.g., "สวัสดี")
  translation?: string;            // English translation (e.g., "hello")
  transcriptions?: Record<string, string>; // Multiple transcription schemes
  transliteration?: string;        // @deprecated Use transcriptions instead
  ipa?: string;                    // @deprecated Use transcriptions["ipa"] instead
  partOfSpeech?: string;           // "noun", "verb", "adjective", etc.
  exampleSentence?: string;        // Example usage
  notes?: string;                  // Usage notes
  difficulty?: "beginner" | "intermediate" | "advanced";
  examGrade?: "Pre-A1" | "A1" | "A2" | "B1" | "B2" | "C1" | "C2";
  category?: string;               // Thematic category
  tags?: string[];                 // Search/filter tags
  id?: string;                     // Unique word ID (e.g., "th:vocab:greetings:hello")
  ciliId?: string;                 // CILI concept ID for cross-language linking
  frequency?: number;              // Frequency rank (lower = more common)
  usedInLessons?: string[];        // Lesson IDs that teach this word
  usedInStories?: string[];        // Story IDs that contain this word
}

Transcription Schemes

Thai words support multiple romanization systems:

| Scheme | Description | Example (สวัสดี) |
|--------|-------------|------------------|
| paiboon+ | Learner-friendly with tone marks | sà-wàt-dii |
| aua | AUA phonetic system | sawàtdii |
| rtgs | Royal Thai General System | sawatdi |
| ipa | International Phonetic Alphabet | /sa˨˩.wat˨˩.diː˧/ |

{
  "word": "สวัสดี",
  "translation": "hello",
  "transcriptions": {
    "paiboon+": "sà-wàt-dii",
    "aua": "sawàtdii",
    "rtgs": "sawatdi",
    "ipa": "/sa˨˩.wat˨˩.diː˧/"
  }
}
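Because `transcriptions` is optional and older data may still carry the deprecated `transliteration` and `ipa` fields, consumers will probably want a fallback order. The following is a minimal sketch under that assumption; `pickTranscription` is a hypothetical helper, not an export of this package, and the interface is a trimmed local copy of `WordListItem`.

```typescript
// Trimmed local copy of the WordListItem fields used here.
interface WordListItem {
  word: string;
  transcriptions?: Record<string, string>;
  transliteration?: string; // deprecated
  ipa?: string;             // deprecated
}

// Hypothetical helper (not part of the package): return the first available
// transcription among the preferred schemes, then fall back to the
// deprecated fields.
function pickTranscription(
  item: WordListItem,
  preferred: string[] = ["paiboon+", "rtgs", "ipa"],
): string | undefined {
  for (const scheme of preferred) {
    const value = item.transcriptions?.[scheme];
    if (value) return value;
  }
  // Legacy entries may only carry the deprecated fields.
  return item.transliteration ?? item.ipa;
}

// Sample data for illustration only.
const hello: WordListItem = {
  word: "สวัสดี",
  transcriptions: { "paiboon+": "sà-wàt-dii", rtgs: "sawatdi" },
};
const legacy: WordListItem = { word: "ขอบคุณ", transliteration: "khop khun" };

console.log(pickTranscription(hello));           // "sà-wàt-dii"
console.log(pickTranscription(hello, ["rtgs"])); // "sawatdi"
console.log(pickTranscription(legacy));          // "khop khun"
```

Passing the scheme order as a parameter keeps the choice of romanization in the calling app rather than in the data.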

JSON Format

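Word lists are stored as JSON files in src/json/. The stored files use short keys (desc, level, cat); judging by the examples, these correspond to description, examGrade, and category in WordListSet: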

{
  "id": "a1-basic-greetings",
  "name": "Basic Greetings",
  "desc": "Essential greetings and polite expressions",
  "level": "A1",
  "cat": "greetings",
  "difficulty": "beginner",
  "words": ["ดี", "ไป", "มา", "สวัสดี", "ลาก่อน", "ขอบคุณ", "ขอโทษ"]
}

Compact format: Words are stored as string arrays when metadata is uniform. List-level metadata (pos, difficulty, examGrade) is inherited by all words.

Extended format: Words can include individual metadata:

{
  "id": "a1-colors",
  "name": "Colors",
  "level": "A1",
  "cat": "colors",
  "words": [
    {"word": "แดง", "translation": "red", "partOfSpeech": "adjective"},
    {"word": "เขียว", "translation": "green", "partOfSpeech": "adjective"}
  ]
}
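The two formats can be normalized into one shape at load time. The sketch below shows one way this might work; `expandWordList` and `RawWordList` are hypothetical names, the key mapping (`desc`, `level`, `cat`, `pos`) is inferred from the examples above, and the package's actual loader may differ.

```typescript
type Difficulty = "beginner" | "intermediate" | "advanced";

interface WordListItem {
  word: string;
  translation?: string;
  partOfSpeech?: string;
  difficulty?: Difficulty;
  examGrade?: string;
}

// Shape of a stored JSON file, inferred from the README examples.
interface RawWordList {
  id: string;
  name: string;
  desc?: string;
  level?: string;
  cat?: string;
  pos?: string;
  difficulty?: Difficulty;
  words: Array<string | WordListItem>;
}

interface WordListSet {
  id: string;
  name: string;
  description?: string;
  examGrade?: string;
  difficulty?: Difficulty;
  category?: string;
  words: WordListItem[];
}

// Hypothetical normalizer: expands compact string entries and lets every
// word inherit list-level pos, difficulty, and level.
function expandWordList(raw: RawWordList): WordListSet {
  const inherited = {
    partOfSpeech: raw.pos,
    difficulty: raw.difficulty,
    examGrade: raw.level,
  };
  const words = raw.words.map((entry): WordListItem => {
    const item: WordListItem = typeof entry === "string" ? { word: entry } : entry;
    // Per-word metadata overrides list-level metadata.
    return { ...inherited, ...item };
  });
  return {
    id: raw.id,
    name: raw.name,
    description: raw.desc,
    examGrade: raw.level,
    difficulty: raw.difficulty,
    category: raw.cat,
    words,
  };
}

const greetingsSet = expandWordList({
  id: "a1-basic-greetings",
  name: "Basic Greetings",
  level: "A1",
  cat: "greetings",
  difficulty: "beginner",
  words: ["สวัสดี", "ขอบคุณ"],
});
const colorsSet = expandWordList({
  id: "a1-colors",
  name: "Colors",
  level: "A1",
  cat: "colors",
  words: [{ word: "แดง", translation: "red", partOfSpeech: "adjective" }],
});

console.log(greetingsSet.words[0].difficulty); // "beginner"
console.log(colorsSet.words[0].partOfSpeech);  // "adjective"
```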

API Reference

thaiWordListSets

Array of all word list sets.

import { thaiWordListSets } from "@syllst/th/word-lists";

// Filter by category
const greetings = thaiWordListSets.filter(set => set.category === "greetings");

// Filter by CEFR level
const a1Lists = thaiWordListSets.filter(set => set.examGrade === "A1");

getWordListSetById(id: string)

Get a specific word list by ID.

const colors = getWordListSetById("a1-colors");

getWordListSetsByDifficulty(difficulty: string)

Filter by difficulty level.

const beginner = getWordListSetsByDifficulty("beginner");

getWordListSetsByExamGrade(grade: ExamGrade)

Filter by CEFR exam grade.

const a1 = getWordListSetsByExamGrade("A1");

getWordListSetsByCategory(category: string)

Filter by thematic category.

const food = getWordListSetsByCategory("food");

getWordListCategories()

Get all available categories.

const categories = getWordListCategories();
// ["greetings", "numbers", "colors", "food", "travel", ...]

getAllWordListItems()

Get all words from all lists as a flat array.

const allWords = getAllWordListItems();
// Useful for: frequency analysis, search indexing, flashcard pools
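As an illustration of the flashcard-pool use case, the sketch below de-duplicates a flat item array and orders it by `frequency` (lower rank first, unranked words last). It uses local stand-in types and made-up frequency ranks; `flattenSets` and `buildFlashcardPool` are hypothetical names, not exports of this package.

```typescript
interface WordListItem {
  word: string;
  translation?: string;
  frequency?: number; // frequency rank, lower = more common
}

interface WordListSet {
  id: string;
  words: WordListItem[];
}

// Local stand-in for the flat array that getAllWordListItems() returns.
function flattenSets(sets: WordListSet[]): WordListItem[] {
  const all: WordListItem[] = [];
  for (const set of sets) {
    for (const item of set.words) all.push(item);
  }
  return all;
}

// Keep the first occurrence of each Thai word, then sort by frequency rank.
function buildFlashcardPool(items: WordListItem[]): WordListItem[] {
  const seen = new Set<string>();
  const unique = items.filter((item) => {
    if (seen.has(item.word)) return false;
    seen.add(item.word);
    return true;
  });
  const rank = (item: WordListItem) => item.frequency ?? Number.MAX_SAFE_INTEGER;
  return unique.sort((a, b) => rank(a) - rank(b));
}

// Sample data; the frequency ranks are invented for the example.
const sampleSets: WordListSet[] = [
  { id: "set-a", words: [{ word: "ไป", frequency: 12 }, { word: "ดี", frequency: 5 }] },
  { id: "set-b", words: [{ word: "ดี", frequency: 5 }, { word: "มา" }] },
];

const pool = buildFlashcardPool(flattenSets(sampleSets));
console.log(pool.map((item) => item.word)); // ดี, ไป, มา
```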

Word List Inventory

Pre-A1 (Most Common Words)

| List ID | Category | Word Count |
|---------|----------|------------|
| most-common-adjectives | adjectives | 500 |
| most-common-adverbs | adverbs | 50 |
| most-common-nouns | nouns | 500 |
| most-common-verbs | verbs | 200 |

A1 (Beginner)

| List ID | Category | Word Count |
|---------|----------|------------|
| basic-greetings | greetings | 7 |
| body-parts | body | 20 |
| colors | colors | 12 |
| common-adjectives | adjectives | 50 |
| common-verbs | verbs | 50 |
| daily-activities | activities | 30 |
| family-members | family | 15 |
| food-basics | food | 40 |
| numbers-1-20 | numbers | 20 |
| places-basics | places | 20 |
| time-basics | time | 15 |

A2 (Elementary)

| List ID | Category | Word Count |
|---------|----------|------------|
| animals | animals | 40 |
| clothing | clothing | 30 |
| emotions | emotions | 25 |
| health | health | 30 |
| numbers-advanced | numbers | 30 |
| restaurant | food | 40 |
| school | education | 35 |
| shopping | daily life | 30 |
| sports | sports | 30 |
| travel-phrases | travel | 40 |
| weather | weather | 20 |

B1 (Intermediate)

| List ID | Category | Word Count |
|---------|----------|------------|
| communication | communication | 50 |
| descriptions | adjectives | 40 |
| entertainment | entertainment | 40 |
| food-advanced | food | 50 |
| housing | housing | 40 |
| nature | nature | 40 |
| relationships | social | 40 |
| technology | technology | 40 |
| time-advanced | time | 30 |
| transportation | transport | 40 |
| work | work | 50 |

B2 (Upper Intermediate)

| List ID | Category | Word Count |
|---------|----------|------------|
| abstract-concepts | abstract | 50 |
| arts | arts | 40 |
| business | business | 50 |
| education-advanced | education | 40 |
| health-advanced | health | 40 |
| idioms | idioms | 40 |
| media | media | 40 |
| politics | politics | 40 |
| science | science | 40 |
| travel-advanced | travel | 40 |
| verbs-advanced | verbs | 50 |

C1 (Advanced)

| List ID | Category | Word Count |
|---------|----------|------------|
| academic-vocabulary | academic | 100 |
| economics | economics | 50 |
| formal-language | formal | 50 |
| legal | legal | 50 |
| literature | literature | 50 |
| nuanced-verbs | verbs | 50 |
| philosophy | philosophy | 50 |
| professional | professional | 50 |
| psychology | psychology | 50 |
| sociology | sociology | 50 |
| technology-advanced | technology | 50 |
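To check list and word totals per level, the inventory rows can be tallied with a small reducer. This is a sketch only: `InventoryRow` and `summarizeByLevel` are hypothetical names, and the data below is copied by hand from the Pre-A1 and A1 tables rather than read from the package.

```typescript
interface InventoryRow {
  level: string;
  id: string;
  wordCount: number;
}

interface LevelSummary {
  lists: number;
  words: number;
}

// Rows transcribed from the Pre-A1 and A1 inventory tables.
const inventory: InventoryRow[] = [
  { level: "Pre-A1", id: "most-common-adjectives", wordCount: 500 },
  { level: "Pre-A1", id: "most-common-adverbs", wordCount: 50 },
  { level: "Pre-A1", id: "most-common-nouns", wordCount: 500 },
  { level: "Pre-A1", id: "most-common-verbs", wordCount: 200 },
  { level: "A1", id: "basic-greetings", wordCount: 7 },
  { level: "A1", id: "body-parts", wordCount: 20 },
  { level: "A1", id: "colors", wordCount: 12 },
  { level: "A1", id: "common-adjectives", wordCount: 50 },
  { level: "A1", id: "common-verbs", wordCount: 50 },
  { level: "A1", id: "daily-activities", wordCount: 30 },
  { level: "A1", id: "family-members", wordCount: 15 },
  { level: "A1", id: "food-basics", wordCount: 40 },
  { level: "A1", id: "numbers-1-20", wordCount: 20 },
  { level: "A1", id: "places-basics", wordCount: 20 },
  { level: "A1", id: "time-basics", wordCount: 15 },
];

// Count lists and sum word counts for each level.
function summarizeByLevel(rows: InventoryRow[]): Record<string, LevelSummary> {
  const summary: Record<string, LevelSummary> = {};
  for (const row of rows) {
    const entry = summary[row.level] ?? { lists: 0, words: 0 };
    entry.lists += 1;
    entry.words += row.wordCount;
    summary[row.level] = entry;
  }
  return summary;
}

const levelSummary = summarizeByLevel(inventory);
for (const [level, s] of Object.entries(levelSummary)) {
  console.log(`${level}: ${s.lists} lists, ${s.words} words`);
}
// Pre-A1: 4 lists, 1250 words
// A1: 11 lists, 279 words
```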

Ingesting New Word Lists

From Anki Decks

import { convertAnkiCardsToWordList } from "@syllst/content-shared";

// extractAnkiCards is not defined in this README; supply your own .apkg reader here.
const ankiCards = await extractAnkiCards("thai-4000.apkg");
const wordList = convertAnkiCardsToWordList(ankiCards, {
  id: "anki-thai-4000",
  name: "Thai 4000 Words",
  examGrade: "B1",
});

From CSV/Spreadsheets

import { convertTabularRowsToWordList } from "@syllst/content-shared";

// parseCSV is not defined in this README; use any CSV parser that returns one object per row.
const rows = await parseCSV("thai-vocab.csv");
const wordList = convertTabularRowsToWordList(rows, {
  word: "thai_word",
  translation: "english",
  pos: "pos",
}, {
  id: "a1-thai-vocab",
  name: "Thai Vocabulary",
  examGrade: "A1",
});
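The second argument above maps word-list fields to column names in the source file. To make that mapping concrete, here is a self-contained sketch of what such a conversion might do. `mapRowsToWordList` is a hypothetical stand-in written for this example; it is not the implementation of `convertTabularRowsToWordList`, whose actual behavior is not documented here.

```typescript
type Row = Record<string, string | undefined>;

interface ColumnMap {
  word: string;         // column holding the Thai headword
  translation?: string; // column holding the English gloss
  pos?: string;         // column holding the part of speech
}

interface WordListItem {
  word: string;
  translation?: string;
  partOfSpeech?: string;
}

interface WordListMeta {
  id: string;
  name: string;
  examGrade?: string;
}

interface WordListSet extends WordListMeta {
  words: WordListItem[];
}

// Read one cell, treating a missing column or value as an empty string.
function cell(row: Row, column?: string): string {
  return column ? (row[column] ?? "").trim() : "";
}

// Hypothetical converter: builds a word list from rows using the column map.
function mapRowsToWordList(rows: Row[], columns: ColumnMap, meta: WordListMeta): WordListSet {
  const words: WordListItem[] = [];
  for (const row of rows) {
    const word = cell(row, columns.word);
    if (!word) continue; // skip rows without a headword
    const item: WordListItem = { word };
    const translation = cell(row, columns.translation);
    if (translation) item.translation = translation;
    const pos = cell(row, columns.pos);
    if (pos) item.partOfSpeech = pos;
    words.push(item);
  }
  return { ...meta, words };
}

// Sample rows for illustration only.
const sampleRows: Row[] = [
  { thai_word: "แดง", english: "red", pos: "adjective" },
  { thai_word: " ", english: "blank" },
  { thai_word: "เขียว", english: "green", pos: "" },
];

const vocabList = mapRowsToWordList(
  sampleRows,
  { word: "thai_word", translation: "english", pos: "pos" },
  { id: "a1-thai-vocab", name: "Thai Vocabulary", examGrade: "A1" },
);

console.log(vocabList.words.length); // 2
```

Rows with an empty headword are dropped and empty cells are omitted rather than stored as empty strings, which keeps the optional fields of `WordListItem` meaningful.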

From Frequency Lists

import { convertFrequencyListToWordList } from "@syllst/content-shared";

// parseFrequencyList is not defined in this README; supply your own TSV reader here.
const freqData = await parseFrequencyList("thai-frequency.tsv");
const wordList = convertFrequencyListToWordList(freqData, {
  id: "thai-frequency-2000",
  name: "Thai 2000 Most Common",
  difficulty: "beginner",
});

Related Packages

  • @syllst/th - Thai syllabi (MDX lessons)
  • @polyglot-bundles/th-lang - Thai alphabet data (consonants, vowels, tones)
  • @syllst/content-shared - Shared utilities and types

License

MIT