tltk-js

v0.1.4

Pure JS port of TLTK (Thai Language Toolkit)

TLTK-JS

A JavaScript/TypeScript port of the Python TLTK library for Thai text processing.

Installation

npm install tltk-js

Usage

import { g2p, th2roman } from 'tltk-js';

// Convert Thai to IPA
const ipa = g2p("สวัสดี");
// Output: สวัส~ดี<tr/>sa1'wat1~dii0|<s/>

// Convert Thai to Romanized form
const roman = th2roman("สวัสดี", { stripTags: true });
// Output: sawatdi

API

g2p(input: string, options?: TLTKOptions): string

Converts Thai text to IPA transcription.

th2roman(input: string, options?: TLTKOptions): string

Converts Thai text to Romanized form (RTGS approximation).

Options

  • stripTags?: boolean - If true, removes XML-like tags and separators from output. Default: false.
  • fallbackHeuristics?: boolean - If true, enables heuristic IPA generation for unknown syllables AND consonant shifting logic for clusters. Default: false (matches Python TLTK behavior).
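The two options and their documented defaults can be written down as a type. This is a minimal sketch: the `TLTKOptions` name and both fields come from the API description above, while `resolveOptions` is a hypothetical helper used only for illustration and is not claimed to exist in the package.

```typescript
// Shape of the options object, as described in the list above.
interface TLTKOptions {
  stripTags?: boolean;
  fallbackHeuristics?: boolean;
}

// Hypothetical helper (not part of tltk-js): fills in the documented defaults.
function resolveOptions(options: TLTKOptions = {}): Required<TLTKOptions> {
  return {
    stripTags: options.stripTags ?? false,
    fallbackHeuristics: options.fallbackHeuristics ?? false,
  };
}

// Only stripTags is overridden; fallbackHeuristics keeps its default of false.
console.log(resolveOptions({ stripTags: true }));
```

Because both fields default to false, calling g2p or th2roman without options should give the tagged, Python-compatible output.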

Architecture & Design Rationale

Core Pipeline

Input -> preprocess -> sylparse -> wordparse -> selectPhones -> Output
  1. preprocess: Handles mixed Thai/English, spacing, and special characters.
  2. sylparse: Syllable segmentation using regex patterns and trigram probabilities.
  3. wordparse: Word segmentation using dictionary lookup (TDICT) and chart parsing.
  4. selectPhones: Selects the best pronunciation for each syllable from tltk_data.json.
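The data flow through the four stages can be sketched as plain function composition. The stage names come from the pipeline above, but every body below is a placeholder stub written for this illustration (including the `~` and `|` separators); the real stages rely on regex patterns, trigram probabilities, TDICT, and tltk_data.json.

```typescript
// A stage takes the previous stage's output and returns its own.
type Stage<I, O> = (input: I) => O;

// Stub: normalize whitespace only.
const preprocess: Stage<string, string> = (text) =>
  text.trim().replace(/\s+/g, " ");

// Stub: treat every space-separated token as one "syllable".
const sylparse: Stage<string, string[]> = (text) => text.split(" ");

// Stub: group all syllables into a single word.
const wordparse: Stage<string[], string[][]> = (syllables) => [syllables];

// Stub: join syllables with "~" and words with "|".
const selectPhones: Stage<string[][], string> = (words) =>
  words.map((word) => word.join("~")).join("|");

function runPipeline(input: string): string {
  return selectPhones(wordparse(sylparse(preprocess(input))));
}

console.log(runPipeline("  a   b ")); // a~b
```

The point of the sketch is only the ordering: each stage consumes exactly what the previous one produces, so a stage can be tested in isolation.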

Heuristic Fallback (src/heuristics.ts)

The Problem with Python TLTK

Python TLTK relies entirely on its dictionary (tltk_data.json) for pronunciation lookups. When it encounters a syllable not in the dictionary, it:

  1. Silently drops the syllable from the output, OR
  2. Truncates the remaining text after the unknown syllable.

Example:

Input:  "โอม มฤกกึกกึย"  (Mantra with rare/nonsense syllables)
Python: "om marue"       (The "กกึ", "กกึย" parts are lost!)

This is problematic for applications that need to handle:

  • Religious/mantra texts with non-standard combinations
  • User-generated content with typos
  • Rare or archaic Thai words not in the dictionary
  • Invented words or names

Our Solution: Heuristic IPA Generation

Instead of silent failure, we implemented guessIPA() in src/heuristics.ts:

  1. When invoked: Only when selectPhones() finds zero pronunciations for a syllable.
  2. What it does: Analyzes Thai graphemes and constructs a plausible IPA string using:
    • Consonant mappings (initial vs. final position)
    • Vowel mappings
    • Implicit vowel insertion rules (e.g., กก -> kok)
  3. Result: The syllable is preserved with an approximate pronunciation instead of being dropped.
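The three ingredients listed above can be illustrated with a toy guesser. This is a sketch under stated assumptions, not the contents of src/heuristics.ts: the function name `guessSyllable`, the tiny mapping tables, and the rough romanized output are all invented for illustration, and only single-consonant onsets and codas are handled.

```typescript
// Toy tables for a handful of graphemes; real mappings are far larger.
// Note that a consonant can sound different in final position (ด: d vs. t).
const INITIAL: Record<string, string> = { "ก": "k", "ม": "m", "น": "n", "ด": "d" };
const FINAL: Record<string, string> = { "ก": "k", "ม": "m", "น": "n", "ด": "t" };
const VOWEL: Record<string, string> = {
  "\u0E32": "a",  // sara aa
  "\u0E35": "i",  // sara ii
  "\u0E36": "ue", // sara ue
};

// Returns a rough transcription, or null when the input is outside the toy tables.
function guessSyllable(syllable: string): string | null {
  let onset = "";
  let nucleus = "";
  let coda = "";
  for (const ch of syllable) {
    const vowel: string | undefined = VOWEL[ch];
    if (vowel !== undefined) {
      if (nucleus !== "") return null; // more than one vowel: out of scope here
      nucleus = vowel;
    } else if (onset === "") {
      const initial: string | undefined = INITIAL[ch];
      if (initial === undefined) return null;
      onset = initial;
    } else if (coda === "") {
      const final: string | undefined = FINAL[ch];
      if (final === undefined) return null;
      coda = final;
    } else {
      return null; // clusters are not handled in this sketch
    }
  }
  if (onset === "") return null;
  // Implicit vowel: "o" in a closed syllable, "a" in an open one.
  if (nucleus === "") nucleus = coda === "" ? "a" : "o";
  return onset + nucleus + coda;
}

console.log(guessSyllable("กก")); // kok
```

A table-driven design like this keeps the fallback small compared with enumerating syllables in the data file, at the cost of producing approximate output.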

Same example with JS TLTK (fallbackHeuristics enabled):

Input:  "โอม มฤกกึกกึย"
JS:     "om maruek kuek kuei"  (All syllables preserved!)

Key Guarantee: Standard Thai is Unaffected

For valid, standard Thai phrases, heuristics.ts is NEVER invoked.

The dictionary lookup in selectPhones handles all known words. This ensures:

  • 100% parity with Python TLTK for compliant inputs (verified by verify_parity.js)
  • Graceful degradation for unknown inputs (tested by test_deviations.mjs)
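The guarantee boils down to an ordering rule: consult the dictionary first and reach for the heuristic only when the lookup comes back empty and the option is on. The sketch below illustrates that rule with hypothetical names (`pronounce`, `Pronunciation`, and a plain object standing in for the tltk_data.json lookup); it takes the first candidate where the real selectPhones picks the best one.

```typescript
interface Pronunciation {
  phones: string | null; // null: nothing found and fallback disabled
  usedHeuristic: boolean;
}

function pronounce(
  syllable: string,
  dictionary: Record<string, string[]>, // stand-in for the dictionary lookup
  guess: (syllable: string) => string,  // stand-in for the heuristic guesser
  fallbackHeuristics: boolean,
): Pronunciation {
  const candidates: string[] | undefined = dictionary[syllable];
  const best = candidates === undefined ? undefined : candidates[0];
  if (best !== undefined) {
    // Known syllable: the heuristic is never consulted.
    return { phones: best, usedHeuristic: false };
  }
  if (fallbackHeuristics) {
    return { phones: guess(syllable), usedHeuristic: true };
  }
  return { phones: null, usedHeuristic: false };
}

const toyDictionary: Record<string, string[]> = { dii: ["dii0"] };
console.log(pronounce("dii", toyDictionary, (s) => "?" + s, true));
console.log(pronounce("zzz", toyDictionary, (s) => "?" + s, true));
```

Since the dictionary branch returns before the guesser is touched, output for known syllables cannot change when the fallback is enabled.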

Why Not Just Add Words to the Dictionary?

Adding every possible syllable combination to tltk_data.json is impractical because:

  1. The dictionary is already about 17 MB.
  2. Nonsense and mantra words come in unlimited variations.
  3. Typos and invented words cannot be enumerated in advance.

A heuristic approach provides reasonable coverage without bloating the data file.

Consonant Shifting in th2roman

When a segment starts with a double consonant (e.g., kkue) and the previous segment ends with a vowel (e.g., marue), the shifting logic moves the first consonant to close the previous syllable:

marue + kkue  ->  maruek + kue

This is applied via regex:

tran = tran.replace(/([aeiou])\s+([bcdfghjklmnpqrstvwxyz])\2/g, "$1$2 $2");
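To see the regex in action, it can be wrapped in a small function. The regex is the one quoted above; the wrapper name `shiftDoubledConsonant` is made up for this sketch.

```typescript
// Moves the first of two identical consonants back onto the previous,
// vowel-final segment: "marue kkue" becomes "maruek kue".
function shiftDoubledConsonant(tran: string): string {
  // $1 = final vowel of the previous segment, $2 = the doubled consonant.
  return tran.replace(/([aeiou])\s+([bcdfghjklmnpqrstvwxyz])\2/g, "$1$2 $2");
}

console.log(shiftDoubledConsonant("marue kkue")); // maruek kue
console.log(shiftDoubledConsonant("om marue"));   // om marue (no doubled consonant, unchanged)
```

The backreference `\2` is what restricts the rule to doubled consonants, so ordinary segment boundaries such as "om marue" pass through untouched.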

Test Suites

1. verify_parity.js - Parity Tests

Purpose: Ensure JS output matches Python TLTK for standard inputs.

Data Source: ground_truth.json generated from Python TLTK.

Usage:

node verify_parity.js

Expected: 100% pass rate. Any failure indicates a regression.
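The core of a parity check is a comparison loop over recorded cases. The sketch below assumes a ground-truth shape of `{ input, expected }` pairs, which is a guess for illustration; the actual layout of ground_truth.json and the internals of verify_parity.js are not shown here.

```typescript
// Assumed shape of one recorded case (hypothetical).
interface ParityCase {
  input: string;
  expected: string;
}

// Returns the cases whose converted output differs from the recorded one.
function findMismatches(
  cases: ParityCase[],
  convert: (input: string) => string,
): ParityCase[] {
  return cases.filter((c) => convert(c.input) !== c.expected);
}

// Stand-in converter so the sketch runs without the library.
const upper = (s: string): string => s.toUpperCase();
const sample: ParityCase[] = [
  { input: "a", expected: "A" },
  { input: "b", expected: "x" }, // deliberately wrong
];
console.log(findMismatches(sample, upper).length); // 1
```

In a real run, `convert` would be g2p or th2roman, and an empty result corresponds to the expected 100% pass rate.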


2. test_deviations.mjs - Enhancement Tests

Purpose: Test enhanced behavior for edge cases where we intentionally deviate from Python TLTK.

Rationale: Python TLTK truncates/silently drops unknown syllables. Our JS port provides a heuristic fallback instead. This test suite validates that fallback produces reasonable output.

Example:

{
    input: "โอม มฤกกึกกึย",
    // Python TLTK: "om marue" (truncated)
    // JS TLTK:     "om maruek kuek kuei" (enhanced)
    expectedRomKeywords: ["om", "maruek", "kue", "kuei"]
}
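One plausible way to evaluate `expectedRomKeywords` is substring containment, which would explain why the keyword "kue" passes against the segment "kuek". That matching rule is an assumption made for this sketch, and `missingKeywords` is a hypothetical helper, not a function from the test suite.

```typescript
// Returns the keywords that do not occur anywhere in the romanized output.
function missingKeywords(output: string, keywords: string[]): string[] {
  return keywords.filter((keyword) => !output.includes(keyword));
}

const keywords = ["om", "maruek", "kue", "kuei"];

// Enhanced JS output: every keyword is present.
console.log(missingKeywords("om maruek kuek kuei", keywords)); // []

// Truncated output: three keywords are missing.
console.log(missingKeywords("om marue", keywords)); // [ 'maruek', 'kue', 'kuei' ]
```

Keyword checks like this tolerate small differences in the heuristic output while still catching dropped syllables.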

Usage:

node test_deviations.mjs

3. test_dist.js - Smoke Test

Purpose: Quick sanity check that the built distribution is importable and functional.

Usage:

node test_dist.js

Summary Table

| Test File           | Purpose                   | Expected Behavior               |
|---------------------|---------------------------|---------------------------------|
| verify_parity.js    | Standard Thai parity      | 100% match with Python TLTK     |
| test_deviations.mjs | Unknown syllable handling | Enhanced output (no truncation) |
| test_dist.js        | Build smoke test          | No errors, basic output         |


License

MIT