@mera-vansh/ms-ltd

v2.2.1

Zero-dependency TypeScript NLP engine for multilingual Indian-language applications

ms-ltd

Multilingual Semantic Language Toolkit — Deterministic

Zero-dependency TypeScript NLP engine for multilingual Indian-language applications. Provides TF-IDF retrieval, emotion/tone classification, Unicode script detection, language identification, Sanskrit grammar tools, and a ~2200-entry curated lexicon across 18 Indian languages — all without any external runtime dependencies.


Why ms-ltd?

Most NLP libraries require heavy models, Python environments, or large dependency trees. ms-ltd is different:

  • Zero runtime dependencies: the package ships nothing but its own code
  • Deterministic: no ML models; every classification is explainable and auditable
  • Multilingual by design: built specifically for the 18 languages listed under Supported Languages
  • Works offline: no API calls, no network, no cloud
  • Tree-shakeable: import only what you need; full TypeScript types included

Supported Languages

| Code | Language | Script | Code | Language | Script |
|------|-----------|------------|------|-----------|------------|
| en | English | Latin | pa | Punjabi | Gurmukhi |
| hi | Hindi | Devanagari | or | Odia | Odia |
| mr | Marathi | Devanagari | sa | Sanskrit | Devanagari |
| ne | Nepali | Devanagari | kok | Konkani | Devanagari |
| ma | Maithili | Devanagari | ta | Tamil | Tamil |
| bn | Bengali | Bengali | te | Telugu | Telugu |
| as | Assamese | Bengali | ml | Malayalam | Malayalam |
| gu | Gujarati | Gujarati | kn | Kannada | Kannada |
| ur | Urdu | Arabic | sd | Sindhi | Arabic |


Installation

npm install @mera-vansh/ms-ltd
# or
yarn add @mera-vansh/ms-ltd
# or
pnpm add @mera-vansh/ms-ltd

Requirements: Node.js ≥ 22


Quick Start

import { LTD } from "@mera-vansh/ms-ltd";

// 1. Create engine
const ltd = new LTD();

// 2. Ingest your knowledge base
ltd.ingest([
  { id: "g1", text: "भारद्वाज गोत्र के बारे में जानकारी", metadata: { topic: "gotra" } },
  { id: "g2", text: "Bharadwaj gotra pravara rishis", metadata: { topic: "gotra" } },
  { id: "r1", text: "माता पिता का रिश्ता पारिवारिक संबंध", metadata: { topic: "family" } },
  { id: "r2", text: "mother father relation family tree", metadata: { topic: "family" } },
]);

// 3. Query
const result = ltd.call("मेरा गोत्र भारद्वाज है, पिताजी का नाम राम है");

console.log(result);
// {
//   input:      "मेरा गोत्र भारद्वाज है, पिताजी का नाम राम है",
//   lang:       "hi",
//   script:     "Devanagari",
//   emotion:    "NEUTRAL",
//   tone:       "NEUTRAL",
//   candidates: [{ id: "g1", score: 0.82, metadata: { topic: "gotra" } }, ...],
//   confidence: 0.82
// }

Core Concepts

The LTD Pipeline

Every call to ltd.call(input) runs a six-stage pipeline:

Raw text
  → [1] Normalise        NFKC + lowercase + zero-width strip
  → [2] Detect Script    Unicode block counting (10 scripts)
  → [3] Detect Language  Whole-word seed-keyword disambiguation (18 languages)
  → [4] Retrieve         TF-IDF cosine similarity against ingested docs
  → [5] Emotion          Deterministic keyword + regex rules
  → [6] Tone             Deterministic honorific + regex rules
  → LTDResponse
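
Stage 1 can be pictured with a few lines of standard string handling. The sketch below is an illustration only, not the library's code: `normalizeText` is a hypothetical name, and the exact set of zero-width code points it strips (U+200B to U+200D plus U+FEFF) is an assumption.

```typescript
// Hypothetical stand-in for the normalise stage; not the library's implementation.
function normalizeText(input: string): string {
  return input
    .normalize("NFKC")                       // fold compatibility forms such as full-width letters
    .toLowerCase()                           // case-fold scripts that have case
    .replace(/[\u200B-\u200D\uFEFF]/g, "");  // assumed zero-width set: ZWSP, ZWNJ, ZWJ, BOM
}

const cleaned = normalizeText("NAMASTE\u200C");      // "namaste"
const folded = normalizeText("\uFF21\uFF22\uFF23");  // full-width "ABC" becomes "abc"
```

Running NFKC before lowercasing means compatibility variants of the same letter collapse to one form before any later stage compares tokens.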

Documents (Knowledge Base)

You build the engine's knowledge base by ingesting IngestDocument objects. No model is trained; ingestion only indexes the text. Each document has:

  • id — unique string key (use for feedback, remove, lookup)
  • text — the content to vectorise and search
  • metadata — arbitrary key/value pairs returned with retrieval results

Retrieval (TF-IDF + Cosine Similarity)

Documents are represented as sparse TF-IDF vectors. Queries are vectorised using the same vocabulary and ranked by cosine similarity weighted by each document's feedback weight.
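
To make the ranking step concrete, here is a minimal self-contained sketch of TF-IDF scoring with cosine similarity. It does not use the library: every function name is hypothetical, the smoothed IDF formula is one common choice rather than the one ms-ltd necessarily uses, and multiplying the similarity by the feedback weight is an assumed reading of "weighted by".

```typescript
// Illustrative sketch only: the IDF formula, the weighting and all names are assumptions.
type SparseVector = Map<string, number>;

function tokenize(text: string): string[] {
  return text.toLowerCase().split(/\s+/).filter((t) => t.length > 0);
}

// IDF is fitted once over the whole corpus, so rarer terms receive larger weights.
function fitIdf(corpus: string[]): Map<string, number> {
  const docFreq = new Map<string, number>();
  for (const doc of corpus) {
    const seen = new Set<string>();
    for (const term of tokenize(doc)) {
      if (seen.has(term)) continue;
      seen.add(term);
      docFreq.set(term, (docFreq.get(term) ?? 0) + 1);
    }
  }
  const idf = new Map<string, number>();
  docFreq.forEach((df, term) => {
    idf.set(term, Math.log((corpus.length + 1) / (df + 1)) + 1);
  });
  return idf;
}

// Term frequency times IDF; terms outside the fitted vocabulary are dropped.
function toVector(text: string, idf: Map<string, number>): SparseVector {
  const vec: SparseVector = new Map();
  for (const term of tokenize(text)) {
    const weight = idf.get(term);
    if (weight === undefined) continue;
    vec.set(term, (vec.get(term) ?? 0) + weight);
  }
  return vec;
}

function cosine(a: SparseVector, b: SparseVector): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((value, term) => {
    dot += value * (b.get(term) ?? 0);
    normA += value * value;
  });
  b.forEach((value) => {
    normB += value * value;
  });
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

const corpus = ["gotra bharadwaj family", "gotra kashyap family", "mata pita relation"];
const feedbackWeights = [1.0, 1.0, 1.0]; // one weight per document, all neutral here
const idf = fitIdf(corpus);

function rank(query: string): Array<{ index: number; score: number }> {
  const queryVec = toVector(query, idf);
  return corpus
    .map((text, index) => ({
      index,
      score: cosine(queryVec, toVector(text, idf)) * (feedbackWeights[index] ?? 1),
    }))
    .sort((x, y) => y.score - x.score);
}

const ranked = rank("gotra bharadwaj");
```

With this corpus the first document ranks highest because it shares both query terms, the second shares only "gotra", and the third scores zero.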

Feedback Loop

After a retrieval, you can signal whether a result was helpful:

ltd.feedback("g1", "positive"); // boost: weight × 1.1, max 3.0
ltd.feedback("g1", "negative"); // penalise: weight × 0.9, min 0.1

API Reference

new LTD(options?)

const ltd = new LTD({ defaultTopK: 5 });

| Option | Type | Default | Description |
|---|---|---|---|
| defaultTopK | number | 5 | Default number of candidates returned |


ltd.ingest(docs)

Indexes a batch of documents. Fits the TF-IDF vocabulary on the entire batch so IDF scores are globally calibrated. Prefer one large ingest() call over many small ones.

ltd.ingest([
  { id: "q1", text: "namaste greetings hello", metadata: { lang: "en" } },
  { id: "q2", text: "नमस्ते प्रणाम", metadata: { lang: "hi" } },
  { id: "q3", text: "வணக்கம் நன்றி", metadata: { lang: "ta" } },
]);

Calling ingest() again with different documents adds to the store (does not replace previous entries). Entries with duplicate IDs are overwritten.


ltd.add(doc)

Adds a single document without re-fitting the vocabulary. Useful for incremental additions after initial ingestion. Terms not in the fitted vocabulary are silently ignored.

ltd.add({ id: "q4", text: "bonjour", metadata: { lang: "fr" } });

ltd.call(input, targetLang?, topK?)

Runs the full NLP pipeline.

const res = ltd.call("guruji ka ashirwad chahiye", undefined, 3);
// res.tone === "REVERENTIAL"
// res.emotion === "REVERENCE"
// res.lang === "hi" (or null for mixed script)

Returns LTDResponse:

interface LTDResponse {
  input:      string;               // normalised input
  lang:       LangCode | null;      // detected language
  script:     Script;               // dominant script
  emotion:    Emotion;              // emotional register
  tone:       Tone;                 // formality register
  candidates: LTDCandidate[];       // ranked retrieval results
  confidence: number;               // top candidate score [0, 1]
}

ltd.suggest(input, maxResults?)

Searches the built-in ~2200-entry LEXICON for entries whose text or romanized form starts with the given prefix. Uses a lazy-built LexiconTrie (memoized on first call per instance).

import { LTD } from "@mera-vansh/ms-ltd";
import type { LexiconSuggestion } from "@mera-vansh/ms-ltd";

const ltd = new LTD();

// Devanagari prefix
const hits = ltd.suggest("नमस्ते");
// → [{ entry: { text: "नमस्ते", lang: "hi", romanized: "namaste", gloss: "hello", category: "salutation" }, matchedPrefix: "नमस्ते" }]

// IAST romanized prefix (diacritics supported)
const kinship = ltd.suggest("dādā", 3);
// → up to 3 results matching paternal-grandfather entries across languages

// Geography via IAST
const rivers = ltd.suggest("gaṃgā");
// → Ganga entries (Hindi, Sanskrit, etc.)

// Limit results
const few: LexiconSuggestion[] = ltd.suggest("न", 5);

// Returns empty array for no match (never null/undefined)
ltd.suggest("xyz_no_match"); // → []

| Parameter | Type | Default | Description |
|---|---|---|---|
| input | string | none | Devanagari, IAST-romanized, or plain-ASCII prefix |
| maxResults | number | 10 | Maximum suggestions returned |

Returns LexiconSuggestion[]:

interface LexiconSuggestion {
  entry:         LexiconEntry;  // the matched lexicon entry
  matchedPrefix: string;        // the lowercased prefix that matched
}

ltd.feedback(id, signal)

Adjusts a document's retrieval weight based on user feedback.

ltd.feedback("q1", "positive"); // weight × 1.1, capped at 3.0
ltd.feedback("q1", "negative"); // weight × 0.9, floored at 0.1
ltd.feedback("q1", "neutral");  // no change
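
The documented multipliers and bounds amount to a clamped multiplicative update. The sketch below restates that rule outside the library; `nextWeight` and the constant names are hypothetical, and the starting weight of 1.0 is inferred from the MemoryStore example rather than stated as a guarantee.

```typescript
type FeedbackSignal = "positive" | "negative" | "neutral";

const MAX_WEIGHT = 3.0;
const MIN_WEIGHT = 0.1;

// Hypothetical helper mirroring the documented multipliers and bounds.
function nextWeight(current: number, signal: FeedbackSignal): number {
  if (signal === "positive") return Math.min(current * 1.1, MAX_WEIGHT);
  if (signal === "negative") return Math.max(current * 0.9, MIN_WEIGHT);
  return current; // "neutral" leaves the weight untouched
}

// Count how many consecutive positive signals it takes to reach the cap from 1.0.
let weight = 1.0;
let positives = 0;
while (weight < MAX_WEIGHT) {
  weight = nextWeight(weight, "positive");
  positives += 1;
}
```

Under these rules a document that starts at 1.0 reaches the cap after twelve consecutive positive signals, since 1.1^11 is about 2.85 and 1.1^12 is about 3.14.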

ltd.export() / ltd.import(state)

Persist and restore the full engine state. The snapshot is fully JSON-serialisable.

// Save to MongoDB / Redis / disk
const snapshot = ltd.export();
await db.collection("brain").replaceOne({ _id: "v1" }, snapshot, { upsert: true });

// Restore in a new process
const saved = await db.collection("brain").findOne({ _id: "v1" });
const ltd2 = new LTD();
ltd2.import(saved);

ltd.reset()

Clears all documents and resets the vectoriser.

ltd.reset();
ltd.storeSize(); // → 0

ltd.storeSize() / ltd.hasDocument(id)

ltd.storeSize();           // number of indexed documents
ltd.hasDocument("q1");     // → true / false

Emotion Detection

detectEmotion(text) classifies text into one of six emotional registers in strict priority order:

| Priority | Emotion | Triggers |
|---|---|---|
| 1 | REVERENCE | namaste, pranam, jai, श्री, ওম, ੴ, ನಮಸ್ಕಾರ, हरे कृष्ण, राम राम, ॐ नमः शिवाय, and 20+ cross-script equivalents |
| 2 | JOY | good, great, धन्यवाद, நன்றி, ধন্যবাদ, 😊🎉 |
| 3 | GRIEF | died, मृत्यु, மரணம், మరణం, مرحوم, 😢💔 |
| 4 | ANGER | wrong, error, गलत, தவறு, ভুল, غلط, 😠😡 |
| 5 | CONFUSION | confused, समझ नहीं, புரியவில்லை, అర్థం కాలేదు, ?? |
| 6 | NEUTRAL | (fallback: no rule matched) |

import { detectEmotion } from "@mera-vansh/ms-ltd";

detectEmotion("नमस्ते, आपका स्वागत है");  // → "REVERENCE"
detectEmotion("हरे कृष्ण");               // → "REVERENCE"
detectEmotion("ॐ नमः शिवाय");            // → "REVERENCE"
detectEmotion("bahut acha kiya!");          // → "JOY"
detectEmotion("wrong answer!!");            // → "ANGER"
detectEmotion("what do you mean??");        // → "CONFUSION"
detectEmotion("my gotra is Bharadwaj");     // → "NEUTRAL"

Hindu devotional invocations (added in Sprint 2)

The following high-frequency devotional phrases are recognised as REVERENCE keywords:

| Phrase | Transliteration |
|--------|----------------|
| हरे कृष्ण | Hare Krishna |
| राम राम | Ram Ram |
| जय श्री राम | Jai Shri Ram |
| हर हर महादेव | Har Har Mahadev |
| जय माता दी | Jai Mata Di |
| ॐ नमः शिवाय | Om Namah Shivay |
| जय जय | Jai Jai |
| ॐ | Om |

These take REVERENCE priority, so co-occurring joy or anger keywords are overridden.
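
Strict priority order can be sketched as a first-match-wins scan over an ordered rule list. The example below is a simplified illustration, not the library's rule engine: the rule shape, the tiny keyword lists, and the plain substring matching are all assumptions (the real rules also use regular expressions).

```typescript
type Emotion = "REVERENCE" | "JOY" | "GRIEF" | "ANGER" | "CONFUSION" | "NEUTRAL";

interface KeywordRule {
  emotion: Emotion;
  keywords: string[];
}

// Order encodes priority: earlier rules win over later ones.
const SKETCH_RULES: KeywordRule[] = [
  { emotion: "REVERENCE", keywords: ["namaste", "pranam", "हरे कृष्ण"] },
  { emotion: "JOY", keywords: ["good", "great", "धन्यवाद"] },
  { emotion: "GRIEF", keywords: ["died", "मृत्यु"] },
  { emotion: "ANGER", keywords: ["wrong", "error", "गलत"] },
  { emotion: "CONFUSION", keywords: ["confused", "??"] },
];

// Hypothetical classifier: returns the emotion of the first rule with any keyword hit.
function classifyEmotion(text: string): Emotion {
  const haystack = text.toLowerCase();
  for (const rule of SKETCH_RULES) {
    if (rule.keywords.some((keyword) => haystack.includes(keyword))) {
      return rule.emotion;
    }
  }
  return "NEUTRAL"; // fallback when no rule matched
}

const overridden = classifyEmotion("namaste, great work"); // "REVERENCE": the JOY keyword is never reached
```

Because the scan stops at the first hit, adding a keyword to a higher-priority rule can change the result for inputs that previously matched a lower one.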

Using EMOTION_RULES directly

import { EMOTION_RULES } from "@mera-vansh/ms-ltd";

// Inspect or extend the rule set
EMOTION_RULES.forEach(rule => {
  console.log(rule.emotion, "—", rule.keywords.length, "keywords");
});

Tone Detection

detectTone(text) classifies formality/register in strict priority order:

| Priority | Tone | Triggers |
|---|---|---|
| 1 | REVERENTIAL | param pujya, guruji, swami, माता श्री, ஐயா, అయ్యా |
| 2 | FORMAL | Dr., Mr., aap, आप, shriman, full proper names |
| 3 | URGENT | abhi, jaldi, immediately, asap, right now, !! |
| 4 | CURIOUS | ?, why, how, what, kyun, kaise, batao |
| 5 | INFORMAL | tu, tum, yaar, dost, bhai, lol, haha |
| 6 | NEUTRAL | (fallback: no rule matched) |

import { detectTone } from "@mera-vansh/ms-ltd";

detectTone("Param Pujya Guruji ka ashirwad");  // → "REVERENTIAL"
detectTone("Dr. Sharma please help");           // → "FORMAL"
detectTone("abhi batao, urgent!");              // → "URGENT"
detectTone("gotra kya hota hai?");              // → "CURIOUS"
detectTone("yaar bata na");                     // → "INFORMAL"
detectTone("my gotra is Bharadwaj");            // → "NEUTRAL"

Priority in action

When multiple rules fire, the highest-priority rule wins:

detectTone("aap ko pujya mata shri pranam");
// "aap" → FORMAL, "pujya mata shri" → REVERENTIAL
// → "REVERENTIAL" (priority 1 beats priority 2)

detectTone("Dr. Sharma abhi aao");
// "Dr." → FORMAL, "abhi" → URGENT
// → "FORMAL" (priority 2 beats priority 3)

Script & Language Detection

ScriptDetector.detectScript(text)

Identifies the dominant Unicode writing system:

import { ScriptDetector } from "@mera-vansh/ms-ltd";

ScriptDetector.detectScript("नमस्ते");           // → "Devanagari"
ScriptDetector.detectScript("hello world");       // → "Latin"
ScriptDetector.detectScript("வணக்கம்");          // → "Tamil"
ScriptDetector.detectScript("hello नमस्ते");     // → "Mixed"
ScriptDetector.detectScript("123 !@#");           // → "Unknown"

Returns one of: Devanagari | Tamil | Telugu | Bengali | Gujarati | Odia | Malayalam | Kannada | Gurmukhi | Arabic | Latin | Mixed | Unknown

ScriptDetector.detectLanguage(text)

Narrows the script to a specific language using whole-word seed-keyword disambiguation. Words are compared as complete tokens (split on whitespace and punctuation), never as substrings, so Devanagari words shared between Hindi, Nepali, Marathi, and Sanskrit are correctly disambiguated.

ScriptDetector.detectLanguage("मेरा गोत्र भारद्वाज है");  // → "hi"
ScriptDetector.detectLanguage("माझ्या घरी आहे");           // → "mr"
ScriptDetector.detectLanguage("मेरो नाम के छ");            // → "ne"
ScriptDetector.detectLanguage("भवति करोति");               // → "sa"  (Sanskrit)
ScriptDetector.detectLanguage("মোৰ গোত্ৰ কি");            // → "as"  (Assamese, not Bengali)
ScriptDetector.detectLanguage("hello मेरा");               // → null  (Mixed script)
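
Whole-word seed matching can be illustrated with a tiny hand-picked seed list. Everything below is hypothetical: the function name, the three-language scope, and the seed words themselves are chosen for the example and are not taken from the library's data.

```typescript
type Lang = "hi" | "mr" | "ne";

const LANGS: Lang[] = ["hi", "mr", "ne"];

// Illustrative seeds only (common function words); the library's real lists are not shown here.
const SEED_WORDS: Record<Lang, string[]> = {
  hi: ["है", "मेरा", "का"],
  mr: ["आहे", "माझ्या"],
  ne: ["छ", "मेरो"],
};

// Hypothetical helper: counts seed hits per language on whole tokens and returns the best match.
function guessDevanagariLanguage(text: string): Lang | null {
  // Split on whitespace, basic punctuation and the danda (U+0964).
  const tokens = text.split(/[\s,.!?\u0964]+/).filter((t) => t.length > 0);
  let best: Lang | null = null;
  let bestHits = 0;
  for (const lang of LANGS) {
    const seeds = SEED_WORDS[lang];
    const hits = tokens.filter((t) => seeds.indexOf(t) !== -1).length;
    if (hits > bestHits) {
      best = lang;
      bestHits = hits;
    }
  }
  return best;
}

const hindi = guessDevanagariLanguage("मेरा गोत्र भारद्वाज है"); // "hi"
const noHit = guessDevanagariLanguage("काम"); // null: the seed "का" only matches as a whole token
```

The last call shows why token equality matters: a substring test would count the Hindi seed "का" inside the unrelated word "काम".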

ScriptDetector.detectMixedScripts(text, threshold?)

Returns all scripts present above a share threshold (default 10%):

ScriptDetector.detectMixedScripts("hello नमस्ते world गोत्र");
// → ["Devanagari", "Latin"]

ScriptDetector.detectMixedScripts("hello नमस्ते world गोत्र test", 0.5);
// → ["Latin"]  (Devanagari falls below the 50% threshold)
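
Script detection by Unicode block counting, and the share threshold used for mixed text, can be sketched as follows. This is an illustration under stated assumptions, not the library's algorithm: only four scripts are covered, shares are computed over code points (so matras and the virama count too), and the rule that two or more scripts above the threshold means "Mixed" is a guess. All names are hypothetical.

```typescript
type SketchScript = "Devanagari" | "Bengali" | "Tamil" | "Latin";

const BLOCKS: Array<{ script: SketchScript; from: number; to: number }> = [
  { script: "Devanagari", from: 0x0900, to: 0x097f },
  { script: "Bengali", from: 0x0980, to: 0x09ff },
  { script: "Tamil", from: 0x0b80, to: 0x0bff },
];

function scriptOf(codePoint: number): SketchScript | null {
  const isUpper = codePoint >= 0x41 && codePoint <= 0x5a;
  const isLower = codePoint >= 0x61 && codePoint <= 0x7a;
  if (isUpper || isLower) return "Latin";
  for (const block of BLOCKS) {
    if (codePoint >= block.from && codePoint <= block.to) return block.script;
  }
  return null; // digits, punctuation, whitespace and uncovered scripts are not counted
}

// Scripts whose share of the counted code points is at least `threshold`, sorted by name.
function scriptsAbove(text: string, threshold = 0.1): SketchScript[] {
  const counts = new Map<SketchScript, number>();
  let total = 0;
  Array.from(text).forEach((ch) => {
    const script = scriptOf(ch.codePointAt(0) ?? 0);
    if (script === null) return;
    counts.set(script, (counts.get(script) ?? 0) + 1);
    total += 1;
  });
  const present: SketchScript[] = [];
  counts.forEach((count, script) => {
    if (count / total >= threshold) present.push(script);
  });
  return present.sort();
}

// Assumed decision rule: none counted is "Unknown", exactly one is dominant, more is "Mixed".
function dominantScript(text: string): SketchScript | "Mixed" | "Unknown" {
  const present = scriptsAbove(text);
  if (present.length === 0) return "Unknown";
  return present.length === 1 ? (present[0] ?? "Unknown") : "Mixed";
}

const both = scriptsAbove("hello नमस्ते world गोत्र");          // ["Devanagari", "Latin"]
const strict = scriptsAbove("hello नमस्ते world गोत्र", 0.5);   // ["Devanagari"] (11 of 21 code points)
```

Raising the threshold turns the same function into a majority test, which is why the second call keeps only the script holding more than half of the counted code points.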

LEXICON

The built-in LEXICON contains ~2200 curated entries across 7 semantic categories covering 18 languages, with IAST romanizations and English glosses.

import { LEXICON } from "@mera-vansh/ms-ltd";
import type { LexiconCategory, LexiconEntry } from "@mera-vansh/ms-ltd";

Categories

| Category | Description | Example entries |
|---|---|---|
| salutation | Greetings and farewells | नमस्ते, வணக்கம், నమస్కారం |
| kinship | Family relations (24 roles × 18 langs) | पिता, माता, दादा, नानी |
| emotion_rasa | The 9 Sanskrit rasas | SHRINGAR, KARUNA, VEERA, SHANTA… |
| geography | Sacred rivers, pilgrimage sites | गंगा (gaṃgā), काशी (kāśī) |
| literature | Epics and canonical texts | रामायण (rāmāyaṇa), महाभारत |
| time | Vikrama Samvat months, days, tithis | चैत्र (caitra), सोमवार |
| number | Numerals in Devanagari, Tamil, etc. | ०१२, ௦௧௨ |

Accessing entries

// All salutations
const salutations = LEXICON.salutation;
// → [{ text: "नमस्ते", lang: "hi", romanized: "namaste", gloss: "hello", category: "salutation" }, ...]

// Spot-check: Hindi father
LEXICON.kinship.find(e => e.subcategory === "father" && e.lang === "hi");
// → { text: "पिता", lang: "hi", romanized: "pitā", gloss: "father", category: "kinship", subcategory: "father" }

// All 9 rasas
const rasas = LEXICON.emotion_rasa
  .filter(e => e.lang === "sa")
  .map(e => e.subcategory);
// → ["SHRINGAR", "HASYA", "KARUNA", "RAUDRA", "VEERA", "BHAYANAK", "BIBHATSA", "ADBHUTA", "SHANTA"]

LexiconEntry shape

interface LexiconEntry {
  text:         string;           // native script form
  lang:         LangCode;         // BCP-47 language code
  romanized:    string;           // IAST transliteration
  gloss:        string;           // English definition
  category:     LexiconCategory;  // one of the 7 categories above
  subcategory?: string;           // e.g. "father", "SHRINGAR", "samvat_month"
}

LexiconTrie

LexiconTrie is a Unicode-safe BFS prefix trie for fast autocomplete over any set of LexiconEntry objects.

import { LexiconTrie, LEXICON } from "@mera-vansh/ms-ltd";
import type { LexiconEntry } from "@mera-vansh/ms-ltd";

const trie = new LexiconTrie();

// Build from full LEXICON (index by both native text and IAST romanized)
for (const cat of Object.values(LEXICON)) {
  for (const entry of cat) {
    trie.insert(entry.text, entry);
    trie.insert(entry.romanized, entry);
  }
}

// Prefix search
const results = trie.suggest("नम", 5);
// → up to 5 LexiconSuggestion objects for entries starting with "नम"

// Trie size
console.log(trie.size); // number of entries indexed

// Case-insensitive prefix (suggest() lowercases the prefix)
trie.suggest("NAMASTE");  // same as trie.suggest("namaste")

Empty / invalid input

trie.insert("", entry);     // silently ignored — empty keys are rejected
trie.suggest("");            // → []
trie.suggest("xyz_none");   // → []

BFS ordering

Results are returned in breadth-first order (shallower / shorter matches first), so exact matches bubble to the top of the result list.
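
The breadth-first behaviour is easiest to see in a stripped-down trie. The class below is a generic sketch written for this explanation; `PrefixTrie` and its internals are hypothetical and are not the library's LexiconTrie, though it follows the same documented conventions (empty keys ignored, lowercased prefix, empty array on no match).

```typescript
interface TrieNode<T> {
  children: Map<string, TrieNode<T>>;
  values: T[];
}

function createNode<T>(): TrieNode<T> {
  return { children: new Map<string, TrieNode<T>>(), values: [] };
}

// Hypothetical minimal trie keyed by code point, so multi-unit characters stay intact.
class PrefixTrie<T> {
  private root: TrieNode<T> = createNode<T>();

  insert(key: string, value: T): void {
    if (key.length === 0) return; // empty keys are ignored
    let node = this.root;
    for (const ch of Array.from(key.toLowerCase())) {
      let next = node.children.get(ch);
      if (next === undefined) {
        next = createNode<T>();
        node.children.set(ch, next);
      }
      node = next;
    }
    node.values.push(value);
  }

  suggest(prefix: string, max = 10): T[] {
    if (prefix.length === 0) return [];
    let node = this.root;
    for (const ch of Array.from(prefix.toLowerCase())) {
      const next = node.children.get(ch);
      if (next === undefined) return [];
      node = next;
    }
    // Breadth-first walk: nodes closer to the prefix are visited before deeper ones.
    const out: T[] = [];
    const queue: Array<TrieNode<T>> = [node];
    while (queue.length > 0 && out.length < max) {
      const current = queue.shift();
      if (current === undefined) break;
      for (const value of current.values) {
        if (out.length < max) out.push(value);
      }
      current.children.forEach((child) => queue.push(child));
    }
    return out;
  }
}

const sketchTrie = new PrefixTrie<string>();
sketchTrie.insert("namaskar", "namaskar");
sketchTrie.insert("namaste", "namaste");
sketchTrie.insert("nam", "nam");
const ordered = sketchTrie.suggest("NAM"); // ["nam", "namaste", "namaskar"]
```

The exact match comes first even though it was inserted last, and the seven-letter key precedes the eight-letter one, because the queue is drained one depth level at a time.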


Grammar Tools

Transliterator

Bidirectional transliteration between IAST romanization and 9 Indic scripts.

import { Transliterator } from "@mera-vansh/ms-ltd";

Transliterator.iastToScript(iast, scheme)

Converts an IAST string to the target Indic script. Each phoneme maps to its standalone glyph (vowels output independent vowel characters, not matra vowel signs).

Transliterator.iastToScript("k",    "Devanagari");  // → "क"
Transliterator.iastToScript("kh",   "Devanagari");  // → "ख"
Transliterator.iastToScript("ā",    "Devanagari");  // → "आ"  (standalone vowel)
Transliterator.iastToScript("rāma", "Devanagari");  // → "रआमअ"  (standalone vowels, not matras)

Transliterator.iastToScript("k",    "Bengali");     // → "ক"
Transliterator.iastToScript("k",    "Tamil");        // → "க"  (Tamil is lossy — voiced/aspirated → same glyph)
Transliterator.iastToScript("kh",   "Tamil");        // → "க"  (same as k)
Transliterator.iastToScript("k",    "Telugu");       // → "క"
Transliterator.iastToScript("k",    "Gujarati");     // → "ક"
Transliterator.iastToScript("k",    "Gurmukhi");     // → "ਕ"
Transliterator.iastToScript("k",    "Odia");         // → "କ"
Transliterator.iastToScript("k",    "Kannada");      // → "ಕ"
Transliterator.iastToScript("k",    "Malayalam");    // → "ക"

// Non-IAST characters pass through unchanged
Transliterator.iastToScript("rāma 123", "Devanagari");  // → "रआमअ 123"
Transliterator.iastToScript("rāma!",    "Devanagari");  // → "रआमअ!"

Supported schemes: Devanagari | Bengali | Tamil | Telugu | Gujarati | Gurmukhi | Odia | Kannada | Malayalam

Note: The output uses standalone vowel characters (e.g., आ, अ), not Devanagari matra vowel signs (U+093E–U+094C). This is intentional for phoneme-level mapping. For fully-formed syllabic Devanagari (consonant + matra), a syllabification pass is needed.
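
Phoneme-level mapping of this kind is commonly implemented as a longest-match scan, so that a digraph such as "kh" is tried before the single letter "k". The sketch below shows that idea with a six-entry table; the function name, the table, and the two-character lookahead are assumptions made for illustration and say nothing about how the library is built internally.

```typescript
// Tiny illustrative table; "\u0101" is the IAST long vowel ā.
const IAST_TO_DEVANAGARI = new Map<string, string>([
  ["kh", "ख"],
  ["k", "क"],
  ["r", "र"],
  ["m", "म"],
  ["a", "अ"],
  ["\u0101", "आ"],
]);

// Hypothetical converter: tries a two-character phoneme first, then one, else passes through.
function iastToDevanagariSketch(input: string): string {
  const chars = Array.from(input.normalize("NFC"));
  let output = "";
  let i = 0;
  while (i < chars.length) {
    const one = chars[i] ?? "";
    const two = i + 1 < chars.length ? one + (chars[i + 1] ?? "") : "";
    const digraph = two === "" ? undefined : IAST_TO_DEVANAGARI.get(two);
    const single = IAST_TO_DEVANAGARI.get(one);
    if (digraph !== undefined) {
      output += digraph;
      i += 2;
    } else if (single !== undefined) {
      output += single;
      i += 1;
    } else {
      output += one; // non-IAST characters pass through unchanged
      i += 1;
    }
  }
  return output;
}

const standalone = iastToDevanagariSketch("r\u0101ma"); // "रआमअ", standalone vowels as described above
```

Normalising to NFC first means a decomposed "a" plus combining macron is treated the same as the precomposed ā.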

Transliterator.devanagariToIast(text)

Maps standalone Devanagari characters to IAST. Characters not in the mapping (matras, conjuncts) pass through unchanged.

Transliterator.devanagariToIast("क");   // → "k"
Transliterator.devanagariToIast("आ");   // → "ā"
Transliterator.devanagariToIast("म");   // → "m"
Transliterator.devanagariToIast("ा");   // → "ा"  (U+093E matra — passthrough, not IAST ā)

Transliterator.isIAST(text)

Returns true if the string contains any IAST diacritic character.

Transliterator.isIAST("rāma");          // → true  (ā)
Transliterator.isIAST("saṃskṛta");      // → true  (ṃ, ṛ)
Transliterator.isIAST("rama");          // → false  (no diacritics)
Transliterator.isIAST("नमस्ते");        // → false  (Devanagari, no IAST diacritics)

VibhaktiEngine

Sanskrit nominal inflection across all 8 vibhaktis (cases) and 3 vacanas (numbers) for 6 stem classes.

import { VibhaktiEngine } from "@mera-vansh/ms-ltd";
import type { Vibhakti, Vacana, StemClass, InflectedForm } from "@mera-vansh/ms-ltd";

Stem classes

| Class | Characteristic | Example | Note |
|---|---|---|---|
| a_m | -a masculine | rām → rāmaḥ | Pass stem WITHOUT thematic -a |
| aa_f | -ā feminine | sīt → sītā | Pass stem WITHOUT final -ā |
| i_m | -i masculine | kav → kaviḥ | Pass stem WITHOUT final -i |
| ii_f | -ī feminine | nad → nadī | Pass stem WITHOUT final -ī |
| u_m | -u masculine | bandh → bandhuḥ | Pass stem WITHOUT final -u |
| cons | consonant-final | rāj → rāj | Nominative sg has empty suffix |

VibhaktiEngine.inflect(stem, stemClass, vibhakti, vacana)

VibhaktiEngine.inflect("rām", "a_m", 1, "sg");
// → { form: "rāmaḥ", vibhakti: 1, vacana: "sg", linga: "m", kāraka: "kartā" }

VibhaktiEngine.inflect("sīt", "aa_f", 3, "sg");
// → { form: "sītayā", vibhakti: 3, vacana: "sg", linga: "f", kāraka: "karaṇa" }

VibhaktiEngine.inflect("kav", "i_m", 4, "sg");
// → { form: "kavaye", vibhakti: 4, vacana: "sg", linga: "m", kāraka: "sampradāna" }

Vibhakti numbers 1–8 correspond to: Nominative (kartā), Accusative (karma), Instrumental (karaṇa), Dative (sampradāna), Ablative (apādāna), Genitive (sambandha), Locative (adhikaraṇa), Vocative (sambodhan).

VibhaktiEngine.paradigm(stem, stemClass)

Returns all 24 forms (8 vibhaktis × 3 vacanas):

const paradigm = VibhaktiEngine.paradigm("rām", "a_m");
paradigm.length;          // → 24
paradigm[0]!.form;        // → "rāmaḥ"  (Nom sg)
paradigm[0]!.kāraka;      // → "kartā"
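
The stem-plus-suffix convention can be illustrated with the singular endings of the masculine a-stem declension. This is a standalone sketch: `inflectSingular` and the table name are hypothetical, only the singular column is covered, and no sandhi is applied, so it should not be read as the library's inflection logic.

```typescript
type SketchVibhakti = 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8;

// Singular endings for the "a_m" class, indexed by case number (1 = nominative ... 8 = vocative).
const A_M_SINGULAR: Record<SketchVibhakti, string> = {
  1: "aḥ",
  2: "am",
  3: "ena",
  4: "āya",
  5: "āt",
  6: "asya",
  7: "e",
  8: "a",
};

// Hypothetical helper: plain concatenation of stem and ending.
function inflectSingular(stem: string, vibhakti: SketchVibhakti): string {
  return stem + A_M_SINGULAR[vibhakti];
}

const cases: SketchVibhakti[] = [1, 2, 3, 4, 5, 6, 7, 8];
const devaSingular = cases.map((v) => inflectSingular("dev", v));
// devaḥ, devam, devena, devāya, devāt, devasya, deve, deva
```

Passing the stem without its thematic vowel, as the table above requires, is what lets each ending supply its own vowel. Plain concatenation is enough for "dev", but stems containing r or ṣ need the retroflexion sandhi of classical Sanskrit (rāmeṇa rather than rāmena), which this sketch deliberately leaves out.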

GenderAgreement

Adjective agreement and honorific pronoun detection for Hindi, Marathi, and Sanskrit.

import { GenderAgreement } from "@mera-vansh/ms-ltd";

GenderAgreement.agreeAdjective(adjStem, linga, vacana, lang)

Inflects an adjective to agree with its noun in gender, number, and language.

// Hindi / Marathi (lang: "hi" | "mr")
GenderAgreement.agreeAdjective("acchā", "m", "sg", "hi");  // → "acchā"
GenderAgreement.agreeAdjective("acchā", "f", "sg", "hi");  // → "acchī"
GenderAgreement.agreeAdjective("acchā", "m", "pl", "hi");  // → "acche"

// Sanskrit (lang: "sa")
GenderAgreement.agreeAdjective("sundarā", "m", "sg", "sa"); // → "sundaraḥ"
GenderAgreement.agreeAdjective("sundarā", "f", "sg", "sa"); // → "sundarā"
GenderAgreement.agreeAdjective("sundarā", "n", "sg", "sa"); // → "sundaram"

// Unsupported language — returns stem unchanged
GenderAgreement.agreeAdjective("acchā", "m", "sg", "ta");  // → "acchā"

GenderAgreement.isHonorificPronoun(word, lang)

Returns true if the word is the honorific 2nd-person pronoun in the given language.

GenderAgreement.isHonorificPronoun("आप",      "hi");  // → true   (Hindi)
GenderAgreement.isHonorificPronoun("aap",      "hi");  // → true   (romanized)
GenderAgreement.isHonorificPronoun("तुम",      "hi");  // → false
GenderAgreement.isHonorificPronoun("तपाईं",   "ne");  // → true   (Nepali)
GenderAgreement.isHonorificPronoun("आपण",      "mr");  // → true   (Marathi)
GenderAgreement.isHonorificPronoun("भवान्",   "sa");  // → true   (Sanskrit)
GenderAgreement.isHonorificPronoun("आप",       "ta");  // → false  (unsupported lang)

SovReorder

Heuristic English SVO → SOV word-order reordering, useful for generating Hindi-style training prompts from English sentences.

import { SovReorder } from "@mera-vansh/ms-ltd";

SovReorder.reorder(text)

Moves the auxiliary verb and any trailing negations to the end of the sentence (SOV order). Operates only on ASCII-Latin input; non-Latin scripts and mixed-script text are returned unchanged.

SovReorder.reorder("Ram is studying");        // → "Ram studying is"
SovReorder.reorder("She is eating rice");     // → "She eating rice is"
SovReorder.reorder("I have finished work");   // → "I finished work have"
SovReorder.reorder("They will go home");      // → "They go home will"
SovReorder.reorder("She is not eating");      // → "She eating is not"
SovReorder.reorder("They did not finish");    // → "They finish did not"

// Short sentences / no auxiliary → unchanged
SovReorder.reorder("Ram");                    // → "Ram"
SovReorder.reorder("I am happy");             // → "I am happy"  ("am" is not in AUXILIARIES)

// Non-Latin → unchanged
SovReorder.reorder("मेरा नाम राम है");        // → "मेरा नाम राम है"

Supported auxiliaries: is, are, was, were, be, been, being, have, has, had, do, does, did, will, would, shall, should, may, might, can, could, must, need, dare.

Note: "am" is intentionally excluded from the auxiliaries set because it also occurs as a prefix of other common words; leaving it out prevents false reorders, so a sentence such as "I am happy" is returned unchanged.
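
The reordering heuristic described above can be approximated in a few lines. The function below is a guess at the general shape, not the library's code: `toSovSketch` is a hypothetical name, it moves only the first auxiliary it finds, and it treats "not" directly after that auxiliary as the negation to carry along.

```typescript
const AUXILIARIES = new Set<string>([
  "is", "are", "was", "were", "be", "been", "being", "have", "has", "had",
  "do", "does", "did", "will", "would", "shall", "should", "may", "might",
  "can", "could", "must", "need", "dare",
]);

// Hypothetical reorder: move the first auxiliary (plus a following "not") to the end.
function toSovSketch(sentence: string): string {
  // Only printable ASCII is handled; anything else is returned unchanged.
  if (!/^[\x20-\x7E]*$/.test(sentence)) return sentence;
  const words = sentence.split(" ").filter((w) => w.length > 0);
  const auxIndex = words.findIndex((w) => AUXILIARIES.has(w.toLowerCase()));
  if (auxIndex === -1) return sentence;
  const moved = words.slice(auxIndex, auxIndex + 1);
  let restStart = auxIndex + 1;
  const following = words[restStart] ?? "";
  if (following.toLowerCase() === "not") {
    moved.push(following);
    restStart += 1;
  }
  return words.slice(0, auxIndex).concat(words.slice(restStart), moved).join(" ");
}

const reordered = toSovSketch("She is not eating"); // "She eating is not"
```

Because the scan is purely lexical, words such as "need" or "do" are moved even when they act as main verbs, which is one reason to treat the output as a heuristic rather than a parse.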


Tokenizer

import { Tokenizer } from "@mera-vansh/ms-ltd";

// Normalise: NFKC + lowercase + strip zero-width chars
Tokenizer.normalize("NAMASTE\u200C");  // → "namaste"

// Tokenise: Unicode-aware word extraction (handles all scripts)
Tokenizer.tokenize("मेरा गोत्र भारद्वाज है");
// → ["मेरा", "गोत्र", "भारद्वाज", "है"]

Tokenizer.tokenize("மகிழ்ச்சி நன்றி");
// → ["மகிழ்ச்சி", "நன்றி"]

// Stem: lightweight suffix stripping
Tokenizer.stem("जाता");    // → "जा"   (Hindi -ता rule)
Tokenizer.stem("running"); // → "runn" (English -ing rule)
Tokenizer.stem("trees");   // → "tree" (English -s rule)

// Full pipeline: tokenise + stem
Tokenizer.tokenizeAndStem("running trees called");
// → ["runn", "tree", "call"]

Hindi stemming rules (longest-first): -ाना, -ता, -ते, -ती, -ना, -ने, -कर

English stemming rules (longest-first): -tion (len≥7), -ing (len≥6), -ed (len≥5), -s (len≥5)
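
The English rules read as an ordered list of suffixes, each guarded by a minimum word length. A minimal sketch of that scheme follows; `stemEnglishSketch` is a hypothetical name, and the assumption that a matching rule simply removes the suffix is inferred from the three documented examples.

```typescript
interface SuffixRule {
  suffix: string;
  minLength: number; // minimum length of the whole word for the rule to apply
}

// Longest suffix first, using the lengths listed above.
const ENGLISH_RULES: SuffixRule[] = [
  { suffix: "tion", minLength: 7 },
  { suffix: "ing", minLength: 6 },
  { suffix: "ed", minLength: 5 },
  { suffix: "s", minLength: 5 },
];

// Hypothetical stemmer: applies the first rule whose suffix and length guard both match.
function stemEnglishSketch(word: string): string {
  for (const rule of ENGLISH_RULES) {
    if (word.length >= rule.minLength && word.endsWith(rule.suffix)) {
      return word.slice(0, word.length - rule.suffix.length);
    }
  }
  return word;
}

const stems = ["running", "trees", "called", "sing"].map(stemEnglishSketch);
// ["runn", "tree", "call", "sing"]
```

The length guard is what keeps short words such as "sing" or "bus" intact even though they end in a listed suffix.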


TF-IDF Vectorizer

Use TFIDFVectorizer directly when you need vector representations outside of the full LTD pipeline.

import { TFIDFVectorizer, cosineSimilarity } from "@mera-vansh/ms-ltd";

const v = new TFIDFVectorizer();

// Fit on corpus
v.fit([
  "gotra bharadwaj family",
  "gotra kashyap family",
  "mata pita relation"
]);

// Transform a query
const queryVec = v.transform("gotra bharadwaj");

// Transform a document
const docVec = v.transform("gotra kashyap family");

// Compute similarity
const sim = cosineSimilarity(queryVec, docVec);
console.log(sim); // 0.0 – 1.0

// IDF inspection
v.getIDF("gotra");      // lower IDF (appears in 2/3 docs)
v.getIDF("bharadwaj");  // higher IDF (appears in 1/3 docs)
v.getIDF("xyz");        // null (OOV)

// Serialise
const state = v.exportState();
// { vocab: [[term, idx], ...], idf: [[term, score], ...], docCount: 3 }

const v2 = new TFIDFVectorizer();
v2.importState(state); // restore without re-fitting

MemoryStore

MemoryStore is the retrieval layer used internally by LTD. Use it directly for custom retrieval pipelines.

import { MemoryStore } from "@mera-vansh/ms-ltd";

const store = new MemoryStore();

store.ingest([
  { id: "d1", text: "gotra bharadwaj", metadata: { type: "gotra" } },
  { id: "d2", text: "mata pita relation", metadata: { type: "family" } },
]);

// Retrieve
const results = store.retrieve("gotra family", 3);
// results[0].id === "d1", results[0].score > 0

// Add incrementally
store.add({ id: "d3", text: "gotra kashyap" });

// Feedback
store.adjustWeight("d1", "positive"); // weight → 1.1
store.adjustWeight("d2", "negative"); // weight → 0.9

// Inspect
store.size();       // → 3
store.has("d1");    // → true

// Remove
store.remove("d3"); // → true

// Persist
const snapshot = store.export();
const json = JSON.stringify(snapshot);

// Restore
const store2 = new MemoryStore();
store2.import(JSON.parse(json));

Persistence & Serialisation

All state (vocabulary, document vectors, weights) is serialisable to plain JSON. This enables integration with any database.
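
One reason a snapshot format like the `[[term, idx], ...]` pairs shown for exportState is convenient: a JavaScript Map does not survive JSON.stringify, while an array of pairs does. The round trip below demonstrates this with hypothetical helper names and made-up values; it mirrors the documented shape but is not the library's serialiser.

```typescript
interface VectorizerStateSketch {
  vocab: Array<[string, number]>;
  idf: Array<[string, number]>;
  docCount: number;
}

// Hypothetical helper: flatten a Map into JSON-friendly [key, value] pairs.
function toPairs(map: Map<string, number>): Array<[string, number]> {
  const pairs: Array<[string, number]> = [];
  map.forEach((value, key) => {
    pairs.push([key, value]);
  });
  return pairs;
}

const vocabMap = new Map<string, number>([["gotra", 0], ["family", 1]]);
const idfMap = new Map<string, number>([["gotra", 1.2], ["family", 1.5]]); // illustrative scores

// Serialising a Map directly loses its entries.
const naive = JSON.stringify({ vocab: vocabMap }); // '{"vocab":{}}'

// Serialising pairs keeps everything.
const snapshotSketch: VectorizerStateSketch = {
  vocab: toPairs(vocabMap),
  idf: toPairs(idfMap),
  docCount: 3,
};
const json = JSON.stringify(snapshotSketch);

// Restore: the pairs feed straight back into the Map constructor.
const parsed = JSON.parse(json) as VectorizerStateSketch;
const restoredVocab = new Map<string, number>(parsed.vocab);
const restoredIdf = new Map<string, number>(parsed.idf);
```

The same pairs-in, pairs-out pattern works unchanged whether the JSON string ends up in a file, a key-value store, or a document database.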

MongoDB example

import { LTD } from "@mera-vansh/ms-ltd";
import { MongoClient } from "mongodb";

const client = new MongoClient(process.env.MONGODB_URI!);
const col = client.db("myapp").collection("ltd_brain");

// Save
const ltd = new LTD();
ltd.ingest(myDocs);
await col.replaceOne({ _id: "v1" }, { _id: "v1", ...ltd.export() }, { upsert: true });

// Load
const saved = await col.findOne({ _id: "v1" });
const ltd2 = new LTD();
if (saved) ltd2.import(saved);

File system example

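import { writeFileSync, readFileSync } from "node:fs";
import { LTD } from "@mera-vansh/ms-ltd";

// Save (`ltd` is an engine you have already populated with ingest())
writeFileSync("brain.json", JSON.stringify(ltd.export(), null, 2));

// Load into a fresh engine
const ltd2 = new LTD();
ltd2.import(JSON.parse(readFileSync("brain.json", "utf8")));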

Recipes

Multilingual FAQ bot

import { LTD } from "@mera-vansh/ms-ltd";

const bot = new LTD({ defaultTopK: 3 });

bot.ingest([
  { id: "faq-gotra-en", text: "What is gotra? gotra is a clan lineage system", metadata: { answer: "Gotra is a patrilineal clan system in Hindu tradition." } },
  { id: "faq-gotra-hi", text: "गोत्र क्या है गोत्र वंश परंपरा", metadata: { answer: "गोत्र एक पितृवंशीय कुल परंपरा है।" } },
  { id: "faq-nakshatra", text: "nakshatra birth star lunar mansion", metadata: { answer: "Nakshatra is the lunar mansion at the time of birth." } },
]);

function ask(userInput: string) {
  const res = bot.call(userInput);

  if (res.confidence < 0.1) {
    return "Sorry, I don't have information on that yet.";
  }

  const top = res.candidates[0]!;

  // Positive feedback on use
  bot.feedback(top.id, "positive");

  return top.metadata["answer"] as string;
}

ask("gotra kya hota hai?");  // shares the Latin token "gotra" with faq-gotra-en, so the English gotra answer is the expected match
ask("what is a nakshatra?"); // returns the nakshatra answer

Emotion-aware response routing

import { LTD } from "@mera-vansh/ms-ltd";

const ltd = new LTD();
ltd.ingest(myKnowledgeBase);

function handleMessage(userText: string) {
  const { emotion, tone, candidates, confidence } = ltd.call(userText);

  if (emotion === "GRIEF") {
    return { type: "condolence", message: "I'm so sorry for your loss." };
  }
  if (emotion === "CONFUSION") {
    return { type: "clarify", message: "Let me explain that more clearly." };
  }
  if (tone === "URGENT") {
    return { type: "priority", answer: candidates[0] };
  }

  return { type: "standard", answer: candidates[0], confidence };
}

Lexicon-powered autocomplete

import { LTD } from "@mera-vansh/ms-ltd";

const ltd = new LTD();

// Suggest as user types (Devanagari)
ltd.suggest("नमस्", 5).map(r => r.entry.text);
// → ["नमस्ते", "नमस्कार", ...]

// Suggest from IAST romanization
ltd.suggest("rāmā", 3).map(r => ({
  text: r.entry.text,
  gloss: r.entry.gloss,
  lang: r.entry.lang,
}));
// → [{ text: "रामायण", gloss: "the Ramayana", lang: "hi" }, ...]

// Suggest kinship terms
ltd.suggest("dādā").map(r => r.entry.gloss);
// → ["paternal grandfather", ...]

Sanskrit inflection pipeline

import { VibhaktiEngine, GenderAgreement, Transliterator } from "@mera-vansh/ms-ltd";

// Inflect a noun
const nom = VibhaktiEngine.inflect("rām", "a_m", 1, "sg");
console.log(nom.form);    // → "rāmaḥ"
console.log(nom.kāraka);  // → "kartā"

// Generate a full paradigm
const forms = VibhaktiEngine.paradigm("sīt", "aa_f");
forms.forEach(f => console.log(`${f.vibhakti}/${f.vacana}: ${f.form}`));

// Agree an adjective
GenderAgreement.agreeAdjective("sundarā", "f", "sg", "sa"); // → "sundarā"

// Transliterate to Devanagari
Transliterator.iastToScript("rāmaḥ", "Devanagari"); // → "रआमअः"

Script-aware input routing

import { ScriptDetector, detectTone } from "@mera-vansh/ms-ltd";

function classifyInput(text: string) {
  const script = ScriptDetector.detectScript(text);
  const lang   = ScriptDetector.detectLanguage(text);
  const tone   = detectTone(text);

  return { script, lang, tone };
}

classifyInput("aap ka gotra kya hai?");
// → { script: "Latin", lang: "en", tone: "FORMAL" }

classifyInput("आप का गोत्र क्या है?");
// → { script: "Devanagari", lang: "hi", tone: "FORMAL" }

Building a domain classifier

import { LTD } from "@mera-vansh/ms-ltd";

const classifier = new LTD();

// Seed each domain with representative phrases
classifier.ingest([
  { id: "astro-1", text: "nakshatra rashi horoscope kundali", metadata: { domain: "astrology" } },
  { id: "astro-2", text: "नक्षत्र राशि कुंडली ज्योतिष",     metadata: { domain: "astrology" } },
  { id: "gotra-1", text: "gotra pravara rishi lineage clan",  metadata: { domain: "gotra" } },
  { id: "gotra-2", text: "गोत्र प्रवर ऋषि वंश कुल",          metadata: { domain: "gotra" } },
  { id: "ritl-1",  text: "vivah puja samskara ritual ceremony", metadata: { domain: "ritual" } },
]);

function classify(userInput: string) {
  const { candidates, confidence } = classifier.call(userInput);
  if (confidence < 0.15) return "unknown";
  return candidates[0]?.metadata["domain"] ?? "unknown";
}

classify("my nakshatra is Rohini");         // → "astrology"
classify("bharadwaj gotra mein vivah");     // → "gotra" or "ritual"
classify("random unrelated words xyz");     // → "unknown"

TypeScript Types

import type {
  // Core types
  LTDOptions,
  LTDResponse,
  LTDCandidate,
  LTDState,

  // Document types
  IngestDocument,
  VectorEntry,
  FeedbackSignal,       // "positive" | "negative" | "neutral"

  // Classification types
  Emotion,              // "REVERENCE" | "JOY" | "GRIEF" | "ANGER" | "CONFUSION" | "NEUTRAL"
  Tone,                 // "REVERENTIAL" | "FORMAL" | "URGENT" | "CURIOUS" | "INFORMAL" | "NEUTRAL"
  Script,               // "Devanagari" | "Tamil" | ... | "Mixed" | "Unknown"
  LangCode,             // "en" | "hi" | "mr" | "bn" | ... (18 codes)

  // Rule types (for custom extensions)
  EmotionRule,
  ToneRule,

  // Lexicon types
  LexiconEntry,         // { text, lang, romanized, gloss, category, subcategory? }
  LexiconCategory,      // "salutation" | "kinship" | "emotion_rasa" | "geography" | "literature" | "time" | "number"
  LexiconSuggestion,    // { entry: LexiconEntry, matchedPrefix: string }

  // Grammar types
  Vibhakti,             // 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8  (Sanskrit cases)
  Vacana,               // "sg" | "du" | "pl"
  Linga,                // "m" | "f" | "n"
  StemClass,            // "a_m" | "aa_f" | "i_m" | "ii_f" | "u_m" | "cons"
  InflectedForm,        // { form, vibhakti, vacana, linga, kāraka }
  TranslitScheme,       // "Devanagari" | "Bengali" | "Tamil" | "Telugu" | ...

  // Vector type
  SparseVector,         // Map<string, number>
} from "@mera-vansh/ms-ltd";
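
Since SparseVector is a Map<string, number>, similarity between two vectors only needs to visit the terms that are actually stored. The sketch below illustrates that idea with a plain cosine similarity; `sparseCosine` is a hypothetical name, and the library's own cosineSimilarity may differ in details.

```typescript
// Local alias mirroring the documented shape of SparseVector.
type SparseVector = Map<string, number>;

// Hypothetical sketch: cosine similarity over two sparse term-weight maps.
function sparseCosine(a: SparseVector, b: SparseVector): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((weight, term) => {
    normA += weight * weight;
    const other = b.get(term);
    if (other !== undefined) dot += weight * other; // shared terms only
  });
  b.forEach(weight => {
    normB += weight * weight;
  });
  if (normA === 0 || normB === 0) return 0; // an empty vector matches nothing
  return dot / Math.sqrt(normA * normB);
}

const query: SparseVector = new Map([["nakshatra", 1], ["rohini", 1]]);
const astro: SparseVector = new Map([["nakshatra", 1], ["rashi", 1]]);
const gotra: SparseVector = new Map([["gotra", 1]]);

console.log(sparseCosine(query, astro)); // 0.5
console.log(sparseCosine(query, gotra)); // 0
```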

Design Principles

| Principle | Detail |
|---|---|
| Zero dependencies | No runtime npm packages; only @types/node as devDependency |
| Deterministic | Every output is a direct function of rules; no sampling, no randomness |
| Multilingual | Unicode-first; works correctly with all Indic vowel signs (Mc/Mn marks) |
| Sparse representation | Map<string, number>; only non-zero terms stored |
| Feedback-weighted | Retrieval scores multiplied by per-document weights [0.1 – 3.0] |
| JSON-safe state | All Map instances serialised as [key, value][] arrays |
| Stateless utilities | Tokenizer, ScriptDetector, Transliterator, VibhaktiEngine, GenderAgreement, SovReorder are fully static; no instantiation needed |
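
The "JSON-safe state" row matters because JSON.stringify drops the contents of a Map. A minimal sketch of the [key, value][] round trip looks roughly like this; `toJson` and `fromJson` are illustrative names, not ms-ltd exports.

```typescript
type SparseVector = Map<string, number>;

// Serialise a Map as an array of [key, value] pairs so its entries survive JSON.
function toJson(vec: SparseVector): string {
  return JSON.stringify(Array.from(vec.entries()));
}

// Rebuild the Map from the serialised pairs.
function fromJson(json: string): SparseVector {
  return new Map(JSON.parse(json) as [string, number][]);
}

const weights: SparseVector = new Map([["gotra", 1.92], ["rishi", 1]]);
const json = toJson(weights);
const restored = fromJson(json);

console.log(JSON.stringify(weights)); // {}  (a bare Map loses its entries)
console.log(json);                    // [["gotra",1.92],["rishi",1]]
console.log(restored.get("gotra"));   // 1.92
```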

IDF Formula

The vectoriser uses the scikit-learn smooth IDF variant, so that terms appearing in every document still receive a non-zero weight:

IDF(t) = log((N + 1) / (df(t) + 1)) + 1

Where N is the corpus size and df(t) is the number of documents containing term t. Because df(t) ≤ N, the logarithm is never negative, so the +1 offset guarantees IDF ≥ 1.0 for every term.
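
As a quick worked check of the formula, here is a small sketch that assumes the natural logarithm (as scikit-learn uses). `smoothIdf` is an illustrative name rather than a documented export.

```typescript
// Smooth IDF: IDF(t) = ln((N + 1) / (df(t) + 1)) + 1
function smoothIdf(corpusSize: number, docFrequency: number): number {
  return Math.log((corpusSize + 1) / (docFrequency + 1)) + 1;
}

const corpusSize = 4;
const universal = smoothIdf(corpusSize, 4); // term present in every document
const rare = smoothIdf(corpusSize, 1);      // term present in one document

console.log(universal);       // 1  (ln(5/5) = 0, only the offset remains)
console.log(rare.toFixed(4)); // 1.9163  (ln(5/2) + 1)
```

Rarer terms score higher, while a term found in every document bottoms out at exactly 1 instead of 0.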

Known Limitations

  • Stemming is suffix-stripping only (not full morphological analysis). Hindi and English are supported; other languages are tokenised but not stemmed.
  • Language detection is heuristic (seed-word based). Short inputs or texts without seed words may fall back to the script's default language.
  • Regex word boundaries (\b) are ASCII-only in standard JavaScript. Tone/emotion patterns targeting Devanagari-only text use keyword substring matching, not boundary-anchored regex.
  • Mixed-script inputs (script === "Mixed") return lang === null.
  • iastToScript outputs standalone vowel characters, not matra vowel signs. For properly-joined syllabic Devanagari, a syllabification pass over the output is required.
  • devanagariToIast handles standalone Devanagari characters only; matra vowel signs (U+093E–U+094C) pass through unchanged.
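
The word-boundary limitation above is easy to reproduce. The sketch below shows \b failing on Devanagari, followed by one possible alternative based on Unicode property lookarounds (ES2018+). The alternative is an assumption for illustration only: per the note above, ms-ltd itself uses keyword substring matching, and `wholeWord` is a hypothetical helper.

```typescript
// \b is defined via \w, which covers only [A-Za-z0-9_], even with the `u` flag,
// so no boundary is ever found between Devanagari characters and whitespace.
const asciiBoundary = new RegExp("\\bराम\\b", "u");
console.log(asciiBoundary.test("राम आया")); // false

// Hypothetical alternative: treat letters, marks and digits as word characters.
function wholeWord(word: string): RegExp {
  const escaped = word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp(
    `(?<![\\p{L}\\p{M}\\p{N}])${escaped}(?![\\p{L}\\p{M}\\p{N}])`,
    "u",
  );
}

const ram = wholeWord("राम");
console.log(ram.test("राम आया")); // true
console.log(ram.test("रामायण"));  // false (the next character is a vowel sign, \p{M})
```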

Development

This package is part of the mera-vansh monorepo.

# Build
pnpm build

# Type-check
pnpm type-check

# Lint
pnpm lint

# Test
pnpm test

# Test with coverage
pnpm test:coverage

# Dry-run publish check (verify no src/ leaked)
pnpm pack --dry-run

Testing

# Run full Vitest suite (~1179 test cases)
pnpm test

# Watch mode
pnpm test:watch

# Coverage report
pnpm test:coverage

# Legacy integration tests (112 assertions)
pnpm test:legacy

Test files live in test/ — one file per module:

test/
├── math.similarity.test.ts          cosineSimilarity
├── math.vectorizer.test.ts          TFIDFVectorizer
├── nlp.tokenizer.test.ts            Tokenizer
├── nlp.detector.test.ts             ScriptDetector (+ Sprint 1 whole-word tests)
├── nlp.trie.test.ts                 LexiconTrie
├── grammar.transliterator.test.ts   Transliterator
├── grammar.vibhakti.test.ts         VibhaktiEngine
├── grammar.agreement.test.ts        GenderAgreement
├── grammar.sovreorder.test.ts       SovReorder
├── rules.emotion.test.ts            detectEmotion + EMOTION_RULES (+ Sprint 2 devotional keywords)
├── rules.tone.test.ts               detectTone + TONE_RULES
├── rules.lexicon.test.ts            LEXICON
├── storage.memorystore.test.ts      MemoryStore
├── engine.ltd.test.ts               LTD (end-to-end)
├── engine.suggest.test.ts           LTD.suggest()
└── test-ltd.ts                      Legacy integration runner

Contributing

See CONTRIBUTING.md at the monorepo root.

  1. Fork and clone the monorepo
  2. Run pnpm install from the monorepo root
  3. Run pnpm --filter @mera-vansh/ms-ltd test to run the test suite
  4. Run pnpm --filter @mera-vansh/ms-ltd type-check to type-check

All rules (emotion-rules.ts, tone-rules.ts) and seed keywords (detector.ts) are plain TypeScript arrays — easy to extend without touching core logic.


Changelog

2.0.0 (2026-03-24)

  • Deterministic NLP pipeline replacing probabilistic Naive Bayes
  • TF-IDF vectoriser with sklearn smooth IDF
  • Multilingual tokeniser with \p{L}\p{N}\p{M} Unicode regex
  • Script detection across 10 Unicode blocks

Sprint 1 — Language detection hardening

  • Whole-word tokenization in ScriptDetector.detectLanguage() — seeds are now matched as complete tokens (no false substring matches)
  • Devanagari disambiguation: Hindi / Marathi / Nepali / Sanskrit / Maithili / Konkani correctly separated
  • Bengali / Assamese and Urdu / Sindhi disambiguation
  • Removed "के" from Hindi seeds (shared with Nepali interrogative)

Sprint 2 — Devotional REVERENCE keywords

  • Added 8 Hindu devotional invocations to REVERENCE emotion: हरे कृष्ण, राम राम, जय श्री राम, हर हर महादेव, जय माता दी, ॐ नमः शिवाय, जय जय, ॐ

Sprint 6 — Sanskrit grammar tools

  • Transliterator — IAST ↔ Indic script mapping (9 scripts, phoneme-level)
  • VibhaktiEngine — Sanskrit nominal inflection (8 cases × 3 numbers = 24-form paradigms, for each of 6 stem classes)
  • GenderAgreement — Hindi/Marathi/Sanskrit adjective agreement + honorific pronoun detection
  • SovReorder — English SVO → SOV word-order reordering

Sprint 7 — Lexicon + autocomplete

  • LEXICON — ~2200 curated entries across 7 categories (salutation, kinship, emotion_rasa, geography, literature, time, number) in 18 languages with IAST romanizations
  • LexiconTrie — Unicode BFS prefix trie for autocomplete over any LexiconEntry set
  • LTD.suggest() — lazily-built trie search over the built-in LEXICON

Test suite

  • 7 new test files covering all Sprint 6/7 additions
  • 2 existing test files augmented (detector + emotion)
  • Total: ~1179 Vitest test cases

License

GPL-3.0 © Mera Vanshdwivna

See LICENSE for the full license text.