nepali-nlp-pro-max

v1.0.0

📚 Nepali natural language processing: Devanagari normalizer, sentence and word tokenizer, curated stopwords, light stemmer, script detection, embedded number-word extraction.

नेपाली एनएलपी (Nepali NLP), pro-max edition

Nepali natural language processing utilities: a Devanagari normalizer, sentence and word tokenizers, curated stopwords (220+), a light stemmer, script detection, and embedded number-word extraction.


✨ Highlights

  • 🇳🇵 Nepali-first: every function is tuned for Devanagari and the realities of Nepali text (poorna virama, ZWNJ pollution, mixed-script content)
  • 🧹 Production-grade normalizer: NFC + ZWNJ/ZWJ stripping + whitespace collapse, the three things every Nepali pipeline rewrites
  • ✂️ Tokenizers: sentences (। / ॥ / ? / !), words, and Unicode-safe character iteration
  • 🛑 Curated stopwords: 220+ pronouns, particles, postpositions, auxiliaries; extendable / overridable
  • 🪓 Light stemmer: strips case markers (-ले, -को, -मा, -लाई, -बाट, …) and plural -हरू with safety guards
  • 🔍 Script detection: isDevanagari, containsDevanagari, mixedScriptRatio for routing decisions
  • 🔢 Number-word extraction: find २ लाख ५ हजार inside arbitrary text and convert to BigInt
  • 📦 Zero deps · ESM + CJS · TypeScript-first · tree-shakeable

📦 Install

npm install nepali-nlp-pro-max
pnpm add nepali-nlp-pro-max
yarn add nepali-nlp-pro-max
bun add nepali-nlp-pro-max

⚡ Quick Start

import {
  normalize,
  tokenizeSentences,
  tokenizeWords,
  removeStopwords,
  stem,
  isDevanagari,
  detectScript,
  extractNumbers,
} from "nepali-nlp-pro-max";

const text = "म नेपालमा बस्छु। तपाईंलाई कस्तो छ?";

normalize(text);                                   // ZWNJ-stripped, NFC, whitespace-collapsed
tokenizeSentences(text);
// ["म नेपालमा बस्छु।", "तपाईंलाई कस्तो छ?"]

tokenizeWords("म नेपालमा बस्छु।");
// ["म", "नेपालमा", "बस्छु"]

removeStopwords(["म", "नेपालमा", "बस्छु"]);
// ["नेपालमा", "बस्छु"]

stem("नेपालमा");        // "नेपाल"
stem("किताबहरू");       // "किताब"
stem("मानिसहरूले");     // "मानिस"

isDevanagari("नेपाल");     // true
detectScript("Hi नमस्ते"); // "mixed"

extractNumbers("मलाई २ लाख ५ हजार चाहिन्छ");
// [{ value: 205000n, raw: "२ लाख ५ हजार", start: 6, end: 18 }]

🧠 Mental Model

| Rule | Why |
|---|---|
| Always normalize first. ZWNJ/ZWJ + decomposed forms break exact-match search and indexing. | One normalize() call up-front avoids dozens of false misses. |
| Sentence boundary = । / ॥ / ? / ! followed by whitespace or EOS. | Latin-style . is too noisy in mixed text. |
| Stemmer is suffix-strip, not full morphology. | A real Nepali morphological analyser is research-grade. The light stemmer covers the 90% case (case markers + plural). |
| Stopwords are a starting point, not gospel. | Pass extra / exclude per app; news search wants different filtering than chat moderation. |
| Number extraction is greedy. | extractNumbers walks the text and captures the longest valid run starting at each position. |


🧰 Full API

| Function | Description |
|---|---|
| normalize(text, opts?) | NFC + strip ZWNJ/ZWJ + collapse whitespace |
| stripZeroWidth(text) | Remove ZWNJ (U+200C) / ZWJ (U+200D) only |
| toNFC(text) | Apply Unicode NFC only |

NormalizeOptions: { stripZeroWidth?, collapseWhitespace?, nfc? }
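
The three normalization steps can be sketched with standard string methods alone. This is a hypothetical re-implementation for illustration, not the package's source: the name normalizeSketch, the default of `true` for every option, and the trailing trim are assumptions.

```typescript
// Hypothetical stand-in for normalize(); option names mirror NormalizeOptions,
// but the defaults and the final trim() are assumptions.
interface NormalizeOptionsSketch {
  nfc?: boolean;
  stripZeroWidth?: boolean;
  collapseWhitespace?: boolean;
}

function normalizeSketch(text: string, opts: NormalizeOptionsSketch = {}): string {
  const { nfc = true, stripZeroWidth = true, collapseWhitespace = true } = opts;
  let out = text;
  if (nfc) out = out.normalize("NFC");                            // canonical composition
  if (stripZeroWidth) out = out.replace(/[\u200C\u200D]/g, "");   // drop ZWNJ / ZWJ
  if (collapseWhitespace) out = out.replace(/\s+/g, " ").trim();  // single spaces only
  return out;
}

// A stray ZWNJ and doubled spaces no longer block an exact match.
const noisy = "नेपाल\u200Cमा   बस्छु ";
console.log(normalizeSketch(noisy) === "नेपालमा बस्छु"); // true
```

Running each step behind its own flag is what lets a caller keep ZWNJ/ZWJ where they are needed for rendering while still applying NFC.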

| Function | Description |
|---|---|
| isDevanagari(s) | Every non-whitespace, non-punct char is Devanagari |
| containsDevanagari(s) | At least one Devanagari char |
| containsLatin(s) | At least one Latin (A-Z / a-z) char |
| detectScript(s) | "devanagari" \| "latin" \| "mixed" \| "none" |
| mixedScriptRatio(s) | Devanagari fraction over (Devanagari + Latin) chars |
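
To make the ratio concrete, here is a minimal sketch of script counting. It assumes "Devanagari" means the U+0900 to U+097F block, counts UTF-16 code units (so matras and the virama each count), and returns 0 when the text has neither script. The function names are hypothetical and the package may count differently.

```typescript
type ScriptSketch = "devanagari" | "latin" | "mixed" | "none";

// Count Devanagari-block and ASCII-Latin code units; everything else is ignored.
function countScriptsSketch(s: string): { devanagari: number; latin: number } {
  let devanagari = 0;
  let latin = 0;
  for (let i = 0; i < s.length; i++) {
    const code = s.charCodeAt(i);
    if (code >= 0x0900 && code <= 0x097f) devanagari++;
    else if ((code >= 0x41 && code <= 0x5a) || (code >= 0x61 && code <= 0x7a)) latin++;
  }
  return { devanagari, latin };
}

function detectScriptSketch(s: string): ScriptSketch {
  const { devanagari, latin } = countScriptsSketch(s);
  if (devanagari > 0 && latin > 0) return "mixed";
  if (devanagari > 0) return "devanagari";
  if (latin > 0) return "latin";
  return "none";
}

function mixedScriptRatioSketch(s: string): number {
  const { devanagari, latin } = countScriptsSketch(s);
  const total = devanagari + latin;
  return total === 0 ? 0 : devanagari / total; // assumption: 0 for script-free text
}

console.log(detectScriptSketch("Hi नमस्ते"));     // "mixed"
console.log(mixedScriptRatioSketch("Hi नमस्ते")); // 0.75 (6 Devanagari units, 2 Latin)
```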

| Function | Description |
|---|---|
| tokenizeSentences(text) | Split on । / ॥ / ? / ! |
| tokenizeWords(text) | Split on whitespace + Devanagari/Latin punct |
| tokenizeCharacters(text) | Iterate Unicode code points |
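
The sentence-boundary rule (a terminator followed by whitespace or end of string) can be sketched as a single pass over the text. splitSentencesSketch is a hypothetical name and the trimming behaviour is an assumption; tokenizeSentences itself may handle edge cases differently.

```typescript
const TERMINATORS_SKETCH = new Set(["।", "॥", "?", "!"]);

// Cut after a terminator only when whitespace or end of string follows it.
function splitSentencesSketch(text: string): string[] {
  const sentences: string[] = [];
  let start = 0;
  for (let i = 0; i < text.length; i++) {
    const atEnd = i === text.length - 1;
    const nextIsSpace = !atEnd && /\s/.test(text.charAt(i + 1));
    if (TERMINATORS_SKETCH.has(text.charAt(i)) && (atEnd || nextIsSpace)) {
      const sentence = text.slice(start, i + 1).trim();
      if (sentence.length > 0) sentences.push(sentence);
      start = i + 1;
    }
  }
  const tail = text.slice(start).trim(); // text after the last terminator, if any
  if (tail.length > 0) sentences.push(tail);
  return sentences;
}

console.log(splitSentencesSketch("म नेपालमा बस्छु। तपाईंलाई कस्तो छ?"));
// ["म नेपालमा बस्छु।", "तपाईंलाई कस्तो छ?"]
```

Because the Latin full stop is not a terminator here, a decimal such as 3.5 in mixed text does not split a sentence.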

| Function / Constant | Description |
|---|---|
| STOPWORDS | Bundled ReadonlySet<string> (220+ entries) |
| isStopword(word, opts?) | Membership check with extend/exclude support |
| removeStopwords(tokens, opts?) | Filter tokens against the active set |

StopwordOptions: { stopwords?, extra?, exclude? }
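
One plausible reading of how the three options combine into the "active set" is sketched below. It assumes that stopwords replaces the bundled set, that extra is added next, and that exclude is applied last. The three-word base set is a tiny stand-in, not the bundled list, and all names are hypothetical.

```typescript
interface StopwordOptionsSketch {
  stopwords?: string[]; // replaces the base set entirely (assumption)
  extra?: string[];     // added on top
  exclude?: string[];   // removed last, so it wins over extra (assumption)
}

// Tiny stand-in for the bundled STOPWORDS set.
const BASE_STOPWORDS_SKETCH = ["म", "र", "तर"];

function activeStopwordsSketch(opts: StopwordOptionsSketch = {}): Set<string> {
  const active = new Set<string>(opts.stopwords ?? BASE_STOPWORDS_SKETCH);
  (opts.extra ?? []).forEach((w) => active.add(w));
  (opts.exclude ?? []).forEach((w) => active.delete(w));
  return active;
}

function removeStopwordsSketch(tokens: string[], opts: StopwordOptionsSketch = {}): string[] {
  const active = activeStopwordsSketch(opts);
  return tokens.filter((t) => !active.has(t));
}

console.log(removeStopwordsSketch(["म", "नेपालमा", "बस्छु"]));
// ["नेपालमा", "बस्छु"]
console.log(removeStopwordsSketch(["तर", "र", "काम"], { exclude: ["तर"] }));
// ["तर", "काम"]
```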

| Function / Constant | Description |
|---|---|
| CASE_MARKERS | Default suffix list, longest-first |
| stem(word, opts?) | Strip one matching suffix |
| stemAll(tokens, opts?) | Batch stem |

StemOptions: { suffixes?, minResidue? }. minResidue defaults to 2 (avoids over-stripping single-syllable roots).
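
A suffix-strip stemmer with a residue guard fits in a few lines. In this sketch the suffix list is a short hypothetical subset (the package's CASE_MARKERS may contain more entries, including compounds), and minResidue is measured in UTF-16 code units, which is an assumption.

```typescript
// Hypothetical subset of case markers plus the plural, ordered longest-first.
const SUFFIXES_SKETCH: readonly string[] = ["हरू", "लाई", "बाट", "ले", "को", "मा"];

// Strip the first matching suffix, but only if enough of the word remains.
function stemSketch(
  word: string,
  suffixes: readonly string[] = SUFFIXES_SKETCH,
  minResidue = 2,
): string {
  for (const suffix of suffixes) {
    if (word.endsWith(suffix) && word.length - suffix.length >= minResidue) {
      return word.slice(0, word.length - suffix.length);
    }
  }
  return word;
}

console.log(stemSketch("नेपालमा"));  // "नेपाल"
console.log(stemSketch("किताबहरू")); // "किताब"
console.log(stemSketch("आमा"));      // "आमा" (residue "आ" is too short, so -मा stays)
```

The last call shows why the guard matters: आमा ("mother") ends in the same characters as the locative marker -मा, and stripping it would leave a meaningless stub.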

| Function | Description |
|---|---|
| parseNumberWord(s) | Single number-word string → bigint \| null |
| extractNumbers(text) | Find every embedded number run, return NumberMatch[] |
| NUMBER_WORDS | The 0-99 + scale words map (Devanagari → BigInt) |

NumberMatch: { value: bigint, raw: string, start: number, end: number }
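
The arithmetic behind a run such as २ लाख ५ हजार is "digits times scale, summed". The sketch below covers only Devanagari digits and five scale words; spelled-out number words (the 0-99 part of NUMBER_WORDS) and locating runs inside longer text are left out. All names here are hypothetical, and reading a bare scale word as 1 × scale is an assumption.

```typescript
// South Asian scale words used in Nepali: thousand, lakh, crore, arba, kharba.
const SCALES_SKETCH = new Map<string, bigint>([
  ["हजार", BigInt("1000")],
  ["लाख", BigInt("100000")],
  ["करोड", BigInt("10000000")],
  ["अर्ब", BigInt("1000000000")],
  ["खर्ब", BigInt("100000000000")],
]);

// Convert a token made only of Devanagari digits (U+0966 to U+096F) to a bigint.
function devanagariDigitsToBigInt(token: string): bigint | null {
  if (token.length === 0) return null;
  let ascii = "";
  for (let i = 0; i < token.length; i++) {
    const code = token.charCodeAt(i);
    if (code < 0x0966 || code > 0x096f) return null;
    ascii += String(code - 0x0966);
  }
  return BigInt(ascii);
}

// Parse a whitespace-separated run of digit groups and scale words.
function parseNumberRunSketch(run: string): bigint | null {
  let total = BigInt(0);
  let pending: bigint | null = null; // digit group waiting for its scale word
  for (const token of run.trim().split(/\s+/)) {
    const scale = SCALES_SKETCH.get(token);
    if (scale !== undefined) {
      total += (pending ?? BigInt(1)) * scale; // bare scale word counts as 1 × scale
      pending = null;
      continue;
    }
    const digits = devanagariDigitsToBigInt(token);
    if (digits === null || pending !== null) return null; // unknown token or two digit groups in a row
    pending = digits;
  }
  return total + (pending ?? BigInt(0));
}

console.log(parseNumberRunSketch("२ लाख ५ हजार"));   // 205000n
console.log(parseNumberRunSketch("१ खर्ब २५ अर्ब")); // 125000000000n
```

Using bigint throughout keeps large amounts exact once values pass Number.MAX_SAFE_INTEGER.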


🎯 Recipes

Pre-process for Elasticsearch / OpenSearch indexing

import { normalize, tokenizeWords, removeStopwords, stemAll } from "nepali-nlp-pro-max";

function indexable(text: string): string[] {
  const cleaned = normalize(text);
  const words = tokenizeWords(cleaned);
  const content = removeStopwords(words);
  return stemAll(content);
}

indexable("म नेपालमा बस्छु। नेपाल राम्रो छ।");
// ["नेपाल", "बस्", "नेपाल", "राम्र"]

Route mixed-script content

import { detectScript, mixedScriptRatio } from "nepali-nlp-pro-max";

function pickPipeline(text: string): "ne" | "en" | "both" {
  const script = detectScript(text);
  if (script === "devanagari") return "ne";
  if (script === "latin") return "en";
  if (mixedScriptRatio(text) > 0.5) return "ne";
  return "both";
}

Extract amounts from news articles

import { extractNumbers } from "nepali-nlp-pro-max";

const article = "बजेटमा सरकारले शिक्षाका लागि १ खर्ब २५ अर्ब छुट्याएको छ।";
const matches = extractNumbers(article);
// [{ value: 125000000000n, raw: "१ खर्ब २५ अर्ब", ... }]

App-specific stopword tuning

import { removeStopwords, tokenizeWords } from "nepali-nlp-pro-max";

// Any token array works here; this one reuses the Quick Start sentence.
const tokens = tokenizeWords("म नेपालमा बस्छु।");

// News search wants "र" out (it's noisy) but keeps "तर" (signals contrast)
removeStopwords(tokens, { exclude: ["तर"] });

// Chat moderation: add domain stopwords
removeStopwords(tokens, { extra: ["हुन्छ", "ठिकै", "हजुर"] });

🤝 Contributing

PRs welcome. Common contributions:

  • More stopwords for specific domains (news, legal, technical)
  • Additional case-marker variants found in dialect / older text
  • Sentence-boundary edge cases (decimals, abbreviations in mixed text)
  • Bug fixes, type improvements

npm install
npm test
npm run typecheck
npm run build

📜 License

MIT © 2026 l3lackcurtains


Made with ❤️ for the Nepali developer community.

बनाइएको नेपाली डेभलपर समुदायको लागि।