language-detector-web

v2.0.0

Published

Efficient Language Detection for Multilingual Documents

LanguageDetector is a TypeScript library designed to detect the languages used in web pages, with support for 102 languages. A research paper with more information will be published soon.

This library has no dependencies. It has been tested server-side with Node.js and should also work in the browser.

Example

import LanguageDetector from 'language-detector-web';

const detector = new LanguageDetector();

const languages = detector.getSupportedLanguages();
console.log(languages); // ["af", "am", "ar", "as", "az", "be", "bg", "bn", "br", "bs", …]

const results = detector.getLanguages('This is an English text.'); // ['en']
console.log(`The main languages are ${results.join(' ')}.`); // The main languages are en.

Installation

npm install language-detector-web

Usage

Importing the Class

import LanguageDetector from 'language-detector-web';

Creating an Instance

const detector = new LanguageDetector();

Methods

LanguageDetector(mergeResults?, mergeDatasets?, skipSimilar?)

Creates an instance of LanguageDetector

  • mergeResults: Merge languages written in different scripts (Simplified and Traditional Chinese, Bengali and Romanized Bengali, etc.). Example: { 'zh': ['zhs', 'zht'], 'bn': ['bnr'], 'hi': ['hir'] }
  • mergeDatasets: Merge special datasets into a language. Example: { 'code': 'en', 'misc': 'en' }
  • skipSimilar: Skip similar languages (for the top result only). Defaults to false.
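
The two mapping options above can be read as lookup tables: mergeResults maps a parent code to its variants, and mergeDatasets maps a dataset name to a language. The sketch below only illustrates that reading. resolveCode and the two type names are hypothetical and not part of language-detector-web, and the library's internal merge logic may differ.

```typescript
// Hypothetical types and helper for illustration only; not the library's API.
type MergeResults = Record<string, string[]>; // parent code -> variant codes
type MergeDatasets = Record<string, string>;  // dataset name -> language code

// Example values taken from the option descriptions above.
const mergeResults: MergeResults = { zh: ['zhs', 'zht'], bn: ['bnr'], hi: ['hir'] };
const mergeDatasets: MergeDatasets = { code: 'en', misc: 'en' };

// Assumption: a variant or dataset code is reported under the code it is merged into.
function resolveCode(code: string, results: MergeResults, datasets: MergeDatasets): string {
  // A special dataset maps directly to its target language.
  if (Object.prototype.hasOwnProperty.call(datasets, code)) {
    return datasets[code] as string;
  }
  // A variant maps to the parent code that lists it.
  for (const [parent, variants] of Object.entries(results)) {
    if (variants.indexOf(code) !== -1) {
      return parent;
    }
  }
  // Anything else is left unchanged.
  return code;
}

console.log(resolveCode('zht', mergeResults, mergeDatasets));  // zh
console.log(resolveCode('misc', mergeResults, mergeDatasets)); // en
console.log(resolveCode('fr', mergeResults, mergeDatasets));   // fr
```

With the real library, these objects would be passed positionally to the constructor, in the order given in the signature above.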

getSupportedLanguages()

Returns the list of supported languages as ISO 639-1 codes: en (English), fr (French), nl (Dutch), etc.

getLanguagesWithScores(rawText)

Returns the score for each language supported:

{ 'en': 25.6, 'zh': -136.0, 'nl': 0, ... }

Scores can be 0 or negative. This library was designed and tested with the visible text of web pages, without any HTML markup. This function cleans up the text before scoring (for example, emojis are removed). Scores will likely increase with the length of the page.

getLanguages(rawText, minimumRatio?)

Returns the most likely language(s) used in the page from highest score to lowest score.

  • minimumRatio: minimum ratio of a language's score to the highest score for that language to be included, from 0.0 to 1.0 (0.8 by default)

If the language with the highest score has a value of 100, only languages with a score of 80 (0.8 ratio) or more are returned.
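
The ratio rule can be sketched as a small standalone function over a score map like the one returned by getLanguagesWithScores. This is a minimal sketch, not the library's implementation: selectByRatio is a hypothetical helper, and returning only the top entry when the highest score is zero or negative is an assumption made here.

```typescript
// Hypothetical helper for illustration only; not part of language-detector-web.
type Scores = Record<string, number>;

function selectByRatio(scores: Scores, minimumRatio: number = 0.8): string[] {
  // Rank languages from highest score to lowest.
  const ranked = Object.entries(scores).sort((a, b) => b[1] - a[1]);
  const first = ranked[0];
  if (first === undefined) {
    return [];
  }
  const [topLanguage, topScore] = first;
  // Assumption: a ratio is not meaningful for a non-positive top score,
  // so only the top entry is returned in that case.
  if (topScore <= 0) {
    return [topLanguage];
  }
  // Keep every language whose score reaches the required share of the top score.
  const threshold = topScore * minimumRatio;
  return ranked
    .filter(([, score]) => score >= threshold)
    .map(([language]) => language);
}

const exampleScores: Scores = { en: 100, nl: 85, fr: 79, zh: -136 };
console.log(selectByRatio(exampleScores));      // [ 'en', 'nl' ]
console.log(selectByRatio(exampleScores, 0.5)); // [ 'en', 'nl', 'fr' ]
```

Lowering the ratio widens the result, which may suit pages that mix several languages; raising it toward 1.0 keeps only near-ties with the top language.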

Configuration

The list of supported languages and their attributes (top letters and words) is contained in languages.json. This library is built with the top 10,000 words and letters for each language. Other datasets are available on GitHub: top 1k, 2k, 5k, 10k and 20k. See the research paper (coming soon) for the performance of each dataset.

Tests

To run tests, use this command:

npm test

Most test files were created with automated translation tools. Since the validity of their content has not been verified, test files that fail (marked with a .txt.failed extension) have been disabled. To force these test files to be used, run this command:

FORCE_ALL_TESTS=true npm test

Contributing

Contributions are welcome! Please open an issue or submit a pull request.

License

This project is licensed under the MIT License.