en-es-detector

v1.0.0

Production-ready English/Spanish language detector using hybrid ML, Heuristics, and Bloom Filter approach.

Downloads

102

en-es-detector

Production-Ready English / Spanish Language Detector

A high-performance, hybrid language detection library optimized for distinguishing English from Spanish text. It is built to handle inputs that trip up general-purpose detectors:

  • CamelCase Technical Jargon (e.g., HelloWorld, AdjustableBanner)
  • Short Fragments & UI Labels
  • Mixed Content

It combines Bloom Filters (fast dictionary lookup), Linguistic Heuristics (stopword/pattern matching), and an N-Gram ML Model to stay accurate on inputs where general-purpose language detectors tend to fail.
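To make "fast dictionary lookup" concrete, here is a minimal Bloom filter sketch. It is not the package's implementation: the class name, the bit-array size, the number of hashes, and the choice of a seeded FNV-1a hash are all assumptions made for illustration.

```javascript
// Minimal Bloom filter sketch (illustrative; not en-es-detector's internals).
class BloomFilter {
  constructor(sizeInBits, hashCount) {
    this.size = sizeInBits;
    this.hashCount = hashCount;
    this.bits = new Uint8Array(Math.ceil(sizeInBits / 8));
  }

  // FNV-1a 32-bit hash; the seed lets one function act as several hashes.
  hash(word, seed) {
    let h = (0x811c9dc5 ^ seed) >>> 0;
    for (let i = 0; i < word.length; i++) {
      h ^= word.charCodeAt(i);
      h = Math.imul(h, 0x01000193) >>> 0;
    }
    return h % this.size;
  }

  // Set one bit per hash function.
  add(word) {
    for (let k = 0; k < this.hashCount; k++) {
      const idx = this.hash(word, k);
      this.bits[idx >> 3] |= 1 << (idx & 7);
    }
  }

  // false means "definitely absent"; true means "probably present".
  has(word) {
    for (let k = 0; k < this.hashCount; k++) {
      const idx = this.hash(word, k);
      if ((this.bits[idx >> 3] & (1 << (idx & 7))) === 0) return false;
    }
    return true;
  }
}

// Tiny stand-in word list; the real dictionary is far larger.
const englishWords = new BloomFilter(1024, 3);
for (const w of ['hello', 'world', 'banner']) englishWords.add(w);

console.log(englishWords.has('hello')); // true: added words are always found
console.log(englishWords.has('mundo')); // usually false, but false positives are possible
```

The trade-off is memory for certainty: a Bloom filter never misses a word that was added, but it can occasionally report a word that was not, which is one reason a detector might pair it with other signals.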

Installation

npm install en-es-detector

Usage

Basic Detection

The main export detect runs the full pipeline (Dictionary + Heuristics + ML).

import { detect } from 'en-es-detector';

const result = detect("HelloWorld");
console.log(result);
// Output: { lang: 'en', confidence: 0.99, method: 'dictionary_strict' }
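Since the result carries a confidence score, callers may want to treat low-confidence answers as undecided. The helper below is a hypothetical sketch, not part of en-es-detector: pickLanguage, its 0.9 default cutoff, and the 'unknown' fallback are assumptions, and the sample objects are stand-ins shaped like the output shown above.

```javascript
// Hypothetical helper (not exported by en-es-detector).
// Expects an object shaped like the documented output: { lang, confidence, method }.
function pickLanguage(result, minConfidence = 0.9) {
  if (!result || typeof result.confidence !== 'number') return 'unknown';
  return result.confidence >= minConfidence ? result.lang : 'unknown';
}

// Stand-in results; the first mirrors the documented output.
const confident = { lang: 'en', confidence: 0.99, method: 'dictionary_strict' };
// 'ml' is an assumed method name used only for this example.
const shaky = { lang: 'es', confidence: 0.55, method: 'ml' };

console.log(pickLanguage(confident)); // 'en'
console.log(pickLanguage(shaky));     // 'unknown'
```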

Low-Level ML Detection

If you want to bypass the dictionary/heuristic layer and use only the N-Gram model (faster but less accurate for technical jargon), use detectML.

import { detectML } from 'en-es-detector';

const result = detectML("HelloWorld");
// Likely less confident or incorrect for jargon
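For intuition about what an N-Gram hashing model does, here is a toy sketch under stated assumptions. It hashes character trigrams into a small number of buckets and scores them with logistic regression. It uses FNV-1a instead of MurmurHash3 to stay short, and the bucket count, function names, and weights are made up; it does not reproduce the model shipped with the package.

```javascript
// Toy hashed n-gram classifier (illustrative only).
const BUCKETS = 16;

// FNV-1a 32-bit hash, swapped in for MurmurHash3 to keep the sketch small.
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

// Character n-grams of the lowercased input.
function charNGrams(text, n = 3) {
  const s = text.toLowerCase();
  const grams = [];
  for (let i = 0; i + n <= s.length; i++) grams.push(s.slice(i, i + n));
  return grams;
}

// Hashing trick: count n-grams per bucket instead of keeping a vocabulary.
function featurize(text) {
  const counts = new Float64Array(BUCKETS);
  for (const g of charNGrams(text)) counts[fnv1a(g) % BUCKETS] += 1;
  return counts;
}

// Logistic regression: sigmoid(w · x + b), read here as P(Spanish).
function probabilitySpanish(text, weights, bias) {
  const x = featurize(text);
  let z = bias;
  for (let i = 0; i < BUCKETS; i++) z += weights[i] * x[i];
  return 1 / (1 + Math.exp(-z));
}

// All-zero placeholder weights; a trained model would learn these from data.
const weights = new Float64Array(BUCKETS);
console.log(probabilitySpanish('HelloWorld', weights, 0)); // 0.5 with zero weights and bias
```

A model like this only sees character statistics, which is consistent with the note above that the ML-only path can struggle with technical jargon.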

Advanced Configuration

You can override the default sensitivity thresholds by passing an options object as the second argument to detect().

| Option | Type | Default | Description |
| :--- | :--- | :--- | :--- |
| MIN_ENGLISH_RATIO_STRICT | Number | 0.8 | Minimum ratio of valid English words required to trigger a "Strict Dictionary" match (High Confidence). |
| MIN_ENGLISH_RATIO_LOOSE | Number | 0.6 | Minimum ratio of English words to trigger a fallback "Loose Dictionary" match if no Spanish stopwords exist. |
| MAX_ENGLISH_RATIO_FOR_SPANISH | Number | 0.3 | Maximum allowed ratio of English words when strong Spanish stopwords are present. Prevents false positives. |
| CONF_DICT_STRICT | Number | 0.99 | Confidence score assigned when strict dictionary criteria are met. |
| CONF_DICT_LOOSE | Number | 0.85 | Confidence score assigned when loose dictionary criteria are met. |
| CONF_SPANISH_STOPWORD | Number | 0.95 | Confidence score boost for texts containing high-frequency Spanish stopwords. |
| MIN_ML_CONFIDENCE_TO_OVERRIDE_DICT | Number | 0.95 | If the ML model is this confident in its prediction, it can override a "Loose Dictionary" match to prevent errors. |
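One way to read these options together is as a small decision function. The sketch below is a guess at how the thresholds might interact, based only on the descriptions in the table. The decide function, its input signals (englishRatio, hasSpanishStopword, ml), the order of the checks, and every method name except 'dictionary_strict' are assumptions, not the library's actual logic.

```javascript
// Default values copied from the options table.
const DEFAULTS = {
  MIN_ENGLISH_RATIO_STRICT: 0.8,
  MIN_ENGLISH_RATIO_LOOSE: 0.6,
  MAX_ENGLISH_RATIO_FOR_SPANISH: 0.3,
  CONF_DICT_STRICT: 0.99,
  CONF_DICT_LOOSE: 0.85,
  CONF_SPANISH_STOPWORD: 0.95,
  MIN_ML_CONFIDENCE_TO_OVERRIDE_DICT: 0.95,
};

// Hypothetical decision function; the check order is an assumption.
function decide({ englishRatio, hasSpanishStopword, ml }, options = {}) {
  const o = { ...DEFAULTS, ...options };

  // Spanish stopwords win only when few tokens look English.
  if (hasSpanishStopword && englishRatio <= o.MAX_ENGLISH_RATIO_FOR_SPANISH) {
    return { lang: 'es', confidence: o.CONF_SPANISH_STOPWORD, method: 'spanish_stopword' };
  }
  // Strict dictionary match: most tokens are known English words.
  if (englishRatio >= o.MIN_ENGLISH_RATIO_STRICT) {
    return { lang: 'en', confidence: o.CONF_DICT_STRICT, method: 'dictionary_strict' };
  }
  // Loose dictionary match, unless a very confident ML prediction disagrees.
  if (englishRatio >= o.MIN_ENGLISH_RATIO_LOOSE && !hasSpanishStopword) {
    if (ml.lang !== 'en' && ml.confidence >= o.MIN_ML_CONFIDENCE_TO_OVERRIDE_DICT) {
      return { lang: ml.lang, confidence: ml.confidence, method: 'ml_override' };
    }
    return { lang: 'en', confidence: o.CONF_DICT_LOOSE, method: 'dictionary_loose' };
  }
  // Otherwise fall back to the ML prediction.
  return { lang: ml.lang, confidence: ml.confidence, method: 'ml' };
}

// All tokens look English, so the dictionary outranks a Spanish ML guess.
console.log(decide({
  englishRatio: 1.0,
  hasSpanishStopword: false,
  ml: { lang: 'es', confidence: 0.99 },
}));
// { lang: 'en', confidence: 0.99, method: 'dictionary_strict' }
```

Under this reading, raising MIN_ENGLISH_RATIO_STRICT pushes borderline inputs from the strict branch into the loose branch, where the ML model gets a chance to disagree.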

Example

import { detect } from 'en-es-detector';

const customOptions = {
    // Make it stricter: 90% of words must be English
    MIN_ENGLISH_RATIO_STRICT: 0.9,
    // Trust the dictionary less
    CONF_DICT_STRICT: 0.95,
    // Confidence to assign when a Spanish stopword is found
    CONF_SPANISH_STOPWORD: 1.0,
};

detect("HelloWorld", customOptions);

const result = detect("SomeAmbiguousText", customOptions);


How It Works

The detector uses a multi-stage pipeline:

  1. Normalization: Splits CamelCase (ItemCarousel → Item Carousel) and kebab-case identifiers into separate words.
  2. Dictionary Check: Checks each token against a Bloom Filter containing ~155k English words.
  3. Heuristics: Scans for high-precision Spanish stopwords (e.g., de, la, que) and patterns (e.g., cion, ñ).
  4. ML Inference: Runs a pre-trained N-Gram hashing model (MurmurHash3 + Logistic Regression weights) as a fallback.
  5. Ensemble Decision: Combines all signals. For example, if the ML model predicts "Spanish" but the Dictionary sees 100% English words, the detector overrides the prediction to "English".
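The first three stages above can be sketched in a few lines. This is an illustrative approximation, not the library's code: the function names, the regular expressions, the tiny stopword list, the ci[oó]n suffix pattern, and the use of a plain Set in place of the Bloom Filter are all assumptions.

```javascript
// Stage 1 (sketch): split CamelCase and kebab-case into lowercase tokens.
function normalize(text) {
  return text
    .replace(/([a-z0-9])([A-Z])/g, '$1 $2') // ItemCarousel -> Item Carousel
    .replace(/[-_]+/g, ' ')                 // item-carousel -> item carousel
    .toLowerCase()
    .split(/\s+/)
    .filter(Boolean);
}

// Stage 3 (sketch): tiny stand-in lists; the real lists are not shown in the README.
const SPANISH_STOPWORDS = new Set(['de', 'la', 'que']);
// Treating "cion" as a word suffix (with or without accent) is an assumption.
const SPANISH_PATTERNS = [/ci[oó]n$/, /ñ/];

function hasSpanishSignal(tokens) {
  return tokens.some(
    (t) => SPANISH_STOPWORDS.has(t) || SPANISH_PATTERNS.some((p) => p.test(t))
  );
}

// Stage 2 (sketch): share of tokens found in an English word set.
// A Set stands in for the Bloom Filter here.
function englishRatio(tokens, englishWords) {
  if (tokens.length === 0) return 0;
  const hits = tokens.filter((t) => englishWords.has(t)).length;
  return hits / tokens.length;
}

const tokens = normalize('ItemCarousel');
console.log(tokens);                                              // [ 'item', 'carousel' ]
console.log(englishRatio(tokens, new Set(['item', 'carousel']))); // 1
console.log(hasSpanishSignal(normalize('la-casa-de-papel')));     // true
```

Splitting identifiers first matters because a token like ItemCarousel would never match a dictionary entry as a whole, while item and carousel both can.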

License

ISC