
english-validator

v2.0.2

Detect whether a sentence is English or non-English. Returns true / false with high accuracy using dictionary lookup and trigram analysis.

Features

  • Dictionary-powered — 274k+ English word dictionary for accurate word-level checks
  • Trigram analysis — uses franc as a secondary signal for statistical language detection
  • Lightweight API — single function call, returns a boolean
  • Configurable — adjustable thresholds, minimum word length, number handling
  • Built-in caching — LRU-style memoization for fast repeated lookups
  • TypeScript support — ships with full type declarations and JSDoc
  • ESM & CJS — works with import and require (zero runtime dependencies)

Installation

npm install english-validator

Quick Start

ESM (React, Next.js, Vite, modern Node.js)

import { isEnglish, detectNonEnglishText } from "english-validator";

isEnglish("The quick brown fox jumps over the lazy dog");
// => true

isEnglish("Ceci est une phrase en français");
// => false

// Or use the inverse API:
detectNonEnglishText("Das ist ein deutscher Satz");
// => true  (it IS non-English)

detectNonEnglishText("Hello, how are you?");
// => false (it is NOT non-English)

CommonJS (Node.js)

const { isEnglish, detectNonEnglishText } = require("english-validator");

console.log(isEnglish("Hello world")); // true

TypeScript

The package ships with full type declarations. Import types directly:

import {
  isEnglish,
  detectNonEnglishText,
  matchesDocumentPattern,
  clearLanguageDetectorCaches,
} from "english-validator";
import type { DetectionOptions } from "english-validator";

// Use DetectionOptions for custom configuration
const options: DetectionOptions = {
  englishThreshold: 0.7,
  minWordLength: 3,
  allowNumbers: false,
};

const result: boolean = isEnglish("Check this text", options);

API

isEnglish(text, options?)

Returns true if the text is English, false otherwise.

| Parameter | Type                        | Description                                             |
| --------- | --------------------------- | ------------------------------------------------------- |
| text      | string \| null \| undefined | Text to analyse. Returns true for empty/null/undefined. |
| options   | DetectionOptions            | Optional configuration (see below)                      |

isEnglish("Hello world");          // true
isEnglish("Bonjour le monde");     // false
isEnglish("", { englishThreshold: 0.5 }); // true (empty)

detectNonEnglishText(text, options?)

Returns true if the text is non-English, false if English. Inverse of isEnglish.

detectNonEnglishText("Das ist Deutsch");   // true
detectNonEnglishText("This is English");   // false

matchesDocumentPattern(text)

Returns true if the text matches document ID patterns like AEM01-WI-DSU06-SD01.

matchesDocumentPattern("AEM01-WI-DSU06-SD01"); // true
matchesDocumentPattern("Hello world");          // false
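The exact pattern is internal to the library, but a plausible approximation for IDs of this shape is the following sketch (a hypothetical regex, not the library's actual one):

```javascript
// Hypothetical approximation of a document-ID pattern like AEM01-WI-DSU06-SD01:
// two or more dash-separated groups of uppercase letters with optional trailing digits.
const DOC_ID_PATTERN = /^[A-Z]{2,}\d*(-[A-Z]{2,}\d*)+$/;

console.log(DOC_ID_PATTERN.test("AEM01-WI-DSU06-SD01")); // true
console.log(DOC_ID_PATTERN.test("Hello world"));         // false
```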

clearLanguageDetectorCaches()

Clears the internal LRU memoization caches. Call this in long-running applications to free memory or to reset state between independent detection sessions.

clearLanguageDetectorCaches(); // frees all cached results

DetectionOptions

Configuration object accepted by isEnglish and detectNonEnglishText:

| Option             | Type     | Default | Description                                                                |
| ------------------ | -------- | ------- | -------------------------------------------------------------------------- |
| englishThreshold   | number   | 0.8     | Ratio of English words needed to classify as English (0.0–1.0)             |
| minWordLength      | number   | 2       | Words shorter than this are skipped during analysis                        |
| allowNumbers       | boolean  | true    | Treat standalone numbers as valid English tokens                           |
| allowAbbreviations | boolean  | true    | Treat uppercase abbreviations (e.g. NATO, FBI) as valid English tokens     |
| customPatterns     | RegExp[] | —       | Regex patterns to strip from text before validation                        |
| excludeWords       | string[] | —       | Words to remove from text before validation (case-insensitive, whole-word) |

Note: Short texts (4 words or fewer) automatically use a relaxed threshold of 0.6 regardless of the configured englishThreshold, to avoid misclassifying short English fragments as non-English.
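The short-text rule can be pictured with a small sketch (the function name and structure here are illustrative, not the library's internals):

```javascript
// Illustrative sketch of the short-text threshold rule described above.
const SHORT_TEXT_WORD_COUNT = 4;
const SHORT_TEXT_THRESHOLD = 0.6;

function effectiveThreshold(wordCount, configuredThreshold = 0.8) {
  // Texts with 4 words or fewer always use the relaxed 0.6 threshold
  return wordCount <= SHORT_TEXT_WORD_COUNT
    ? SHORT_TEXT_THRESHOLD
    : configuredThreshold;
}

console.log(effectiveThreshold(3, 0.95));  // 0.6 (short text overrides config)
console.log(effectiveThreshold(10, 0.95)); // 0.95
```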

Quick Examples

import { isEnglish } from "english-validator";

// englishThreshold — lower it to allow mixed-language text
isEnglish("Hello mundo friend", { englishThreshold: 0.5 });       // true (50%+ English)

// minWordLength — skip short words like "a", "I" during analysis
isEnglish("I am a big fan of this", { minWordLength: 3 });         // true

// allowNumbers — treat "42" as a valid English token (default: true)
isEnglish("Order 42 is ready", { allowNumbers: true });            // true

// allowAbbreviations — treat "NATO", "FBI" as valid (default: true)
isEnglish("NATO signed the agreement", { allowAbbreviations: true }); // true

// customPatterns — strip JIRA IDs before validation
isEnglish("Fix bug PROJ-1234 in login flow", {
  customPatterns: [/[A-Z]+-\d+/g],
});                                                                 // true

// excludeWords — remove brand names / jargon before validation
isEnglish("Deploy Kubernetes pods and monitor dashboards", {
  excludeWords: ["Kubernetes"],
});                                                                 // true

Usage Examples

Custom Patterns — Strip Unwanted Tokens

Use customPatterns to remove regex-matched tokens (e.g. JIRA ticket IDs, codes) before validation:

import { isEnglish } from "english-validator";

// JIRA ticket IDs would normally fail the dictionary check
isEnglish("Fix bug PROJ-1234 in login flow", {
  customPatterns: [/PROJ-\d+/g],
});
// => true

// Multiple patterns
isEnglish("REF:ABC123 the system is operational CODE:XY99", {
  customPatterns: [/REF:\w+/g, /CODE:\w+/g],
});
// => true

Exclude Words — Remove Known Non-Dictionary Terms

Use excludeWords to drop specific words (brand names, internal jargon) before validation:

import { isEnglish } from "english-validator";

// "Kubernetes" and "Grafana" aren't in the dictionary
isEnglish("Deploy Kubernetes pods and monitor with Grafana dashboards", {
  excludeWords: ["Kubernetes", "Grafana"],
});
// => true

// Case-insensitive and whole-word only
isEnglish("The ACME widget is working fine", {
  excludeWords: ["acme"],
});
// => true  ("acme" removed, remaining text is English)

Combining Options

import { isEnglish } from "english-validator";
import type { DetectionOptions } from "english-validator";

const opts: DetectionOptions = {
  customPatterns: [/TKT-\d+/g],
  excludeWords: ["Datadog", "Terraform"],
  englishThreshold: 0.7,
  allowAbbreviations: true,
};

isEnglish("TKT-5678 Deploy Terraform stack monitored by Datadog", opts);
// => true

React Component

import { isEnglish } from "english-validator";

function LanguageCheck({ text }: { text: string }) {
  return (
    <div>
      {isEnglish(text) ? "✅ English" : "❌ Not English"}
    </div>
  );
}

Node.js API Middleware

import { detectNonEnglishText } from "english-validator";

app.post("/api/comment", (req, res) => {
  if (detectNonEnglishText(req.body.text)) {
    return res.status(400).json({ error: "Only English text is accepted" });
  }
  // proceed...
});

Custom Threshold

import { isEnglish } from "english-validator";
import type { DetectionOptions } from "english-validator";

// More lenient — allows mixed-language text
const lenient: DetectionOptions = { englishThreshold: 0.5 };
isEnglish("Hello mundo", lenient); // true (50%+ English)

// Stricter — requires almost all words to be English
const strict: DetectionOptions = { englishThreshold: 0.95 };
isEnglish("Hello mundo", strict);  // false

Use Cases

  • Chatbots & Virtual Assistants — validate that user messages are in English before routing to an English-only NLP pipeline or LLM
  • Content Moderation — reject or flag non-English submissions in forums, comment sections, or review platforms
  • Form Validation — ensure text fields (feedback, support tickets, descriptions) contain English input
  • Data Pipelines & ETL — filter English-only records from multilingual datasets during ingestion
  • CMS & Publishing — gate content uploads to English-only workflows
  • Search Indexing — tag or partition documents by language before indexing
  • Email / Notification Filtering — detect and route non-English inbound messages
  • API Gateways — enforce English-only payloads at the middleware layer

How It Works

  1. Preprocessing — strips document IDs, geographical terms, special characters, user-supplied customPatterns, and excludeWords
  2. Dictionary lookup — each word is checked against a 274k+ English word set
  3. Non-English screening — detects European characters (ä, ö, ü, ñ, etc.), word suffixes (-keit, -ción, -zione), and function words (le, la, der, die, das)
  4. Contraction resolution — splits contractions on apostrophes (e.g. don't → don) and rechecks the base word against the dictionary
  5. English ratio — calculates the percentage of recognized English words
  6. Trigram fallback — if the ratio is below the threshold, franc provides a statistical language classification as a tiebreaker
  7. Result — returns a boolean
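Steps 2 and 5 of the pipeline above can be sketched in plain JavaScript (a simplified illustration with a toy dictionary; the real library uses a 274k+ word set plus franc as a fallback):

```javascript
// Toy dictionary standing in for the library's 274k+ word set.
const toyDictionary = new Set([
  "the", "quick", "brown", "fox", "jumps", "over", "lazy", "dog",
]);

// Ratio of words recognised as English (steps 2 and 5, simplified).
function englishRatio(text, dictionary = toyDictionary) {
  const words = text.toLowerCase().match(/[a-z']+/g) ?? [];
  if (words.length === 0) return 1; // empty text counts as English
  const hits = words.filter((w) => dictionary.has(w)).length;
  return hits / words.length;
}

console.log(englishRatio("The quick brown fox")); // 1
console.log(englishRatio("zxqv plorth"));         // 0
```

In the real library this ratio is then compared against englishThreshold, with franc's trigram classification used as a tiebreaker when the ratio falls short.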

Supported Non-English Language Detection

The library detects non-English text across multiple language families using three complementary techniques: character analysis, suffix matching, and vocabulary/function-word detection.

| Language     | Characters              | Suffixes       | Vocabulary / Function Words |
| ------------ | ----------------------- | -------------- | --------------------------- |
| German       | ä ö ü ß                 | -keit, -schaft | und, oder, aber, wenn, weil, dass, nicht, kein · der, die, das, den, dem, ein, eine |
| French       | é è ê ë à â ç ù û ÿ æ œ | -eur           | est, sont, être, avoir, faire, quand, où, pourquoi · le, la, les, du, des, dans, avec |
| Spanish      | ñ á í ó ú ¡ ¿           | -ción          | que, como, porque, pero, cuando, donde, este, esta · el, los, las, del, al, con, sin, por |
| Italian      | ì ò                     | -zione         | sono, essere, avere, fare, dire, come, quando, dove · il, lo, gli |
| Dutch        | —                       | -baar, -lijk   | maar, want, omdat, hoewel, terwijl, dus · het, een, op, aan, voor, met, door |
| Portuguese   | ç ã õ                   | -ção, -agem    | eu, tu, ele, ela, nós, isto, isso, aquilo · os, dos, das, nos, nas, um, uma |
| Turkish      | ş ğ ı                   | —              | ben, sen, biz, siz, onlar, bana, sana, benim, senin |
| Scandinavian | å ø æ                   | —              | jeg, mig, min, mit, dig, din, han, hun, den, det, denne, dette |
| Polish       | ł ń ś ź ż ą ć ę         | —              | (character-level detection) |
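As a rough illustration of the character and function-word screening techniques, here is a sketch using a small subset of the words and characters tabulated above (not the library's actual code or word lists):

```javascript
// Sketch: flag text containing common non-English function words or diacritics.
const NON_ENGLISH_FUNCTION_WORDS = new Set([
  "und", "oder", "nicht", "der", "die", "das", // German
  "le", "la", "les", "est", "avec",            // French
  "el", "los", "las", "pero", "porque",        // Spanish
]);
const NON_ENGLISH_CHARS = /[äöüßéèêëàâçñáíóúìò]/i;

function looksNonEnglish(text) {
  if (NON_ENGLISH_CHARS.test(text)) return true;
  const words = text.toLowerCase().match(/[a-z]+/g) ?? [];
  return words.some((w) => NON_ENGLISH_FUNCTION_WORDS.has(w));
}

console.log(looksNonEnglish("der Hund und die Katze")); // true
console.log(looksNonEnglish("the dog and the cat"));    // false
```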

Performance

| Aspect                | Detail                                                   |
| --------------------- | -------------------------------------------------------- |
| Dictionary lookups    | O(1) via Set (274k+ entries)                             |
| Word cache            | LRU with 5,000 entry limit                               |
| Franc cache           | LRU with 1,000 entry limit                               |
| Regex patterns        | Precompiled at module load — zero runtime compilation    |
| Geographical patterns | Built once from dictionary data at module initialisation |
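The LRU-style memoization described above can be sketched with a Map, which preserves insertion order in JavaScript, so evicting the first key gives simple least-recently-used behaviour (class name and sizes here are illustrative, not the library's internals):

```javascript
// Minimal LRU-style memo cache, as used conceptually for word/franc lookups.
class LruCache {
  constructor(limit = 5000) {
    this.limit = limit;
    this.map = new Map();
  }
  get(key) {
    if (!this.map.has(key)) return undefined;
    const value = this.map.get(key);
    this.map.delete(key);     // re-insert to mark as most recently used
    this.map.set(key, value);
    return value;
  }
  set(key, value) {
    if (this.map.has(key)) this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.limit) {
      // evict the least recently used (first-inserted) entry
      this.map.delete(this.map.keys().next().value);
    }
  }
}

const cache = new LruCache(2);
cache.set("a", 1);
cache.set("b", 2);
cache.get("a");    // touch "a" so "b" becomes least recently used
cache.set("c", 3); // evicts "b"
console.log(cache.get("b")); // undefined
console.log(cache.get("a")); // 1
```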

Running Tests

npm test

Contributing

  1. Fork the repo
  2. Create a feature branch (git checkout -b feat/my-feature)
  3. Commit your changes (git commit -am 'Add my feature')
  4. Push to the branch (git push origin feat/my-feature)
  5. Open a Pull Request

License

MIT © nuvayutech