english-validator
v2.0.2
Detect whether a sentence is English or non-English. Returns true/false with high accuracy using dictionary lookup and trigram analysis.
Features
- Dictionary-powered — 274k+ English word dictionary for accurate word-level checks
- Trigram analysis — uses franc as a secondary signal for statistical language detection
- Lightweight API — single function call, returns a boolean
- Configurable — adjustable thresholds, minimum word length, number handling
- Built-in caching — LRU-style memoization for fast repeated lookups
- TypeScript support — ships with full type declarations and JSDoc
- ESM & CJS — works with import and require (zero runtime dependencies)
Installation
npm install english-validator
Quick Start
ESM (React, Next.js, Vite, modern Node.js)
import { isEnglish, detectNonEnglishText } from "english-validator";
isEnglish("The quick brown fox jumps over the lazy dog");
// => true
isEnglish("Ceci est une phrase en français");
// => false
// Or use the inverse API:
detectNonEnglishText("Das ist ein deutscher Satz");
// => true (it IS non-English)
detectNonEnglishText("Hello, how are you?");
// => false (it is NOT non-English)
CommonJS (Node.js)
const { isEnglish, detectNonEnglishText } = require("english-validator");
console.log(isEnglish("Hello world")); // true
TypeScript
The package ships with full type declarations. Import types directly:
import {
  isEnglish,
  detectNonEnglishText,
  matchesDocumentPattern,
  clearLanguageDetectorCaches,
} from "english-validator";
import type { DetectionOptions } from "english-validator";
// Use DetectionOptions for custom configuration
const options: DetectionOptions = {
  englishThreshold: 0.7,
  minWordLength: 3,
  allowNumbers: false,
};
const result: boolean = isEnglish("Check this text", options);
API
isEnglish(text, options?)
Returns true if the text is English, false otherwise.
| Parameter | Type | Description |
| --------- | ------------------ | ---------------------------------- |
| text | string \| null \| undefined | Text to analyse. Returns true for empty/null/undefined. |
| options | DetectionOptions | Optional configuration (see below) |
isEnglish("Hello world"); // true
isEnglish("Bonjour le monde"); // false
isEnglish("", { englishThreshold: 0.5 }); // true (empty)
detectNonEnglishText(text, options?)
Returns true if the text is non-English, false if English. Inverse of isEnglish.
detectNonEnglishText("Das ist Deutsch"); // true
detectNonEnglishText("This is English"); // false
matchesDocumentPattern(text)
Returns true if the text matches document ID patterns like AEM01-WI-DSU06-SD01.
matchesDocumentPattern("AEM01-WI-DSU06-SD01"); // true
matchesDocumentPattern("Hello world"); // false
clearLanguageDetectorCaches()
Clears the internal LRU memoization caches. Call this in long-running applications to free memory or to reset state between independent detection sessions.
clearLanguageDetectorCaches(); // frees all cached results
DetectionOptions
Configuration object accepted by isEnglish and detectNonEnglishText:
| Option | Type | Default | Description |
| ------------------- | ----------- | ------- | ---------------------------------------------------- |
| englishThreshold | number | 0.8 | Ratio of English words needed to classify as English (0.0–1.0) |
| minWordLength | number | 2 | Words shorter than this are skipped during analysis |
| allowNumbers | boolean | true | Treat standalone numbers as valid English tokens |
| allowAbbreviations| boolean | true | Treat uppercase abbreviations (e.g. NATO, FBI) as valid English tokens |
| customPatterns | RegExp[] | — | Regex patterns to strip from text before validation |
| excludeWords | string[] | — | Words to remove from text before validation (case-insensitive, whole-word) |
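
The customPatterns and excludeWords options both describe text removal before validation. The sketch below shows one way such a preprocessing step could work; the helper names `escapeRegExp` and `preprocess` are hypothetical and this is not the package's source code. It assumes patterns carry the `g` flag (without it, `replace` removes only the first match).

```typescript
// Hypothetical helpers that illustrate the documented behaviour only.

/** Escape characters that have special meaning inside a RegExp. */
function escapeRegExp(literal: string): string {
  return literal.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

/** Strip regex matches and excluded words, then collapse whitespace. */
function preprocess(
  text: string,
  customPatterns: RegExp[] = [],
  excludeWords: string[] = [],
): string {
  let cleaned = text;

  // Replace each pattern match with a space so neighbouring words stay apart.
  for (const pattern of customPatterns) {
    cleaned = cleaned.replace(pattern, " ");
  }

  for (const word of excludeWords) {
    // \b gives whole-word matching; the "i" flag makes it case-insensitive.
    const wholeWord = new RegExp(`\\b${escapeRegExp(word)}\\b`, "gi");
    cleaned = cleaned.replace(wholeWord, " ");
  }

  return cleaned.replace(/\s+/g, " ").trim();
}

const cleaned = preprocess(
  "Fix bug PROJ-1234 in the ACME login flow",
  [/[A-Z]+-\d+/g],
  ["acme"],
);
console.log(cleaned); // "Fix bug in the login flow"
```

Because of the word boundaries, excluding "acme" leaves a longer word such as "acmeville" untouched.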
Note: Short texts (4 words or fewer) automatically use a relaxed threshold of 0.6 regardless of the configured englishThreshold, so that short English fragments are not misclassified as non-English.
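
The short-text rule can be pictured as a small threshold selector. This is a sketch under stated assumptions: the function name `effectiveThreshold` is hypothetical, and counting words by splitting on whitespace is an assumption about how the library tokenizes.

```typescript
// Values taken from the note on short texts; names are hypothetical.
const SHORT_TEXT_MAX_WORDS = 4;
const SHORT_TEXT_THRESHOLD = 0.6;

/** Pick the threshold that would apply to a given text. */
function effectiveThreshold(text: string, englishThreshold = 0.8): number {
  // Assumption: words are whitespace-separated tokens.
  const wordCount = text.trim().split(/\s+/).filter(Boolean).length;
  return wordCount <= SHORT_TEXT_MAX_WORDS ? SHORT_TEXT_THRESHOLD : englishThreshold;
}

console.log(effectiveThreshold("Hello mundo", 0.95)); // 0.6
console.log(effectiveThreshold("The quick brown fox jumps over the lazy dog", 0.95)); // 0.95
```

Under this reading, a custom englishThreshold only takes effect once the text has five or more words.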
Quick Examples
import { isEnglish } from "english-validator";
// englishThreshold — lower it to allow mixed-language text
isEnglish("Hello mundo friend", { englishThreshold: 0.5 }); // true (50%+ English)
// minWordLength — skip short words like "a", "I" during analysis
isEnglish("I am a big fan of this", { minWordLength: 3 }); // true
// allowNumbers — treat "42" as a valid English token (default: true)
isEnglish("Order 42 is ready", { allowNumbers: true }); // true
// allowAbbreviations — treat "NATO", "FBI" as valid (default: true)
isEnglish("NATO signed the agreement", { allowAbbreviations: true }); // true
// customPatterns — strip JIRA IDs before validation
isEnglish("Fix bug PROJ-1234 in login flow", {
  customPatterns: [/[A-Z]+-\d+/g],
}); // true
// excludeWords — remove brand names / jargon before validation
isEnglish("Deploy Kubernetes pods and monitor dashboards", {
  excludeWords: ["Kubernetes"],
}); // true
Usage Examples
Custom Patterns — Strip Unwanted Tokens
Use customPatterns to remove regex-matched tokens (e.g. JIRA ticket IDs, codes) before validation:
import { isEnglish } from "english-validator";
// JIRA ticket IDs would normally fail the dictionary check
isEnglish("Fix bug PROJ-1234 in login flow", {
  customPatterns: [/PROJ-\d+/g],
});
// => true
// Multiple patterns
isEnglish("REF:ABC123 the system is operational CODE:XY99", {
  customPatterns: [/REF:\w+/g, /CODE:\w+/g],
});
// => true
Exclude Words — Remove Known Non-Dictionary Terms
Use excludeWords to drop specific words (brand names, internal jargon) before validation:
import { isEnglish } from "english-validator";
// "Kubernetes" and "Grafana" aren't in the dictionary
isEnglish("Deploy Kubernetes pods and monitor with Grafana dashboards", {
  excludeWords: ["Kubernetes", "Grafana"],
});
// => true
// Case-insensitive and whole-word only
isEnglish("The ACME widget is working fine", {
  excludeWords: ["acme"],
});
// => true ("acme" removed, remaining text is English)
Combining Options
import { isEnglish } from "english-validator";
import type { DetectionOptions } from "english-validator";
const opts: DetectionOptions = {
  customPatterns: [/TKT-\d+/g],
  excludeWords: ["Datadog", "Terraform"],
  englishThreshold: 0.7,
  allowAbbreviations: true,
};
isEnglish("TKT-5678 Deploy Terraform stack monitored by Datadog", opts);
// => true
React Component
import { isEnglish } from "english-validator";
function LanguageCheck({ text }: { text: string }) {
  return (
    <div>
      {isEnglish(text) ? "✅ English" : "❌ Not English"}
    </div>
  );
}
Node.js API Middleware
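import express from "express";
import { detectNonEnglishText } from "english-validator";
// Assumes an Express app; adapt to your framework of choice.
const app = express();
app.use(express.json()); // parse JSON bodies so req.body.text is available
app.post("/api/comment", (req, res) => {
  if (detectNonEnglishText(req.body.text)) {
    return res.status(400).json({ error: "Only English text is accepted" });
  }
  // proceed...
});
Custom Threshold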
import { isEnglish } from "english-validator";
import type { DetectionOptions } from "english-validator";
// More lenient — allows mixed-language text
const lenient: DetectionOptions = { englishThreshold: 0.5 };
isEnglish("Hello mundo", lenient); // true (50%+ English)
// Stricter — requires almost all words to be English
const strict: DetectionOptions = { englishThreshold: 0.95 };
isEnglish("Hello mundo", strict); // false
Use Cases
- Chatbots & Virtual Assistants — validate that user messages are in English before routing to an English-only NLP pipeline or LLM
- Content Moderation — reject or flag non-English submissions in forums, comment sections, or review platforms
- Form Validation — ensure text fields (feedback, support tickets, descriptions) contain English input
- Data Pipelines & ETL — filter English-only records from multilingual datasets during ingestion
- CMS & Publishing — gate content uploads to English-only workflows
- Search Indexing — tag or partition documents by language before indexing
- Email / Notification Filtering — detect and route non-English inbound messages
- API Gateways — enforce English-only payloads at the middleware layer
How It Works
- Preprocessing — strips document IDs, geographical terms, special characters, user-supplied customPatterns, and excludeWords
- Dictionary lookup — each word is checked against a 274k+ English word set
- Non-English screening — detects European characters (ä, ö, ü, ñ, etc.), word suffixes (-keit, -ción, -zione), and function words (le, la, der, die, das)
- Contraction resolution — splits contractions on apostrophes (e.g. don't → don) and rechecks the base word against the dictionary
- English ratio — calculates the percentage of recognized English words
- Trigram fallback — if the ratio is below the threshold, franc provides a statistical language classification as a tiebreaker
- Result — returns a boolean
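
The dictionary lookup, contraction resolution, English ratio, and fallback steps can be sketched roughly as follows. This is a simplified illustration, not the package's implementation: the eight-word `DICTIONARY` stands in for the 274k+ word set, the tokenizer is an assumption, and the `fallback` callback stands in for the franc-based classification. All names are hypothetical.

```typescript
// Toy stand-in for the real dictionary; every name here is hypothetical.
const DICTIONARY = new Set(["the", "system", "is", "operational", "do", "not", "don", "panic"]);

/** Stand-in for the trigram classifier: returns true if it judges the text English. */
type FallbackClassifier = (text: string) => boolean;

/** True if the word, or its base before an apostrophe, is in the dictionary. */
function isKnownWord(word: string): boolean {
  if (DICTIONARY.has(word)) return true;
  const base = word.split("'")[0] ?? "";
  return base !== word && DICTIONARY.has(base);
}

/** Share of tokens recognised as English, between 0 and 1. */
function englishRatio(text: string, minWordLength = 2): number {
  const words = text
    .toLowerCase()
    .split(/[^a-z'\u00c0-\u024f]+/) // keep ASCII letters, apostrophes, accented Latin letters
    .filter((w) => w.length >= minWordLength);
  if (words.length === 0) return 1; // nothing to judge, treated as English in this sketch
  return words.filter(isKnownWord).length / words.length;
}

/** Accept on ratio alone; otherwise defer to the fallback classifier. */
function classify(text: string, threshold: number, fallback: FallbackClassifier): boolean {
  if (englishRatio(text) >= threshold) return true;
  return fallback(text);
}

console.log(englishRatio("The system is operational")); // 1
console.log(englishRatio("The système is kaputt")); // 0.5
console.log(classify("The système is kaputt", 0.8, () => false)); // false
```

Passing the fallback as a function keeps the ratio logic testable on its own, whichever statistical detector is plugged in.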
Supported Non-English Language Detection
The library detects non-English text across multiple language families using three complementary techniques: character analysis, suffix matching, and vocabulary/function-word detection.
| Language | Characters | Suffixes | Vocabulary / Function Words |
|---|---|---|---|
| German | ä ö ü ß | -keit, -schaft | und, oder, aber, wenn, weil, dass, nicht, kein · der, die, das, den, dem, ein, eine |
| French | é è ê ë à â ç ù û ÿ æ œ | -eur | est, sont, être, avoir, faire, quand, où, pourquoi · le, la, les, du, des, dans, avec |
| Spanish | ñ á í ó ú ¡ ¿ | -ción | que, como, porque, pero, cuando, donde, este, esta · el, los, las, del, al, con, sin, por |
| Italian | ì ò | -zione | sono, essere, avere, fare, dire, come, quando, dove · il, lo, gli |
| Dutch | — | -baar, -lijk | maar, want, omdat, hoewel, terwijl, dus · het, een, op, aan, voor, met, door |
| Portuguese | ção | -agem | eu, tu, ele, ela, nós, isto, isso, aquilo · os, dos, das, nos, nas, um, uma |
| Turkish | ş ğ ı | — | ben, sen, biz, siz, onlar, bana, sana, benim, senin |
| Scandinavian | å ø æ | — | jeg, mig, min, mit, dig, din, han, hun, den, det, denne, dette |
| Polish | ł ń ś ź ż ą ć ę | — | (character-level detection) |
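
The three techniques in the table can be combined into a per-word check along these lines. This is only a sketch: it uses a small sample of the characters, suffixes, and function words listed above, and the name `screenWord` and the order of the checks are assumptions rather than the library's actual logic.

```typescript
// Small samples copied from the table; the library's real lists are longer.
const NON_ENGLISH_CHARS = /[äöüßñçłşğåø]/;
const NON_ENGLISH_SUFFIXES = ["keit", "schaft", "ción", "zione", "lijk"];
const FUNCTION_WORDS = new Set(["und", "oder", "nicht", "les", "dans", "pero", "het", "jeg"]);

type Signal = "character" | "suffix" | "function-word";

/** Return the first non-English signal found in a word, or null if none. */
function screenWord(word: string): Signal | null {
  const lower = word.toLowerCase();
  if (NON_ENGLISH_CHARS.test(lower)) return "character";
  // Require at least one letter before the suffix so the suffix alone does not match.
  if (NON_ENGLISH_SUFFIXES.some((s) => lower.length > s.length && lower.endsWith(s))) {
    return "suffix";
  }
  if (FUNCTION_WORDS.has(lower)) return "function-word";
  return null;
}

console.log(screenWord("über")); // "character"
console.log(screenWord("Freundlichkeit")); // "suffix"
console.log(screenWord("nicht")); // "function-word"
console.log(screenWord("hello")); // null
```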
Performance
| Aspect | Detail |
|---|---|
| Dictionary lookups | O(1) via Set (274k+ entries) |
| Word cache | LRU with 5,000 entry limit |
| Franc cache | LRU with 1,000 entry limit |
| Regex patterns | Precompiled at module load — zero runtime compilation |
| Geographical patterns | Built once from dictionary data at module initialisation |
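
For readers unfamiliar with LRU caching, a size-limited cache of the kind listed in the table can be built on the insertion order of a `Map`. The class below is a generic sketch, not the package's implementation; `LruCache` and its methods are hypothetical names, and the limit of 2 is chosen only to make eviction visible.

```typescript
/** Minimal LRU cache relying on Map's insertion-order iteration. */
class LruCache<K, V> {
  private readonly entries = new Map<K, V>();
  private readonly limit: number;

  constructor(limit: number) {
    this.limit = limit;
  }

  get(key: K): V | undefined {
    if (!this.entries.has(key)) return undefined;
    const value = this.entries.get(key) as V;
    // Re-insert so the key becomes the most recently used entry.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key: K, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.limit) {
      // The first key in iteration order is the least recently used.
      const oldest = this.entries.keys().next().value as K;
      this.entries.delete(oldest);
    }
  }

  clear(): void {
    this.entries.clear();
  }

  get size(): number {
    return this.entries.size;
  }
}

const wordCache = new LruCache<string, boolean>(2);
wordCache.set("hello", true);
wordCache.set("mundo", false);
wordCache.get("hello"); // "hello" is now the most recently used
wordCache.set("world", true); // evicts "mundo"
console.log(wordCache.get("mundo")); // undefined
console.log(wordCache.size); // 2
```

A `clear()` method like this one is the kind of operation a function such as clearLanguageDetectorCaches() would need to call on each cache.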
Running Tests
npm test
Contributing
- Fork the repo
- Create a feature branch (git checkout -b feat/my-feature)
- Commit your changes (git commit -am 'Add my feature')
- Push to the branch (git push origin feat/my-feature)
- Open a Pull Request
