langdetect-jsoo
v1.0.0
OCaml/JS port of the Cybozu langdetect algorithm. Detects the natural language of text using n-gram frequency profiles. Supports 47 languages including English, Chinese, Japanese, Arabic, and many European languages.
langdetect-jsoo
Language detection for JavaScript/WebAssembly, compiled from OCaml using
js_of_ocaml/wasm_of_ocaml. It is an OCaml port of the Cybozu langdetect algorithm, which uses
n-gram frequency profiles to detect the natural language of text.
Supports 47 languages including English, Chinese, Japanese, Arabic, and many European languages.
Installation
npm install langdetect-jsoo
Quick Start
Browser (Script Tag)
Pure JavaScript Version (~7.6MB)
<script src="node_modules/langdetect-jsoo/langdetect.js"></script>
<script>
  // Wait for library to load
  document.addEventListener('langdetectReady', () => {
    const lang = langdetect.detect("Hello, world!");
    console.log(lang); // "en"
  });
</script>
WebAssembly Version (~7.5MB WASM + ~12KB loader)
The WASM version offers better performance for repeated detections:
<script src="node_modules/langdetect-jsoo/langdetect_js_main.bc.wasm.js"></script>
<script>
  document.addEventListener('langdetectReady', () => {
    const lang = langdetect.detect("Bonjour le monde!");
    console.log(lang); // "fr"
  });
</script>
API Reference
langdetect.detect(text)
Detect the most likely language of the input text.
langdetect.detect("The quick brown fox jumps over the lazy dog.")
// Returns: "en"
langdetect.detect("こんにちは世界")
// Returns: "ja"
langdetect.detect("")
// Returns: null (text too short)
Parameters:
text (string): The text to analyze
Returns:
string | null: ISO 639-1-style language code (e.g., "en", "fr", or a region-qualified code such as "zh-cn"), or null if detection fails
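Because detect returns null when it cannot classify the input, callers usually want a fallback. The following is a minimal sketch, assuming the API object is passed in as `detector` (in the browser this would be the global langdetect). The helper name `detectOrFallback` and the default "und" (the ISO 639-2 code for an undetermined language) are illustrative choices, not part of the library.

```javascript
// Hypothetical helper: return a fallback code instead of null.
// `detector` is any object exposing detect(text), e.g. the global langdetect.
function detectOrFallback(detector, text, fallback = "und") {
  const lang = detector.detect(text);
  // detect() is documented to return null when detection fails
  return lang === null ? fallback : lang;
}

// Usage in the browser, once the library has loaded:
//   const lang = detectOrFallback(langdetect, userInput);
```

Passing the detector in as an argument keeps the helper independent of when the global becomes available.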
langdetect.detectWithProb(text)
Detect the language together with a confidence score.
langdetect.detectWithProb("Bonjour le monde!")
// Returns: { lang: "fr", prob: 0.9999 }
langdetect.detectWithProb("a")
// Returns: null (text too short)
Parameters:
text (string): The text to analyze
Returns:
{ lang: string, prob: number } | null: Object with language code and probability (0-1), or null if detection fails
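The probability makes it possible to reject low-confidence guesses. The sketch below assumes the API object is passed in as `detector`. The helper name `detectConfident` and the 0.9 default threshold are illustrative and should be tuned for the texts you expect.

```javascript
// Hypothetical helper: accept a detection only above a confidence threshold.
// `detector` is any object exposing detectWithProb(text), e.g. the global langdetect.
function detectConfident(detector, text, minProb = 0.9) {
  const result = detector.detectWithProb(text);
  // null means detection failed; a low prob means the guess is unreliable
  if (result === null || result.prob < minProb) {
    return null;
  }
  return result.lang;
}

// Usage in the browser, once the library has loaded:
//   const lang = detectConfident(langdetect, userInput, 0.95);
```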
langdetect.detectAll(text)
Get all candidate languages with their probabilities.
langdetect.detectAll("Hello world")
// Returns: [
//   { lang: "en", prob: 0.857 },
//   { lang: "de", prob: 0.095 },
//   { lang: "nl", prob: 0.023 },
//   ...
// ]
Parameters:
text (string): The text to analyze
Returns:
Array<{ lang: string, prob: number }>: Array of language candidates sorted by probability (highest first)
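One use of the full candidate list is to check whether the top guess clearly beats the runner-up. The sketch below works on the array shape shown above and relies on it being sorted highest first. The helper name `isAmbiguous` and the 0.2 margin are illustrative. Treating an empty array as ambiguous is an assumption, since the behavior of detectAll on very short input is not stated here.

```javascript
// Hypothetical helper: is the gap between the two best candidates too small?
// `candidates` is an array of { lang, prob } sorted by prob, highest first.
function isAmbiguous(candidates, margin = 0.2) {
  if (candidates.length === 0) return true;  // nothing detected (assumed case)
  if (candidates.length === 1) return false; // only one candidate, no rival
  return candidates[0].prob - candidates[1].prob < margin;
}

// Usage in the browser, once the library has loaded:
//   if (isAmbiguous(langdetect.detectAll(userInput))) { /* ask the user */ }
```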
langdetect.languages()
Get the list of supported language codes.
langdetect.languages()
// Returns: ["ar", "bg", "bn", "ca", "cs", "da", "de", "el", "en", ...]
Returns:
string[]: Array of ISO 639-1 language codes
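The list can be used to validate a language code before relying on it, for example when a user picks an expected language. This is a small sketch, assuming the API object is passed in as `detector`. The name `makeSupportChecker` is illustrative.

```javascript
// Hypothetical helper: build a fast membership test from languages().
// `detector` is any object exposing languages(), e.g. the global langdetect.
function makeSupportChecker(detector) {
  // Read the list once; Set lookups avoid rescanning the array on every call
  const supported = new Set(detector.languages());
  return (code) => supported.has(code.toLowerCase());
}

// Usage in the browser, once the library has loaded:
//   const isSupported = makeSupportChecker(langdetect);
//   isSupported("fr");
```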
Demo
Open langdetect.html in a browser to try the interactive demo. It supports switching between JavaScript and WebAssembly runtimes.
Events
The library dispatches a langdetectReady event on document when fully loaded:
document.addEventListener('langdetectReady', () => {
  // langdetect API is now available
  console.log('Loaded', langdetect.languages().length, 'languages');
});
Algorithm
This library uses the Cybozu langdetect algorithm, which:
- Extracts n-grams (1-3 characters) from the input text
- Compares against pre-computed frequency profiles for 47 languages
- Uses a probabilistic model with Bayesian inference
- Applies text normalization for consistent detection
The language profiles contain ~172,000 unique n-grams across all supported languages.
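The steps listed above can be illustrated with a toy version. This is a simplified sketch, not the library's implementation: the `profiles` object, its frequencies, the `floor` smoothing value, and the use of lowercasing as a stand-in for normalization are all assumptions made for the example. It extracts 1-3 character n-grams and combines their per-language frequencies with a uniform prior, naive Bayes style.

```javascript
// Extract all character n-grams of length 1..maxN.
function extractNgrams(text, maxN = 3) {
  const chars = Array.from(text); // split by code point, not UTF-16 unit
  const grams = [];
  for (let n = 1; n <= maxN; n++) {
    for (let i = 0; i + n <= chars.length; i++) {
      grams.push(chars.slice(i, i + n).join(""));
    }
  }
  return grams;
}

// Score each language in a toy profile table.
// `profiles` maps a language code to { ngram: relative frequency }.
function scoreLanguages(text, profiles, floor = 1e-6) {
  const grams = extractNgrams(text.toLowerCase());
  const langs = Object.keys(profiles);
  // Sum log-frequencies (uniform prior); unseen n-grams get the floor value
  const logScores = langs.map((lang) =>
    grams.reduce((sum, g) => sum + Math.log(profiles[lang][g] ?? floor), 0)
  );
  // Normalize to probabilities, subtracting the max for numerical stability
  const max = Math.max(...logScores);
  const weights = logScores.map((s) => Math.exp(s - max));
  const total = weights.reduce((a, b) => a + b, 0);
  return langs
    .map((lang, i) => ({ lang, prob: weights[i] / total }))
    .sort((a, b) => b.prob - a.prob);
}

// Made-up frequencies, for illustration only
const toyProfiles = {
  en: { t: 0.1, h: 0.05, e: 0.12, th: 0.03, he: 0.03, the: 0.02 },
  fr: { l: 0.06, e: 0.15, le: 0.03 },
};
console.log(scoreLanguages("the", toyProfiles)[0].lang); // "en"
```

Working in log space avoids numeric underflow when many small frequencies are multiplied together.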
License
MIT
