ml-classify-text-js
v1.0.1
Lightweight machine learning text classification for Node.js — Naive Bayes with no dependencies
Lightweight machine learning text classification for Node.js.
Implements a Multinomial Naive Bayes classifier with:
- Laplace (add-k) smoothing
- Stop word removal
- N-gram support (bigrams, trigrams, …)
- Model serialisation (toJSON/fromJSON)
- TypeScript types included
- Zero dependencies
Install
npm install ml-classify-text-js
Quick start
import createClassifier from 'ml-classify-text-js';
const clf = createClassifier();
// Train
clf.train('amazing wonderful fantastic', 'positive');
clf.train('terrible horrible awful', 'negative');
// Classify
const result = clf.classify('this film was absolutely wonderful');
console.log(result.label); // 'positive'
console.log(result.confidence); // e.g. 0.92
API
createClassifier(options?) / new Classifier(options?)
Both are equivalent. Returns a Classifier instance.
| Option | Type | Default | Description |
|---|---|---|---|
| smoothing | number | 1 | Laplace smoothing factor k. Set to 0 to disable. |
| removeStopWords | boolean | true | Strip common English stop words before training/classifying |
| ngramSize | number | 1 | Generate n-grams up to this size (1 = unigrams only) |
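To make the removeStopWords and ngramSize options more concrete, here is a minimal sketch of how such a tokenizer could behave. It is not the package's own implementation: the function name `sketchTokenize`, the stop word list, the splitting rule, and the choice to remove stop words before building n-grams are all assumptions made for illustration. Only the underscore-joined token style follows the N-grams section of this README.

```javascript
// Hypothetical stop word list, for illustration only.
// The package's real list is not shown in this README.
const STOP_WORDS = new Set(['the', 'a', 'an', 'is', 'was', 'this', 'and', 'of']);

// Sketch: lowercase, split on non-alphanumerics, optionally drop stop words,
// then emit every n-gram from size 1 up to ngramSize.
function sketchTokenize(text, { removeStopWords = true, ngramSize = 1 } = {}) {
  let words = text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
  if (removeStopWords) {
    words = words.filter((w) => !STOP_WORDS.has(w));
  }
  const tokens = [];
  for (let n = 1; n <= ngramSize; n++) {
    for (let i = 0; i + n <= words.length; i++) {
      // Join with '_' to mirror tokens such as 'not_good'.
      tokens.push(words.slice(i, i + n).join('_'));
    }
  }
  return tokens;
}

console.log(sketchTokenize('This film was absolutely wonderful', { ngramSize: 2 }));
// [ 'film', 'absolutely', 'wonderful', 'film_absolutely', 'absolutely_wonderful' ]
```

With ngramSize set to 2 the unigrams are kept alongside the bigrams, which matches the "up to this size" wording in the table.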
clf.train(text, label) → this
Train with a single sample. Returns this for chaining.
clf.train('goal scored match football', 'sports')
.train('npm package javascript', 'tech');
clf.trainAll(samples) → this
Train with an array of { text, label } objects.
clf.trainAll([
{ text: 'wonderful experience', label: 'positive' },
{ text: 'dreadful outcome', label: 'negative' },
]);
clf.classify(text) → ClassifyResult
Returns the best label with confidence and full ranked scores.
const { label, confidence, scores } = clf.classify('great film');
// label: 'positive'
// confidence: 0.94 (softmax-normalised, sums to 1 across all labels)
// scores: [{ label, score, confidence }, ...] sorted best-first
clf.scores(text) → ScoreResult[]
Returns the full ranked list without the shorthand top result.
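The confidence values in the ranked list are described as softmax-normalised. As a rough sketch of what that means, the hypothetical helper below turns per-label log-scores into confidences that sum to 1 and sorts them best-first. The function name and the input numbers are made up for illustration, and the package may compute this differently.

```javascript
// Softmax over log-scores. The maximum is subtracted first so that
// Math.exp never sees a large-magnitude argument.
function softmaxConfidences(logScores) {
  const max = Math.max(...logScores.map((s) => s.score));
  const exps = logScores.map((s) => Math.exp(s.score - max));
  const total = exps.reduce((a, b) => a + b, 0);
  return logScores
    .map((s, i) => ({ label: s.label, score: s.score, confidence: exps[i] / total }))
    .sort((a, b) => b.confidence - a.confidence);
}

// Made-up log-scores for two labels.
const ranked = softmaxConfidences([
  { label: 'negative', score: -7.2 },
  { label: 'positive', score: -4.5 },
]);
console.log(ranked[0].label); // 'positive'
```

Because only the differences between log-scores matter, a gap of 2.7 as in this example already gives the top label a confidence of roughly 0.94.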
clf.getLabels() → string[]
Returns all labels the classifier has been trained on.
clf.documentCount → number
Total training documents seen.
clf.vocabularySize → number
Number of unique tokens in the vocabulary.
clf.topWords(label, n = 10) → Array<{ word, count }>
Most frequent words for a given label — useful for inspecting what the model learned.
clf.topWords('positive', 5);
// [{ word: 'amazing', count: 12 }, { word: 'great', count: 9 }, ...]
Serialisation
// Save
const model = JSON.stringify(clf.toJSON());
localStorage.setItem('model', model); // or write to disk
// Restore (Classifier is the named export shown in the CommonJS section)
const clf2 = Classifier.fromJSON(JSON.parse(model));
clf2.classify('hello'); // works immediately
N-grams
Enable bigrams to capture phrases like "not good" or "very bad":
const clf = createClassifier({ ngramSize: 2 });
clf.train('not good very bad', 'negative');
clf.train('not bad quite good', 'positive');
// Bigram tokens include: 'not_good', 'very_bad', 'not_bad', 'quite_good'
clf.classify('not good'); // picks up the 'not_good' bigram
CommonJS
const createClassifier = require('ml-classify-text-js');
const { Classifier, tokenize } = require('ml-classify-text-js');
How it works
Multinomial Naive Bayes computes:
P(label | text) ∝ P(label) × ∏ P(word | label)
- Prior P(label): fraction of training documents with that label
- Likelihood P(word | label): smoothed word frequency within the label
- Log-probabilities are used to avoid floating-point underflow
- Laplace smoothing (P(word|label) = (count + k) / (total + k × |V|)) prevents zero probabilities for unseen words
- Confidence values are softmax-normalised log-scores
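The prior, likelihood, log-probability and smoothing steps above can be sketched in a few dozen lines. This is an illustrative re-implementation under stated assumptions, not the package's source: the class name `SketchNaiveBayes`, its fields, and the whitespace-only tokenisation are hypothetical, and it stops at raw log-scores rather than normalised confidences.

```javascript
// Minimal Multinomial Naive Bayes sketch; all names are illustrative.
class SketchNaiveBayes {
  constructor(k = 1) {
    this.k = k;                   // Laplace smoothing factor
    this.docCounts = new Map();   // label -> number of training documents
    this.wordCounts = new Map();  // label -> Map(word -> count)
    this.totalWords = new Map();  // label -> total word count for the label
    this.vocabulary = new Set();  // every distinct word seen in training
    this.totalDocs = 0;
  }

  train(text, label) {
    if (!this.wordCounts.has(label)) {
      this.wordCounts.set(label, new Map());
      this.docCounts.set(label, 0);
      this.totalWords.set(label, 0);
    }
    this.docCounts.set(label, this.docCounts.get(label) + 1);
    this.totalDocs += 1;
    const counts = this.wordCounts.get(label);
    for (const word of text.toLowerCase().split(/\s+/).filter(Boolean)) {
      counts.set(word, (counts.get(word) || 0) + 1);
      this.totalWords.set(label, this.totalWords.get(label) + 1);
      this.vocabulary.add(word);
    }
    return this; // allow chaining
  }

  // log P(label) + sum over words of log P(word | label)
  logScore(text, label) {
    const counts = this.wordCounts.get(label);
    const total = this.totalWords.get(label);
    const vocabSize = this.vocabulary.size;
    let score = Math.log(this.docCounts.get(label) / this.totalDocs);
    for (const word of text.toLowerCase().split(/\s+/).filter(Boolean)) {
      const count = counts.get(word) || 0;
      // (count + k) / (total + k * |V|), summed in log space
      score += Math.log((count + this.k) / (total + this.k * vocabSize));
    }
    return score;
  }

  // Return the label with the highest log-score.
  classify(text) {
    let best = null;
    for (const label of this.docCounts.keys()) {
      const score = this.logScore(text, label);
      if (best === null || score > best.score) best = { label, score };
    }
    return best;
  }
}

const nb = new SketchNaiveBayes(1)
  .train('amazing wonderful fantastic', 'positive')
  .train('terrible horrible awful', 'negative');
console.log(nb.classify('wonderful film').label); // 'positive'
```

In this toy run the unseen word "film" gets the same smoothed probability (1/9) under both labels, so the decision rests entirely on "wonderful", which is twice as likely under "positive" (2/9 versus 1/9).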
License
MIT
