ml-classify-text-js

v1.0.1, published 16 days ago

Lightweight machine learning text classification for Node.js — Naive Bayes with no dependencies

Vulnerabilities: 0 high, 0 medium, 0 low

Maintainers: artbrnv

Keywords: machine-learning, text-classification, naive-bayes, nlp, natural-language-processing, classifier, sentiment, ml, text

Lightweight machine learning text classification for Node.js.

Implements a Multinomial Naive Bayes classifier with:

Laplace (add-k) smoothing
Stop word removal
N-gram support (bigrams, trigrams, …)
Model serialisation (toJSON / fromJSON)
TypeScript types included
Zero dependencies

Install

npm install ml-classify-text-js

Quick start

import createClassifier from 'ml-classify-text-js';

const clf = createClassifier();

// Train
clf.train('amazing wonderful fantastic', 'positive');
clf.train('terrible horrible awful',     'negative');

// Classify
const result = clf.classify('this film was absolutely wonderful');
console.log(result.label);      // 'positive'
console.log(result.confidence); // e.g. 0.92

API

`createClassifier(options?)` / `new Classifier(options?)`

Both are equivalent. Returns a Classifier instance.

| Option | Type | Default | Description |
|---|---|---|---|
| smoothing | number | 1 | Laplace smoothing factor k. Set to 0 to disable. |
| removeStopWords | boolean | true | Strip common English stop words before training/classifying |
| ngramSize | number | 1 | Generate n-grams up to this size (1 = unigrams only) |
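
To make the removeStopWords option concrete, here is a minimal sketch of what stop-word removal during tokenisation might look like. The helper name `sketchTokenize`, the splitting rule, and the tiny stop-word list are all assumptions made for illustration; the package's own tokenizer and word list may behave differently.

```javascript
// Hypothetical stop-word list; the package's real list is likely much longer.
const STOP_WORDS = new Set(['a', 'an', 'the', 'is', 'was', 'this', 'of', 'and']);

// Hypothetical helper, not part of the package API.
function sketchTokenize(text, { removeStopWords = true } = {}) {
  // Lowercase, then keep runs of letters, digits and apostrophes as tokens.
  const words = text.toLowerCase().match(/[a-z0-9']+/g) ?? [];
  return removeStopWords ? words.filter((w) => !STOP_WORDS.has(w)) : words;
}

console.log(sketchTokenize('This film was absolutely wonderful').join(' '));
// → film absolutely wonderful
console.log(
  sketchTokenize('This film was absolutely wonderful', { removeStopWords: false }).join(' '),
);
// → this film was absolutely wonderful
```

Dropping stop words shrinks the vocabulary, but it can also discard words that carry meaning in context, which is why the option can be switched off.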

`clf.train(text, label)` → `this`

Train with a single sample. Returns this for chaining.

clf.train('goal scored match football', 'sports')
   .train('npm package javascript',     'tech');

`clf.trainAll(samples)` → `this`

Train with an array of { text, label } objects.

clf.trainAll([
  { text: 'wonderful experience', label: 'positive' },
  { text: 'dreadful outcome',     label: 'negative' },
]);

`clf.classify(text)` → `ClassifyResult`

Returns the best label with confidence and full ranked scores.

const { label, confidence, scores } = clf.classify('great film');
// label:      'positive'
// confidence: 0.94   (softmax-normalised, sums to 1 across all labels)
// scores:     [{ label, score, confidence }, ...]  sorted best-first
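
The confidence values are described as softmax-normalised. As a rough sketch of what that means, the snippet below turns per-label log-scores into confidences that sum to 1 and sorts them best-first. The helper name `softmaxConfidences` and the example scores are hypothetical; this is not the package's source.

```javascript
// Hypothetical helper: turn per-label log-scores into confidences that sum to 1.
function softmaxConfidences(logScores) {
  const max = Math.max(...logScores.map((s) => s.score));
  // Subtracting the maximum keeps Math.exp from underflowing to 0 for every label.
  const exps = logScores.map((s) => Math.exp(s.score - max));
  const total = exps.reduce((sum, e) => sum + e, 0);
  return logScores
    .map((s, i) => ({ ...s, confidence: exps[i] / total }))
    .sort((a, b) => b.confidence - a.confidence); // best-first
}

// Made-up log-scores for illustration.
const ranked = softmaxConfidences([
  { label: 'negative', score: -12.4 },
  { label: 'positive', score: -9.7 },
]);
console.log(ranked[0].label); // → positive
```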

`clf.scores(text)` → `ScoreResult[]`

Returns the full ranked list without the shorthand top result.

`clf.getLabels()` → `string[]`

Returns all labels the classifier has been trained on.

`clf.documentCount` → `number`

Total training documents seen.

`clf.vocabularySize` → `number`

Number of unique tokens in the vocabulary.

`clf.topWords(label, n = 10)` → `Array<{ word, count }>`

Most frequent words for a given label — useful for inspecting what the model learned.

clf.topWords('positive', 5);
// [{ word: 'amazing', count: 12 }, { word: 'great', count: 9 }, ...]
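
Conceptually, a ranking like this only needs the per-label word counts. The following sketch shows one way to produce it from a Map of counts; `sketchTopWords` and the sample counts are hypothetical and only illustrate the idea.

```javascript
// Hypothetical helper: rank a Map of word counts and keep the top n entries.
function sketchTopWords(wordCounts, n = 10) {
  return [...wordCounts.entries()]
    .map(([word, count]) => ({ word, count }))
    .sort((a, b) => b.count - a.count) // highest count first
    .slice(0, n);
}

// Made-up counts for a single label.
const positiveCounts = new Map([['great', 9], ['amazing', 12], ['film', 4]]);
console.log(JSON.stringify(sketchTopWords(positiveCounts, 2)));
// → [{"word":"amazing","count":12},{"word":"great","count":9}]
```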

Serialisation

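// Classifier is exported by name (see the CommonJS section)
import { Classifier } from 'ml-classify-text-js';

// Save
const model = JSON.stringify(clf.toJSON());
localStorage.setItem('model', model); // browser only; in Node.js, write the string to disk

// Restore
const clf2 = Classifier.fromJSON(JSON.parse(model));
clf2.classify('hello'); // ready to classify without retraining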
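
For the write-to-disk route in Node.js, a small pair of helpers built on the standard fs module could look like the sketch below. The names `saveModel` and `loadModelData` are hypothetical, and the code assumes only that the classifier exposes toJSON() as shown above.

```javascript
const fs = require('fs');

// Hypothetical helper; `classifier` is any object exposing toJSON().
function saveModel(path, classifier) {
  fs.writeFileSync(path, JSON.stringify(classifier.toJSON()), 'utf8');
}

// Hypothetical helper; returns the parsed plain object,
// ready to be passed to Classifier.fromJSON().
function loadModelData(path) {
  return JSON.parse(fs.readFileSync(path, 'utf8'));
}

// Intended usage, assuming `clf` is a trained classifier:
//   saveModel('model.json', clf);
//   const clf2 = Classifier.fromJSON(loadModelData('model.json'));
```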

N-grams

Enable bigrams to capture phrases like "not good" or "very bad":

const clf = createClassifier({ ngramSize: 2 });
clf.train('not good very bad', 'negative');
clf.train('not bad quite good', 'positive');

// Bigram tokens include: 'not_good', 'very_bad', 'not_bad', 'quite_good'
clf.classify('not good'); // picks up the 'not_good' bigram
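
As an illustration of "n-grams up to this size", the sketch below expands a token list into unigrams plus underscore-joined bigrams. `sketchNgrams` is a hypothetical helper; the package's own generation order and handling of stop words may differ.

```javascript
// Hypothetical helper: emit all n-grams from size 1 up to maxSize, joined with '_'.
function sketchNgrams(tokens, maxSize = 1) {
  const out = [];
  for (let n = 1; n <= maxSize; n++) {
    // Slide a window of length n across the token list.
    for (let i = 0; i + n <= tokens.length; i++) {
      out.push(tokens.slice(i, i + n).join('_'));
    }
  }
  return out;
}

console.log(sketchNgrams(['not', 'good', 'very', 'bad'], 2).join(', '));
// → not, good, very, bad, not_good, good_very, very_bad
```

Each extra n-gram size adds more features, so the vocabulary grows quickly as ngramSize increases.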

CommonJS

const createClassifier = require('ml-classify-text-js');
const { Classifier, tokenize } = require('ml-classify-text-js');

How it works

Multinomial Naive Bayes computes:

P(label | text) ∝ P(label) × ∏ P(word | label)

Prior P(label) — fraction of training documents with that label
Likelihood P(word | label) — smoothed word frequency within the label
Log-probabilities are used to avoid floating-point underflow
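Laplace smoothing, P(word | label) = (count + k) / (total + k × |V|), prevents zero probabilities for unseen words; count is how often the word occurs in the label, total is the number of word occurrences in the label, and |V| is the vocabulary size
Confidence values are the log-scores passed through a softmax, so they sum to 1 across all labels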
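
The steps above can be sketched in a few lines. The code below is a simplified, hypothetical illustration of Multinomial Naive Bayes scoring with Laplace smoothing, not the package's implementation; `sketchLogScores` and its pre-tokenised input format are assumptions.

```javascript
// Hypothetical sketch: log P(label) + sum of log P(word | label) for each label.
function sketchLogScores(samples, tokens, k = 1) {
  const docCounts = new Map();  // label -> number of documents
  const wordCounts = new Map(); // label -> Map(word -> count)
  const vocabulary = new Set();

  for (const { words, label } of samples) {
    docCounts.set(label, (docCounts.get(label) ?? 0) + 1);
    if (!wordCounts.has(label)) wordCounts.set(label, new Map());
    const counts = wordCounts.get(label);
    for (const w of words) {
      counts.set(w, (counts.get(w) ?? 0) + 1);
      vocabulary.add(w);
    }
  }

  const scores = {};
  for (const [label, docs] of docCounts) {
    const counts = wordCounts.get(label);
    let total = 0;
    for (const c of counts.values()) total += c;
    // Log prior: fraction of training documents with this label.
    let score = Math.log(docs / samples.length);
    for (const w of tokens) {
      // Laplace smoothing: (count + k) / (total + k * |V|)
      score += Math.log(((counts.get(w) ?? 0) + k) / (total + k * vocabulary.size));
    }
    scores[label] = score;
  }
  return scores;
}

const scores = sketchLogScores(
  [
    { words: ['amazing', 'wonderful', 'fantastic'], label: 'positive' },
    { words: ['terrible', 'horrible', 'awful'], label: 'negative' },
  ],
  ['wonderful'],
);
console.log(scores.positive > scores.negative); // → true
```

With k = 1, the word "wonderful" gets probability 2/9 under positive and 1/9 under negative, so the positive log-score is higher. With k = 0, a word never seen under a label would produce log(0), which is negative infinity.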

License

MIT
