node-predict v3.0.0
A lightweight, fully offline, trainable native text prediction engine for Node.js. Provides word completions, next-word predictions, and sentence generation using n-gram language models with TF-IDF weighting.
Quick Start
Load a pre-trained model and make predictions:
const { PredictJS } = require('node-predict')
const predictor = new PredictJS()
predictor.loadIndex('./model-index.json')
// 1) Word completions
console.log(predictor.completeWord('app'))
// 2) Next-word predictions
console.log(predictor.nextWord('The best way to'))
// 3) Complete a sentence
console.log(predictor.complete('Learning new skills'))
// 4) Multiple completions
console.log(predictor.completions('In my opinion', { count: 3 }))
// 5) Combined suggestion (partial + next words)
console.log(predictor.suggest('I en'))

Training Your Own Model
To train the model on your custom dataset, edit dataset.txt with your text samples, then run node train.js. The training script reads dataset.txt, builds n-gram models, calculates TF-IDF weights, and saves the trained model to model-index.json.
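At its core, building an n-gram model means counting adjacent word sequences in the training text. The idea can be sketched in a few lines of plain JavaScript (a conceptual illustration of bigram counting, not the library's internal code):

```javascript
// Count bigrams (pairs of adjacent words) in a text sample.
function countBigrams(text) {
  // Lowercase, strip punctuation and numbers, split on whitespace —
  // mirroring caseSensitive: false, keepPunctuation: false, keepNumbers: false.
  const words = text
    .toLowerCase()
    .replace(/[^a-z\s]/g, '')
    .split(/\s+/)
    .filter(Boolean)
  const counts = {}
  for (let i = 0; i < words.length - 1; i++) {
    const key = words[i] + ' ' + words[i + 1]
    counts[key] = (counts[key] || 0) + 1
  }
  return counts
}

const counts = countBigrams('The quick fox and the quick dog.')
console.log(counts['the quick']) // → 2
```

The library does this for every order in the `nMin`–`nMax` range and layers TF-IDF weighting on top, but the counting step itself is this simple.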
Training Code Example
Here's how train.js trains the model:
const { PredictJS } = require('./predictjs')
const DATASET_PATH = './dataset.txt'
const INDEX_PATH = './model-index.json'
const predictor = new PredictJS({
  // N-gram range
  nMin: 2,
  nMax: 4,
  // Smoothing — keep low for large datasets, raise towards 0.3 for small ones
  smoothing: true,
  smoothingAlpha: 0.05,
  // Ensemble weights [bigram, trigram, 4-gram]
  // Higher weight = more influence on predictions
  ensembleWeights: [0.15, 0.35, 0.50],
  // TF-IDF blending
  useTFIDF: true,
  tfidfBlend: 0.5,
  // Word settings
  minWordLength: 2,
  maxSuggestions: 5,
  // Completion settings
  maxCompletionWords: 20,
  completionTemp: 0.6,
  // Loop prevention
  maxRepeatBigram: 2,
  penalizeRepeats: true,
  // Strip numbers, lowercase, no punctuation
  caseSensitive: false,
  keepPunctuation: false,
  keepNumbers: false,
})
console.log('Training...')
try {
  const start = Date.now()
  predictor.trainFile(DATASET_PATH)
  const elapsed = Date.now() - start
  const stats = predictor.getStats()
  console.log('✔ Training complete')
  console.log('Sentences:', stats.sentences)
  console.log('Total tokens:', stats.tokens)
  console.log('Unique words:', stats.uniqueWords)
  console.log('Time taken:', elapsed, 'ms')
  predictor.saveIndex(INDEX_PATH)
  console.log('✔ Model saved to', INDEX_PATH)
} catch (err) {
  console.error('✘ Training failed:', err.message)
  process.exit(1)
}

To train with your own data:
- Edit dataset.txt with your text samples
- Run node train.js
- The trained model is saved to model-index.json
- Use it as shown in the Quick Start section
Configuration Options
Edit train.js to customize training behavior. Common options:
- nMin, nMax — n-gram range (default 2–4). Controls the context window size for predictions.
- smoothingAlpha — smoothing strength for unseen word pairs (default 0.05). Keep it low for large datasets; raise it towards 0.3 for small ones.
- ensembleWeights — weights for [bigram, trigram, 4-gram] predictions (default [0.15, 0.35, 0.50]). A higher weight gives that n-gram order more influence.
- tfidfBlend — blend between raw frequency and TF-IDF scoring (default 0.5). Higher values make results less common-word-heavy.
- maxCompletionWords — maximum words to generate (default 20).
- completionTemp — sampling temperature for completions (default 0.6).
- minWordLength — minimum word length to consider (default 2).
- maxSuggestions — maximum suggestions to return (default 5).
After editing options, re-run node train.js to retrain with new settings.
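The role of ensembleWeights can be made concrete with a small worked example: each n-gram order proposes a probability for a candidate next word, and the final score is their weighted sum. A hand-rolled sketch (the per-order probabilities below are made-up numbers for illustration, not library output):

```javascript
// Blend per-order probabilities for one candidate word using ensemble weights.
// Order of both arrays: [bigram, trigram, 4-gram], matching ensembleWeights.
function ensembleScore(probs, weights) {
  return probs.reduce((sum, p, i) => sum + p * weights[i], 0)
}

const weights = [0.15, 0.35, 0.50]   // default ensembleWeights
const probsForWord = [0.2, 0.4, 0.1] // hypothetical P(word | context) per order
// 0.2*0.15 + 0.4*0.35 + 0.1*0.50 = 0.22
console.log(ensembleScore(probsForWord, weights))
```

With the default weights, the 4-gram model dominates when it has seen the context, which is why shrinking its weight helps on small datasets where long contexts are rarely repeated.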
API Reference
predictor.completeWord(prefix)
Returns word completions for a given prefix.
predictor.nextWord(context)
Predicts the next word given preceding context.
predictor.complete(seed)
Generates a complete sentence starting from a seed phrase.
predictor.completions(context, options)
Returns multiple completion options with customizable count.
predictor.suggest(partial)
Combined suggestion: completes partial word + predicts next words.
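The kind of lookup completeWord performs can be sketched against a plain word-frequency map (a generic illustration of prefix completion, not the library's implementation):

```javascript
// Return vocabulary words starting with `prefix`, most frequent first.
function completeFromVocab(vocab, prefix, limit = 5) {
  return Object.keys(vocab)
    .filter((word) => word.startsWith(prefix))
    .sort((a, b) => vocab[b] - vocab[a])
    .slice(0, limit)
}

// Hypothetical vocabulary with frequency counts.
const vocab = { apple: 10, application: 25, apply: 5, banana: 3 }
console.log(completeFromVocab(vocab, 'app')) // → [ 'application', 'apple', 'apply' ]
```

The real engine additionally folds in TF-IDF weighting (per the tfidfBlend option), so rankings need not match raw frequency order.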
Temperature (Sampling)
The completionTemp option controls creativity of generated suggestions:
- 0.0–0.3 — very predictable, deterministic
- 0.6 — balanced (default), good for most uses
- 0.8–1.0 — more creative and varied, may diverge from dataset style
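Mathematically, temperature sharpens or flattens the candidate distribution before sampling. One common formulation is shown below; this is a generic sketch of the technique, and the library's exact formula may differ:

```javascript
// Rescale a probability distribution by temperature.
// temp < 1 sharpens it (more deterministic); temp > 1 flattens it (more varied).
function applyTemperature(probs, temp) {
  const scaled = probs.map((p) => Math.pow(p, 1 / temp))
  const total = scaled.reduce((a, b) => a + b, 0)
  return scaled.map((p) => p / total) // renormalize to sum to 1
}

const probs = [0.7, 0.2, 0.1]
console.log(applyTemperature(probs, 0.3)) // top candidate dominates even more
console.log(applyTemperature(probs, 1.0)) // distribution unchanged
```

At low temperatures the most likely word wins almost every draw, which is why low completionTemp settings feel deterministic.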
Tips & Troubleshooting
- Poor predictions? Add more in-style text to dataset.txt and retrain.
- Too many common words? Increase dataset variety or raise tfidfBlend towards 1.0.
- No matches found? Ensure your seed text matches your dataset language/style.
- Slow training? Large datasets may take longer; consider using representative samples.
- Memory usage? Larger n-gram ranges and bigger datasets consume more memory.
SHOW SOME LOVE: https://selar.com/showlove/bossprogrammer
Author: Ismail Gidado
License: MIT
