node-predict v3.0.0
A lightweight, fully offline, trainable native text prediction engine for Node.js. Provides word completions, next-word predictions, and sentence generation using n-gram language models with TF-IDF weighting.
Quick Start
Load a pre-trained model and make predictions:
const { PredictJS } = require('node-predict')
const predictor = new PredictJS()
predictor.loadIndex('./model-index.json')
// 1) Word completions
console.log(predictor.completeWord('app'))
// 2) Next-word predictions
console.log(predictor.nextWord('The best way to'))
// 3) Complete a sentence
console.log(predictor.complete('Learning new skills'))
// 4) Multiple completions
console.log(predictor.completions('In my opinion', { count: 3 }))
// 5) Combined suggestion (partial + next words)
console.log(predictor.suggest('I en'))

Training Your Own Model
To train the model on your custom dataset, edit dataset.txt with your text samples, then run node train.js. The training script reads dataset.txt, builds n-gram models, calculates TF-IDF weights, and saves the trained model to model-index.json.
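At its core, building an n-gram model means counting adjacent word sequences in the training text. The idea can be sketched in a few lines of plain JavaScript (a conceptual illustration of bigram counting, not the library's internal code):

```javascript
// Count bigrams (pairs of adjacent words) in a text sample.
function countBigrams(text) {
  // Lowercase, strip punctuation and numbers, split on whitespace —
  // mirroring caseSensitive: false, keepPunctuation: false, keepNumbers: false.
  const words = text
    .toLowerCase()
    .replace(/[^a-z\s]/g, '')
    .split(/\s+/)
    .filter(Boolean)
  const counts = {}
  for (let i = 0; i < words.length - 1; i++) {
    const key = words[i] + ' ' + words[i + 1]
    counts[key] = (counts[key] || 0) + 1
  }
  return counts
}

const counts = countBigrams('The quick fox and the quick dog.')
console.log(counts['the quick']) // → 2
```

The library does this for every order in the `nMin`–`nMax` range and layers TF-IDF weighting on top, but the counting step itself is this simple.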
Training Code Example
Here's how train.js trains the model:
const { PredictJS } = require('./predictjs')
const DATASET_PATH = './dataset.txt'
const INDEX_PATH = './model-index.json'
const predictor = new PredictJS({
  // N-gram range
  nMin: 2,
  nMax: 4,
  // Smoothing — keep low for large datasets, raise towards 0.3 for small ones
  smoothing: true,
  smoothingAlpha: 0.05,
  // Ensemble weights [bigram, trigram, 4-gram]
  // Higher weight = more influence on predictions
  ensembleWeights: [0.15, 0.35, 0.50],
  // TF-IDF blending
  useTFIDF: true,
  tfidfBlend: 0.5,
  // Word settings
  minWordLength: 2,
  maxSuggestions: 5,
  // Completion settings
  maxCompletionWords: 20,
  completionTemp: 0.6,
  // Loop prevention
  maxRepeatBigram: 2,
  penalizeRepeats: true,
  // Strip numbers, lowercase, no punctuation
  caseSensitive: false,
  keepPunctuation: false,
  keepNumbers: false,
})
console.log('Training...')
try {
  const start = Date.now()
  predictor.trainFile(DATASET_PATH)
  const elapsed = Date.now() - start
  const stats = predictor.getStats()
  console.log('✔ Training complete')
  console.log('Sentences:', stats.sentences)
  console.log('Total tokens:', stats.tokens)
  console.log('Unique words:', stats.uniqueWords)
  console.log('Time taken:', elapsed, 'ms')
  predictor.saveIndex(INDEX_PATH)
  console.log('✔ Model saved to', INDEX_PATH)
} catch (err) {
  console.error('✘ Training failed:', err.message)
  process.exit(1)
}

To train with your own data:
- Edit dataset.txt with your text samples
- Run node train.js
- The trained model is saved to model-index.json
- Use it as shown in the Quick Start section
Configuration Options
Edit train.js to customize training behavior. Common options:
- nMin, nMax — n-gram range (default 2–4). Controls the context window size for predictions.
- smoothingAlpha — smoothing strength for unseen word pairs (default 0.05). Keep it low for large datasets; raise it towards 0.3 for small ones.
- ensembleWeights — weights for [bigram, trigram, 4-gram] predictions (default [0.15, 0.35, 0.50]). A higher weight gives that n-gram order more influence.
- tfidfBlend — blend between raw frequency and TF-IDF scoring (default 0.5). Higher values make results less common-word-heavy.
- maxCompletionWords — maximum words to generate (default 20).
- completionTemp — sampling temperature for completions (default 0.6).
- minWordLength — minimum word length to consider (default 2).
- maxSuggestions — maximum suggestions to return (default 5).
After editing options, re-run node train.js to retrain with new settings.
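The role of ensembleWeights can be made concrete with a small worked example: each n-gram order proposes a probability for a candidate next word, and the final score is their weighted sum. A hand-rolled sketch (the per-order probabilities below are made-up numbers for illustration, not library output):

```javascript
// Blend per-order probabilities for one candidate word using ensemble weights.
// Order of both arrays: [bigram, trigram, 4-gram], matching ensembleWeights.
function ensembleScore(probs, weights) {
  return probs.reduce((sum, p, i) => sum + p * weights[i], 0)
}

const weights = [0.15, 0.35, 0.50]   // default ensembleWeights
const probsForWord = [0.2, 0.4, 0.1] // hypothetical P(word | context) per order
// 0.2*0.15 + 0.4*0.35 + 0.1*0.50 = 0.22
console.log(ensembleScore(probsForWord, weights))
```

With the default weights, the 4-gram model dominates when it has seen the context, which is why shrinking its weight helps on small datasets where long contexts are rarely repeated.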
API Reference
predictor.completeWord(prefix)
Returns word completions for a given prefix.
predictor.nextWord(context)
Predicts the next word given preceding context.
predictor.complete(seed)
Generates a complete sentence starting from a seed phrase.
predictor.completions(context, options)
Returns multiple completion options with customizable count.
predictor.suggest(partial)
Combined suggestion: completes partial word + predicts next words.
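The kind of lookup completeWord performs can be sketched against a plain word-frequency map (a generic illustration of prefix completion, not the library's implementation):

```javascript
// Return vocabulary words starting with `prefix`, most frequent first.
function completeFromVocab(vocab, prefix, limit = 5) {
  return Object.keys(vocab)
    .filter((word) => word.startsWith(prefix))
    .sort((a, b) => vocab[b] - vocab[a])
    .slice(0, limit)
}

// Hypothetical vocabulary with frequency counts.
const vocab = { apple: 10, application: 25, apply: 5, banana: 3 }
console.log(completeFromVocab(vocab, 'app')) // → [ 'application', 'apple', 'apply' ]
```

The real engine additionally folds in TF-IDF weighting (per the tfidfBlend option), so rankings need not match raw frequency order.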
Temperature (Sampling)
The completionTemp option controls creativity of generated suggestions:
- 0.0–0.3 — very predictable, deterministic
- 0.6 — balanced (default), good for most uses
- 0.8–1.0 — more creative and varied, may diverge from dataset style
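Mathematically, temperature sharpens or flattens the candidate distribution before sampling. One common formulation is shown below; this is a generic sketch of the technique, and the library's exact formula may differ:

```javascript
// Rescale a probability distribution by temperature.
// temp < 1 sharpens it (more deterministic); temp > 1 flattens it (more varied).
function applyTemperature(probs, temp) {
  const scaled = probs.map((p) => Math.pow(p, 1 / temp))
  const total = scaled.reduce((a, b) => a + b, 0)
  return scaled.map((p) => p / total) // renormalize to sum to 1
}

const probs = [0.7, 0.2, 0.1]
console.log(applyTemperature(probs, 0.3)) // top candidate dominates even more
console.log(applyTemperature(probs, 1.0)) // distribution unchanged
```

At low temperatures the most likely word wins almost every draw, which is why low completionTemp settings feel deterministic.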
Tips & Troubleshooting
- Poor predictions? Add more in-style text to dataset.txt and retrain.
- Too many common words? Increase dataset variety or raise tfidfBlend towards 1.0.
- No matches found? Ensure your seed text matches your dataset language/style.
- Slow training? Large datasets may take longer; consider using representative samples.
- Memory usage? Larger n-gram ranges and bigger datasets consume more memory.
SHOW SOME LOVE: https://selar.com/showlove/bossprogrammer
Author: Ismail Gidado
License: MIT
