
node-predict (v3.0.0)

A lightweight, fully offline, trainable native text prediction engine for Node.js. Provides word completions, next-word predictions, and sentence generation using n-gram language models with TF-IDF weighting.

Quick Start

Load a pre-trained model and make predictions:

const { PredictJS } = require('node-predict')

const predictor = new PredictJS()
predictor.loadIndex('./model-index.json')

// 1) Word completions
console.log(predictor.completeWord('app'))

// 2) Next-word predictions
console.log(predictor.nextWord('The best way to'))

// 3) Complete a sentence
console.log(predictor.complete('Learning new skills'))

// 4) Multiple completions
console.log(predictor.completions('In my opinion', { count: 3 }))

// 5) Combined suggestion (partial + next words)
console.log(predictor.suggest('I en'))
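Conceptually, calls like nextWord rest on n-gram counts gathered at training time. As a rough, self-contained illustration of that idea (a toy bigram model, not node-predict's actual code):

```javascript
// Toy bigram next-word predictor, illustrating the n-gram idea
// node-predict builds on (a sketch, not the library's implementation)
function trainBigrams(text) {
  const counts = {}
  const words = text.toLowerCase().split(/\s+/).filter(Boolean)
  for (let i = 0; i < words.length - 1; i++) {
    const prev = words[i]
    const next = words[i + 1]
    counts[prev] = counts[prev] || {}
    counts[prev][next] = (counts[prev][next] || 0) + 1
  }
  return counts
}

function toyNextWord(counts, context) {
  const last = context.toLowerCase().trim().split(/\s+/).pop()
  const followers = counts[last]
  if (!followers) return null
  // Return the most frequent word seen after the last context word
  return Object.entries(followers).sort((a, b) => b[1] - a[1])[0][0]
}

const model = trainBigrams('the best way to learn is to learn and to share')
console.log(toyNextWord(model, 'the best way to')) // 'learn'
```

node-predict layers trigrams, 4-grams, TF-IDF weighting, and smoothing on top of this basic counting scheme.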

Training Your Own Model

To train the model on your custom dataset, edit dataset.txt with your text samples, then run node train.js. The training script reads dataset.txt, builds n-gram models, calculates TF-IDF weights, and saves the trained model to model-index.json.

Training Code Example

Here's how train.js trains the model:

const { PredictJS } = require('./predictjs')

const DATASET_PATH = './dataset.txt'
const INDEX_PATH = './model-index.json'

const predictor = new PredictJS({
    // N-gram range
    nMin: 2,
    nMax: 4,

    // Smoothing — keep low for large datasets, raise towards 0.3 for small ones
    smoothing: true,
    smoothingAlpha: 0.05,

    // Ensemble weights [bigram, trigram, 4-gram]
    // Higher weight = more influence on predictions
    ensembleWeights: [0.15, 0.35, 0.50],

    // TF-IDF blending
    useTFIDF: true,
    tfidfBlend: 0.5,

    // Word settings
    minWordLength: 2,
    maxSuggestions: 5,

    // Completion settings
    maxCompletionWords: 20,
    completionTemp: 0.6,

    // Loop prevention
    maxRepeatBigram: 2,
    penalizeRepeats: true,

    // Strip numbers, lowercase, no punctuation
    caseSensitive: false,
    keepPunctuation: false,
    keepNumbers: false,
})

console.log('Training...')

try {
    const start = Date.now()
    predictor.trainFile(DATASET_PATH)
    const elapsed = Date.now() - start
    const stats = predictor.getStats()

    console.log('✔ Training complete')
    console.log('Sentences:', stats.sentences)
    console.log('Total tokens:', stats.tokens)
    console.log('Unique words:', stats.uniqueWords)
    console.log('Time taken:', elapsed, 'ms')

    predictor.saveIndex(INDEX_PATH)
    console.log('✔ Model saved to', INDEX_PATH)

} catch (err) {
    console.error('✘ Training failed:', err.message)
    process.exit(1)
}

To train with your own data:

  1. Edit dataset.txt with your text samples
  2. Run node train.js
  3. The trained model is saved to model-index.json
  4. Use it as shown in the Quick Start section

Configuration Options

Edit train.js to customize training behavior. Common options:

  • nMin, nMax — n-gram range (default 2–4). Controls context window size for predictions.
  • smoothingAlpha — smoothing strength for unseen word pairs (default 0.05). Keep it low for large datasets; raise it towards 0.3 for small ones.
  • ensembleWeights — weights for [bigram, trigram, 4-gram] predictions (default [0.15, 0.35, 0.50]). Higher weights give more influence.
  • tfidfBlend — blend between raw frequency and TF-IDF scoring (default 0.5). Higher values make results less common-word-heavy.
  • maxCompletionWords — maximum words to generate (default 20).
  • completionTemp — sampling temperature for completions (default 0.6).
  • minWordLength — minimum word length to consider (default 2).
  • maxSuggestions — maximum suggestions to return (default 5).
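To make these knobs concrete, here is a hedged sketch of the standard scoring recipe they describe (hypothetical helper functions, not node-predict's internals): add-alpha smoothing for unseen pairs, a weighted ensemble across n-gram orders, and a frequency/TF-IDF blend.

```javascript
// Sketch of how the options above typically combine in scoring
// (hypothetical helpers, not node-predict's actual internals)

// smoothingAlpha: add-alpha (Lidstone) smoothing, so unseen
// word pairs still receive a little probability mass
function smoothedProb(count, contextTotal, vocabSize, alpha) {
  return (count + alpha) / (contextTotal + alpha * vocabSize)
}

// ensembleWeights: weighted mix of [bigram, trigram, 4-gram] probabilities
function ensembleScore(probs, weights) {
  return probs.reduce((sum, p, i) => sum + weights[i] * p, 0)
}

// tfidfBlend: interpolate a raw-frequency score with a TF-IDF score
function blendScores(freqScore, tfidfScore, tfidfBlend) {
  return (1 - tfidfBlend) * freqScore + tfidfBlend * tfidfScore
}

// Example using the defaults listed above
const probs = [
  smoothedProb(3, 10, 50, 0.05), // bigram estimate
  smoothedProb(2, 6, 50, 0.05),  // trigram estimate
  smoothedProb(1, 2, 50, 0.05),  // 4-gram estimate
]
const freqScore = ensembleScore(probs, [0.15, 0.35, 0.50])
console.log(blendScores(freqScore, 0.8, 0.5))
```

Raising alpha flattens estimates toward uniform, which is why small datasets benefit from a higher smoothingAlpha.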

After editing options, re-run node train.js to retrain with new settings.


API Reference

predictor.completeWord(prefix)

Returns word completions for a given prefix.

predictor.nextWord(context)

Predicts the next word given preceding context.

predictor.complete(seed)

Generates a complete sentence starting from a seed phrase.

predictor.completions(context, options)

Returns multiple completion options with customizable count.

predictor.suggest(partial)

Combined suggestion: completes partial word + predicts next words.
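The README doesn't show these methods' internals, but as an illustration only, completeWord-style behavior can be sketched as prefix filtering over a frequency-ranked vocabulary:

```javascript
// Sketch of completeWord-style prefix completion over a word-frequency map
// (illustrative only, not node-predict's actual implementation)
function toyCompleteWord(vocab, prefix, maxSuggestions = 5) {
  return Object.entries(vocab)
    .filter(([word]) => word.startsWith(prefix))
    .sort((a, b) => b[1] - a[1]) // most frequent first
    .slice(0, maxSuggestions)
    .map(([word]) => word)
}

const vocab = { apple: 12, application: 7, apply: 3, banana: 9 }
console.log(toyCompleteWord(vocab, 'app')) // ['apple', 'application', 'apply']
```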


Temperature (Sampling)

The completionTemp option controls creativity of generated suggestions:

  • 0.0–0.3 — very predictable, deterministic
  • 0.6 — balanced (default), good for most uses
  • 0.8–1.0 — more creative and varied, may diverge from dataset style
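Mechanically, temperature usually rescales candidate scores before sampling: low values sharpen the distribution toward the top candidate, high values flatten it. A common formulation (node-predict's exact math may differ):

```javascript
// Temperature-scaled softmax over candidate scores
// (a common formulation; node-predict's exact math may differ)
function withTemperature(scores, temp) {
  const scaled = scores.map((s) => s / temp)
  const max = Math.max(...scaled) // subtract max for numerical stability
  const exps = scaled.map((s) => Math.exp(s - max))
  const sum = exps.reduce((a, b) => a + b, 0)
  return exps.map((e) => e / sum)
}

const scores = [2.0, 1.0, 0.5]
console.log(withTemperature(scores, 0.3)) // sharply peaked on the top score
console.log(withTemperature(scores, 1.0)) // flatter, more varied sampling
```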

Tips & Troubleshooting

  • Poor predictions? Add more in-style text to dataset.txt and retrain.
  • Too many common words? Increase dataset variety or raise tfidfBlend towards 1.0.
  • No matches found? Ensure your seed text matches your dataset language/style.
  • Slow training? Large datasets may take longer; consider using representative samples.
  • High memory usage? Larger n-gram ranges and bigger datasets consume more memory.

SHOW SOME LOVE: https://selar.com/showlove/bossprogrammer

Author: Ismail Gidado
License: MIT