
next-token-prediction

v1.0.6

Downloads

459

Next-Token Prediction

Create a language model based on a body of text and get high-quality predictions (next word, next phrase, next pixel, etc.). With enough training data and a good chat interface, this can be used instead of well-known decoder-only models like GPT, Mistral, etc.

Install

npm i next-token-prediction

Usage

Simple (from a built-in data bootstrap)

Put this /training/ directory in the root of your project.

Now you just need to create your app's index.js file and run it. Your model will start training on the .txt files located in /training/documents/. After training is complete it will run these 4 queries:

const { Language: LM } = require('next-token-prediction');

const MyLanguageModel = async () => {
  const agent = await LM({
    bootstrap: true
  });

  // Predict the next word

  agent.getTokenPrediction('what');

  // Predict the next 5 words

  agent.getTokenSequencePrediction('what is', 5);

  // Complete the phrase

  agent.complete('hopefully');

  // Get a top k sample of completion predictions

  agent.getCompletions('The sun');
};

MyLanguageModel();
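To make the idea behind the first two queries more concrete, here is a minimal sketch of next-word prediction from bigram counts. It is only an illustration of the general technique, not how next-token-prediction is implemented; `tokenize`, `trainBigrams`, `predictNext` and `predictSequence` are hypothetical names, and the tiny corpus is made up.

```javascript
// Conceptual sketch only: a bigram frequency model, not the library's internals.
// All function names below are hypothetical.

// Split text into lowercase word tokens.
const tokenize = text => text.toLowerCase().match(/[a-z']+/g) || [];

// Count how often each token follows each other token.
const trainBigrams = text => {
  const counts = new Map();
  const tokens = tokenize(text);

  for (let i = 0; i < tokens.length - 1; i++) {
    const followers = counts.get(tokens[i]) || new Map();

    followers.set(tokens[i + 1], (followers.get(tokens[i + 1]) || 0) + 1);
    counts.set(tokens[i], followers);
  }

  return counts;
};

// Return the most frequent follower of `token`, or null if it was never seen.
// Ties go to the follower that appeared first in the training text.
const predictNext = (counts, token) => {
  const followers = counts.get(token.toLowerCase());

  if (!followers) return null;

  let best = null;
  let bestCount = 0;

  for (const [candidate, count] of followers) {
    if (count > bestCount) {
      best = candidate;
      bestCount = count;
    }
  }

  return best;
};

// Greedily extend the last word of `prompt` by up to `length` tokens.
const predictSequence = (counts, prompt, length) => {
  const promptTokens = tokenize(prompt);
  const result = [];
  let current = promptTokens[promptTokens.length - 1];

  for (let i = 0; i < length && current; i++) {
    current = predictNext(counts, current);
    if (current) result.push(current);
  }

  return result.join(' ');
};

const corpus = 'what is the time. what is the plan. what was that.';
const model = trainBigrams(corpus);

console.log(predictNext(model, 'what')); // is
console.log(predictSequence(model, 'what is', 3)); // the time what
```

A model this small only looks one word back, which is why the sequence drifts quickly. More context and more training text are what make the predictions useful.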

Advanced (provide trainingData or create it from .txt files)

Put this /training/ directory in the root of your project.

Because training data was committed to this repo, you can optionally skip training, and just use the bootstrapped dataset and embeddings, like this:

const { dirname } = require('path');
const __root = dirname(require.main.filename);

const { Language: LM } = require('next-token-prediction');
const OpenSourceBooksDataset = require(`${__root}/training/datasets/OpenSourceBooks`);

const MyLanguageModel = async () => {
  const agent = await LM({
    dataset: OpenSourceBooksDataset
  });

  // Complete the phrase

  agent.complete('hopefully');
};

MyLanguageModel();

Or, train on your own provided text files:

const { dirname } = require('path');
const __root = dirname(require.main.filename);

const { Language: LM } = require('next-token-prediction');

const MyLanguageModel = async () => {
  // The following .txt files should exist in a `/training/documents/`
  // directory in the root of your project

  const agent = await LM({
    files: [
      'marie-antoinette',
      'pride-and-prejudice',
      'to-kill-a-mockingbird',
      'basic-algebra',
      'a-history-of-war',
      'introduction-to-c-programming'
    ]
  });

  // Complete the phrase

  agent.complete('hopefully');
};

MyLanguageModel();
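Before training on your own files, it can help to confirm that they are where the example expects them. The sketch below is a hypothetical helper, not part of next-token-prediction. It assumes each name in `files` corresponds to `<root>/training/documents/<name>.txt`, and it uses `process.cwd()` as the project root, which is an assumption that only holds if you run the script from that directory.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper, not part of next-token-prediction.
// Assumes each name maps to `<root>/training/documents/<name>.txt`.
const findMissingDocuments = (root, files) => {
  const documentsDir = path.join(root, 'training', 'documents');

  return files.filter(
    name => !fs.existsSync(path.join(documentsDir, `${name}.txt`))
  );
};

// Assumes the script is run from the project root.
const missing = findMissingDocuments(process.cwd(), [
  'marie-antoinette',
  'pride-and-prejudice'
]);

if (missing.length > 0) {
  console.warn(`Missing training documents: ${missing.join(', ')}`);
}
```

Running a check like this first gives a readable warning instead of a failure partway through training.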

Run tests

npm test

Examples

Readline Completion

UI Autocomplete

Videos

https://github.com/bennyschmidt/next-token-prediction/assets/45407493/68c070bd-ee03-4b7e-8ba3-3885f77fd9f9

https://github.com/bennyschmidt/next-token-prediction/assets/45407493/cd4a1102-5a82-4a6f-abb8-e96805fa65fd

(The following video is lower quality on GitHub because it is a couple of minutes long. It shows training and booting up the LM from scratch.)

https://github.com/bennyschmidt/next-token-prediction/assets/45407493/033e8260-6a8c-4627-9195-9a6c8bd843bd

Browser example: Fast autocomplete

With more training data you can get more suggestions, eventually hitting a tipping point where it can complete anything.

https://github.com/bennyschmidt/next-token-prediction/assets/45407493/59c2cd8e-3218-447b-aa33-ea91004a9fdd
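As a rough illustration of why more training data yields more suggestions, here is a minimal prefix-autocomplete sketch that ranks known phrases by how often they occur. It is not the library's algorithm; `buildPhraseCounts` and `suggest` are hypothetical names, and the sample phrases are made up.

```javascript
// Conceptual sketch only: rank completions of a typed prefix by frequency.
// `buildPhraseCounts` and `suggest` are hypothetical, not library APIs.

// Count every non-empty phrase (one per array entry), case-insensitively.
const buildPhraseCounts = lines => {
  const counts = new Map();

  for (const line of lines) {
    const phrase = line.trim().toLowerCase();

    if (phrase) counts.set(phrase, (counts.get(phrase) || 0) + 1);
  }

  return counts;
};

// Return up to `k` phrases starting with `prefix`, most frequent first.
// Ties are broken alphabetically so the order is deterministic.
const suggest = (counts, prefix, k = 3) => {
  const needle = prefix.trim().toLowerCase();

  return [...counts.entries()]
    .filter(([phrase]) => phrase.startsWith(needle))
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, k)
    .map(([phrase]) => phrase);
};

const phraseCounts = buildPhraseCounts([
  'the sun is shining',
  'the sun rises',
  'the sun is shining',
  'the moon',
  'The sun sets'
]);

console.log(suggest(phraseCounts, 'The sun', 2));
// [ 'the sun is shining', 'the sun rises' ]
```

Every phrase added to the training input becomes another candidate for `suggest`, so coverage grows with the size of the corpus.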

Inspiration

A 3Blue1Brown video on YouTube.

Goals

  1. Provide a high-quality text prediction library for:
     • autocomplete
     • autocorrect
     • spell checking
     • search/lookup

  2. Create pixel and audio transformers for other prediction formats

  3. Demystify LLMs & simplify methodologies

  4. Make a high-quality, free/open chat-focused LLM in JavaScript, and an equally sophisticated image-focused diffusion model. Working on this here.