@jrc03c/js-nlp-tools

v0.0.13

Intro

This is a little set of JS natural language processing tools.

Installation

npm install --save @jrc03c/js-nlp-tools

Usage

import { Corpus, Document } from "@jrc03c/js-nlp-tools"
import fs from "node:fs"

const doc1 = new Document({
  name: "Frankenstein",
  raw: fs.readFileSync("path/to/frankenstein.txt", "utf8"),
})

const doc2 = new Document({
  name: "Pride & Prejudice",
  raw: fs.readFileSync("path/to/pride-and-prejudice.txt", "utf8"),
})

const doc3 = new Document({
  name: "Moby Dick",
  raw: fs.readFileSync("path/to/moby-dick.txt", "utf8"),
})

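// Group the documents into a corpus.
const corpus = new Corpus({ docs: [doc1, doc2, doc3] })

// Process (index) every document, then score the word "Frankenstein" in doc1.
corpus.process().then(() => {
  console.log(corpus.computeTFIDFScore("Frankenstein", doc1))
})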

API

Corpus

Methods

Corpus(data) (constructor)

Returns a new Corpus instance. Can optionally take a data argument, which is an object with properties corresponding to Corpus instance properties (e.g., docs).

computeIDFScore(word)

Returns the inverse document frequency score for a given word. Is computed as:

$$\text{IDF} = \log(N / n_t)$$

Where:

  • $N$ = the total number of documents in the corpus
  • $n_t$ = the number of documents in which the word appears
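As a rough illustration of the formula above, here is a minimal standalone sketch. It is not the package's source: the function name `idf` is hypothetical, and it assumes the natural logarithm, since the base of the logarithm is not stated here.

```javascript
// Hypothetical helper, not part of @jrc03c/js-nlp-tools.
// totalDocs corresponds to N and docsWithWord to n_t in the formula above.
function idf(totalDocs, docsWithWord) {
  // Assumption: natural logarithm; the README does not state the base.
  return Math.log(totalDocs / docsWithWord)
}

// A word found in every document scores 0; rarer words score higher.
console.log(idf(3, 3)) // 0
console.log(idf(3, 1)) // Math.log(3), about 1.0986
```

How the package handles a word that appears in no document ($n_t = 0$) is not covered by this sketch.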

computeTFScore(word, doc)

Returns the term frequency score for a given word and document. Is computed as:

$$\text{TF} = 0.5 + 0.5 \cdot \frac{f_{t, d}}{\max_{t' \in d} f_{t',d}}$$

Where:

  • $f_{t, d}$ = the number of times the word appears in the document
  • $\max_{t' \in d} f_{t',d}$ = the number of times the most frequently occurring word appears in the document
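The term frequency formula above can be sketched in a few lines. This is an illustration only, not the package's code; the function name `tf` and its parameters are hypothetical.

```javascript
// Hypothetical helper, not part of @jrc03c/js-nlp-tools.
// count is f_{t,d}; maxCount is the count of the document's most frequent word.
function tf(count, maxCount) {
  return 0.5 + 0.5 * (count / maxCount)
}

console.log(tf(0, 40))  // 0.5
console.log(tf(10, 40)) // 0.625
console.log(tf(40, 40)) // 1
```

Under this formula the score always falls between 0.5 and 1, so even a word that never appears in the document receives 0.5.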

computeTFIDFScore(word, doc)

Returns the TF-IDF score for a given word and document, computed as the term frequency score multiplied by the inverse document frequency score.
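To make the multiplication concrete, here is a small standalone sketch that combines both scores over a toy corpus. None of the names below come from the package; each document is represented only by a made-up word-count dictionary, and the natural logarithm is an assumption.

```javascript
// Hypothetical standalone sketch; not the package's implementation.
const toyDocs = [
  { whale: 4, sea: 2, ship: 2 },
  { sea: 1, ball: 3 },
  { ball: 2, letter: 5 },
]

// Safe own-property check for plain-object dictionaries.
const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key)

function tfidf(word, doc, allDocs) {
  // Term frequency: 0.5 + 0.5 * (count / count of the most frequent word).
  const count = has(doc, word) ? doc[word] : 0
  const maxCount = Math.max(...Object.values(doc))
  const tfScore = 0.5 + 0.5 * (count / maxCount)

  // Inverse document frequency: log(N / n_t). Assumption: natural logarithm.
  const docsWithWord = allDocs.filter(d => has(d, word)).length
  const idfScore = Math.log(allDocs.length / docsWithWord)

  return tfScore * idfScore
}

console.log(tfidf("whale", toyDocs[0], toyDocs)) // 1 * Math.log(3), about 1.0986
console.log(tfidf("sea", toyDocs[0], toyDocs))   // 0.75 * Math.log(1.5), about 0.3041
```

"whale" scores higher than "sea" for the first document because it is both more frequent there and rarer across the corpus.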

process(progress)

Returns a Promise that resolves once all documents in the corpus have been processed. Can optionally take a callback function that is passed the progress through the documents as a value between 0 and 1.
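The progress-callback pattern described above might look roughly like the following. This is a generic sketch, not the package's implementation: `processAll` and `processOne` are hypothetical names, and whether the package reports progress after each document (as here) or on some other schedule is an assumption.

```javascript
// Hypothetical sketch of a Promise-returning loop with a progress callback.
async function processAll(items, processOne, progress) {
  for (let i = 0; i < items.length; i++) {
    await processOne(items[i])
    // Report the fraction of items finished so far, a value between 0 and 1.
    if (progress) progress((i + 1) / items.length)
  }
}

const seen = []
const done = processAll(["a", "b", "c", "d"], async () => {}, p => seen.push(p))
done.then(() => console.log(seen)) // [ 0.25, 0.5, 0.75, 1 ]
```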

Properties

docs

An array of Document instances.

hasBeenProcessed

A boolean indicating whether or not the instance's process method has been invoked (and completed).

Document

Methods

Document(data) (constructor)

Returns a new Document instance. Can optionally take a data object with properties corresponding to Document instance properties (e.g., wordCounts).

getWordCount(word)

Returns the number of times word (a string) appears in the document.

process()

Returns a Promise that resolves once the document has been processed (indexed).

Properties

hasBeenProcessed

A boolean representing whether or not the instance's process method has been invoked (and completed).

isCaseSensitive

A boolean representing whether or not case should matter when indexing words.

mostFrequentWord

A string representing the word that appears most frequently in the document.

name

A string representing the name of the document. If no name is assigned via the data object passed into the constructor, then a random string will be assigned as the document's name.

raw

A string representing the raw text on which the document is based.

totalWordCount

A non-negative integer representing the total number of words in the document.

wordCounts

A dictionary that maps words (as strings) to the numbers of times those words appear in the document (as non-negative integers).
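To show how wordCounts, mostFrequentWord, and totalWordCount relate to one another, here is a minimal standalone sketch. It is not the package's indexing code: the function `indexWords` is hypothetical, and it assumes the text has already been cleaned and that words are separated by whitespace.

```javascript
// Hypothetical sketch, not part of @jrc03c/js-nlp-tools.
function indexWords(text, isCaseSensitive = false) {
  const source = isCaseSensitive ? text : text.toLowerCase()
  const words = source.split(/\s+/).filter(w => w.length > 0)

  // A Map avoids collisions with built-in object keys such as "constructor".
  const counts = new Map()
  let mostFrequentWord = null

  for (const word of words) {
    counts.set(word, (counts.get(word) ?? 0) + 1)
    if (mostFrequentWord === null || counts.get(word) > counts.get(mostFrequentWord)) {
      mostFrequentWord = word
    }
  }

  return {
    wordCounts: Object.fromEntries(counts),
    mostFrequentWord,
    totalWordCount: words.length,
  }
}

const index = indexWords("The whale and the sea and the ship")
console.log(index.wordCounts)       // { the: 3, whale: 1, and: 2, sea: 1, ship: 1 }
console.log(index.mostFrequentWord) // the
console.log(index.totalWordCount)   // 8
```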

Utility functions

clean(raw, shouldPreserveCase)

Given raw (a string) and optionally shouldPreserveCase (a boolean), returns a copy of raw in which all punctuation has been removed and all whitespace characters have been replaced with spaces. By default, shouldPreserveCase is false.
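A cleaning step like the one described could be sketched as follows. This is a hypothetical re-implementation for illustration only: `cleanSketch` is not the package's function, the package may treat edge cases (apostrophes, hyphens, digits) differently, and lowercasing when case is not preserved is an assumption.

```javascript
// Hypothetical sketch, not the package's clean function.
function cleanSketch(raw, shouldPreserveCase = false) {
  const cleaned = raw
    .replace(/[^\p{L}\p{N}\s]/gu, "") // drop anything that is not a letter, digit, or whitespace
    .replace(/\s/g, " ")              // turn tabs, newlines, etc. into plain spaces

  // Assumption: when case is not preserved, the text is lowercased.
  return shouldPreserveCase ? cleaned : cleaned.toLowerCase()
}

console.log(cleanSketch("Call me Ishmael.\nSome years ago..."))
// call me ishmael some years ago
console.log(cleanSketch("Call me Ishmael.", true))
// Call me Ishmael
```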

defineReadOnlyProperty(object, name, value)

Defines a read-only property called name on object, set to value. Returns object.

Note that assigning a new value to a read-only property defined this way fails silently: no error is thrown, and the property keeps its original value.
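One way to get this kind of silent failure is a getter paired with a no-op setter, sketched below. This is an assumption about how such a helper could be written, not the package's actual source, and `defineReadOnlySketch` is a hypothetical name.

```javascript
// Hypothetical sketch; the package may implement this differently.
// A getter with a no-op setter ignores writes without raising an error,
// even in strict mode (a plain `writable: false` property would throw there).
function defineReadOnlySketch(object, name, value) {
  Object.defineProperty(object, name, {
    configurable: false,
    enumerable: true,
    get: () => value,
    set: () => {}, // ignore writes silently
  })
  return object
}

const point = defineReadOnlySketch({}, "x", 5)
point.x = 99         // no error is thrown
console.log(point.x) // 5
```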