word-ngrams

v0.2.0

Published

4 years ago

A package for building and analyzing word nGrams

Vulnerabilities: 0 High, 0 Medium, 0 Low

syeoryn

nGrams ngrams n grams text analysis

#### Getting Started

Install the package with:

  npm install word-ngrams

#### Features:

- buildNGrams
- listAllNGrams
- getNGramsByFrequency
- getMostCommonNGrams
- listNGramsByCount

Documentation

buildNGrams: function(text, unit [, options])
- Maps every nGram of the given unit length in the input text (1 = unigram, 2 = bigram, 3 = trigram, ...).
- When constructing nGrams, terminal sentence punctuation (periods, question marks, and exclamation marks) and semicolons are treated as words, since they also carry meaning. Apostrophes and compound-word hyphens are not treated as separate words; the contraction in the example below stays a single word. The end of a paragraph or body of text is represented by null.
- Options include caseSensitive and includePunctuation.
  - If includePunctuation is set to false, then terminal sentence punctuation and the end of the body of text are not included in the nGram.
  - caseSensitive and includePunctuation both default to false.
- Example:
```
  buildNGrams("Hello, World!  How's the world weather today? Hello, World!", 2, {caseSensitive: true, includePunctuation: true})
  // returns { "Hello": { ",": 2 },
  //           ",": { "World": 2 },
  //           "World": { "!": 2 },
  //           "!": { "How's": 1, "null": 1 },
  //           "How's": { "the": 1 },
  //           "the": { "world": 1 },
  //           "world": { "weather": 1 },
  //           "weather": { "today": 1 },
  //           "today": { "?": 1 },
  //           "?": { "Hello": 1 }
  //         }
```
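
The mapping described above can be sketched in plain JavaScript for the bigram case. This is not the package's source: `sketchBuildBigrams`, its tokenizer regex, and the rule that terminal punctuation splits the text when `includePunctuation` is false are all assumptions, chosen so the output lines up with the examples in this README.

```javascript
// Hypothetical re-implementation for illustration only (bigrams only).
const own = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

function sketchBuildBigrams(text, options = {}) {
  const { caseSensitive = false, includePunctuation = false } = options;

  // Assumed tokenizer: a word (apostrophes and hyphens stay inside the word)
  // or a single punctuation mark.
  let tokens = text.match(/[A-Za-z0-9'’-]+|[.!?;,]/g) || [];
  if (!caseSensitive) tokens = tokens.map((t) => t.toLowerCase());

  const grams = {};
  const count = (first, second) => {
    if (!own(grams, first)) grams[first] = {};
    const seen = own(grams[first], second) ? grams[first][second] : 0;
    grams[first][second] = seen + 1;
  };

  if (includePunctuation) {
    tokens.push(null); // null marks the end of the body of text
    for (let i = 0; i + 1 < tokens.length; i++) {
      count(tokens[i], String(tokens[i + 1])); // null is stored under "null"
    }
  } else {
    let previous = null;
    for (const token of tokens) {
      if (/^[.!?;]$/.test(token)) {
        previous = null; // assumption: no bigram crosses a sentence boundary
        continue;
      }
      if (token === ",") continue; // assumption: commas are simply dropped
      if (previous !== null) count(previous, token);
      previous = token;
    }
  }
  return grams;
}
```

Calling `sketchBuildBigrams("Hello World.  Goodbye World!")` with the default options yields `{ hello: { world: 1 }, goodbye: { world: 1 } }`.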

listAllNGrams: function(nGrams)

Given an input set of nGrams (of the same format as the buildNGrams output), listAllNGrams will return a list of unique nGrams found in the text.
Example:

  // Example input nGram for "Hello World.  Goodbye World!", without punctuation
  listAllNGrams({ hello: { world: 1 }, goodbye: { world: 1 }})
  // returns ["hello world", "goodbye world"]
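
For readers curious how such a list can be derived from the nested map, here is a minimal sketch for the bigram case. `sketchListAllNGrams` is a hypothetical name, and joining the two words with a single space is an assumption based on the example above; the package's own implementation may differ.

```javascript
// Hypothetical helper for illustration; handles the two-level (bigram) map only.
function sketchListAllNGrams(nGrams) {
  const result = [];
  for (const [first, followers] of Object.entries(nGrams)) {
    for (const second of Object.keys(followers)) {
      result.push(`${first} ${second}`); // each (first, second) pair appears once
    }
  }
  return result;
}

console.log(sketchListAllNGrams({ hello: { world: 1 }, goodbye: { world: 1 } }));
// prints [ 'hello world', 'goodbye world' ]
```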

getNGramsByFrequency: function(nGrams, frequency)
- Given an input set of nGrams (of the same format as the buildNGrams output) and a frequency, getNGramsByFrequency will return a list of all nGrams that occur exactly that many times.
- Example:
```
  // Example input nGram for "Hello World", without punctuation
  getNGramsByFrequency({ hello: { world: 1 } }, 1)
  // returns ["hello world"]
```
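
The filtering step can be sketched as a walk over the nested map that keeps only pairs with a matching count. `sketchGetNGramsByFrequency` is a hypothetical stand-in written for the bigram case, not the package's code, and it assumes an exact match on the count.

```javascript
// Hypothetical helper for illustration; handles the two-level (bigram) map only.
function sketchGetNGramsByFrequency(nGrams, frequency) {
  const matches = [];
  for (const [first, followers] of Object.entries(nGrams)) {
    for (const [second, count] of Object.entries(followers)) {
      if (count === frequency) matches.push(`${first} ${second}`);
    }
  }
  return matches;
}

console.log(sketchGetNGramsByFrequency({ hello: { world: 1 } }, 1));
// prints [ 'hello world' ]
```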

getMostCommonNGrams: function(nGrams)

Given an input set of nGrams (of the same format as the buildNGrams output), getMostCommonNGrams will return a list of the most common nGrams.
Example:

  // Example input nGram for "Hello World!  Goodbye World!", with punctuation
  getMostCommonNGrams({ Hello: { World: 1 }, World: { "!": 2 }, "!": { Goodbye: 1, null: 1 }, Goodbye: { World: 1 }})
  // returns ["World!"]
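
One way to picture the selection is a single pass that tracks the highest count seen so far and collects every nGram that ties it. `sketchGetMostCommonNGrams` is a hypothetical name and covers bigrams only. It joins the two words with a space, following the listAllNGrams example, so its output formatting is an assumption and may not match the package's exactly.

```javascript
// Hypothetical helper for illustration; handles the two-level (bigram) map only.
function sketchGetMostCommonNGrams(nGrams) {
  let best = 0;
  let winners = [];
  for (const [first, followers] of Object.entries(nGrams)) {
    for (const [second, count] of Object.entries(followers)) {
      if (count > best) {
        best = count; // new maximum: start the list over
        winners = [`${first} ${second}`];
      } else if (count === best) {
        winners.push(`${first} ${second}`); // tie with the current maximum
      }
    }
  }
  return winners;
}
```

If several nGrams share the highest count, this sketch returns all of them in the order they are encountered.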

listNGramsByCount: function(nGrams)

Given an input set of nGrams (of the same format as the buildNGrams output), listNGramsByCount will return all nGrams sorted into buckets by count.
Example:

  // Example input for "Hello, World!  How's the weather?  Goodbye, World!"
  listNGramsByCount({ hello: 1, world: 2, "how's": 1, the: 1, weather: 1, goodbye: 1})
  // returns { 1: ["hello", "how's", "the", "weather", "goodbye"], 2: ["world"]}
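
The bucketing shown in this example can be sketched for a flat map of counts like the one above. `sketchListNGramsByCount` is a hypothetical helper, not the package's implementation, and it assumes the input is a single-level object whose values are counts.

```javascript
// Hypothetical helper for illustration; expects a flat { nGram: count } object.
function sketchListNGramsByCount(counts) {
  const buckets = {};
  for (const [gram, count] of Object.entries(counts)) {
    if (!buckets[count]) buckets[count] = []; // first nGram seen with this count
    buckets[count].push(gram);
  }
  return buckets;
}

console.log(sketchListNGramsByCount({ hello: 1, world: 2, "how's": 1, the: 1, weather: 1, goodbye: 1 }));
// prints { '1': [ 'hello', "how's", 'the', 'weather', 'goodbye' ], '2': [ 'world' ] }
```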

View the full specs and check out more text analysis in my Text Analysis Suite.
