
synonym-optimizer v5.3.0

Finds the text which has the least number of repetitions

Downloads: 648

Readme

synonym-optimizer

Gives a score to a string depending on the variety of the synonyms used.

For instance, let's compare "The coffee is good. I love that coffee." with "The coffee is good. I love that beverage." The second alternative is better because a synonym is used for coffee, so this module will give it a better score.

The lower the score, the better.

Fully supported languages are French, German, English, Italian and Spanish.

What it does / How it works:

  • single words are extracted with a tokenizer, wink-tokenizer
  • words are lowercased
  • stopwords are removed
    • for fully supported languages, a default stopword list is included, which you can customize
    • for all other languages, no default list is included, but you can provide a custom stopword list
  • for fully supported languages, words are stemmed using snowball-stemmer (for all other languages, no stemming is done)
  • when the same word appears multiple times, it raises the score depending on the distance between the two occurrences (if the occurrences are close together, the score is raised a lot)
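The pipeline above can be sketched roughly as follows. This is a toy illustration, not the module's actual implementation: the stopword set, regex tokenizer and proximity weighting are simplified stand-ins, and stemming is omitted.

```javascript
// Toy stopword list (the real module ships per-language lists).
const STOPWORDS = new Set(['the', 'is', 'that', 'i']);

function score(text) {
  // 1. tokenize (crude regex split; the real module uses wink-tokenizer)
  // 2. lowercase
  const tokens = text.toLowerCase().match(/[a-z]+/g) || [];
  // 3. remove stopwords (stemming omitted in this sketch)
  const words = tokens.filter((w) => !STOPWORDS.has(w));
  // 4. each repetition raises the score; closer occurrences weigh more
  let total = 0;
  const lastSeen = new Map();
  words.forEach((word, i) => {
    if (lastSeen.has(word)) {
      total += 1 / (i - lastSeen.get(word)); // closer => larger penalty
    }
    lastSeen.set(word, i);
  });
  return total;
}

console.log(score('The coffee is good. I love that coffee.'));   // 0.333…
console.log(score('The coffee is good. I love that beverage.')); // 0
```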

Designed primarily to test the output of an NLG (Natural Language Generation) system.

The stemmer is not perfect. For instance, in Italian, cameriere and cameriera have the same stem (camerier), while camerieri and cameriera have different ones (camer and camerier).

Installation

npm install synonym-optimizer

Usage

const synOptimizer = require('synonym-optimizer');

const alts = [
  'The coffee is good. I love that coffee.',
  'The coffee is good. I love that beverage.'
];

/*
The coffee is good. I love that coffee.: 0.5
The coffee is good. I love that beverage.: 0
*/
alts.forEach((alt) => {
  const score = synOptimizer.scoreAlternative('en_US', alt, null, null, null, null);
  console.log(`${alt}: ${score}`);
});

The main function is scoreAlternative. It takes a string and returns its score. Arguments are:

  • lang (string, mandatory): the language.
    • fully supported languages are fr_FR, en_US, de_DE, it_IT and es_ES
    • with any other language (for instance Dutch, nl_NL), stemming is disabled and stopwords are not removed
  • alternative (string, mandatory): the string to score
  • stopWordsToAdd (string[], optional): list of stopwords to add to the standard stopword list
  • stopWordsToRemove (string[], optional): list of stopwords to remove from the standard stopword list
  • stopWordsOverride (string[], optional): replaces the standard stopword list
  • identicals (string[][], optional): list of words that should be considered identical, for instance [['phone', 'cellphone', 'smartphone']]

You can also use the getBest function. Most arguments are exactly the same, but instead of alternative it takes alternatives (string[]). The output number is not a score but the index of the best alternative.

The tokenizer is wink-tokenizer. It works with many languages (English, French, German, Hindi, Sanskrit, Marathi, etc.) but not with Asian languages; therefore the module will not work properly with Japanese, Chinese, etc.

Adding new languages (for developers / maintainers)

  • check for the existence of a stopwords module: stopwords-*
  • check for a stemmer in the snowball-stemmer collection (or plug in another stemmer)
  • plug everything in and add tests
  • find a proper tokenizer if wink-tokenizer does not work

Misc

The build writes the stopwords as an AsciiDoc file in the rosaenlg-doc module.

Dependencies and licences

  • wink-tokenizer to tokenize sentences in multiple languages (MIT).
  • stopwords-en/de/fr/it/es for the standard stopword lists per language (MIT).
  • snowball-stemmer to stem words per language (MIT).