wink-embeddings-small-en-50d

v0.0.3

Published

9 months ago

Small English 50-dimensional word-embedding dataset compatible with wink-nlp.


cavani21

wink-nlp embeddings glove nlp word-embeddings wink-embeddings semantic vector natural-language-processing text-analysis


Package size: ≤ 10 MB
Vocabulary: ≈ 5,000–10,000 of the most common English words (you can regenerate the dataset with any vocabulary size you like).

Installation

npm install wink-embeddings-small-en-50d

Usage

import winkNLP from 'wink-nlp';
import model from 'wink-eng-lite-web-model';
import embeddings from 'wink-embeddings-small-en-50d';

const nlp = winkNLP(model);

// Tokenize the text and look up the vector for each token.
nlp.readDoc('hello world').tokens().each((t) => {
  const word = t.out();
  // The lookup yields undefined for words that are not in the vocabulary.
  const vector = embeddings[word];
  console.log(word, vector);
});

Each vector is an array of 50 floats and can be used with cosine similarity, etc.
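
As an illustration of the cosine-similarity use mentioned above, here is a minimal sketch. The `cosineSimilarity` helper is not part of this package, and the `sample` object holds made-up 3-dimensional stand-in vectors so the snippet runs on its own; in real use you would pass two 50-element vectors taken from `embeddings`.

```javascript
// Cosine similarity between two equal-length numeric vectors.
// Returns a value in [-1, 1], or 0 if either vector has zero magnitude.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i += 1) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Made-up stand-in vectors for illustration only (assumption: not real data).
const sample = {
  king: [0.5, 0.7, 0.1],
  queen: [0.45, 0.75, 0.15],
  banana: [-0.6, 0.1, 0.9],
};

console.log(cosineSimilarity(sample.king, sample.queen));
console.log(cosineSimilarity(sample.king, sample.banana));
```

Values close to 1 indicate vectors pointing in a similar direction; values near 0 or below indicate little or no similarity.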

API

`import embeddings from 'wink-embeddings-small-en-50d'`

The default export is a plain object that maps each word (a string) to its vector, an array of 50 numbers.

interface Vector extends ReadonlyArray<number> { length: 50; }
interface Embeddings { [word: string]: Vector }
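
Because the export is a plain object, a bare `embeddings[word]` lookup can also hit inherited properties such as `constructor`. The sketch below shows one way to guard against that and to average the vectors of several words. `getVector` and `averageVector` are hypothetical helpers, not part of this package, and the lowercasing step assumes the keys are lowercase (as in the uncased GloVe 6B files). A tiny 2-dimensional stand-in object is used so the snippet runs on its own.

```javascript
// Hypothetical helper: returns the vector for a word, or undefined if the
// word is not an own key of the embeddings object.
function getVector(embeddings, word) {
  const key = word.toLowerCase(); // assumption: keys are lowercase
  return Object.prototype.hasOwnProperty.call(embeddings, key)
    ? embeddings[key]
    : undefined;
}

// Hypothetical helper: element-wise mean of the vectors of all known words.
// Returns undefined when none of the words is in the vocabulary.
function averageVector(embeddings, words, dimensions = 50) {
  const sum = new Array(dimensions).fill(0);
  let found = 0;
  for (const word of words) {
    const vector = getVector(embeddings, word);
    if (vector === undefined) continue;
    for (let i = 0; i < dimensions; i += 1) sum[i] += vector[i];
    found += 1;
  }
  return found === 0 ? undefined : sum.map((x) => x / found);
}

// Stand-in data for illustration only; real vectors have 50 elements.
const tiny = { hello: [1, 2], world: [3, 4] };

console.log(getVector(tiny, 'constructor'));
console.log(averageVector(tiny, ['Hello', 'world', 'unknown'], 2));
```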

Regenerating / Updating the Dataset

A conversion script is provided to build your own subset from any GloVe 50-dimension file.

# Example: download the GloVe 6B 50d file
curl -L https://nlp.stanford.edu/data/glove.6B.zip -o glove.zip
unzip glove.zip glove.6B.50d.txt

# Convert the first 10 000 lines → src/embeddings.json
npm run convert:glove -- ./glove.6B.50d.txt src/embeddings.json 10000

Commit the new embeddings.json, rebuild, and publish.
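
For orientation, the conversion step above can be sketched as follows. This is not the script shipped with the package: `parseGlove` is a hypothetical function that only relies on the GloVe text format, in which each line holds a word followed by its space-separated vector components. File reading and writing are left out so the snippet runs on its own with a tiny 2-dimensional sample.

```javascript
// Hypothetical sketch: parse GloVe-style text into a word → vector map,
// keeping at most `limit` valid entries. Malformed lines are skipped.
function parseGlove(text, limit = Infinity, dimensions = 50) {
  // A prototype-less object avoids clashes with keys such as "__proto__".
  const result = Object.create(null);
  let count = 0;
  for (const line of text.split('\n')) {
    if (count >= limit) break;
    const trimmed = line.trim();
    if (trimmed === '') continue;
    const parts = trimmed.split(/\s+/);
    const word = parts[0];
    const values = parts.slice(1).map(Number);
    // Skip lines with the wrong number of components or non-numeric values.
    if (values.length !== dimensions || values.some(Number.isNaN)) continue;
    result[word] = values;
    count += 1;
  }
  return result;
}

// Made-up 2-dimensional sample for illustration only.
const sampleText = 'the 0.1 0.2\nof 0.3 -0.4\nbad 0.5\nand 0.6 0.7\n';

console.log(JSON.stringify(parseGlove(sampleText, 2, 2)));
```

The resulting object could then be serialized with `JSON.stringify` and written to `src/embeddings.json`.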

Development

npm install
npm test
npm run build

Testing

The test suite validates that:

All keys are strings.
Every vector has length 50 and all elements are numbers.

npm test
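
The checks listed above can be sketched as below. This is an illustration, not the package's actual test code: `validateEmbeddings` is a hypothetical function, and the finiteness check is an extra assumption beyond "all elements are numbers". Note that keys of a plain JavaScript object are always strings, so only the vector checks need explicit code.

```javascript
// Hypothetical sketch: returns a list of problems found in an embeddings
// object; an empty list means every vector passed the checks.
function validateEmbeddings(embeddings, dimensions = 50) {
  const errors = [];
  for (const [word, vector] of Object.entries(embeddings)) {
    if (!Array.isArray(vector) || vector.length !== dimensions) {
      errors.push(`${word}: expected an array of length ${dimensions}`);
      continue;
    }
    // Extra assumption: also reject NaN and Infinity.
    if (!vector.every((x) => typeof x === 'number' && Number.isFinite(x))) {
      errors.push(`${word}: contains a non-numeric element`);
    }
  }
  return errors;
}

// Made-up 2-dimensional data for illustration only.
console.log(validateEmbeddings({ ok: [1, 2], short: [1], mixed: [1, 'x'] }, 2));
```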

Publishing

npm version patch   # or minor/major
npm publish --access public

🔗 Related

👉 Need to clean and normalize text before embedding it?
Check out text-prep-lite
👉 Need a simple and robust PDF text extraction utility with a quality interface? Check out [pdf-worker-package](https://www.npmjs.com/package/pdf-worker-package)
