wink-embeddings-small-en-50d
v0.0.3
Small English 50-dimensional word-embedding dataset compatible with wink-nlp.
Package size: ≤ 10 MB
Vocabulary: roughly 5,000 to 10,000 of the most common English words (you can regenerate the dataset at any size you like).
Installation
npm install wink-embeddings-small-en-50d
Usage
import winkNLP from 'wink-nlp';
import model from 'wink-eng-lite-web-model';
import embeddings from 'wink-embeddings-small-en-50d';
const nlp = winkNLP(model);
nlp.readDoc('hello world').tokens().each((t) => {
  const word = t.out();
  // Plain property lookup: yields undefined for words outside the vocabulary.
  const vector = embeddings[word];
  console.log(word, vector);
});
Each vector is an array of 50 floats and can be used for cosine similarity and similar vector operations.
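As a rough illustration of the cosine-similarity use mentioned above, here is a minimal sketch. The `cosineSimilarity` and `wordSimilarity` helpers are hypothetical names and are not part of this package. The tiny 3-dimensional table with made-up values stands in for the real 50-dimensional dataset so that the snippet runs on its own.

```javascript
// Hypothetical stand-in for the imported dataset (3 dimensions, made-up values).
// In real use, pass the object imported from the package instead.
const embeddings = {
  cat: [1, 0, 1],
  dog: [1, 0.5, 1],
  car: [-1, 1, 0],
};

// Cosine similarity of two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) throw new Error('Vectors must have the same length');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i += 1) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  // Define the similarity with an all-zero vector as 0 to avoid dividing by zero.
  return denom === 0 ? 0 : dot / denom;
}

// Compare two words; returns null when either word is not in the table.
function wordSimilarity(table, w1, w2) {
  const has = (w) => Object.prototype.hasOwnProperty.call(table, w);
  if (!has(w1) || !has(w2)) return null;
  return cosineSimilarity(table[w1], table[w2]);
}

console.log(wordSimilarity(embeddings, 'cat', 'dog'));
console.log(wordSimilarity(embeddings, 'cat', 'car'));
console.log(wordSimilarity(embeddings, 'cat', 'unicorn')); // null: word not in the table
```

The own-property check keeps words such as "constructor" from resolving to inherited object members when the table is a plain object.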
API
import embeddings from 'wink-embeddings-small-en-50d'
The default export is a plain object that maps each word (a string) to an array of 50 numbers.
interface Vector extends ReadonlyArray<number> { length: 50; }
interface Embeddings { [word: string]: Vector }
Regenerating / Updating the Dataset
A conversion script is provided to build your own subset from any GloVe 50-dimension file.
# Example: download the GloVe 6B 50d file
curl -L https://nlp.stanford.edu/data/glove.6B.zip -o glove.zip
unzip glove.zip glove.6B.50d.txt
# Convert the first 10 000 lines → src/embeddings.json
npm run convert:glove -- ./glove.6B.50d.txt src/embeddings.json 10000
Commit the new embeddings.json, rebuild, and publish.
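To give a sense of what such a conversion involves, here is a minimal sketch. It is not the package's actual convert:glove script, whose internals are not shown here. It assumes only the standard GloVe text layout, where each line is a word followed by its space-separated values. The `parseGlove` name and its parameters are hypothetical.

```javascript
// Parse GloVe-style text ("word v1 v2 ... vN", one entry per line) into a
// word -> vector map. `limit` caps how many lines are read from the top of
// the file; `dims` is the expected vector length.
function parseGlove(text, limit = Infinity, dims = 50) {
  // Null-prototype object so that a token like "__proto__" is stored as data.
  const result = Object.create(null);
  const lines = text.split('\n');
  for (let i = 0; i < lines.length && i < limit; i += 1) {
    const line = lines[i].trim();
    if (line === '') continue;
    const parts = line.split(/\s+/);
    const word = parts[0];
    const vector = parts.slice(1).map(Number);
    // Skip malformed lines rather than storing a bad vector.
    if (vector.length !== dims || vector.some(Number.isNaN)) continue;
    result[word] = vector;
  }
  return result;
}

// Tiny 3-dimensional sample with made-up values (a real GloVe 50d file has 50 per line).
const sample = 'the 0.1 0.2 0.3\nof 0.4 0.5 0.6\nbroken 0.7 0.8\nand 0.7 0.8 0.9';
const subset = parseGlove(sample, 3, 3);
console.log(JSON.stringify(subset)); // keeps "the" and "of"; "broken" is malformed, "and" is past the limit
```

In a real script the text would come from reading the GloVe file, and the result would be written out with JSON.stringify to the target path.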
Development
npm install
npm test
npm run build
Testing
The test suite validates that:
- All keys are strings.
- Every vector has length 50 and all elements are numbers.
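The checks listed above could be sketched roughly as follows. This is an illustration and not the package's actual test code. The `validateEmbeddings` name is hypothetical, and the finiteness check is a stricter assumption than the "all elements are numbers" rule stated above.

```javascript
// Returns a list of problem descriptions; an empty list means the table passed.
// Keys of a plain object are always strings, so only the values need checking.
function validateEmbeddings(table, dims = 50) {
  const problems = [];
  for (const [word, vector] of Object.entries(table)) {
    if (!Array.isArray(vector)) {
      problems.push(`${word}: value is not an array`);
    } else if (vector.length !== dims) {
      problems.push(`${word}: expected length ${dims}, got ${vector.length}`);
    } else if (!vector.every((x) => typeof x === 'number' && Number.isFinite(x))) {
      // Assumption: NaN and Infinity are rejected in addition to non-numbers.
      problems.push(`${word}: contains a non-numeric or non-finite element`);
    }
  }
  return problems;
}

// Made-up sample tables for demonstration.
const good = { hello: new Array(50).fill(0.1) };
const bad = { short: [1, 2, 3], mixed: new Array(49).fill(0).concat('x') };
console.log(validateEmbeddings(good).length); // 0
console.log(validateEmbeddings(bad).length);  // 2
```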
npm test
Publishing
npm version patch # or minor/major
npm publish --access public
🔗 Related
👉 Need to clean and normalize text before embedding it? Check out text-prep-lite.
👉 Need a simple and robust PDF text extraction utility with a quality interface? Check out [pdf-worker-package](https://www.npmjs.com/package/pdf-worker-package).
© 2025 Cavani21/TheGreatBey – MIT License
