starlight-vec
v1.0.0
A lightweight machine learning and vectorization library for Starlight.
Starlight Vectorizer (starlight-vec)
starlight-vec is a lightweight machine learning library for Starlight projects.
It provides tools for text vectorization, TF-IDF computation, and cosine similarity for natural language processing tasks.
Features
- Tokenize and remove stopwords using starlight-ml
- Fit a vectorizer on a list of documents
- Transform single or multiple documents into TF-IDF vectors
- Compute cosine similarity between vectors
- Normalize vectors for consistent comparison
Installation
npm install starlight-vec
Note: Requires Node.js ≥ 14 and starlight-ml installed in your project.
Usage
import * as ml from 'starlight-ml';
import { Vectorizer, vectorize, normalize } from 'starlight-vec';
// Sample documents
const docs = [
  "I love machine learning",
  "Starlight ML is amazing",
  "Vectorization makes NLP tasks easier"
];
// Create and fit a vectorizer
const vec = vectorize(docs, ['is', 'a', 'the']); // optional stopwords
// Transform a new document
const docVector = vec.transform("I love NLP and Starlight");
console.log("TF-IDF vector:", docVector);
// Transform multiple documents
const batchVectors = vec.transformBatch(docs);
console.log("Batch TF-IDF vectors:", batchVectors);
// Compute cosine similarity between two documents
const similarity = Vectorizer.cosine(batchVectors[0], batchVectors[1]);
console.log(`Similarity: ${similarity}`);
API
Vectorizer(stopwords = [])
Create a new vectorizer.
- stopwords: optional array of words to ignore during tokenization.
fit(texts)
Fit the vectorizer to an array of documents.
- texts: array of strings
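The README does not say what fitting computes internally. As a rough illustration only, a TF-IDF fit step typically builds a vocabulary and an inverse document frequency (IDF) weight per term. The sketch below assumes a simple regex tokenizer and a smoothed IDF formula; `tokenize` and `fitSketch` are hypothetical names, not part of starlight-vec.

```javascript
// Hypothetical helper: lowercase, split on non-alphanumerics, drop stopwords.
function tokenize(text, stopwords = []) {
  const stop = new Set(stopwords.map((w) => w.toLowerCase()));
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((tok) => tok.length > 0 && !stop.has(tok));
}

// Hypothetical fit step: build a sorted vocabulary and one IDF weight per term.
function fitSketch(texts, stopwords = []) {
  const docFreq = new Map();
  for (const text of texts) {
    // Count each term at most once per document.
    for (const term of new Set(tokenize(text, stopwords))) {
      docFreq.set(term, (docFreq.get(term) || 0) + 1);
    }
  }
  const vocab = [...docFreq.keys()].sort();
  const n = texts.length;
  // Assumed smoothed IDF: log((1 + n) / (1 + df)) + 1, which is always positive.
  const idf = vocab.map(
    (term) => Math.log((1 + n) / (1 + docFreq.get(term))) + 1
  );
  return { vocab, idf };
}

const model = fitSketch(
  ["I love machine learning", "Starlight ML is amazing", "I love Starlight"],
  ["is", "a", "the"]
);
console.log(model.vocab);
console.log(model.idf);
```

In this sketch, terms that appear in fewer documents receive a larger IDF weight, so rare words contribute more to the resulting vectors.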
transform(text)
Transform a single document into a TF-IDF vector.
- Returns a normalized array of numbers.
- Throws an error if the vectorizer is not fitted.
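The exact weighting and normalization used by transform are not documented here. The following is a minimal sketch of one common approach, assuming raw term counts multiplied by IDF and then scaled to unit length (L2 normalization). `transformSketch` and the `model` object shape are hypothetical and only mirror the documented behavior of throwing when nothing has been fitted.

```javascript
// Hypothetical transform: term count times IDF, then L2-normalized.
// `model` is assumed to hold a vocabulary and matching IDF weights from a fit step.
function transformSketch(text, model) {
  if (!model || !Array.isArray(model.vocab) || model.vocab.length === 0) {
    throw new Error("Vectorizer is not fitted");
  }
  const tokens = text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
  const index = new Map(model.vocab.map((term, i) => [term, i]));
  const vec = new Array(model.vocab.length).fill(0);
  for (const tok of tokens) {
    const i = index.get(tok);
    // Adding the IDF once per occurrence yields count * idf; unknown words are skipped.
    if (i !== undefined) vec[i] += model.idf[i];
  }
  const norm = Math.sqrt(vec.reduce((sum, x) => sum + x * x, 0));
  // An all-zero vector (no known words) is returned unchanged to avoid dividing by zero.
  return norm === 0 ? vec : vec.map((x) => x / norm);
}

// Toy model with made-up IDF weights, for illustration only.
const toyModel = { vocab: ["learning", "love", "starlight"], idf: [1.5, 1.0, 1.2] };
const toyVector = transformSketch("I love Starlight", toyModel);
console.log(toyVector);
```

Words that were not seen during fitting (here "i") simply do not contribute, which is why the output always has one entry per vocabulary term.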
transformBatch(texts)
Transform multiple documents into vectors.
- Returns an array of normalized arrays.
Vectorizer.cosine(v1, v2)
Compute cosine similarity between two vectors.
- Returns a number between 0 and 1.
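For reference, cosine similarity is the dot product of two vectors divided by the product of their lengths. In general it ranges from -1 to 1; for vectors with no negative entries it stays between 0 and 1. The sketch below is a standalone illustration, not the library's source; `cosineSketch` is a hypothetical name, and returning 0 for an all-zero vector is an assumed convention.

```javascript
// Hypothetical standalone cosine similarity.
function cosineSketch(v1, v2) {
  if (v1.length !== v2.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let sq1 = 0;
  let sq2 = 0;
  for (let i = 0; i < v1.length; i++) {
    dot += v1[i] * v2[i];
    sq1 += v1[i] * v1[i];
    sq2 += v2[i] * v2[i];
  }
  // Assumed convention: similarity with an all-zero vector is 0.
  if (sq1 === 0 || sq2 === 0) return 0;
  return dot / (Math.sqrt(sq1) * Math.sqrt(sq2));
}

console.log(cosineSketch([1, 0, 1], [2, 0, 2])); // same direction, close to 1
console.log(cosineSketch([1, 0], [0, 1]));       // orthogonal, 0
```

Because the result depends only on direction, two documents with the same word proportions score close to 1 regardless of their length.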
normalize(arr)
Normalize an array of numbers to the range [0, 1].
- Useful for consistent vector comparison.
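One way to map values into the range [0, 1] is min-max scaling. Whether normalize uses this method is not stated in the README, so the sketch below is an assumption; `normalizeSketch` is a hypothetical name, and mapping a constant array to zeros is an arbitrary choice made to avoid division by zero.

```javascript
// Hypothetical min-max scaling into [0, 1].
function normalizeSketch(arr) {
  if (arr.length === 0) return [];
  const min = Math.min(...arr);
  const max = Math.max(...arr);
  // Assumed convention: a constant array has no spread, so map it to zeros.
  if (max === min) return arr.map(() => 0);
  return arr.map((x) => (x - min) / (max - min));
}

console.log(normalizeSketch([2, 4, 6])); // [ 0, 0.5, 1 ]
```

The smallest input maps to 0 and the largest to 1, so vectors with very different magnitudes end up on a comparable scale.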
vectorize(texts, stopwords)
Convenience function to create, fit, and return a Vectorizer.
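The create, fit, and return pattern can be sketched as follows. So that the example runs without the library installed, it uses a stand-in class; `SketchVectorizer` and `vectorizeSketch` are hypothetical, and the stand-in only collects a vocabulary rather than computing TF-IDF weights.

```javascript
// Stand-in class (hypothetical) so the sketch runs without starlight-vec.
class SketchVectorizer {
  constructor(stopwords = []) {
    this.stopwords = new Set(stopwords);
    this.vocab = null; // stays null until fit() is called
  }

  fit(texts) {
    const terms = new Set();
    for (const text of texts) {
      for (const tok of text.toLowerCase().split(/[^a-z0-9]+/)) {
        if (tok && !this.stopwords.has(tok)) terms.add(tok);
      }
    }
    this.vocab = [...terms].sort();
    return this; // returning `this` allows chaining
  }
}

// Create, fit, and return a vectorizer in one call.
function vectorizeSketch(texts, stopwords = []) {
  return new SketchVectorizer(stopwords).fit(texts);
}

const fitted = vectorizeSketch(["Starlight ML is amazing"], ["is"]);
console.log(fitted.vocab); // [ 'amazing', 'ml', 'starlight' ]
```

A helper like this saves one step when the stopword list and training documents are both known up front.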
License
MIT © Dominex Macedon
