simple-ml

v1.0.4

A simple, robust JavaScript machine learning library with regression, classification, clustering, and preprocessing algorithms

Simple-ML

A simple, robust JavaScript machine learning library built from scratch with no external dependencies. Simple-ML provides easy-to-use implementations of popular machine learning algorithms for regression, classification, clustering, and data preprocessing.

🎯 Philosophy

  • Simplicity: Intuitive and consistent API
  • Robustness: Rigorous input validation and edge case handling
  • Performance: Optimized pure JavaScript implementations
  • Modularity: Clear organizational structure

📦 Installation

Node.js / NPM

npm install simple-ml

Browser (via CDN)

<!-- Via unpkg CDN -->
<script src="https://unpkg.com/simple-ml/dist/simple-ml.umd.js"></script>
<script>
  const { LinearRegression } = SimpleML;
  const model = new LinearRegression();
</script>

ES Modules (Modern Browsers)

<script type="module">
  import { LinearRegression } from 'https://unpkg.com/simple-ml/dist/simple-ml.modern.js';
  const model = new LinearRegression();
</script>

🚀 Quick Start

import { LinearRegression, trainTestSplit } from 'simple-ml';

// Prepare your data (use at least 10-20 samples for reliable results)
const X = [[1], [2], [3], [4], [5], [6], [7], [8], [9], [10]];
const y = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20];

// Split into training and test sets
const { XTrain, XTest, yTrain, yTest } = trainTestSplit(X, y, { testSize: 0.2 });

// Create and train model
const model = new LinearRegression();
model.fit(XTrain, yTrain);

// Make predictions
const predictions = model.predict(XTest);

// Evaluate model
const score = model.score(XTest, yTest);
console.log('R² Score:', score);  // Close to 1.0 for perfect fit

📚 Complete API Reference with Examples

1. Regression Algorithms

1.1 Linear Regression

Ordinary Least Squares Linear Regression.

import { LinearRegression } from 'simple-ml';

// Create model with options
const model = new LinearRegression({
  fitIntercept: true,  // Whether to calculate intercept (default: true)
  normalize: false      // Whether to normalize features (default: false)
});

// Training data
const X = [[1], [2], [3], [4], [5]];
const y = [2, 4, 6, 8, 10];

// Fit the model
model.fit(X, y);

// Access model parameters
console.log('Coefficients:', model.coefficients);  // [2.0]
console.log('Intercept:', model.intercept);        // 0.0

// Make predictions
const predictions = model.predict([[6], [7]]);
console.log('Predictions:', predictions);  // [12, 14]

// Evaluate model (R² score)
const score = model.score(X, y);
console.log('R² Score:', score);  // 1.0 (perfect fit)

Multiple Features Example:

// Multiple features (not collinear, so the coefficients are uniquely determined)
const X = [
  [1, 2],
  [2, 1],
  [3, 4],
  [4, 3]
];
const y = [5, 4, 11, 10];  // y = 1*x1 + 2*x2

const model = new LinearRegression();
model.fit(X, y);

console.log('Coefficients:', model.coefficients);  // [1.0, 2.0]
console.log('Intercept:', model.intercept);

const pred = model.predict([[5, 6]]);
console.log('Prediction:', pred);  // [17]
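
For a single feature, ordinary least squares has a closed form: the slope is the centered cross-product of x and y divided by the centered sum of squares of x, and the intercept makes the fitted line pass through the two means. The sketch below is a from-scratch illustration, not simple-ml's internal code; `fitSimpleOLS` and `rSquared` are hypothetical helper names. It reproduces the coefficient of 2, intercept of 0 and R² of 1.0 from the first example in this section.

```javascript
// Closed-form ordinary least squares for one feature (illustrative sketch,
// not simple-ml's internal implementation).
function fitSimpleOLS(x, y) {
  const n = x.length;
  const xMean = x.reduce((a, b) => a + b, 0) / n;
  const yMean = y.reduce((a, b) => a + b, 0) / n;
  let sxy = 0; // centered cross-products
  let sxx = 0; // centered sum of squares of x
  for (let i = 0; i < n; i++) {
    sxy += (x[i] - xMean) * (y[i] - yMean);
    sxx += (x[i] - xMean) ** 2;
  }
  const slope = sxy / sxx;
  // The fitted line always passes through (xMean, yMean)
  const intercept = yMean - slope * xMean;
  return { slope, intercept };
}

// R² = 1 - SS_res / SS_tot
function rSquared(yTrue, yPred) {
  const mean = yTrue.reduce((a, b) => a + b, 0) / yTrue.length;
  let ssRes = 0;
  let ssTot = 0;
  for (let i = 0; i < yTrue.length; i++) {
    ssRes += (yTrue[i] - yPred[i]) ** 2;
    ssTot += (yTrue[i] - mean) ** 2;
  }
  return 1 - ssRes / ssTot;
}

const xs = [1, 2, 3, 4, 5];
const ys = [2, 4, 6, 8, 10];
const { slope, intercept } = fitSimpleOLS(xs, ys);
const fitted = xs.map(x => slope * x + intercept);
console.log(slope, intercept, rSquared(ys, fitted)); // 2 0 1
```

With several features the same idea becomes the normal equations, which is why the multiple-feature example needs columns that are not exact linear combinations of each other.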

1.2 Ridge Regression

Linear Regression with L2 regularization.

import { RidgeRegression } from 'simple-ml';

// Create Ridge model
const ridge = new RidgeRegression({
  alpha: 1.0,          // Regularization strength (default: 1.0)
  fitIntercept: true,
  normalize: false
});

// Training data
const X = [[1], [2], [3], [4], [5]];
const y = [2.1, 3.9, 6.2, 7.8, 10.1];

// Fit model
ridge.fit(X, y);

console.log('Coefficients:', ridge.coefficients);
console.log('Intercept:', ridge.intercept);

// Make predictions
const predictions = ridge.predict([[6], [7]]);
console.log('Predictions:', predictions);

// Evaluate
const score = ridge.score(X, y);
console.log('R² Score:', score);

Tuning Alpha Example:

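// Compare different alpha values on the training data
// (training R² always favors the smallest alpha; use crossValidate to choose alpha for unseen data)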
const alphas = [0.1, 1.0, 10.0, 100.0];

alphas.forEach(alpha => {
  const model = new RidgeRegression({ alpha });
  model.fit(X, y);
  const score = model.score(X, y);
  console.log(`Alpha ${alpha}: R² = ${score.toFixed(4)}`);
});
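
With one feature and an unpenalized intercept, ridge regression has a closed form that makes the effect of alpha visible: the slope becomes Sxy / (Sxx + alpha), where Sxx and Sxy are the centered sums used by ordinary least squares. The sketch assumes the objective sum((y - b - w*x)^2) + alpha * w^2; simple-ml may scale alpha differently (for example by the number of samples), and `fitRidge1D` is a hypothetical name.

```javascript
// One-feature ridge regression with an unpenalized intercept (sketch).
// Assumed objective: sum((y - b - w*x)^2) + alpha * w^2
function fitRidge1D(x, y, alpha) {
  const n = x.length;
  const xMean = x.reduce((a, b) => a + b, 0) / n;
  const yMean = y.reduce((a, b) => a + b, 0) / n;
  let sxy = 0;
  let sxx = 0;
  for (let i = 0; i < n; i++) {
    sxy += (x[i] - xMean) * (y[i] - yMean);
    sxx += (x[i] - xMean) ** 2;
  }
  const slope = sxy / (sxx + alpha); // alpha = 0 recovers ordinary least squares
  return { slope, intercept: yMean - slope * xMean };
}

const xRidge = [1, 2, 3, 4, 5];
const yRidge = [2.1, 3.9, 6.2, 7.8, 10.1];
for (const alpha of [0, 0.1, 1, 10, 100]) {
  const { slope } = fitRidge1D(xRidge, yRidge, alpha);
  console.log(`alpha=${alpha}: slope=${slope.toFixed(3)}`);
}
// Slopes shrink toward 0 as alpha grows (about 1.99, 1.97, 1.81, 0.995, 0.181)
```

Because the intercept is not penalized, shrinking the slope tilts the line around the point (mean of x, mean of y) instead of pulling predictions toward zero.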

1.3 Lasso Regression

Linear Regression with L1 regularization (feature selection).

import { LassoRegression } from 'simple-ml';

// Create Lasso model
const lasso = new LassoRegression({
  alpha: 0.1,           // Regularization strength (default: 1.0)
  maxIterations: 1000,  // Max iterations for coordinate descent
  tolerance: 1e-4,      // Convergence tolerance
  fitIntercept: true
});

// Training data with correlated features
const X = [
  [1, 1],
  [2, 2],
  [3, 3],
  [4, 4],
  [5, 5]
];
const y = [2, 4, 6, 8, 10];

// Fit model
lasso.fit(X, y);

console.log('Coefficients:', lasso.coefficients);
console.log('Intercept:', lasso.intercept);

// Lasso may zero out some coefficients
console.log('Non-zero features:',
  lasso.coefficients.filter(c => Math.abs(c) > 1e-10).length
);

// Predictions
const predictions = lasso.predict([[6, 6]]);
console.log('Predictions:', predictions);
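
Lasso is commonly fit by cyclic coordinate descent: each coefficient is updated in turn with a soft-thresholding step, which sets it exactly to zero when its correlation with the current residual is no larger than alpha. The sketch below assumes the scikit-learn-style objective (1/(2n)) * ||y - Xw - b||^2 + alpha * ||w||_1 on centered data, with the intercept recovered afterwards; `softThreshold` and `fitLasso` are hypothetical names and simple-ml's internals may differ. On the perfectly correlated features from the example above it keeps the first feature and drives the second to zero.

```javascript
// Soft-thresholding operator: shrinks z toward 0 by gamma, clipping at 0
function softThreshold(z, gamma) {
  if (z > gamma) return z - gamma;
  if (z < -gamma) return z + gamma;
  return 0;
}

// Cyclic coordinate descent for Lasso (illustrative sketch).
// Objective: (1/(2n)) * ||y - Xw - b||^2 + alpha * ||w||_1
function fitLasso(X, y, alpha, maxIterations = 1000, tolerance = 1e-8) {
  const n = X.length;
  const p = X[0].length;
  // Center features and target so the intercept can be computed at the end
  const xMeans = Array.from({ length: p }, (_, j) =>
    X.reduce((s, row) => s + row[j], 0) / n);
  const yMean = y.reduce((a, b) => a + b, 0) / n;
  const Xc = X.map(row => row.map((v, j) => v - xMeans[j]));
  const residual = y.map(v => v - yMean); // residual for w = 0
  const w = new Array(p).fill(0);

  for (let iter = 0; iter < maxIterations; iter++) {
    let maxChange = 0;
    for (let j = 0; j < p; j++) {
      let rho = 0; // (1/n) * x_j · (residual + x_j * w_j)
      let z = 0;   // (1/n) * x_j · x_j
      for (let i = 0; i < n; i++) {
        rho += Xc[i][j] * (residual[i] + Xc[i][j] * w[j]);
        z += Xc[i][j] ** 2;
      }
      rho /= n;
      z /= n;
      const wNew = z === 0 ? 0 : softThreshold(rho, alpha) / z;
      // Keep the residual in sync with the updated coefficient
      for (let i = 0; i < n; i++) residual[i] -= Xc[i][j] * (wNew - w[j]);
      maxChange = Math.max(maxChange, Math.abs(wNew - w[j]));
      w[j] = wNew;
    }
    if (maxChange < tolerance) break;
  }
  const intercept = yMean - w.reduce((s, wj, j) => s + wj * xMeans[j], 0);
  return { coefficients: w, intercept };
}

const lassoFit = fitLasso([[1, 1], [2, 2], [3, 3], [4, 4], [5, 5]], [2, 4, 6, 8, 10], 0.1);
console.log(lassoFit.coefficients); // about [1.95, 0]: the second feature is dropped
```

Which of two identical columns survives depends on the update order; the L1 penalty only guarantees that the redundant one is not needed.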

1.4 Logistic Regression

Binary and multiclass classification using the logistic function.

import { LogisticRegression } from 'simple-ml';

// Binary classification
const logReg = new LogisticRegression({
  learningRate: 0.1,     // Learning rate for gradient descent
  maxIterations: 1000,   // Maximum iterations
  tolerance: 1e-4,       // Convergence tolerance
  penalty: 'l2',         // Regularization: 'l2', 'l1', or 'none'
  C: 1.0,                // Inverse regularization strength
  multiClass: 'ovr'      // 'ovr' (one-vs-rest) or 'multinomial' (softmax)
});

// Binary classification data
const X = [
  [1, 2], [2, 3], [3, 1],  // Class 0
  [6, 5], [7, 7], [8, 6]   // Class 1
];
const y = [0, 0, 0, 1, 1, 1];

// Fit model
logReg.fit(X, y);

console.log('Coefficients:', logReg.coefficients);
console.log('Intercept:', logReg.intercept);

// Predict classes
const predictions = logReg.predict([[2, 2], [7, 6]]);
console.log('Predictions:', predictions);  // [0, 1]

// Predict probabilities
const probabilities = logReg.predictProba([[2, 2], [7, 6]]);
console.log('Probabilities:', probabilities);
// [[0.95, 0.05], [0.02, 0.98]]

// Evaluate
const score = logReg.score(X, y);
console.log('Accuracy:', score);

Multiclass Example:

// Multiclass classification
const X = [
  [1, 1], [1, 2], [2, 1],  // Class 0
  [5, 5], [5, 6], [6, 5],  // Class 1
  [9, 9], [9, 10], [10, 9] // Class 2
];
const y = [0, 0, 0, 1, 1, 1, 2, 2, 2];

// Use 'multinomial' for ordered classes (class 1 lies between classes 0 and 2)
const multiLogReg = new LogisticRegression({
  multiClass: 'multinomial',  // Better for this type of data
  learningRate: 0.1,
  maxIterations: 1000
});
multiLogReg.fit(X, y);

const pred = multiLogReg.predict([[2, 2], [6, 6], [10, 10]]);
console.log('Multiclass Predictions:', pred);  // [0, 1, 2]

const proba = multiLogReg.predictProba([[2, 2], [6, 6], [10, 10]]);
console.log('Class Probabilities:', proba);
// [[0.803, 0.195, 0.002],  // → class 0
//  [0.007, 0.708, 0.285],  // → class 1
//  [0.000, 0.057, 0.943]]  // → class 2

console.log('Accuracy:', multiLogReg.score(X, y));  // 1.0

Choosing multiClass Mode:

  • 'ovr' (One-vs-Rest, default): Fast and works well for independent categories

    • Use for: Animals (cat, dog, bird), Topics (sports, politics, tech)
    • Each class vs all others is trained separately
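  • 'multinomial' (Softmax): Trains all classes jointly, so every prediction compares all class scores at once

    • Use for: Ratings (low, medium, high), Sizes (S, M, L, XL)
    • Trains all classes simultaneously with the softmax function
    • Recommended when classes have a natural ordering: a middle class such as 'medium' sits between its neighbors, so no single linear boundary separates it from all the other classes, which is hard for one-vs-rest but not for softmax

// Example: 'ovr' for independent categories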
const categories = new LogisticRegression({ multiClass: 'ovr' });
const X_cat = [[1, 0], [0, 1], [1, 1]];
const y_cat = ['cat', 'dog', 'bird'];
categories.fit(X_cat, y_cat);

// Example: 'multinomial' for ordered classes
const ratings = new LogisticRegression({ multiClass: 'multinomial' });
const X_rating = [[1, 2], [5, 6], [9, 10]];
const y_rating = ['low', 'medium', 'high'];
ratings.fit(X_rating, y_rating);
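
The sketch below illustrates the multinomial idea with hand-picked (not trained) weights for a single feature: each class gets a linear score, and one softmax turns the three scores into probabilities that sum to 1. Because the classes compete jointly, the middle class can own the interval between its neighbors. The weights, biases and the `softmax`/`classScores` helpers are hypothetical, chosen only to make the effect visible.

```javascript
// Numerically stable softmax: subtract the max score before exponentiating
function softmax(scores) {
  const max = Math.max(...scores);
  const exps = scores.map(s => Math.exp(s - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map(e => e / sum);
}

// Hand-picked linear scores for classes low (0), medium (1), high (2):
// score_c(x) = w_c * x + b_c
const weights = [-1, 0, 1];
const biases = [3, 0, -7];
const classScores = x => weights.map((w, c) => w * x + biases[c]);

for (const x of [1, 5, 9]) {
  const proba = softmax(classScores(x));
  const predicted = proba.indexOf(Math.max(...proba));
  console.log(`x=${x}: class ${predicted}`, proba.map(p => p.toFixed(3)));
}
// x=1 -> class 0, x=5 -> class 1 (the middle class), x=9 -> class 2
```

The middle class wins only where its score beats both neighbors at once, a region that a single "medium vs rest" boundary cannot describe.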

1.5 Polynomial Regression

Regression with polynomial features.

import { PolynomialRegression } from 'simple-ml';

// Create polynomial model
const poly = new PolynomialRegression({
  degree: 2,           // Polynomial degree (default: 2)
  fitIntercept: true,
  normalize: false
});

// Non-linear data
const X = [[1], [2], [3], [4], [5]];
const y = [1, 4, 9, 16, 25];  // y = x²

// Fit model
poly.fit(X, y);

console.log('Coefficients:', poly.coefficients);
console.log('Intercept:', poly.intercept);

// Predictions
const predictions = poly.predict([[6], [7]]);
console.log('Predictions:', predictions);  // [36, 49]

// Evaluate
const score = poly.score(X, y);
console.log('R² Score:', score);  // Close to 1.0

Higher Degree Example:

// Cubic polynomial
const cubicPoly = new PolynomialRegression({ degree: 3 });

const X = [[1], [2], [3], [4]];
const y = [1, 8, 27, 64];  // y = x³

cubicPoly.fit(X, y);
const pred = cubicPoly.predict([[5]]);
console.log('Prediction for x=5:', pred);  // [125]
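
Polynomial regression is linear regression on expanded features: for one input x and degree d, each row [x] becomes [x, x^2, ..., x^d], and an ordinary linear model is then fit on those columns. A minimal sketch of that expansion for single-feature rows (the `polynomialFeatures` helper is hypothetical, and simple-ml may order or generate the terms differently, especially for several input features):

```javascript
// Expand single-feature rows [x] into [x, x^2, ..., x^degree] (sketch)
function polynomialFeatures(X, degree) {
  return X.map(([x]) =>
    Array.from({ length: degree }, (_, k) => x ** (k + 1)));
}

console.log(polynomialFeatures([[2], [3]], 3)); // [[2, 4, 8], [3, 9, 27]]
```

This is why the cubic example fits exactly: four points determine the four parameters (three coefficients plus the intercept) of a degree-3 model.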

2. Classification Algorithms

2.1 K-Nearest Neighbors (KNN)

Non-parametric classification based on nearest neighbors.

import { KNeighborsClassifier } from 'simple-ml';

// Create KNN classifier
const knn = new KNeighborsClassifier({
  k: 3,                 // Number of neighbors (default: 5)
  weights: 'uniform'    // 'uniform' or 'distance'
});

// Training data
const X = [
  [1, 2], [2, 3], [3, 1],  // Class 'A'
  [6, 5], [7, 7], [8, 6]   // Class 'B'
];
const y = ['A', 'A', 'A', 'B', 'B', 'B'];

// Fit model (stores training data)
knn.fit(X, y);

// Predict
const predictions = knn.predict([[2, 2], [7, 6]]);
console.log('Predictions:', predictions);  // ['A', 'B']

// Predict with probabilities
const probabilities = knn.predictProba([[2, 2]]);
console.log('Probabilities:', probabilities);

// Evaluate
const score = knn.score(X, y);
console.log('Accuracy:', score);

Distance-Weighted KNN:

// Use distance weighting
const weightedKnn = new KNeighborsClassifier({
  k: 5,
  weights: 'distance'  // Closer neighbors have more influence
});

weightedKnn.fit(X, y);
const pred = weightedKnn.predict([[4, 4]]);
console.log('Distance-weighted prediction:', pred);

Finding Optimal K:

import { trainTestSplit, accuracy } from 'simple-ml';

const { XTrain, XTest, yTrain, yTest } = trainTestSplit(X, y, { testSize: 0.3 });

// Test different k values (k cannot exceed the number of training samples)
for (let k = 1; k <= XTrain.length; k++) {
  const model = new KNeighborsClassifier({ k });
  model.fit(XTrain, yTrain);
  const pred = model.predict(XTest);
  const acc = accuracy(yTest, pred);
  console.log(`k=${k}: Accuracy = ${acc.toFixed(3)}`);
}
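
KNN has no training phase beyond storing the data: prediction measures the distance from the query to every training sample, keeps the k closest and takes a majority vote, which is also why k cannot exceed the number of training samples. A minimal from-scratch sketch with Euclidean distance and uniform weights (the `euclidean` and `knnPredict` helpers are hypothetical; simple-ml's distance metric and tie-breaking may differ):

```javascript
// Euclidean distance between two feature vectors
const euclidean = (a, b) => Math.sqrt(a.reduce((s, v, i) => s + (v - b[i]) ** 2, 0));

// Uniform-weight k-nearest-neighbors prediction (illustrative sketch)
function knnPredict(XTrain, yTrain, XQuery, k) {
  return XQuery.map(q => {
    // Sort training samples by distance to the query point, keep the k closest
    const nearest = XTrain
      .map((x, i) => ({ d: euclidean(x, q), label: yTrain[i] }))
      .sort((a, b) => a.d - b.d)
      .slice(0, k);
    // Majority vote; on a tie, the label whose member is closest wins
    const votes = new Map();
    for (const { label } of nearest) votes.set(label, (votes.get(label) || 0) + 1);
    let best = null;
    let bestCount = -1;
    for (const [label, count] of votes) {
      if (count > bestCount) {
        best = label;
        bestCount = count;
      }
    }
    return best;
  });
}

const XKnn = [[1, 2], [2, 3], [3, 1], [6, 5], [7, 7], [8, 6]];
const yKnn = ['A', 'A', 'A', 'B', 'B', 'B'];
console.log(knnPredict(XKnn, yKnn, [[2, 2], [7, 6]], 3)); // ['A', 'B']
```

Distance weighting would replace the vote count of 1 per neighbor with a weight such as 1 / distance.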

2.2 Gaussian Naive Bayes

Probabilistic classifier assuming Gaussian distribution.

import { GaussianNaiveBayes } from 'simple-ml';

// Create model
const gnb = new GaussianNaiveBayes({
  priors: null  // Class priors (default: null = uniform)
});

// Training data
const X = [
  [1, 2], [2, 3], [3, 4],  // Class 0
  [6, 7], [7, 8], [8, 9]   // Class 1
];
const y = [0, 0, 0, 1, 1, 1];

// Fit model
gnb.fit(X, y);

// Access learned parameters
console.log('Class Priors:', gnb.classPrior);
console.log('Means:', gnb.theta);
console.log('Variances:', gnb.sigma);

// Predict
const predictions = gnb.predict([[2, 3], [7, 8]]);
console.log('Predictions:', predictions);  // [0, 1]

// Predict probabilities
const probabilities = gnb.predictProba([[4, 5]]);
console.log('Probabilities:', probabilities);

// Evaluate
const score = gnb.score(X, y);
console.log('Accuracy:', score);
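
Gaussian Naive Bayes scores each class as log P(class) plus, for every feature, the log of a normal density with that class's mean and variance, treating features as independent given the class. The sketch below estimates priors from class frequencies and adds a small epsilon to every variance to avoid division by zero; both are assumptions, and with balanced classes like these, frequency-based priors equal uniform ones. `fitGaussianNB` and `predictGaussianNB` are hypothetical names.

```javascript
// Gaussian Naive Bayes from scratch (illustrative sketch).
// Priors come from class frequencies; epsilon keeps every variance > 0.
function fitGaussianNB(X, y, epsilon = 1e-9) {
  const classes = [...new Set(y)];
  return classes.map(c => {
    const rows = X.filter((_, i) => y[i] === c);
    const means = rows[0].map((_, j) => rows.reduce((s, r) => s + r[j], 0) / rows.length);
    const vars = rows[0].map((_, j) =>
      rows.reduce((s, r) => s + (r[j] - means[j]) ** 2, 0) / rows.length + epsilon);
    return { label: c, prior: rows.length / X.length, means, vars };
  });
}

// Log of the Gaussian density N(x | mean, variance)
const logGaussian = (x, mean, variance) =>
  -0.5 * Math.log(2 * Math.PI * variance) - (x - mean) ** 2 / (2 * variance);

function predictGaussianNB(model, X) {
  return X.map(x => {
    let best = null;
    let bestScore = -Infinity;
    for (const { label, prior, means, vars } of model) {
      // log P(c) + sum_j log N(x_j | mean_cj, var_cj)
      const score = Math.log(prior) +
        x.reduce((s, v, j) => s + logGaussian(v, means[j], vars[j]), 0);
      if (score > bestScore) {
        bestScore = score;
        best = label;
      }
    }
    return best;
  });
}

const gnbModel = fitGaussianNB([[1, 2], [2, 3], [3, 4], [6, 7], [7, 8], [8, 9]], [0, 0, 0, 1, 1, 1]);
console.log(predictGaussianNB(gnbModel, [[2, 3], [7, 8]])); // [0, 1]
```

Working in log space turns the product of per-feature densities into a sum and avoids underflow when there are many features.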

2.3 Multinomial Naive Bayes

Naive Bayes for discrete/count features (e.g., text classification).

import { MultinomialNaiveBayes } from 'simple-ml';

// Create model
const mnb = new MultinomialNaiveBayes({
  alpha: 1.0  // Laplace smoothing parameter
});

// Training data (word counts)
const X = [
  [2, 1, 0],  // Document 1: "spam" words
  [1, 1, 0],  // Document 2: "spam" words
  [0, 0, 2],  // Document 3: "ham" words
  [0, 1, 2]   // Document 4: "ham" words
];
const y = ['spam', 'spam', 'ham', 'ham'];

// Fit model
mnb.fit(X, y);

// Predict
const predictions = mnb.predict([[2, 0, 1], [0, 0, 3]]);
console.log('Predictions:', predictions);

// Predict probabilities
const probabilities = mnb.predictProba([[1, 1, 1]]);
console.log('Probabilities:', probabilities);
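
With additive smoothing, the probability of feature j under class c is (count + alpha) / (class total + alpha * number of features), so a word never seen in a class still gets a small nonzero probability. The sketch below applies that textbook formula to the word-count example above, with priors taken from class frequencies; the helper names are hypothetical. By hand, spam has counts [3, 2, 0], giving probabilities [4/8, 3/8, 1/8], and ham has [0, 1, 4], giving [1/8, 2/8, 5/8].

```javascript
// Multinomial Naive Bayes with additive (Laplace) smoothing (sketch).
// P(feature j | class c) = (count_cj + alpha) / (total_c + alpha * nFeatures)
function fitMultinomialNB(X, y, alpha = 1.0) {
  const nFeatures = X[0].length;
  return [...new Set(y)].map(c => {
    const rows = X.filter((_, i) => y[i] === c);
    const counts = Array.from({ length: nFeatures }, (_, j) =>
      rows.reduce((s, r) => s + r[j], 0));
    const total = counts.reduce((a, b) => a + b, 0);
    return {
      label: c,
      logPrior: Math.log(rows.length / X.length),
      logProb: counts.map(cnt => Math.log((cnt + alpha) / (total + alpha * nFeatures)))
    };
  });
}

function predictMultinomialNB(model, X) {
  return X.map(x => {
    // Log-likelihood: log prior + sum of count * log P(feature | class)
    const scored = model.map(m => ({
      label: m.label,
      score: m.logPrior + x.reduce((s, cnt, j) => s + cnt * m.logProb[j], 0)
    }));
    return scored.reduce((a, b) => (b.score > a.score ? b : a)).label;
  });
}

const mnbModel = fitMultinomialNB(
  [[2, 1, 0], [1, 1, 0], [0, 0, 2], [0, 1, 2]],
  ['spam', 'spam', 'ham', 'ham']
);
console.log(predictMultinomialNB(mnbModel, [[2, 0, 1], [0, 0, 3]])); // ['spam', 'ham']
```

Without smoothing (alpha = 0), the third word would have probability 0 under spam, and any document containing it could never be classified as spam.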

2.4 Decision Tree Classifier

Tree-based classifier with interpretable rules.

import { DecisionTreeClassifier } from 'simple-ml';

// Create decision tree
const dt = new DecisionTreeClassifier({
  criterion: 'gini',      // 'gini' or 'entropy'
  maxDepth: 5,            // Maximum tree depth (default: Infinity)
  minSamplesSplit: 2,     // Min samples to split a node
  minSamplesLeaf: 1,      // Min samples in leaf node
  maxFeatures: null       // Max features to consider
});

// Training data
const X = [
  [2.5, 2.5], [3, 3], [2, 3],    // Class 0
  [7, 7], [8, 6], [7, 8],         // Class 1
  [3, 8], [4, 7], [3, 7]          // Class 2
];
const y = [0, 0, 0, 1, 1, 1, 2, 2, 2];

// Fit model
dt.fit(X, y);

// Predict
const predictions = dt.predict([[2.5, 2.5], [7.5, 7], [3.5, 7.5]]);
console.log('Predictions:', predictions);  // [0, 1, 2]

// Evaluate
const score = dt.score(X, y);
console.log('Accuracy:', score);

// Get feature importances (if available)
if (dt.featureImportances) {
  console.log('Feature Importances:', dt.featureImportances);
}

Using Entropy:

const entropyDT = new DecisionTreeClassifier({
  criterion: 'entropy',
  maxDepth: 3
});

entropyDT.fit(X, y);
const pred = entropyDT.predict([[5, 5]]);
console.log('Entropy-based prediction:', pred);
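
Both criteria measure how mixed the labels in a node are: Gini impurity is 1 - sum(p_k^2) and entropy is -sum(p_k * log2(p_k)), where p_k is the share of class k in the node. A tree evaluates each candidate split by the size-weighted impurity of the two children and keeps the lowest. A from-scratch sketch of these textbook formulas (helper names are hypothetical; simple-ml may use a different logarithm base for entropy):

```javascript
// Class proportions within a node
function proportions(labels) {
  const counts = new Map();
  for (const l of labels) counts.set(l, (counts.get(l) || 0) + 1);
  return [...counts.values()].map(c => c / labels.length);
}

// Gini impurity: 1 - sum(p_k^2)
const gini = labels => 1 - proportions(labels).reduce((s, p) => s + p * p, 0);

// Entropy in bits: -sum(p_k * log2(p_k))
const entropy = labels => -proportions(labels).reduce((s, p) => s + p * Math.log2(p), 0);

// Weighted impurity of a candidate split; the tree keeps the split that minimizes it
function splitImpurity(left, right, criterion = gini) {
  const n = left.length + right.length;
  return (left.length / n) * criterion(left) + (right.length / n) * criterion(right);
}

console.log(gini([0, 0, 1, 1]), entropy([0, 0, 1, 1]));      // 0.5 1
console.log(gini([0, 0, 0]));                                 // 0 (pure node)
console.log(splitImpurity([0, 0, 0], [1, 1, 2]).toFixed(3));  // 0.222
```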

3. Clustering

3.1 K-Means Clustering

Centroid-based clustering algorithm.

import { KMeans } from 'simple-ml';

// Create K-Means model
const kmeans = new KMeans({
  nClusters: 3,           // Number of clusters (required)
  maxIterations: 300,     // Max iterations (default: 300)
  tolerance: 1e-4,        // Convergence tolerance
  initMethod: 'kmeans++', // 'kmeans++' or 'random'
  nInit: 10,              // Number of initializations
  randomState: 42         // Random seed for reproducibility
});

// Data to cluster
const X = [
  [1, 2], [1.5, 1.8], [1, 0.6],     // Cluster 1
  [5, 8], [6, 9], [5, 7],            // Cluster 2
  [10, 2], [9, 3], [10, 3]           // Cluster 3
];

// Fit model
kmeans.fit(X);

// Get cluster labels
console.log('Labels:', kmeans.labels);
// [0, 0, 0, 1, 1, 1, 2, 2, 2]

// Get cluster centroids
console.log('Centroids:', kmeans.centroids);
// [[1.17, 1.47], [5.33, 8.0], [9.67, 2.67]]

// Get inertia (sum of squared distances)
console.log('Inertia:', kmeans.inertia);

// Predict cluster for new data
const newData = [[1.2, 1.9], [5.5, 8.2], [9.5, 2.8]];
const predictions = kmeans.predict(newData);
console.log('Predictions:', predictions);  // [0, 1, 2]

Finding Optimal K (Elbow Method):

// Test different numbers of clusters (k cannot exceed the 9 samples in X)
const inertias = [];

for (let k = 1; k <= 8; k++) {
  const model = new KMeans({ nClusters: k, nInit: 10 });
  model.fit(X);
  inertias.push(model.inertia);
  console.log(`K=${k}: Inertia = ${model.inertia.toFixed(2)}`);
}

// Plot inertias to find "elbow"
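
Each K-Means iteration alternates two steps: assign every point to its nearest centroid, then move each centroid to the mean of its assigned points, stopping once assignments no longer change. The sketch below runs this loop (Lloyd's algorithm) from fixed starting centroids, so it leaves out k-means++ initialization and the `nInit` restarts; the `sqDist` and `lloyd` helpers are hypothetical. Started from the first point of each group in the example data, it reaches the labels and centroids listed earlier in this section.

```javascript
const sqDist = (a, b) => a.reduce((s, v, i) => s + (v - b[i]) ** 2, 0);

// Lloyd's algorithm from fixed starting centroids (sketch: no k-means++, no restarts)
function lloyd(X, initialCentroids, maxIterations = 300) {
  let centroids = initialCentroids.map(c => [...c]);
  let labels = [];
  for (let iter = 0; iter < maxIterations; iter++) {
    // Assignment step: each point joins its nearest centroid
    const newLabels = X.map(x => {
      let best = 0;
      for (let c = 1; c < centroids.length; c++) {
        if (sqDist(x, centroids[c]) < sqDist(x, centroids[best])) best = c;
      }
      return best;
    });
    // Stop when assignments no longer change
    if (newLabels.every((l, i) => l === labels[i])) break;
    labels = newLabels;
    // Update step: move each centroid to the mean of its points (keep it if empty)
    centroids = centroids.map((old, c) => {
      const members = X.filter((_, i) => labels[i] === c);
      if (members.length === 0) return old;
      return old.map((_, j) => members.reduce((s, m) => s + m[j], 0) / members.length);
    });
  }
  // Inertia: sum of squared distances from each point to its assigned centroid
  const inertia = X.reduce((s, x, i) => s + sqDist(x, centroids[labels[i]]), 0);
  return { labels, centroids, inertia };
}

const XKm = [[1, 2], [1.5, 1.8], [1, 0.6], [5, 8], [6, 9], [5, 7], [10, 2], [9, 3], [10, 3]];
const km = lloyd(XKm, [[1, 2], [5, 8], [10, 2]]);
console.log(km.labels);    // [0, 0, 0, 1, 1, 1, 2, 2, 2]
console.log(km.centroids); // approximately [[1.17, 1.47], [5.33, 8], [9.67, 2.67]]
```

Inertia can only fall as k grows, which is why the elbow method looks for the point where adding clusters stops paying off rather than for the minimum.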

4. Preprocessing

4.1 StandardScaler

Z-score normalization (mean=0, std=1).

import { StandardScaler } from 'simple-ml';

// Create scaler
const scaler = new StandardScaler({
  withMean: true,  // Center data (default: true)
  withStd: true    // Scale to unit variance (default: true)
});

// Data to scale
const X = [
  [1, 2],
  [3, 4],
  [5, 6],
  [7, 8]
];

// Fit and transform
const XScaled = scaler.fitTransform(X);
console.log('Scaled data:', XScaled);

// Access learned parameters
console.log('Mean:', scaler.mean);      // [4, 5]
console.log('Std:', scaler.std);        // [2.236, 2.236]

// Transform new data
const newData = [[9, 10]];
const newScaled = scaler.transform(newData);
console.log('New data scaled:', newScaled);

// Inverse transform
const original = scaler.inverseTransform(XScaled);
console.log('Original data:', original);
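
Z-score scaling subtracts each column's mean and divides by its standard deviation. Dividing the squared deviations by n (the population standard deviation) reproduces the 2.236 shown above; dividing by n - 1 would give about 2.582 instead. A from-scratch sketch (the `standardize` helper is hypothetical, and the handling of constant columns is an assumption):

```javascript
// Column-wise z-score standardization using the population standard deviation
function standardize(X) {
  const n = X.length;
  const p = X[0].length;
  const mean = Array.from({ length: p }, (_, j) => X.reduce((s, r) => s + r[j], 0) / n);
  const std = Array.from({ length: p }, (_, j) =>
    Math.sqrt(X.reduce((s, r) => s + (r[j] - mean[j]) ** 2, 0) / n));
  // A constant column (std = 0) is only centered, by dividing by 1 instead
  const scaled = X.map(r => r.map((v, j) => (v - mean[j]) / (std[j] || 1)));
  return { mean, std, scaled };
}

const { mean: zMean, std: zStd, scaled: zScaled } = standardize([[1, 2], [3, 4], [5, 6], [7, 8]]);
console.log(zMean);                        // [4, 5]
console.log(zStd.map(s => s.toFixed(3)));  // ['2.236', '2.236']
```

transform on new data reuses the stored mean and std rather than recomputing them, which keeps test data on the same scale as the training data.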

4.2 MinMaxScaler

Scale features to a specified range.

import { MinMaxScaler } from 'simple-ml';

// Create scaler
const scaler = new MinMaxScaler({
  featureRange: [0, 1]  // Target range (default: [0, 1])
});

const X = [
  [1, 2],
  [3, 4],
  [5, 6]
];

// Fit and transform
const XScaled = scaler.fitTransform(X);
console.log('Scaled to [0,1]:', XScaled);
// [[0, 0], [0.5, 0.5], [1, 1]]

// Access min and max
console.log('Data min:', scaler.dataMin);
console.log('Data max:', scaler.dataMax);

// Transform new data
const newScaled = scaler.transform([[7, 8]]);
console.log('New data scaled:', newScaled);  // [[1.5, 1.5]]

// Inverse transform
const original = scaler.inverseTransform(XScaled);
console.log('Original:', original);

Custom Range Example:

// Scale to [-1, 1]
const customScaler = new MinMaxScaler({ featureRange: [-1, 1] });
const scaled = customScaler.fitTransform(X);
console.log('Scaled to [-1,1]:', scaled);

4.3 RobustScaler

Robust scaling using median and IQR (resistant to outliers).

import { RobustScaler } from 'simple-ml';

// Create scaler
const scaler = new RobustScaler({
  withCentering: true,    // Center using median
  withScaling: true,      // Scale using IQR
  quantileRange: [25, 75] // IQR percentiles
});

// Data with outliers
const X = [
  [1, 2],
  [2, 3],
  [3, 4],
  [100, 200]  // Outlier
];

// Fit and transform
const XScaled = scaler.fitTransform(X);
console.log('Robust scaled:', XScaled);

// Access median and IQR
console.log('Median:', scaler.center);
console.log('IQR:', scaler.scale);

// Transform and inverse
const newData = [[50, 60]];
const scaled = scaler.transform(newData);
const original = scaler.inverseTransform(scaled);

4.4 LabelEncoder

Encode categorical labels to integers.

import { LabelEncoder } from 'simple-ml';

// Create encoder
const le = new LabelEncoder();

// Categorical labels
const labels = ['cat', 'dog', 'cat', 'bird', 'dog', 'cat'];

// Fit and transform
const encoded = le.fitTransform(labels);
console.log('Encoded:', encoded);  // [0, 1, 0, 2, 1, 0]

// Access classes
console.log('Classes:', le.classes);  // ['cat', 'dog', 'bird']

// Transform new labels
const newEncoded = le.transform(['dog', 'cat']);
console.log('New encoded:', newEncoded);  // [1, 0]

// Inverse transform
const original = le.inverseTransform(encoded);
console.log('Original labels:', original);

Numeric Labels Example:

// Works with numbers too
const numLabels = [10, 20, 10, 30, 20];
const encodedNums = le.fitTransform(numLabels);
console.log('Encoded numbers:', encodedNums);  // [0, 1, 0, 2, 1]
console.log('Classes:', le.classes);       // [10, 20, 30]

4.5 OneHotEncoder

Convert categorical features to binary columns.

import { OneHotEncoder } from 'simple-ml';

// Create encoder
const ohe = new OneHotEncoder({
  dropFirst: false,  // Set true to drop each feature's first column (avoids multicollinearity)
  sparse: false      // Return dense array
});

// Categorical data
const X = [
  ['red'],
  ['blue'],
  ['green'],
  ['red'],
  ['blue']
];

// Fit and transform
const encoded = ohe.fitTransform(X);
console.log('One-hot encoded:', encoded);
// [[1, 0, 0],
//  [0, 1, 0],
//  [0, 0, 1],
//  [1, 0, 0],
//  [0, 1, 0]]

// Access categories
console.log('Categories:', ohe.categories);
// [['red', 'blue', 'green']]

// Transform new data
const newEncoded = ohe.transform([['green'], ['red']]);
console.log('New encoded:', newEncoded);

// Inverse transform
const original = ohe.inverseTransform(encoded);
console.log('Original:', original);

Multiple Features Example:

// Multiple categorical features
const X = [
  ['red', 'small'],
  ['blue', 'large'],
  ['red', 'large']
];

const encoder = new OneHotEncoder();
const encoded = encoder.fitTransform(X);
console.log('Encoded (multiple features):', encoded);
console.log('Categories:', encoder.categories);

4.6 SimpleImputer

Fill missing values in dataset.

import { SimpleImputer } from 'simple-ml';

// Create imputer
const imputer = new SimpleImputer({
  strategy: 'mean',    // 'mean', 'median', 'most_frequent', 'constant'
  fillValue: null      // Value for 'constant' strategy
});

// Data with missing values (null)
const X = [
  [1, 2],
  [null, 3],
  [7, null],
  [4, 5]
];

// Fit and transform
const XFilled = imputer.fitTransform(X);
console.log('Filled data:', XFilled);
// [[1, 2],
//  [4, 3],  // null filled with mean (4)
//  [7, 3.33],  // null filled with mean (3.33)
//  [4, 5]]

// Access learned statistics
console.log('Statistics:', imputer.statistics);

// Transform new data
const newData = [[null, 6]];
const filled = imputer.transform(newData);
console.log('New data filled:', filled);

Different Strategies:

// Median strategy
const medianImputer = new SimpleImputer({ strategy: 'median' });
const filled1 = medianImputer.fitTransform(X);

// Most frequent strategy
const modeImputer = new SimpleImputer({ strategy: 'most_frequent' });
const filled2 = modeImputer.fitTransform(X);

// Constant strategy
const constantImputer = new SimpleImputer({
  strategy: 'constant',
  fillValue: 0
});
const filled3 = constantImputer.fitTransform(X);
console.log('Filled with zeros:', filled3);
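
Mean imputation learns one statistic per column from the observed (non-missing) values during fit, then substitutes it wherever a value is missing; transform reuses the same statistics on new data. A minimal sketch that treats `null` and `undefined` as missing (an assumption; `imputeMean` is a hypothetical name). It reproduces the column means 4 and 3.33 from the example above.

```javascript
// Mean imputation: replace missing values with the mean of the column's observed values
function imputeMean(X) {
  const p = X[0].length;
  const isMissing = v => v === null || v === undefined;
  const statistics = Array.from({ length: p }, (_, j) => {
    const observed = X.map(r => r[j]).filter(v => !isMissing(v));
    return observed.reduce((a, b) => a + b, 0) / observed.length;
  });
  const filled = X.map(r => r.map((v, j) => (isMissing(v) ? statistics[j] : v)));
  return { statistics, filled };
}

const imputed = imputeMean([[1, 2], [null, 3], [7, null], [4, 5]]);
console.log(imputed.statistics); // [4, 3.3333333333333335]
console.log(imputed.filled);     // [[1, 2], [4, 3], [7, 3.3333333333333335], [4, 5]]
```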

5. Metrics

5.1 Regression Metrics

import {
  meanAbsoluteError,
  meanSquaredError,
  rootMeanSquaredError,
  r2Score,
  meanAbsolutePercentageError,
  maxError
} from 'simple-ml';

const yTrue = [3, -0.5, 2, 7];
const yPred = [2.5, 0.0, 2, 8];

// Mean Absolute Error
const mae = meanAbsoluteError(yTrue, yPred);
console.log('MAE:', mae);  // 0.5

// Mean Squared Error
const mse = meanSquaredError(yTrue, yPred);
console.log('MSE:', mse);  // 0.375

// Root Mean Squared Error
const rmse = rootMeanSquaredError(yTrue, yPred);
console.log('RMSE:', rmse);  // 0.612

// R² Score (coefficient of determination)
const r2 = r2Score(yTrue, yPred);
console.log('R² Score:', r2);  // 0.948

// Mean Absolute Percentage Error
const mape = meanAbsolutePercentageError(yTrue, yPred);
console.log('MAPE:', mape);

// Maximum Error
const maxErr = maxError(yTrue, yPred);
console.log('Max Error:', maxErr);  // 1.0
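
These metrics follow directly from their definitions, which the sketch below writes out for the same yTrue and yPred (helper names are hypothetical). It leaves out MAPE because libraries differ on whether it is reported as a fraction or a percentage. Note that R² divides by the spread of yTrue around its mean, so it is undefined when that spread is zero, for example with a single test sample.

```javascript
// Regression metrics from their definitions (sketch)
const avg = a => a.reduce((s, v) => s + v, 0) / a.length;

const mae = (t, p) => avg(t.map((v, i) => Math.abs(v - p[i])));
const mse = (t, p) => avg(t.map((v, i) => (v - p[i]) ** 2));
const rmse = (t, p) => Math.sqrt(mse(t, p));
const maxErr = (t, p) => Math.max(...t.map((v, i) => Math.abs(v - p[i])));

// R² = 1 - SS_res / SS_tot, with SS_tot measured around the mean of yTrue
function r2(t, p) {
  const m = avg(t);
  const ssRes = t.reduce((s, v, i) => s + (v - p[i]) ** 2, 0);
  const ssTot = t.reduce((s, v) => s + (v - m) ** 2, 0);
  return 1 - ssRes / ssTot;
}

const yT = [3, -0.5, 2, 7];
const yP = [2.5, 0.0, 2, 8];
console.log(mae(yT, yP), mse(yT, yP), rmse(yT, yP).toFixed(3), r2(yT, yP).toFixed(3), maxErr(yT, yP));
// 0.5 0.375 0.612 0.949 1
```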

5.2 Classification Metrics

import {
  accuracy,
  precision,
  recall,
  f1Score,
  confusionMatrix,
  classificationReport
} from 'simple-ml';

const yTrue = [0, 1, 2, 0, 1, 2, 0, 1, 2];
const yPred = [0, 2, 1, 0, 1, 2, 0, 2, 2];

// Accuracy
const acc = accuracy(yTrue, yPred);
console.log('Accuracy:', acc);  // 0.667

// Precision (per class or average)
const prec = precision(yTrue, yPred, { average: 'macro' });
console.log('Precision:', prec);

// Recall
const rec = recall(yTrue, yPred, { average: 'macro' });
console.log('Recall:', rec);

// F1 Score
const f1 = f1Score(yTrue, yPred, { average: 'macro' });
console.log('F1 Score:', f1);

// Confusion Matrix
const cm = confusionMatrix(yTrue, yPred);
console.log('Confusion Matrix:', cm);
// [[3, 0, 0],
//  [0, 1, 2],
//  [0, 1, 2]]

// Classification Report (comprehensive)
const report = classificationReport(yTrue, yPred);
console.log('Classification Report:', report);
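
Most classification metrics can be read off the confusion matrix: precision for a class divides its diagonal entry by its column sum (everything predicted as that class), and recall divides by its row sum (everything that truly is that class). Macro averaging takes the unweighted mean across classes. A sketch assuming numeric labels and scoring a class with no predictions as 0 (helper names are hypothetical; simple-ml's zero-division handling may differ):

```javascript
// Confusion matrix: rows = true class, columns = predicted class (numeric labels)
function confusion(yTrue, yPred) {
  const labels = [...new Set([...yTrue, ...yPred])].sort((a, b) => a - b);
  const index = new Map(labels.map((l, i) => [l, i]));
  const m = labels.map(() => new Array(labels.length).fill(0));
  yTrue.forEach((t, i) => { m[index.get(t)][index.get(yPred[i])] += 1; });
  return m;
}

// Macro averaging: compute the metric per class, then take the unweighted mean
function macroPrecisionRecall(m) {
  const k = m.length;
  let precisionSum = 0;
  let recallSum = 0;
  for (let c = 0; c < k; c++) {
    const tp = m[c][c];
    const predicted = m.reduce((s, row) => s + row[c], 0); // column sum
    const actual = m[c].reduce((s, v) => s + v, 0);         // row sum
    precisionSum += predicted === 0 ? 0 : tp / predicted;
    recallSum += actual === 0 ? 0 : tp / actual;
  }
  return { precision: precisionSum / k, recall: recallSum / k };
}

const cmSketch = confusion([0, 1, 2, 0, 1, 2, 0, 1, 2], [0, 2, 1, 0, 1, 2, 0, 2, 2]);
console.log(cmSketch);                       // [[3, 0, 0], [0, 1, 2], [0, 1, 2]]
console.log(macroPrecisionRecall(cmSketch)); // precision and recall both about 0.667
```

Per class, precision is 1, 0.5 and 0.5 and recall is 1, 1/3 and 2/3, so both macro averages come out to 2/3.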

Binary Classification Metrics:

const yTrue = [0, 0, 1, 1, 0, 1, 1, 0];
const yPred = [0, 1, 1, 1, 0, 0, 1, 0];

console.log('Binary Accuracy:', accuracy(yTrue, yPred));
console.log('Binary Precision:', precision(yTrue, yPred));
console.log('Binary Recall:', recall(yTrue, yPred));
console.log('Binary F1:', f1Score(yTrue, yPred));

5.3 Clustering Metrics

import {
  silhouetteScore,
  daviesBouldinScore,
  calinskiHarabaszScore
} from 'simple-ml';

const X = [
  [1, 2], [1.5, 1.8], [1, 0.6],
  [5, 8], [6, 9], [5, 7],
  [10, 2], [9, 3], [10, 3]
];
const labels = [0, 0, 0, 1, 1, 1, 2, 2, 2];

// Silhouette Score (higher is better, range: [-1, 1])
const silhouette = silhouetteScore(X, labels);
console.log('Silhouette Score:', silhouette);

// Davies-Bouldin Score (lower is better)
const db = daviesBouldinScore(X, labels);
console.log('Davies-Bouldin Score:', db);

// Calinski-Harabasz Score (higher is better)
const ch = calinskiHarabaszScore(X, labels);
console.log('Calinski-Harabasz Score:', ch);

6. Model Selection

6.1 Train-Test Split

import { trainTestSplit } from 'simple-ml';

const X = [[1], [2], [3], [4], [5], [6], [7], [8]];
const y = [2, 4, 6, 8, 10, 12, 14, 16];

// Basic split
const { XTrain, XTest, yTrain, yTest } = trainTestSplit(X, y, {
  testSize: 0.25,      // 25% for testing (default: 0.25)
  shuffle: true,       // Shuffle before splitting (default: true)
  randomState: 42      // Random seed for reproducibility
});

console.log('Training samples:', XTrain.length);  // 6
console.log('Test samples:', XTest.length);       // 2

console.log('X Train:', XTrain);
console.log('y Train:', yTrain);
console.log('X Test:', XTest);
console.log('y Test:', yTest);

Stratified Split (for classification):

const X = [[1, 2], [2, 3], [3, 4], [4, 5], [5, 6], [6, 7]];
const y = [0, 0, 0, 1, 1, 1];

const { XTrain, XTest, yTrain, yTest } = trainTestSplit(X, y, {
  testSize: 0.33,
  stratify: y,  // Maintain class proportions
  randomState: 42
});

console.log('Train labels:', yTrain);
console.log('Test labels:', yTest);
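
A shuffled split shuffles the sample indices with a seeded random generator, so the same seed gives the same split, then holds out a fraction for testing. The sketch below uses Fisher-Yates with the small mulberry32 generator and rounds the test size up, as scikit-learn does; both choices are assumptions, so the seed will not reproduce simple-ml's `randomState` shuffles. `mulberry32` and `splitTrainTest` are hypothetical names.

```javascript
// Small seeded PRNG (mulberry32) returning floats in [0, 1)
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6D2B79F5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Fisher-Yates shuffle of indices, then hold out ceil(testSize * n) samples
function splitTrainTest(X, y, testSize = 0.25, seed = 42) {
  const rand = mulberry32(seed);
  const idx = X.map((_, i) => i);
  for (let i = idx.length - 1; i > 0; i--) {
    const j = Math.floor(rand() * (i + 1));
    [idx[i], idx[j]] = [idx[j], idx[i]];
  }
  const nTest = Math.ceil(testSize * X.length);
  const testIdx = idx.slice(0, nTest);
  const trainIdx = idx.slice(nTest);
  return {
    XTrain: trainIdx.map(i => X[i]), yTrain: trainIdx.map(i => y[i]),
    XTest: testIdx.map(i => X[i]), yTest: testIdx.map(i => y[i])
  };
}

const Xs = [[1], [2], [3], [4], [5], [6], [7], [8]];
const ysSplit = Xs.map(([x]) => 2 * x);
const parts = splitTrainTest(Xs, ysSplit, 0.25, 42);
console.log(parts.XTrain.length, parts.XTest.length); // 6 2
```

Shuffling indices rather than the arrays themselves keeps each row of X paired with its label in y.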

6.2 Cross-Validation

import { crossValidate } from 'simple-ml';
import { LinearRegression } from 'simple-ml';

const X = [[1], [2], [3], [4], [5], [6], [7], [8]];
const y = [2, 4, 6, 8, 10, 12, 14, 16];

// 5-fold cross-validation
const model = new LinearRegression();
const cvResults = crossValidate(model, X, y, {
  cv: 5,              // Number of folds (default: 5)
  scoring: 'r2',      // Scoring method
  shuffle: true,
  randomState: 42
});

console.log('Fold Scores:', cvResults.scores);
console.log('Mean Score:', cvResults.meanScore);
console.log('Std Score:', cvResults.stdScore);
console.log('Fit Times:', cvResults.fitTimes);
console.log('Score Times:', cvResults.scoreTimes);

Cross-Validation for Classification:

import { KNeighborsClassifier } from 'simple-ml';

const X = [
  [1, 2], [2, 3], [3, 1], [6, 5], [7, 7], [8, 6]
];
const y = [0, 0, 0, 1, 1, 1];

const knn = new KNeighborsClassifier({ k: 3 });
const results = crossValidate(knn, X, y, {
  cv: 3,
  scoring: 'accuracy'
});

console.log('CV Accuracy:', results.meanScore);
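
K-fold cross-validation cuts the sample indices into k contiguous folds; each fold serves once as the test set while the model is refit on the remaining folds, and the k scores are then averaged. The sketch below generates the unshuffled index folds the way scikit-learn's KFold does, giving the first n % k folds one extra sample; simple-ml may shuffle first when `shuffle: true`, and `kFoldIndices` is a hypothetical name.

```javascript
// K-fold split indices (no shuffling): the first n % k folds get one extra sample
function kFoldIndices(n, k) {
  const folds = [];
  let start = 0;
  for (let f = 0; f < k; f++) {
    const size = Math.floor(n / k) + (f < n % k ? 1 : 0);
    const test = Array.from({ length: size }, (_, i) => start + i);
    const train = Array.from({ length: n }, (_, i) => i)
      .filter(i => i < start || i >= start + size);
    folds.push({ train, test });
    start += size;
  }
  return folds;
}

const folds = kFoldIndices(8, 5);
console.log(folds.map(f => f.test)); // [[0, 1], [2, 3], [4, 5], [6], [7]]
```

Without shuffling, sorted data (such as the classification example, where all class 0 samples come first) can leave a fold's training set missing a class entirely, which is one reason to shuffle before splitting.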

7. Complete Pipeline Example

Combining multiple components in a machine learning pipeline:

import {
  LinearRegression,
  KNeighborsClassifier,
  StandardScaler,
  LabelEncoder,
  trainTestSplit,
  crossValidate,
  r2Score,
  accuracy,
  meanSquaredError
} from 'simple-ml';

// ========== REGRESSION PIPELINE ==========

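// 1. Load and prepare data (the two features are not collinear)
const XRaw = [
  [1, 100], [2, 240], [3, 290], [4, 410], [5, 520],
  [6, 590], [7, 730], [8, 790], [9, 920], [10, 1010]
];
const yReg = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100];

// 2. Split data first so the test samples stay unseen during preprocessing
const { XTrain, XTest, yTrain, yTest } = trainTestSplit(
  XRaw, yReg, { testSize: 0.2, randomState: 42 }
);

// 3. Scale features: fit the scaler on the training split only
const scaler = new StandardScaler();
const XTrainScaled = scaler.fitTransform(XTrain);
const XTestScaled = scaler.transform(XTest);

// 4. Train model
const regModel = new LinearRegression();
regModel.fit(XTrainScaled, yTrain);

// 5. Evaluate
const yPred = regModel.predict(XTestScaled);
console.log('Test R²:', r2Score(yTest, yPred));
console.log('Test MSE:', meanSquaredError(yTest, yPred));

// 6. Cross-validation (OLS predictions do not depend on feature scale, so raw features are fine here)
const cvResults = crossValidate(new LinearRegression(), XRaw, yReg, { cv: 5 });
console.log('CV R²:', cvResults.meanScore);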

// ========== CLASSIFICATION PIPELINE ==========

// 1. Prepare classification data
const XClass = [
  [5.1, 3.5], [4.9, 3.0], [7.0, 3.2],
  [6.4, 3.2], [5.9, 3.0], [6.3, 2.5]
];
const yClass = ['setosa', 'setosa', 'versicolor', 'versicolor', 'virginica', 'virginica'];

// 2. Encode labels
const labelEncoder = new LabelEncoder();
const yEncoded = labelEncoder.fitTransform(yClass);

// 3. Split data first so the test samples stay unseen during preprocessing
const split = trainTestSplit(XClass, yEncoded, { testSize: 0.33 });

// 4. Scale features: fit the scaler on the training split only
const classScaler = new StandardScaler();
const XClassTrain = classScaler.fitTransform(split.XTrain);
const XClassTest = classScaler.transform(split.XTest);

// 5. Train classifier
const classifier = new KNeighborsClassifier({ k: 3 });
classifier.fit(XClassTrain, split.yTrain);

// 6. Evaluate
const predictions = classifier.predict(XClassTest);
console.log('Test Accuracy:', accuracy(split.yTest, predictions));

// 7. Predict new sample
const newSample = [[6.0, 3.0]];
const newScaled = classScaler.transform(newSample);
const pred = classifier.predict(newScaled);
const predLabel = labelEncoder.inverseTransform(pred);
console.log('Prediction for new sample:', predLabel);

⚠️ Best Practices & Important Notes

Dataset Size Recommendations

For reliable model evaluation:

  • Minimum: 10-20 samples total
  • Recommended: 50+ samples for simple models, 100+ for complex models
  • Test set: Use at least 5-10 samples in the test set

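Why? With very small test sets (1-2 samples), metrics like R² are not meaningful: R² is undefined for a single test sample, because the test targets have zero variance, and very noisy with only two: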

// ❌ Too small - test set has only 1 sample
const XTiny = [[1], [2], [3], [4], [5]];
const yTiny = [2, 4, 6, 8, 10];
const tinySplit = trainTestSplit(XTiny, yTiny, { testSize: 0.2 }); // Only 1 test sample!

// ✅ Better - reasonable test set size
const XSmall = [[1], [2], [3], [4], [5], [6], [7], [8], [9], [10]];
const ySmall = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20];
const smallSplit = trainTestSplit(XSmall, ySmall, { testSize: 0.2 }); // 2 test samples

// ✅ Recommended - adequate test set
const XLarge = Array.from({ length: 50 }, (_, i) => [i + 1]);
const yLarge = XLarge.map(x => 2 * x[0] + Math.random());  // linear trend plus noise
const largeSplit = trainTestSplit(XLarge, yLarge, { testSize: 0.2 }); // 10 test samples

Feature Scaling

Always scale features when using distance-based algorithms (KNN) or gradient descent (Logistic Regression):

import { StandardScaler, KNeighborsClassifier } from 'simple-ml';

// Features with different scales (without scaling, the second dominates distances)
const X = [[1, 1000], [2, 2000], [3, 3000], [4, 4000]];
const y = [0, 0, 1, 1];

// Scale before training
const scaler = new StandardScaler();
const XScaled = scaler.fitTransform(X);

const model = new KNeighborsClassifier({ k: 3 });
model.fit(XScaled, y);

Handling Missing Values

Use SimpleImputer before training any model:

import { SimpleImputer, KNeighborsClassifier } from 'simple-ml';

const X = [[1, 2], [null, 3], [7, null], [4, 5]];
const y = [0, 0, 1, 1];

const imputer = new SimpleImputer({ strategy: 'mean' });
const XFilled = imputer.fitTransform(X);

// Now safe to use with any model
const model = new KNeighborsClassifier({ k: 3 });
model.fit(XFilled, y);

Cross-Validation vs Train-Test Split

Use cross-validation for small datasets:

// Assumes `model`, `X` and `y` are already defined (see the examples above)
// For small datasets (<100 samples), use cross-validation
const cvResults = crossValidate(model, X, y, { cv: 5 });
console.log('Mean CV Score:', cvResults.meanScore);

// For large datasets, train-test split is faster
const { XTrain, XTest, yTrain, yTest } = trainTestSplit(X, y);
model.fit(XTrain, yTrain);
console.log('Test Score:', model.score(XTest, yTest));

🌐 Browser Usage

Interactive Demos

# Start local server
npm run dev

# Access demos:
# - http://localhost:5000/examples/browser-example.html
# - http://localhost:5000/examples/simple-demo.html

Browser Example

<!DOCTYPE html>
<html>
<head>
    <title>Simple-ML Demo</title>
</head>
<body>
    <h1>Machine Learning in the Browser</h1>
    <button onclick="runDemo()">Run Demo</button>
    <pre id="output"></pre>

    <script src="https://unpkg.com/simple-ml/dist/simple-ml.umd.js"></script>

    <script>
        function runDemo() {
            const {
                LinearRegression,
                StandardScaler,
                trainTestSplit,
                r2Score
            } = SimpleML;

            // Data
            const X = [[1], [2], [3], [4], [5], [6]];
            const y = [2, 4, 6, 8, 10, 12];

            // Scale
            const scaler = new StandardScaler();
            const XScaled = scaler.fitTransform(X);

            // Split
            const { XTrain, XTest, yTrain, yTest } =
                trainTestSplit(XScaled, y, { testSize: 0.3 });

            // Train
            const model = new LinearRegression();
            model.fit(XTrain, yTrain);

            // Predict
            const predictions = model.predict(XTest);

            // Evaluate
            const score = r2Score(yTest, predictions);

            // Display
            document.getElementById('output').textContent = `
R² Score: ${score.toFixed(4)}
Predictions: ${predictions.map(p => p.toFixed(2))}
Actual: ${yTest}
            `;
        }
    </script>
</body>
</html>

🛠️ Build Formats

After running npm run build:

  • dist/simple-ml.umd.js - Browser global SimpleML
  • dist/simple-ml.modern.js - ES2017+ for modern browsers
  • dist/simple-ml.module.js - ES Modules for bundlers
  • dist/simple-ml.cjs - CommonJS for Node.js

🧪 Testing

npm test       # Run all tests
npm run build  # Build for browser
npm run dev    # Watch mode

📝 License

MIT


🤝 Contributing

Contributions welcome! Please:

  • Maintain consistent API patterns
  • Add comprehensive input validation
  • Include tests for new features
  • Follow existing code style

Built with ❤️ in pure JavaScript