simple-ml

v1.0.4

Published

4 months ago

A simple, robust JavaScript machine learning library with regression, classification, clustering, and preprocessing algorithms


vbfs

machine-learning ml regression classification clustering preprocessing statistics linear-regression logistic-regression knn naive-bayes k-means decision-tree

Simple-ML

A simple, robust JavaScript machine learning library built from scratch with no external dependencies. Simple-ML provides easy-to-use implementations of popular machine learning algorithms for regression, classification, clustering, and data preprocessing.

🎯 Philosophy

Simplicity: Intuitive and consistent API
Robustness: Rigorous input validation and edge case handling
Performance: Optimized pure JavaScript implementations
Modularity: Clear organizational structure

📦 Installation

Node.js / NPM

npm install simple-ml

Browser (via CDN)

<!-- Via unpkg CDN -->
<script src="https://unpkg.com/simple-ml/dist/simple-ml.umd.js"></script>
<script>
  const { LinearRegression } = SimpleML;
  const model = new LinearRegression();
</script>

ES Modules (Modern Browsers)

<script type="module">
  import { LinearRegression } from 'https://unpkg.com/simple-ml/dist/simple-ml.modern.js';
  const model = new LinearRegression();
</script>

🚀 Quick Start

import { LinearRegression, trainTestSplit } from 'simple-ml';

// Prepare your data (use at least 10-20 samples for reliable results)
const X = [[1], [2], [3], [4], [5], [6], [7], [8], [9], [10]];
const y = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20];

// Split into training and test sets
const { XTrain, XTest, yTrain, yTest } = trainTestSplit(X, y, { testSize: 0.2 });

// Create and train model
const model = new LinearRegression();
model.fit(XTrain, yTrain);

// Make predictions
const predictions = model.predict(XTest);

// Evaluate model
const score = model.score(XTest, yTest);
console.log('R² Score:', score);  // Close to 1.0 for perfect fit

📚 Complete API Reference with Examples

1. Regression Algorithms

1.1 Linear Regression

Ordinary Least Squares Linear Regression.

import { LinearRegression } from 'simple-ml';

// Create model with options
const model = new LinearRegression({
  fitIntercept: true,  // Whether to calculate intercept (default: true)
  normalize: false      // Whether to normalize features (default: false)
});

// Training data
const X = [[1], [2], [3], [4], [5]];
const y = [2, 4, 6, 8, 10];

// Fit the model
model.fit(X, y);

// Access model parameters
console.log('Coefficients:', model.coefficients);  // [2.0]
console.log('Intercept:', model.intercept);        // 0.0

// Make predictions
const predictions = model.predict([[6], [7]]);
console.log('Predictions:', predictions);  // [12, 14]

// Evaluate model (R² score)
const score = model.score(X, y);
console.log('R² Score:', score);  // 1.0 (perfect fit)

Multiple Features Example:

// Multiple features (not perfectly correlated, so the least-squares solution is unique)
const X = [
  [1, 2],
  [2, 1],
  [3, 4],
  [4, 3]
];
const y = [5, 4, 11, 10];  // y = 1*x1 + 2*x2

const model = new LinearRegression();
model.fit(X, y);

console.log('Coefficients:', model.coefficients);  // [1.0, 2.0]
console.log('Intercept:', model.intercept);

const pred = model.predict([[5, 6]]);
console.log('Prediction:', pred);  // [17]
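
For intuition about what ordinary least squares computes, the single-feature case has a closed form. The sketch below is plain JavaScript and independent of Simple-ML's internals (the function name `fitSimpleOLS` is hypothetical); it reproduces the slope and intercept from the first example above.

```javascript
// Closed-form OLS for one feature:
// slope = Sxy / Sxx, intercept = mean(y) - slope * mean(x)
function fitSimpleOLS(xs, ys) {
  const n = xs.length;
  const meanX = xs.reduce((a, b) => a + b, 0) / n;
  const meanY = ys.reduce((a, b) => a + b, 0) / n;
  let sxy = 0;
  let sxx = 0;
  for (let i = 0; i < n; i++) {
    sxy += (xs[i] - meanX) * (ys[i] - meanY);
    sxx += (xs[i] - meanX) ** 2;
  }
  const slope = sxy / sxx;
  return { slope, intercept: meanY - slope * meanX };
}

const olsFit = fitSimpleOLS([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]);
console.log(olsFit); // { slope: 2, intercept: 0 }
```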

1.2 Ridge Regression

Linear Regression with L2 regularization.

import { RidgeRegression } from 'simple-ml';

// Create Ridge model
const ridge = new RidgeRegression({
  alpha: 1.0,          // Regularization strength (default: 1.0)
  fitIntercept: true,
  normalize: false
});

// Training data
const X = [[1], [2], [3], [4], [5]];
const y = [2.1, 3.9, 6.2, 7.8, 10.1];

// Fit model
ridge.fit(X, y);

console.log('Coefficients:', ridge.coefficients);
console.log('Intercept:', ridge.intercept);

// Make predictions
const predictions = ridge.predict([[6], [7]]);
console.log('Predictions:', predictions);

// Evaluate
const score = ridge.score(X, y);
console.log('R² Score:', score);

Tuning Alpha Example:

// Compare different alpha values
const alphas = [0.1, 1.0, 10.0, 100.0];

alphas.forEach(alpha => {
  const model = new RidgeRegression({ alpha });
  model.fit(X, y);
  const score = model.score(X, y);
  console.log(`Alpha ${alpha}: R² = ${score.toFixed(4)}`);
});
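
To see why larger alpha values lower the training R², it helps to look at the one-feature case, where the ridge slope has a closed form. This is a sketch under the assumption that the intercept is not penalized and that alpha is added directly to the sum of squares; Simple-ML may scale alpha differently. `ridgeSlope` is a hypothetical helper, not a library function.

```javascript
// Ridge for one feature with an unpenalized intercept:
// slope = Sxy / (Sxx + alpha). Larger alpha shrinks the slope toward 0.
function ridgeSlope(xs, ys, alpha) {
  const n = xs.length;
  const meanX = xs.reduce((a, b) => a + b, 0) / n;
  const meanY = ys.reduce((a, b) => a + b, 0) / n;
  let sxy = 0;
  let sxx = 0;
  for (let i = 0; i < n; i++) {
    sxy += (xs[i] - meanX) * (ys[i] - meanY);
    sxx += (xs[i] - meanX) ** 2;
  }
  return sxy / (sxx + alpha);
}

const ridgeXs = [1, 2, 3, 4, 5];
const ridgeYs = [2.1, 3.9, 6.2, 7.8, 10.1];
const ridgeSlopes = [0, 0.1, 1, 10, 100].map(alpha => ridgeSlope(ridgeXs, ridgeYs, alpha));

// Slopes decrease monotonically as alpha grows; alpha = 0 is plain OLS.
console.log(ridgeSlopes.map(s => s.toFixed(3)));
```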

1.3 Lasso Regression

Linear Regression with L1 regularization (feature selection).

import { LassoRegression } from 'simple-ml';

// Create Lasso model
const lasso = new LassoRegression({
  alpha: 0.1,           // Regularization strength (default: 1.0)
  maxIterations: 1000,  // Max iterations for coordinate descent
  tolerance: 1e-4,      // Convergence tolerance
  fitIntercept: true
});

// Training data with correlated features
const X = [
  [1, 1],
  [2, 2],
  [3, 3],
  [4, 4],
  [5, 5]
];
const y = [2, 4, 6, 8, 10];

// Fit model
lasso.fit(X, y);

console.log('Coefficients:', lasso.coefficients);
console.log('Intercept:', lasso.intercept);

// Lasso may zero out some coefficients
console.log('Non-zero features:',
  lasso.coefficients.filter(c => Math.abs(c) > 1e-10).length
);

// Predictions
const predictions = lasso.predict([[6, 6]]);
console.log('Predictions:', predictions);
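
The reason Lasso can set coefficients exactly to zero is the soft-thresholding step used in each coordinate descent update. The following is a minimal sketch of that operator alone (the name `softThreshold` is hypothetical, and the library's exact update rule is not shown here).

```javascript
// Soft-thresholding: shrink z toward 0 by t, and clamp to exactly 0
// when |z| <= t. This is what produces sparse coefficients.
function softThreshold(z, t) {
  if (z > t) return z - t;
  if (z < -t) return z + t;
  return 0;
}

console.log(softThreshold(3, 1));    // 2
console.log(softThreshold(-3, 1));   // -2
console.log(softThreshold(0.5, 1));  // 0  (small effects are removed entirely)
```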

1.4 Logistic Regression

Binary and multiclass classification using the logistic function.

import { LogisticRegression } from 'simple-ml';

// Binary classification
const logReg = new LogisticRegression({
  learningRate: 0.1,     // Learning rate for gradient descent
  maxIterations: 1000,   // Maximum iterations
  tolerance: 1e-4,       // Convergence tolerance
  penalty: 'l2',         // Regularization: 'l2', 'l1', or 'none'
  C: 1.0,                // Inverse regularization strength
  multiClass: 'ovr'      // 'ovr' (one-vs-rest) or 'multinomial' (softmax)
});

// Binary classification data
const X = [
  [1, 2], [2, 3], [3, 1],  // Class 0
  [6, 5], [7, 7], [8, 6]   // Class 1
];
const y = [0, 0, 0, 1, 1, 1];

// Fit model
logReg.fit(X, y);

console.log('Coefficients:', logReg.coefficients);
console.log('Intercept:', logReg.intercept);

// Predict classes
const predictions = logReg.predict([[2, 2], [7, 6]]);
console.log('Predictions:', predictions);  // [0, 1]

// Predict probabilities
const probabilities = logReg.predictProba([[2, 2], [7, 6]]);
console.log('Probabilities:', probabilities);
// [[0.95, 0.05], [0.02, 0.98]]

// Evaluate
const score = logReg.score(X, y);
console.log('Accuracy:', score);

Multiclass Example:

// Multiclass classification
const X = [
  [1, 1], [1, 2], [2, 1],  // Class 0
  [5, 5], [5, 6], [6, 5],  // Class 1
  [9, 9], [9, 10], [10, 9] // Class 2
];
const y = [0, 0, 0, 1, 1, 1, 2, 2, 2];

// Use 'multinomial' when one class sits between the others in feature space
const multiLogReg = new LogisticRegression({
  multiClass: 'multinomial',  // Class 1 lies between classes 0 and 2 here
  learningRate: 0.1,
  maxIterations: 1000
});
multiLogReg.fit(X, y);

const pred = multiLogReg.predict([[2, 2], [6, 6], [10, 10]]);
console.log('Multiclass Predictions:', pred);  // [0, 1, 2]

const proba = multiLogReg.predictProba([[2, 2], [6, 6], [10, 10]]);
console.log('Class Probabilities:', proba);
// [[0.803, 0.195, 0.002],  // → class 0
//  [0.007, 0.708, 0.285],  // → class 1
//  [0.000, 0.057, 0.943]]  // → class 2

console.log('Accuracy:', multiLogReg.score(X, y));  // 1.0

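Choosing multiClass Mode:

'ovr' (One-vs-Rest, default): Trains one binary classifier per class against all the others
- Use for: classes that each occupy their own region, e.g. Animals (cat, dog, bird), Topics (sports, politics, tech)
- Fast, because each classifier is trained separately
'multinomial' (Softmax): Trains all classes jointly with the softmax function
- Use for: classes that follow a progression along the features, e.g. Ratings (low, medium, high), Sizes (S, M, L, XL)
- A middle class such as "medium" cannot be split from "low" and "high" by a single linear one-vs-rest boundary, whereas softmax can still give it the highest score over an interval
- Neither mode uses the order of the labels themselves; both treat the classes as unordered categories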

// Example: 'ovr' for independent categories
const categories = new LogisticRegression({ multiClass: 'ovr' });
const X_cat = [[1, 0], [0, 1], [1, 1]];
const y_cat = ['cat', 'dog', 'bird'];
categories.fit(X_cat, y_cat);

// Example: 'multinomial' for ordered classes
const ratings = new LogisticRegression({ multiClass: 'multinomial' });
const X_rating = [[1, 2], [5, 6], [9, 10]];
const y_rating = ['low', 'medium', 'high'];
ratings.fit(X_rating, y_rating);
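
To illustrate how softmax turns per-class linear scores into probabilities, and how a "middle" class can win over an interval of a single feature, here is a standalone sketch. The weights and biases are made-up values for illustration, not parameters learned by Simple-ML.

```javascript
// Numerically stable softmax: subtract the max score before exponentiating.
function softmax(scores) {
  const maxScore = Math.max(...scores);
  const exps = scores.map(s => Math.exp(s - maxScore));
  const total = exps.reduce((a, b) => a + b, 0);
  return exps.map(e => e / total);
}

// Hypothetical per-class linear scores for one feature x: score_k = w_k * x + b_k
const classWeights = [0, 1, 2];
const classBiases = [0, -3, -10];

function predictClass(x) {
  const probs = softmax(classWeights.map((w, k) => w * x + classBiases[k]));
  return probs.indexOf(Math.max(...probs));
}

console.log([1, 5, 10].map(predictClass)); // [0, 1, 2]
```

With these scores, class 1 has the highest probability for x between 3 and 7, which is the kind of region a single one-vs-rest linear boundary cannot carve out.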

1.5 Polynomial Regression

Regression with polynomial features.

import { PolynomialRegression } from 'simple-ml';

// Create polynomial model
const poly = new PolynomialRegression({
  degree: 2,           // Polynomial degree (default: 2)
  fitIntercept: true,
  normalize: false
});

// Non-linear data
const X = [[1], [2], [3], [4], [5]];
const y = [1, 4, 9, 16, 25];  // y = x²

// Fit model
poly.fit(X, y);

console.log('Coefficients:', poly.coefficients);
console.log('Intercept:', poly.intercept);

// Predictions
const predictions = poly.predict([[6], [7]]);
console.log('Predictions:', predictions);  // [36, 49]

// Evaluate
const score = poly.score(X, y);
console.log('R² Score:', score);  // Close to 1.0

Higher Degree Example:

// Cubic polynomial
const cubicPoly = new PolynomialRegression({ degree: 3 });

const X = [[1], [2], [3], [4]];
const y = [1, 8, 27, 64];  // y = x³

cubicPoly.fit(X, y);
const pred = cubicPoly.predict([[5]]);
console.log('Prediction for x=5:', pred);  // [125]
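
Polynomial regression is linear regression on expanded features. As a rough sketch of that expansion for a single input feature (the helper name `polynomialFeatures` is hypothetical, and it assumes the intercept is handled separately rather than as a column of ones):

```javascript
// Expand each single-feature row [x] into [x, x^2, ..., x^degree].
function polynomialFeatures(X, degree) {
  return X.map(([x]) => Array.from({ length: degree }, (_, p) => x ** (p + 1)));
}

console.log(polynomialFeatures([[2], [3]], 3)); // [[2, 4, 8], [3, 9, 27]]
```

A linear model fitted on these columns can then represent y = x³ exactly, which is why the cubic example extrapolates to 125.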

2. Classification Algorithms

2.1 K-Nearest Neighbors (KNN)

Non-parametric classification based on nearest neighbors.

import { KNeighborsClassifier } from 'simple-ml';

// Create KNN classifier
const knn = new KNeighborsClassifier({
  k: 3,                 // Number of neighbors (default: 5)
  weights: 'uniform'    // 'uniform' or 'distance'
});

// Training data
const X = [
  [1, 2], [2, 3], [3, 1],  // Class 'A'
  [6, 5], [7, 7], [8, 6]   // Class 'B'
];
const y = ['A', 'A', 'A', 'B', 'B', 'B'];

// Fit model (stores training data)
knn.fit(X, y);

// Predict
const predictions = knn.predict([[2, 2], [7, 6]]);
console.log('Predictions:', predictions);  // ['A', 'B']

// Predict with probabilities
const probabilities = knn.predictProba([[2, 2]]);
console.log('Probabilities:', probabilities);

// Evaluate
const score = knn.score(X, y);
console.log('Accuracy:', score);

Distance-Weighted KNN:

// Use distance weighting
const weightedKnn = new KNeighborsClassifier({
  k: 5,
  weights: 'distance'  // Closer neighbors have more influence
});

weightedKnn.fit(X, y);
const pred = weightedKnn.predict([[4, 4]]);
console.log('Distance-weighted prediction:', pred);
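
The difference between 'uniform' and 'distance' weighting is easiest to see on a tiny one-dimensional dataset. The sketch below is a standalone illustration using inverse-distance weights, which is a common choice but an assumption about how Simple-ML weights neighbors; `knnVote` is a hypothetical helper.

```javascript
// Majority vote among the k nearest neighbors, optionally weighted by 1/distance.
function knnVote(trainX, trainY, query, k, weights = 'uniform') {
  const neighbors = trainX
    .map((row, i) => ({
      label: trainY[i],
      dist: Math.hypot(...row.map((v, j) => v - query[j]))
    }))
    .sort((a, b) => a.dist - b.dist)
    .slice(0, k);

  const votes = new Map();
  for (const { label, dist } of neighbors) {
    const w = weights === 'distance' ? 1 / (dist + 1e-12) : 1; // epsilon avoids division by zero
    votes.set(label, (votes.get(label) || 0) + w);
  }
  return [...votes.entries()].sort((a, b) => b[1] - a[1])[0][0];
}

const lineX = [[0], [1], [2], [5], [6]];
const lineY = ['A', 'A', 'A', 'B', 'B'];

console.log(knnVote(lineX, lineY, [4], 5, 'uniform'));   // 'A' (3 votes against 2)
console.log(knnVote(lineX, lineY, [4], 5, 'distance'));  // 'B' (the closest points dominate)
```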

Finding Optimal K:

import { trainTestSplit, accuracy } from 'simple-ml';

const { XTrain, XTest, yTrain, yTest } = trainTestSplit(X, y, { testSize: 0.3 });

// Test different k values (k cannot exceed the number of training samples)
for (let k = 1; k <= Math.min(10, XTrain.length); k++) {
  const model = new KNeighborsClassifier({ k });
  model.fit(XTrain, yTrain);
  const pred = model.predict(XTest);
  const acc = accuracy(yTest, pred);
  console.log(`k=${k}: Accuracy = ${acc.toFixed(3)}`);
}

2.2 Gaussian Naive Bayes

Probabilistic classifier assuming Gaussian distribution.

import { GaussianNaiveBayes } from 'simple-ml';

// Create model
const gnb = new GaussianNaiveBayes({
  priors: null  // Class priors (default: null = uniform)
});

// Training data
const X = [
  [1, 2], [2, 3], [3, 4],  // Class 0
  [6, 7], [7, 8], [8, 9]   // Class 1
];
const y = [0, 0, 0, 1, 1, 1];

// Fit model
gnb.fit(X, y);

// Access learned parameters
console.log('Class Priors:', gnb.classPrior);
console.log('Means:', gnb.theta);
console.log('Variances:', gnb.sigma);

// Predict
const predictions = gnb.predict([[2, 3], [7, 8]]);
console.log('Predictions:', predictions);  // [0, 1]

// Predict probabilities
const probabilities = gnb.predictProba([[4, 5]]);
console.log('Probabilities:', probabilities);

// Evaluate
const score = gnb.score(X, y);
console.log('Accuracy:', score);

2.3 Multinomial Naive Bayes

Naive Bayes for discrete/count features (e.g., text classification).

import { MultinomialNaiveBayes } from 'simple-ml';

// Create model
const mnb = new MultinomialNaiveBayes({
  alpha: 1.0  // Laplace smoothing parameter
});

// Training data (word counts)
const X = [
  [2, 1, 0],  // Document 1: "spam" words
  [1, 1, 0],  // Document 2: "spam" words
  [0, 0, 2],  // Document 3: "ham" words
  [0, 1, 2]   // Document 4: "ham" words
];
const y = ['spam', 'spam', 'ham', 'ham'];

// Fit model
mnb.fit(X, y);

// Predict
const predictions = mnb.predict([[2, 0, 1], [0, 0, 3]]);
console.log('Predictions:', predictions);

// Predict probabilities
const probabilities = mnb.predictProba([[1, 1, 1]]);
console.log('Probabilities:', probabilities);
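
The alpha parameter controls Laplace (additive) smoothing, which keeps a word that never appeared in a class from forcing that class's probability to zero. A sketch of the smoothed per-feature probabilities for the "spam" rows above, using the textbook formula (the helper name is hypothetical and the library's internal representation may differ):

```javascript
// P(feature j | class) = (count_j + alpha) / (totalCount + alpha * nFeatures)
function smoothedFeatureProbs(classRows, alpha) {
  const nFeatures = classRows[0].length;
  const counts = Array(nFeatures).fill(0);
  for (const row of classRows) {
    row.forEach((c, j) => { counts[j] += c; });
  }
  const total = counts.reduce((a, b) => a + b, 0);
  return counts.map(c => (c + alpha) / (total + alpha * nFeatures));
}

const spamRows = [[2, 1, 0], [1, 1, 0]];
console.log(smoothedFeatureProbs(spamRows, 1.0)); // [0.5, 0.375, 0.125]
console.log(smoothedFeatureProbs(spamRows, 0));   // [0.6, 0.4, 0] (unseen word gets probability 0)
```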

2.4 Decision Tree Classifier

Tree-based classifier with interpretable rules.

import { DecisionTreeClassifier } from 'simple-ml';

// Create decision tree
const dt = new DecisionTreeClassifier({
  criterion: 'gini',      // 'gini' or 'entropy'
  maxDepth: 5,            // Maximum tree depth (default: Infinity)
  minSamplesSplit: 2,     // Min samples to split a node
  minSamplesLeaf: 1,      // Min samples in leaf node
  maxFeatures: null       // Max features to consider
});

// Training data
const X = [
  [2.5, 2.5], [3, 3], [2, 3],    // Class 0
  [7, 7], [8, 6], [7, 8],         // Class 1
  [3, 8], [4, 7], [3, 7]          // Class 2
];
const y = [0, 0, 0, 1, 1, 1, 2, 2, 2];

// Fit model
dt.fit(X, y);

// Predict
const predictions = dt.predict([[2.5, 2.5], [7.5, 7], [3.5, 7.5]]);
console.log('Predictions:', predictions);  // [0, 1, 2]

// Evaluate
const score = dt.score(X, y);
console.log('Accuracy:', score);

// Get feature importances (if available)
if (dt.featureImportances) {
  console.log('Feature Importances:', dt.featureImportances);
}

Using Entropy:

const entropyDT = new DecisionTreeClassifier({
  criterion: 'entropy',
  maxDepth: 3
});

entropyDT.fit(X, y);
const pred = entropyDT.predict([[5, 5]]);
console.log('Entropy-based prediction:', pred);
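
Both criteria measure how mixed the labels in a node are; a split is good when it lowers that impurity. The standard definitions can be sketched as follows (standalone helpers with hypothetical names, not the library's own functions):

```javascript
// Fraction of samples belonging to each class.
function classProportions(labels) {
  const counts = new Map();
  for (const label of labels) counts.set(label, (counts.get(label) || 0) + 1);
  return [...counts.values()].map(c => c / labels.length);
}

// Gini impurity: 1 - sum(p^2)
function giniImpurity(labels) {
  return 1 - classProportions(labels).reduce((sum, p) => sum + p * p, 0);
}

// Entropy in bits: sum(p * log2(1 / p))
function entropy(labels) {
  return classProportions(labels).reduce((sum, p) => sum + p * Math.log2(1 / p), 0);
}

console.log(giniImpurity([0, 0, 1, 1]), entropy([0, 0, 1, 1])); // 0.5 1
console.log(giniImpurity([0, 0, 0, 0]), entropy([0, 0, 0, 0])); // 0 0
```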

3. Clustering

3.1 K-Means Clustering

Centroid-based clustering algorithm.

import { KMeans } from 'simple-ml';

// Create K-Means model
const kmeans = new KMeans({
  nClusters: 3,           // Number of clusters (required)
  maxIterations: 300,     // Max iterations (default: 300)
  tolerance: 1e-4,        // Convergence tolerance
  initMethod: 'kmeans++', // 'kmeans++' or 'random'
  nInit: 10,              // Number of initializations
  randomState: 42         // Random seed for reproducibility
});

// Data to cluster
const X = [
  [1, 2], [1.5, 1.8], [1, 0.6],     // Cluster 1
  [5, 8], [6, 9], [5, 7],            // Cluster 2
  [10, 2], [9, 3], [10, 3]           // Cluster 3
];

// Fit model
kmeans.fit(X);

// Get cluster labels
console.log('Labels:', kmeans.labels);
// e.g. [0, 0, 0, 1, 1, 1, 2, 2, 2] (cluster numbering is arbitrary)

// Get cluster centroids
console.log('Centroids:', kmeans.centroids);
// e.g. [[1.17, 1.47], [5.33, 8.0], [9.67, 2.67]] (same order as the labels)

// Get inertia (sum of squared distances)
console.log('Inertia:', kmeans.inertia);

// Predict cluster for new data
const newData = [[1.2, 1.9], [5.5, 8.2], [9.5, 2.8]];
const predictions = kmeans.predict(newData);
console.log('Predictions:', predictions);  // [0, 1, 2]

Finding Optimal K (Elbow Method):

// Test different numbers of clusters (k cannot exceed the number of samples)
const inertias = [];

for (let k = 2; k <= Math.min(10, X.length); k++) {
  const model = new KMeans({ nClusters: k, nInit: 10 });
  model.fit(X);
  inertias.push(model.inertia);
  console.log(`K=${k}: Inertia = ${model.inertia.toFixed(2)}`);
}

// Plot inertias to find "elbow"
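
Inertia is the sum of squared distances from each point to the centroid of its cluster, so it always drops as clusters are added; the "elbow" is where the drop levels off. A standalone sketch of the quantity itself, assuming Euclidean distance (the helper name `clusterInertia` is hypothetical):

```javascript
// Sum of squared distances from each point to the mean of its cluster.
function clusterInertia(X, labels) {
  const groups = new Map();
  X.forEach((point, i) => {
    if (!groups.has(labels[i])) groups.set(labels[i], []);
    groups.get(labels[i]).push(point);
  });

  let inertia = 0;
  for (const points of groups.values()) {
    const centroid = points[0].map(
      (_, j) => points.reduce((s, p) => s + p[j], 0) / points.length
    );
    for (const p of points) {
      inertia += p.reduce((s, v, j) => s + (v - centroid[j]) ** 2, 0);
    }
  }
  return inertia;
}

const linePoints = [[0, 0], [2, 0], [10, 0], [12, 0]];
console.log(clusterInertia(linePoints, [0, 0, 0, 0])); // 104 (one cluster)
console.log(clusterInertia(linePoints, [0, 0, 1, 1])); // 4   (two tight clusters)
```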

4. Preprocessing

4.1 StandardScaler

Z-score normalization (mean=0, std=1).

import { StandardScaler } from 'simple-ml';

// Create scaler
const scaler = new StandardScaler({
  withMean: true,  // Center data (default: true)
  withStd: true    // Scale to unit variance (default: true)
});

// Data to scale
const X = [
  [1, 2],
  [3, 4],
  [5, 6],
  [7, 8]
];

// Fit and transform
const XScaled = scaler.fitTransform(X);
console.log('Scaled data:', XScaled);

// Access learned parameters
console.log('Mean:', scaler.mean);      // [4, 5]
console.log('Std:', scaler.std);        // [2.236, 2.236]

// Transform new data
const newData = [[9, 10]];
const newScaled = scaler.transform(newData);
console.log('New data scaled:', newScaled);

// Inverse transform
const original = scaler.inverseTransform(XScaled);
console.log('Original data:', original);
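
The numbers in the comments above follow from the z-score formula z = (x - mean) / std. The sketch below recomputes them by hand, assuming the population standard deviation (dividing by n), which matches the 2.236 shown above. It is a standalone illustration, not the scaler's source code.

```javascript
// Column-wise z-score: subtract the mean, divide by the population std.
function standardize(X) {
  const n = X.length;
  const mean = X[0].map((_, j) => X.reduce((s, row) => s + row[j], 0) / n);
  const std = X[0].map((_, j) =>
    Math.sqrt(X.reduce((s, row) => s + (row[j] - mean[j]) ** 2, 0) / n)
  );
  const scaled = X.map(row => row.map((v, j) => (v - mean[j]) / std[j]));
  return { mean, std, scaled };
}

const z = standardize([[1, 2], [3, 4], [5, 6], [7, 8]]);
console.log(z.mean);                       // [4, 5]
console.log(z.std.map(s => s.toFixed(3))); // ['2.236', '2.236']
```

Note that this sketch divides by zero for a constant column; a real scaler needs a guard for that case.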

4.2 MinMaxScaler

Scale features to a specified range.

import { MinMaxScaler } from 'simple-ml';

// Create scaler
const scaler = new MinMaxScaler({
  featureRange: [0, 1]  // Target range (default: [0, 1])
});

const X = [
  [1, 2],
  [3, 4],
  [5, 6]
];

// Fit and transform
const XScaled = scaler.fitTransform(X);
console.log('Scaled to [0,1]:', XScaled);
// [[0, 0], [0.5, 0.5], [1, 1]]

// Access min and max
console.log('Data min:', scaler.dataMin);
console.log('Data max:', scaler.dataMax);

// Transform new data
const newScaled = scaler.transform([[7, 8]]);
console.log('New data scaled:', newScaled);  // [[1.5, 1.5]]

// Inverse transform
const original = scaler.inverseTransform(XScaled);
console.log('Original:', original);

Custom Range Example:

// Scale to [-1, 1]
const customScaler = new MinMaxScaler({ featureRange: [-1, 1] });
const scaled = customScaler.fitTransform(X);
console.log('Scaled to [-1,1]:', scaled);
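
Min-max scaling maps each column linearly so that its minimum lands on the lower bound and its maximum on the upper bound. A one-column sketch of the standard formula (the helper `minMaxScale` is hypothetical):

```javascript
// scaled = lo + (x - min) * (hi - lo) / (max - min)
function minMaxScale(column, [lo, hi]) {
  const min = Math.min(...column);
  const max = Math.max(...column);
  return column.map(v => lo + ((v - min) * (hi - lo)) / (max - min));
}

console.log(minMaxScale([1, 3, 5], [0, 1]));   // [0, 0.5, 1]
console.log(minMaxScale([1, 3, 5], [-1, 1]));  // [-1, 0, 1]
```

Values outside the fitted range map outside the target range, which is why [7, 8] scales to 1.5 in the earlier example.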

4.3 RobustScaler

Robust scaling using median and IQR (resistant to outliers).

import { RobustScaler } from 'simple-ml';

// Create scaler
const scaler = new RobustScaler({
  withCentering: true,    // Center using median
  withScaling: true,      // Scale using IQR
  quantileRange: [25, 75] // IQR percentiles
});

// Data with outliers
const X = [
  [1, 2],
  [2, 3],
  [3, 4],
  [100, 200]  // Outlier
];

// Fit and transform
const XScaled = scaler.fitTransform(X);
console.log('Robust scaled:', XScaled);

// Access median and IQR
console.log('Median:', scaler.center);
console.log('IQR:', scaler.scale);

// Transform and inverse
const newData = [[50, 60]];
const scaled = scaler.transform(newData);
const original = scaler.inverseTransform(scaled);
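
To see why the median and IQR resist outliers, compare them with the mean for the first column of the data above. This sketch computes percentiles with linear interpolation between sorted values, which is an assumption; Simple-ML may use a different percentile convention and report slightly different numbers.

```javascript
// Percentile with linear interpolation on an already sorted array.
function percentile(sortedValues, p) {
  const pos = (p / 100) * (sortedValues.length - 1);
  const lower = Math.floor(pos);
  const upper = Math.ceil(pos);
  return sortedValues[lower] + (pos - lower) * (sortedValues[upper] - sortedValues[lower]);
}

function robustStats(column, [qLow, qHigh] = [25, 75]) {
  const sorted = [...column].sort((a, b) => a - b);
  return {
    median: percentile(sorted, 50),
    iqr: percentile(sorted, qHigh) - percentile(sorted, qLow)
  };
}

const firstColumn = [1, 2, 3, 100];
const robust = robustStats(firstColumn);
const plainMean = firstColumn.reduce((a, b) => a + b, 0) / firstColumn.length;

console.log(robust);    // { median: 2.5, iqr: 25.5 }
console.log(plainMean); // 26.5 (pulled far from the bulk of the data by the outlier)
```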

4.4 LabelEncoder

Encode categorical labels to integers.

import { LabelEncoder } from 'simple-ml';

// Create encoder
const le = new LabelEncoder();

// Categorical labels
const labels = ['cat', 'dog', 'cat', 'bird', 'dog', 'cat'];

// Fit and transform
const encoded = le.fitTransform(labels);
console.log('Encoded:', encoded);  // [0, 1, 0, 2, 1, 0]

// Access classes
console.log('Classes:', le.classes);  // ['cat', 'dog', 'bird']

// Transform new labels
const newEncoded = le.transform(['dog', 'cat']);
console.log('New encoded:', newEncoded);  // [1, 0]

// Inverse transform
const original = le.inverseTransform(encoded);
console.log('Original labels:', original);

Numeric Labels Example:

// Works with numbers too
const numLabels = [10, 20, 10, 30, 20];
const encoded = le.fitTransform(numLabels);
console.log('Encoded numbers:', encoded);  // [0, 1, 0, 2, 1]
console.log('Classes:', le.classes);       // [10, 20, 30]

4.5 OneHotEncoder

Convert categorical features to binary columns.

import { OneHotEncoder } from 'simple-ml';

// Create encoder
const ohe = new OneHotEncoder({
  dropFirst: false,  // Drop first column to avoid multicollinearity
  sparse: false      // Return dense array
});

// Categorical data
const X = [
  ['red'],
  ['blue'],
  ['green'],
  ['red'],
  ['blue']
];

// Fit and transform
const encoded = ohe.fitTransform(X);
console.log('One-hot encoded:', encoded);
// [[1, 0, 0],
//  [0, 1, 0],
//  [0, 0, 1],
//  [1, 0, 0],
//  [0, 1, 0]]

// Access categories
console.log('Categories:', ohe.categories);
// [['red', 'blue', 'green']]

// Transform new data
const newEncoded = ohe.transform([['green'], ['red']]);
console.log('New encoded:', newEncoded);

// Inverse transform
const original = ohe.inverseTransform(encoded);
console.log('Original:', original);

Multiple Features Example:

// Multiple categorical features
const X = [
  ['red', 'small'],
  ['blue', 'large'],
  ['red', 'large']
];

const encoder = new OneHotEncoder();
const encoded = encoder.fitTransform(X);
console.log('Encoded (multiple features):', encoded);
console.log('Categories:', encoder.categories);

4.6 SimpleImputer

Fill missing values in a dataset.

import { SimpleImputer } from 'simple-ml';

// Create imputer
const imputer = new SimpleImputer({
  strategy: 'mean',    // 'mean', 'median', 'most_frequent', 'constant'
  fillValue: null      // Value for 'constant' strategy
});

// Data with missing values (null)
const X = [
  [1, 2],
  [null, 3],
  [7, null],
  [4, 5]
];

// Fit and transform
const XFilled = imputer.fitTransform(X);
console.log('Filled data:', XFilled);
// [[1, 2],
//  [4, 3],  // null filled with mean (4)
//  [7, 3.33],  // null filled with mean (3.33)
//  [4, 5]]

// Access learned statistics
console.log('Statistics:', imputer.statistics);

// Transform new data
const newData = [[null, 6]];
const filled = imputer.transform(newData);
console.log('New data filled:', filled);

Different Strategies:

// Median strategy
const medianImputer = new SimpleImputer({ strategy: 'median' });
const filled1 = medianImputer.fitTransform(X);

// Most frequent strategy
const modeImputer = new SimpleImputer({ strategy: 'most_frequent' });
const filled2 = modeImputer.fitTransform(X);

// Constant strategy
const constantImputer = new SimpleImputer({
  strategy: 'constant',
  fillValue: 0
});
const filled3 = constantImputer.fitTransform(X);
console.log('Filled with zeros:', filled3);

5. Metrics

5.1 Regression Metrics

import {
  meanAbsoluteError,
  meanSquaredError,
  rootMeanSquaredError,
  r2Score,
  meanAbsolutePercentageError,
  maxError
} from 'simple-ml';

const yTrue = [3, -0.5, 2, 7];
const yPred = [2.5, 0.0, 2, 8];

// Mean Absolute Error
const mae = meanAbsoluteError(yTrue, yPred);
console.log('MAE:', mae);  // 0.5

// Mean Squared Error
const mse = meanSquaredError(yTrue, yPred);
console.log('MSE:', mse);  // 0.375

// Root Mean Squared Error
const rmse = rootMeanSquaredError(yTrue, yPred);
console.log('RMSE:', rmse);  // 0.612

// R² Score (coefficient of determination)
const r2 = r2Score(yTrue, yPred);
console.log('R² Score:', r2);  // ≈ 0.9486

// Mean Absolute Percentage Error
const mape = meanAbsolutePercentageError(yTrue, yPred);
console.log('MAPE:', mape);

// Maximum Error
const maxErr = maxError(yTrue, yPred);
console.log('Max Error:', maxErr);  // 1.0
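
As a cross-check on the values above, R² can be computed by hand as 1 - SS_res / SS_tot. This standalone sketch uses the same yTrue and yPred (the function name `rSquared` is hypothetical):

```javascript
// R² = 1 - (residual sum of squares) / (total sum of squares around the mean)
function rSquared(yTrue, yPred) {
  const mean = yTrue.reduce((a, b) => a + b, 0) / yTrue.length;
  const ssRes = yTrue.reduce((s, y, i) => s + (y - yPred[i]) ** 2, 0);
  const ssTot = yTrue.reduce((s, y) => s + (y - mean) ** 2, 0);
  return 1 - ssRes / ssTot;
}

// SS_res = 1.5 and SS_tot = 29.1875 for this data
console.log(rSquared([3, -0.5, 2, 7], [2.5, 0.0, 2, 8]).toFixed(4)); // 0.9486
```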

5.2 Classification Metrics

import {
  accuracy,
  precision,
  recall,
  f1Score,
  confusionMatrix,
  classificationReport
} from 'simple-ml';

const yTrue = [0, 1, 2, 0, 1, 2, 0, 1, 2];
const yPred = [0, 2, 1, 0, 1, 2, 0, 2, 2];

// Accuracy
const acc = accuracy(yTrue, yPred);
console.log('Accuracy:', acc);  // 0.667

// Precision (per class or average)
const prec = precision(yTrue, yPred, { average: 'macro' });
console.log('Precision:', prec);

// Recall
const rec = recall(yTrue, yPred, { average: 'macro' });
console.log('Recall:', rec);

// F1 Score
const f1 = f1Score(yTrue, yPred, { average: 'macro' });
console.log('F1 Score:', f1);

// Confusion Matrix
const cm = confusionMatrix(yTrue, yPred);
console.log('Confusion Matrix:', cm);
// [[3, 0, 0],
//  [0, 1, 2],
//  [0, 1, 2]]

// Classification Report (comprehensive)
const report = classificationReport(yTrue, yPred);
console.log('Classification Report:', report);

Binary Classification Metrics:

const yTrue = [0, 0, 1, 1, 0, 1, 1, 0];
const yPred = [0, 1, 1, 1, 0, 0, 1, 0];

console.log('Binary Accuracy:', accuracy(yTrue, yPred));
console.log('Binary Precision:', precision(yTrue, yPred));
console.log('Binary Recall:', recall(yTrue, yPred));
console.log('Binary F1:', f1Score(yTrue, yPred));
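
Macro-averaged precision and recall can be read directly off a confusion matrix. The sketch below uses the 3x3 matrix from the multiclass example and assumes rows are true classes and columns are predicted classes, which is consistent with that example's data. The helper name is hypothetical.

```javascript
const cmExample = [
  [3, 0, 0],
  [0, 1, 2],
  [0, 1, 2]
];

// Per-class precision = diagonal / column sum; recall = diagonal / row sum.
// 'macro' averages the per-class values with equal weight.
function macroPrecisionRecall(cm) {
  const precisions = [];
  const recalls = [];
  for (let c = 0; c < cm.length; c++) {
    const predictedAsC = cm.reduce((s, row) => s + row[c], 0);
    const actuallyC = cm[c].reduce((s, v) => s + v, 0);
    precisions.push(predictedAsC === 0 ? 0 : cm[c][c] / predictedAsC);
    recalls.push(actuallyC === 0 ? 0 : cm[c][c] / actuallyC);
  }
  const mean = values => values.reduce((a, b) => a + b, 0) / values.length;
  return { precision: mean(precisions), recall: mean(recalls) };
}

const macro = macroPrecisionRecall(cmExample);
console.log(macro.precision.toFixed(3), macro.recall.toFixed(3)); // 0.667 0.667
```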

5.3 Clustering Metrics

import {
  silhouetteScore,
  daviesBouldinScore,
  calinskiHarabaszScore
} from 'simple-ml';

const X = [
  [1, 2], [1.5, 1.8], [1, 0.6],
  [5, 8], [6, 9], [5, 7],
  [10, 2], [9, 3], [10, 3]
];
const labels = [0, 0, 0, 1, 1, 1, 2, 2, 2];

// Silhouette Score (higher is better, range: [-1, 1])
const silhouette = silhouetteScore(X, labels);
console.log('Silhouette Score:', silhouette);

// Davies-Bouldin Score (lower is better)
const db = daviesBouldinScore(X, labels);
console.log('Davies-Bouldin Score:', db);

// Calinski-Harabasz Score (higher is better)
const ch = calinskiHarabaszScore(X, labels);
console.log('Calinski-Harabasz Score:', ch);

6. Model Selection

6.1 Train-Test Split

import { trainTestSplit } from 'simple-ml';

const X = [[1], [2], [3], [4], [5], [6], [7], [8]];
const y = [2, 4, 6, 8, 10, 12, 14, 16];

// Basic split
const { XTrain, XTest, yTrain, yTest } = trainTestSplit(X, y, {
  testSize: 0.25,      // 25% for testing (default: 0.25)
  shuffle: true,       // Shuffle before splitting (default: true)
  randomState: 42      // Random seed for reproducibility
});

console.log('Training samples:', XTrain.length);  // 6
console.log('Test samples:', XTest.length);       // 2

console.log('X Train:', XTrain);
console.log('y Train:', yTrain);
console.log('X Test:', XTest);
console.log('y Test:', yTest);
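
The shuffle and randomState options interact: a fixed seed makes the shuffle, and therefore the split, repeatable. The sketch below shows the idea with a small seeded generator (the mulberry32 variant) and a Fisher-Yates shuffle. It is an illustration only; Simple-ML's own generator and rounding of testSize are not documented here and may differ.

```javascript
// Small seeded pseudo-random generator returning values in [0, 1).
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6D2B79F5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

function seededSplit(X, y, testSize, seed) {
  const random = mulberry32(seed);
  const indices = X.map((_, i) => i);
  // Fisher-Yates shuffle driven by the seeded generator
  for (let i = indices.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    [indices[i], indices[j]] = [indices[j], indices[i]];
  }
  const nTest = Math.round(X.length * testSize);
  const testIdx = indices.slice(0, nTest);
  const trainIdx = indices.slice(nTest);
  return {
    XTrain: trainIdx.map(i => X[i]),
    XTest: testIdx.map(i => X[i]),
    yTrain: trainIdx.map(i => y[i]),
    yTest: testIdx.map(i => y[i])
  };
}

const demoX = [[1], [2], [3], [4], [5], [6], [7], [8]];
const demoY = [2, 4, 6, 8, 10, 12, 14, 16];
const splitA = seededSplit(demoX, demoY, 0.25, 42);
const splitB = seededSplit(demoX, demoY, 0.25, 42);

console.log(splitA.XTrain.length, splitA.XTest.length);         // 6 2
console.log(JSON.stringify(splitA) === JSON.stringify(splitB)); // true
```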

Stratified Split (for classification):

const X = [[1, 2], [2, 3], [3, 4], [4, 5], [5, 6], [6, 7]];
const y = [0, 0, 0, 1, 1, 1];

const { XTrain, XTest, yTrain, yTest } = trainTestSplit(X, y, {
  testSize: 0.33,
  stratify: y,  // Maintain class proportions
  randomState: 42
});

console.log('Train labels:', yTrain);
console.log('Test labels:', yTest);

6.2 Cross-Validation

import { crossValidate } from 'simple-ml';
import { LinearRegression } from 'simple-ml';

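// 10 samples, so each of the 5 folds holds 2 test samples (R² is undefined on a single sample)
const X = [[1], [2], [3], [4], [5], [6], [7], [8], [9], [10]];
const y = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20];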

// 5-fold cross-validation
const model = new LinearRegression();
const cvResults = crossValidate(model, X, y, {
  cv: 5,              // Number of folds (default: 5)
  scoring: 'r2',      // Scoring method
  shuffle: true,
  randomState: 42
});

console.log('Fold Scores:', cvResults.scores);
console.log('Mean Score:', cvResults.meanScore);
console.log('Std Score:', cvResults.stdScore);
console.log('Fit Times:', cvResults.fitTimes);
console.log('Score Times:', cvResults.scoreTimes);
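
K-fold cross-validation partitions the sample indices into cv folds and uses each fold once as the test set. The sketch below shows one common way to build the folds without shuffling (the helper `kFoldIndices` is hypothetical, and the library's fold assignment may differ). It also shows how small a fold gets when the sample count is close to cv.

```javascript
// Split indices 0..nSamples-1 into nFolds contiguous test folds.
function kFoldIndices(nSamples, nFolds) {
  const folds = [];
  const base = Math.floor(nSamples / nFolds);
  const extra = nSamples % nFolds; // the first `extra` folds get one more sample
  let start = 0;
  for (let f = 0; f < nFolds; f++) {
    const size = base + (f < extra ? 1 : 0);
    const end = start + size;
    const foldStart = start;
    const test = Array.from({ length: size }, (_, i) => foldStart + i);
    const train = Array.from({ length: nSamples }, (_, i) => i)
      .filter(i => i < foldStart || i >= end);
    folds.push({ train, test });
    start = end;
  }
  return folds;
}

console.log(kFoldIndices(10, 5).map(f => f.test.length)); // [2, 2, 2, 2, 2]
console.log(kFoldIndices(8, 5).map(f => f.test.length));  // [2, 2, 2, 1, 1]
```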

Cross-Validation for Classification:

import { KNeighborsClassifier } from 'simple-ml';

const X = [
  [1, 2], [2, 3], [3, 1], [6, 5], [7, 7], [8, 6]
];
const y = [0, 0, 0, 1, 1, 1];

const knn = new KNeighborsClassifier({ k: 3 });
const results = crossValidate(knn, X, y, {
  cv: 3,
  scoring: 'accuracy'
});

console.log('CV Accuracy:', results.meanScore);

7. Complete Pipeline Example

Combining multiple components in a machine learning pipeline:

import {
  LinearRegression,
  KNeighborsClassifier,
  StandardScaler,
  LabelEncoder,
  trainTestSplit,
  crossValidate,
  r2Score,
  accuracy,
  meanSquaredError
} from 'simple-ml';

// ========== REGRESSION PIPELINE ==========

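// 1. Load and prepare data (10 samples, so the test set and every CV fold hold at least 2)
const XRaw = [
  [1, 150], [2, 120], [3, 340], [4, 310], [5, 560],
  [6, 480], [7, 730], [8, 690], [9, 940], [10, 870]
];
const yReg = [12, 19, 33, 38, 54, 57, 74, 78, 95, 97];

// 2. Split data first so the test set stays unseen during scaling
const { XTrain, XTest, yTrain, yTest } = trainTestSplit(
  XRaw, yReg, { testSize: 0.2 }
);

// 3. Scale features: fit on the training set only, then reuse for the test set
const scaler = new StandardScaler();
const XTrainScaled = scaler.fitTransform(XTrain);
const XTestScaled = scaler.transform(XTest);

// 4. Train model
const regModel = new LinearRegression();
regModel.fit(XTrainScaled, yTrain);

// 5. Evaluate
const yPred = regModel.predict(XTestScaled);
console.log('Test R²:', r2Score(yTest, yPred));
console.log('Test MSE:', meanSquaredError(yTest, yPred));

// 6. Cross-validation (on the raw features; OLS predictions do not depend on feature scaling)
const cvResults = crossValidate(regModel, XRaw, yReg, { cv: 5 });
console.log('CV R²:', cvResults.meanScore);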

// ========== CLASSIFICATION PIPELINE ==========

// 1. Prepare classification data
const XClass = [
  [5.1, 3.5], [4.9, 3.0], [7.0, 3.2],
  [6.4, 3.2], [5.9, 3.0], [6.3, 2.5]
];
const yClass = ['setosa', 'setosa', 'versicolor', 'versicolor', 'virginica', 'virginica'];

// 2. Encode labels
const labelEncoder = new LabelEncoder();
const yEncoded = labelEncoder.fitTransform(yClass);

// 3. Split data
const split = trainTestSplit(XClass, yEncoded, { testSize: 0.33 });

// 4. Scale features (fit on the training set only, then reuse for the test set)
const classScaler = new StandardScaler();
const XClassTrain = classScaler.fitTransform(split.XTrain);
const XClassTest = classScaler.transform(split.XTest);

// 5. Train classifier
const classifier = new KNeighborsClassifier({ k: 3 });
classifier.fit(XClassTrain, split.yTrain);

// 6. Evaluate
const predictions = classifier.predict(XClassTest);
console.log('Test Accuracy:', accuracy(split.yTest, predictions));

// 7. Predict new sample
const newSample = [[6.0, 3.0]];
const newScaled = classScaler.transform(newSample);
const pred = classifier.predict(newScaled);
const predLabel = labelEncoder.inverseTransform(pred);
console.log('Prediction for new sample:', predLabel);

⚠️ Best Practices & Important Notes

Dataset Size Recommendations

For reliable model evaluation:

Minimum: 10-20 samples total
Recommended: 50+ samples for simple models, 100+ for complex models
Test set: Use at least 5-10 samples in the test set

Why? With very small test sets (1-2 samples), metrics like R² may not be meaningful:

import { trainTestSplit } from 'simple-ml';

// ❌ Too small - test set has only 1 sample
const XTiny = [[1], [2], [3], [4], [5]];
const yTiny = [2, 4, 6, 8, 10];
const tinySplit = trainTestSplit(XTiny, yTiny, { testSize: 0.2 }); // Only 1 test sample!

// ✅ Better - reasonable test set size
const XSmall = [[1], [2], [3], [4], [5], [6], [7], [8], [9], [10]];
const ySmall = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20];
const smallSplit = trainTestSplit(XSmall, ySmall, { testSize: 0.2 }); // 2 test samples

// ✅ Recommended - adequate test set
const XLarge = Array.from({ length: 50 }, (_, i) => [i + 1]);
const yLarge = XLarge.map(x => 2 * x[0] + Math.random());
const largeSplit = trainTestSplit(XLarge, yLarge, { testSize: 0.2 }); // 10 test samples
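
The reason a one-sample test set breaks R² is arithmetic: the total sum of squares around the mean is zero, so the ratio in 1 - SS_res / SS_tot is undefined. A standalone sketch using the textbook formula (how Simple-ML's own score method reports this case is not shown here):

```javascript
// Textbook R²; with a single test sample, ssTot is always 0.
function r2ForSplit(yTrue, yPred) {
  const mean = yTrue.reduce((a, b) => a + b, 0) / yTrue.length;
  const ssRes = yTrue.reduce((s, y, i) => s + (y - yPred[i]) ** 2, 0);
  const ssTot = yTrue.reduce((s, y) => s + (y - mean) ** 2, 0);
  return 1 - ssRes / ssTot;
}

console.log(r2ForSplit([10], [9.8]));         // -Infinity (any error divided by 0)
console.log(r2ForSplit([10], [10]));          // NaN (0 divided by 0)
console.log(r2ForSplit([18, 20], [18, 20]));  // 1 (defined once the test set has 2 distinct values)
```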

Feature Scaling

Scale features before using distance-based algorithms (KNN, K-Means) or models trained by gradient descent (Logistic Regression); otherwise the feature with the largest numeric range dominates:

import { StandardScaler, KNeighborsClassifier } from 'simple-ml';

// Features with different scales
const X = [[1, 1000], [2, 2000], [3, 3000], [4, 4000]];
const y = [0, 0, 1, 1];

// Scale before training
const scaler = new StandardScaler();
const XScaled = scaler.fitTransform(X);

const model = new KNeighborsClassifier({ k: 3 });
model.fit(XScaled, y);

Handling Missing Values

Use SimpleImputer before training any model:

import { SimpleImputer, LinearRegression } from 'simple-ml';

const X = [[1, 2], [null, 3], [7, null], [4, 5]];
const y = [3, 5, 11, 9];

const imputer = new SimpleImputer({ strategy: 'mean' });
const XFilled = imputer.fitTransform(X);

// Now safe to use with any model
const model = new LinearRegression();
model.fit(XFilled, y);

Cross-Validation vs Train-Test Split

Use cross-validation for small datasets:

// For small datasets (<100 samples), use cross-validation
const cvResults = crossValidate(model, X, y, { cv: 5 });
console.log('Mean CV Score:', cvResults.meanScore);

// For large datasets, train-test split is faster
const { XTrain, XTest, yTrain, yTest } = trainTestSplit(X, y);
model.fit(XTrain, yTrain);
console.log('Test Score:', model.score(XTest, yTest));

🌐 Browser Usage

Interactive Demos

# Start local server
npm run dev

# Access demos:
# - http://localhost:5000/examples/browser-example.html
# - http://localhost:5000/examples/simple-demo.html

Browser Example

<!DOCTYPE html>
<html>
<head>
    <title>Simple-ML Demo</title>
</head>
<body>
    <h1>Machine Learning in the Browser</h1>
    <button onclick="runDemo()">Run Demo</button>
    <pre id="output"></pre>

    <script src="https://unpkg.com/simple-ml/dist/simple-ml.umd.js"></script>

    <script>
        function runDemo() {
            const {
                LinearRegression,
                StandardScaler,
                trainTestSplit,
                r2Score
            } = SimpleML;

            // Data
            const X = [[1], [2], [3], [4], [5], [6]];
            const y = [2, 4, 6, 8, 10, 12];

            // Scale
            const scaler = new StandardScaler();
            const XScaled = scaler.fitTransform(X);

            // Split
            const { XTrain, XTest, yTrain, yTest } =
                trainTestSplit(XScaled, y, { testSize: 0.3 });

            // Train
            const model = new LinearRegression();
            model.fit(XTrain, yTrain);

            // Predict
            const predictions = model.predict(XTest);

            // Evaluate
            const score = r2Score(yTest, predictions);

            // Display
            document.getElementById('output').textContent = `
R² Score: ${score.toFixed(4)}
Predictions: ${predictions.map(p => p.toFixed(2))}
Actual: ${yTest}
            `;
        }
    </script>
</body>
</html>

🛠️ Build Formats

After running npm run build:

dist/simple-ml.umd.js - Browser global SimpleML
dist/simple-ml.modern.js - ES2017+ for modern browsers
dist/simple-ml.module.js - ES Modules for bundlers
dist/simple-ml.cjs - CommonJS for Node.js

🧪 Testing

npm test       # Run all tests
npm run build  # Build for browser
npm run dev    # Watch mode

📝 License

MIT

🤝 Contributing

Contributions welcome! Please:

Maintain consistent API patterns
Add comprehensive input validation
Include tests for new features
Follow existing code style

Built with ❤️ in pure JavaScript