simple-ml
v1.0.4
A simple, robust JavaScript machine learning library with regression, classification, clustering, and preprocessing algorithms
Simple-ML
A simple, robust JavaScript machine learning library built from scratch with no external dependencies. Simple-ML provides easy-to-use implementations of popular machine learning algorithms for regression, classification, clustering, and data preprocessing.
🎯 Philosophy
- Simplicity: Intuitive and consistent API
- Robustness: Rigorous input validation and edge case handling
- Performance: Optimized pure JavaScript implementations
- Modularity: Clear organizational structure
📦 Installation
Node.js / NPM
npm install simple-ml
Browser (via CDN)
<!-- Via unpkg CDN -->
<script src="https://unpkg.com/simple-ml/dist/simple-ml.umd.js"></script>
<script>
const { LinearRegression } = SimpleML;
const model = new LinearRegression();
</script>
ES Modules (Modern Browsers)
<script type="module">
import { LinearRegression } from 'https://unpkg.com/simple-ml/dist/simple-ml.modern.js';
const model = new LinearRegression();
</script>
🚀 Quick Start
import { LinearRegression, trainTestSplit } from 'simple-ml';
// Prepare your data (use at least 10-20 samples for reliable results)
const X = [[1], [2], [3], [4], [5], [6], [7], [8], [9], [10]];
const y = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20];
// Split into training and test sets
const { XTrain, XTest, yTrain, yTest } = trainTestSplit(X, y, { testSize: 0.2 });
// Create and train model
const model = new LinearRegression();
model.fit(XTrain, yTrain);
// Make predictions
const predictions = model.predict(XTest);
// Evaluate model
const score = model.score(XTest, yTest);
console.log('R² Score:', score); // Close to 1.0 for perfect fit
📚 Complete API Reference with Examples
1. Regression Algorithms
1.1 Linear Regression
Ordinary Least Squares Linear Regression.
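For intuition, the single-feature case of ordinary least squares has a closed-form solution. The sketch below is plain JavaScript and does not use the library; `fitSimpleOLS` is a hypothetical helper name, and nothing here is claimed to mirror how `LinearRegression` is implemented internally.

```javascript
// Textbook closed-form least squares for one feature: y ≈ slope * x + intercept.
// Hypothetical helper, not part of simple-ml.
function fitSimpleOLS(xs, ys) {
  const n = xs.length;
  const meanX = xs.reduce((a, b) => a + b, 0) / n;
  const meanY = ys.reduce((a, b) => a + b, 0) / n;
  let sxy = 0; // sum of (x - meanX) * (y - meanY)
  let sxx = 0; // sum of (x - meanX)^2
  for (let i = 0; i < n; i++) {
    sxy += (xs[i] - meanX) * (ys[i] - meanY);
    sxx += (xs[i] - meanX) ** 2;
  }
  const slope = sxy / sxx;
  const intercept = meanY - slope * meanX;
  return { slope, intercept };
}

const olsFit = fitSimpleOLS([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]);
console.log(olsFit); // { slope: 2, intercept: 0 }
```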
import { LinearRegression } from 'simple-ml';
// Create model with options
const model = new LinearRegression({
fitIntercept: true, // Whether to calculate intercept (default: true)
normalize: false // Whether to normalize features (default: false)
});
// Training data
const X = [[1], [2], [3], [4], [5]];
const y = [2, 4, 6, 8, 10];
// Fit the model
model.fit(X, y);
// Access model parameters
console.log('Coefficients:', model.coefficients); // [2.0]
console.log('Intercept:', model.intercept); // 0.0
// Make predictions
const predictions = model.predict([[6], [7]]);
console.log('Predictions:', predictions); // [12, 14]
// Evaluate model (R² score)
const score = model.score(X, y);
console.log('R² Score:', score); // 1.0 (perfect fit)
Multiple Features Example:
// Multiple features (not collinear, so the coefficients are uniquely determined)
const X = [
[1, 2],
[2, 1],
[3, 4],
[4, 3]
];
const y = [5, 4, 11, 10]; // y = 1 * x1 + 2 * x2
const model = new LinearRegression();
model.fit(X, y);
console.log('Coefficients:', model.coefficients); // [1.0, 2.0]
console.log('Intercept:', model.intercept);
const pred = model.predict([[5, 6]]);
console.log('Prediction:', pred); // [17]
1.2 Ridge Regression
Linear Regression with L2 regularization.
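To see what the L2 penalty does, consider a single centered feature, where the ridge slope is Sxy / (Sxx + alpha). The sketch below is plain JavaScript with a hypothetical helper `ridgeSlope`; it assumes the intercept is not penalized, and the library may scale `alpha` differently.

```javascript
// Single-feature ridge slope on centered data: Sxy / (Sxx + alpha).
// Hypothetical helper, not part of simple-ml.
function ridgeSlope(xs, ys, alpha) {
  const n = xs.length;
  const meanX = xs.reduce((a, b) => a + b, 0) / n;
  const meanY = ys.reduce((a, b) => a + b, 0) / n;
  let sxy = 0;
  let sxx = 0;
  for (let i = 0; i < n; i++) {
    sxy += (xs[i] - meanX) * (ys[i] - meanY);
    sxx += (xs[i] - meanX) ** 2;
  }
  return sxy / (sxx + alpha);
}

const ridgeXs = [1, 2, 3, 4, 5];
const ridgeYs = [2.1, 3.9, 6.2, 7.8, 10.1];
// alpha = 0 reproduces ordinary least squares; larger alpha shrinks the slope toward 0.
const ridgeSlopes = [0, 1, 10, 100].map(alpha => ridgeSlope(ridgeXs, ridgeYs, alpha));
console.log(ridgeSlopes.map(s => s.toFixed(3)));
```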
import { RidgeRegression } from 'simple-ml';
// Create Ridge model
const ridge = new RidgeRegression({
alpha: 1.0, // Regularization strength (default: 1.0)
fitIntercept: true,
normalize: false
});
// Training data
const X = [[1], [2], [3], [4], [5]];
const y = [2.1, 3.9, 6.2, 7.8, 10.1];
// Fit model
ridge.fit(X, y);
console.log('Coefficients:', ridge.coefficients);
console.log('Intercept:', ridge.intercept);
// Make predictions
const predictions = ridge.predict([[6], [7]]);
console.log('Predictions:', predictions);
// Evaluate
const score = ridge.score(X, y);
console.log('R² Score:', score);
Tuning Alpha Example:
// Compare different alpha values
const alphas = [0.1, 1.0, 10.0, 100.0];
alphas.forEach(alpha => {
const model = new RidgeRegression({ alpha });
model.fit(X, y);
const score = model.score(X, y);
console.log(`Alpha ${alpha}: R² = ${score.toFixed(4)}`);
});
1.3 Lasso Regression
Linear Regression with L1 regularization (feature selection).
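The reason L1 regularization can select features is the soft-thresholding step commonly used in coordinate descent: any coefficient whose magnitude falls below the threshold is set to exactly zero. A minimal sketch in plain JavaScript follows; `softThreshold` is a hypothetical name, and whether the library uses this exact update is an assumption.

```javascript
// Soft-thresholding operator: shrink toward zero by alpha, clamping at zero.
// Hypothetical helper, not part of simple-ml.
function softThreshold(value, alpha) {
  if (value > alpha) return value - alpha;
  if (value < -alpha) return value + alpha;
  return 0;
}

console.log(softThreshold(3, 1));   // 2
console.log(softThreshold(-3, 1));  // -2
console.log(softThreshold(0.5, 1)); // 0
```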
import { LassoRegression } from 'simple-ml';
// Create Lasso model
const lasso = new LassoRegression({
alpha: 0.1, // Regularization strength (default: 1.0)
maxIterations: 1000, // Max iterations for coordinate descent
tolerance: 1e-4, // Convergence tolerance
fitIntercept: true
});
// Training data with correlated features
const X = [
[1, 1],
[2, 2],
[3, 3],
[4, 4],
[5, 5]
];
const y = [2, 4, 6, 8, 10];
// Fit model
lasso.fit(X, y);
console.log('Coefficients:', lasso.coefficients);
console.log('Intercept:', lasso.intercept);
// Lasso may zero out some coefficients
console.log('Non-zero features:',
lasso.coefficients.filter(c => Math.abs(c) > 1e-10).length
);
// Predictions
const predictions = lasso.predict([[6, 6]]);
console.log('Predictions:', predictions);
1.4 Logistic Regression
Binary and multiclass classification using logistic function.
import { LogisticRegression } from 'simple-ml';
// Binary classification
const logReg = new LogisticRegression({
learningRate: 0.1, // Learning rate for gradient descent
maxIterations: 1000, // Maximum iterations
tolerance: 1e-4, // Convergence tolerance
penalty: 'l2', // Regularization: 'l2', 'l1', or 'none'
C: 1.0, // Inverse regularization strength
multiClass: 'ovr' // 'ovr' (one-vs-rest) or 'multinomial' (softmax)
});
// Binary classification data
const X = [
[1, 2], [2, 3], [3, 1], // Class 0
[6, 5], [7, 7], [8, 6] // Class 1
];
const y = [0, 0, 0, 1, 1, 1];
// Fit model
logReg.fit(X, y);
console.log('Coefficients:', logReg.coefficients);
console.log('Intercept:', logReg.intercept);
// Predict classes
const predictions = logReg.predict([[2, 2], [7, 6]]);
console.log('Predictions:', predictions); // [0, 1]
// Predict probabilities
const probabilities = logReg.predictProba([[2, 2], [7, 6]]);
console.log('Probabilities:', probabilities);
// [[0.95, 0.05], [0.02, 0.98]]
// Evaluate
const score = logReg.score(X, y);
console.log('Accuracy:', score);
Multiclass Example:
// Multiclass classification
const X = [
[1, 1], [1, 2], [2, 1], // Class 0
[5, 5], [5, 6], [6, 5], // Class 1
[9, 9], [9, 10], [10, 9] // Class 2
];
const y = [0, 0, 0, 1, 1, 1, 2, 2, 2];
// Use 'multinomial' for ordered/continuous classes
const multiLogReg = new LogisticRegression({
multiClass: 'multinomial', // Better for this type of data
learningRate: 0.1,
maxIterations: 1000
});
multiLogReg.fit(X, y);
const pred = multiLogReg.predict([[2, 2], [6, 6], [10, 10]]);
console.log('Multiclass Predictions:', pred); // [0, 1, 2]
const proba = multiLogReg.predictProba([[2, 2], [6, 6], [10, 10]]);
console.log('Class Probabilities:', proba);
// [[0.803, 0.195, 0.002], // → class 0
// [0.007, 0.708, 0.285], // → class 1
// [0.000, 0.057, 0.943]] // → class 2
console.log('Accuracy:', multiLogReg.score(X, y)); // 1.0
Choosing multiClass Mode:
'ovr' (One-vs-Rest, default): Fast and works well for independent categories
- Use for: Animals (cat, dog, bird), Topics (sports, politics, tech)
- Each class vs all others is trained separately
'multinomial' (Softmax): More robust, handles ordered/continuous classes better
- Use for: Ratings (low, medium, high), Sizes (S, M, L, XL)
- Trains all classes simultaneously with softmax function
- Recommended when classes have natural ordering
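The difference between the two modes can be sketched with the scoring functions themselves: one-vs-rest applies an independent sigmoid to each class score, while softmax normalizes all scores jointly. This is plain JavaScript under stated assumptions; `classScores` holds hypothetical linear scores, and the library's internals may differ.

```javascript
// Independent sigmoid per class (one-vs-rest) vs joint softmax (multinomial).
// Hypothetical helpers, not part of simple-ml.
const sigmoid = z => 1 / (1 + Math.exp(-z));

function softmax(scores) {
  const max = Math.max(...scores); // subtract the max for numerical stability
  const exps = scores.map(s => Math.exp(s - max));
  const total = exps.reduce((a, b) => a + b, 0);
  return exps.map(e => e / total);
}

const classScores = [2.0, 1.0, -1.0]; // hypothetical per-class linear scores
const ovrProbs = classScores.map(sigmoid); // each in (0, 1), need not sum to 1
const softmaxProbs = softmax(classScores); // always sums to 1

const sum = arr => arr.reduce((a, b) => a + b, 0);
console.log('OvR sum:', sum(ovrProbs).toFixed(3));
console.log('Softmax sum:', sum(softmaxProbs).toFixed(3)); // Softmax sum: 1.000
```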
// Example: 'ovr' for independent categories
const categories = new LogisticRegression({ multiClass: 'ovr' });
const X_cat = [[1, 0], [0, 1], [1, 1]];
const y_cat = ['cat', 'dog', 'bird'];
categories.fit(X_cat, y_cat);
// Example: 'multinomial' for ordered classes
const ratings = new LogisticRegression({ multiClass: 'multinomial' });
const X_rating = [[1, 2], [5, 6], [9, 10]];
const y_rating = ['low', 'medium', 'high'];
ratings.fit(X_rating, y_rating);
1.5 Polynomial Regression
Regression with polynomial features.
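Polynomial regression is usually implemented as linear regression on expanded features: each input x becomes [x, x², ..., x^degree]. The single-feature sketch below is plain JavaScript; `polynomialFeatures` is a hypothetical helper, and it is an assumption that the library expands features this way.

```javascript
// Expand one numeric input into its powers 1..degree.
// Hypothetical helper, not part of simple-ml.
function polynomialFeatures(x, degree) {
  const features = [];
  for (let power = 1; power <= degree; power++) {
    features.push(x ** power);
  }
  return features;
}

console.log(polynomialFeatures(3, 2)); // [ 3, 9 ]
console.log(polynomialFeatures(2, 3)); // [ 2, 4, 8 ]
```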
import { PolynomialRegression } from 'simple-ml';
// Create polynomial model
const poly = new PolynomialRegression({
degree: 2, // Polynomial degree (default: 2)
fitIntercept: true,
normalize: false
});
// Non-linear data
const X = [[1], [2], [3], [4], [5]];
const y = [1, 4, 9, 16, 25]; // y = x²
// Fit model
poly.fit(X, y);
console.log('Coefficients:', poly.coefficients);
console.log('Intercept:', poly.intercept);
// Predictions
const predictions = poly.predict([[6], [7]]);
console.log('Predictions:', predictions); // [36, 49]
// Evaluate
const score = poly.score(X, y);
console.log('R² Score:', score); // Close to 1.0
Higher Degree Example:
// Cubic polynomial
const cubicPoly = new PolynomialRegression({ degree: 3 });
const X = [[1], [2], [3], [4]];
const y = [1, 8, 27, 64]; // y = x³
cubicPoly.fit(X, y);
const pred = cubicPoly.predict([[5]]);
console.log('Prediction for x=5:', pred); // [125]
2. Classification Algorithms
2.1 K-Nearest Neighbors (KNN)
Non-parametric classification based on nearest neighbors.
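The core of the method fits in a few lines: measure the distance from the query to every training point, keep the k closest, and take a majority vote. The sketch below is plain JavaScript with uniform weights and Euclidean distance; `knnPredict` is a hypothetical helper, and the library's tie-breaking and distance options are not modeled.

```javascript
// Uniform-weight k-nearest-neighbors vote with Euclidean distance.
// Hypothetical helpers, not part of simple-ml.
function euclidean(a, b) {
  return Math.sqrt(a.reduce((sum, ai, i) => sum + (ai - b[i]) ** 2, 0));
}

function knnPredict(XTrain, yTrain, query, k) {
  const neighbors = XTrain
    .map((point, i) => ({ label: yTrain[i], dist: euclidean(point, query) }))
    .sort((p, q) => p.dist - q.dist)
    .slice(0, k);
  const votes = new Map();
  for (const { label } of neighbors) {
    votes.set(label, (votes.get(label) || 0) + 1);
  }
  // Return the label with the most votes among the k nearest points.
  return [...votes.entries()].sort((p, q) => q[1] - p[1])[0][0];
}

const knnX = [[1, 2], [2, 3], [3, 1], [6, 5], [7, 7], [8, 6]];
const knnY = ['A', 'A', 'A', 'B', 'B', 'B'];
console.log(knnPredict(knnX, knnY, [2, 2], 3)); // A
console.log(knnPredict(knnX, knnY, [7, 6], 3)); // B
```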
import { KNeighborsClassifier } from 'simple-ml';
// Create KNN classifier
const knn = new KNeighborsClassifier({
k: 3, // Number of neighbors (default: 5)
weights: 'uniform' // 'uniform' or 'distance'
});
// Training data
const X = [
[1, 2], [2, 3], [3, 1], // Class 'A'
[6, 5], [7, 7], [8, 6] // Class 'B'
];
const y = ['A', 'A', 'A', 'B', 'B', 'B'];
// Fit model (stores training data)
knn.fit(X, y);
// Predict
const predictions = knn.predict([[2, 2], [7, 6]]);
console.log('Predictions:', predictions); // ['A', 'B']
// Predict with probabilities
const probabilities = knn.predictProba([[2, 2]]);
console.log('Probabilities:', probabilities);
// Evaluate
const score = knn.score(X, y);
console.log('Accuracy:', score);
Distance-Weighted KNN:
// Use distance weighting
const weightedKnn = new KNeighborsClassifier({
k: 5,
weights: 'distance' // Closer neighbors have more influence
});
weightedKnn.fit(X, y);
const pred = weightedKnn.predict([[4, 4]]);
console.log('Distance-weighted prediction:', pred);
Finding Optimal K:
import { KNeighborsClassifier, trainTestSplit, accuracy } from 'simple-ml';
const { XTrain, XTest, yTrain, yTest } = trainTestSplit(X, y, { testSize: 0.3 });
// Test different k values (k cannot exceed the number of training samples)
for (let k = 1; k <= Math.min(10, XTrain.length); k++) {
const model = new KNeighborsClassifier({ k });
model.fit(XTrain, yTrain);
const pred = model.predict(XTest);
const acc = accuracy(yTest, pred);
console.log(`k=${k}: Accuracy = ${acc.toFixed(3)}`);
}
2.2 Gaussian Naive Bayes
Probabilistic classifier assuming Gaussian distribution.
import { GaussianNaiveBayes } from 'simple-ml';
// Create model
const gnb = new GaussianNaiveBayes({
priors: null // Class priors (default: null = uniform)
});
// Training data
const X = [
[1, 2], [2, 3], [3, 4], // Class 0
[6, 7], [7, 8], [8, 9] // Class 1
];
const y = [0, 0, 0, 1, 1, 1];
// Fit model
gnb.fit(X, y);
// Access learned parameters
console.log('Class Priors:', gnb.classPrior);
console.log('Means:', gnb.theta);
console.log('Variances:', gnb.sigma);
// Predict
const predictions = gnb.predict([[2, 3], [7, 8]]);
console.log('Predictions:', predictions); // [0, 1]
// Predict probabilities
const probabilities = gnb.predictProba([[4, 5]]);
console.log('Probabilities:', probabilities);
// Evaluate
const score = gnb.score(X, y);
console.log('Accuracy:', score);
2.3 Multinomial Naive Bayes
Naive Bayes for discrete/count features (e.g., text classification).
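The role of the smoothing parameter can be shown with the standard Laplace-smoothed estimate P(feature | class) = (count + alpha) / (total + alpha * nFeatures). The sketch below is plain JavaScript; `smoothedFeatureProbs` is a hypothetical helper, and it is an assumption that the library uses this exact formula.

```javascript
// Laplace-smoothed per-feature probabilities for the rows of one class.
// Hypothetical helper, not part of simple-ml.
function smoothedFeatureProbs(rows, alpha) {
  const nFeatures = rows[0].length;
  const counts = new Array(nFeatures).fill(0);
  for (const row of rows) {
    row.forEach((count, j) => { counts[j] += count; });
  }
  const total = counts.reduce((a, b) => a + b, 0);
  return counts.map(c => (c + alpha) / (total + alpha * nFeatures));
}

const spamRows = [[2, 1, 0], [1, 1, 0]]; // word counts for two "spam" documents
console.log(smoothedFeatureProbs(spamRows, 1)); // [ 0.5, 0.375, 0.125 ]
console.log(smoothedFeatureProbs(spamRows, 0)); // [ 0.6, 0.4, 0 ]
```

With alpha = 0 the third word gets probability 0, which would veto any document containing it; alpha = 1 keeps every probability strictly positive.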
import { MultinomialNaiveBayes } from 'simple-ml';
// Create model
const mnb = new MultinomialNaiveBayes({
alpha: 1.0 // Laplace smoothing parameter
});
// Training data (word counts)
const X = [
[2, 1, 0], // Document 1: "spam" words
[1, 1, 0], // Document 2: "spam" words
[0, 0, 2], // Document 3: "ham" words
[0, 1, 2] // Document 4: "ham" words
];
const y = ['spam', 'spam', 'ham', 'ham'];
// Fit model
mnb.fit(X, y);
// Predict
const predictions = mnb.predict([[2, 0, 1], [0, 0, 3]]);
console.log('Predictions:', predictions);
// Predict probabilities
const probabilities = mnb.predictProba([[1, 1, 1]]);
console.log('Probabilities:', probabilities);
2.4 Decision Tree Classifier
Tree-based classifier with interpretable rules.
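A tree chooses splits by comparing node impurity before and after the split. The two standard impurity measures, Gini and entropy, can be sketched in plain JavaScript as follows; the helper names are hypothetical and the library's own implementation is not shown here.

```javascript
// Gini impurity and Shannon entropy (base 2) of a list of class labels.
// Hypothetical helpers, not part of simple-ml.
function classProportions(labels) {
  const counts = new Map();
  for (const label of labels) {
    counts.set(label, (counts.get(label) || 0) + 1);
  }
  return [...counts.values()].map(c => c / labels.length);
}

const gini = labels =>
  1 - classProportions(labels).reduce((sum, p) => sum + p * p, 0);

const entropy = labels =>
  -classProportions(labels).reduce((sum, p) => sum + p * Math.log2(p), 0);

console.log(gini([0, 0, 1, 1]));    // 0.5 (maximally mixed for two classes)
console.log(entropy([0, 0, 1, 1])); // 1
console.log(gini([0, 0, 0, 0]));    // 0 (pure node)
```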
import { DecisionTreeClassifier } from 'simple-ml';
// Create decision tree
const dt = new DecisionTreeClassifier({
criterion: 'gini', // 'gini' or 'entropy'
maxDepth: 5, // Maximum tree depth (default: Infinity)
minSamplesSplit: 2, // Min samples to split a node
minSamplesLeaf: 1, // Min samples in leaf node
maxFeatures: null // Max features to consider
});
// Training data
const X = [
[2.5, 2.5], [3, 3], [2, 3], // Class 0
[7, 7], [8, 6], [7, 8], // Class 1
[3, 8], [4, 7], [3, 7] // Class 2
];
const y = [0, 0, 0, 1, 1, 1, 2, 2, 2];
// Fit model
dt.fit(X, y);
// Predict
const predictions = dt.predict([[2.5, 2.5], [7.5, 7], [3.5, 7.5]]);
console.log('Predictions:', predictions); // [0, 1, 2]
// Evaluate
const score = dt.score(X, y);
console.log('Accuracy:', score);
// Get feature importances (if available)
if (dt.featureImportances) {
console.log('Feature Importances:', dt.featureImportances);
}
Using Entropy:
const entropyDT = new DecisionTreeClassifier({
criterion: 'entropy',
maxDepth: 3
});
entropyDT.fit(X, y);
const pred = entropyDT.predict([[5, 5]]);
console.log('Entropy-based prediction:', pred);
3. Clustering
3.1 K-Means Clustering
Centroid-based clustering algorithm.
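One iteration of the classic algorithm alternates two steps: assign each point to its nearest centroid, then move each centroid to the mean of its points. The sketch below is plain JavaScript; the helper names and `startCentroids` are hypothetical, and initialization, convergence checks, and empty clusters are deliberately left out.

```javascript
// One assignment step and one update step of k-means.
// Hypothetical helpers, not part of simple-ml.
function squaredDistance(a, b) {
  return a.reduce((sum, ai, i) => sum + (ai - b[i]) ** 2, 0);
}

function assignClusters(X, centroids) {
  return X.map(point => {
    let best = 0;
    for (let c = 1; c < centroids.length; c++) {
      if (squaredDistance(point, centroids[c]) < squaredDistance(point, centroids[best])) {
        best = c;
      }
    }
    return best;
  });
}

function updateCentroids(X, labels, k) {
  const dims = X[0].length;
  const sums = Array.from({ length: k }, () => new Array(dims).fill(0));
  const counts = new Array(k).fill(0);
  X.forEach((point, i) => {
    counts[labels[i]]++;
    point.forEach((v, d) => { sums[labels[i]][d] += v; });
  });
  // Assumes every cluster received at least one point.
  return sums.map((sum, c) => sum.map(v => v / counts[c]));
}

const kmX = [[1, 2], [1.5, 1.8], [1, 0.6], [5, 8], [6, 9], [5, 7], [10, 2], [9, 3], [10, 3]];
const startCentroids = [[1, 2], [5, 8], [10, 2]]; // hypothetical starting centroids
const kmLabels = assignClusters(kmX, startCentroids);
const kmCentroids = updateCentroids(kmX, kmLabels, 3);
console.log(kmLabels.join(',')); // 0,0,0,1,1,1,2,2,2
console.log(kmCentroids.map(c => c.map(v => v.toFixed(2))));
```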
import { KMeans } from 'simple-ml';
// Create K-Means model
const kmeans = new KMeans({
nClusters: 3, // Number of clusters (required)
maxIterations: 300, // Max iterations (default: 300)
tolerance: 1e-4, // Convergence tolerance
initMethod: 'kmeans++', // 'kmeans++' or 'random'
nInit: 10, // Number of initializations
randomState: 42 // Random seed for reproducibility
});
// Data to cluster
const X = [
[1, 2], [1.5, 1.8], [1, 0.6], // Cluster 1
[5, 8], [6, 9], [5, 7], // Cluster 2
[10, 2], [9, 3], [10, 3] // Cluster 3
];
// Fit model
kmeans.fit(X);
// Get cluster labels
console.log('Labels:', kmeans.labels);
// [0, 0, 0, 1, 1, 1, 2, 2, 2]
// Get cluster centroids
console.log('Centroids:', kmeans.centroids);
// [[1.17, 1.47], [5.33, 8.0], [9.67, 2.67]]
// Get inertia (sum of squared distances)
console.log('Inertia:', kmeans.inertia);
// Predict cluster for new data
const newData = [[1.2, 1.9], [5.5, 8.2], [9.5, 2.8]];
const predictions = kmeans.predict(newData);
console.log('Predictions:', predictions); // [0, 1, 2]
Finding Optimal K (Elbow Method):
// Test different numbers of clusters (k cannot exceed the number of samples)
const inertias = [];
for (let k = 2; k <= Math.min(10, X.length); k++) {
const model = new KMeans({ nClusters: k, nInit: 10 });
model.fit(X);
inertias.push(model.inertia);
console.log(`K=${k}: Inertia = ${model.inertia.toFixed(2)}`);
}
// Plot inertias to find "elbow"
4. Preprocessing
4.1 StandardScaler
Z-score normalization (mean=0, std=1).
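Per column, the transformation is z = (x - mean) / std. The sketch below is plain JavaScript with a hypothetical helper `zScoreColumn`; it divides by n (population standard deviation), which is consistent with the std value of about 2.236 quoted in this section, but the library's exact convention is an assumption.

```javascript
// Z-score a single column using the population standard deviation.
// Hypothetical helper, not part of simple-ml.
function zScoreColumn(values) {
  const n = values.length;
  const mean = values.reduce((a, b) => a + b, 0) / n;
  const variance = values.reduce((sum, v) => sum + (v - mean) ** 2, 0) / n;
  const std = Math.sqrt(variance);
  return { mean, std, scaled: values.map(v => (v - mean) / std) };
}

const zs = zScoreColumn([1, 3, 5, 7]);
console.log(zs.mean);           // 4
console.log(zs.std.toFixed(3)); // 2.236
console.log(zs.scaled.map(v => v.toFixed(3)));
```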
import { StandardScaler } from 'simple-ml';
// Create scaler
const scaler = new StandardScaler({
withMean: true, // Center data (default: true)
withStd: true // Scale to unit variance (default: true)
});
// Data to scale
const X = [
[1, 2],
[3, 4],
[5, 6],
[7, 8]
];
// Fit and transform
const XScaled = scaler.fitTransform(X);
console.log('Scaled data:', XScaled);
// Access learned parameters
console.log('Mean:', scaler.mean); // [4, 5]
console.log('Std:', scaler.std); // [2.236, 2.236]
// Transform new data
const newData = [[9, 10]];
const newScaled = scaler.transform(newData);
console.log('New data scaled:', newScaled);
// Inverse transform
const original = scaler.inverseTransform(XScaled);
console.log('Original data:', original);
4.2 MinMaxScaler
Scale features to a specified range.
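For one column, the mapping is lo + (x - min) / (max - min) * (hi - lo), where min and max come from the fitted data. The sketch below is plain JavaScript; `minMaxScale` is a hypothetical helper, and it shows why values outside the fitted range land outside the target range.

```javascript
// Min-max scale one column to [lo, hi]; min and max are learned from `values`.
// Hypothetical helper, not part of simple-ml.
function minMaxScale(values, [lo, hi] = [0, 1]) {
  const min = Math.min(...values);
  const max = Math.max(...values);
  // Assumes max > min; a constant column would divide by zero.
  const scale = v => lo + ((v - min) / (max - min)) * (hi - lo);
  return { scale, scaled: values.map(scale) };
}

const mm = minMaxScale([1, 3, 5]);
console.log(mm.scaled);   // [ 0, 0.5, 1 ]
console.log(mm.scale(7)); // 1.5 (new value above the fitted max)
```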
import { MinMaxScaler } from 'simple-ml';
// Create scaler
const scaler = new MinMaxScaler({
featureRange: [0, 1] // Target range (default: [0, 1])
});
const X = [
[1, 2],
[3, 4],
[5, 6]
];
// Fit and transform
const XScaled = scaler.fitTransform(X);
console.log('Scaled to [0,1]:', XScaled);
// [[0, 0], [0.5, 0.5], [1, 1]]
// Access min and max
console.log('Data min:', scaler.dataMin);
console.log('Data max:', scaler.dataMax);
// Transform new data
const newScaled = scaler.transform([[7, 8]]);
console.log('New data scaled:', newScaled); // [[1.5, 1.5]]
// Inverse transform
const original = scaler.inverseTransform(XScaled);
console.log('Original:', original);
Custom Range Example:
// Scale to [-1, 1]
const customScaler = new MinMaxScaler({ featureRange: [-1, 1] });
const scaled = customScaler.fitTransform(X);
console.log('Scaled to [-1,1]:', scaled);
4.3 RobustScaler
Robust scaling using median and IQR (resistant to outliers).
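Per column, the transformation is (x - median) / IQR. The sketch below is plain JavaScript and uses linear interpolation between order statistics for the quantiles; that interpolation rule is an assumption, so the library may report slightly different values, and the helper names are hypothetical.

```javascript
// Median/IQR scaling of one column with linearly interpolated quantiles.
// Hypothetical helpers, not part of simple-ml.
function quantile(sorted, q) {
  const pos = (sorted.length - 1) * q;
  const lower = Math.floor(pos);
  const upper = Math.ceil(pos);
  return sorted[lower] + (sorted[upper] - sorted[lower]) * (pos - lower);
}

function robustScaleColumn(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const median = quantile(sorted, 0.5);
  const iqr = quantile(sorted, 0.75) - quantile(sorted, 0.25);
  return { median, iqr, scaled: values.map(v => (v - median) / iqr) };
}

const rs = robustScaleColumn([1, 2, 3, 100]); // 100 is an outlier
console.log(rs.median); // 2.5
console.log(rs.iqr);    // 25.5
```

The median stays near the bulk of the data even though the outlier would pull the mean up to 26.5.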
import { RobustScaler } from 'simple-ml';
// Create scaler
const scaler = new RobustScaler({
withCentering: true, // Center using median
withScaling: true, // Scale using IQR
quantileRange: [25, 75] // IQR percentiles
});
// Data with outliers
const X = [
[1, 2],
[2, 3],
[3, 4],
[100, 200] // Outlier
];
// Fit and transform
const XScaled = scaler.fitTransform(X);
console.log('Robust scaled:', XScaled);
// Access median and IQR
console.log('Median:', scaler.center);
console.log('IQR:', scaler.scale);
// Transform and inverse
const newData = [[50, 60]];
const scaled = scaler.transform(newData);
const original = scaler.inverseTransform(scaled);
4.4 LabelEncoder
Encode categorical labels to integers.
import { LabelEncoder } from 'simple-ml';
// Create encoder
const le = new LabelEncoder();
// Categorical labels
const labels = ['cat', 'dog', 'cat', 'bird', 'dog', 'cat'];
// Fit and transform
const encoded = le.fitTransform(labels);
console.log('Encoded:', encoded); // [0, 1, 0, 2, 1, 0]
// Access classes
console.log('Classes:', le.classes); // ['cat', 'dog', 'bird']
// Transform new labels
const newEncoded = le.transform(['dog', 'cat']);
console.log('New encoded:', newEncoded); // [1, 0]
// Inverse transform
const original = le.inverseTransform(encoded);
console.log('Original labels:', original);
Numeric Labels Example:
// Works with numbers too
const numLabels = [10, 20, 10, 30, 20];
const encoded = le.fitTransform(numLabels);
console.log('Encoded numbers:', encoded); // [0, 1, 0, 2, 1]
console.log('Classes:', le.classes); // [10, 20, 30]
4.5 OneHotEncoder
Convert categorical features to binary columns.
import { OneHotEncoder } from 'simple-ml';
// Create encoder
const ohe = new OneHotEncoder({
dropFirst: false, // Drop first column to avoid multicollinearity
sparse: false // Return dense array
});
// Categorical data
const X = [
['red'],
['blue'],
['green'],
['red'],
['blue']
];
// Fit and transform
const encoded = ohe.fitTransform(X);
console.log('One-hot encoded:', encoded);
// [[1, 0, 0],
// [0, 1, 0],
// [0, 0, 1],
// [1, 0, 0],
// [0, 1, 0]]
// Access categories
console.log('Categories:', ohe.categories);
// [['red', 'blue', 'green']]
// Transform new data
const newEncoded = ohe.transform([['green'], ['red']]);
console.log('New encoded:', newEncoded);
// Inverse transform
const original = ohe.inverseTransform(encoded);
console.log('Original:', original);
Multiple Features Example:
// Multiple categorical features
const X = [
['red', 'small'],
['blue', 'large'],
['red', 'large']
];
const encoder = new OneHotEncoder();
const encoded = encoder.fitTransform(X);
console.log('Encoded (multiple features):', encoded);
console.log('Categories:', encoder.categories);
4.6 SimpleImputer
Fill missing values in dataset.
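For the mean strategy, each column's mean is computed from its non-missing entries and then substituted for every null in that column. The sketch below is plain JavaScript; `imputeColumnMeans` is a hypothetical helper, and it treats only `null` as missing, which is an assumption about how the library marks missing values.

```javascript
// Column-wise mean imputation, treating null as missing.
// Hypothetical helper, not part of simple-ml.
function imputeColumnMeans(X) {
  const nCols = X[0].length;
  const means = [];
  for (let j = 0; j < nCols; j++) {
    const present = X.map(row => row[j]).filter(v => v !== null);
    means.push(present.reduce((a, b) => a + b, 0) / present.length);
  }
  const filled = X.map(row => row.map((v, j) => (v === null ? means[j] : v)));
  return { means, filled };
}

const imputed = imputeColumnMeans([[1, 2], [null, 3], [7, null], [4, 5]]);
console.log(imputed.means.map(m => m.toFixed(2))); // [ '4.00', '3.33' ]
console.log(imputed.filled);
```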
import { SimpleImputer } from 'simple-ml';
// Create imputer
const imputer = new SimpleImputer({
strategy: 'mean', // 'mean', 'median', 'most_frequent', 'constant'
fillValue: null // Value for 'constant' strategy
});
// Data with missing values (null)
const X = [
[1, 2],
[null, 3],
[7, null],
[4, 5]
];
// Fit and transform
const XFilled = imputer.fitTransform(X);
console.log('Filled data:', XFilled);
// [[1, 2],
// [4, 3], // null filled with mean (4)
// [7, 3.33], // null filled with mean (3.33)
// [4, 5]]
// Access learned statistics
console.log('Statistics:', imputer.statistics);
// Transform new data
const newData = [[null, 6]];
const filled = imputer.transform(newData);
console.log('New data filled:', filled);
Different Strategies:
// Median strategy
const medianImputer = new SimpleImputer({ strategy: 'median' });
const filled1 = medianImputer.fitTransform(X);
// Most frequent strategy
const modeImputer = new SimpleImputer({ strategy: 'most_frequent' });
const filled2 = modeImputer.fitTransform(X);
// Constant strategy
const constantImputer = new SimpleImputer({
strategy: 'constant',
fillValue: 0
});
const filled3 = constantImputer.fitTransform(X);
console.log('Filled with zeros:', filled3);
5. Metrics
5.1 Regression Metrics
import {
meanAbsoluteError,
meanSquaredError,
rootMeanSquaredError,
r2Score,
meanAbsolutePercentageError,
maxError
} from 'simple-ml';
const yTrue = [3, -0.5, 2, 7];
const yPred = [2.5, 0.0, 2, 8];
// Mean Absolute Error
const mae = meanAbsoluteError(yTrue, yPred);
console.log('MAE:', mae); // 0.5
// Mean Squared Error
const mse = meanSquaredError(yTrue, yPred);
console.log('MSE:', mse); // 0.375
// Root Mean Squared Error
const rmse = rootMeanSquaredError(yTrue, yPred);
console.log('RMSE:', rmse); // 0.612
// R² Score (coefficient of determination)
const r2 = r2Score(yTrue, yPred);
console.log('R² Score:', r2); // 0.948
// Mean Absolute Percentage Error
const mape = meanAbsolutePercentageError(yTrue, yPred);
console.log('MAPE:', mape);
// Maximum Error
const maxErr = maxError(yTrue, yPred);
console.log('Max Error:', maxErr); // 1.0
5.2 Classification Metrics
import {
accuracy,
precision,
recall,
f1Score,
confusionMatrix,
classificationReport
} from 'simple-ml';
const yTrue = [0, 1, 2, 0, 1, 2, 0, 1, 2];
const yPred = [0, 2, 1, 0, 1, 2, 0, 2, 2];
// Accuracy
const acc = accuracy(yTrue, yPred);
console.log('Accuracy:', acc); // 0.667
// Precision (per class or average)
const prec = precision(yTrue, yPred, { average: 'macro' });
console.log('Precision:', prec);
// Recall
const rec = recall(yTrue, yPred, { average: 'macro' });
console.log('Recall:', rec);
// F1 Score
const f1 = f1Score(yTrue, yPred, { average: 'macro' });
console.log('F1 Score:', f1);
// Confusion Matrix
const cm = confusionMatrix(yTrue, yPred);
console.log('Confusion Matrix:', cm);
// [[3, 0, 0],
// [0, 1, 2],
// [0, 1, 2]]
// Classification Report (comprehensive)
const report = classificationReport(yTrue, yPred);
console.log('Classification Report:', report);
Binary Classification Metrics:
const yTrue = [0, 0, 1, 1, 0, 1, 1, 0];
const yPred = [0, 1, 1, 1, 0, 0, 1, 0];
console.log('Binary Accuracy:', accuracy(yTrue, yPred));
console.log('Binary Precision:', precision(yTrue, yPred));
console.log('Binary Recall:', recall(yTrue, yPred));
console.log('Binary F1:', f1Score(yTrue, yPred));
5.3 Clustering Metrics
import {
silhouetteScore,
daviesBouldinScore,
calinskiHarabaszScore
} from 'simple-ml';
const X = [
[1, 2], [1.5, 1.8], [1, 0.6],
[5, 8], [6, 9], [5, 7],
[10, 2], [9, 3], [10, 3]
];
const labels = [0, 0, 0, 1, 1, 1, 2, 2, 2];
// Silhouette Score (higher is better, range: [-1, 1])
const silhouette = silhouetteScore(X, labels);
console.log('Silhouette Score:', silhouette);
// Davies-Bouldin Score (lower is better)
const db = daviesBouldinScore(X, labels);
console.log('Davies-Bouldin Score:', db);
// Calinski-Harabasz Score (higher is better)
const ch = calinskiHarabaszScore(X, labels);
console.log('Calinski-Harabasz Score:', ch);
6. Model Selection
6.1 Train-Test Split
import { trainTestSplit } from 'simple-ml';
const X = [[1], [2], [3], [4], [5], [6], [7], [8]];
const y = [2, 4, 6, 8, 10, 12, 14, 16];
// Basic split
const { XTrain, XTest, yTrain, yTest } = trainTestSplit(X, y, {
testSize: 0.25, // 25% for testing (default: 0.25)
shuffle: true, // Shuffle before splitting (default: true)
randomState: 42 // Random seed for reproducibility
});
console.log('Training samples:', XTrain.length); // 6
console.log('Test samples:', XTest.length); // 2
console.log('X Train:', XTrain);
console.log('y Train:', yTrain);
console.log('X Test:', XTest);
console.log('y Test:', yTest);
Stratified Split (for classification):
const X = [[1, 2], [2, 3], [3, 4], [4, 5], [5, 6], [6, 7]];
const y = [0, 0, 0, 1, 1, 1];
const { XTrain, XTest, yTrain, yTest } = trainTestSplit(X, y, {
testSize: 0.33,
stratify: y, // Maintain class proportions
randomState: 42
});
console.log('Train labels:', yTrain);
console.log('Test labels:', yTest);
6.2 Cross-Validation
import { crossValidate, LinearRegression } from 'simple-ml';
const X = [[1], [2], [3], [4], [5], [6], [7], [8]];
const y = [2, 4, 6, 8, 10, 12, 14, 16];
// 4-fold cross-validation (8 samples, so each test fold holds 2 samples and R² is defined)
const model = new LinearRegression();
const cvResults = crossValidate(model, X, y, {
cv: 4, // Number of folds (default: 5)
scoring: 'r2', // Scoring method
shuffle: true,
randomState: 42
});
console.log('Fold Scores:', cvResults.scores);
console.log('Mean Score:', cvResults.meanScore);
console.log('Std Score:', cvResults.stdScore);
console.log('Fit Times:', cvResults.fitTimes);
console.log('Score Times:', cvResults.scoreTimes);
Cross-Validation for Classification:
import { KNeighborsClassifier, crossValidate } from 'simple-ml';
const X = [
[1, 2], [2, 3], [3, 1], [6, 5], [7, 7], [8, 6]
];
const y = [0, 0, 0, 1, 1, 1];
const knn = new KNeighborsClassifier({ k: 3 });
const results = crossValidate(knn, X, y, {
cv: 3,
scoring: 'accuracy'
});
console.log('CV Accuracy:', results.meanScore);
7. Complete Pipeline Example
Combining multiple components in a machine learning pipeline:
import {
LinearRegression,
KNeighborsClassifier,
StandardScaler,
LabelEncoder,
trainTestSplit,
crossValidate,
r2Score,
accuracy,
meanSquaredError
} from 'simple-ml';
// ========== REGRESSION PIPELINE ==========
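// 1. Load and prepare data (10 samples, so the 20% test split and each of the 5 CV folds hold 2 samples;
// the two features are not collinear)
const XRaw = [
[1, 120], [2, 180], [3, 310], [4, 390], [5, 520],
[6, 580], [7, 710], [8, 790], [9, 920], [10, 980]
];
const yReg = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100];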
// 2. Scale features
const scaler = new StandardScaler();
const XScaled = scaler.fitTransform(XRaw);
// 3. Split data
const { XTrain, XTest, yTrain, yTest } = trainTestSplit(
XScaled, yReg, { testSize: 0.2 }
);
// 4. Train model
const regModel = new LinearRegression();
regModel.fit(XTrain, yTrain);
// 5. Evaluate
const yPred = regModel.predict(XTest);
console.log('Test R²:', r2Score(yTest, yPred));
console.log('Test MSE:', meanSquaredError(yTest, yPred));
// 6. Cross-validation
const cvResults = crossValidate(regModel, XScaled, yReg, { cv: 5 });
console.log('CV R²:', cvResults.meanScore);
// ========== CLASSIFICATION PIPELINE ==========
// 1. Prepare classification data
const XClass = [
[5.1, 3.5], [4.9, 3.0], [7.0, 3.2],
[6.4, 3.2], [5.9, 3.0], [6.3, 2.5]
];
const yClass = ['setosa', 'setosa', 'versicolor', 'versicolor', 'virginica', 'virginica'];
// 2. Encode labels
const labelEncoder = new LabelEncoder();
const yEncoded = labelEncoder.fitTransform(yClass);
// 3. Scale features
const classScaler = new StandardScaler();
const XClassScaled = classScaler.fitTransform(XClass);
// 4. Split data
const split = trainTestSplit(XClassScaled, yEncoded, { testSize: 0.33 });
// 5. Train classifier
const classifier = new KNeighborsClassifier({ k: 3 });
classifier.fit(split.XTrain, split.yTrain);
// 6. Evaluate
const predictions = classifier.predict(split.XTest);
console.log('Test Accuracy:', accuracy(split.yTest, predictions));
// 7. Predict new sample
const newSample = [[6.0, 3.0]];
const newScaled = classScaler.transform(newSample);
const pred = classifier.predict(newScaled);
const predLabel = labelEncoder.inverseTransform(pred);
console.log('Prediction for new sample:', predLabel);
⚠️ Best Practices & Important Notes
Dataset Size Recommendations
For reliable model evaluation:
- Minimum: 10-20 samples total
- Recommended: 50+ samples for simple models, 100+ for complex models
- Test set: Use at least 5-10 samples in the test set
Why? With very small test sets (1-2 samples), metrics like R² may not be meaningful. With a single test sample, R² is undefined because the total sum of squares is zero:
// ❌ Too small - test set has only 1 sample
const XTiny = [[1], [2], [3], [4], [5]];
const yTiny = [2, 4, 6, 8, 10];
const tinySplit = trainTestSplit(XTiny, yTiny, { testSize: 0.2 }); // Only 1 test sample!
// ✅ Better - reasonable test set size
const XSmall = [[1], [2], [3], [4], [5], [6], [7], [8], [9], [10]];
const ySmall = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20];
const smallSplit = trainTestSplit(XSmall, ySmall, { testSize: 0.2 }); // 2 test samples
// ✅ Recommended - adequate test set
const XLarge = Array.from({ length: 50 }, (_, i) => [i + 1]);
const yLarge = XLarge.map(x => 2 * x[0] + Math.random());
const largeSplit = trainTestSplit(XLarge, yLarge, { testSize: 0.2 }); // 10 test samples
Feature Scaling
Always scale features when using distance-based algorithms (KNN) or gradient descent (Logistic Regression):
import { StandardScaler, KNeighborsClassifier } from 'simple-ml';
// Features with different scales
const X = [[1, 1000], [2, 2000], [3, 3000]];
const y = [0, 0, 1]; // Illustrative labels, one per row of X
// Scale before training
const scaler = new StandardScaler();
const XScaled = scaler.fitTransform(X);
const model = new KNeighborsClassifier({ k: 3 });
model.fit(XScaled, y);
Handling Missing Values
Use SimpleImputer before training any model:
import { SimpleImputer, LinearRegression } from 'simple-ml';
const X = [[1, 2], [null, 3], [7, null], [4, 5]];
const y = [3, 4, 8, 9]; // Illustrative targets, one per row of X
const imputer = new SimpleImputer({ strategy: 'mean' });
const XFilled = imputer.fitTransform(X);
// Now safe to use with any model
const model = new LinearRegression();
model.fit(XFilled, y);
Cross-Validation vs Train-Test Split
Use cross-validation for small datasets:
// For small datasets (<100 samples), use cross-validation
const cvResults = crossValidate(model, X, y, { cv: 5 });
console.log('Mean CV Score:', cvResults.meanScore);
// For large datasets, train-test split is faster
const { XTrain, XTest, yTrain, yTest } = trainTestSplit(X, y);
model.fit(XTrain, yTrain);
console.log('Test Score:', model.score(XTest, yTest));
🌐 Browser Usage
Interactive Demos
# Start local server
npm run dev
# Access demos:
# - http://localhost:5000/examples/browser-example.html
# - http://localhost:5000/examples/simple-demo.html
Browser Example
<!DOCTYPE html>
<html>
<head>
<title>Simple-ML Demo</title>
</head>
<body>
<h1>Machine Learning in the Browser</h1>
<button onclick="runDemo()">Run Demo</button>
<pre id="output"></pre>
<script src="https://unpkg.com/simple-ml/dist/simple-ml.umd.js"></script>
<script>
function runDemo() {
const {
LinearRegression,
StandardScaler,
trainTestSplit,
r2Score
} = SimpleML;
// Data
const X = [[1], [2], [3], [4], [5], [6]];
const y = [2, 4, 6, 8, 10, 12];
// Scale
const scaler = new StandardScaler();
const XScaled = scaler.fitTransform(X);
// Split
const { XTrain, XTest, yTrain, yTest } =
trainTestSplit(XScaled, y, { testSize: 0.3 });
// Train
const model = new LinearRegression();
model.fit(XTrain, yTrain);
// Predict
const predictions = model.predict(XTest);
// Evaluate
const score = r2Score(yTest, predictions);
// Display
document.getElementById('output').textContent = `
R² Score: ${score.toFixed(4)}
Predictions: ${predictions.map(p => p.toFixed(2))}
Actual: ${yTest}
`;
}
</script>
</body>
</html>
🛠️ Build Formats
After running npm run build:
- dist/simple-ml.umd.js - Browser global SimpleML
- dist/simple-ml.modern.js - ES2017+ for modern browsers
- dist/simple-ml.module.js - ES Modules for bundlers
- dist/simple-ml.cjs - CommonJS for Node.js
🧪 Testing
npm test # Run all tests
npm run build # Build for browser
npm run dev # Watch mode
📝 License
MIT
🤝 Contributing
Contributions welcome! Please:
- Maintain consistent API patterns
- Add comprehensive input validation
- Include tests for new features
- Follow existing code style
Built with ❤️ in pure JavaScript
