bun-scikit

v0.1.26

A scikit-learn-inspired machine learning library for Bun/TypeScript.

Scikit-learn-inspired machine learning for Bun + TypeScript, with native Zig acceleration for core training paths.

Install

bun add bun-scikit

Verify Zig Backend (Post-Install Smoke Test)

Create index.ts:

import { DecisionTreeClassifier, RandomForestClassifier } from "bun-scikit";

const X = [
  [0, 0],
  [0, 1],
  [1, 0],
  [2, 2],
  [2, 3],
  [3, 2],
];
const y = [0, 0, 0, 1, 1, 1];

const tree = new DecisionTreeClassifier({ maxDepth: 3, randomState: 42 });
tree.fit(X, y);
console.log("DecisionTree fit backend:", tree.fitBackend_, tree.fitBackendLibrary_);

const forest = new RandomForestClassifier({ nEstimators: 25, maxDepth: 4, randomState: 42 });
forest.fit(X, y);
console.log("RandomForest fit backend:", forest.fitBackend_, forest.fitBackendLibrary_);

Run:

bun run index.ts

Expected output includes fit backend: zig for both models.

Repo example: examples/zig-backend-smoke.ts

Quick Start

import {
  LinearRegression,
  LogisticRegression,
  StandardScaler,
  trainTestSplit,
  meanSquaredError,
  accuracyScore,
} from "bun-scikit";

const X = [[1], [2], [3], [4], [5], [6]];
const yReg = [3, 5, 7, 9, 11, 13];
const yCls = [0, 0, 0, 1, 1, 1];

const scaler = new StandardScaler();
const Xs = scaler.fitTransform(X);

const { XTrain, XTest, yTrain, yTest } = trainTestSplit(Xs, yReg, {
  testSize: 0.33,
  randomState: 42,
});

const reg = new LinearRegression({ solver: "normal" });
reg.fit(XTrain, yTrain);
console.log("MSE:", meanSquaredError(yTest, reg.predict(XTest)));

const clf = new LogisticRegression({
  solver: "gd",
  learningRate: 0.8,
  maxIter: 100,
  tolerance: 1e-5,
});
clf.fit(Xs, yCls);
console.log("Accuracy:", accuracyScore(yCls, clf.predict(Xs)));
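The Quick Start's preprocessing step can be made concrete with a small, self-contained sketch of what a standard scaler computes. It subtracts each column's mean and divides by the column's population standard deviation, which is the convention scikit-learn's StandardScaler uses. The SimpleStandardScaler class is hypothetical and illustrative only. It is not bun-scikit's implementation; its fitTransform name simply mirrors the call used above.

```typescript
type Matrix = number[][];

// Hypothetical illustration only; not bun-scikit's StandardScaler source.
class SimpleStandardScaler {
  mean: number[] = [];
  scale: number[] = [];

  fit(X: Matrix): this {
    const nRows = X.length;
    const nCols = X[0].length;
    this.mean = new Array<number>(nCols).fill(0);
    const variance = new Array<number>(nCols).fill(0);
    // First pass: column means.
    for (const row of X) {
      row.forEach((v, j) => {
        this.mean[j] += v / nRows;
      });
    }
    // Second pass: population variance (divide by n, as scikit-learn does).
    for (const row of X) {
      row.forEach((v, j) => {
        variance[j] += (v - this.mean[j]) ** 2 / nRows;
      });
    }
    // Constant columns keep a scale of 1, so they are centered but never divided by 0.
    this.scale = variance.map((s2) => (s2 > 0 ? Math.sqrt(s2) : 1));
    return this;
  }

  transform(X: Matrix): Matrix {
    return X.map((row) => row.map((v, j) => (v - this.mean[j]) / this.scale[j]));
  }

  fitTransform(X: Matrix): Matrix {
    return this.fit(X).transform(X);
  }
}

const scaled = new SimpleStandardScaler().fitTransform([[1], [2], [3], [4], [5], [6]]);
console.log(scaled.map((row) => row[0].toFixed(3)));
```

The Quick Start scales the full dataset before calling trainTestSplit, which is fine for a demo. In a real evaluation you would usually fit the scaler on the training split only and reuse it to transform the test split, so that test statistics do not leak into preprocessing.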

Included APIs

  • Models: LinearRegression, LogisticRegression, LogisticRegressionCV, Ridge, RidgeClassifier, RidgeClassifierCV, Lasso, ElasticNet, RidgeCV, LassoCV, ElasticNetCV, BayesianRidge, ARDRegression, Perceptron, PassiveAggressiveClassifier, PassiveAggressiveRegressor, HuberRegressor, PoissonRegressor, GammaRegressor, QuantileRegressor, RANSACRegressor, TweedieRegressor, MultiTaskLasso, MultiTaskElasticNet, MultiTaskLassoCV, MultiTaskElasticNetCV, KNeighborsClassifier, DecisionTreeClassifier, RandomForestClassifier, plus additional parity models (LinearSVC, GaussianNB, SGDClassifier, SGDRegressor, DecisionTreeRegressor, RandomForestRegressor, OneClassSVM).
  • Clustering / decomposition / manifold: KMeans, MiniBatchKMeans, DBSCAN, AgglomerativeClustering, SpectralClustering, Birch, OPTICS, MeanShift, AffinityPropagation, PCA, IncrementalPCA, TruncatedSVD, FastICA, NMF, MiniBatchNMF, FactorAnalysis, KernelPCA, PLSSVD, PLSRegression, PLSCanonical, CCA, TSNE, Isomap, LocallyLinearEmbedding, MDS.
  • Anomaly detection: IsolationForest, LocalOutlierFactor, OneClassSVM.
  • Calibration / meta-estimators: CalibratedClassifierCV, VotingClassifier, VotingRegressor, StackingClassifier, StackingRegressor, BaggingClassifier.
  • Gaussian process / isotonic: GaussianProcessRegressor, GaussianProcessClassifier, IsotonicRegression.
  • Multioutput: MultiOutputClassifier, MultiOutputRegressor, ClassifierChain, RegressorChain.
  • Boosting: AdaBoostClassifier, GradientBoostingClassifier, GradientBoostingRegressor, HistGradientBoostingClassifier, HistGradientBoostingRegressor.
  • Baselines: DummyClassifier, DummyRegressor.
  • Neighbors / Bayes additions: BallTree, KDTree, KNeighborsTransformer, RadiusNeighborsTransformer, NearestCentroid, NeighborhoodComponentsAnalysis, BernoulliNB, MultinomialNB, ComplementNB, CategoricalNB.
  • Covariance additions: EllipticEnvelope, GraphicalLasso, GraphicalLassoCV.
  • Preprocessing: StandardScaler, MinMaxScaler, RobustScaler, MaxAbsScaler, Normalizer, Binarizer, LabelEncoder, LabelBinarizer, MultiLabelBinarizer, PolynomialFeatures, SimpleImputer, OneHotEncoder, FunctionTransformer, KernelCenterer, plus functional helpers (addDummyFeature, binarize, scale, minmaxScale, maxabsScale, robustScale).
  • Feature extraction: DictVectorizer, FeatureHasher.
  • Composition: Pipeline, ColumnTransformer, FeatureUnion.
  • Feature selection: VarianceThreshold, SelectKBest, SelectPercentile, SelectFromModel, RFE, RFECV, chi2, f_classif, f_regression, mutualInfoClassif, mutualInfoRegression.
  • Model selection: trainTestSplit, KFold, GroupKFold, GroupShuffleSplit, ShuffleSplit, StratifiedKFold, StratifiedGroupKFold, StratifiedShuffleSplit, RepeatedKFold, RepeatedStratifiedKFold, LeaveOneOut, LeavePOut, LeaveOneGroupOut, LeavePGroupsOut, PredefinedSplit, TimeSeriesSplit, crossValScore, crossValidate, crossValPredict, learningCurve, validationCurve, GridSearchCV, RandomizedSearchCV, ParameterGrid, ParameterSampler.
  • Metrics: regression/classification metrics plus ranking and curve helpers (auc, averagePrecisionScore, classLikelihoodRatios) and clustering metrics (silhouetteScore, calinskiHarabaszScore, daviesBouldinScore, adjustedRandScore).
  • Inspection: permutationImportance, partialDependence, permutationTestScore.
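Most of the model-selection helpers listed above are built on index splitting. As a rough illustration, the sketch below reproduces scikit-learn's unshuffled K-fold convention, in which the first nSamples % nSplits folds each receive one extra sample. kFoldIndices is a hypothetical helper written for this example. bun-scikit's KFold class may expose a different interface.

```typescript
// Hypothetical helper; not part of bun-scikit's public API.
interface Fold {
  trainIndex: number[];
  testIndex: number[];
}

function kFoldIndices(nSamples: number, nSplits: number): Fold[] {
  if (nSplits < 2 || nSplits > nSamples) {
    throw new Error("nSplits must be between 2 and nSamples");
  }
  const folds: Fold[] = [];
  const baseSize = Math.floor(nSamples / nSplits);
  const remainder = nSamples % nSplits;
  let start = 0;
  for (let fold = 0; fold < nSplits; fold++) {
    // The first `remainder` folds get one extra sample, as in scikit-learn's KFold.
    const size = baseSize + (fold < remainder ? 1 : 0);
    const testIndex: number[] = [];
    const trainIndex: number[] = [];
    for (let i = 0; i < nSamples; i++) {
      // Contiguous block [start, start + size) is the test fold; the rest is training data.
      (i >= start && i < start + size ? testIndex : trainIndex).push(i);
    }
    folds.push({ trainIndex, testIndex });
    start += size;
  }
  return folds;
}

for (const { testIndex } of kFoldIndices(10, 3)) {
  console.log("test:", testIndex);
}
```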

Scikit Parity Matrix

| Area | Covered APIs |
| --- | --- |
| Linear models | LinearRegression, LogisticRegression, SGDClassifier, SGDRegressor, LinearSVC |
| Tree/ensemble | DecisionTreeClassifier, DecisionTreeRegressor, RandomForestClassifier, RandomForestRegressor, AdaBoostClassifier, GradientBoostingClassifier, GradientBoostingRegressor, HistGradientBoostingClassifier, HistGradientBoostingRegressor |
| Neighbors / Bayes | KNeighborsClassifier, KNeighborsRegressor, GaussianNB |
| Clustering | KMeans, MiniBatchKMeans, DBSCAN, AgglomerativeClustering, SpectralClustering, Birch, OPTICS, MeanShift, AffinityPropagation |
| Decomposition / Manifold | PCA, IncrementalPCA, TruncatedSVD, FastICA, NMF, MiniBatchNMF, FactorAnalysis, KernelPCA, PLSSVD, PLSRegression, PLSCanonical, CCA, TSNE, Isomap, LocallyLinearEmbedding, MDS |
| Anomaly detection | IsolationForest, LocalOutlierFactor, OneClassSVM |
| Calibration / Meta | CalibratedClassifierCV, VotingClassifier, VotingRegressor, StackingClassifier, StackingRegressor, BaggingClassifier, BaggingRegressor, OneVsRestClassifier, OneVsOneClassifier |
| Multioutput | MultiOutputClassifier, MultiOutputRegressor, ClassifierChain, RegressorChain |
| Baselines | DummyClassifier, DummyRegressor |
| Preprocessing | StandardScaler, MinMaxScaler, RobustScaler, MaxAbsScaler, Normalizer, Binarizer, LabelEncoder, PolynomialFeatures, SimpleImputer, OneHotEncoder |
| Feature selection | VarianceThreshold, SelectKBest, SelectPercentile, SelectFromModel, RFE, RFECV, chi2, f_classif, f_regression, mutualInfoClassif, mutualInfoRegression |
| Model selection | trainTestSplit, KFold, StratifiedKFold, StratifiedShuffleSplit, RepeatedKFold, RepeatedStratifiedKFold, crossValScore, crossValidate, crossValPredict, learningCurve, validationCurve, GridSearchCV, RandomizedSearchCV, ParameterGrid, ParameterSampler |
| Metrics (regression) | meanSquaredError, meanAbsoluteError, r2Score, meanAbsolutePercentageError, explainedVarianceScore |
| Metrics (classification) | accuracyScore, precisionScore, recallScore, f1Score, balancedAccuracyScore, matthewsCorrcoef, logLoss, brierScoreLoss, rocAucScore, confusionMatrix, classificationReport |
| Metrics (clustering) | silhouetteScore, calinskiHarabaszScore, daviesBouldinScore, adjustedRandScore |
| Inspection | permutationImportance, partialDependence, permutationTestScore |

Parity Coverage vs README

Parity status is aligned across runtime exports, matrix contracts, and docs coverage.

Source of the required runtime surface: docs/parity-matrix.json (209 runtime exports in total).

Tracked parity status (latest check):

  • API surface parity: 209 / 209 required exports (100%).
  • API/class/interface contract parity: 0 failures (100% pass).
  • API docs coverage: 470 / 470 exported symbols referenced in docs/api.md (100%).
  • sklearn snapshot parity gate metrics: 34 / 34 pass (100%).
  • Full sklearn public-symbol coverage (non-strict inventory gate): 203 / 454 (44.71%).

Artifacts:

  • bench/results/parity/parity-report-latest.md
  • bench/results/parity/parity-matrix-report.json
  • bench/results/parity/parity-sklearn-report.json
  • bench/results/parity/parity-full-report.json
  • docs/sklearn-public-api.json

Commands:

  • Regenerate sklearn inventory: bun run parity:inventory:generate
  • Check runtime export + contract matrix parity: bun run parity:matrix:check
  • Check docs symbol coverage: bun run docs:coverage:check
  • Check full symbol coverage (report only): bun run parity:full:check
  • Enforce strict full-symbol gate: PARITY_FULL_STRICT=1 bun run parity:full:check
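The difference between the report-only check and the strict gate can be sketched as follows. This sketch assumes that strict mode fails whenever any sklearn public symbol is uncovered. The scripts themselves do not confirm that assumption. evaluateFullGate and isStrict are hypothetical names. The 203 / 454 inputs are the coverage figures reported above.

```typescript
// Hypothetical sketch; not the actual parity:full:check script.
interface GateResult {
  percent: string;
  pass: boolean;
}

function evaluateFullGate(covered: number, total: number, strict: boolean): GateResult {
  const percent = ((covered / total) * 100).toFixed(2);
  // Assumption: report-only mode never fails, and strict mode requires every symbol to be covered.
  const pass = strict ? covered === total : true;
  return { percent, pass };
}

// Reads the toggle from a plain record (pass process.env under Bun) so the logic stays testable.
function isStrict(env: Record<string, string | undefined>): boolean {
  return env.PARITY_FULL_STRICT === "1";
}

console.log(evaluateFullGate(203, 454, isStrict({})));
console.log(evaluateFullGate(203, 454, isStrict({ PARITY_FULL_STRICT: "1" })));
```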

Beyond the tracked matrix, the remaining gaps to full one-to-one behavior with scikit-learn are mainly in modules and APIs that are not yet tracked, including:

  • Feature extraction families (for example text/image vectorizers and hashing/vectorization utilities).
  • Additional decomposition/manifold variants and solvers not currently exposed.
  • Additional covariance and Gaussian-process families.
  • Additional inspection/display/reporting utilities and plotting-oriented helpers.
  • Dataset utilities and other sklearn ecosystem helpers outside this runtime-focused library surface.

Multiclass support is available for GaussianNB, KNeighborsClassifier, LogisticRegression, SGDClassifier, LinearSVC, DecisionTreeClassifier, RandomForestClassifier, VotingClassifier, StackingClassifier, BaggingClassifier, and CalibratedClassifierCV.

DecisionTreeClassifier and RandomForestClassifier now support multiclass native Zig fit/predict paths (up to 256 encoded classes) when BUN_SCIKIT_TREE_BACKEND=zig.
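The phrase "encoded classes" implies some label bookkeeping. A minimal sketch of it, under stated assumptions, follows. Labels are mapped to contiguous integer indices, and the resulting class count is compared against the documented 256-class limit for the native tree paths. encodeLabels, fitsNativeTreePath, and MAX_NATIVE_TREE_CLASSES are hypothetical names. This version assigns indices in first-seen order, whereas scikit-learn's LabelEncoder sorts the classes.

```typescript
// Hypothetical names; these are not bun-scikit exports.
const MAX_NATIVE_TREE_CLASSES = 256;

// Maps each distinct label to 0..K-1 in first-seen order.
function encodeLabels<T>(y: T[]): { classes: T[]; encoded: number[] } {
  const index = new Map<T, number>();
  const classes: T[] = [];
  const encoded = y.map((label) => {
    let code = index.get(label);
    if (code === undefined) {
      code = classes.length;
      index.set(label, code);
      classes.push(label);
    }
    return code;
  });
  return { classes, encoded };
}

// True when the class count is within the documented native multiclass limit.
function fitsNativeTreePath(nClasses: number): boolean {
  return nClasses <= MAX_NATIVE_TREE_CLASSES;
}

const { classes, encoded } = encodeLabels(["cat", "dog", "cat", "bird"]);
console.log(classes, encoded, fitsNativeTreePath(classes.length));
```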

Native Runtime

  • Prebuilt binaries are bundled in the npm package for:
    • linux-x64
    • windows-x64
  • No bun pm trust step is required for standard install/use.
  • macOS prebuilt binaries are not published yet.

Optional env vars:

  • BUN_SCIKIT_NATIVE_BRIDGE=node-api|ffi
  • BUN_SCIKIT_NODE_ADDON=/absolute/path/to/bun_scikit_node_addon.node
  • BUN_SCIKIT_ZIG_LIB=/absolute/path/to/bun_scikit_kernels.<ext>
  • BUN_SCIKIT_TREE_BACKEND=zig|js (default is zig; set js to force JS tree/forest fallback)
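The following minimal sketch shows how the documented BUN_SCIKIT_TREE_BACKEND switch resolves: zig by default, and js to force the JS tree/forest fallback. resolveTreeBackend is hypothetical. Throwing on unrecognized values is an assumption, and the library may handle them differently. The function takes a plain record so that it is easy to test. Under Bun, you would pass process.env.

```typescript
// Hypothetical sketch; not bun-scikit's actual backend-resolution code.
type TreeBackend = "zig" | "js";

function resolveTreeBackend(env: Record<string, string | undefined>): TreeBackend {
  const raw = env.BUN_SCIKIT_TREE_BACKEND;
  // Unset or empty: use the documented default.
  if (raw === undefined || raw === "") return "zig";
  if (raw === "zig" || raw === "js") return raw;
  // Assumption: unknown values are rejected rather than silently ignored.
  throw new Error(`Unsupported BUN_SCIKIT_TREE_BACKEND value: ${raw}`);
}

console.log(resolveTreeBackend({}));
console.log(resolveTreeBackend({ BUN_SCIKIT_TREE_BACKEND: "js" }));
```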

Parity checks are enforced in CI using:

  • API parity matrix coverage (bun run parity:matrix:check)
  • sklearn snapshot fixtures with multi-seed drift checks (bun run parity:check)
  • versioned parity report artifacts (bun run parity:report)
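A multi-seed drift check can be sketched as follows. For each seed, a metric is compared against a stored scikit-learn reference value, and any seed whose absolute difference exceeds a tolerance is flagged. The SeedResult shape, the tolerance, and the numbers below are illustrative assumptions. They are not real fixture data or the parity:check implementation.

```typescript
// Hypothetical fixture shape; values below are made-up illustrations, not benchmark results.
interface SeedResult {
  seed: number;
  ours: number;
  reference: number;
}

// Returns the seeds whose metric drifted further than `tolerance` from the reference.
function driftFailures(results: SeedResult[], tolerance: number): number[] {
  return results
    .filter(({ ours, reference }) => Math.abs(ours - reference) > tolerance)
    .map(({ seed }) => seed);
}

const runs: SeedResult[] = [
  { seed: 0, ours: 0.9, reference: 0.9 },
  { seed: 1, ours: 0.91, reference: 0.9105 },
  { seed: 2, ours: 0.8, reference: 0.9 },
];
console.log("seeds over tolerance:", driftFailures(runs, 1e-3));
```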

Performance Snapshot

Benchmark snapshot source: bench/results/heart-ci-latest.json (generated by the Benchmark Snapshot CI workflow). Dataset: test_data/heart.csv (1025 samples, 13 features, test fraction 0.2).

Summary

  • Regression: fit 1.235x, predict 1.977x (MSE delta 6.363e-14, R2 delta -2.540e-13)
  • Classification: fit 1.143x, predict 2.870x (accuracy delta 0.000e+0, F1 delta 1.106e-3)
  • DecisionTree (js-fast): fit 1.508x, predict 5.484x
  • RandomForest (js-fast): fit 2.610x, predict 2.679x
  • Tree backend matrix: DecisionTree zig/js fit 0.790x, predict 0.435x; RandomForest zig/js fit 1.164x, predict 1.884x
  • Snapshot generated at 2026-03-02T09:49:08.763Z
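The speedup figures appear to be ratios of median times: the baseline median divided by the candidate median, so values above 1 mean the candidate is faster. This reading is inferred from the tables below and is not stated by the project. The sketch recomputes a few of the figures from the rounded table medians. Because the medians are rounded, some other ratios (for example the predict speedups) may differ slightly in the last digit from the reported values.

```typescript
// speedup = baseline median / candidate median (inferred from the benchmark tables).
function speedup(baselineMs: number, candidateMs: number): string {
  return (baselineMs / candidateMs).toFixed(3) + "x";
}

// Regression fit: scikit-learn (baseline) vs bun-scikit (candidate).
console.log(speedup(0.5416, 0.4384));
// DecisionTree zig/js fit: js-fast (baseline) vs zig-tree (candidate);
// a value below 1 means the zig path was slower in this snapshot.
console.log(speedup(1.3074, 1.6545));
```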

Regression

| Implementation | Model | Fit median (ms) | Predict median (ms) | MSE | R2 |
|---|---|---:|---:|---:|---:|
| bun-scikit | StandardScaler + LinearRegression(normal) | 0.4384 | 0.0343 | 0.117545 | 0.529539 |
| python-scikit-learn | StandardScaler + LinearRegression | 0.5416 | 0.0677 | 0.117545 | 0.529539 |

  • Bun fit speedup vs scikit-learn: 1.235x
  • Bun predict speedup vs scikit-learn: 1.977x
  • MSE delta (bun - sklearn): 6.363e-14
  • R2 delta (bun - sklearn): -2.540e-13

Classification

| Implementation | Model | Fit median (ms) | Predict median (ms) | Accuracy | F1 |
|---|---|---:|---:|---:|---:|
| bun-scikit | StandardScaler + LogisticRegression(gd,zig) | 1.5080 | 0.0408 | 0.863415 | 0.876106 |
| python-scikit-learn | StandardScaler + LogisticRegression(lbfgs) | 1.7238 | 0.1170 | 0.863415 | 0.875000 |

  • Bun fit speedup vs scikit-learn: 1.143x
  • Bun predict speedup vs scikit-learn: 2.870x
  • Accuracy delta (bun - sklearn): 0.000e+0
  • F1 delta (bun - sklearn): 1.106e-3

Tree Classification

| Model | Implementation | Fit median (ms) | Predict median (ms) | Accuracy | F1 |
|---|---|---:|---:|---:|---:|
| DecisionTreeClassifier(maxDepth=8) [js-fast] | bun-scikit | 1.3074 | 0.0234 | 0.936585 | 0.937799 |
| DecisionTreeClassifier | python-scikit-learn | 1.9711 | 0.1285 | 0.931707 | 0.933962 |
| RandomForestClassifier(nEstimators=80,maxDepth=8) [js-fast] | bun-scikit | 41.0094 | 2.0022 | 0.995122 | 0.995261 |
| RandomForestClassifier | python-scikit-learn | 107.0371 | 5.3645 | 0.995122 | 0.995261 |

  • DecisionTree fit speedup vs scikit-learn: 1.508x
  • DecisionTree predict speedup vs scikit-learn: 5.484x
  • DecisionTree accuracy delta (bun - sklearn): 4.878e-3
  • DecisionTree f1 delta (bun - sklearn): 3.837e-3

  • RandomForest fit speedup vs scikit-learn: 2.610x
  • RandomForest predict speedup vs scikit-learn: 2.679x
  • RandomForest accuracy delta (bun - sklearn): 0.000e+0
  • RandomForest f1 delta (bun - sklearn): 1.110e-16

Tree Backend Modes (bun-scikit js-fast vs bun-scikit zig-tree vs scikit-learn)

| Model | Backend | Fit median (ms) | Predict median (ms) | Accuracy | F1 |
|---|---|---:|---:|---:|---:|
| DecisionTreeClassifier(maxDepth=8) | js-fast | 1.3074 | 0.0234 | 0.936585 | 0.937799 |
| DecisionTreeClassifier(maxDepth=8) | zig-tree | 1.6545 | 0.0539 | 0.936585 | 0.937799 |
| DecisionTreeClassifier | python-scikit-learn | 1.9711 | 0.1285 | 0.931707 | 0.933962 |
| RandomForestClassifier(nEstimators=80,maxDepth=8) | js-fast | 41.0094 | 2.0022 | 0.995122 | 0.995261 |
| RandomForestClassifier(nEstimators=80,maxDepth=8) | zig-tree | 35.2326 | 1.0628 | 1.000000 | 1.000000 |
| RandomForestClassifier | python-scikit-learn | 107.0371 | 5.3645 | 0.995122 | 0.995261 |

  • DecisionTree zig/js fit speedup: 0.790x
  • DecisionTree zig/js predict speedup: 0.435x
  • RandomForest zig/js fit speedup: 1.164x
  • RandomForest zig/js predict speedup: 1.884x

Snapshot generated at: 2026-03-02T09:49:08.763Z

Documentation

  • Getting started: docs/getting-started.md
  • API reference: docs/api.md
  • Benchmarking: docs/benchmarking.md
  • Zig acceleration: docs/zig-acceleration.md
  • Native ABI: docs/native-abi.md
  • Release checklist: docs/release-checklist.md
  • Release notes draft automation: bun run release:notes (updates docs/release-notes/v*.md + parity block in CHANGELOG.md)

Contributing / Project Files

  • Changelog: CHANGELOG.md
  • Contributing: CONTRIBUTING.md
  • Security: SECURITY.md
  • Code of Conduct: CODE_OF_CONDUCT.md
  • Support: SUPPORT.md