@wlearn/rf
v0.4.1
Published
Random Forest from scratch in C11, compiled to WebAssembly -- classification, regression, ExtraTrees, quantile RF, conformal prediction, histogram binning, JARF rotation, sample weights, proximity matrix
Maintainers
Readme
@wlearn/rf
Random Forest classifier and regressor, written in C11 from scratch. No upstream dependencies.
Part of wlearn (GitHub, all packages). Runs as native C, WASM (browser + Node), with JS and Python wrappers.
Features
- Classification (Gini, entropy, Hellinger) and regression (MSE, MAE)
- Bootstrap aggregating with adjustable sample rate
- ExtraTrees mode (random threshold per feature)
- Heterogeneous RF (depth-dependent feature weighting)
- OOB scoring with optional per-tree weighting
- MDI feature importances and permutation importance
- Cost-complexity pruning
- Local linear leaf models (regression)
- Missing value handling (learned NaN direction per split)
- Monotonic constraints (bound propagation)
- Quantile regression forests (per-leaf sample storage)
- Conformal prediction intervals (Jackknife+-after-Bootstrap)
- Sample weights (per-sample, or
classWeight: 'balanced') - Histogram binning (5-50x speedup on large datasets)
- JARF rotation (Jacobian Aligned Random Forests)
- Proximity matrix (co-leaf fraction across trees)
- Deterministic (LCG PRNG, fixed seed)
- Binary serialization (RF01-RF04, save/load, backward-compatible)
- 137 C tests, 43 JS tests, 31 Python tests, 87 parity tests
Default parameters match sklearn's RandomForestClassifier / RandomForestRegressor exactly.
Installation
JavaScript (npm)
npm install @wlearn/rfPython
# Build the C library
mkdir build && cd build && cmake .. && make
# Set library path
export RF_LIB_PATH=build/librf.soC
mkdir build && cd build
cmake .. -DBUILD_TESTING=ON
make
./test_rf # run testsQuick Start
JavaScript
const { RFModel } = require('@wlearn/rf')
// Classification
const clf = await RFModel.create({ nEstimators: 100, seed: 42 })
clf.fit(X_train, y_train)
const predictions = clf.predict(X_test)
const accuracy = clf.score(X_test, y_test)
const probabilities = clf.predictProba(X_test)
clf.dispose()
// Regression
const reg = await RFModel.create({
nEstimators: 100, task: 'regression', seed: 42
})
reg.fit(X_train, y_train)
const r2 = reg.score(X_test, y_test)
// Quantile predictions
const median = reg.predictQuantile(X_test, 0.5)
const [q10, q90] = reg.predictQuantile(X_test, [0.1, 0.9])
// Prediction intervals (90% coverage)
const { lower, upper } = reg.predictInterval(X_test, 0.1)
reg.dispose()Python
from wlearn_rf import RFModel
# Classification
clf = RFModel({'n_estimators': 100, 'seed': 42})
clf.fit(X_train, y_train)
predictions = clf.predict(X_test)
accuracy = clf.score(X_test, y_test)
probabilities = clf.predict_proba(X_test)
clf.dispose()
# Regression
reg = RFModel({'n_estimators': 100, 'task': 'regression', 'seed': 42})
reg.fit(X_train, y_train)
r2 = reg.score(X_test, y_test)
# Quantile predictions
median = reg.predict_quantile(X_test, quantile=0.5)
# Prediction intervals (90% coverage)
lower, upper = reg.predict_interval(X_test, alpha=0.1)
reg.dispose()C
#include "rf.h"
rf_params_t params;
rf_params_init(¶ms);
params.n_estimators = 100;
params.seed = 42;
params.task = 0; // 0 = classification, 1 = regression
rf_forest_t *forest = rf_fit(X, nrow, ncol, y, ¶ms);
double *preds = malloc(nrow * sizeof(double));
rf_predict(forest, X, nrow, ncol, preds);
// Quantile predictions (requires store_leaf_samples=1)
params.store_leaf_samples = 1;
rf_forest_t *qrf = rf_fit(X, nrow, ncol, y, ¶ms);
double quantiles[] = {0.1, 0.5, 0.9};
double *qout = malloc(nrow * 3 * sizeof(double));
rf_predict_quantile(qrf, X_test, ntest, ncol, quantiles, 3, qout);
// Conformal prediction intervals (requires bootstrap + OOB)
double *lower = malloc(ntest * sizeof(double));
double *upper = malloc(ntest * sizeof(double));
rf_predict_interval(forest, X_test, ntest, ncol, 0.1, lower, upper);
rf_free(forest);
rf_free(qrf);API Reference
Parameters
All parameters have defaults that match sklearn. New features are opt-in and do not change default behavior.
| Parameter | JS name | Python name | C field | Default | Description |
|-----------|---------|-------------|---------|---------|-------------|
| Number of trees | nEstimators | n_estimators | n_estimators | 100 | Number of trees in the forest |
| Max depth | maxDepth | max_depth | max_depth | 0 (unlimited) | Maximum tree depth. 0 = grow until pure or min_samples |
| Min samples split | minSamplesSplit | min_samples_split | min_samples_split | 2 | Minimum samples to split an internal node |
| Min samples leaf | minSamplesLeaf | min_samples_leaf | min_samples_leaf | 1 | Minimum samples per leaf |
| Max features | maxFeatures | max_features | max_features | 0 (auto) | Features per split. 0 = sqrt(n) for cls, n/3 for reg. Also: 'sqrt', 'log2', 'third' |
| Max leaf nodes | maxLeafNodes | max_leaf_nodes | max_leaf_nodes | 0 (unlimited) | Maximum number of leaf nodes per tree |
| Bootstrap | bootstrap | bootstrap | bootstrap | 1 (true) | Whether to bootstrap sample |
| Compute OOB | computeOob | compute_oob | compute_oob | 1 (true) | Whether to compute OOB score |
| ExtraTrees | extraTrees | extra_trees | extra_trees | 0 (false) | ExtraTrees mode (random threshold) |
| Seed | seed | seed | seed | 42 | Random seed for reproducibility |
| Task | task | task | task | 'classification' / 0 | 'classification' or 'regression' (JS); 0 or 1 (C/Python) |
v0.2 Parameters
| Parameter | JS name | Python name | C field | Default | Description |
|-----------|---------|-------------|---------|---------|-------------|
| Criterion | criterion | criterion | criterion | 0 / 'gini' | Split criterion. Classification: 'gini'(0), 'entropy'(1), 'hellinger'(2). Regression: 'mse'(0), 'mae'(1) |
| Sample rate | sampleRate | sample_rate | sample_rate | 1.0 | Fraction of samples per tree. Like sklearn max_samples |
| Heterogeneous | heterogeneous | heterogeneous | heterogeneous | 0 | Depth-dependent feature weighting. 1 = enabled |
| OOB weighting | oobWeighting | oob_weighting | oob_weighting | 0 | Weight tree votes by OOB performance. 1 = enabled |
| Alpha trim | alphaTrim | alpha_trim | alpha_trim | 0.0 | Cost-complexity pruning penalty. 0 = no pruning |
| Leaf model | leafModel | leaf_model | leaf_model | 0 | Leaf prediction model. 0 = constant (mean), 1 = local linear (regression only) |
v0.3 Parameters
| Parameter | JS name | Python name | C field | Default | Description |
|-----------|---------|-------------|---------|---------|-------------|
| Store leaf samples | storeLeafSamples | store_leaf_samples | store_leaf_samples | auto | Store per-leaf sample indices for quantile prediction. Auto-enabled for regression |
| Monotonic constraints | monotonicCst | monotonic_cst | monotonic_cst | null | Per-feature constraints: -1 = decreasing, 0 = none, +1 = increasing. Array of length n_features |
v0.4 Parameters
| Parameter | JS name | Python name | C field | Default | Description |
|-----------|---------|-------------|---------|---------|-------------|
| Sample weight | sampleWeight | sample_weight | sample_weight | null (uniform) | Per-sample weights for splits, leaves, bootstrap, and OOB |
| Class weight | classWeight | class_weight | -- | null | 'balanced' auto-computes n / (K * n_k) per class (JS/Python wrapper only) |
| Histogram binning | histogramBinning | histogram_binning | histogram | 0 | Pre-bin features into up to 256 uint8 buckets. 1 = enabled |
| Max bins | maxBins | max_bins | max_bins | 256 | Maximum number of histogram bins (2-256) |
| JARF | jarf | jarf | jarf | 0 | Jacobian Aligned Random Forests. 1 = enabled |
| JARF estimators | jarfNEstimators | jarf_n_estimators | jarf_n_estimators | 50 | Number of trees in JARF surrogate forest |
| JARF max depth | jarfMaxDepth | jarf_max_depth | jarf_max_depth | 6 | Max depth of JARF surrogate trees |
Methods
JavaScript (RFModel)
// Construction
const model = await RFModel.create(params)
// Training
model.fit(X, y) // X: number[][] or Float64Array, y: number[]
// Prediction
model.predict(X) // Returns Float64Array
model.predictProba(X) // Classification only. Returns Float64Array (flat, row-major)
model.score(X, y) // Accuracy (cls) or R2 (reg)
// Quantile prediction (regression, requires store_leaf_samples)
model.predictQuantile(X, 0.5) // Single quantile: returns Float64Array
model.predictQuantile(X, [0.1, 0.9]) // Multiple: returns [Float64Array, Float64Array]
// Conformal prediction intervals (regression, requires bootstrap + OOB)
model.predictInterval(X, alpha) // Returns { lower: Float64Array, upper: Float64Array }
// Inspection
model.featureImportances() // Returns Float64Array (normalized MDI)
model.permutationImportance(X, y, { nRepeats: 5, seed: 42 }) // Model-agnostic importance
model.oobScore() // OOB accuracy (cls) or R2 (reg)
model.proximity(X) // Returns Float64Array (nrow*nrow, symmetric)
model.nFeatures // Number of features
model.nClasses // Number of classes (0 for regression)
model.nTrees // Number of trees
model.isFitted // Whether model is fitted
model.capabilities // { classifier, regressor, predictProba, featureImportances, ... }
// Persistence
const bundle = model.save() // Returns Uint8Array (wlearn bundle format)
const loaded = await RFModel.load(bundle)
// Cleanup (required)
model.dispose()
// AutoML
RFModel.defaultSearchSpace() // Returns search space IR for hyperparameter tuning
// Params
model.getParams() // Returns current params
model.setParams({ maxDepth: 5 })Python (RFModel)
# Construction
model = RFModel(params_dict)
# Training
model.fit(X, y) # X: numpy 2D array, y: numpy 1D array
# Prediction
model.predict(X) # Returns numpy array
model.predict_proba(X) # Classification only. Returns (n, n_classes) array
model.score(X, y) # Accuracy (cls) or R2 (reg)
# Quantile prediction (regression)
model.predict_quantile(X, quantile=0.5) # Returns numpy array
# Conformal prediction intervals (regression)
lower, upper = model.predict_interval(X, alpha=0.1) # Returns (lower, upper) arrays
# Inspection
model.feature_importances() # Returns numpy array
model.permutation_importance(X, y, n_repeats=5, seed=42) # Model-agnostic importance
model.oob_score() # OOB score
model.proximity(X) # Returns (nrow, nrow) numpy array
model.n_features # Number of features
model.n_classes # Number of classes
model.n_trees # Number of trees
model.is_fitted # Whether model is fitted
# Persistence
blob = model.save_raw() # Returns bytes (RF01/RF02/RF03 binary)
loaded = RFModel.load_raw(blob, params={'task': 'regression'})
# Cleanup
model.dispose()C API
// Initialize params with defaults
void rf_params_init(rf_params_t *params);
// Fit a forest. Returns NULL on error.
rf_forest_t *rf_fit(const double *X, int32_t nrow, int32_t ncol,
const double *y, const rf_params_t *params);
// Predict. out must be pre-allocated (nrow doubles). Returns 0 on success.
int rf_predict(const rf_forest_t *forest, const double *X,
int32_t nrow, int32_t ncol, double *out);
// Predict probabilities (classification). out: nrow * n_classes doubles.
int rf_predict_proba(const rf_forest_t *forest, const double *X,
int32_t nrow, int32_t ncol, double *out);
// Predict quantiles (requires store_leaf_samples=1).
// out: nrow * n_quantiles doubles, row-major.
int rf_predict_quantile(const rf_forest_t *forest, const double *X,
int32_t nrow, int32_t ncol,
const double *quantiles, int32_t n_quantiles,
double *out);
// Predict conformal intervals (requires bootstrap + OOB).
// out_lower, out_upper: nrow doubles each.
int rf_predict_interval(const rf_forest_t *forest, const double *X,
int32_t nrow, int32_t ncol, double alpha,
double *out_lower, double *out_upper);
// Permutation importance (model-agnostic).
// out: ncol doubles, mean score drop per feature.
int rf_permutation_importance(const rf_forest_t *forest,
const double *X, int32_t nrow, int32_t ncol,
const double *y, int32_t n_repeats,
uint32_t seed, double *out);
// Proximity matrix. out: nrow * nrow doubles, row-major, symmetric.
int rf_proximity(const rf_forest_t *forest, const double *X,
int32_t nrow, int32_t ncol, double *out);
// Serialize / deserialize
int rf_save(const rf_forest_t *forest, char **out_buf, int32_t *out_len);
rf_forest_t *rf_load(const char *buf, int32_t len);
// Free resources
void rf_free(rf_forest_t *forest);
void rf_free_buffer(void *ptr);
// Error message
const char *rf_get_error(void);Advanced Usage Examples
Missing values (NaN)
NaN values are handled automatically. At each split, both NaN directions (left/right) are tried and the best is learned. At prediction time, NaN values follow the learned direction.
// JS: NaN values work transparently
const X = [[1.0, NaN], [2.0, 3.0], [NaN, 4.0]]
const y = [0, 1, 0]
model.fit(X, y)
model.predict([[NaN, 2.0]]) // NaN follows learned direction# Python: NaN values work transparently
X = np.array([[1.0, np.nan], [2.0, 3.0], [np.nan, 4.0]])
y = np.array([0, 1, 0])
model.fit(X, y)
model.predict(np.array([[np.nan, 2.0]]))Monotonic constraints
Enforce monotonic relationships between features and predictions. Useful when domain knowledge dictates that increasing a feature should never decrease the prediction.
const model = await RFModel.create({
task: 'regression',
monotonicCst: [1, -1, 0], // feature 0: increasing, 1: decreasing, 2: unconstrained
seed: 42
})
model.fit(X, y)model = RFModel({
'task': 'regression',
'monotonic_cst': [1, -1, 0],
'seed': 42
})
model.fit(X, y)Quantile regression forests
Predict any quantile of the conditional distribution. A single fitted model serves all quantiles.
const model = await RFModel.create({
task: 'regression', nEstimators: 200, seed: 42
})
model.fit(X_train, y_train)
// 90% prediction interval via quantiles
const [q05, q95] = model.predictQuantile(X_test, [0.05, 0.95])
model.dispose()model = RFModel({'task': 'regression', 'n_estimators': 200, 'seed': 42})
model.fit(X_train, y_train)
q05 = model.predict_quantile(X_test, quantile=0.05)
q95 = model.predict_quantile(X_test, quantile=0.95)Conformal prediction intervals
Distribution-free prediction intervals with finite-sample coverage guarantees via Jackknife+-after-Bootstrap. Uses OOB residuals computed during training.
const model = await RFModel.create({
task: 'regression', nEstimators: 200, seed: 42
})
model.fit(X_train, y_train)
// 90% prediction intervals (alpha=0.1)
const { lower, upper } = model.predictInterval(X_test, 0.1)
model.dispose()model = RFModel({'task': 'regression', 'n_estimators': 200, 'seed': 42})
model.fit(X_train, y_train)
lower, upper = model.predict_interval(X_test, alpha=0.1)Sample weights
Per-sample weights affect split criteria, leaf predictions, bootstrap sampling, and OOB scoring.
const model = await RFModel.create({
nEstimators: 100, seed: 42,
sampleWeight: [1.0, 2.0, 1.0, 0.5, ...] // per-sample weights
})
model.fit(X, y)
// Or use classWeight: 'balanced' for auto-computed weights
const balanced = await RFModel.create({
nEstimators: 100, classWeight: 'balanced', seed: 42
})
balanced.fit(X_imbalanced, y_imbalanced)model = RFModel({
'n_estimators': 100, 'seed': 42,
'sample_weight': [1.0, 2.0, 1.0, 0.5, ...]
})
model.fit(X, y)
# Or class_weight='balanced'
balanced = RFModel({'n_estimators': 100, 'class_weight': 'balanced', 'seed': 42})
balanced.fit(X_imbalanced, y_imbalanced)Histogram binning
Pre-bins features into up to 256 quantile-spaced uint8 buckets. Split search becomes O(B) per feature instead of O(n). 5-50x speedup on large datasets. Auto-disabled for ExtraTrees mode.
const model = await RFModel.create({
nEstimators: 200, histogramBinning: 1, seed: 42
})
model.fit(X_large, y_large) // much faster for n > 10Kmodel = RFModel({'n_estimators': 200, 'histogram_binning': 1, 'seed': 42})
model.fit(X_large, y_large)JARF rotation
Jacobian Aligned Random Forests: trains a lightweight surrogate forest, computes finite-difference Jacobian per sample, forms EJOP matrix, eigendecomposes, and rotates features before training the main forest. Improves accuracy on rotated/diagonal decision boundaries.
const model = await RFModel.create({
nEstimators: 200, jarf: 1, seed: 42
})
model.fit(X, y) // rotation learned and applied automatically
model.predict(X_test) // rotation applied at prediction time toomodel = RFModel({'n_estimators': 200, 'jarf': 1, 'seed': 42})
model.fit(X, y)
model.predict(X_test)Permutation importance
Model-agnostic, unbiased feature importance. Shuffles each feature, re-predicts, measures score drop. Unlike MDI, unbiased toward high-cardinality features.
const imp = model.permutationImportance(X_test, y_test, { nRepeats: 10, seed: 42 })
// imp is Float64Array of length n_featuresimp = model.permutation_importance(X_test, y_test, n_repeats=10, seed=42)
# imp is numpy array of length n_featuresProximity matrix
Computes the fraction of trees where each pair of samples lands in the same leaf. Useful for outlier detection, clustering visualization, and missing value imputation.
const prox = model.proximity(X) // Float64Array of length nrow*nrow
// prox[i * nrow + j] = fraction of trees where samples i and j share a leafprox = model.proximity(X) # (nrow, nrow) numpy arrayEntropy criterion with sample rate
const model = await RFModel.create({
nEstimators: 200,
criterion: 'entropy',
sampleRate: 0.7,
seed: 42
})
model.fit(X, y)Heterogeneous RF for high-dimensional data
Depth-dependent feature weighting promotes diversity. Features overused at shallow depths are downweighted at deeper levels, giving more features a chance to contribute.
model = RFModel({
'n_estimators': 200,
'heterogeneous': 1,
'max_features': 0, # auto (sqrt)
'task': 'regression',
'seed': 42
})
model.fit(X, y)Local linear leaves for regression
Instead of predicting the leaf mean, fit a ridge regression per leaf.
Best combined with shallow trees (max_depth=3-7) so each leaf has enough
samples for a meaningful linear fit.
model = RFModel({
'n_estimators': 100,
'task': 'regression',
'leaf_model': 1, # local linear
'max_depth': 5, # shallow trees
'seed': 42
})
model.fit(X, y)
r2 = model.score(X_test, y_test)Hellinger distance for imbalanced classification
Hellinger distance is robust to class imbalance -- it measures divergence between class distributions at each split rather than impurity reduction.
const model = await RFModel.create({
nEstimators: 200,
criterion: 'hellinger',
seed: 42
})
model.fit(X_imbalanced, y_imbalanced)Cost-complexity pruning
Post-hoc pruning reduces overfitting by collapsing subtrees whose impurity gain
is less than alpha per leaf.
const model = await RFModel.create({
nEstimators: 100,
alphaTrim: 0.01,
seed: 42
})
model.fit(X, y)OOB-weighted voting
Weight each tree's vote by its individual OOB performance. Trees that generalize better get more influence.
model = RFModel({
'n_estimators': 200,
'oob_weighting': 1,
'seed': 42
})
model.fit(X, y)Combined configuration
const model = await RFModel.create({
nEstimators: 200,
task: 'regression',
criterion: 'mse',
leafModel: 1, // local linear leaves
maxDepth: 5, // shallow trees for linear leaves
heterogeneous: 1, // depth-dependent feature weighting
oobWeighting: 1, // weight trees by OOB performance
sampleRate: 0.8, // subsample 80%
monotonicCst: [1, 0, -1, 0, 0], // monotonic constraints
seed: 42
})
model.fit(X, y)Benchmark / Ablation Study
Benchmark on synthetic data (n=500, 100 trees, seed=42). OOB scores are the most meaningful metric since they estimate generalization without a separate test set.
Classification (10 features, 3 classes, overlapping distributions)
| Configuration | Train Acc | OOB Acc | Fit (ms) | | ------------------------------ | --------: | -------: | -------: | | Default (Gini) | 1.0000 | 0.9700 | 54 | | Entropy | 1.0000 | 0.9680 | 29 | | Hellinger | 1.0000 | 0.9980 | 23 | | Sample rate 0.7 | 1.0000 | 0.9740 | 15 | | Heterogeneous RF | 1.0000 | 0.9820 | 22 | | OOB weighting | 1.0000 | 0.9700 | 22 | | Alpha trim 0.01 | 1.0000 | 0.9700 | 22 | | ExtraTrees | 1.0000 | 0.9820 | 4 | | Entropy + HRF + OOB wt | 1.0000 | 0.9680 | 24 |
Hellinger distance achieves the best OOB accuracy (0.998) on this multiclass problem. ExtraTrees and HRF also improve over the default.
Regression (10 features, linear signal with noise)
| Configuration | Train R2 | OOB R2 | Fit (ms) | | ------------------------------ | --------: | -------: | -------: | | Default (MSE) | 0.9762 | 0.8236 | 86 | | MAE | 0.9731 | 0.8011 | 178 | | Sample rate 0.7 | 0.9536 | 0.8140 | 57 | | Heterogeneous RF | 0.9687 | 0.7664 | 91 | | OOB weighting | 0.9771 | 0.8236 | 88 | | Alpha trim 0.01 | 0.9684 | 0.8144 | 89 | | Linear leaves | 0.9762 | 0.8236 | 87 | | Linear leaves (depth=5) | 0.9913 | 0.9677 | 55 | | ExtraTrees | 0.9744 | 0.8083 | 25 | | Linear + HRF + OOB wt | 0.9706 | 0.7664 | 95 |
Linear leaves with shallow trees (depth=5) dramatically improve OOB R2 from 0.82 to 0.97 on this linear signal, confirming that local linear models capture within-leaf trends that constant leaves miss. The fit time is also lower due to fewer nodes.
Serialization
Models are serialized in a compact binary format (RF01 magic, format version 4). The format is backward-compatible: v4 readers can load v1/v2/v3 files (new fields get defaults). v3 adds 1 byte per node for learned NaN direction. v4 adds histogram flag, JARF flag, max_bins, and JARF rotation matrix storage.
// JS: wlearn bundle format (wraps RF binary in WLRN container)
const bundle = model.save() // Uint8Array
const loaded = await RFModel.load(bundle)# Python: raw RF binary format
blob = model.save_raw() # bytes
loaded = RFModel.load_raw(blob)// C: raw RF binary format
char *buf; int32_t len;
rf_save(forest, &buf, &len);
rf_forest_t *loaded = rf_load(buf, len);
rf_free_buffer(buf);Building
Native (C)
mkdir build && cd build
cmake .. -DBUILD_TESTING=ON
make
./test_rf # 137 testsWASM
Requires Emscripten.
bash scripts/build-wasm.sh # outputs wasm/rf.js
bash scripts/verify-exports.sh # verifies 24 exportsJS Tests
node test/test.js # 43 testsPython Tests
RF_LIB_PATH=build/librf.so python test/test_python.py # 31 tests
RF_LIB_PATH=build/librf.so python -m pytest test/test_parity.py -v # 87 testsBenchmark
gcc -O2 -std=c11 -o build/benchmark test/benchmark.c csrc/rf.c -Icsrc -lm
./build/benchmarkReferences
- Meinshausen (2006). "Quantile Regression Forests." JMLR, 7(35):983-999.
- Barber et al. (2021). "Predictive Inference with the Jackknife+." Annals of Statistics, 49(1):486-507.
- Bartley et al. (2020). "An Improved Method for Monotone Constraints in Gradient Boosting." arXiv:2011.00986.
- Cieslak et al. (2012). "Hellinger distance decision trees are robust and skew-insensitive." Data Mining and Knowledge Discovery, 24(1), 136-158.
- Klusowski & Tian (2024). "Heterogeneous Random Forests." arXiv:2410.19022.
- Mentch & Zhou (2024). "Randomized Trees and Applications." arXiv:2410.04297.
- Correia & Lecue (2024). "Adaptive Random Forests with Cost-Complexity Pruning." arXiv:2408.07151.
- Horvath et al. (2025). "RaFFLE: Random Forest with Local Linear Extensions." arXiv:2502.10185.
- Ke et al. (2017). "LightGBM: A Highly Efficient Gradient Boosting Decision Tree." NeurIPS.
- Rauniyar et al. (2025). "Jacobian Aligned Random Forests." arXiv:2512.08306.
License
MIT
