starlight-dataset
v1.0.0
Dataset utilities for batching, shuffling, and splitting data in Starlight ML
A lightweight dataset utility library for the Starlight Machine Learning ecosystem. It provides a clean abstraction for handling data, batching, shuffling, and train/test splitting, and is designed to work with the other Starlight ML packages.
Features
- Dataset abstraction (`Dataset` class)
- Immutable operations (`map`, `filter`, `shuffle`, etc.)
- Deterministic shuffling
- Batch generation
- Train / test split
- Works with regression, classification, clustering, and pipelines
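
The "immutable operations", "deterministic shuffling", and "train / test split" features above can be illustrated with a short, self-contained sketch. It does not show how starlight-dataset is implemented: `mulberry32`, `seededShuffle`, and `splitByRatio` are hypothetical helpers written for this example. The integer seed and the choice to round the split point down are also assumptions, not documented behavior of the library.

```javascript
// Hypothetical helpers for illustration only; NOT the starlight-dataset API.

// mulberry32: a small seeded PRNG (swapped in here as an assumption).
// Returns a function producing floats in [0, 1); same seed, same sequence.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Fisher-Yates shuffle performed on a copy, so the input is never mutated.
function seededShuffle(items, seed) {
  const rand = mulberry32(seed);
  const out = items.slice();
  for (let i = out.length - 1; i > 0; i--) {
    const j = Math.floor(rand() * (i + 1));
    [out[i], out[j]] = [out[j], out[i]];
  }
  return out;
}

// Shuffle with a seed, then cut at floor(length * ratio) (assumed rounding).
function splitByRatio(items, ratio, seed) {
  const shuffled = seededShuffle(items, seed);
  const cut = Math.floor(shuffled.length * ratio);
  return { train: shuffled.slice(0, cut), test: shuffled.slice(cut) };
}

const original = [1, 2, 3, 4, 5];
const first = seededShuffle(original, 42);
const second = seededShuffle(original, 42); // same order as `first`
const { train, test } = splitByRatio(original, 0.8, 42);
console.log(first, second, original); // `original` is still [1, 2, 3, 4, 5]
console.log(train.length, test.length); // 4 1
```

Shuffling a copy is what keeps the operation immutable, and fixing the seed makes a train/test split reproducible between runs.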
Installation

    npm install starlight-dataset

Or import directly in your Starlight environment:

    import { Dataset, dataset } from "starlight-dataset";

Basic Usage
Create a Dataset

    import { dataset } from "starlight-dataset";
    const ds = dataset([1, 2, 3, 4, 5]);

Map & Filter

    const processed = ds
      .map(x => x * 2)
      .filter(x => x > 5);
    processed.toArray();
    // [6, 8, 10]

Shuffling

    const shuffled = ds.shuffle();

Deterministic shuffle with seed:

    const shuffled = ds.shuffle(0.42);

Batching

    const batches = ds.batch(2);
    batches.toArray();
    // [ [1, 2], [3, 4], [5] ]

Train / Test Split

    const { train, test } = ds.split(0.8);
    train.size(); // 4
    test.size(); // 1

Disable shuffle if needed:

    ds.split(0.8, false);

Pairing Features & Labels

    import { fromPairs } from "starlight-dataset";
    const X = [[1], [2], [3]];
    const y = [2, 4, 6];
    const paired = fromPairs(X, y);
    paired.toArray();
    // [ { x: [1], y: 2 }, { x: [2], y: 4 }, { x: [3], y: 6 } ]

Dataset API
Dataset
| Method | Description |
| ------------------------ | ----------------------- |
| map(fn) | Transform each element with fn |
| filter(fn) | Keep elements for which fn returns true |
| shuffle(seed?) | Shuffle the dataset; pass a seed for a deterministic order |
| batch(size) | Group elements into batches of size (the last batch may be smaller) |
| split(ratio, shuffle?) | Train/test split; ratio is the train fraction, pass false to disable shuffling |
| take(n) | Take first n elements |
| skip(n) | Skip first n elements |
| repeat(times) | Repeat the dataset times times |
| size() | Number of elements |
| toArray() | Convert to array |
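
To make the table concrete, the sketch below shows one way the non-shuffling methods could behave. `MiniDataset` is a hypothetical stand-in written for this example, not the library's `Dataset` class, and its handling of edge cases (such as a batch size of zero) is an assumption.

```javascript
// MiniDataset: hypothetical stand-in for illustration; NOT starlight-dataset.
class MiniDataset {
  constructor(items) {
    // Copy and freeze so no method can mutate the stored data.
    this.items = Object.freeze(items.slice());
  }
  // Every transforming method returns a new MiniDataset.
  map(fn) { return new MiniDataset(this.items.map(fn)); }
  filter(fn) { return new MiniDataset(this.items.filter(fn)); }
  take(n) { return new MiniDataset(this.items.slice(0, n)); }
  skip(n) { return new MiniDataset(this.items.slice(n)); }
  repeat(times) {
    let out = [];
    for (let i = 0; i < times; i++) out = out.concat(this.items);
    return new MiniDataset(out);
  }
  batch(size) {
    // Guard against a non-positive size, which would loop forever.
    if (!(size > 0)) throw new RangeError("batch size must be positive");
    const groups = [];
    for (let i = 0; i < this.items.length; i += size) {
      groups.push(this.items.slice(i, i + size));
    }
    return new MiniDataset(groups);
  }
  size() { return this.items.length; }
  toArray() { return this.items.slice(); }
}

const mini = new MiniDataset([1, 2, 3, 4, 5]);
console.log(mini.take(2).toArray());           // [ 1, 2 ]
console.log(mini.skip(2).toArray());           // [ 3, 4, 5 ]
console.log(mini.take(2).repeat(2).toArray()); // [ 1, 2, 1, 2 ]
console.log(mini.batch(2).toArray());          // [ [ 1, 2 ], [ 3, 4 ], [ 5 ] ]
console.log(mini.size());                      // 5 (the source is unchanged)
```

Because each method returns a new object, calls can be chained freely without affecting the dataset they started from.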
Designed for Starlight ML
This package integrates naturally with:
- starlight-ml
- starlight-vec
- starlight-classifier
- starlight-regression
- starlight-pipeline
- starlight-train (future)
Philosophy
- Simple over clever
- Immutable over mutable
- Readable over magical
- Educational yet production-ready
License
MIT © Dominex Macedon
