dcf
v0.2.9
> Early development stage: this project is still under early development and many necessary features are not done yet. Use it at your own risk.
Distributed Computing Framework for Node.js
A Node.js version of Spark, without Hadoop or the JVM.
Read the tutorial first. After that, you can follow Spark learning material while using this project instead.
Async API & deferred API
Any API that takes an RDD and produces a result is async, such as count, take, max ...
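To illustrate why a result-producing call returns a promise, here is a minimal sketch that does not use dcf itself. It assumes each partition is counted by some work that finishes later; `countPartition` and `countAll` are hypothetical names invented for this example and are not part of the dcf API.

```typescript
// Hypothetical stand-in for the work done on one partition; not a dcf API.
async function countPartition<T>(partition: T[]): Promise<number> {
  // Yield to the event loop to mimic work that completes later.
  await new Promise<void>((resolve) => setTimeout(resolve, 0));
  return partition.length;
}

// Hypothetical action: resolves once every partition has reported its count.
async function countAll<T>(partitions: T[][]): Promise<number> {
  const counts = await Promise.all(partitions.map((p) => countPartition(p)));
  return counts.reduce((sum, n) => sum + n, 0);
}

async function main(): Promise<void> {
  const total = await countAll([[1, 2, 3], [4, 5], []]);
  console.log(total); // 5
}

main();
```

The caller has to `await` the result because the total is only known after all partitions have finished, which is the same reason calls like count or take need `await`.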
Any API that creates an RDD is a deferred API, which is not async, so you can chain such calls like this:
    await dcc
      .parallelize([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
      .map(v => v + 1)
      .filter(v => v % 2 === 0)
      .take(10); // take is not a deferred API; it is async
Milestones
0.1.x: Basic
- [x] local master.
- [x] rdd & partition creation & release.
- [x] map & reduce
- [x] repartition & reduceByKey
- [x] disk storage partitions
- [x] cache
- [x] file loader & saver
- [x] export module to npm
- [x] decompressor & compressor
- [x] use debug module for information/error
- [x] provide a progress bar.
- [ ] sampler
- [x] sort
- [ ] object hash (for key) method
- [ ] storage MEMORY_OR_DISK, and use it in sort
- [ ] storage MEMORY_SER: stored in memory but off the V8 heap.
- [ ] config default partition count.
0.2.x: Remote mode
- [ ] distributed master
- [ ] runtime sandbox
- [ ] plugin system
- [ ] remote dependency management
- [ ] aliyun oss loader
- [ ] hdfs loader
How to use
Install from npm (shell only)
    npm install -g dcf
    # or
    yarn global add dcf
Then you can use the command: dcf-shell
Install from npm (as dependency)
    npm install --save dcf
    # or
    yarn add dcf
Then you can use dcf with JavaScript or TypeScript.
Run samples & CLI
Download this repo and install dependencies:
    npm install
    # or
    yarn
Run samples:
    npm run ts-node src/samples/tutorial-0.ts
    npm run ts-node src/samples/repartition.ts
Run interactive CLI:
    npm start