@spark-connect-js/core
v0.4.0
Pure TypeScript core: logical DataFrame API and Spark Connect plan builder, with no runtime dependencies
DataFrame API and logical plan builder for Spark Connect, in pure TypeScript with zero runtime dependencies.
Note: This project is in early development (v0.4.0) and is not recommended for production usage, but feedback is very welcome on GitHub.
Install
npm install @spark-connect-js/core

Most applications install @spark-connect-js/node instead, which re-exports this package and adds a transport. Install core directly only if you're writing your own runtime adapter (Bun, Deno, browser, custom RPC).
Quick example
import { SparkSession, col, lit, sum, desc, type Transport } from "@spark-connect-js/core";

// `transport` is any object implementing Transport, supplied by your runtime adapter.
const spark = new SparkSession(transport);
const df = spark
  .table("events")
  .filter(col("ts").gt(lit("2025-01-01")))
  .groupBy("category")
  .agg(sum("amount").alias("total"))
  .sort(desc("total"));

Provides SparkSession, DataFrame, Column, Catalog, WindowSpec, DataFrameWriter, DataFrameWriterV2, GroupedData, DataFrameStat, the typed error hierarchy, and the built-in function set. Plans are serialized to Spark Connect protobuf, but no I/O happens here; you supply the Transport.
The Transport interface
A runtime adapter implements Transport (one method per Spark Connect RPC) and hands it to SparkSession. The full interface and the contract for each method are in the architecture guide. @spark-connect-js/node is the reference implementation built on @grpc/grpc-js and apache-arrow.
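To make the adapter idea concrete, here is a minimal sketch of what "one method per RPC" could look like. It is an illustration only: SketchTransport, RecordingTransport, countResponseBytes, and both method names are hypothetical and are not taken from @spark-connect-js/core. The real Transport interface and its per-method contract are defined in the architecture guide. The sketch covers just two of the Spark Connect RPCs (ExecutePlan and AnalyzePlan) and assumes plans arrive as serialized bytes.

```typescript
// Hypothetical sketch: none of these names come from @spark-connect-js/core.
// Consult the architecture guide for the real Transport interface.

/** Stand-in for a serialized Spark Connect plan (protobuf bytes in practice). */
type PlanBytes = Uint8Array;

/** Hypothetical two-method subset of a transport, one method per RPC. */
interface SketchTransport {
  executePlan(plan: PlanBytes): AsyncIterable<Uint8Array>;
  analyzePlan(plan: PlanBytes): Promise<Uint8Array>;
}

/** In-memory adapter that records calls instead of talking to a server. */
class RecordingTransport implements SketchTransport {
  readonly calls: string[] = [];

  async *executePlan(plan: PlanBytes): AsyncIterable<Uint8Array> {
    this.calls.push(`executePlan:${plan.length}`);
    // A real adapter would stream response batches from the server here.
    yield new Uint8Array([1, 2, 3]);
  }

  async analyzePlan(plan: PlanBytes): Promise<Uint8Array> {
    this.calls.push(`analyzePlan:${plan.length}`);
    return new Uint8Array(0);
  }
}

/** Drains an execute stream and returns the total number of bytes received. */
async function countResponseBytes(
  transport: SketchTransport,
  plan: PlanBytes,
): Promise<number> {
  let total = 0;
  for await (const chunk of transport.executePlan(plan)) {
    total += chunk.length;
  }
  return total;
}
```

A recording adapter like this can be handy in unit tests, since it lets you check which RPCs a piece of code would trigger without a running Spark cluster. Whether the real interface streams responses as an async iterable is an assumption of this sketch.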
Documentation
Full docs at prustic.github.io/spark-connect-js.
- Architecture: the plan pipeline, types over the wire, sessions
- SQL and DataFrame guide: transformations, actions, the Column DSL
- Error handling: the typed error hierarchy
- Roadmap: what's shipped, what's planned
