fast-csv-parse
Introduction
fast-csv-parse is a high-performance CSV parsing library that leverages Rust with WebAssembly for optimized performance. It provides a seamless interface for both Rust and JavaScript developers to efficiently process CSV data, using SIMD acceleration where possible.
Usage
Building the Project
- To compile the project into WebAssembly, run:
npm run build
This uses wasm-pack to generate the WebAssembly module along with a JavaScript glue layer.
Testing
- To run tests (using Jest), execute:
npm run test
Running the JavaScript Wrapper
- Import the module in your JavaScript project:
import { parse_csv } from 'fast-csv-parse';

async function run() {
  const result = parse_csv('col1,col2\nval1,val2', ',', '"');
  console.log(result);
}

run();
Streaming Variant
fast-csv-parse also provides a streaming API via the parse_csv_stream function for processing large CSV files efficiently without loading the entire file into memory. Its signature is:
parse_csv_stream(input: string, delimiter: string | null, quote: string | null, callback: (row: string[]) => void): void;
Parameters:
- input: CSV data as a string.
- delimiter: The field delimiter (pass null if not applicable).
- quote: The quote character for fields (pass null if not applicable).
- callback: A function that is invoked with an array of strings representing the parsed row.
Example:
import { parse_csv_stream } from 'fast-csv-parse';
parse_csv_stream('col1,col2\nval1,val2', ',', '"', (row) => {
console.log('Parsed row:', row);
});
Architecture
JavaScript Perspective
The JavaScript layer provides an easy-to-use wrapper around the WebAssembly build. This layer exposes the CSV parsing API through the function parse_csv, with full TypeScript support through accompanying ".d.ts" definitions. The build and test processes are managed via npm scripts using wasm-pack and Jest.
Rust Perspective
The core CSV parsing logic is implemented in Rust for efficiency and safety. The project uses wasm-bindgen to expose Rust functions such as parse_csv to JavaScript. Data parallelism is provided by Rayon, improving throughput on large datasets. Additionally, the project contains modules such as csv.rs for the parsing logic and stream.rs for stream-like operations.
Performance and Memory Management
This project is engineered to handle CSV datasets up to 500MB in size. Key techniques include:
Streaming Algorithms:
- For CSV inputs without quoted fields, the parser utilizes parallel processing via Rayon (par_bridge()), splitting the data into lines without loading the entire file into memory.
- For more complex CSV files that include quoted text, a state-machine based parser processes the data incrementally, minimizing peak memory usage.
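To illustrate the quoted-field path, here is a simplified JavaScript sketch of a state-machine row parser. This is not the library's actual Rust implementation (which lives in csv.rs); the function name parseQuotedCsvRow is hypothetical and the sketch handles only a single row with RFC 4180-style doubled-quote escaping.

```javascript
// Simplified sketch of a state-machine parser for one CSV row with quoted
// fields. Hypothetical helper; the real parser is implemented in Rust.
function parseQuotedCsvRow(line, delimiter = ',', quote = '"') {
  const fields = [];
  let field = '';
  let inQuotes = false;

  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (inQuotes) {
      if (ch === quote) {
        // A doubled quote inside a quoted field is an escaped quote.
        if (line[i + 1] === quote) {
          field += quote;
          i++;
        } else {
          inQuotes = false; // closing quote
        }
      } else {
        field += ch;
      }
    } else if (ch === quote && field.length === 0) {
      inQuotes = true; // opening quote at the start of a field
    } else if (ch === delimiter) {
      fields.push(field);
      field = '';
    } else {
      field += ch;
    }
  }
  fields.push(field);
  return fields;
}

// Delimiters and escaped quotes inside quoted fields are preserved:
console.log(parseQuotedCsvRow('a,"b,c","say ""hi""",plain'));
// logs ['a', 'b,c', 'say "hi"', 'plain']
```

Because the machine only ever looks at the current character (plus one character of lookahead for escaped quotes), it can consume input incrementally, which is what keeps peak memory usage low.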
Wasm Layer Optimizations:
- The WebAssembly layer is tuned for minimal overhead; data is processed in chunks which allows integration with JavaScript streaming APIs to further manage memory consumption.
- These design choices ensure that the system remains performant even when processing very large CSV files.
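The chunked approach can be sketched on the JavaScript side as follows. This is a hypothetical helper (createChunkedRowEmitter is not part of the library's API): it buffers incoming string chunks, emits only complete lines to a row callback, and flushes the remainder at the end. For simplicity it assumes unquoted fields and a plain split; the library's real chunking and quote handling happen inside the Wasm layer.

```javascript
// Hypothetical sketch: buffer string chunks and emit complete CSV rows,
// so the whole input never has to sit in memory at once.
// Assumes unquoted fields; quoted fields would need a state machine.
function createChunkedRowEmitter(onRow, delimiter = ',') {
  let buffer = '';
  return {
    write(chunk) {
      buffer += chunk;
      let idx;
      while ((idx = buffer.indexOf('\n')) !== -1) {
        const line = buffer.slice(0, idx);
        buffer = buffer.slice(idx + 1);
        if (line.length > 0) onRow(line.split(delimiter));
      }
    },
    end() {
      // Flush a trailing row that has no final newline.
      if (buffer.length > 0) onRow(buffer.split(delimiter));
      buffer = '';
    },
  };
}

// Chunks may split rows at arbitrary boundaries:
const rows = [];
const emitter = createChunkedRowEmitter((row) => rows.push(row));
emitter.write('col1,co');
emitter.write('l2\nval1,');
emitter.write('val2');
emitter.end();
console.log(rows); // [['col1', 'col2'], ['val1', 'val2']]
```

An emitter like this pairs naturally with JavaScript streaming sources (file readers, network responses), since each chunk can be forwarded as it arrives.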
Happy parsing and high-performance data processing!
