
@bluehemoth/csvjsonify v1.2.0

A package which handles data transformation from CSV format to JSON format

Downloads: 18

Readme

csv2json

Description

A simple package for loading CSV data, transforming it to JSON format, and outputting the transformed data.

Usage

    Description
    
    A simple package which transforms data from csv to json format.
    
    Options
    
    --sourceFile    Absolute path of the file to be transformed.
    --resultFile    Absolute path of the file where the transformed data will be stored.
    --separator     Symbol used in the source file to separate values. Must be one of , | ; or \t (tab).
                    Defaults to comma if not provided.
    
    Examples
    
    csvToJson --sourceFile "D:\source.csv" --resultFile "D:\result.json" --separator ","
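
For illustration, a small input and the output it could produce, assuming the first row of the source file is a header row (the exact output shape depends on the implementation):

    # source.csv
    id,name
    1,Alice
    2,Bob

    # result.json (illustrative)
    [{"id":"1","name":"Alice"},{"id":"2","name":"Bob"}]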

Environment variables

  TRANSFORMER_CHOICE  Feature flag which indicates which data transformer should be used

  GOOGLE_DRIVE_STORAGE  Feature flag which enables uploading the transformation result to Google Drive

  GOOGLE_APPLICATION_CREDENTIALS_FILE  Name of the Google API service account key file

  SHARED_FOLDER_ID  Id of the Google Drive folder that is shared with the Google API service account; the transformation result will be uploaded to this folder

  DATA_DIR  Absolute path of the test data directory

  CREDENTIALS_DIR  Absolute path of the folder which contains the credentials file

  SOURCE_FILE  Name of the source file

  RESULT_FILE  Name of the result file

  LOGGING_LEVEL  Application logging level
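
A sample .env, with all values illustrative (adjust paths and IDs to your setup):

    TRANSFORMER_CHOICE=optimized_csv
    GOOGLE_DRIVE_STORAGE=disabled
    GOOGLE_APPLICATION_CREDENTIALS_FILE=credentials.json
    SHARED_FOLDER_ID=<your shared folder id>
    DATA_DIR=/home/user/csv2json/testData
    CREDENTIALS_DIR=/home/user/csv2json/credentials
    SOURCE_FILE=source.csv
    RESULT_FILE=result.json
    LOGGING_LEVEL=info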

Feature flags

The package allows the customization of its operation via feature flags set in the .env file. The package supports these flags:

    TRANSFORMER_CHOICE:
        description:    Decides which transformer will be used to transform the piped data
        values:
            legacy_csv:     Transformer which transforms CSV to JSON by building a JSON string via a simple forEach operation
            optimized_csv:  Transformer which extends the legacy transformer and builds JSON strings via the .reduce() method

    GOOGLE_DRIVE_STORAGE:
        description:    Decides if the transformed file should be stored in Google Drive
        values:
            enabled:    Enables the storage service
            disabled:   Disables the storage service
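
Conceptually, the TransformerFactory listed in the Package structure section could consume TRANSFORMER_CHOICE along these lines (a sketch only; the export shape and selection logic are assumptions, not the package's actual code):

    // Assumed export shape for the two transform stream classes.
    const CsvToJsonStream = require('./transformers/CsvToJsonStream');
    const CsvToJsonOptimizedStream = require('./transformers/CsvToJsonOptimizedStream');

    // Pick a transform stream based on the TRANSFORMER_CHOICE feature flag.
    function createTransformer(choice = process.env.TRANSFORMER_CHOICE) {
      switch (choice) {
        case 'optimized_csv':
          return new CsvToJsonOptimizedStream();
        case 'legacy_csv':
        default:
          return new CsvToJsonStream();
      }
    }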

Google Drive storage requirements

To use the Google Drive storage service the user must provide the required authentication credentials. The following steps describe the authentication process:

  1. Follow the provided steps to create a service account and a key assigned to that account
  2. After key creation a credentials file should be automatically downloaded to your system - move this file to the root directory of this package
  3. Set the GOOGLE_APPLICATION_CREDENTIALS_FILE environment variable to the path of the credentials file (relative to the root directory of the package)
  4. Create a Google Drive folder and share it with the service account (share -> type service account email -> editor -> done)
  5. Copy the id of the shared folder and save it in the SHARED_FOLDER_ID environment variable
  6. Set the GOOGLE_DRIVE_STORAGE environment variable to enabled
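
For reference, the upload performed by uploadToGoogleDrive.js roughly corresponds to the following googleapis call. This is a minimal sketch under the assumptions above (service account key file, shared folder ID); the function name and details are illustrative, not the package's exact code:

    const fs = require('fs');
    const path = require('path');
    const { google } = require('googleapis');

    // Illustrative helper: uploads one file into the shared folder.
    async function uploadToDrive(filePath, folderId, keyFile) {
      // Authenticate with the service account key file
      const auth = new google.auth.GoogleAuth({
        keyFile,
        scopes: ['https://www.googleapis.com/auth/drive.file'],
      });
      const drive = google.drive({ version: 'v3', auth });

      // Create the file inside the folder shared with the service account
      const res = await drive.files.create({
        requestBody: {
          name: path.basename(filePath),
          parents: [folderId],
        },
        media: {
          mimeType: 'application/json',
          body: fs.createReadStream(filePath),
        },
      });
      return res.data.id;
    }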

Running the docker image

As of release v1.2.0 the package can be run in Docker containers. To run the package in a container follow these steps:

  1. Create a .env file in the package directory by following the env.example file and the descriptions of the environment variables in the Environment variables section
  2. Run docker-compose up --build -d to run the package container in detached mode
  3. After a successful run the transformed result will be available in the directory specified by the DATA_DIR environment variable

Note: a built image is also available on Docker Hub. You can run this image directly via docker run with the following command:

    sudo docker run \
      -v <absolute path of source/result files directory>:/app/testData \
      -v <absolute path of credentials directory>:/app/credentials \
      --env-file <relative path to env> \
      mind33z/csv2json:<version> \
      npm run start -- --sourceFile "/app/testData/<source file name>" --resultFile "/app/testData/<result file name>"
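
For example, with illustrative host paths and version:

    sudo docker run \
      -v /home/user/csv2json/testData:/app/testData \
      -v /home/user/csv2json/credentials:/app/credentials \
      --env-file .env \
      mind33z/csv2json:1.2.0 \
      npm run start -- --sourceFile "/app/testData/source.csv" --resultFile "/app/testData/result.json"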

Benchmarks

During performance measurement, two metrics were tracked - execution time and memory. The screenshots below demonstrate the results of converting a sample 0.8 MB test file and the full bloated 13 GB test file.

V1.1

For this version, only the execution time metric was tracked, as the results of the previous version showed that there was no need to optimise memory usage. The first screenshot shows the results of the test that was run after the _buildJSONStringFromLine function was enhanced. The second screenshot shows the results of the testing after the code in the _transform function had been converted to asynchronous. Both tests were done with the 13 GB bloated data file.

[Screenshot: execution time after the _buildJSONStringFromLine enhancement]

[Screenshot: execution time after converting _transform to asynchronous]

Enhancement of _buildJSONStringFromLine had a positive influence on the execution time - the total time of the function decreased by roughly 10x, which in the end led to the total runtime decreasing by roughly 30 seconds. Converting _transform had a severely negative effect on package runtime - the total time of each key transform function (except _buildJSONStringFromLine) increased by 2x. This may have happened because the event loop was flooded with too many trivial task promises. Only the _buildJSONStringFromLine enhancement will be carried over into later versions.
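
To make the legacy_csv/optimized_csv distinction concrete, the two JSON-string-building strategies might look roughly like this. This is an illustrative sketch, assuming each line is already split into a values array with a matching headers array; it is not the package's actual code:

    // legacy_csv: build the JSON string with a simple forEach
    function buildJSONStringFromLineLegacy(headers, values) {
      let json = '{';
      values.forEach((value, i) => {
        json += `"${headers[i]}":"${value}"`;
        if (i < values.length - 1) json += ',';
      });
      return json + '}';
    }

    // optimized_csv: build the JSON string with .reduce()
    function buildJSONStringFromLineOptimized(headers, values) {
      const body = values.reduce(
        (acc, value, i) => acc + (i > 0 ? ',' : '') + `"${headers[i]}":"${value}"`,
        ''
      );
      return '{' + body + '}';
    }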

V1.0

Execution time

Sample data (0.8 MB):

[Screenshot: sample data execution time]

Bloat data (13 GB):

[Screenshot: bloat data execution time]

The results of the profiler show that the functions _buildJSONStringFromLine, _removeEscapeSlashes, and _splitLineToArr influence the execution time the most (apart from Node's own functions). It should be noted that on the bloated dataset the _splitLineToArr method overtakes the _buildJSONStringFromLine method in terms of execution time. The following releases should prioritize improving these methods.

Memory

Sample data (0.8 MB):

[Screenshot: sample data memory usage]

Bloat data (13 GB):

[Screenshot: bloat data memory usage]

The results of memory tracking show that even though the package has to process large amounts of data, memory usage remains roughly constant. This can be attributed to the use of streams: data is processed chunk by chunk instead of being loaded into memory in full. No further improvements in memory usage are required.
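
A minimal sketch of why a stream pipeline keeps memory flat. It mirrors the ReadStream -> CsvToJsonStream -> WriteStream pipe described in the changelog, including the incomplete-line buffering, but the class body here is illustrative, not the package's actual implementation:

    const fs = require('fs');
    const { Transform, pipeline } = require('stream');

    class CsvToJsonSketch extends Transform {
      constructor(options) {
        super(options);
        this.remainder = ''; // holds an incomplete trailing line between chunks
      }

      _transform(chunk, encoding, callback) {
        const lines = (this.remainder + chunk.toString()).split('\n');
        this.remainder = lines.pop(); // last element may be an incomplete line
        for (const line of lines) {
          // Convert one CSV line; only the current chunk is ever held in memory
          this.push(JSON.stringify(line.split(',')) + '\n');
        }
        callback();
      }

      _flush(callback) {
        if (this.remainder) {
          this.push(JSON.stringify(this.remainder.split(',')) + '\n');
        }
        callback();
      }
    }

    pipeline(
      fs.createReadStream('source.csv'),
      new CsvToJsonSketch(),
      fs.createWriteStream('result.json'),
      (err) => { if (err) console.error('Pipeline failed:', err); }
    );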

Changelog

v1.2.0 - (2022-10-24)

Added

  • Upload to Google Drive functionality
  • Autodetection of the separator if no --separator argument is provided
  • docker-compose.yml, Dockerfile, and .dockerignore files
  • Workflow job for building a Docker image from the project and pushing it to DockerHub
  • CSV-to-JSON transformer tests and a workflow that runs these tests on push
  • Custom logger implementation

Updated

  • Added Feature flags, Google Drive storage requirements, and Running the docker image sections to the README.md file
  • Fixed a separator symbol bug in optimized JSON building method
  • Fixed JSON formatting issues

v1.1.0 - (2022-10-19)

Added

  • Feature flag toggling functionality via .env
  • CsvToJsonOptimizedStream, a transform stream class which acts as an improved iteration of the previous transform stream
  • Refactored the project structure
  • TransformerFactory, a factory which handles the creation of different transformers

Updated

  • Added benchmarks of the current version to the Benchmarks section in README.md

v1.0.0 - (2022-10-18)

Added

  • Implemented CsvToJsonStream class. This class:
    • Transforms CSV to JSON data in chunks
    • Handles the case of chunk having an incomplete line
    • Checks if the CSV line was parsed into an array correctly and that no unescaped separators were used in the data itself
  • Measured the execution time and the memory usage of the converter when using the bloated 13 GB data file and sample 0.8 MB test data file.
  • Created a pipe out of ReadStream, CsvToJsonStream, and WriteStream and achieved the basic functionality of the package

Updated

  • README.md

v0.1.1 - (2022-10-14)

Added

  • Input Handling
  • GitHub Actions workflow (on release: bump if the tag and package version mismatch, and publish to npm)
  • Test data file generation function
  • README.md

Package structure

  • index.js contains the main code of the package
  • handleArgs.js contains logic related to handling input arguments
  • generate.js contains test data file generation logic
  • transformers/CsvToJsonStream.js contains the extended transform class used for transforming data from CSV format to JSON
  • transformers/CsvToJsonOptimizedStream.js contains the enhanced transform methods of the CsvToJsonStream class
  • factories/TransformerFactory.js contains a factory which handles the creation of different transformers
  • uploadToGoogleDrive.js contains a function which uploads the transformed result file to the shared folder provided in the .env file
  • tests/ directory contains the tests of the package and a custom test runner
  • utils/Logger.js contains a custom logger implementation