Flowcore CLI Plugin - Export Parquet

A Flowcore CLI plugin that exports data from the Flowcore platform to Parquet files, using DuckDB via the modern @duckdb/node-api package for efficient data processing and storage.

Overview

The Export Parquet plugin for Flowcore CLI allows you to stream data from Flowcore data cores and export it directly to Parquet files. It uses DuckDB via the modern @duckdb/node-api package for in-memory data processing, making it efficient for handling large volumes of streaming data.

Installation

npm install -g @argilzar/cli-plugin-export-parquet

Usage

Basic Usage

# Export with default timestamped filename
flowcore export-parquet "https://flowcore.io/<org>/<Data Core>/*" -s 1y --no-live

Custom Filename with CLI Flag

# Export with custom filename (extension automatically added)
flowcore export-parquet "https://flowcore.io/<org>/<Data Core>/*" -s 1y --no-live --filename brian

# Export with custom filename (extension already included)
flowcore export-parquet "https://flowcore.io/<org>/<Data Core>/*" -s 1y --no-live --filename brian.parquet

# Using short flag
flowcore export-parquet "https://flowcore.io/<org>/<Data Core>/*" -s 1y --no-live -f brian

# Export with custom output directory
flowcore export-parquet "https://flowcore.io/<org>/<Data Core>/*" -s 1y --no-live --output-dir /path/to/exports

# Export with custom filename and output directory
flowcore export-parquet "https://flowcore.io/<org>/<Data Core>/*" -s 1y --no-live --filename brian --output-dir /path/to/exports

# Using short flags
flowcore export-parquet "https://flowcore.io/<org>/<Data Core>/*" -s 1y --no-live -f brian -o /path/to/exports

Custom Filename (Programmatic)

// Create service with custom filename
const exportService = new ExportParquetService(logger, "my-custom-export.parquet");

Programmatic Usage

import { ExportParquetService } from "@argilzar/cli-plugin-export-parquet";

// With default filename (timestamped) and default output directory
const service = new ExportParquetService(logger);

// With custom filename (extension automatically added) and default output directory
const service = new ExportParquetService(logger, "my-export");

// With custom filename (extension already included) and default output directory
const service = new ExportParquetService(logger, "my-export.parquet");

// With custom filename and custom output directory
const service = new ExportParquetService(logger, "my-export", "/path/to/exports");

// With default filename and custom output directory
const service = new ExportParquetService(logger, undefined, "/path/to/exports");

Note: The .parquet extension is automatically added if not provided. Both "my-export" and "my-export.parquet" will result in the same filename.
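
For illustration only, that normalization likely amounts to something like the following sketch (ensureParquetExtension is a hypothetical helper name, not part of the plugin's public API):

// Hypothetical helper illustrating the extension handling described above;
// the name and placement are assumptions, not the plugin's actual code.
function ensureParquetExtension(filename: string): string {
  return filename.endsWith(".parquet") ? filename : `${filename}.parquet`;
}

ensureParquetExtension("my-export");         // => "my-export.parquet"
ensureParquetExtension("my-export.parquet"); // => "my-export.parquet"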

Commands

export-parquet <STREAM>

Export data from a Flowcore stream to Parquet files.

Arguments:

  • STREAM - The stream URL in the format https://flowcore.io/<org>/<Data Core>/*

Flags:

  • -s, --start=<value> - Start time for the export (e.g., "1y", "2024-01-01")
  • -e, --end=<value> - End time for the export (e.g., "2024-12-31")
  • --live - Enable live streaming (continuous export)
  • --no-live - Disable live streaming (one-time export)
  • -f, --filename=<value> - Custom filename for the Parquet export (the .parquet extension is added automatically)
  • -o, --output-dir=<value> - Custom output directory for the Parquet file (default: ./exports)

Examples:

# Export last year's data
export-parquet "https://flowcore.io/myorg/mydatacore/*" -s 1y --no-live

# Export specific date range
export-parquet "https://flowcore.io/myorg/mydatacore/*" -s 2024-01-01 -e 2024-12-31 --no-live

# Live streaming export
export-parquet "https://flowcore.io/myorg/mydatacore/*" --live

Features

  • Modern DuckDB Integration: Uses @duckdb/node-api for the latest DuckDB features and performance
  • Streaming Support: Handles both batch and live streaming data
  • Intelligent Schema Detection: Automatically analyzes payload structures and creates appropriately typed columns
  • Dynamic Schema Evolution: Automatically adds new columns as payload structures are discovered
  • Native Data Type Preservation: Stores values in their native types (numbers, booleans, timestamps) instead of JSON strings
  • Proper Timestamp Handling: Correctly handles ISO 8601 datetime strings without conversion errors
  • Unix Timestamp Detection: Automatically detects Unix timestamps in numeric values and creates TIMESTAMP columns
  • Automatic Unix Timestamp Conversion: Converts Unix timestamps to ISO 8601 strings before inserting into DuckDB (see the sketch after this list)
  • Intelligent Type Memory: Remembers detected column types and only performs conversions when necessary
  • Custom Filename Support: Allows specifying custom filenames for Parquet exports
  • Custom Output Directory: Allows specifying custom output directories for Parquet files
  • Timestamped Output: Generates timestamped Parquet files
  • Progress Tracking: Shows progress during export operations
  • Error Handling: Robust error handling with detailed logging
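
A minimal sketch of the Unix timestamp detection and conversion features above, assuming the 10-, 13-, or 16-digit convention (seconds, milliseconds, microseconds) described in the Output section below; the function name is illustrative, not the plugin's actual API:

// Hedged sketch: detect 10-, 13-, or 16-digit Unix timestamps and convert
// them to ISO 8601 strings for DuckDB TIMESTAMP columns. Illustrative only.
function unixTimestampToIso(value: number): string | null {
  const digits = Math.trunc(Math.abs(value)).toString().length;
  let millis: number;
  if (digits === 10) millis = value * 1000;      // seconds
  else if (digits === 13) millis = value;        // milliseconds
  else if (digits === 16) millis = value / 1000; // microseconds
  else return null;                              // not a recognizable Unix timestamp
  return new Date(millis).toISOString();
}

unixTimestampToIso(1718000000); // => "2024-06-10T06:13:20.000Z"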

How It Works

  1. Initialization: The service initializes an in-memory DuckDB database using @duckdb/node-api (see the sketch after this list)
  2. Streaming: As events arrive, they are processed and stored in DuckDB
  3. Dynamic Schema: The service automatically analyzes payload structures and creates new columns with appropriate data types
  4. Type Memory: Column types are stored in memory for efficient future reference
  5. Data Processing: Each event stores flowcore metadata in a single JSON field and spreads payload fields as individually typed columns
  6. Type Preservation: Values are stored in their native types (e.g., numbers as numbers, not quoted strings)
  7. Smart Timestamp Detection: Automatically detects both string-based and numeric Unix timestamps
  8. Conditional Conversion: Unix timestamps are only converted when the column type requires it
  9. Export: When the stream completes, data is exported to a timestamped Parquet file
  10. Cleanup: Database connections are properly closed
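
To make the lifecycle concrete, here is a rough sketch of the DuckDB side using @duckdb/node-api; the table name, columns, and file path are invented for illustration and are not the plugin's actual code:

import { DuckDBInstance } from "@duckdb/node-api";

// Step 1: in-memory database
const instance = await DuckDBInstance.create(":memory:");
const connection = await instance.connect();

// Base table with the flowcore metadata column
await connection.run("CREATE TABLE events (flowcore JSON)");

// Steps 3-4: columns are added as new payload fields are discovered
await connection.run("ALTER TABLE events ADD COLUMN userId BIGINT");

// Steps 5-6: metadata as JSON, payload fields as natively typed values
await connection.run(`INSERT INTO events VALUES ('{"eventId":"evt-1"}', 42)`);

// Step 9: export everything to a Parquet file
await connection.run("COPY events TO 'exports/example.parquet' (FORMAT PARQUET)");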

Output

  • Location: Files are saved to the ./exports/ directory by default, or to a custom directory if specified
  • Naming: Files follow the pattern events_YYYY-MM-DDTHH-MM-SS-sssZ.parquet by default
  • Custom Filenames: Can be customized by passing a filename parameter to the service constructor
  • Custom Output Directories: Can be customized by passing an outputDir parameter to the service constructor
  • Format: Standard Parquet format for optimal compression and query performance
  • Structure: Each row contains:
    • flowcore (JSON): Complete SourceEvent metadata excluding payload:
      • eventId: Unique event identifier
      • dataCoreId: Data core ID
      • flowType: Flow type name
      • eventType: Event type name
      • timeBucket: Time bucket for the event
      • metadata: Event metadata
      • validTime: Event validity timestamp
    • Payload Fields (auto-discovered with intelligent typing; see the type-mapping sketch after this list):
      • Numeric Fields: BIGINT for integers, DOUBLE for decimals (stored as native numbers)
      • String Fields: VARCHAR for text data (stored as native strings)
      • DateTime Fields: TIMESTAMP for date/time strings and Unix timestamps (stored as native timestamps)
      • Boolean Fields: BOOLEAN for true/false values (stored as native booleans)
      • Complex Fields: JSON for objects and arrays (stored as JSON strings)
      • Unix Timestamp Detection: Automatically detects numeric Unix timestamps (10, 13, or 16 digits) and creates TIMESTAMP columns
      • Unix Timestamp Conversion: Converts Unix timestamps to ISO 8601 strings before insertion to ensure proper DuckDB compatibility
      • Type Memory System: Remembers detected column types to avoid unnecessary conversions on subsequent events
      • Field names are preserved exactly as they appear in the payload
      • These fields are automatically created with appropriate types as they are discovered in the event stream
      • No Double Quotes: Numeric, boolean, and timestamp values are stored without quotes, preserving their native types
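
A hedged sketch of the value-to-column-type mapping implied above (inferDuckDbType is an illustrative name; the real implementation also applies the Unix timestamp rules sketched earlier):

// Illustrative mapping from payload values to DuckDB column types;
// an approximation of the behavior described above, not the actual code.
function inferDuckDbType(value: unknown): string {
  if (typeof value === "boolean") return "BOOLEAN";
  if (typeof value === "number") {
    return Number.isInteger(value) ? "BIGINT" : "DOUBLE";
  }
  if (typeof value === "string") {
    // Parseable date/time strings (e.g. ISO 8601) become native timestamps
    return Number.isNaN(Date.parse(value)) ? "VARCHAR" : "TIMESTAMP";
  }
  return "JSON"; // objects, arrays, and anything else
}

inferDuckDbType(42);                     // => "BIGINT"
inferDuckDbType(3.14);                   // => "DOUBLE"
inferDuckDbType("2024-01-01T00:00:00Z"); // => "TIMESTAMP"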

Requirements

  • Node.js >= 18.0.0
  • Flowcore CLI with proper authentication
  • Access to the target data core

Dependencies

  • @duckdb/node-api: Modern Node.js API for DuckDB database operations
  • @flowcore/cli-plugin-core: Core Flowcore CLI functionality
  • @flowcore/cli-plugin-config: Configuration management

Development

# Install dependencies
yarn install

# Build the project
yarn build

# Run tests
yarn test

# Run linter
yarn lint

Architecture

The plugin consists of:

  • ExportParquetService: Core service implementing the OutputService interface (sketched below)
  • DuckDB Integration: In-memory database using @duckdb/node-api for data processing
  • Stream Processing: Handles the Flowcore event stream lifecycle
  • Parquet Export: Converts processed data to the Parquet format
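
For orientation, the service might be shaped roughly like this; the OutputService method names below are assumptions for illustration, not the actual @flowcore/cli-plugin-core contract:

// Hypothetical skeleton; the interface and method names are assumed
// and may not match the real OutputService contract.
interface OutputService {
  init(): Promise<void>;                      // set up the in-memory DuckDB
  handleEvent(event: unknown): Promise<void>; // schema detection + typed insert
  done(): Promise<void>;                      // COPY to Parquet, then cleanup
}

class ExportParquetService implements OutputService {
  constructor(
    private logger: { info(message: string): void },
    private filename?: string,       // optional custom filename
    private outputDir = "./exports", // optional custom output directory
  ) {}

  async init(): Promise<void> { /* create the DuckDB instance and base table */ }
  async handleEvent(event: unknown): Promise<void> { /* infer types, insert row */ }
  async done(): Promise<void> { /* export to Parquet and close the connection */ }
}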

License

MIT