open-data-bridge

v1.0.0

Published

3 months ago

leumas-tech

Open Data Bridge (odb)

odb is a CLI tool and Node.js library that simplifies working with open data. It fetches data from URLs (CSV and XML, with CSV as the initial focus), infers a schema, and outputs the data as clean JSON, TypeScript types, and a SQLite database.

Features

Data Fetching: Download data from remote URLs.
CSV Parsing: Parse CSV data into a structured format.
Schema Inference: Automatically infer a JSON schema from the data.
JSON Output: Generate pretty-printed JSON output.
TypeScript Types: Generate TypeScript interfaces (.d.ts) from the inferred schema.
SQLite Database: Create and populate a SQLite database with the data.
Schema Mapping (Planned): Apply custom mapping files to transform the inferred schema and data.
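
To make the CSV parsing step concrete, here is a minimal sketch of turning CSV text into an array of records. The `parseCsv` helper is hypothetical and written only for illustration; it is not odb's actual parser, which may rely on a dedicated CSV library and handle more edge cases.

```javascript
// Minimal CSV parser sketch (hypothetical helper, not odb's actual parser).
// Handles quoted fields, escaped quotes (""), and both LF and CRLF line endings.
function parseCsv(text) {
  const rows = [];
  let row = [];
  let field = '';
  let inQuotes = false;

  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (inQuotes) {
      if (ch === '"' && text[i + 1] === '"') {
        field += '"'; // escaped quote inside a quoted field
        i++;
      } else if (ch === '"') {
        inQuotes = false; // closing quote
      } else {
        field += ch;
      }
    } else if (ch === '"') {
      inQuotes = true;
    } else if (ch === ',') {
      row.push(field);
      field = '';
    } else if (ch === '\n' || ch === '\r') {
      if (ch === '\r' && text[i + 1] === '\n') i++; // treat CRLF as one break
      row.push(field);
      rows.push(row);
      row = [];
      field = '';
    } else {
      field += ch;
    }
  }
  // Flush the last row when the input does not end with a newline.
  if (field !== '' || row.length > 0) {
    row.push(field);
    rows.push(row);
  }

  // The first row is the header; the remaining non-empty rows become objects.
  const [header, ...body] = rows;
  return body
    .filter((cells) => cells.some((cell) => cell !== ''))
    .map((cells) =>
      Object.fromEntries(header.map((name, idx) => [name, cells[idx] ?? '']))
    );
}

console.log(parseCsv('name,value\n"Smith, J",1\n'));
```

Every value comes back as a string; deciding which columns are numeric is left to a separate schema inference step.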

Best Use Cases

odb is aimed at anyone working with open data, especially raw, often messy, government-published datasets. Its main use cases are:

1. Rapid Prototyping for Civic Hackers and Developers

Quickly transform raw CSV data into structured JSON and a local SQLite database. This is invaluable for:

Building quick dashboards: Get data into a database without manual schema definition.
Developing data visualizations: JSON output is immediately consumable by charting libraries.
Creating proof-of-concept applications: Focus on the application logic, not data wrangling.

2. Streamlining Data Analysis Workflows

For data analysts, odb automates the tedious initial steps of data preparation:

Automated schema inference: No more guessing data types or manually defining table structures.
Consistent data formats: Easily convert diverse CSVs into a uniform JSON or SQLite format for consistent analysis.
Local data storage: Store data in a portable SQLite database for offline analysis or sharing.
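
As an illustration of what automated schema inference can involve, the sketch below scans string-valued records and picks the narrowest type that fits every value in a column. The function names (`inferType`, `mergeTypes`, `inferSchemaSketch`) and the `{ type, nullable }` result shape are assumptions made for this example; odb's own `inferSchema` may use different rules and a different output format.

```javascript
// Hypothetical type-inference sketch; odb's own inferSchema may differ.

// Classify a single cell value.
function inferType(value) {
  if (value === null || value === undefined || value === '') return 'null';
  if (typeof value === 'number') return Number.isInteger(value) ? 'integer' : 'number';
  if (typeof value === 'boolean') return 'boolean';
  const s = String(value).trim();
  if (/^-?\d+$/.test(s)) return 'integer';
  if (/^-?\d+(\.\d+)?([eE][+-]?\d+)?$/.test(s)) return 'number';
  if (/^(true|false)$/i.test(s)) return 'boolean';
  return 'string';
}

// Combine the type seen so far with a newly observed type.
function mergeTypes(a, b) {
  if (a === b) return a;
  if (a === 'null') return b;
  if (b === 'null') return a;
  const numeric = ['integer', 'number'];
  if (numeric.includes(a) && numeric.includes(b)) return 'number'; // widen int to float
  return 'string'; // any other mix falls back to text
}

function inferSchemaSketch(records) {
  const columns = {};
  for (const record of records) {
    for (const [key, value] of Object.entries(record)) {
      const seen = inferType(value);
      const col = columns[key] ?? { type: 'null', nullable: false };
      col.type = mergeTypes(col.type, seen);
      if (seen === 'null') col.nullable = true; // empty cell observed
      columns[key] = col;
    }
  }
  // Columns that only ever held empty values fall back to string.
  for (const col of Object.values(columns)) {
    if (col.type === 'null') col.type = 'string';
  }
  return columns;
}

console.log(inferSchemaSketch([
  { year: '1960', value: '3.5' },
  { year: '1961', value: '' },
]));
```

Widening to `string` on any conflict is a deliberately conservative choice: it never loses data, at the cost of occasionally under-typing a column that has a few malformed cells.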

3. Enhancing Developer Productivity and Integration

Developers can leverage odb to improve their workflow:

Auto-generated TypeScript types: Directly use the generated .d.ts files in your TypeScript projects (frontend or backend) for strong typing and improved code quality when consuming the processed data.
API Development: Quickly mock or populate local databases for API development and testing using real-world open data.
Reducing boilerplate: Eliminate the need to write custom parsing scripts and schema definitions for each new dataset.
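
To show roughly how a `.d.ts` file can be derived from an inferred schema, here is a small sketch that renders a TypeScript interface as text. The schema shape (`{ column: { type, nullable } }`), the `schemaToInterface` helper, and the default interface name `DataRow` are all assumptions for illustration; the declarations odb actually emits may be named and structured differently.

```javascript
// Hypothetical sketch: render a TypeScript interface from an assumed schema
// shape of { columnName: { type, nullable } }. Not odb's actual generator.
const TS_TYPES = {
  integer: 'number',
  number: 'number',
  boolean: 'boolean',
  string: 'string',
};

function schemaToInterface(schema, name = 'DataRow') {
  const lines = Object.entries(schema).map(([key, col]) => {
    // Quote property names that are not valid identifiers (e.g. "Country Name").
    const prop = /^[A-Za-z_$][\w$]*$/.test(key) ? key : JSON.stringify(key);
    const tsType = TS_TYPES[col.type] ?? 'unknown';
    return `  ${prop}: ${tsType}${col.nullable ? ' | null' : ''};`;
  });
  return `export interface ${name} {\n${lines.join('\n')}\n}\n`;
}

console.log(schemaToInterface({
  'Country Name': { type: 'string', nullable: false },
  Year: { type: 'integer', nullable: false },
  Value: { type: 'number', nullable: true },
}));
```

Open datasets often use headers with spaces or punctuation, which is why the sketch quotes any property name that is not a plain identifier.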

4. Data Transformation and Cleaning (planned `--map` option)

The --map option is not yet fully implemented. It is intended to support more complex data transformations in future releases, including:

Renaming columns: Aligning inconsistent column names across datasets.
Type coercion: Ensuring data types are correct for specific use cases.
Data enrichment: Combining data from multiple sources or adding derived fields.
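
Since the mapping file format is not specified yet, the following is only a sketch of how renaming and type coercion could work. The mapping shape (`rename` and `types` keys) and the `applyMapping` helper are invented for this example and should not be read as the format odb will accept.

```javascript
// Hypothetical mapping sketch. The { rename, types } format is an assumption,
// not odb's mapping file format (which is still planned).
const COERCE = {
  integer: (v) => Number.parseInt(v, 10),
  number: (v) => Number(v),
  boolean: (v) => /^(true|1|yes)$/i.test(String(v).trim()),
  string: (v) => String(v),
};

function applyMapping(records, mapping) {
  // Maps avoid accidental hits on inherited object properties.
  const rename = new Map(Object.entries(mapping.rename ?? {}));
  const types = new Map(Object.entries(mapping.types ?? {}));

  return records.map((record) => {
    const out = {};
    for (const [key, value] of Object.entries(record)) {
      const target = rename.get(key) ?? key; // renamed column, or original name
      const cast = COERCE[types.get(target)]; // types are keyed by the new name
      if (value === '' || value === null || value === undefined) {
        out[target] = null; // empty cells become null rather than NaN or ""
      } else {
        out[target] = cast ? cast(value) : value;
      }
    }
    return out;
  });
}

const mapping = {
  rename: { 'Country Name': 'country', Year: 'year', Value: 'value' },
  types: { year: 'integer', value: 'number' },
};
console.log(applyMapping([{ 'Country Name': 'Aruba', Year: '1960', Value: '54208' }], mapping));
```

Keying `types` by the renamed column keeps one mapping reusable across datasets whose original headers differ.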

5. Educational and Learning Purposes

odb can serve as an excellent tool for learning about:

Data processing pipelines: Understand the steps from raw data to structured output.
Schema design: Observe how schemas are inferred from real-world data.
CLI tool development: A practical example of building a useful command-line interface.

In essence, odb is for anyone who wants to spend less time cleaning and structuring open data, and more time building, analyzing, and innovating with it.

Installation

As a CLI Tool

Clone the repository:

```
git clone https://github.com/LeumasTech/open-data-bridge.git
cd open-data-bridge
```

Install dependencies:
```
npm install
```
Link the CLI (for global use):
```
npm link
```
Now you can run odb from anywhere in your terminal.

As a Node.js Library

Install it in your project:

```
npm install open-data-bridge
```

Usage

CLI Usage

Run the odb CLI either directly with node from the repository root or, if you have linked it globally, as odb:

```
# If not globally linked
node index.js fetch <url> [options]

# If globally linked
odb fetch <url> [options]
```

`fetch <url>`

Fetches data from the provided URL, infers its schema, and outputs it in various formats.

Arguments:

<url>: The URL of the data source (e.g., a CSV file).

Options:

-m, --map <mappingFile>: (Planned) Path to a JSON mapping file for schema transformation.

CLI Examples

# Fetch a CSV file and generate JSON, TypeScript types, and SQLite database
```
odb fetch https://raw.githubusercontent.com/datasets/population/main/data/population.csv

# (Planned) Fetch with a mapping file
# odb fetch <url> --map ./my-mapping.json
```

Library Usage

You can import and use the fetchAction and inferSchema functions directly in your Node.js applications:

```
const { fetchAction, inferSchema } = require('open-data-bridge');

async function processData() {
  const url = 'https://raw.githubusercontent.com/datasets/population/main/data/population.csv';
  const options = {}; // No mapping file for now

  // Run the full pipeline: fetch the data, infer the schema, and generate the output files
  await fetchAction(url, options);

  // inferSchema can also be used on its own when the records are already in memory
  // const records = [{ name: 'Test', value: 123 }];
  // const schema = inferSchema(records);
  // console.log(schema);
}

// Report failures (e.g. network errors) instead of leaving the rejection unhandled
processData().catch((err) => {
  console.error(err);
  process.exitCode = 1;
});
```

Output Files

Upon successful execution, the fetch command will generate the following files in your current working directory:

output.json: The fetched data in JSON format.
output.d.ts: TypeScript declaration file with interfaces corresponding to your data.
output.sqlite: A SQLite database file containing your data in a table named data.
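
For readers curious how an inferred schema might translate into the SQLite table named data, the sketch below builds the `CREATE TABLE` and parameterized `INSERT` statements as strings. The schema shape (`{ column: { type, nullable } }`), the type mapping, and the helper names are assumptions for illustration; the statements odb actually runs are not documented here and may differ.

```javascript
// Hypothetical sketch: derive SQLite statements from an assumed schema shape
// of { columnName: { type, nullable } }. Not odb's actual SQL generation.
const SQLITE_TYPES = {
  integer: 'INTEGER',
  number: 'REAL',
  boolean: 'INTEGER', // SQLite has no separate boolean storage class
  string: 'TEXT',
};

// Quote identifiers so column names with spaces or keywords stay valid.
const quoteIdent = (name) => `"${String(name).replace(/"/g, '""')}"`;

function createTableSql(schema, table = 'data') {
  const cols = Object.entries(schema).map(([name, col]) => {
    const sqlType = SQLITE_TYPES[col.type] ?? 'TEXT';
    return `  ${quoteIdent(name)} ${sqlType}${col.nullable ? '' : ' NOT NULL'}`;
  });
  return `CREATE TABLE IF NOT EXISTS ${quoteIdent(table)} (\n${cols.join(',\n')}\n);`;
}

// Build a parameterized INSERT; values are bound separately by the SQLite driver.
function insertSql(schema, table = 'data') {
  const names = Object.keys(schema);
  const placeholders = names.map(() => '?').join(', ');
  const columns = names.map((name) => quoteIdent(name)).join(', ');
  return `INSERT INTO ${quoteIdent(table)} (${columns}) VALUES (${placeholders});`;
}

const schema = {
  'Country Name': { type: 'string', nullable: false },
  Year: { type: 'integer', nullable: false },
  Value: { type: 'number', nullable: true },
};
console.log(createTableSql(schema));
console.log(insertSql(schema));
```

Using `?` placeholders rather than interpolating values keeps messy source data (quotes, semicolons) from breaking or altering the statement.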

Development

Running Tests

```
npm test
```

Contributing

Contributions are welcome! Please open an issue or submit a pull request.