open-data-bridge
v1.0.0
Open Data Bridge (odb)
odb is a CLI tool and Node.js library that simplifies working with open data. It fetches data from URLs (CSV and XML, with CSV as the initial focus), infers a schema, and outputs the data as clean JSON, TypeScript types, and a SQLite database.
Features
- Data Fetching: Download data from remote URLs.
- CSV Parsing: Parse CSV data into a structured format.
- Schema Inference: Automatically infer a JSON schema from the data.
- JSON Output: Generate pretty-printed JSON output.
- TypeScript Types: Generate TypeScript interfaces (.d.ts) from the inferred schema.
- SQLite Database: Create and populate a SQLite database with the data.
- Schema Mapping (Planned): Apply custom mapping files to transform the inferred schema and data.
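To make the schema inference step more concrete, here is a minimal sketch of how column types could be guessed from parsed CSV records. It is an illustration only, not odb's actual inferSchema implementation: the function names (guessType, mergeTypes, inferColumnTypes), the type labels, and the sample records are all assumptions made for this example.

```javascript
// Hypothetical sketch: NOT odb's real inferSchema, just one way the idea could work.
// Input records hold string values, as they would come straight out of a CSV parser.

// Guess the type of a single cell value.
function guessType(value) {
  if (value === '' || value === null || value === undefined) return 'null';
  if (/^-?\d+$/.test(value)) return 'integer';
  if (/^-?\d+(\.\d+)?$/.test(value)) return 'number';
  if (/^(true|false)$/i.test(value)) return 'boolean';
  return 'string';
}

// Combine the type seen so far for a column with the type of a new value.
function mergeTypes(a, b) {
  if (a === undefined || a === 'null') return b;
  if (b === 'null' || a === b) return a;
  const numeric = new Set(['integer', 'number']);
  if (numeric.has(a) && numeric.has(b)) return 'number';
  return 'string'; // Conflicting types fall back to string.
}

// Build a { column: type } map across all records.
function inferColumnTypes(records) {
  const types = {};
  for (const record of records) {
    for (const [column, value] of Object.entries(record)) {
      types[column] = mergeTypes(types[column], guessType(value));
    }
  }
  return types;
}

// Made-up sample records for illustration.
const sample = [
  { name: 'A', value: '1' },
  { name: 'B', value: '2.5' },
  { name: 'C', value: '' },
];
console.log(inferColumnTypes(sample));
```

Widening on conflict (integer plus decimal becomes number, anything else becomes string) keeps the sketch tolerant of messy columns, at the cost of losing precision about mixed data.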
Best Use Cases
odb is designed to be a powerful ally for anyone working with open data, especially when dealing with raw, often messy, government-published datasets. Here are some of its best use cases:
1. Rapid Prototyping for Civic Hackers and Developers
Quickly transform raw CSV data into structured JSON and a local SQLite database. This is invaluable for:
- Building quick dashboards: Get data into a database without manual schema definition.
- Developing data visualizations: JSON output is immediately consumable by charting libraries.
- Creating proof-of-concept applications: Focus on the application logic, not data wrangling.
2. Streamlining Data Analysis Workflows
For data analysts, odb automates the tedious initial steps of data preparation:
- Automated schema inference: No more guessing data types or manually defining table structures.
- Consistent data formats: Easily convert diverse CSVs into a uniform JSON or SQLite format for consistent analysis.
- Local data storage: Store data in a portable SQLite database for offline analysis or sharing.
3. Enhancing Developer Productivity and Integration
Developers can leverage odb to improve their workflow:
- Auto-generated TypeScript types: Directly use the generated .d.ts files in your TypeScript projects (frontend or backend) for strong typing and improved code quality when consuming the processed data.
- API Development: Quickly mock or populate local databases for API development and testing using real-world open data.
- Reducing boilerplate: Eliminate the need to write custom parsing scripts and schema definitions for each new dataset.
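As a rough illustration of how type generation from a schema can work, the sketch below turns a simple column-to-type map into TypeScript interface text. The function name toInterface, the interface name DataRecord, and the shape of the type map are assumptions for this example; the README does not document how odb builds output.d.ts internally.

```javascript
// Hypothetical sketch: converts a { column: type } map into TypeScript interface text.
// The type labels on the left are assumptions, not odb's documented schema format.
const TS_TYPES = new Map([
  ['integer', 'number'],
  ['number', 'number'],
  ['boolean', 'boolean'],
  ['string', 'string'],
  ['null', 'null'],
]);

function toInterface(name, columnTypes) {
  const fields = Object.entries(columnTypes).map(([column, type]) => {
    // Quote keys that are not valid identifiers, such as "Country Name".
    const key = /^[A-Za-z_$][\w$]*$/.test(column) ? column : JSON.stringify(column);
    return `  ${key}: ${TS_TYPES.get(type) ?? 'unknown'};`;
  });
  return `export interface ${name} {\n${fields.join('\n')}\n}\n`;
}

console.log(toInterface('DataRecord', { 'Country Name': 'string', Year: 'integer' }));
```

Unrecognized type labels are emitted as unknown rather than any, so a consumer has to narrow the value before using it.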
4. Data Transformation and Cleaning (with --map option)
The --map option is currently in a basic form and is planned to support more complex data transformations, including:
- Renaming columns: Aligning inconsistent column names across datasets.
- Type coercion: Ensuring data types are correct for specific use cases.
- Data enrichment: Combining data from multiple sources or adding derived fields.
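Since the mapping file format is not defined in this README, the following is only a sketch of what column renaming and type coercion could look like. The mapping shape (rename and coerce keys) and the applyMapping function are hypothetical and should not be read as the format odb accepts or will accept.

```javascript
// Hypothetical mapping format: odb's --map file format is NOT specified in this README.
const mapping = {
  rename: { 'Country Name': 'country', Year: 'year' },
  coerce: { year: 'integer' },
};

// Own-property check so inherited keys like "constructor" are never treated as renames.
const own = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

// Apply renames first, then coerce values of the renamed columns.
function applyMapping(records, { rename = {}, coerce = {} }) {
  return records.map((record) => {
    const out = {};
    for (const [column, value] of Object.entries(record)) {
      const target = own(rename, column) ? rename[column] : column;
      out[target] = own(coerce, target) && coerce[target] === 'integer'
        ? Number.parseInt(value, 10)
        : value;
    }
    return out;
  });
}

// Made-up record for illustration.
console.log(applyMapping([{ 'Country Name': 'Testland', Year: '2020' }], mapping));
```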
5. Educational and Learning Purposes
odb can serve as an excellent tool for learning about:
- Data processing pipelines: Understand the steps from raw data to structured output.
- Schema design: Observe how schemas are inferred from real-world data.
- CLI tool development: A practical example of building a useful command-line interface.
In essence, odb is for anyone who wants to spend less time cleaning and structuring open data, and more time building, analyzing, and innovating with it.
Installation
As a CLI Tool
- Clone the repository:
    git clone https://github.com/LeumasTech/open-data-bridge.git
    cd open-data-bridge
- Install dependencies:
    npm install
- Link the CLI (for global use):
    npm link
  Now you can run odb from anywhere in your terminal.
As a Node.js Library
Install it in your project:
npm install open-data-bridge
Usage
CLI Usage
Run the odb CLI directly with node or, if you have linked it globally, as odb:
# If not globally linked
node index.js fetch <url> [options]
# If globally linked
odb fetch <url> [options]
fetch <url>
Fetches data from the provided URL, infers its schema, and outputs it in various formats.
Arguments:
<url>: The URL of the data source (e.g., a CSV file).
Options:
-m, --map <mappingFile>: (Planned) Path to a JSON mapping file for schema transformation.
CLI Examples
# Fetch a CSV file and generate JSON, TypeScript types, and SQLite database
odb fetch https://raw.githubusercontent.com/datasets/population/main/data/population.csv
# (Planned) Fetch with a mapping file
# odb fetch <url> --map ./my-mapping.json
Library Usage
You can import and use the fetchAction and inferSchema functions directly in your Node.js applications:
const { fetchAction, inferSchema } = require('open-data-bridge');
async function processData() {
  const url = 'https://raw.githubusercontent.com/datasets/population/main/data/population.csv';
  const options = {}; // No mapping file for now
  // Use the fetchAction to process data and generate files
  await fetchAction(url, options);
  // You can also use inferSchema independently if you have records in memory
  // const records = [{ name: 'Test', value: 123 }];
  // const schema = inferSchema(records);
  // console.log(schema);
}
processData();
Output Files
Upon successful execution, the fetch command will generate the following files in your current working directory:
- output.json: The fetched data in JSON format.
- output.d.ts: TypeScript declaration file with interfaces corresponding to your data.
- output.sqlite: A SQLite database file containing your data in a table named data.
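Once the files exist, the JSON output can be read back with Node's built-in fs module. The sketch below assumes output.json holds a JSON array of row objects; the README does not spell out its exact shape, so treat that as an assumption. The helper name loadRecords is made up for this example.

```javascript
const fs = require('fs');

// Reads a JSON file and returns its records.
// Assumption: the file contains a JSON array of row objects.
function loadRecords(path) {
  const parsed = JSON.parse(fs.readFileSync(path, 'utf8'));
  if (!Array.isArray(parsed)) {
    throw new TypeError(`Expected an array of records in ${path}`);
  }
  return parsed;
}

// Usage: only runs if `odb fetch <url>` has already written output.json here.
if (fs.existsSync('output.json')) {
  const records = loadRecords('output.json');
  console.log(`${records.length} records; first:`, records[0]);
}
```

The array check fails early with a clear message if the file turns out to have a different top-level shape.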
Development
Running Tests
npm test
Contributing
Contributions are welcome! Please open an issue or submit a pull request.
