DataBridge

A framework for automated and programmatic data transfer. Separate source and destination modules allow for a high degree of customization and deployment-specific data handling.

Installation

See INSTALL.

Usage

Command-Line

Run a bridge or a batch. From the project directory, at the command line:

node app --help
npm start -- --help

Re-run setup:

node setup

Clean up any extraneous output files (make appropriate backups first). Takes a -d flag for the number of previous days' files to keep:

node clean
node clean -d 3
node clean -d 0

Manage bind variables, following the prompts:

node bind

Node Module

npm install databridge --save

var databridge = require('databridge');

The databridge module exposes a number of functions and objects. See the documentation at the beginning of each module for a reference on its specific functions.

{
  setupConfig: [Function],            // bin/config-setup
  config: {                           // config.json (if exists)
    dirs: {
      batches: '{...}/batches/',
      creds: '{...}/creds/',
      destinations: '{...}/destinations/',
      input: '{...}/input/',
      logs: '{...}/logs/',
      output: '{...}/output/',
      sources: '{...}/sources/'
    },
    logto: 'console',
    defaultBindVars: {},
    schedule: '{...}/schedule.json'
  },
  bridge: [Function],                 // bin/bridge
  batchParse: [Function],             // bin/batch-parse
  bridgeRunner: [Function],           // bin/bridge-runner
  list: {
    src: [Function],                  // bin/list-src
    dest: [Function],                 // bin/list-dest
    tables: [Function],               // bin/list-tables
    batches: [Function]               // bin/list-batches
  }
}

Data Types

Data type detection is handled by typeof() together with substring matches on the source column names.

  1. GPA anywhere in the column name, or _DEC at its end, is parsed as DECIMAL(8,2). (_DEC is removed from the destination column name.)
  2. DATE or TIMESTAMP in the column name is parsed as DATE.
  3. For all other types, the first row of data is used.
  4. If typeof(value) === 'number', the column is parsed as INT.
  5. Otherwise, the column is parsed as VARCHAR(255).

This behavior is implemented in bin/col-parser and should be customized for particular situations, especially those involving large amounts of data.
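As a rough sketch, the rules above amount to something like the following (illustrative only; the real logic lives in bin/col-parser and differs in its details):

// Illustrative sketch of the type-detection rules; not the actual bin/col-parser code.
function inferColumnType(name, firstValue) {
  if (name.includes('GPA') || name.endsWith('_DEC')) {
    // Rule 1: DECIMAL(8,2); strip a trailing _DEC from the destination name.
    return { name: name.replace(/_DEC$/, ''), type: 'DECIMAL(8,2)' };
  }
  if (name.includes('DATE') || name.includes('TIMESTAMP')) {
    // Rule 2: treat as DATE.
    return { name: name, type: 'DATE' };
  }
  if (typeof firstValue === 'number') {
    // Rules 3-4: fall back to the first row of data; numbers become INT.
    return { name: name, type: 'INT' };
  }
  // Rule 5: everything else becomes VARCHAR(255).
  return { name: name, type: 'VARCHAR(255)' };
}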

Indexes

For SQL-based destinations, an index is created for each column whose name ends in the trailing string _IND. The trailing string is removed from the destination column name; for example, a source column EMP_ID_IND becomes EMP_ID at the destination, with an index created on it.

Running as a Service / PM2

Uses the config.schedule file and can be set up as a service. Requires pm2 installed globally: npm install -g pm2.

# start pm2 service
npm run service-start
# restart pm2
npm run service-restart
# stop pm2
npm run service-stop

It is also possible to run pm2 at system startup.
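Startup is handled by pm2's own hooks (standard pm2 commands, nothing databridge-specific); for example:

pm2 startup
pm2 save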

Schedule / Batch Configuration

NOTE: When running as a service, databridge cannot prompt for bind variables. Any source that requires bind variables will therefore throw an error if they are not defined. In that case, each job object MUST have a binds: true attribute or a defined binds: {...} object.

The truncate option truncates the table rather than completely dropping and recreating it (SQL databases only). Use "truncate": true or -n.

The update option appends values to the table rather than completely deleting and recreating it (databases only). Use "update": true or -u.

Run a custom script with "type": "script", setting "name" to the file name (no extension) of a script inside local/input/.

Each schedule object requires a cron attribute in the following format.

*    *    *    *    *    *
┬    ┬    ┬    ┬    ┬    ┬
│    │    │    │    │    │
│    │    │    │    │    └ day of week (0 - 7) (0 or 7 is Sun)
│    │    │    │    └───── month (1 - 12)
│    │    │    └────────── day of month (1 - 31)
│    │    └─────────────── hour (0 - 23)
│    └──────────────────── minute (0 - 59)
└───────────────────────── second (0 - 59, optional)

Example schedule.json file:

[{
  "name": "test",
  "type": "batch",
  "cron": "*/30 * * * *"
}, {
  "type": "bridge",
  "name": "oracle employees.ferpa_certified => mssql",
  "cron": "*/10 * * * *",
  "binds": true,
  "source": "oracle",
  "destination": "mssql",
  "table": "employees.ferpa_certified",
  "truncate": true
}, {
  "type": "bridge",
  "name": "mssql surveys.population_open => csv",
  "cron": "*/20 * * * *",
  "binds": true,
  "source": "mssql",
  "table": "surveys.population_open",
  "destination": "csv"
}, {
  "type": "bridge",
  "name": "xlsx employees.ferpa_certified => mssql",
  "cron": "*/25 * * * *",
  "binds": true,
  "source": "xlsx",
  "destination": "mssql",
  "table": "employees.ferpa_certified",
  "update": true
}, {
  "type": "script",
  "name": "name of script inside input directory, file extension",
  "cron": "0 1 * * *"
}]

Testing

Install Mocha globally:

npm install -g mocha

All tests:

npm test

All destinations or sources:

mocha spec/destinations
mocha spec/sources

Just one destination/source:

mocha spec/destinations --one=mssql
mocha spec/sources --one=mysql

Customizing sources / destinations

Note about require()

If you plan on using a separate input/output/source directory (i.e., during npm install you told databridge to create the "local" folder outside the main databridge folder), you'll need to either:

  • Install necessary packages (like database connection packages) inside that local directory.
  • Use something like require.main.require to include the database source or scripts from the main databridge directory.

Databridge will require() modules from the source directory, but it will resolve them from a different path than the main databridge directory.

See this great GitHub Gist by branneman for more information and options.

Multiple similar sources (oracle => oracle)

One way to transfer between two separate databases of the same type is to create a source or destination that wraps an existing one under a different name (and therefore a different credentials/connection file). For example, adding a custom oracle source at local/sources/newsource.js makes databridge look for creds/newsource.js. Here's how to do that:

// include the oracle source from bin/src/ and re-export it under a new name
module.exports = (opt, moduleCallback) => {
  const oracle = require.main.require('./bin/src/oracle');
  oracle(opt, (e, r, c) => {
    moduleCallback(e, r, c);
  });
};

Source modules

Source modules are passed the config/opt object from the bridge.

Source modules MUST pass back, via the module callback:

  1. null or an error.
  2. The number of rows pulled from the source.
  3. An array of column names from the source.
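For example, a minimal source module skeleton following that contract (the row-fetching logic and column names here are placeholders, not part of databridge):

// local/sources/example.js -- an illustrative skeleton, not a real source
module.exports = (opt, moduleCallback) => {
  try {
    // Fetch rows from your source here, using connection details from opt.
    const rows = [];
    const columns = ['ID', 'NAME']; // placeholder column names
    moduleCallback(null, rows.length, columns);
  } catch (err) {
    moduleCallback(err);
  }
};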

Destination modules

Destination modules are passed the config/opt object from the bridge and the column definitions generated by bin/col-parser.

Destination modules MUST pass back, via the module callback:

  1. null or an error.
  2. The number of rows written to the destination.
  3. An array of column names at the destination.
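A destination module can follow the same shape. A sketch, assuming the column definitions arrive as a second argument (check the documentation at the top of the modules in bin/ for the exact signature):

// local/destinations/example.js -- an illustrative skeleton, not a real destination
module.exports = (opt, columns, moduleCallback) => {
  try {
    // Write rows to your destination here, using the column definitions
    // produced by bin/col-parser.
    const written = 0; // placeholder row count
    moduleCallback(null, written, []); // pass back the destination column names
  } catch (err) {
    moduleCallback(err);
  }
};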