datastore-to-bigquery

Dump Google Cloud Datastore contents and load them into BigQuery.

Sample Output

You can run it with npx:

% npx datastore-to-bigquery --help
usage: datastore-to-bigquery [-h] [-b BUCKET] [-d BACKUPDIR] [-n BACKUPNAME] [-s NAMESPACE] [-p BQPROJECTID]
                             [--datasetName DATASETNAME]
                             projectId

Copy datastore Contents to BigQuery.

positional arguments:
  projectId             Datastore project ID

optional arguments:
  -h, --help            show this help message and exit
  -b BUCKET, --bucket BUCKET
                        GCS bucket to store backup. Needs to be in the same Region as datastore. (default:
                        projectId.appspot.com)
  -d BACKUPDIR, --backupDir BACKUPDIR
                        prefix/dir within bucket
  -n BACKUPNAME, --backupName BACKUPNAME
                        name of backup (default: autogenerated)
  -s NAMESPACE, --namespace NAMESPACE
                        datastore namespace
  -p BQPROJECTID, --bqProjectId BQPROJECTID
                        BigQuery project ID. (default: same as datastore)
  --datasetName DATASETNAME
                        Name of BigQuery Dataset to write to. Needs to be in the same Region as GCS bucket. (default:
                        same as projectId)

Please provide `GOOGLE_APPLICATION_CREDENTIALS` via the environment!
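
The Google Cloud client libraries pick this variable up automatically. If you would rather wire credentials up in code, the standard client options work too; a minimal sketch (project ID and key path are placeholder values, not part of this package's API):

import { BigQuery } from '@google-cloud/bigquery';

// Equivalent to: export GOOGLE_APPLICATION_CREDENTIALS=./service-account.json
// (placeholder values)
const bigquery = new BigQuery({
  projectId: 'sampleproj',
  keyFilename: './service-account.json',
});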

Loading into BigQuery

This loads Datastore data dumped by datastore-backup (or by other means) into BigQuery. Make sure that the bucket containing the data to be loaded and the BigQuery dataset are in the same location/region.

The BigQuery dataset will be created if it does not exist.
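
The create-if-missing step is roughly what you would write by hand with the Node clients: look up the bucket's location and create the dataset alongside it. A sketch under that assumption (bucket and dataset names are placeholders):

import { BigQuery } from '@google-cloud/bigquery';
import { Storage } from '@google-cloud/storage';

const bucketName = 'samplebucket-tmp'; // placeholder
const datasetName = 'test_EU';         // placeholder

const storage = new Storage();
const bigquery = new BigQuery();

// The dataset must live in the same region as the backup bucket.
const [bucketMeta] = await storage.bucket(bucketName).getMetadata();

// Create the dataset next to the bucket if it does not exist yet.
const [exists] = await bigquery.dataset(datasetName).exists();
if (!exists) {
  await bigquery.createDataset(datasetName, { location: bucketMeta.location });
}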

CLI Usage

CLI usage is simple: provide the bucket and path to read from, and the BigQuery project and dataset to write to:

% npx -p datastore-to-bigquery bigqueryLoad --help
usage: bigqueryLoad.ts [-h] bucket pathPrefix projectId datasetName

Load Datastore Backup into BigQuery.

positional arguments:
  bucket       GCS bucket to read backup.
  pathPrefix   Backup dir & name of backup in GCS bucket.
  projectId    BigQuery project ID.
  datasetName  Name of BigQuery Dataset to write to. Needs to be in the same Region as GCS bucket.

optional arguments:
  -h, --help   show this help message and exit

Please provide `GOOGLE_APPLICATION_CREDENTIALS` via the environment!

Loading takes a few seconds per kind:

% yarn ts-node src/bin/bigqueryLoad.ts samplebucket-tmp bak/20211223T085120-sampleproj sampleproj test_EU
ℹ bucket samplebucket-tmp is in EU
✔  BigQuery Dataset test_EU exists
ℹ dataset sampleproj:test_EU is in unknown location
ℹ Reading samplebucket-tmp/bak/20211223T085120-sampleproj*
✔ Loading NumberingAncestor done in 1.33s
✔ Loading NumberingItem done in 4.231s

Moving the dataset

If you need to read the dataset from a different BigQuery location/region, you can use the BigQuery Data Transfer Service, which is blazing fast. Here the EU dataset is copied into a new US dataset:

bq --location=US mk --dataset sampleproj:test_US
bq mk --transfer_config --data_source=cross_region_copy --display_name='Copy Dataset' \
      --project_id=sampleproj --target_dataset=test_US \
      --params='{"source_dataset_id":"test_EU","source_project_id":"sampleproj"}'

Programmatic Usage

Basically the same as command line usage:

import { BigQuery } from '@google-cloud/bigquery';
import { loadAllKindsFromPrefix } from '../lib/load-into-to-bigquery';

// `projectId` and `args` are assumed to hold the parsed CLI options shown above.
const bigquery = new BigQuery({ projectId });
await loadAllKindsFromPrefix(
  bigquery,
  args.datasetName,
  args.bucket,
  args.pathPrefix,
);

Full Dump-Load Cycle

This dumps the contents of datastoreProject into bucket-tmp under the backup name production and loads them into BigQuery in bigqueryProject:

% npx datastore-to-bigquery datastoreProject -n production -b bucket-tmp -p bigqueryProject

Hints

Permissions are a little tricky to set up: permission for the Datastore export must exist on the source project, along with permission to write to the bucket. Permission to run load jobs must exist on BigQuery, and permission to list and read objects must exist on GCS.

Locations/regions are also tricky to set up. Basically, the Datastore, the bucket, and the dataset should all be in the same region, e.g. EU. If you need to use BigQuery from a different region, see "Moving the dataset".

Beware of namespaces! Dumping different namespaces and loading them into the same BigQuery dataset will result in incomplete data in BigQuery. If you have several namespaces, dump each one separately (-s) and load each into its own dataset (--datasetName).

See also: