@aws-mdaa/dataops-job

v1.6.0

MDAA dataops-job module

Downloads

376

ETL Jobs

Deploys Glue ETL jobs with automatic script deployment, job templates for config reuse, continuous logging, VPC binding, and project security configuration wiring. Supports Python and Scala runtimes. Use this module when you need to transform, enrich, or move data between sources using Glue Spark or Python shell jobs as part of your data pipeline.


Pre-built Data Quality Script

The module includes a pre-built Glue ETL script for data quality evaluation in the assets/ directory. Reference it using the asset: prefix in scriptLocation and additionalScripts:

dq-main.py — DQ evaluation

Evaluates data quality rulesets against a single table. Supports inline DQDL, S3-stored DQDL, and Glue recommendation rulesets. Optionally publishes results to SageMaker Unified Studio (DataZone). For multi-table fan-out, use dataops-stepfunction-app with a Distributed Map that starts one dq-main.py job run per table.

DqEvaluation:
  command:
    name: glueetl
    scriptLocation: "asset:dq-main.py"
  additionalScripts:
    - "asset:dq_config.py"
    - "asset:smus.py"

Shared utilities

  • asset:dq_config.py — Configuration utilities. Loads rulesets and source data frames from Glue catalog or connection options.
  • asset:smus.py — SMUS publishing. Posts DQ evaluation results to DataZone via post_time_series_data_points.
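
As a rough illustration of the multi-table fan-out described above, the sketch below shows an Amazon States Language definition, written here as YAML, in which a Distributed Map starts one Glue job run per table. The state names, the `$.tables` input path, the `--table_name` argument, and the job name `DqEvaluation` are assumptions made for illustration. The arguments that dq-main.py actually accepts, the deployed job name, and the way a definition is embedded in a dataops-stepfunction-app config should be taken from the respective module docs.

```yaml
# Hypothetical sketch of a state machine definition (ASL rendered as YAML).
StartAt: EvaluateTables
States:
  EvaluateTables:
    Type: Map
    ItemsPath: "$.tables"            # assumed input: a list of objects, one per table
    MaxConcurrency: 5                # assumed limit on parallel job runs
    ItemProcessor:
      ProcessorConfig:
        Mode: DISTRIBUTED            # run the map as a Distributed Map
        ExecutionType: STANDARD
      StartAt: RunDqJob
      States:
        RunDqJob:
          Type: Task
          # Step Functions service integration that starts a Glue job run and waits for it
          Resource: "arn:aws:states:::glue:startJobRun.sync"
          Parameters:
            JobName: DqEvaluation    # assumed name of the deployed dq-main.py job
            Arguments:
              "--table_name.$": "$.tableName"   # hypothetical job argument
          End: true
    End: true
```

Each item in the input list becomes the input of one child execution, so every table gets its own dq-main.py job run.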

Deployed Resources

This module deploys and integrates the following resources:

Glue Jobs - One Glue job is created for each job specification in the module config

  • Automatically configured to use project security config
  • Can optionally be VPC bound (via Glue connection)
  • Automatically configured to use project bucket as temp location
  • Can use job templates to promote reuse/minimize config duplication
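
To illustrate how a job template might reduce duplication, here is a minimal sketch. The `templates:` and `jobs:` sections, the `template:` reference key, and all names and paths are assumptions for illustration. Only `command`, `name`, and `scriptLocation` appear elsewhere in this README, so check the sample configs and schema docs for the actual keys.

```yaml
# Hypothetical sketch; section and key names are assumptions, not the confirmed schema.
templates:
  BaseEtlTemplate:                   # hypothetical template holding shared settings
    command:
      name: glueetl
    description: Shared settings for ETL jobs
jobs:
  OrdersEtl:                         # hypothetical job that reuses the template
    template: BaseEtlTemplate
    command:
      scriptLocation: ./src/orders_etl.py      # hypothetical local script path
  CustomersEtl:                      # second job; only the script differs
    template: BaseEtlTemplate
    command:
      scriptLocation: ./src/customers_etl.py   # hypothetical local script path
```

In this arrangement each job states only what differs from the template, which is the reuse the bullet above refers to.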

[Figure: dataops-job]


Related Modules

  • DataOps Project — Deploy the shared project infrastructure (KMS keys, security configs, connections, buckets) that ETL jobs reference
  • Crawlers — Deploy crawlers to catalog ETL job output data in the Glue Catalog
  • Workflows — Orchestrate ETL jobs and crawlers together in Glue Workflows
  • Step Functions — Orchestrate ETL jobs with Step Functions state machines
  • Data Quality — Deploy data quality rulesets to validate ETL job output
  • Data Lake — ETL jobs can read from and write to data lake S3 buckets

Security/Compliance Details

This module is designed in alignment with MDAA security/compliance principles and CDK nag rulesets. Additional review is recommended prior to production deployment, to assist in meeting organization-specific compliance requirements.

  • Encryption at Rest:
    • Jobs use project Glue security configuration for encrypting output data, logs, and bookmarks with the project KMS key
    • S3 output optionally encrypted with a separate data lake KMS key
  • Least Privilege:
    • Execution roles scoped per job
    • Project resources referenced via project: prefix for consistent access control
  • Network Isolation:
    • Optional VPC binding via Glue connections for accessing data sources in private networks

Configuration

MDAA Config

Add the following snippet to your mdaa.yaml under the modules: section of a domain/env in order to use this module:

dataops-job: # Module Name can be customized
  module_path: '@aws-mdaa/dataops-job' # Must match module NPM package name
  module_configs:
    - ./dataops-job.yaml # Filename/path can be customized

Module Config Samples and Variants

Copy the contents of the relevant sample config below into the ./dataops-job.yaml file referenced in the MDAA config snippet above.

Minimal Configuration

Deploys a single Glue ETL job with project autowiring. Start here for a basic ETL job within an existing DataOps project.

sample-config-minimal.yaml

# Contents available via above link
--8<-- "target/docs/packages/apps/dataops/dataops-job-app/sample_configs/sample-config-minimal.yaml"

Comprehensive Configuration

Demonstrates Glue ETL and Python shell jobs with templates, job bookmarks, connections, and extra libraries, all wired to a DataOps project. Start here when evaluating all available options for job types, templates, connections, and library configurations.

sample-config-comprehensive.yaml

# Contents available via above link
--8<-- "target/docs/packages/apps/dataops/dataops-job-app/sample_configs/sample-config-comprehensive.yaml"

Standalone Configuration (No Project)

Demonstrates standalone Glue jobs with explicit KMS, bucket, deployment role, and security configuration. Use this when deploying outside of a DataOps project, providing infrastructure references directly.

sample-config-noproject.yaml

# Contents available via above link
--8<-- "target/docs/packages/apps/dataops/dataops-job-app/sample_configs/sample-config-noproject.yaml"

Worker Type Configuration

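Uses workerType and numberOfWorkers instead of maxCapacity for explicit control over Glue worker sizing (Standard, G.1X, or G.2X). Choose this variant when you need predictable worker allocation rather than DPU-based capacity set through maxCapacity. Glue does not accept maxCapacity together with workerType and numberOfWorkers, so set one or the other.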

sample-config-workertype.yaml

# Contents available via above link
--8<-- "target/docs/packages/apps/dataops/dataops-job-app/sample_configs/sample-config-workertype.yaml"
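
For orientation, a worker-type job might look roughly like the sketch below. The `workerType` and `numberOfWorkers` keys come from the description above. The enclosing `jobs:` section, the job name, the script path, and the chosen values are assumptions for illustration, and the linked sample config remains the reference.

```yaml
# Hypothetical sketch; the enclosing structure and all values are assumptions.
jobs:
  LargeEtl:                          # hypothetical job name
    command:
      name: glueetl
      scriptLocation: ./src/large_etl.py   # hypothetical local script path
    workerType: G.1X                 # one of Standard, G.1X, G.2X
    numberOfWorkers: 10              # fixed worker count for this job
    # maxCapacity is intentionally omitted when workerType is set
```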

Config Schema Docs