
@aws-mdaa/dataops-job

v1.4.0

MDAA dataops-job module

ETL Jobs

The Data Ops Job CDK application is used to deploy the resources required to support and perform data operations on top of a Data Lake using Glue Jobs.


Deployed Resources and Compliance Details

dataops-job

Glue Jobs - A Glue Job will be created for each job specification in the configs

  • Automatically configured to use project security config
  • Can optionally be VPC bound (via Glue connection)
  • Automatically configured to use project bucket as temp location
  • Can use job templates to promote reuse/minimize config duplication

Configuration

MDAA Config

Add the following snippet to your mdaa.yaml under the modules: section of a domain/env in order to use this module:

dataops-job: # Module Name can be customized
  module_path: '@aws-mdaa/dataops-job' # Must match module NPM package name
  module_configs:
    - ./dataops-job.yaml # Filename/path can be customized

Module Config (./dataops-job.yaml)

Config Schema Docs

Sample Job Config

Job configs can be templated in order to reuse job definitions across multiple jobs that differ only in a few parameters (such as input/output paths). Templates can be stored separately from job configs, or together with job configs in the same file.
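The exact template-resolution rules are defined by the MDAA module itself, but the behavior described above can be pictured as a key-wise override: a job starts from its template's settings, and the job's own keys win on conflict. The Python sketch below is a minimal illustration of that idea, not MDAA's implementation; in particular, whether nested maps such as defaultArguments are deep-merged or replaced wholesale is an assumption here (this sketch replaces them), so consult the config schema docs for the authoritative rules.

```python
# Illustrative sketch only: MDAA's actual template resolution is internal to
# the module. This mimics the apparent behavior with a shallow, top-level
# merge in which job-level keys override template-level keys.

def resolve_job(template: dict, job: dict) -> dict:
    """Merge a job config over its template; job keys win on conflict."""
    resolved = dict(template)      # start from the template's settings
    for key, value in job.items():
        if key == "template":      # the reference itself is not a job property
            continue
        resolved[key] = value      # job-level values override the template
    return resolved

# A template and job mirroring the YAML sample in this README.
template = {
    "command": {"name": "glueetl", "scriptLocation": "./src/glue/python/job.py"},
    "defaultArguments": {"--job-bookmark-option": "job-bookmark-enable"},
    "timeout": 60,
}
job = {
    "template": "ExamplePythonTemplate",
    "defaultArguments": {"--Input": "s3://some-bucket/some-location1"},
    "allocatedCapacity": 2,
}

resolved = resolve_job(template, job)
print(resolved)
```

Note that under this shallow merge the job's defaultArguments replace the template's entirely (the bookmark option is lost); if MDAA deep-merges nested maps instead, both would survive.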

projectName: dataops-project-test

templates:
  # An example job template. Can be referenced from other jobs. Will not itself be deployed.
  ExamplePythonTemplate:
    executionRoleArn: some-arn
    # (required) Command definition for the glue job
    command:
      # (required) Either of "glueetl" | "pythonshell"
      name: 'glueetl'
      # (optional) Python version.  Either "2" or "3"
      pythonVersion: '3'
      # (required) Path to a .py file relative to the configuration.
      scriptLocation: ./src/glue/python/job.py
    # (required) Description of the Glue Job
    description: Example of a Glue Job using an inline script
    # (optional) List of connections for the glue job to use.  Reference back to the connection name in the 'connections:' section of the project.yaml
    connections:
      - project:connections/connectionVpc
    # (optional) key: value pairs for the glue job to use.  see: https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html
    defaultArguments:
      --job-bookmark-option: job-bookmark-enable
    # (optional) maximum concurrent runs.  See: https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-jobs-job.html#aws-glue-api-jobs-job-ExecutionProperty
    executionProperty:
      maxConcurrentRuns: 1
    # (optional) Glue version to use as a string.  See: https://docs.aws.amazon.com/glue/latest/dg/release-notes.html
    glueVersion: '2.0'
    # (optional) Maximum capacity.  See the MaxCapacity section of https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-jobs-job.html
    # Use maxCapacity or workerType, not both.
    #maxCapacity: 1
    # (optional) Maximum retries.  See the MaxRetries section.
    maxRetries: 3
    # (optional) Number of minutes to wait before sending a job run delay notification.
    notificationProperty:
      notifyDelayAfter: 1
    # (optional) Number of workers to provision
    #numberOfWorkers: 1
    # (optional) Number of minutes to wait before considering the job timed out
    timeout: 60
    # (optional) Worker type to use.  Any of: "Standard" | "G.1X" | "G.2X"
    # Use maxCapacity or workerType, not both.
    #workerType: Standard

  # An example job template. Can be referenced from other jobs. Will not itself be deployed.
  ExampleScalaTemplate:
    executionRoleArn: some-arn
    # (the required command definition is supplied by the job that references this template)
    # (optional) key: value pairs for the glue job to use.  see: https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html
    defaultArguments:
      --job-language: scala
    # (optional) Glue version to use as a string.  See: https://docs.aws.amazon.com/glue/latest/dg/release-notes.html
    glueVersion: '5.0'

jobs:
  # Job definitions below
  PythonJobOne: # Job Name
    template: 'ExamplePythonTemplate' # Reference a job template.
    defaultArguments:
      --Input: s3://some-bucket/some-location1
    allocatedCapacity: 2
    continuousLogging:
      # For allowed values, refer https://docs.aws.amazon.com/cdk/api/v2/docs/aws-cdk-lib.aws_logs.RetentionDays.html
      # Possible values are: 1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545, 731, 1827, 3653, and 0.
      logGroupRetentionDays: 3

  PythonJobTwo:
    template: 'ExamplePythonTemplate' # Reference a job template.
    defaultArguments:
      --Input: s3://some-bucket/some-location2
      --enable-spark-ui: 'true'
      --spark-event-logs-path: s3://some-bucket/spark-event-logs-path/JobTwo/
    allocatedCapacity: 20
    # (Optional) List of all helper scripts referenced in the main Glue ETL script.
    # Helper scripts are grouped by their immediate parent directory, with each group packaged into a dedicated zip.
    # After deployment they sit alongside the main script, so they must be imported by file name directly from the main Glue script.
    # Example (main.py):
    # from core import core_function1, core_function2
    # from helper_etl import helper_function1, helper_function2
    additionalScripts:
      - ./src/glue/python/helper_etl.py
      - ./src/glue/python/utils/core.py
    # (Optional) List of additional files which will be available to the Glue Job next to the main script
    additionalFiles:
      - ./src/glue/scala/extra_file.txt

  # Scala job definitions below
  ScalaJobOne: # Job Name
    template: 'ExampleScalaTemplate' # Reference a job template.
    description: testing
    defaultArguments:
      --class: some.java.package.App
    allocatedCapacity: 2
    command:
      # (required) Either of "glueetl" | "pythonshell"
      name: 'glueetl'
      # (required) Path to a script file relative to the configuration.
      scriptLocation: ./src/glue/scala/App.scala
    # (Optional) List of additional files which will be available to the Glue Job next to the main script
    additionalFiles:
      - ./src/glue/scala/extra_file.txt
    # (Optional) List of additional jars which will be loaded into the Spark driver and executor JVMs for use
    # within the Scala script
    additionalJars:
      - ./src/glue/scala/lib/extra.jar
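The script that scriptLocation points at receives the job's defaultArguments as command-line flags. In a real Glue Python job you would read them with awsglue.utils.getResolvedOptions, but awsglue only exists on Glue workers, so the sketch below reimplements just enough of that parsing with the standard library to show how the arguments reach the script. The --Input name comes from the sample config above; the parser itself is illustrative, not Glue's implementation.

```python
# On a Glue worker the idiomatic call is:
#   from awsglue.utils import getResolvedOptions
#   args = getResolvedOptions(sys.argv, ["Input"])
# This sketch mimics that resolution so the flow is visible locally.

def resolve_options(argv, option_names):
    """Pick named --key value pairs out of argv, Glue-style."""
    args = {}
    for name in option_names:
        flag = f"--{name}"
        if flag not in argv:
            raise KeyError(f"required argument {flag} was not supplied")
        args[name] = argv[argv.index(flag) + 1]
    return args

# Glue passes defaultArguments to the script as command-line flags, so
# PythonJobOne above effectively invokes the script with:
demo_argv = ["job.py", "--Input", "s3://some-bucket/some-location1"]
args = resolve_options(demo_argv, ["Input"])
print(f"reading from {args['Input']}")
```

In the deployed job the same pattern applies to any key added under defaultArguments, which is why the sample jobs only differ in the values they pass for --Input.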