talentgarden-serverless-glue

v27.0.2

Felipe Nuñez's Serverless plugin to deploy Glue Jobs, adapted by Edoardo Polito for Talent Garden

Talent Garden Serverless Glue

This is based on Felipe Nuñez's Serverless plugin to deploy Glue Jobs, adapted by Edoardo Polito for Talent Garden.


This plugin for the Serverless Framework deploys AWS Glue jobs and triggers.

Install

  1. Run npm install --save-dev talentgarden-serverless-glue
  2. Add talentgarden-serverless-glue to the plugins section of serverless.yml:
    plugins:
        - talentgarden-serverless-glue

How it works

The plugin creates CloudFormation resources from your configuration before serverless deploy runs, then adds them to the Serverless template.

Any Glue job deployed with this plugin is therefore part of your stack too.

How to configure your GlueJob(s)

Configure your Glue jobs at the root of serverless.yml like this:

Glue:
  bucketDeploy: someBucket # Required
  createBucket: true # Optional, default = false
  createBucketConfig: # Optional 
    ACL: private # Optional, private | public-read | public-read-write | authenticated-read
    LocationConstraint: af-south-1 # Optional
    GrantFullControl: 'STRING_VALUE' # Optional
    GrantRead: 'STRING_VALUE' # Optional
    GrantReadACP: 'STRING_VALUE' # Optional
    GrantWrite: 'STRING_VALUE' # Optional
    GrantWriteACP: 'STRING_VALUE' # Optional
    ObjectLockEnabledForBucket: true # Optional
    ObjectOwnership: BucketOwnerPreferred # Optional
  s3Prefix: some/s3/key/location/ # Optional, S3 prefix under which all job scripts are uploaded; default = 'glueJobs/'
  tempDirBucket: someBucket # Optional, default = '{serverless.serviceName}-{provider.stage}-gluejobstemp'
  tempDirS3Prefix: some/s3/key/location/ # Optional, default = ''. The job name is appended to the prefix
  jobs:
    - resourceName: SuperGlueJob # Optional, you can use this parameter to reference the job in your serverless template. The value is converted to PascalCase. When not specified, JobName is used
      name: super-glue-job # Required
      scriptPath: src/script.py # Required, the script is uploaded to the s3Prefix location under its file name (the part after the last '/')
      scriptS3LocationPrefix: some/s3/prefix # Optional, use this key instead of s3Prefix if you want to specify a different S3 prefix for each Glue Job in your stack
      Description: # Optional, string
      tempDir: true # Optional true | false
      type: spark # Required, spark | pythonshell
      glueVersion: python3-2.0 # Required python3-1.0 | python3-2.0 | python3-3.0 | python2-1.0 | python2-0.9 | scala2-1.0 | scala2-0.9 | scala2-2.0
      pythonVersion: "3.9" # Optional
      librarySet: analytics # Optional
      role: arn:aws:iam::000000000:role/someRole # Required
      MaxCapacity: 1 # Optional
      MaxConcurrentRuns: 3 # Optional
      WorkerType: Standard # Optional, Standard | G.1X | G.2X
      NumberOfWorkers: 1 # Optional
      Connections: # Optional
        - some-connection-string
        - other-connection-string
      Timeout: # Optional, number
      MaxRetries: # Optional, number
      DefaultArguments: # Optional
        class: string # Optional
        scriptLocation: string # Optional
        extraPyFiles: string # Optional
        extraJars: string # Optional
        userJarsFirst: string # Optional
        usePostgresDriver: string # Optional
        extraFiles: string # Optional
        disableProxy: string # Optional
        jobBookmarkOption: string # Optional
        enableAutoScaling: string # Optional
        enableS3ParquetOptimizedCommitter: string # Optional
        enableRenameAlgorithmV2: string # Optional
        enableGlueDatacatalog: string # Optional
        enableMetrics: string # Optional
        enableContinuousCloudwatchLog: string # Optional
        enableContinuousLogFilter: string # Optional
        continuousLogLogGroup: string # Optional
        continuousLogLogStreamPrefix: string # Optional
        continuousLogConversionPattern: string # Optional
        enableSparkUi: string # Optional
        sparkEventLogsPath: string # Optional
        customArguments: # Optional; these are user-specified custom default arguments that are passed into cloudformation with a leading -- (required for glue)
          custom_arg_1: custom_value
          custom_arg_2: other_custom_value
      SupportFiles: # Optional
        - local_path: path/to/file/or/folder/ # Required if SupportFiles is given, you can pass a folder path or a file path
          s3_bucket: bucket-name-where-to-upload-files # Required if SupportFiles is given
          s3_prefix: some/s3/key/location/ # Required if SupportFiles is given
          execute_upload: True # Boolean, True to execute upload, False to not upload. Required if SupportFiles is given
      Tags:
        job_tag_example_1: example1
        job_tag_example_2: example2
  triggers:
    - name: some-trigger-name # Required
      Description: # Optional, string
      StartOnCreation: True # Optional, True or False
      schedule: 30 12 * * ? * # Optional, CRON expression. The trigger will be created with On-Demand type if the schedule is not provided.
      Tags:
        trigger_tag_example_1: example1     
      actions: # Required. One or more jobs to trigger
        - name: super-glue-job # Required
          args: # Optional
            custom_arg_1: custom_value
            custom_arg_2: other_custom_value
          timeout: 30 # Optional, if set, it overrides the job's own timeout when the job is started by this trigger
          SecurityConfiguration: # Optional, name of security configuration
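
As a starting point, the reference above can be reduced to the keys marked Required. This is a minimal sketch, not a tested deployment: the bucket name, job name, script path, and role ARN are placeholders, and it assumes the deploy bucket already exists (createBucket defaults to false).

```yaml
# Minimal sketch: only the keys marked "Required" in the reference above.
# All values are placeholders; replace them with your own.
Glue:
  bucketDeploy: someBucket              # existing S3 bucket (createBucket defaults to false)
  jobs:
    - name: super-glue-job              # job name
      scriptPath: src/script.py         # local script, uploaded under the default 'glueJobs/' prefix
      type: spark                       # spark | pythonshell
      glueVersion: python3-2.0          # [language][version]-[glue version]
      role: arn:aws:iam::000000000:role/someRole
```

Every other key shown in the full reference is optional and can be added as the job needs it.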

You can define a lot of jobs...

  Glue:
    bucketDeploy: someBucket
    jobs:
      - name: jobA
        scriptPath: scriptA
        ...
      - name: jobB
        scriptPath: scriptB
        ...
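
When several jobs share one stack, each job can keep the shared s3Prefix or override it with scriptS3LocationPrefix, as described in the reference above. The following is a sketch under assumptions: the script paths, the teamB/scripts/ prefix, and the role ARN are hypothetical placeholders.

```yaml
# Sketch: two jobs in one stack, one of them with its own upload prefix.
# Paths, prefixes, and the role ARN are hypothetical placeholders.
Glue:
  bucketDeploy: someBucket
  s3Prefix: glueJobs/                           # shared prefix (this is also the default)
  jobs:
    - name: jobA
      scriptPath: src/jobA.py                   # uploaded under the shared s3Prefix
      type: spark
      glueVersion: python3-2.0
      role: arn:aws:iam::000000000:role/someRole
    - name: jobB
      scriptPath: src/jobB.py
      scriptS3LocationPrefix: teamB/scripts/    # used instead of s3Prefix for this job only
      type: pythonshell
      glueVersion: python3-1.0
      role: arn:aws:iam::000000000:role/someRole
```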

And a lot of triggers...

  Glue:
    triggers:
        - name:
            ...
        - name:
            ...

Glue configuration parameters

|Parameter|Type|Description|Required|
|-|-|-|-|
|bucketDeploy|String|S3 bucket name|true|
|createBucket|Boolean|If true, a bucket named after bucketDeploy is created first. Helpful if you have not created the bucket beforehand|false|
|createBucketConfig|createBucketConfig|Bucket configuration used when creating the bucket on S3|false|
|s3Prefix|String|S3 prefix name|false|
|tempDirBucket|String|S3 bucket name for the Glue temporary directory. If omitted, the bucket name is generated with the pattern {serverless.serviceName}-{provider.stage}-gluejobstemp|false|
|tempDirS3Prefix|String|S3 prefix name for the Glue temporary directory|false|
|jobs|Array|Array of Glue jobs to deploy|true|

CreateBucket configuration parameters

|Parameter|Type|Description|Required|
|-|-|-|-|
|ACL|String|The canned ACL to apply to the bucket. Possible values include: private, public-read, public-read-write, authenticated-read|false|
|LocationConstraint|String|Specifies the Region where the bucket will be created. If you don't specify a Region, the bucket is created in the US East (N. Virginia) Region (us-east-1). Possible values are: af-south-1, ap-east-1, ap-northeast-1, ap-northeast-2, ap-northeast-3, ap-south-1, ap-southeast-1, ap-southeast-2, ca-central-1, cn-north-1, cn-northwest-1, EU, eu-central-1, eu-north-1, eu-south-1, eu-west-1, eu-west-2, eu-west-3, me-south-1, sa-east-1, us-east-2, us-gov-east-1, us-gov-west-1, us-west-1, us-west-2|false|
|GrantFullControl|String|Allows grantee the read, write, read ACP, and write ACP permissions on the bucket.|false|
|GrantRead|String|Allows grantee to list the objects in the bucket.|false|
|GrantReadACP|String|Allows grantee to read the bucket ACL.|false|
|GrantWrite|String|Allows grantee to create new objects in the bucket. For the bucket and object owners of existing objects, also allows deletions and overwrites of those objects.|false|
|GrantWriteACP|String|Allows grantee to write the ACL for the applicable bucket.|false|
|ObjectLockEnabledForBucket|Boolean|Specifies whether you want S3 Object Lock to be enabled for the new bucket.|false|
|ObjectOwnership|String|The container element for object ownership for a bucket's ownership controls. Possible values include: BucketOwnerPreferred, ObjectWriter, BucketOwnerEnforced|false|

Jobs configuration parameters

|Parameter|Type|Description|Required|
|-|-|-|-|
|resourceName|String|Logical name of the resource|false|
|name|String|Name of the job|true|
|Description|String|Description of the job|false|
|scriptPath|String|Script path in the project|true|
|tempDir|Boolean|Flag indicating whether the job requires a temp folder. If true, the plugin creates a bucket for it|false|
|type|String|Type of the job. Allowed values: spark or pythonshell|true|
|glueVersion|String|Language and Glue version to use ([language][version]-[glue version]). Allowed values: python3-1.0, python3-2.0, python3-3.0, python2-1.0, python2-0.9, scala2-1.0, scala2-0.9, scala2-2.0|true|
|pythonVersion|String|Python version to use for a pythonshell job|false|
|librarySet|String|Which Glue Python library preset to load|false|
|role|String|ARN of the role used to execute the job|true|
|MaxCapacity|Double|The number of AWS Glue data processing units (DPUs) that can be allocated when this job runs|false|
|MaxConcurrentRuns|Double|Maximum number of concurrent runs of the job|false|
|MaxRetries|Int|Maximum number of retries in case of failure|false|
|Timeout|Int|Job timeout in minutes|false|
|WorkerType|String|The type of predefined worker that is allocated when a job runs. Accepts a value of Standard, G.1X, or G.2X.|false|
|NumberOfWorkers|Integer|Number of workers|false|
|SecurityConfiguration|String|The name of the security configuration that the job should use|false|
|Connections|List|A list of connections used by the job|false|
|DefaultArguments|Object|Special parameters used by AWS Glue. For more information, see the AWS documentation|false|
|SupportFiles|List|List of supporting files for the Glue job that need to be uploaded to S3|false|
|Tags|JSON|The tags to use with this job. You may use tags to limit access to the job. For more information about tags in AWS Glue, see AWS Tags in AWS Glue in the developer guide.|false|
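
Because resourceName sets the logical name of the job resource, the job can be referenced from the rest of the serverless template. The sketch below assumes the plugin uses the PascalCase form of resourceName unchanged as the CloudFormation logical ID; the output name SuperGlueJobName is hypothetical. In CloudFormation, Ref on an AWS::Glue::Job resource returns the job name.

```yaml
# Sketch: exposing the deployed job's name as a stack output.
# Assumes the logical ID equals the PascalCase resourceName (SuperGlueJob).
Glue:
  bucketDeploy: someBucket
  jobs:
    - resourceName: SuperGlueJob
      name: super-glue-job
      scriptPath: src/script.py
      type: spark
      glueVersion: python3-2.0
      role: arn:aws:iam::000000000:role/someRole

resources:
  Outputs:
    SuperGlueJobName:          # hypothetical output name
      Value:
        Ref: SuperGlueJob      # Ref on a Glue job resolves to the job name
```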

Triggers configuration parameters

|Parameter|Type|Description|Required|
|-|-|-|-|
|name|String|Name of the trigger|true|
|schedule|String|CRON expression|false|
|actions|Array|An array of jobs to trigger|true|
|Description|String|Description of the trigger|false|
|StartOnCreation|Boolean|Whether the trigger starts when created. Not supported for ON_DEMAND triggers|false|

Only On-Demand and Scheduled triggers are supported.
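
The two supported trigger kinds differ only in whether schedule is present. The sketch below is illustrative: the trigger names are hypothetical, the jobs section is left out for brevity, and super-glue-job is assumed to be defined under jobs in the same Glue block.

```yaml
# Sketch: one scheduled and one on-demand trigger for the same job.
# Trigger names are hypothetical; the jobs section is omitted for brevity.
Glue:
  bucketDeploy: someBucket
  triggers:
    # Scheduled trigger: a schedule is provided
    - name: scheduled-trigger
      schedule: "30 12 * * ? *"    # CRON expression, quoted so YAML reads it as a plain string
      StartOnCreation: True        # only meaningful for scheduled triggers
      actions:
        - name: super-glue-job
    # On-Demand trigger: no schedule, so the On-Demand type is used
    - name: on-demand-trigger
      actions:
        - name: super-glue-job
          timeout: 30              # overrides the job's own timeout for runs started here
```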

Trigger job configuration parameters

|Parameter|Type|Description|Required|
|-|-|-|-|
|name|String|The name of the Glue job to trigger|true|
|timeout|Integer|Job execution timeout. It overrides the job's own timeout when the job is started by the trigger|false|
|args|Map|Job arguments|false|
|Tags|JSON|The tags to use with this trigger. For more information about tags in AWS Glue, see AWS Tags in AWS Glue in the developer guide.|false|

And now?...

Just run serverless deploy.