@aws-mdaa/dataops-data-quality-l3-construct
v1.4.0
MDAA DataOps Data Quality L3 Construct
DataOps Data Quality L3 Construct
AWS CDK L3 Construct for deploying AWS Glue Data Quality rulesets in DataOps workflows.
Overview
This construct creates AWS Glue Data Quality rulesets for automated validation and monitoring of data in Glue Catalog tables. It supports both DQDL (Data Quality Definition Language) strings and structured rule objects for defining validation rules.
Features
- Flexible Rule Definition: Support for both raw DQDL strings and structured rule objects
- Comprehensive Rule Types: 15+ built-in rule types for common validation scenarios
- SSM Integration: Automatic publishing of ruleset metadata to SSM Parameter Store
- Project Integration: Seamless integration with DataOps project infrastructure
- Type Safety: Full TypeScript type definitions for all rule types
Installation
npm install @aws-mdaa/dataops-data-quality-l3-construct
Usage
Basic Example
import { DataOpsDataQualityL3Construct } from '@aws-mdaa/dataops-data-quality-l3-construct';
import { App, Stack } from 'aws-cdk-lib';

const app = new App();
const stack = new Stack(app, 'DataQualityStack');

// myNaming and myDomainUnit are placeholders assumed to be defined elsewhere in your app.
new DataOpsDataQualityL3Construct(stack, 'DataQuality', {
  naming: myNaming,
  domainUnit: myDomainUnit,
  projectName: 'my-dataops-project',
  rulesetConfigs: {
    'customer-quality': {
      name: 'customer-data-validation',
      description: 'Validate customer data quality',
      targetTable: {
        databaseName: 'customer_db',
        tableName: 'customers',
      },
      ruleset: [
        {
          // customer_id must have no null values
          RuleType: 'IsComplete',
          Column: 'customer_id',
        },
        {
          // more than 95% of email values must be unique
          RuleType: 'Uniqueness',
          Column: 'email',
          Operator: '>',
          Threshold: 0.95,
        },
      ],
    },
  },
});
Using DQDL Strings
new DataOpsDataQualityL3Construct(stack, 'DataQuality', {
  naming: myNaming,
  domainUnit: myDomainUnit,
  projectName: 'my-dataops-project',
  rulesetConfigs: {
    'order-quality': {
      name: 'order-validation',
      targetTable: {
        databaseName: 'orders_db',
        tableName: 'orders',
      },
      ruleset: `Rules = [
        IsComplete "order_id",
        ColumnValues "status" in ["pending", "completed", "cancelled"],
        RowCount > 0
      ]`,
    },
  },
});
Supported Rule Types
Completeness Rules
- IsComplete: Column has no null values
- Completeness: Column meets completeness threshold
Uniqueness Rules
- IsUnique: Column has all unique values
- Uniqueness: Column meets uniqueness threshold
- IsPrimaryKey: Column is a valid primary key
Schema Rules
- ColumnExists: Column exists in table
- ColumnDataType: Column has expected data type
- ColumnLength: Column length meets criteria
Value Rules
- ColumnValues: Column values are in allowed set
Count Rules
- RowCount: Table row count meets criteria
- ColumnCount: Table column count meets criteria
Statistical Rules
- Mean: Column mean meets criteria
- StandardDeviation: Column standard deviation meets criteria
Freshness Rules
- DataFreshness: Data is recent based on timestamp column
Custom Rules
- CustomSql: Custom SQL queries for complex validation
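The rule types above correspond to DQDL rule lines, so a structured rule object can be thought of as a DQDL line split into fields. The sketch below shows one way such an object might be rendered as DQDL. It is an illustration only and is not taken from the construct's source: the `DataQualityRule` interface is a local copy of the API table (with `Where` omitted for brevity), and `renderRule` and `toDqdl` are hypothetical helper names.

```typescript
// Local copy of the documented rule shape; NOT imported from the package.
interface DataQualityRule {
  RuleType: string;
  Column?: string;
  Operator?: string;
  Threshold?: number;
  Value?: number;
  Values?: (string | number)[];
  Sql?: string;
  DataType?: string;
  Duration?: string;
}

// Hypothetical helper: render one rule object as a single DQDL rule line.
function renderRule(rule: DataQualityRule): string {
  const parts: string[] = [rule.RuleType];

  // The rule's subject: a quoted SQL statement or a quoted column name.
  if (rule.Sql !== undefined) {
    parts.push(JSON.stringify(rule.Sql));
  } else if (rule.Column !== undefined) {
    parts.push(JSON.stringify(rule.Column));
  }

  // The rule's condition, if any.
  if (rule.Values !== undefined) {
    const items = rule.Values.map((v) => JSON.stringify(v)).join(', ');
    parts.push('in', `[${items}]`);
  } else if (rule.DataType !== undefined) {
    parts.push('=', JSON.stringify(rule.DataType));
  } else if (rule.Duration !== undefined) {
    // Assumed default: freshness checks use "<=" when no operator is given.
    parts.push(rule.Operator ?? '<=', rule.Duration);
  } else {
    const n = rule.Threshold ?? rule.Value;
    if (n !== undefined) {
      // Assumed default: "=" when no operator is given.
      parts.push(rule.Operator ?? '=', String(n));
    }
  }
  return parts.join(' ');
}

// Hypothetical helper: wrap rendered rules in a DQDL "Rules = [...]" block.
function toDqdl(rules: DataQualityRule[]): string {
  const lines = rules.map((r) => '    ' + renderRule(r));
  return `Rules = [\n${lines.join(',\n')}\n]`;
}

console.log(
  toDqdl([
    { RuleType: 'IsComplete', Column: 'customer_id' },
    { RuleType: 'Uniqueness', Column: 'email', Operator: '>', Threshold: 0.95 },
  ]),
);
```

Under these assumptions, the two rules from the basic example render as `IsComplete "customer_id"` and `Uniqueness "email" > 0.95`. The construct's own conversion may differ in defaults and in which fields it honors for each rule type.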
API Reference
DataOpsDataQualityL3ConstructProps
| Property | Type | Required | Description |
|----------|------|----------|-------------|
| rulesetConfigs | { [key: string]: DataQualityRulesetDefinition } | Yes | Map of ruleset configurations |
| projectName | string | Yes | DataOps project name for resource coordination |
| naming | IMdaaResourceNaming | Yes | Naming configuration |
| domainUnit | DomainUnit | Yes | Domain unit configuration |
DataQualityRulesetDefinition
| Property | Type | Required | Description |
|----------|------|----------|-------------|
| name | string | Yes | Unique ruleset name |
| targetTable | DataQualityTargetTable | Yes | Target table configuration |
| ruleset | string \| DataQualityRule[] | Yes | DQDL string or rule objects |
| description | string | No | Ruleset description |
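Because `name` is documented as unique while `rulesetConfigs` is keyed by a separate map key, two entries could accidentally share a name. The following is a minimal sketch of a pre-synth check for that case. The interfaces are local copies of the tables in this reference, and `findDuplicateNames` is a hypothetical helper, not part of the package.

```typescript
// Local copies of the documented shapes; NOT imported from the package.
interface DataQualityTargetTable {
  databaseName: string;
  tableName: string;
  catalogId?: string;
}

interface DataQualityRulesetDefinition {
  name: string;
  targetTable: DataQualityTargetTable;
  ruleset: string | { RuleType: string }[];
  description?: string;
}

// Hypothetical helper: return every ruleset name used by more than one entry.
function findDuplicateNames(
  configs: { [key: string]: DataQualityRulesetDefinition },
): string[] {
  const counts: { [name: string]: number } = {};
  for (const key of Object.keys(configs)) {
    const name = configs[key].name;
    counts[name] = (counts[name] || 0) + 1;
  }
  return Object.keys(counts).filter((name) => counts[name] > 1);
}

const duplicates = findDuplicateNames({
  'order-quality': {
    name: 'order-validation',
    targetTable: { databaseName: 'orders_db', tableName: 'orders' },
    ruleset: 'Rules = [ RowCount > 0 ]',
  },
  'order-quality-copy': {
    name: 'order-validation', // same name as the entry above
    targetTable: { databaseName: 'orders_db', tableName: 'orders_archive' },
    ruleset: [{ RuleType: 'RowCount' }],
  },
});
console.log(duplicates);
```

Whether the construct itself rejects duplicate names is not stated here, so a check like this is best treated as an optional guard in your own app code.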
DataQualityTargetTable
| Property | Type | Required | Description |
|----------|------|----------|-------------|
| databaseName | string | Yes | Glue database name |
| tableName | string | Yes | Glue table name |
| catalogId | string | No | AWS account ID for cross-account access |
DataQualityRule
| Property | Type | Required | Description |
|----------|------|----------|-------------|
| RuleType | string | Yes | Type of validation rule |
| Column | string | No | Column name for column-specific rules |
| Operator | string | No | Comparison operator |
| Threshold | number | No | Threshold value (0-1) |
| Value | number | No | Numeric comparison value |
| Values | (string \| number)[] | No | Allowed values list |
| Where | string | No | SQL WHERE clause |
| Sql | string | No | Custom SQL query |
| DataType | string | No | Expected data type |
| Duration | string | No | Duration for freshness checks |
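Since every field except `RuleType` is optional, the type system alone will not catch a rule that is missing the field its rule type needs. Here is a hedged sketch of a validation pass over rule objects. The interface is a local copy of the table above. The list of column-level rule types and the per-type requirements are assumptions inferred from the rule descriptions in this README, and `validateRule` is a hypothetical helper.

```typescript
// Local copy of the documented rule shape; NOT imported from the package.
interface DataQualityRule {
  RuleType: string;
  Column?: string;
  Operator?: string;
  Threshold?: number;
  Value?: number;
  Values?: (string | number)[];
  Where?: string;
  Sql?: string;
  DataType?: string;
  Duration?: string;
}

// Assumption: rule types whose descriptions refer to a single column.
const COLUMN_RULES: string[] = [
  'IsComplete', 'Completeness', 'IsUnique', 'Uniqueness', 'IsPrimaryKey',
  'ColumnExists', 'ColumnDataType', 'ColumnLength', 'ColumnValues',
  'Mean', 'StandardDeviation', 'DataFreshness',
];

// Hypothetical helper: return a list of problems; empty means the rule looks fine.
function validateRule(rule: DataQualityRule): string[] {
  const problems: string[] = [];
  if (COLUMN_RULES.indexOf(rule.RuleType) !== -1 && !rule.Column) {
    problems.push(`${rule.RuleType} requires Column`);
  }
  // The table documents Threshold as a value in the range 0-1.
  if (rule.Threshold !== undefined && (rule.Threshold < 0 || rule.Threshold > 1)) {
    problems.push('Threshold must be between 0 and 1');
  }
  if (rule.RuleType === 'CustomSql' && !rule.Sql) {
    problems.push('CustomSql requires Sql');
  }
  if (rule.RuleType === 'DataFreshness' && !rule.Duration) {
    problems.push('DataFreshness requires Duration');
  }
  return problems;
}

console.log(validateRule({ RuleType: 'Uniqueness', Column: 'email', Operator: '>', Threshold: 0.95 }));
console.log(validateRule({ RuleType: 'IsComplete' }));
```

Returning a list of messages rather than throwing on the first problem lets a caller report every issue in a ruleset at once.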
License
This project is licensed under the Apache-2.0 License.
