@renovosolutions/cdk-library-aurora-native-backup v0.0.1
cdk-library-aurora-native-backup
A CDK construct library that creates and manages Docker images for Aurora PostgreSQL native backups using pg_dump.
The resulting images are designed for use with Amazon ECS Fargate for scalable, serverless backup operations.
Features
- Multi-Database Support: Back up multiple databases from the same Aurora cluster in a single service
- Pre-built Docker Image: Amazon Linux 2023 base with PostgreSQL 17 client tools and AWS CLI v2
- ECR Repository Management: Automatically creates and manages ECR repositories with security best practices
- Complete Backup Service: Ready-to-use ECS Fargate service for scheduled Aurora backups
- EFS and S3 Support: Built-in support for backing up to EFS with S3 sync
- Comprehensive Backup: Uses pg_dump directory format for efficient storage and simplified restores
- Production Ready: Includes proper error handling, logging, and cleanup mechanisms
- Secure Authentication: Uses AWS Secrets Manager for database password management
API Doc
See API
Interface Structure
The library provides two main constructs, each with its own configuration interface:
- AuroraBackupRepository (AuroraBackupRepositoryProps): Manages the ECR repository and Docker image for backups.
- AuroraNativeBackupService (AuroraNativeBackupServiceProps): Manages the backup service infrastructure (VPC, Aurora cluster, S3 bucket, compute resources, etc.) and uses:
  - AuroraBackupConnectionProps: Database connection settings (username, database names array, password secret).
This separation allows for cleaner organization of image/repository management, connection credentials, and infrastructure settings.
Multi-Database Support
The library supports backing up multiple databases from the same Aurora PostgreSQL cluster in a single backup service. Simply provide an array of database names in the databaseNames property (defaults to ['postgres'] if not specified). Each database will be backed up separately and stored in its own S3 folder structure.
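The default described above can be sketched as a small helper (hypothetical; `resolveDatabaseNames` is not part of the library's API, it only illustrates the documented fallback to `['postgres']`):

```typescript
// Illustrates the documented default: when databaseNames is omitted or
// empty, the service backs up the default 'postgres' database.
function resolveDatabaseNames(databaseNames?: string[]): string[] {
  return databaseNames && databaseNames.length > 0 ? databaseNames : ['postgres'];
}

console.log(resolveDatabaseNames());                     // falls back to the default
console.log(resolveDatabaseNames(['app', 'analytics'])); // each backed up separately
```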
Database User Setup
Create a dedicated database user with read-only backup permissions on ALL databases to be backed up.
For PostgreSQL 14+ (recommended), use the built-in pg_read_all_data role for comprehensive read access:
-- Connect to each database and grant permissions
\c your_database_1;
GRANT CONNECT ON DATABASE your_database_1 TO backup_user;
GRANT pg_read_all_data TO backup_user;
-- Repeat for each additional database
\c your_database_2;
GRANT CONNECT ON DATABASE your_database_2 TO backup_user;
GRANT pg_read_all_data TO backup_user;
The pg_read_all_data role automatically provides:
- SELECT on all tables and views
- USAGE on all schemas
- SELECT and USAGE on all sequences
- Access to future objects without requiring additional grants
Note: This library requires PostgreSQL 14 or newer for the pg_read_all_data role.
Shortcomings
- The backup service requires password-based authentication; IAM database authentication is not currently supported
- The backup container runs as a scheduled task, not continuously, so it cannot capture incremental changes
- Custom backup scripts are not currently supported, only the built-in pg_dump functionality
- When backing up multiple databases, if one database backup fails, the task continues with the remaining databases and the overall task does not fail; individual database backup failures must be monitored through CloudWatch Logs
Examples
Prerequisites
To use this construct, you must have:
- An AWS CDK stack with a defined environment (account and region)
- An existing VPC for the backup service
- An existing Aurora PostgreSQL database cluster
- An AWS Secrets Manager secret containing database credentials (recommended)
- A database user with the required backup permissions (see above)
Complete Backup Service (Recommended)
For most use cases, use the AuroraNativeBackupService which provides a complete, ready-to-use backup solution:
TypeScript
import { Stack, StackProps, aws_ec2 as ec2, aws_rds as rds, aws_secretsmanager as secretsmanager } from 'aws-cdk-lib';
import { Construct } from 'constructs';
import { AuroraNativeBackupService, AuroraBackupRepository } from '@renovosolutions/cdk-library-aurora-native-backup';
export class BackupServiceStack extends Stack {
constructor(scope: Construct, id: string, props: StackProps) {
super(scope, id, props);
// Your existing Aurora PostgreSQL database cluster and VPC
const vpc = ec2.Vpc.fromLookup(this, 'Vpc', { isDefault: true });
const dbCluster = rds.DatabaseCluster.fromDatabaseClusterAttributes(this, 'DbCluster', {
clusterIdentifier: 'my-production-cluster',
clusterEndpointAddress: 'cluster.xyz.region.rds.amazonaws.com',
port: 5432,
});
// First create the backup repository
const backupRepository = new AuroraBackupRepository(this, 'BackupRepository', {
repositoryName: 'aurora-postgres-backup',
});
// Secret containing the backup user's password
const backupUserSecret = secretsmanager.Secret.fromSecretAttributes(this, 'BackupUserSecret', {
secretArn: 'arn:aws:secretsmanager:region:account:secret:backup-user-password-abc123',
});
// Create the complete backup service
const backupService = new AuroraNativeBackupService(this, 'BackupService', {
cluster: dbCluster,
vpc,
backupBucketName: 'my-aurora-production-backups',
ecrRepository: backupRepository.repository,
connection: {
username: 'backup_user',
databaseNames: ['production', 'analytics', 'reporting'],
passwordSecret: backupUserSecret,
},
retentionDays: 30,
backupSchedule: '0 2 * * *', // Daily at 2 AM UTC
cpu: 1024, // Override default of 256
memoryLimitMiB: 2048, // Override default of 512
});
}
}
Python
from aws_cdk import (
Stack,
aws_ec2 as ec2,
aws_rds as rds,
aws_secretsmanager as secretsmanager
)
from constructs import Construct
from cdk_library_aurora_native_backup import AuroraNativeBackupService, AuroraBackupRepository
class BackupServiceStack(Stack):
def __init__(self, scope: Construct, id: str, **kwargs):
super().__init__(scope, id, **kwargs)
# Your existing Aurora PostgreSQL database cluster and VPC
vpc = ec2.Vpc.from_lookup(self, "Vpc", is_default=True)
db_cluster = rds.DatabaseCluster.from_database_cluster_attributes(self, "DbCluster",
cluster_identifier="my-production-cluster",
cluster_endpoint_address="cluster.xyz.region.rds.amazonaws.com",
port=5432
)
# First create the backup repository
backup_repository = AuroraBackupRepository(self, "BackupRepository",
repository_name="aurora-postgres-backup"
)
# Secret containing the backup user's password
backup_user_secret = secretsmanager.Secret.from_secret_attributes(self, "BackupUserSecret",
secret_arn="arn:aws:secretsmanager:region:account:secret:backup-user-password-abc123"
)
# Create the complete backup service
backup_service = AuroraNativeBackupService(self, "BackupService",
cluster=db_cluster,
vpc=vpc,
backup_bucket_name="my-aurora-production-backups",
ecr_repository=backup_repository.repository,
connection={
"username": "backup_user",
"database_names": ["production", "analytics", "reporting"],
"password_secret": backup_user_secret
},
retention_days=30,
backup_schedule="0 2 * * *", # Daily at 2 AM UTC
cpu=1024, # Override default of 256
            memory_limit_mib=2048  # Override default of 512
        )
Environment Variables
All environment variables used by the backup container are set automatically by the constructs. You do not need to set them manually.
| Environment Variable | Description | CDK Prop / Source |
|-----------------------|-----------------------------------------------------------|------------------------------------------|
| DB_HOST | Aurora PostgreSQL database cluster endpoint | cluster.clusterEndpoint.hostname |
| DB_NAMES | Array of database names to back up | connection.databaseNames |
| DB_USER | Database username | connection.username |
| DB_PASSWORD | Database password | connection.passwordSecret |
| AWS_REGION | AWS region | Stack.region |
| CLUSTER_IDENTIFIER | Cluster ID used as S3 path prefix (backups/{CLUSTER_IDENTIFIER}/) | cluster.clusterIdentifier |
| DB_PORT | Database port (default: 5432) | cluster.clusterEndpoint.port |
| BACKUP_ROOT | Backup directory (default: /mnt/aurora-backups) | (internal default) |
| S3_BUCKET | S3 bucket for backup sync | backupBucketName |
| S3_PREFIX | S3 prefix (default: backups) | (internal default) |
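Taken together, these variables imply the S3 key layout `{S3_PREFIX}/{CLUSTER_IDENTIFIER}/{database}/{date}/`. A minimal sketch of how such a key could be composed (hypothetical helper, not the library's actual code):

```typescript
// Composes the S3 key prefix for one database's dated backup, following the
// layout documented above: backups/{CLUSTER_IDENTIFIER}/{db}/{YYYY-MM-DD}/
function backupKeyPrefix(
  clusterIdentifier: string,
  dbName: string,
  date: string,
  s3Prefix = 'backups', // matches the internal S3_PREFIX default
): string {
  return `${s3Prefix}/${clusterIdentifier}/${dbName}/${date}/`;
}

console.log(backupKeyPrefix('my-production-cluster', 'production', '2024-01-15'));
// backups/my-production-cluster/production/2024-01-15/
```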
Backup Process
- Validation: Checks AWS credentials and creates backup directories
- Database Backup: For each database in the DB_NAMES array:
  - Uses pg_dump --format=directory with gzip compression (level 9) for each data file
  - Creates a separate backup directory per database with a date stamp
  - If one database backup fails, continues with the remaining databases
- Verification: Validates that each backup contains a toc.dat file
- S3 Sync: Syncs each database backup to the S3 bucket under separate database folders
- Cleanup: Removes local backups after successful S3 sync
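The per-database dump step above could be approximated as follows (a hypothetical sketch of the pg_dump invocation; the container's actual script may differ):

```typescript
// Builds a pg_dump argument list matching the documented behavior:
// directory format, gzip level 9, one dated output directory per database.
function pgDumpArgs(dbName: string, backupRoot: string, date: string): string[] {
  return [
    '--format=directory', // directory format: toc.dat plus per-table data files
    '--compress=9',       // gzip level 9 for the .dat.gz data files
    `--file=${backupRoot}/${dbName}/${date}`, // e.g. /mnt/aurora-backups/production/2024-01-15
    '--dbname', dbName,
  ];
}

console.log(pgDumpArgs('production', '/mnt/aurora-backups', '2024-01-15').join(' '));
```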
Security Considerations
- ECR repositories created with image scanning enabled
- EFS encryption in transit supported
- IAM permissions follow principle of least privilege
- Use AWS Secrets Manager for database passwords in production
- Consider VPC endpoints for S3 to avoid internet traffic
Backup Storage Structure
Local EFS structure (per database):
/mnt/aurora-backups/
├── production/
│ └── YYYY-MM-DD/
│ ├── toc.dat # PostgreSQL table of contents
│ ├── ####.dat.gz # Compressed table data files
│ └── ####.dat.gz # Additional data files
├── analytics/
│ └── YYYY-MM-DD/
│ ├── toc.dat
│ └── ####.dat.gz
└── reporting/
└── YYYY-MM-DD/
├── toc.dat
        └── ####.dat.gz
S3 structure:
s3://my-backup-bucket/
└── backups/
└── {CLUSTER_IDENTIFIER}/
├── production/
│ └── YYYY-MM-DD/
│ ├── toc.dat
│ └── ####.dat.gz
├── analytics/
│ └── YYYY-MM-DD/
│ ├── toc.dat
│ └── ####.dat.gz
└── reporting/
└── YYYY-MM-DD/
├── toc.dat
            └── ####.dat.gz
Restoration
Interactive Restore CLI (Recommended)
This library includes an interactive TypeScript CLI that simplifies the restore process with auto-discovery and guided prompts:
npx ts-node restore_script/aurora-restore-cli.ts
Features:
- Auto-discovery: Automatically finds S3 backup buckets using the aurora_native_backup_bucket=true tag
- Interactive selection: Guided prompts for cluster, database, backup date, and tables
- Table-level restore: Select specific tables or restore the entire database
- Optimized downloads: Only downloads the required backup files
- Ready-to-run commands: Generates and optionally executes pg_restore commands
Prerequisites:
- Node.js and TypeScript installed
- AWS credentials configured (via AWS CLI, environment variables, or IAM role)
- pg_restore command available in your PATH
- Network access to the target PostgreSQL database
- Database user with restore permissions on target database:
  - CREATE privilege (for creating tables, indexes, constraints)
  - INSERT privilege (for loading data)
  - USAGE and CREATE on schemas
  - For a full database restore: CREATEDB privilege or a superuser role
Setup and Execution:
First, install dependencies:
cd restore_script
yarn install
Then run the interactive CLI:
npx ts-node aurora-restore-cli.ts
The CLI will guide you through selecting your backup source, target database, and specific tables to restore.
Workflow:
- S3 Configuration: Auto-discovers backup bucket or prompts for manual entry
- Source Selection: Choose cluster, database, and backup date
- Table Selection: Select specific tables or full database restore
- Target Configuration: Enter target database connection details
- Execution: Downloads backup files and generates restore command
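The final step's generated command could look like the following sketch (hypothetical helper; `buildRestoreCommand` only illustrates how table selections map to pg_restore's -t flag, it is not the CLI's actual code):

```typescript
// Assembles a pg_restore command from the user's selections; an empty table
// list produces a full-database restore, otherwise each table gets a -t flag.
function buildRestoreCommand(
  host: string,
  user: string,
  targetDb: string,
  backupDir: string,
  tables: string[] = [],
): string {
  const tableFlags = tables.map((t) => `-t ${t}`).join(' ');
  return ['pg_restore', `-h ${host}`, `-U ${user}`, `-d ${targetDb}`, '-v', tableFlags, backupDir]
    .filter(Boolean) // drop the empty table-flag segment when no tables are selected
    .join(' ');
}

console.log(buildRestoreCommand('target-host', 'backup_user', 'restored_db', '/tmp/backup', ['orders']));
```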
Manual Restoration
For advanced users or automation, backups are stored in S3 under organized paths:
s3://my-backup-bucket/backups/{CLUSTER_IDENTIFIER}/{DATABASE_NAME}/YYYY-MM-DD/
Download backup files:
aws s3 cp --recursive s3://my-backup-bucket/backups/{CLUSTER_IDENTIFIER}/production/YYYY-MM-DD/ /path/to/backup/directory/
Restore commands:
Full database restore:
pg_restore -h target-host -U username -d postgres -C -v /path/to/backup/directory/
(With -C, pg_restore connects to the database named by -d, typically postgres, then recreates the dumped database and restores into it.)
List backup contents:
pg_restore --list /path/to/backup/directory/
Selective table restore:
pg_restore -h target-host -U username -d target_db -v -t table_name /path/to/backup/directory/
Contributing
Contributions are welcome! Please follow these guidelines to help us maintain and improve the project:
Code Structure and Interfaces
- The main user-facing interfaces are:
  - AuroraBackupRepositoryProps in src/aurora-backup-repository.ts
  - AuroraNativeBackupServiceProps and AuroraBackupConnectionProps in src/aurora-native-backup-service.ts
- All constructs and their configuration interfaces are defined in the src/ directory.
Code Generation and Project Tasks
This project uses projen for project management and code generation.
If you make changes to the project configuration (.projenrc.ts), run:
npx projen
This will regenerate all managed files, including package.json and other configuration files.
Building and Testing
To build the project and run all tests, use:
yarn build
This will compile the code, run unit tests, and ensure everything is up to date.
License
This project is licensed under the Apache License, Version 2.0 - see the LICENSE file for details.
