Enabling RDS Migrations with Flyway in CodeBuild
Overview
Flyway is a tool used to manage schema and data model changes in a relational database, such as Postgres. Instead of using a complicated DSL it uses plain SQL scripts. The migrations are applied in order by a file naming convention; for example a date/time version stamp and a descriptive filename. They are located together, typically in a flyway/migrations
folder in your project. A simple SQL schema creation example with PostgreSQL:
-- flyway/migrations/V2023.01.05.001__users_and_roles.sql -- Postgres - enable crypto lib for the gen_random_uuid function CREATE EXTENSION IF NOT EXISTS pgcrypto; -- New table for application users create table user_accounts ( id uuid primary key default gen_random_uuid(), cognito_id varchar not null, email_address varchar not null, first_name varchar not null, last_name varchar not null, create_date date not null, unique(cognito_id) ); CREATE TABLE IF NOT EXISTS app_roles ( role_id uuid primary key default gen_random_uuid(), role_name varchar not null );
Another migration file follows:
-- flyway/migrations/V2023.01.06.001__add_role-association.sql CREATE TABLE IF NOT EXISTS user_app_roles ( user_id uuid references user_accounts, role_id uuid references app_roles, active boolean not null default true, primary key (user_id, role_id) ); insert into app_roles(role_name) values ('admin'); insert into app_roles(role_name) values ('user');
Flyway is written in Java, and requires a Java Runtime to function. It can run from Docker images, downloaded JAR files or by using WGET to fetch it from RedGate’s website.
Using Flyway is as easy as pointing to a migrations directory and passing the appropriate flags. The tool maintains a table of applied changes, flyway_schema_history
, and generates an associated checksum based on the SQL statements contained within the migration.
This guide explains how to attach CodeBuild to a running AWS RDS database instance by defining security groups appropriately, and then invoking Flyway from within AWS CodeBuild.
The Challenge
In AWS deployments, an RDS database is generally walled off within its own security groups in one or more private subnets of a VPC. Placing an RDS instance in a public subnet risks exposing a database to the Internet, and is generally an unsafe practice.
CodeBuild also runs within a security group. To get CodeBuild to talk to the database, you must configure a connection between the groups, known as in ingress rule, to allow for proper communications..
If your application runs within a VPC (and most container-based applications running on ECS/EKS do), you’ll also need CodeBuild to run within the same VPC so, that it can connect to the RDS database.
Let's see how we can get this done.
Key RDS Configuration Details
AWS RDS is Amazon's database service offering. It is actually many different services, from configuration-provisioned instances and clusters to a serverless (i.e. managed by AWS) engine that can be ramped up and down as needed.
We’ll set up an Aurora V2 Serverless RDS cluster to run our PostgreSQL database serverless instance. (Serveless Instances are a type of database instance that is fully managed by AWS once you send it the appropriate scaling instructions).
This RDS example is configured to connect to a VPC by defining a Database Security Group and using the `VpcId` property to reference the VPC. Here's a sample:
DatabaseSecurityGroup: Type: "AWS::EC2::SecurityGroup" Properties: GroupName: "react-nodejs-sample-dbSG" VpcId: !Ref VpcId
We need four additional objects: an RDS Parameter Group, an RDS Subnet Group, a Database Cluster, and a Database Instance.
The RDS Parameter Group
A parameter group allows setting database configuration parameters across all database instances for the RDS cluster. Here is a minimal example:
RDSParameterGroup: Type: "AWS::RDS::DBClusterParameterGroup" Properties: Description: !Sub "Managed by CloudFormation: ${AWS::StackName}" Family: "aurora-postgresql14" Parameters: client_encoding: "UTF8"
The RDS Subnet Group
An RDSSubnetGroup is defined to connect one or more of the VPCs subnets (usually the private ones) that will ultimately be connected to the RDS cluster.
RDSSubnetGroup: Type: "AWS::RDS::DBSubnetGroup" Properties: DBSubnetGroupName: !Sub "${AWS::StackName}-rds-subnet-group" DBSubnetGroupDescription: !Sub "Managed by CloudFormation: ${AWS::StackName}" SubnetIds: [ !Ref PrivateSubnet01, !Ref PrivateSubnet02 ]
The RDS Cluster
The RDS Cluster then references both the Subnet Group and the Security Group (and refers to a Secrets Manager configuration named DBSecrets in our sample):
DatabaseCluster: Type: "AWS::RDS::DBCluster" Properties: Engine: "aurora-postgresql" EngineVersion: "14.6" ServerlessV2ScalingConfiguration: MinCapacity: 0.5 MaxCapacity: 2 # enable logs? # EnableCloudwatchLogsExports: # - postgresql DBClusterParameterGroupName: !Ref RDSParameterGroup MasterUsername: !Sub "{{resolve:secretsmanager:${DBSecrets}:SecretString:dbUserName}}" MasterUserPassword: !Sub "{{resolve:secretsmanager:${DBSecrets}:SecretString:dbPassword}}" DatabaseName: !Sub "{{resolve:secretsmanager:${DBSecrets}:SecretString:dbDatabaseName}}" Port: 5432 # note, default is 3306, even for Postgres DBSubnetGroupName: !Ref RDSSubnetGroup VpcSecurityGroupIds: - !Ref DatabaseSecurityGroup # can define when backups and maintenance run # PreferredMaintenanceWindow: "Sun:06:00-Sun:06:59" # PreferredBackupWindow: "05:00-05:30" BackupRetentionPeriod: 7 StorageEncrypted: true
The RDS Database Instance
Now we’re ready to set up our actual “serverless database instance”.
Side note here: “serverless … instance?” Doesn’t serverless imply that the instances are AWS’s problem? Yes, but you need to take this step for Serverless Aurora V2 (unlike V1) and specifically NOT set the EngineMode
to serverless
like in Serverless Aurora V1 in order to not get a very confusing “the service is not available at this time” error.
Rant over. Moving on to the Database Instance configuration:
ServerlessInstance: Type: "AWS::RDS::DBInstance" Properties: DBClusterIdentifier: !Ref DatabaseCluster DBInstanceClass: db.serverless Engine: aurora-postgresql
CodeBuild and its Security Group
Of course, we also have to configure CodeBuild, which executes an AWS-provisioned Virtual Machine to run builds.
CodeBuild requires a security group so that it can access resources within the VPC.
CodeBuildSecurityGroup: Type: "AWS::EC2::SecurityGroup" Properties: GroupName: !Sub "${AWS::StackName}-cbSG" GroupDescription: "Allows access to services from CodeBuild" VpcId: !Ref VpcId
Since CodeBuild is configured with a separate Security Group than RDS and we have no further rules defined, it cannot yet communicate with the RDS host, even though CodeBuild is likely running within the same subnet.
Security Group Ingress Rule to Access RDS
We can allow traffic from resources using one Security Group to access services using another Security Group by defining an ingress rule. The ingress rule we attach to the Database Security Group allows traffic from CodeBuild to access port 5432 of the RDS database:
DatabaseSecurityGroupCBtoDB: Type: "AWS::EC2::SecurityGroupIngress" Properties: GroupId: !Ref DatabaseSecurityGroup Description: "Access from codebuild container" IpProtocol: "tcp" FromPort: 5432 ToPort: 5432 SourceSecurityGroupId: !GetAtt CodeBuildSecurityGroup.GroupId
Now your CodeBuild scripts should be able to run any SQL applications, such as psql
or flyway
, against the hostname defined by your RDS service.
Configuring CodeBuild to access your database
Note: This tutorial assumes a basic knowledge of CodeBuild itself, but for an overview of CodeBuild, check out an introductory video I did a few years back or the project homepage.
The script we'll be using in Codebuild looks somewhat like this:
MigrationsProject: Type: AWS::CodeBuild::Project Properties: ... Source: Auth: Type: OAUTH Location: !Ref GitHubProjectUrl Type: GITHUB GitCloneDepth: 1 BuildSpec: | version: 0.2 phases: install: runtime-versions: java: corretto17 build: commands: - > # Note: this is the recommended way from RedGate to download the CLI # tools - wget fetches a file, tar extracts it, and ln -s links it in place # to the known path /usr/local/bin as flyway # # A production-ready script should # pass the version of Flyway and Root # URL in an input parameter... wget -qO- https://repo1.maven.org/.../flyway-commandline-9.11.0-linux-x64.tar.gz | \ tar xvz && \ ln -s `pwd`/flyway-9.11.0/flyway /usr/local/bin - cd flyway - > flyway migrate -locations=filesystem:migrations/**/*.sql \ -password=$DB_PASSWORD \ -user=$DB_USERNAME \ -url=jdbc:postgresql://$DB_HOSTNAME:$DB_PORT/$DB_DATABASE \ -connectRetries=300 \ -X
The CodeBuild script checks out the project from Git (in our case, using GitHub), and makes the directory available before running the script. We’ve placed our flway migrations in the flyway/migrations
directory of the project.
We can use the Environment settings to inject our non-credentials configuration in this CodeBuild task:
Environment: Type: LINUX_CONTAINER ComputeType: BUILD_GENERAL1_SMALL Image: aws/codebuild/amazonlinux2-x86_64-standard:4.0 PrivilegedMode: true EnvironmentVariables: - Name: "DB_HOSTNAME" Value: !Ref "DatabaseEndpointAddress" - Name: "DB_PORT" Value: !Ref "DatabaseEndpointPort" ...
How to hide your secrets
Hiding your database credentials from prying eyes is essential, especially in a production environment. Emitting a password in a log file is a big security hole. CodeBuild has the ability to encrypt your secrets, and we’ll use that now to tighten up our solution.
To hide the database security credentials from prying eyes, inject the ARN (Amazon Resource Name, the universal locatable name for your AWS objects) of the entire secret (DBSecrets
) from the CloudFormation template as an environment variable.
Environment: EnvironmentVariables: # ... the ones above, and ... - Name: "DB_SECRETS_ARN" Value: !Ref "DBSecrets"
Next, in the build specification itself, add an env
property, using the secrets-manager
sub-property to inject encrypted variables for each secret’s key in the Secrets Manager, using unix shell substitution to provide the ARN, a colon, and the key in the secret to inject each property securely:
BuildSpec: | version: 0.2 env: secrets-manager: DB_DATABASE: ${DB_SECRETS_ARN}:dbDatabaseName DB_USERNAME: ${DB_SECRETS_ARN}:dbUserName DB_PASSWORD: ${DB_SECRETS_ARN}:dbPassword phases: install: runtime-versions: java: corretto17 ...
Now you’ve prevented CodeBuild from leaking your password into log files and the Console itself. You’ll note that we don’t see these variables anywhere, in logs or the console.
Also note: although Flyway itself hides the password property in its logs, that accounts for its own log entries. Any echo
of the unencrypted property in the shell scripts would show the values, as would the AWS Console for people given access to run CodeBuild scripts. Hence you should always use an encrypted secret for properties you wish to protect from prying eyes.
Additional CodeBuild Configuration Steps
Now that CodeBuild runs within a VPC, you’ll need to evaluate other needs, such as:
- Ensuring your VPC has access to download Flyway from the Internet
- Access to AWS APIs that interact with other services like ECS, EKS, S3, etc.
Access to the Internet from CodeBuild
To access external resources, your VPC will need a NAT Gateway installed. The NAT Gateway will open up traffic from the VPC to the Internet, allowing our CodeBuild to call the wget
command to download Flyway. A NAT Gateway is one-way; it will allow the AWS resources like CodeBuild to reach out to the internet, but it will not allow incoming traffic.
# Requires a public IP Address ElasticIP01: Type: "AWS::EC2::EIP" Properties: Domain: "vpc" # Assign the Gateway and attach it to the subnet via the Elastic IP NAT: Type: "AWS::EC2::NatGateway" Properties: SubnetId: !Ref PublicSubnet01 AllocationId: !GetAtt ElasticIP01.AllocationId
Accessing AWS Services via APIs
Interacting with ECS or other AWS API-based services will require your CodeBuild to use an IAM Role, setting an IAM Policy to allow it to call various APIs. We’ll address this in my next blog entry.
Wrap-up
Now you'll be able to trip a CodeBuild to deploy changes to your RDS database. This is especially useful if you also have full-stack developers working locally on Docker images and a Postgres local instance, and want to apply the same migration techniques using Flyway from local development, to staging, to production.
Code Sample
There is a sample of this configuration, including a full CloudFormation and sample migrations, in Chariot’s AWS Examples project. It’s in the rds-flyway-migrations folder. Enjoy, it’s a great repository to learn about various AWS configurations, and referenced by a lot of Keith Gregory’s AWS blog posts.