Limiting Cross-stack References in CDK


Several years ago I wrote CloudFormation Tips and Tricks, in which I gave the advice to “use outputs lavishly, exports sparingly.” The reason is that when you export a value from one stack and import it into another you bind those stacks tightly together, and can’t change that exported value.

For example, you might create one stack with an ECS task definition, and then in a second stack create a Lambda and an EventBridge rule that triggers it when the task completes. To create that rule, you export the task definition ARN from the first stack, and import it into the second. But then, after initial deployment, you realize that the ECS task needs more memory. That change, however, creates a new revision of the task definition, which means a new ARN. And the rules that govern cross-stack references won’t let you redeploy the first stack because of that change.

With raw CloudFormation templates, it’s relatively easy to break these references: use an output (without export) in the first stack to expose the value, and a parameter in the second to import it. This does mean that stacks can be out-of-sync, with dependent stacks referring to values that are no longer valid. However, this can be solved in your deployment pipeline.
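As a sketch (the resource and parameter names here are hypothetical), the raw-template version looks like this:

```yaml
# Stack 1: expose the value with a plain output -- no Export block,
# so no cross-stack binding is created
Outputs:
  TaskDefinitionArn:
    Description: ARN of the extraction task definition
    Value: !Ref TaskDefinition

# Stack 2: accept the value as a parameter; the deployment pipeline
# reads stack 1's output and passes it here
Parameters:
  TaskDefinitionArn:
    Type: String
    Description: ARN of the extraction task definition, from stack 1
```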

CDK, however, makes it easy to hide these cross-stack relationships behind properties in the CDK “app”: the task definition is exposed as an attribute on its stack, and passed as a construction-time property to the second stack. You might think (as I did, until I looked at the generated code) that CDK would create internal parameters in the second stack, and that the CDK CLI would set those parameters when it does a deploy.

Unfortunately, it doesn’t: it uses exports and imports, and thereby creates a cross-stack reference that will break a future deployment.
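The synthesized templates contain something like the following (the logical IDs and export name are illustrative; CDK generates its own):

```yaml
# Producer stack: CDK adds an output with an Export
Outputs:
  ExportsOutputRefTaskDefinition:
    Value: !Ref TaskDefinition
    Export:
      Name: Stack1:ExportsOutputRefTaskDefinition

# Consumer stack: the reference becomes an Fn::ImportValue, which locks
# the exported value in place for as long as the consumer uses it
TaskDefinitionArn: !ImportValue Stack1:ExportsOutputRefTaskDefinition
```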

After fighting with these cross-stack references in a recent project, I started using the techniques described in this post to minimize them; the examples below are drawn from that project.

Tip #1: Increase the size of your deployment unit

I am a believer in modularization: each separately-maintained component should be deployed independently. My example above was taken from a real-world data pipeline: one stack defined an ECS task that performed data extraction, and one defined a transformation stage with a Lambda that decided whether or not that transformation would run. Because the Lambda was closely tied to the transformation process, it seemed reasonable to include it in that second stack.

While the official CDK best-practices documentation advocates a similar mindset toward stacks, I’ve found that in practice it’s often better to make constructs the unit of modularity. Each construct defines related functionality (such as the Lambda and its trigger), with one stack to rule them all … err, deploy the various constructs. This is a little challenging to implement with multiple teams: you’ll need a shared repository, and also a good deployment process. And you might run into the limit on resources in a CloudFormation stack (although thankfully that has been constantly increasing).

This approach also lets you discover reusable constructs. For example, in a data pipeline you might have multiple Lambdas that are triggered by an EventBridge event. Once you’ve written that first, dedicated construct, it’s relatively easy to transform it into a generic construct. One that might be applicable to multiple projects, and published in a “shared constructs” library.

With that said, there are some important use cases for multiple stacks. First, it makes sense to extract pieces that are truly unchanging, such as your VPC configuration.

Second, there are some situations that require a manual step as part of the complete deployment. For example, an ECS task depends on a Docker registry to retrieve its task image. In most cases, that will be an internal ECR repository, which you create using the CDK Repository construct. Then, you reference the repositoryUri property of that construct when you create a ContainerDefinition. So far, so good, but now you want to control that task using an ECS service, so you add a FargateService construct to the same stack. However, this fails to deploy, because you haven’t pushed an image into the repository.

There’s no way around extracting repository creation into its own stack: you must have a valid image in order to deploy the service. So you create one stack for the repository, and one for the task definition and service, and then deploy those stacks in two stages with a docker push between them. Fortunately, an ECR repository URI is unchanging, so it can be exported/imported safely.
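The overall deployment then becomes a three-step sequence, something like this (stack names, account, region, and image name are all placeholders):

```shell
# Stage 1: create the ECR repository
cdk deploy RepositoryStack

# push the first image so the service has something to run
aws ecr get-login-password --region us-east-1 \
  | docker login --username AWS --password-stdin 123456789012.dkr.ecr.us-east-1.amazonaws.com
docker tag myapp:latest 123456789012.dkr.ecr.us-east-1.amazonaws.com/myapp:latest
docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/myapp:latest

# Stage 2: deploy the task definition and service
cdk deploy ServiceStack
```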

Tip #2: Expose unchanging information and create references in code

In the case of a Lambda triggered by the completion of an ECS task, you don’t need to know the complete task definition ARN. Instead, you can define an EventBridge rule using a prefix of that ARN. This simplifies things considerably: to construct the prefix of a task definition ARN, you just need the account ID, region, and task definition “family”:

export interface Stack_2_Props extends cdk.StackProps {

  /** The family name of the task definition that triggers the Lambda. */
  readonly taskDefinitionFamilyName: string;

  ...
}

export class Stack_2 extends cdk.Stack {

  constructor(scope: Construct, id: string, props: Stack_2_Props) {
    ...

    const taskDefinitionPrefix = "arn:aws:ecs:" +
                                 props.env!.region! + ":" + props.env!.account! + 
                                 ":task-definition/" + props.taskDefinitionFamilyName;

    const rule = new events.Rule(this, "TaskListenerTrigger", {
      description:
        "Triggers the " + handler.functionName + " on ECS task completion",
      eventPattern: {
        source: ["aws.ecs"],
        detailType: ["ECS Task State Change"],
        detail: {
          lastStatus: ["STOPPED"],
          taskDefinitionArn: [{
            prefix: taskDefinitionPrefix,
          }],
        },
      },
    });
  }
}

While I consider this to be the “best” solution in my examples, it has limited utility: EventBridge rules are one of the few places where you can use a partial ARN, and ECS task definitions are one of the few places where even a minor change results in a new ARN.
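To make the mechanics concrete, here’s a small standalone helper (my own sketch, not part of CDK) that builds such a prefix and shows why a prefix match survives new revisions:

```typescript
/** Builds the revision-independent prefix of an ECS task definition ARN. */
function taskDefinitionArnPrefix(account: string, region: string, family: string): string {
  return `arn:aws:ecs:${region}:${account}:task-definition/${family}`;
}

// A full task definition ARN ends with ":<revision>", which changes on every
// update -- but the prefix stays the same, so the rule keeps matching.
const prefix = taskDefinitionArnPrefix("123456789012", "us-east-1", "extract");
console.log(prefix);                           // arn:aws:ecs:us-east-1:123456789012:task-definition/extract
console.log(`${prefix}:1`.startsWith(prefix)); // true
console.log(`${prefix}:2`.startsWith(prefix)); // true
```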

Tip #3: Use Parameter Store to expose stack information

This is my favorite solution, because it gives you great flexibility when changing stacks, although it also offers the most opportunity for two stacks to become out-of-sync. That shouldn’t be a problem, however, as long as you deploy everything together.

The idea is that you create a Systems Manager parameter that holds whatever value you want to make accessible to another stack. For example, a task definition ARN:

export interface Stack_1_Props extends cdk.StackProps {
  ...

  /** Where we save the ARN for the task definition, so that it can be consumed later */
  readonly taskDefinitionArnExportPath: string;
}


export class Stack_1 extends cdk.Stack {
  constructor(scope: Construct, id: string, props: Stack_1_Props) {
    ...

    const ssmExport = new ssm.StringParameter(this, 'TaskDefinitionArnParameter', {
      parameterName: props.taskDefinitionArnExportPath,
      stringValue:   taskDefinition.taskDefinitionArn,
    });
  }
}

Then, in that other stack, you use the static valueForStringParameter() method to retrieve that parameter:


export interface Stack_2_Props extends cdk.StackProps {
  ...

  /** Where we retrieve the ARN for the task definition. */
  readonly taskDefinitionArnExportPath: string;
}


export class Stack_2 extends cdk.Stack {

  constructor(scope: Construct, id: string, props: Stack_2_Props) {
    ...

    const taskDefinitionArn = ssm.StringParameter.valueForStringParameter(this, props.taskDefinitionArnExportPath);

    const rule = new events.Rule(this, "TaskListenerTrigger", {
      description:
        "Triggers the " + handler.functionName + " on ECS task completion",
      eventPattern: {
        source: ["aws.ecs"],
        detailType: ["ECS Task State Change"],
        detail: {
          lastStatus: ["STOPPED"],
          taskDefinitionArn: [taskDefinitionArn],
        },
      },
    });
  }
}

Behind the scenes, CDK translates this reference into a stack parameter that uses one of the SSM parameter types, and CloudFormation reads the value from Parameter Store. This is a nice feature that I think could be used effectively in “pure” CloudFormation templates as well.
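Synthesizing the second stack produces a parameter along these lines (the logical ID and parameter path are illustrative; CDK generates its own logical IDs):

```yaml
Parameters:
  SsmParameterValueTaskDefinitionArn:
    Type: "AWS::SSM::Parameter::Value<String>"
    Default: /pipeline/extract/task-definition-arn
```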

This approach does have its problems. As I mentioned above, it lets the two stacks go out-of-sync, so that the second uses an old value of the parameter. To avoid this, you must ensure that all stacks using the parameter are updated at the same time. My experience with Terraform remote state references is that this isn’t a big problem for a disciplined devops team.

The second problem is that all parties have to agree on the name of the parameter. In my example, the name is defined in the “app” portion of the CDK script, and passed to each stack as a property. You need to provide a name, rather than exposing the StringParameter construct itself, because CDK tries to be smart and exports the parameter’s value (which creates the very cross-stack reference we’re trying to avoid).

Wrapping up

CDK provides the deployment engineer with a great deal of power, because it runs an actual program rather than applying a fixed template. At “synth time” you can discover information about your deployment environment, and use that information to manage the resources that you create.

However, at its core CDK is in fact applying a fixed template, which means that you can’t discover more information about your environment during the deploy; it all has to happen during synthesis. And in a multi-stack environment, the only way for a later stack to know about the changes made by its predecessors is for them to explicitly provide that information in a form that the dependent stack can consume.

The simplest way to do this in CDK is for one stack construct to expose properties that are then provided to another by the CDK application. But as you quickly learn, this is rarely the best approach, and often leads to “Export … cannot be updated” errors. Hopefully this post has given you some ideas to avoid that.