Picking the Right AWS Compute Infrastructure


The landscape of AWS compute options is overwhelming: EC2, ECS, EKS, Lambda, Lightsail, Elastic Beanstalk, Batch … did I miss any? At their core, all of these options let you run a program, but they have different characteristics; depending on what you want to do, one will be better than another. This post gives you some criteria to help you pick the best compute option for your needs.

Are you running a user-facing web service?

I’m going to get this out of the way up-front: I don’t think that Lambda is appropriate for web applications. The table below shows why: it compares the average response time, in milliseconds, for a simple “hello, world” webpage written in Node.JS, as measured using the Firefox developer tools. The leftmost column is a stand-alone Node.JS server, running in an Elastic Beanstalk environment behind an Elastic Load Balancer. The second is for calls to a “warmed-up” Lambda function, also behind an Elastic Load Balancer, while the third and fourth show the time for an initial call to Lambda when there isn’t already an execution environment running.

Node.JS    Lambda    Lambda, Cold Start    Lambda, Cold Start in VPC
82 ms      46 ms     415 ms                10,792 ms

The promise of Lambda is that it runs only when needed, and will instantly scale to match demand. This means that you don’t pay to run servers if nobody is visiting your site, and don’t have to scramble to add capacity if you get mentioned by a popular news aggregator. However, there’s a lot that has to happen behind the scenes when a Lambda first starts, which means that the “cold start” time is much larger than the normal response time. And if you’re running your Lambda inside a VPC (for example, because you need to access an RDS database), then that startup time is measured in double-digit seconds.

There are, of course, clever things that you can do client-side to reduce the perceived time to process a request. For example, pre-fetching data that the user is likely to need, or performing updates in the background. And in some cases, such as report generation, you may already expect long response times and will benefit from moving the load off your primary application servers.

But I recommend that you carefully analyze your needs before choosing a Lambda-only solution. Or better, do “the simplest thing that could possibly work” by running your application on a traditional server, then selectively move pieces to Lambda once you understand your traffic patterns.

Can your task be triggered by an event?

While I don’t consider Lambda appropriate for user-facing code, it excels at the sort of background tasks that make up a large part of modern web applications. For example, report generation: there’s rarely a reason for that to happen inside your primary app-server. Instead, the server can write a message to an SQS queue, and that queue can trigger a Lambda that runs the report and then sends an email to the user with the results.
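To make this concrete, here’s a minimal sketch of what such an SQS-triggered handler might look like. The event shape is the standard SQS-to-Lambda payload; the `report_for_user` and `send_email` helpers are hypothetical placeholders, not anything AWS provides.

```python
import json

def report_for_user(user_id):
    # Hypothetical placeholder: a real function would query your data store.
    return f"report for {user_id}"

def send_email(user_id, report):
    # Hypothetical placeholder: a real function would call SES or similar.
    pass

def handler(event, context):
    """Triggered by SQS: each record body names a user whose report to build."""
    processed = []
    for record in event["Records"]:           # standard SQS event shape
        request = json.loads(record["body"])
        user_id = request["user_id"]
        report = report_for_user(user_id)
        send_email(user_id, report)
        processed.append(user_id)
    return {"processed": processed}
```

The app-server never waits on the report; it only pays the cost of one `SendMessage` call.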

Lambdas also excel at tasks that happen on a periodic basis, triggered by a CloudWatch scheduled event. For example, if you generate a daily summary report for each of your users, you can configure a Lambda to produce these reports at a fixed time each day.
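The schedule itself is just a cron expression attached to a CloudWatch Events (now EventBridge) rule. As a sketch, with an illustrative rule name of my own choosing, the rule that fires once a day at 06:00 UTC looks like this; with boto3 you would pass it to the events client’s `put_rule` call.

```python
# Parameters for a CloudWatch Events / EventBridge scheduled rule that fires
# daily at 06:00 UTC. The rule name is illustrative, not from this post.
# AWS cron fields: minutes hours day-of-month month day-of-week year;
# "?" is required in day-of-week when day-of-month is "*".
daily_report_rule = {
    "Name": "daily-summary-reports",
    "ScheduleExpression": "cron(0 6 * * ? *)",
    "State": "ENABLED",
}

# Applied roughly like this (not executed here):
# import boto3
# events = boto3.client("events")
# events.put_rule(**daily_report_rule)
```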

How long does it take to run your task?

While Lambda is great at background tasks, it can’t be used when those tasks take more than 15 minutes to run. Nor can Lambda handle tasks that require more than 512MB of temporary file storage.

If you have tasks like this, you should look at AWS Batch: like Lambda, it runs only when needed, and you pay only for what you use. Unlike Lambda, Batch lets you run arbitrary Docker containers, and will continue to run them until either they’re done or hit a timeout that you specify.

The combination of Batch and Lambda together is exceptionally powerful: the Lambda exists to be triggered by some external event (such as an SQS queue), and in turn queues the Batch job. Depending on how fancy you want to be, you could then trigger another Lambda when the Batch job completes — either successfully or not — to take some further action.
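A hedged sketch of that glue Lambda: it reads each SQS message and turns it into arguments for boto3’s `batch.submit_job`. The job queue and job definition names below are hypothetical examples, as is the message format.

```python
import json

def build_submit_args(message_body):
    """Translate an SQS message body into batch.submit_job() keyword args.
    Queue name, job definition, and message fields are hypothetical."""
    request = json.loads(message_body)
    return {
        "jobName": f"transcode-{request['file_id']}",
        "jobQueue": "video-processing",            # hypothetical queue
        "jobDefinition": "ffmpeg-transcode:3",     # hypothetical definition
        "containerOverrides": {
            "environment": [
                {"name": "INPUT_KEY", "value": request["s3_key"]},
            ],
        },
    }

def handler(event, context):
    # In a real deployment: import boto3; batch = boto3.client("batch")
    for record in event["Records"]:
        args = build_submit_args(record["body"])
        # batch.submit_job(**args)                 # actual submission
```

The same pattern works in reverse: a CloudWatch Events rule matching Batch job state changes can trigger the follow-up Lambda.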

Do you need special libraries?

One of the other reasons to use Batch rather than Lambda is that you can install additional software in a Batch container. For example, you might want to use the popular ffmpeg library to validate and transform video files that have been uploaded to S3. While this task might fit in the time and space limitations of Lambda, you won’t be able to run it because you can’t install the library.

However, since Batch uses Docker containers, you can install whatever software you wish into the container image before running it. You’re not limited to a predefined execution environment.
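For example, a Batch worker image with ffmpeg baked in might start from a Dockerfile like this; the base image choice and the worker script name are illustrative, not from this post.

```dockerfile
# Illustrative Batch worker image: ffmpeg plus a hypothetical worker script.
FROM python:3.11-slim
RUN apt-get update && \
    apt-get install -y --no-install-recommends ffmpeg && \
    rm -rf /var/lib/apt/lists/*
COPY process_video.py /app/process_video.py
# Batch runs the container's default command, or an override
# specified in the job definition.
CMD ["python", "/app/process_video.py"]
```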

Are you already using Docker?

I’ve mentioned Docker in the context of AWS Batch, but AWS has many options for containerized deployments. If you’re not familiar with Docker, it is a tool that lets you package an application and any supporting libraries or programs into an image that can then be run on any machine that supports Docker.

This has several benefits for deploying software, the biggest being that you know exactly what system configuration you’re using; you don’t have to worry that someone installed an incompatible package that will cause your application to break. Almost as important, when it comes to scaling, is that you don’t need to install that software as part of preparing a new machine; this saves precious minutes when you need to add additional resources to meet load.

AWS provides several options for running Docker, but the one I want to focus on here is ECS, the Elastic Container Service. For long-running applications, including web-apps, ECS with the Fargate deployment type gives you a simpler deployment model than EC2 virtual machines. This is partly because Docker handles all of your dependencies, but also because Fargate eliminates decisions around instance type: you specify your needed CPU and memory, and ECS gives you a runtime environment to match.
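A Fargate task definition really does come down to CPU, memory, and an image. The sketch below shows the shape of the dict you would pass to the ECS client’s `register_task_definition` in boto3; the family, container name, and image URI are hypothetical.

```python
# Hypothetical Fargate task definition: you pick CPU and memory, ECS picks
# the hardware. The cpu/memory values must be one of Fargate's valid pairings
# (512 CPU units, i.e. 0.5 vCPU, allows 1-4 GB of memory).
webapp_task = {
    "family": "my-webapp",
    "requiresCompatibilities": ["FARGATE"],
    "networkMode": "awsvpc",        # required for Fargate tasks
    "cpu": "512",                   # 0.5 vCPU
    "memory": "1024",               # 1 GB
    "containerDefinitions": [
        {
            "name": "webapp",
            "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-webapp:latest",
            "portMappings": [{"containerPort": 8080, "protocol": "tcp"}],
            "essential": True,
        }
    ],
}

# In a real deployment (not executed here):
# import boto3
# ecs = boto3.client("ecs")
# ecs.register_task_definition(**webapp_task)
```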

Do you want to be cloud-agnostic?

In addition to ECS, Amazon provides EKS, the Elastic Kubernetes Service. Kubernetes is a full container orchestration solution, providing a “one stop” solution to managing multiple interacting containers, scaling them, and deploying changes. You can think of it as a combination of ECS, Elastic Load Balancing, Route53 DNS, SSM Parameter Store, and perhaps a few other AWS services.

If you’re an AWS-centric organization, then I’ll argue that there’s no need to use Kubernetes: the other services that I list above give you a set of tools that can provide similar features but also be used in other situations. However, if you are just getting started with cloud deployments, and especially if you’re already using containers, then Kubernetes means that you can avoid becoming AWS-centric: Kubernetes knowledge is transferable to any of the major cloud vendors.
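To illustrate the portability argument: a Deployment manifest like the sketch below (names hypothetical) runs unchanged on EKS, on another vendor’s managed Kubernetes, or on a local cluster.

```yaml
# Hypothetical Deployment; the manifest is the same on any Kubernetes cluster.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-webapp
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-webapp
  template:
    metadata:
      labels:
        app: my-webapp
    spec:
      containers:
        - name: webapp
          image: my-registry/my-webapp:latest
          ports:
            - containerPort: 8080
```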

Conclusion

That was a lot of material, and it’s not even an exhaustive list of criteria; just the ones that I find myself using when evaluating a deployment architecture. If I were to boil them down to some simple rules-of-thumb, they would look like this:

  • Lambda: use for relatively short non-interactive tasks.
  • Batch: Use when your non-interactive tasks require libraries that aren’t available from Lambda, or that won’t run within Lambda’s execution constraints.
  • ECS: If you can containerize your application, this should be all you need.
  • EC2 managed by Elastic Beanstalk: If you’re not comfortable with containers, this will give you “one click” deployments for many standard applications.
  • EC2: If you need extensive customization of your environment, or specific features (such as attached instance storage) that aren’t available from other options. But be prepared to invest in making your deployments easier.