Lambda SnapStart is intended to improve cold start times for Lambda functions. It has been available for Java workloads since 2022, and was recently released for Python and .NET. It works by running your function’s initialization code when you publish a version, then storing an image of the initialized Lambda execution environment. Cold starts load this image rather than running the initialization code themselves. Given that cold starts happen unpredictably, and may be measured in seconds, this seems like a great feature.
The reality, as usual, is more nuanced. SnapStart introduces its own cold start delay, as it loads the image into the runtime. And it adds time and effort to deployment. In this post I drill down into the nuance, so that you can decide whether it’s a worthwhile choice for your project.
Introduction: Cold Starts
On the surface, AWS Lambda is simple: you write a script with a handler function, and Lambda invokes that function whenever a configured event happens, such as a new message arriving on an SQS queue or an Application Load Balancer receiving a request from a client.
Unseen by most developers, there’s a lot that happens to make this work. In simplified form, before Lambda can invoke your handler function it must create an execution environment and run any initialization code in your application. As an optimization, a single Lambda execution environment may be reused for multiple invocations: after initialization, the Lambda runtime enters a loop and waits for invocation events. After some length of time – 15 minutes without invocations, longer if in active use – Lambda shuts down the execution environment.
The end result is that sometimes — when there isn’t an execution environment waiting for invocations — your Lambda functions take extra time to execute, because they need to run initialization code: a “cold” start. How often this happens depends on your function’s actual usage patterns. If you’ve implemented a website with Lambda, and get a steady stream of visitors, then the chance of experiencing a cold start is very low; most visitors will hit execution environments that are already running. But if your traffic is bursty or infrequent, then cold starts might noticeably degrade your users’ experience.
Timing: a do-nothing Lambda
To understand how cold starts impact Lambda runtimes, I started by creating two Lambda functions, one in Python and one in Java, that don’t do anything. This let me measure the baseline performance of creating a new execution environment.
Python (3.12):
```python
def lambda_handler(event, context):
    pass
```
Java (Corretto 21):
```java
package example;

import com.amazonaws.services.lambda.runtime.Context;

public class DoNothing {

    public void handler(Object value, Context lambdaContext) throws Exception {
        // nothing happening here
    }
}
```
I provisioned each with 2048 MB of memory, and forced cold starts by updating the Lambda function before each run. Here are the timings, averaged over three runs:
| | Initialization | Runtime |
| --- | --- | --- |
| Python | 92.92 ms | 1.95 ms |
| Java | 419.98 ms | 7.24 ms |
The key take-away from this example is that there’s always a penalty for a cold start, and that you need to decide whether that penalty is acceptable.
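If you want to reproduce these measurements, the whole loop can be scripted. Here’s a minimal sketch using boto3; the function name and the environment-variable trick are my own choices (not necessarily how the timings above were gathered). It forces a cold start by touching an environment variable, invokes the function with the log tail enabled, and pulls Init Duration and Duration out of the REPORT line.

```python
import base64
import re
import time

import boto3

lambda_client = boto3.client("lambda")
FUNCTION_NAME = "do-nothing-example"   # hypothetical function name


def force_cold_start():
    # Any configuration change invalidates existing execution environments,
    # so the next invocation will be a cold start.
    lambda_client.update_function_configuration(
        FunctionName=FUNCTION_NAME,
        Environment={"Variables": {"FORCE_COLD_START": str(time.time())}},
    )
    lambda_client.get_waiter("function_updated_v2").wait(FunctionName=FUNCTION_NAME)


def timed_invoke():
    # LogType="Tail" returns the last few KB of the invocation log, which
    # includes the REPORT line with Duration and (on cold starts) Init Duration.
    response = lambda_client.invoke(FunctionName=FUNCTION_NAME, LogType="Tail")
    report = base64.b64decode(response["LogResult"]).decode()
    init = re.search(r"Init Duration: ([\d.]+) ms", report)
    duration = re.search(r"\tDuration: ([\d.]+) ms", report)
    return (float(init.group(1)) if init else None,
            float(duration.group(1)) if duration else None)


if __name__ == "__main__":
    for _ in range(3):
        force_cold_start()
        print("init ms, duration ms:", timed_invoke())
```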
Adding Initialization Tasks
Of course, real Lambdas actually do something. And that something often requires some level of initialization: for example, reading database connection information from Secrets Manager and then establishing a connection to a database. That leads to the question of when (and where) you should perform such initialization tasks.
You could, of course, read the secret and open the connection anew every time your Lambda runs. That, however, ignores the fact that a Lambda execution environment may remain running for multiple invocations. You can improve performance for the second and subsequent invocations if you execute the initialization steps once and cache the results.
There are two other benefits to initializing your Lambda outside of the handler function. The first applies if you use provisioned concurrency: in this case there’s no visible cold start time for your provisioned functions, because they’re initialized when you provision them (although if you exceed the provisioned concurrency, those excess functions will be initialized on first call). The second benefit, which I have observed but appears to be undocumented, is that Lambda gets a full CPU allotment during initialization even if its provisioned memory corresponds to a fractional CPU.
Timing: retrieving a database secret
So how do you do this pre-function initialization? With “scripting” languages such as Python, it’s easy: everything in your primary module that isn’t inside a function gets executed during initialization. For example, here’s the “do nothing” Lambda, updated to retrieve a secret during initialization:
```python
import boto3
import json
import os

secret_arn = os.environ['DATABASE_SECRET_ARN']

sm_client = boto3.client('secretsmanager')
secret_value = sm_client.get_secret_value(SecretId=secret_arn)['SecretString']
database_creds = json.loads(secret_value)


def lambda_handler(event, context):
    print(database_creds)
```
Java is different: a Lambda handler function is actually a method in a class, which is instantiated by the Lambda runtime. Initialization code for the Lambda lives in the class’s constructor:
```java
package example;

import java.util.Collections;
import java.util.Map;

import com.amazonaws.services.lambda.runtime.Context;
import com.fasterxml.jackson.databind.ObjectMapper;

import software.amazon.awssdk.services.secretsmanager.SecretsManagerClient;
import software.amazon.awssdk.services.secretsmanager.model.GetSecretValueRequest;
import software.amazon.awssdk.services.secretsmanager.model.GetSecretValueResponse;

public class RetrieveSecretInInitializer {

    public final static String DATABASE_SECRET_ENVAR = "DATABASE_SECRET_ARN";

    private Map<String, Object> databaseCredentials = Collections.emptyMap();

    public RetrieveSecretInInitializer() {
        try {
            String secretArn = System.getenv(DATABASE_SECRET_ENVAR);
            SecretsManagerClient client = SecretsManagerClient.builder().build();
            GetSecretValueRequest request = GetSecretValueRequest.builder().secretId(secretArn).build();
            GetSecretValueResponse result = client.getSecretValue(request);
            ObjectMapper mapper = new ObjectMapper();
            databaseCredentials = mapper.readValue(result.secretString(), Map.class);
        } catch (Exception ex) {
            throw new RuntimeException("initialization failed", ex);
        }
    }

    public void handler(Object value, Context lambdaContext) throws Exception {
        System.out.println(databaseCredentials);
    }
}
```
And here are the timings. As above, values are the average of three runs.
| | Initialization | Runtime |
| --- | --- | --- |
| Python | 543.16 ms | 1.82 ms |
| Java | 2,036.22 ms | 5.57 ms |
That’s a big jump, especially on the Java side. Moreover, the times are similar to running the same code as a stand-alone program on my laptop: 400 ms for Python, 2,500 ms for Java (counting time for all threads). As far as I can tell, most of the Java time is spent in classloading: Java reads classes from disk on an as-needed basis, and in the case of this simple program, there are 5,400 classes that get loaded.
Adding SnapStart
The promise of SnapStart is eliminating this initialization time by running the code once and preserving an image of the program state. However, this approach has limitations:
First, SnapStart only applies to published versions of your function. This is the same limitation that applies to provisioned concurrency. It means that you must follow a formal release process, which might require updating multiple services in your application, such as an API Gateway or load balancer target group, when you deploy new code.
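To make that release process concrete, here’s a minimal boto3 sketch of what a scripted deployment might look like. The function and alias names are hypothetical, and using an alias is one way to keep the API Gateway or load balancer configuration stable across deployments (they reference the alias ARN rather than a specific version).

```python
import boto3

lambda_client = boto3.client("lambda")
FUNCTION_NAME = "retrieve-secret-example"   # hypothetical function name
ALIAS_NAME = "live"                         # hypothetical, pre-existing alias

# Enable SnapStart for versions published from now on; $LATEST is unaffected.
lambda_client.update_function_configuration(
    FunctionName=FUNCTION_NAME,
    SnapStart={"ApplyOn": "PublishedVersions"},
)
lambda_client.get_waiter("function_updated_v2").wait(FunctionName=FUNCTION_NAME)

# Publishing a version is what triggers initialization and snapshot creation.
version = lambda_client.publish_version(FunctionName=FUNCTION_NAME)["Version"]

# Wait until the new version is ready; this is the slow part of the deployment.
lambda_client.get_waiter("published_version_active").wait(
    FunctionName=FUNCTION_NAME, Qualifier=version
)

# Repoint the alias so callers (API Gateway, ALB target group, etc.) pick up
# the new version without any configuration change on their side.
lambda_client.update_alias(
    FunctionName=FUNCTION_NAME, Name=ALIAS_NAME, FunctionVersion=version
)
```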
Publishing a SnapStart version also takes quite a lot of time: the Console warns that it will take “a few minutes.” For my example program, it was around 90 seconds. During this time, Lambda runs the initialization code and writes the initialized memory image into its cache. Again, this adds friction to your deployment process.
SnapStart also adds charges: you pay for cached function versions, based on the amount of time that the image is cached, and for each restore from the cache. These numbers are very small – 13 cents a day for a 1 GB function – but they can add up if you have lots of functions and don’t remove old versions.
But the biggest limitation, in my opinion, is that SnapStart doesn’t preserve network connections. It can’t: a TCP connection is stateful, and tied to the machine that initiated it. Any established connections will be invalid when the new execution environment is loaded. The problem is that the restored image doesn’t know the connection is invalid, and won’t discover that fact until an attempt to use the connection times out. That timeout may be measured in seconds, even minutes.
As a result, your program must ensure that any such connections are re-established when the image is restored. SnapStart gives you hooks that let you do that; be sure to use them with any connection pools (and this is why my example program only retrieves credentials and doesn’t establish connections).
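To illustrate, here’s a minimal sketch of a restore hook in Python. It assumes the snapshot_restore_py helpers that AWS provides for SnapStart runtime hooks in the managed Python runtime, and connect_to_database() is a hypothetical stand-in for whatever client or connection pool your function actually uses.

```python
import json
import os

import boto3
from snapshot_restore_py import register_after_restore  # SnapStart runtime hooks

secret_arn = os.environ['DATABASE_SECRET_ARN']

# Retrieving credentials is safe to do during initialization: the result is
# plain data, and it's captured in the snapshot.
sm_client = boto3.client('secretsmanager')
secret_value = sm_client.get_secret_value(SecretId=secret_arn)['SecretString']
database_creds = json.loads(secret_value)

connection = None   # deliberately not opened during initialization


def connect_to_database():
    # Hypothetical stand-in: a real function would use database_creds to open
    # a connection or connection pool here.
    return {"credentials": database_creds}


def reconnect():
    # Runs each time an execution environment is restored from the snapshot,
    # so connections are always opened on the machine that will use them.
    global connection
    connection = connect_to_database()


register_after_restore(reconnect)


def lambda_handler(event, context):
    global connection
    if connection is None:   # covers non-SnapStart cold starts
        connection = connect_to_database()
    # ... use connection ...
```

Java functions get the same capability through the CRaC (org.crac) beforeCheckpoint and afterRestore hooks.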
SnapStart Timings
For timings, I simply published versions of the previous Lambda with SnapStart enabled.
| | Restore | Runtime |
| --- | --- | --- |
| Python | 621.55 ms | 4.67 ms |
| Java | 765.93 ms | 24.54 ms |
That certainly improved Java startup time, but Python took a little longer than the baseline. I suspect this is driven more by the size of the image to be restored than by the language.
Surprisingly, the function runtime is also significantly longer in both cases. But this appears to be a one-time event: subsequent invocations are similar to a non-SnapStart deployment.
In neither case would I want this to be a regular occurrence for a latency-sensitive application.
Conclusion
The benefit of SnapStart will depend on how much initialization you do. For Java, the bar is pretty low; for Python, not so much. I probably wouldn’t pay the cost of SnapStart if all I needed to do was retrieve a secret or three. But if I were downloading a large model from S3, it might be worthwhile.
A better question is what level of latency is acceptable for you. In an active web service, at steady state, you probably won’t have that many cold starts. And your users might be OK with a half second occasionally (I wouldn’t bet on it, but they might).
But if your goal is to avoid latency spikes, then I have the same advice that I gave five years ago: don’t use Lambda. Instead, use an always-on web framework, such as Django or Spring, deployed in a Docker container, and with an auto-scaling policy to add or remove containers based on load.