In my last post, I used a Lambda Container Image to package my Lambda with OS-dependent libraries. I was happy with how easy it was to build and deploy the container, but unhappy with how “first start” initialization time added over a second to my Lambda’s execution time. In this post, I take a different path, and show how the pre-built AWS images can simplify building a “traditional” (deployed as a ZIP file) Lambda.
To recap the previous post: when I want to access a Postgres database from Python code, I turn to the psycopg2 library. However, that library depends on native libraries, and the version that will be installed on my Ubuntu development machine isn’t compatible with the AWS environment where the Lambda actually runs. So I can’t just install locally and ZIP it into a deployment bundle.
As I noted in that post, there are options to work around this issue. One is to build the deployment bundle on an EC2 instance running AWS Linux. This is a good choice if you’re already running your CI/CD system on EC2, or are using AWS CodeBuild. But if not, it introduces an inconvenient step into your builds.
Instead, I’ll use the pre-built Docker images that AWS introduced with Lambda Container Images. This works whether you’re running on a Macintosh laptop or in a third-party CI/CD pipeline, as long as you have access to Docker. And because the images are maintained by AWS, you have a strong guarantee that anything you build within them will be able to run on Lambda.
Exploring the base image
When it released Lambda Container Images, AWS made “base images” available for all of the supported Lambda runtimes. These images replicate the Lambda environment, although as we’ll see they don’t include all of the libraries that are pre-installed in a Lambda runtime (with the AWS SDK being foremost among these).
Let’s explore the image by running a shell:
docker run \
    -it --rm \
    --entrypoint /bin/bash \
    -v /tmp:/mnt \
    amazon/aws-lambda-python:3.8
I’ve split this command into multiple lines so that I can call out each piece:
- -it --rm starts the container interactively, and removes it after shutdown.
- --entrypoint /bin/bash tells Docker to run the bash shell rather than the image’s default entrypoint. This default entrypoint is the Lambda Runtime Interface Emulator, which sets up an HTTP endpoint and invokes your containerized Lambda for testing.
- -v /tmp:/mnt mounts the host machine’s /tmp directory as /mnt inside the container. This lets you copy things between host and container in a controlled manner.
- amazon/aws-lambda-python:3.8 is the name of the image.
Running this image requires downloading roughly 200 MB, and it may take a few minutes to do that depending on your Internet connection. Once it’s downloaded, the image starts and you’re presented with the bash-4.2# prompt. Below I show a few commands and their output; some work, some don’t. I’ve added blank lines between commands, and in some cases use a hash sign (#) to provide a “comment.” Take some time on your own to look around; if you accidentally delete something inside the container, it doesn’t matter. When done, type exit to shut down the container.
bash-4.2# pwd
/var/task

bash-4.2# python
Python 3.8.6 (default, Dec 16 2020, 01:05:15)
[GCC 7.3.1 20180712 (Red Hat 7.3.1-11)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.path
['', '/var/lang/lib/python38.zip', '/var/lang/lib/python3.8', '/var/lang/lib/python3.8/lib-dynload', '/var/lang/lib/python3.8/site-packages']
>>> import boto3
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'boto3'
>>> exit()

bash-4.2# ls -l /var/lang/lib/python3.8/site-packages
total 744
-rw-r--r--  1 root root    126 Dec 16 01:06 easy_install.py
drwxr-xr-x  5 root root   4096 Jan 13 15:27 pip
# and so on

bash-4.2# aws sts get-caller-identity
bash: aws: command not found

bash-4.2# pip install awscli
Collecting awscli
# output truncated

bash-4.2# aws sts get-caller-identity
Unable to locate credentials. You can configure credentials by running "aws configure".
# ok, that’s expected, but at least I can run the program

bash-4.2# make
bash: make: command not found

bash-4.2# git --version
bash: git: command not found

bash-4.2# yum install git make
# lots of output follows
Complete!

bash-4.2# git clone https://github.com/chariotsolutions/aws-examples.git
Cloning into 'aws-examples'...
# output truncated

bash-4.2# touch /mnt/example.txt

bash-4.2# exit
There are a few things that I want to call out. First is that the “working directory” is /var/task, which is where Lambda unpacks a Python deployment bundle. And, as advertised, when you run Python it’s the 3.8.x version.
What I found surprising is that this image does not contain the boto3 library, even though it’s installed by default when you deploy a “traditional” Lambda. When working with the Python base image you need to explicitly install it.
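If you want to exercise SDK-dependent code while poking around, it’s easy enough to install the library yourself from the interactive shell shown above (there’s no need to do this for a ZIP deployment, since the real Lambda runtime already provides boto3):

pip install boto3
python -c "import boto3; print(boto3.__version__)"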
Not surprising is that the AWS command-line program isn’t installed, nor are development tools such as make or git. However, you can install all of these, as shown.
The last thing that I want to call out is that the container runs as root by default. That touch command at the end creates the file /tmp/example.txt on the host (via the volume mount), which you won’t be able to delete without using sudo. This will cause you no end of pain if you write files to your home directory rather than somewhere in /tmp, but in the next section I’ll show how to start the container so that it writes files as your normal user.
Using the container to install packages
You can run pip from within the container, and you can write files to a mounted directory outside of the container. This means that you can retrieve modules with binary components, and use them to build your deployment bundle:
docker run --rm \
    --entrypoint /var/lang/bin/pip \
    -v "$(pwd):/mnt" \
    --user "$(id -u):$(id -g)" \
    amazon/aws-lambda-python:3.8 \
    install --target /mnt/build --upgrade psycopg2-binary
As before, let’s look at each piece of the command:
- I’ve removed the -it option because I’m not running interactively.
- The entrypoint — the program to run when the container starts — is now /var/lang/bin/pip.
- Rather than mount /tmp, I’m mounting the current directory — assumed to be the project directory.
- The --user "$(id -u):$(id -g)" option is the aforementioned work-around for containers running as root: here I tell it to use my current user and group ID instead. As you’ll see below, this can cause its own issues.
- install --target /mnt/build --upgrade psycopg2-binary is the command passed to pip. It installs the packages to the /mnt/build directory, which means it’s actually writing output to the build sub-directory of the host directory (creating it if necessary).
When you run this, you’ll see output like the following. The warning appears because most of the files in the image are owned by root, but we’re running as a different user; it can be ignored.
WARNING: The directory '/.cache/pip' or its parent directory is not owned or is not writable by the current user. The cache has been disabled. Check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
Collecting psycopg2-binary
  Downloading psycopg2_binary-2.8.6-cp38-cp38-manylinux1_x86_64.whl (3.0 MB)
Installing collected packages: psycopg2-binary
Successfully installed psycopg2-binary-2.8.6
In a real build, I’d then combine these installed packages with the Lambda source files from the src directory, packaging the whole thing as a ZIP. Also, in a real-world project I’d use a requirements.txt file rather than explicitly-named packages.
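To make that packaging step concrete, here is a minimal sketch; it assumes the handler code lives in a src directory and that the pip output from the command above landed in build:

# build the bundle fresh, with source files at the root and dependencies alongside them
rm -f example.zip
(cd src && zip -qr ../example.zip *)
(cd build && zip -qr ../example.zip *)

The Makefile shown later in this post does essentially the same thing.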
Building and deploying entirely within a container
Running pip install inside a container is a useful technique, although if you build frequently you’ll want to also mount an external cache directory so that you’re not repeatedly downloading the same modules.
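Here is a sketch of what that cache mount might look like, building on the earlier pip command; the host-side cache location is an arbitrary choice, and pointing PIP_CACHE_DIR at the mounted directory should also make the cache warning shown above go away:

docker run --rm \
    --entrypoint /var/lang/bin/pip \
    -v "$(pwd):/mnt" \
    -v "$HOME/.cache/pip:/pip-cache" \
    -e PIP_CACHE_DIR=/pip-cache \
    --user "$(id -u):$(id -g)" \
    amazon/aws-lambda-python:3.8 \
    install --target /mnt/build --upgrade psycopg2-binary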
But you can take it one step further, and create a Docker container that provides all of the tooling needed to completely build and deploy your project. Why would you do this? Well, one reason is to ensure that your entire team has the same tooling, so that you don’t have an “it builds on my machine!” situation. Another is that you can take this same Docker image and run it in your CI/CD environment, without worrying about what that environment might be running internally.
For my example I’ll be building a Python Lambda using make. And I want the ability to deploy the Lambda once it’s built. We’ll start with the Dockerfile (since this is an example, I don’t use any tricks to minimize the number of layers):
FROM amazon/aws-lambda-python:3.8

RUN yum install -y make zip
RUN pip install awscli

COPY Makefile /

WORKDIR /build

ENV DEPLOY_DIR=/tmp/deploy

ENTRYPOINT ["/usr/bin/make", "--environment-overrides", "--directory=/build", "--makefile=/Makefile"]
I start with the base image, and then add the tooling that I need to build. Since I’m using make, I also provide a standardized Makefile. The goal of this image is to run that Makefile, and to that end I’ve created a new ENTRYPOINT, which overrides the entrypoint of the base image.
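Building and tagging the image is a single command, run from the directory that contains the Dockerfile and Makefile; the tag is the name I use in the rest of this post:

docker build -t build-environment:latest .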
I won’t say more about building the image here; in subsequent sections I’ll simply assume that it has the name build-environment:latest. Instead, I want to focus on the Makefile:
.PHONY: default deploy package test init clean

LAMBDA_NAME     ?= Example
DEPLOY_DIR      ?= /tmp/deploy
ARTIFACT        ?= example.zip

SRC_DIR         := /build/src
LIB_DIR         := /tmp/lib

default: package

deploy: package
	aws lambda update-function-code --function-name $(LAMBDA_NAME) --zip-file fileb://$(DEPLOY_DIR)/$(ARTIFACT)

package: test
	mkdir -p ${DEPLOY_DIR}
	rm -f ${DEPLOY_DIR}/${ARTIFACT}
	cd $(SRC_DIR) ; zip -qr ${DEPLOY_DIR}/${ARTIFACT} *
	cd $(LIB_DIR) ; zip -qr ${DEPLOY_DIR}/${ARTIFACT} *

test: init
	# run any unit tests here

init:
	mkdir -p ${LIB_DIR}
	pip install -r /build/requirements.txt -t $(LIB_DIR) --upgrade

clean:
	rm $(DEPLOY_DIR)/$(ARTIFACT)
This Makefile installs the required modules into a lib directory inside the container, then zips those modules together with the project source code to produce the deployment bundle. If you run the following command, you will end up with the file example.zip — the deployment bundle — in your working directory.
docker run --rm \
    --user $(id -u):$(id -g) \
    -v $(pwd):/build \
    -e DEPLOY_DIR=/build \
    build-environment:latest
By now you should be familiar with using -v to mount a directory into the container, and --user to ensure that the container runs as your current user. The one thing unique to this command is -e DEPLOY_DIR=/build.
This has the effect of overriding the DEPLOY_DIR variable inside the Makefile, because the Dockerfile specifies --environment-overrides as part of the make command. And since /build is mapped to your current directory, that’s where the bundle ends up.
Which is great, but this Makefile has an additional feature: the deploy target runs the AWS CLI to upload the bundle directly to the Lambda function. To make this work you must already have a Lambda with the name Example (or use -e LAMBDA_NAME=YourLambdaName to override the Makefile’s default value).
You also need to provide AWS credentials to the container, so that the CLI can do its job. I prefer managing access keys as environment variables, and it’s easy to tell Docker to export your current variables inside a container, so that’s the approach I show here. You could also mount the $HOME/.aws directory and let the CLI read it; I show a sketch of that variant below.
docker run --rm \
    -v $(pwd):/build \
    -e AWS_ACCESS_KEY_ID \
    -e AWS_SECRET_ACCESS_KEY \
    -e AWS_DEFAULT_REGION \
    build-environment:latest \
    deploy
In this case I tell the container to run the deploy task, rather than the default (package) task. Also note that I didn’t provide the DEPLOY_DIR environment variable. That’s because I don’t need the bundle to be saved in my working directory; it need never leave the container’s filesystem.
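If you’d rather not export access keys at all, a variant of the deploy command mounts the CLI’s configuration files read-only and points the CLI at them via its standard environment variables. This is just a sketch; it assumes your credentials and default region live in the usual files under $HOME/.aws:

docker run --rm \
    -v "$(pwd):/build" \
    -v "$HOME/.aws:/aws-config:ro" \
    -e AWS_CONFIG_FILE=/aws-config/config \
    -e AWS_SHARED_CREDENTIALS_FILE=/aws-config/credentials \
    build-environment:latest \
    deploy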
If you’d like to try this out, it’s available on GitHub. Note that this repository contains all examples for Chariot’s AWS blog posts, each in its own sub-directory.
Closing thoughts
If you haven’t guessed, I’m a proponent of integrating Docker into the development process. I’ve been using Docker to provide services such as databases (often preconfigured with seed data) for several years. And in recent years, as I’ve worked more with Python, I’ve become a fan of Docker as “a better venv,” giving you an environment that’s consistent across host operating systems.
I’ve also noticed that companies that rely on Lambdas, and have lots of them, also tend to have lots of different ways to build them. My approach of building everything within a container could just add to the maintenance burden, except for one thing: it guarantees that the deployment bundle will actually run in the Lambda environment.
And lastly, this gives you one solution to the “chicken and egg” problem of Lambda deployments. That will be the topic of my next post.