Building and Deploying Lambdas from a Docker Container


In my last post, I used a Lambda Container Image to package my Lambda with OS-dependent libraries. I was happy with how easy it was to build and deploy the container, but unhappy with how “first start” initialization time added over a second to my Lambda’s execution time. In this post, I take a different path, and show how the pre-built AWS images can simplify building a “traditional” (deployed as a ZIP file) Lambda.

To recap the previous post, when I want to access a Postgres database from Python code, I turn to the psycopg2 library. However, that library depends on native libraries, and the version that will be installed on my Ubuntu development machine isn’t compatible with the AWS environment where the Lambda actually runs. So I can’t just install locally and ZIP it into a deployment bundle.

As I noted in that post, there are options to work around this issue. One is to build the deployment bundle on an EC2 instance running AWS Linux. This is a good choice if you’re already running your CI/CD system on EC2, or are using AWS CodeBuild. But if not, it introduces an inconvenient step into your builds.

Instead, I’ll use the pre-built Docker images that AWS introduced with Lambda Container Images. This works whether you’re running on a Macintosh laptop or in a third-party CI/CD pipeline, as long as you have access to Docker. And because the images are maintained by AWS, you have a strong guarantee that anything you build within them will be able to run on Lambda.

Exploring the base image

When it released Lambda Container Images, AWS made “base images” available for all of the supported Lambda runtimes. These images replicate the Lambda environment, although as we’ll see they don’t include all of the libraries that are pre-installed in a Lambda runtime (with the AWS SDK being foremost among these).

Let’s explore the image by running a shell:

docker run \
       -it --rm \
       --entrypoint /bin/bash \
       -v /tmp:/mnt \
       amazon/aws-lambda-python:3.8

I’ve split this command into multiple lines so that I can call out each piece:

  • -it --rm starts the container interactively, and removes it after shutdown.
  • --entrypoint /bin/bash tells Docker to run the bash shell rather than the image’s default entrypoint. This default entrypoint is the Lambda Runtime Interface Emulator, which sets up an HTTP endpoint and invokes your containerized Lambda for testing.
  • -v /tmp:/mnt mounts the host machine’s /tmp directory as /mnt inside the container. This lets you copy things between host and container in a controlled manner.
  • amazon/aws-lambda-python:3.8 is the name of the image.

Running this image requires downloading roughly 200 MB, and it may take a few minutes to do that depending on your Internet connection. Once it’s downloaded, the image starts and you’re presented with the bash-4.2# prompt. Below I show a few commands and their output; some work, some don’t. I’ve added blank lines between commands, and in some cases use a hash sign (#) to provide a “comment.” Take some time on your own to look around; if you accidentally delete something inside the container, it doesn’t matter. When done, type exit to shut down the container.

bash-4.2# pwd
/var/task

bash-4.2# python
Python 3.8.6 (default, Dec 16 2020, 01:05:15) 
[GCC 7.3.1 20180712 (Red Hat 7.3.1-11)] on linux
Type "help", "copyright", "credits" or "license" for more information.

>>> import sys
>>> sys.path
['', '/var/lang/lib/python38.zip', '/var/lang/lib/python3.8', '/var/lang/lib/python3.8/lib-dynload', '/var/lang/lib/python3.8/site-packages']

>>> import boto3
Traceback (most recent call last):
  File "", line 1, in 
ModuleNotFoundError: No module named 'boto3'

>>> exit()

bash-4.2# ls -l /var/lang/lib/python3.8/site-packages
total 744
-rw-r--r-- 1 root root    126 Dec 16 01:06 easy_install.py
drwxr-xr-x 5 root root   4096 Jan 13 15:27 pip
# and so on

bash-4.2# aws sts get-caller-identity
bash: aws: command not found

bash-4.2# pip install awscli
Collecting awscli
# output truncated

bash-4.2# aws sts get-caller-identity
Unable to locate credentials. You can configure credentials by running "aws configure".
# ok, that’s expected, but at least I can run the program

bash-4.2# make
bash: make: command not found

bash-4.2# git --version
bash: git: command not found

bash-4.2# yum install git make
# lots of output follows
Complete!

bash-4.2# git clone https://github.com/chariotsolutions/aws-examples.git
Cloning into 'aws-examples'...
# output truncated

bash-4.2# touch /mnt/example.txt

bash-4.2# exit

There are a few things that I want to call out. First is that the “working directory” is /var/task, which is where Lambda unpacks a Python deployment bundle. And, as advertised, when you run Python it’s the 3.8.x version.

What I found surprising is that this image does not contain the boto3 library, even though it’s installed by default when you deploy a “traditional” Lambda. When working with the Python base image you need to explicitly install it.

Not surprising is that the AWS command-line program isn’t installed, nor are development tools such as make or git. However, you can install all of these, as shown.

The last thing that I want to call out is that the container runs as root by default. That touch command at the end creates the file /tmp/example.txt on the host, which you won’t be able to delete without using sudo. This will cause you no end of pain if you mount your home directory and write files there, rather than somewhere in /tmp, but in the next section I’ll show how to start the container so that it writes files as your normal user.

Using the container to install packages

You can run pip from within the container, and you can write files to a mounted directory outside of the container. This means that you can retrieve modules with binary components, and use them to build your deployment bundle:

docker run --rm \
           --entrypoint /var/lang/bin/pip \
           -v "$(pwd):/mnt" \
           --user "$(id -u):$(id -g)" \
           amazon/aws-lambda-python:3.8 \
           install --target /mnt/build --upgrade psycopg2-binary

As before, let’s look at each piece of the command:

  • I’ve removed the -it option because I’m not running interactively.
  • The entrypoint — the program to run when the container starts — is now /var/lang/bin/pip.
  • Rather than mount /tmp, I’m mounting the current directory — assumed to be the project directory.
  • The --user "$(id -u):$(id -g)" option is the aforementioned work-around for containers running as root: here I tell it to use my current user and group ID instead. As you’ll see below, this can cause its own issues.
  • install --target /mnt/build --upgrade psycopg2-binary is the command passed to pip. It installs the packages to the /mnt/build directory, which means it’s actually writing output to the build sub-directory of the host directory (creating it if necessary).

When you run this, you’ll see output like the following. The warning is because most of the files in the image are owned by root, but we’re running as a different user; it can be ignored.

WARNING: The directory '/.cache/pip' or its parent directory is not owned or is not writable by the current user. The cache has been disabled. Check the permissions and owner of that directory. If executing pip with sudo, you may want sudo’s -H flag.
Collecting psycopg2-binary
  Downloading psycopg2_binary-2.8.6-cp38-cp38-manylinux1_x86_64.whl (3.0 MB)
Installing collected packages: psycopg2-binary
Successfully installed psycopg2-binary-2.8.6

In a real build, I’d then combine these installed packages with the Lambda source files from the src directory, packaging the whole thing as a ZIP. Also, in a real-world project I’d use a requirements.txt file rather than explicitly-named packages.

Building and deploying entirely within a container

Running pip install inside a container is a useful technique (although if you build frequently you’ll want to also mount an external cache directory so that you’re not repeatedly downloading the same modules). But you can take it one step further, and create a Docker container that provides all of the tooling needed to completely build and deploy your project.
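That cache mount might look like the following sketch, which extends the earlier pip command. The container-side path /.cache/pip is taken from the pip warning shown above (with no HOME set, pip caches under /); treat the host-side path as an assumption to adapt to your machine:

```shell
# same pip invocation as before, plus a host-mounted download cache
# so repeated builds don't re-fetch the same wheels
docker run --rm \
           --entrypoint /var/lang/bin/pip \
           -v "$(pwd):/mnt" \
           -v "$HOME/.cache/pip:/.cache/pip" \
           --user "$(id -u):$(id -g)" \
           amazon/aws-lambda-python:3.8 \
           install --target /mnt/build --upgrade psycopg2-binary
```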

Why would you do this? Well, one reason is to ensure that your entire team has the same tooling, so that you don’t have an “it builds on my machine!” situation. Another is that you can take this same Docker image and run it in your CI/CD environment, without worrying about what it might be running internally.

For my example I’ll be building a Python Lambda using make. And I want the ability to deploy the Lambda once it’s built. We’ll start with the Dockerfile (since this is an example, I don’t use any tricks to minimize the number of layers):

FROM amazon/aws-lambda-python:3.8

RUN yum install -y make zip

RUN pip install awscli

COPY Makefile /

WORKDIR /build

ENV DEPLOY_DIR=/tmp/deploy

ENTRYPOINT ["/usr/bin/make", "--environment-overrides", "--directory=/build", "--makefile=/Makefile"]

I start with the base image, and then add the tooling that I need to build. Since I’m using make, I also provide a standardized Makefile. The goal of this image is to run that Makefile, and to that end I’ve created a new ENTRYPOINT, which overrides the entrypoint of the base image.

I’m going to skip building the Docker image. In subsequent sections I’ll assume that it has the name build-environment:latest. Instead, I want to focus on the Makefile:

.PHONY: default deploy package test init clean

LAMBDA_NAME     ?= Example

DEPLOY_DIR      ?= /tmp/deploy
ARTIFACT        ?= example.zip

SRC_DIR         := /build/src
LIB_DIR         := /tmp/lib

default: package

deploy: package
	aws lambda update-function-code --function-name $(LAMBDA_NAME) --zip-file fileb://$(DEPLOY_DIR)/$(ARTIFACT)

package: test
	mkdir -p ${DEPLOY_DIR}
	rm -f ${DEPLOY_DIR}/${ARTIFACT}
	cd $(SRC_DIR) ; zip -qr ${DEPLOY_DIR}/${ARTIFACT} *
	cd $(LIB_DIR) ; zip -qr ${DEPLOY_DIR}/${ARTIFACT} *

test: init 
	# run any unit tests here

init:
	mkdir -p ${LIB_DIR}
	pip install -r /build/requirements.txt -t $(LIB_DIR) --upgrade

clean:
	rm $(DEPLOY_DIR)/$(ARTIFACT)

This Makefile installs modules into a local lib directory, then zips those modules together with the project source code to produce the deployment bundle. If you run the following command, you will end up with the file example.zip — the deployment bundle — in your working directory.

docker run --rm \
       --user $(id -u):$(id -g) \
       -v $(pwd):/build \
       -e DEPLOY_DIR=/build \
       build-environment:latest

By now you should be familiar with using -v to mount a directory into the container, and --user to ensure that the container runs as your current user. The one thing unique to this command is -e DEPLOY_DIR=/build.

This has the effect of overriding the DEPLOY_DIR variable inside the Makefile, because the Dockerfile specifies --environment-overrides as part of the make command. And since /build is mapped to your current directory, that’s where the bundle ends up.

Which is great, but this Makefile has an additional feature: the deploy target runs the AWS CLI to upload the bundle directly to the Lambda function. To make this work you must already have a Lambda with the name Example (or use -e LAMBDA_NAME=YourLambdaName to override the Makefile’s default value).

You also need to provide AWS credentials to the container, so that the CLI can do its job. I prefer managing access keys as environment variables, and it’s easy to tell Docker to export your current variables inside a container, so that’s the approach I show here. You could also mount the $HOME/.aws directory, and let the CLI read it.

docker run --rm \
       -v $(pwd):/build \
       -e AWS_ACCESS_KEY_ID \
       -e AWS_SECRET_ACCESS_KEY \
       -e AWS_DEFAULT_REGION \
       build-environment:latest \
       deploy

In this case I tell the container to run the deploy task, rather than the default (package) task. Also note that I didn’t provide the DEPLOY_DIR environment variable. That’s because I don’t need the bundle to be saved in my working directory; it need never leave the container’s filesystem.
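If you prefer the credentials-file approach mentioned above, the invocation might instead look like this sketch. It assumes the container user’s home directory is /root (we aren’t overriding the user here, so the container runs as root); the :ro suffix mounts the directory read-only so the container can’t modify your credentials:

```shell
# alternative to exporting access keys: let the CLI read the
# credentials file from a read-only mount of $HOME/.aws
docker run --rm \
       -v "$(pwd):/build" \
       -v "$HOME/.aws:/root/.aws:ro" \
       build-environment:latest \
       deploy
```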

If you’d like to try this out, it’s available on GitHub. Note that this repository contains all examples for Chariot’s AWS blog posts, each in its own sub-directory.

Closing thoughts

If you haven’t guessed, I’m a proponent of integrating Docker into the development process. I’ve been using Docker to provide services such as databases (often preconfigured with seed data) for several years. And in recent years, as I’ve worked more with Python, I’ve become a fan of Docker as “a better venv,” giving you an environment that’s consistent across host operating systems.

I’ve also noticed that companies that rely on Lambdas, and have lots of them, also tend to have lots of different ways to build them. My approach of building everything within a container could just add to the maintenance burden, except for one thing: it guarantees that the deployment bundle will actually run in the Lambda environment.

And lastly, this gives you one solution to the “chicken and egg” problem of Lambda deployments. That will be the topic of my next post.