Building Lambdas with Poetry


Coming from a Java background, with tools like Maven (or Gradle, or really, anything other than SBT), I consider the Python development process to be a bit of a mess. The pieces are all there: a central repository for publicly-available packages, a way to install the packages you want, and several ways to run your program with only those packages. But it seems that everybody has a different way to combine those pieces. So when a colleague introduced me to Poetry, my first reaction was “oh great, another tool that solves part of my problem.”

The problem that I was trying to solve was building a Lambda deployment bundle. These are ZIP files that contain your program code, along with all of the dependencies it needs. Poetry is designed to handle the first part of that: packaging the program code. I didn’t see how it helped me until one morning, when I had an “oh, that’s obvious!” moment. And now I don’t want to build bundles any other way.

If you’d like to see Poetry in action, I’ve updated our CloudTrail to Elasticsearch Lambda to build with either pip or poetry.

Introduction to Poetry

Poetry combines dependency management with virtual environments. You tell it the dependencies that you need for production use, and the ones that you need for development. It creates a virtual environment with both, and then builds a Python Wheel archive referencing just the production dependencies.

Once you install it, there are three ways to start a new project:

  1. poetry new poetry-demo
    This creates a new directory with all of the Poetry files and some template Python modules.

  2. poetry init
    This is for an existing project. Poetry walks you through a Q&A about your project, and produces the pyproject.toml file.

  3. Copy pyproject.toml from another project and edit as needed
    Once you’ve been using Poetry for a while, this is the easiest way to transform an existing project.
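As a sketch, the first two options look like this on the command line (poetry-demo is a placeholder project name):

```shell
# Option 1: create a brand-new project skeleton
poetry new poetry-demo

# Option 2: add Poetry to an existing project (interactive Q&A)
cd my-existing-project
poetry init
```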

Poetry projects have the following features:

  • pyproject.toml is the project’s configuration file
    It specifies the project name, version, and all dependencies (and some other stuff that I’m not going to discuss).

  • Poetry divides dependencies into those required to use the module, and those required only to develop it
    For example, your module might depend on the requests and pytest libraries. The former is required to use your module in production, so must become a transitive dependency of whoever uses your module, while the latter is needed only while you're developing the module.

  • Poetry is intended to build a single top-level package
    This was, for me, the biggest disconnect from my former way of building Python programs. I had always structured my projects with a top-level src directory, with modules and packages underneath. Poetry replaces this directory with one named after your project, and it becomes the root package. If this is confusing, we’ll see it in more detail below.
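To make that concrete, here's roughly the layout that poetry new poetry-demo produces (a sketch; the exact files vary by Poetry version):

```
poetry-demo/
├── pyproject.toml
├── README.rst
├── poetry_demo/            # the root package, named after the project
│   └── __init__.py
└── tests/
    ├── __init__.py
    └── test_poetry_demo.py
```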

Once you’ve created a new Poetry project, the next step is to add some dependencies. There’s a Poetry command that will update dependencies, but I think it’s easier to just edit the file, adding to the tool.poetry.dependencies or tool.poetry.dev-dependencies section as appropriate.

For example, the following configuration says that I need Python 3.7 or above, and the third-party requests and aws-requests-auth modules, taking the most recent patch version that’s equal to or higher than what I’ve specified.

[tool.poetry.dependencies]
python              = "^3.7"
requests            = "~2.25.1"
aws-requests-auth   = "~0.4.3"
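Development dependencies live in their own section; a minimal example, assuming you test with pytest (the version shown is illustrative):

```toml
[tool.poetry.dev-dependencies]
pytest = "~6.2.2"
```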

After adding all of your dependencies (or after downloading a project that’s already configured to use Poetry), it’s time to create your virtual environment:

poetry install

Poetry stores all of the files for a virtual environment outside of the project directory. On Linux, you’ll find them under $HOME/.cache/pypoetry/virtualenvs/; each project has its own environment, named after the top-level package.

Beware: if you work on a lot of projects, you’ll find that this directory takes up quite a bit of disk space: each virtual environment has a separate copy of its dependencies. I have gotten into the habit of deleting virtual environments when I’m done working on a project, and re-initializing when I go back to the project.

Poetry also creates the file poetry.lock, which lists the actual versions that it installed. This file serves two purposes. First, it allows repeatable builds: if you check out an older version of a project, Poetry will retrieve the same dependency versions that you used to build it originally. The second purpose is that you can run install again, and Poetry will verify that it already has those versions rather than downloading them again.

OK, you’ve now got your virtual environment set up; what can you do?

  • poetry shell
    Starts a new terminal shell that uses the virtual environment. If you run python from within this shell, you’ll have access to all of your dependencies (including development dependencies). It’s useful if you do REPL-based development.

  • poetry run
    Runs a Python package from within the virtual environment. This is the way to run tests; for example, poetry run pytest.

  • poetry build
    Creates the output files, storing them in the dist directory (you should add this directory to your .gitignore). Poetry produces two output files: a “tarball” for setuptools, and a “wheel” for pip.
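Putting those together, a typical edit-test-build cycle looks something like this sketch:

```shell
poetry install       # create/update the virtual environment
poetry run pytest    # run the test suite inside that environment
poetry build         # produce dist/*.tar.gz and dist/*.whl
```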

Creating a Lambda bundle

The problem with using Poetry to build Lambda bundles is that Lambda doesn’t want either a tarball or a wheel. Instead, it wants a ZIP that contains all of the dependencies required to run the function. As I said, I had an “oh, that’s obvious!” moment to bridge that gap; I don’t want to say how long it took me to get there.

poetry run pip install --upgrade -t package dist/*.whl
cd package ; zip -r ../artifact.zip . -x '*.pyc'

That’s it: you just need to use pip to put your package and its mainline dependencies in one place. You now have a deployment bundle that can be uploaded to Lambda.
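If you build bundles often, these steps can be wrapped in a small script (build.sh is a hypothetical name; it assumes it's run from the project root):

```shell
#!/bin/bash -e
rm -rf dist package artifact.zip
poetry build
poetry run pip install --upgrade -t package dist/*.whl
(cd package && zip -r ../artifact.zip . -x '*.pyc')
```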

Odds and ends

Well, that’s not completely it. There are a few peculiarities that you should be aware of.

boto3

If you’re writing a Lambda, chances are good that you need to talk to AWS, and in Python that means the boto3 library. So it needs to be a dependency of your project.

However, it’s provided by the Lambda runtime, so it doesn’t need to be packaged in your deployment bundle.

My solution is to add it to the development dependencies. That way, it’s available for any tests that you might write (often in partnership with moto). And if your package is included by another program, it’s generally safe to assume that that program has its own boto3 dependency.
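In pyproject.toml, that looks like the following (the versions are illustrative; pick whatever matches your Lambda runtime):

```toml
[tool.poetry.dev-dependencies]
boto3 = "~1.17.0"
moto  = "~2.0.0"
```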

Use a dependency specification appropriate to your library’s backwards-compatibility practices

Coming from a Java background, where backwards compatibility in third-party libraries is (usually) a top priority, I’ve been shocked — shocked! — at just how often Python packages introduce breaking changes. They may appear to use semantic versioning, but don’t trust them!

Poetry’s dependency specifications give you the flexibility to lock down the versions you use (the descriptions below assume version numbers that follow the major.minor.patch form):

  • A caret (^) picks the highest minor version for a given major version (with a caveat if the major version is 0). I’ve learned through bitter experience that the only place it’s safe to use this is for the Python runtime version: “^3.7” picks the highest available version of Python 3.
  • A tilde (~) picks the highest patch for a minor version. This is generally safe for all libraries, and necessary for some (I’m looking at you, SQLAlchemy!). So if I were to use “~1.3”, that would give me the highest 1.3.x patch release, but not give me a 1.4.x release.
  • Use just the version number if you want to lock yourself to that version. I use this when I know that my Lambda will fulfill its dependency from a layer: I don’t want my code to depend on features that may not be available in that layer.
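Here are all three forms side by side (the package versions are illustrative):

```toml
[tool.poetry.dependencies]
python     = "^3.7"      # caret: any Python 3.x, 3.7 or above
sqlalchemy = "~1.3"      # tilde: highest 1.3.x patch, never 1.4
requests   = "2.25.1"    # exact: match what's in my Lambda layer
```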

Note that these dependency specifications only take effect in the absence of a poetry.lock file. If that file exists, then Poetry will use whatever dependency versions it specifies.

Local dependencies

If you have only a few Lambdas, they can be self-contained: all source code in a single project. However, as you increase the number of projects, you’ll find that there’s code that can be shared between projects. For example, a module that applies a standard logging configuration, or one that creates a database connection based on the contents of a Secrets Manager secret.

It’s easy enough to create a new project to contain this shared code, but how do your other projects access it?

The correct solution is to stand up a repository server, such as CodeArtifact. You then publish your modules to that server, and configure Poetry to look there for all packages. The server also acts as a cache for public packages, which means that you’re being a good neighbor to PyPI. Moreover, once it has cached all the packages you use, you won’t be affected if PyPI experiences a denial-of-service attack.

But for many organizations, standing up a repository server is on the list of “things we’ll do when we get time.” And even if you already have a repository server, it won’t help you while actively developing the shared library in concert with its dependents.

To support that case, I use Poetry’s ability to reference packages by relative path:

[tool.poetry.dev-dependencies]
myco_shared = {path = "../shared/dist/myco_shared-1.2.3-py3-none-any.whl", develop = true}

There are a few important things to note about this:

  • It uses a relative path, which means that everybody has to check out projects in the same way. Not optimal, but not onerous either.
  • develop = true tells Poetry that it should symlink the package sources into your virtual environment, so that you can make edits in the shared library’s project and they’ll appear in your project.
  • It’s a development dependency. This requires a little more explanation.

The approach that I use for building a bundle — Poetry to create a “wheel” and then pip to install that wheel — doesn’t work when the wheel specifies a local file. If you try, you’ll get an exception from pip’s requirements parser, claiming that the relative path is an invalid URL (which, reading the pip requirements specification, makes sense).

I suspect it would be possible to use absolute paths and a file:// URL, but that isn’t easily transported between different machines. So instead, I mark the dependency as a “development” dependency, and then my build process explicitly installs it along with the dependent package:

poetry run pip install -t package dist/*.whl
poetry run pip install -t package ../shared/dist/myco_shared-1.2.3-py3-none-any.whl

Wrapping up

As I said at the beginning of this post, I prefer the Maven approach to dependency management: there’s a single directory on your machine that caches all packages, public and private, from which they can be shared between projects and easily assembled into a ZIP for deployment. However, I recognize that a large part of that ability is due to the Java classpath, which specifies individual library files, while Python’s sys.path specifies directories.

However, both the development turnaround time and Lambda startup time are significantly better with Python than Java. So for the foreseeable future, Python is my implementation language of choice, and Poetry my dependency manager of choice.
