Rightsizing Data for Athena

Amazon Athena is a service that lets you run SQL queries against structured data files stored in S3. It takes a “divide and conquer” approach, spinning up parallel query execution engines that each examine only a portion of your data. The performance of these queries, however, depends on how you consolidate and partition your data. In this post I compare query times for a moderately large dataset, looking for the “sweet spot” between number of files and individual file size.

Using CodeArtifact with Poetry

In my last post I discussed how an artifact server is the best way to publish locally-developed Python packages. In this post, I show you how to set up the AWS CodeArtifact service and use it with pip and Poetry.

Three Approaches to Deploying Lambdas

“Traditional” deployment patterns separate the application from its infrastructure. Lambda deployments turn this model on its head, binding the infrastructure tightly to the running code. This can be a challenge, especially when developing in a team: it is all too easy for one developer to accidentally overwrite another’s work. In this post I look at several deployment options, and how they impact a development team.