AWS Resources

All Blog Posts

Transforming Data with Amazon Athena

My prior posts used Lambda to do data transformation. But what if we could use a non-programmatic tool, in keeping with the Extract-Load-Transform mindset of the modern data pipeline. As it turns, we can: Amazon Athena can write data as well as query it. There are, of course, a few stumbles along the way. In this blog post I walk through the process of aggregating CloudTrail data using SQL.

Aggregating Files in your Data Lake – Part 1

As I’ve written in the past, large numbers of small files make for an inefficient data lake. But sometimes, you can’t avoid small files. Our CloudTrail repository, for example, has 4,601,675 files as-of this morning, 44% of which are under 1,000 bytes long. In this post, I develop a Lambda-based data pipeline to aggregate these files, storing them in a new S3 location partitioned by date. Along the way I call out some of the challenges that face such a pipeline.

Small Data: a pipeline for low-latency decision support

In my last post, I said that I didn’t think Postgres was a good choice for a decision support database, versus a task-specific DBMS such as Redshift. In this post I’m going to take the opposite stand, and say that there are cases where Postgres is appropriate: namely, low-latency systems that contain a limited amount of data.

Why Not Just Use Postgres?

My last few posts have focused on Redshift and Athena, two specialized tools for managing and querying Big Data. But there’s a meme that’s been floating around for at least a few years that you should just use Postgres for anything data-related. It may not provide all of the features and capabilities of a dedicated tool, but is one less thing to learn and manage. Should this advice also apply to your data warehouse?

1 2 8

Featured Videos

All the AWS CodeBuild You Can Stomach in 45 Minutes

In this 45 minute talk, Ken Rimple gives a quick overview of AWS CodeBuild, then dives into a few of the challenges he’s faced, from dealing with build errors properly, configuring CodeBuild to run inside of AWS, testing locally so you don’t go crazy waiting for 15 minutes each time you deploy a new build, how to properly access your build artifacts and reports, running tools like Cypress, to building and deploying Docker containers to ECS, and more.

AWS: Things I Learned the Hard Way

Amazon Web Services (AWS) is a collection of nearly 200 services. They can be intimidating to the newcomer, and offer many opportunities for mistakes: some expensive, some just inconvenient. In this Lunch and Learn, our panel of AWS experts look at some of the mistakes they made, and how these could have been avoided.

All Videos

Looking to discuss an AWS project with our team? Contact us.