Aggregating Files in your Data Lake – Part 1

As I’ve written in the past, large numbers of small files make for an inefficient data lake. But sometimes you can’t avoid small files. Our CloudTrail repository, for example, has 4,601,675 files as of this morning, 44% of which are under 1,000 bytes long. In this post, I develop a Lambda-based data pipeline to aggregate these files, storing them in a new S3 location partitioned by date. Along the way, I call out some of the challenges that face such a pipeline.

Small Data: a pipeline for low-latency decision support

In my last post, I said that I didn’t think Postgres was a good choice for a decision support database, compared with a task-specific DBMS such as Redshift. In this post I’m going to take the opposite stand, and say that there are cases where Postgres is appropriate: namely, low-latency systems that contain a limited amount of data.

Application Development Approaches in AWS Webinar

Chariot’s Ken Rimple, Director of Training/Mentoring Services, will take you through some sample AWS architectures and weigh the complexity, cost, and technical considerations of each one.

re:Invent Recap

Chariot’s AWS Practice Lead, Keith Gregory, recaps his experience at Amazon’s re:Invent conference in 2019.

Picking the Right AWS Compute Infrastructure

The correct compute platform depends on the workload that you’re running. This post contains criteria for picking the right environment from the choices that AWS gives you.