Analysis

Data Engineering is more SRE than SQL

Following my post about the Chariot Data Engineering interview, I received some comments along the lines of “wait, you don’t test their SQL skills?!?” Actually, we do: after loading up the test data into Redshift, the candidate creates three progressively difficult queries. But by then, I’m pretty sure they’ve got the skills we need, because …

The Importance of Communication in MVP Product Design

Successful collaboration requires great two-way communication. That sentiment is core to our product design philosophy here at Chariot Solutions. A critical part of our job is to help balance client and user needs against a project’s budget and technology realities. This is especially true when building a minimum viable product (MVP). If budget and technology …

Why Not Just Use Postgres?

My last few posts have focused on Redshift and Athena, two specialized tools for managing and querying Big Data. But there’s a meme that’s been floating around for at least a few years that you should just use Postgres for anything data-related. It may not provide all of the features and capabilities of a dedicated tool, but it is one less thing to learn and manage. Should this advice also apply to your data warehouse?

Electron, not a walk in the park

Recently, a project I worked on was considering using Electron as a fallback technology for an initial Progressive Web Application. At the time, the assumption was that since Electron uses Chromium, a browser, it should allow application developers to not only use the features of a PWA but also gain native access to technologies, such …

Performance Comparison: Athena versus Redshift

I’ve always been a fan of database servers: self-contained entities that manage both storage and compute, and give you knobs to turn to optimize your queries. The flip side is that I have an inherent distrust of services such as Athena, which promise to run queries efficiently on structured data split between many files in a data lake. It just doesn’t seem natural; where are the knobs?

So, since I had data generated for my post on Athena performance with different file types, I decided to use that data in a performance comparison with Redshift.

Athena Performance Comparison: Avro, JSON, and Parquet

In my “Friends Don’t Let Friends Use JSON” post, I noted that I preferred the Avro file format to Parquet, because it was easier to write code to use it. I expected some pushback, and got it: Parquet is “much” more performant. So I decided to do some benchmarking.