In this week’s TechChat, we welcome Keith Gregory, our Cloud & Data Engineering Practice Lead here at Chariot. Keith is a prolific writer both on the Chariot blog as well as on his own, and is a wealth of knowledge on all things AWS. We touch on Redshift execution plans, how to appropriately size Redshift … Read More
My last few posts have focused on Redshift and Athena, two specialized tools for managing and querying Big Data. But there’s a meme that’s been floating around for at least a few years that you should just use Postgres for anything data-related. It may not provide all of the features and capabilities of a dedicated tool, but is one less thing to learn and manage. Should this advice also apply to your data warehouse?
I’ve always been a fan of database servers: self-contained entities that manage both storage and compute, and give you knobs to turn to optimize your queries. The flip side is that I have an inherent distrust of services such as Athena, which promise to run queries efficiently on structured data split between many files in a data lake. It just doesn’t seem natural; where are the knobs?
So, since I had data generated for my post on Athena performance with different file types, I decided to use that data in a performance comparison with Redshift.
Decision support databases have a number of quirks that are not obvious to the casual user, particularly someone coming from an OLTP background. In this post I look at how unbalanced distributions can impact your query performance, how you can identify imbalances, and what you can do to fix them.