In my “Friends Don’t Let Friends Use JSON” post, I noted that I preferred the Avro file format to Parquet because it was easier to write code against. I expected some pushback, and got it: Parquet is “much” more performant. So I decided to do some benchmarking.
Is Amazon’s new managed, lower-cost, petabyte-scale data warehousing solution a game changer? We’ll review the costs and discuss what does (or does not) make Amazon Redshift reliable, scalable, and effective. We’ll dive into the technical details behind its query and storage engines and show what works well and what does not. This talk should benefit both those who are already part of the Amazon Web Services ecosystem and those who are not.