Populating Iceberg Tables with Amazon Data Firehose
In this post I look at using Amazon Data Firehose to populate Iceberg tables, along with the automatic table optimization features that AWS announced in November 2024.
In my “Friends Don’t Let Friends Use JSON” post, I noted that I preferred the Avro file format to Parquet because it was easier to write code against. I expected some pushback, and got it: Parquet, I was told, is “much” more performant. So I decided to do some benchmarking.
I’ve never been a JSON hater, but I’ve recently run into enough pain with JSON as a data serialization format that my feelings are edging toward dislike. However, JSON is a fact of life in most data pipelines, especially those that receive event-stream data from a third-party supplier. This post reflects on some of the problems that I’ve seen, and the solutions that I’ve used.