parquet

Friends Don’t Let Friends Use JSON (in their data lakes)

I’ve never been a JSON hater, but I’ve recently run into enough pain with JSON as a data serialization format that my feelings are edging toward dislike. However, JSON is a fact of life in most data pipelines, especially those that receive event-stream data from a third-party supplier. This post reflects on some of the problems that I’ve seen, and solutions that I’ve used