In my “Friends Don’t Let Friends Use JSON” post, I noted that I preferred the Avro file format to Parquet, because it was easier to write code to use it. I expected some pushback, and got it: Parquet is “much” more performant. So I decided to do some benchmarking.
In my last post I recommended using Avro for file storage in a data lake. It has the benefits of compact storage and a schema in every file that tells you what data it holds. In this post I show three ways to generate Avro files: one in Java, and two in Python.
I’ve never been a JSON hater, but I’ve recently run into enough pain with JSON as a data serialization format that my feelings are edging toward dislike. However, JSON is a fact of life in most data pipelines, especially those that receive event-stream data from a third-party supplier. This post reflects on some of the problems that I’ve seen, and solutions that I’ve used
We focus on the new Java 8 JDK release, a tutorial on Apache Avro, a review of Ember by the Haydle team, and Don Coleman joins us to talk about Android Wear. Brought to you by the letter ‘T’ for technology!