avro Archives — Chariot Solutions

Athena Performance Comparison: Avro, JSON, and Parquet

In my “Friends Don’t Let Friends Use JSON” post, I noted that I preferred the Avro file format to Parquet, because it was easier to write code to use it. I expected some pushback, and got it: Parquet is “much” more performant. So I decided to do some benchmarking.

Avro Three Ways

In my last post I recommended using Avro for file storage in a data lake. It has the benefits of compact storage and a schema in every file that tells you what data it holds. In this post I show three ways to generate Avro files: one in Java, and two in Python.

Friends Don’t Let Friends Use JSON (in their data lakes)

I’ve never been a JSON hater, but I’ve recently run into enough pain with JSON as a data serialization format that my feelings are edging toward dislike. However, JSON is a fact of life in most data pipelines, especially those that receive event-stream data from a third-party supplier. This post reflects on some of the problems that I’ve seen, and solutions that I’ve used.

DevNews 83 – Through a distorted audio channel, we give you Java 8

Podcast (Duration: 29:47, 40.9MB)

We focus on the new Java 8 JDK release, a tutorial on Apache Avro, a review of Ember by the Haydle team, and Don Coleman joins us to talk about Android Wear. Brought to you by the letter ‘T’ for technology!