Data

Unbalanced Data in Redshift

Decision support databases have a number of quirks that are not obvious to the casual user, particularly someone coming from an OLTP background. In this post I look at how unbalanced distributions can impact your query performance, how you can identify imbalances, and what you can do to fix them.

Twenty Years of Big Data

More, cheaper, faster: our own Keith Gregory recounts the changes in big data, data storage, and data engineering over the last two decades.

Rightsizing Data for Athena

Amazon Athena is a service that lets you run SQL queries against structured data files stored in S3. It takes a “divide and conquer” approach, spinning up parallel query execution engines that each examine only a portion of your data. The performance of these queries, however, depends on how you consolidate and partition your data. In this post I compare query times for a moderately large dataset, looking for the “sweet spot” between number of files and individual file size.

Hunting the Wolf: App UX and Database Review

The Wolf Golf Scorecard app for Android, by Rod Biresch of Chariot Solutions, is a record-keeping application for a classic four-player golf game. Recently, Chariot had the opportunity to refresh the look and feel of the application (first released in 2016) and improve its usability through a complete design deep-dive process.

Introducing Team Data

This post is a quick primer on the job titles and skills best suited to the responsibilities along your company’s data pipeline.