This talk will address valuable lessons learned with the current versions of HBase. Some inherent architectural features warrant careful evaluation of the data schema and of how to scale out a cluster. The audience will get a best-practices summary of where HBase's design has limitations and how to avoid them. In particular, we will discuss proper memory tuning (for reads and writes), optimal flush file sizing, compaction tuning, and the number of write-ahead logs (WALs) required. The talk also compares theoretical write performance with the performance observed on real clusters. It closes with a collection of cheat sheets and example calculations for cluster sizing.
We are holding an all-day event on October 30th at the Cira Centre in downtown Philadelphia. It shines a light on large-scale data processing and application management. In this article I'll explain the event's goals and share some information about the speakers and talks we've been lining up.
From the abstract: “Over the past two years Tumblr has experienced tremendous growth, with traffic growing more than 10x from less than 1.6B pageviews a month to nearly 20B pageviews a month. Tumblr started in 2007 as a traditional LAMP application with some memcache usage. Over the past two years Tumblr has moved towards a …”