I will discuss the experience at LinkedIn and elsewhere moving from batch-oriented ETL to real-time streams using Apache Kafka. I’ll talk about how the design and implementation of Kafka were driven by this goal of acting as a real-time platform for event data.
From the abstract: MapReduce begat Hadoop begat Big Data. NoSQL moved us away from the stricture of monolithic storage architectures to fit-for-purpose designs. But, Houston, we still have a problem – architects are still designing systems like they did in the ’70s. Yet most systems are still designed for store-then-compute rather than to observe, …
Learn the core uses of ZooKeeper in the wild and why it is suited to these use cases. I will also talk about systems that don’t use ZooKeeper and why that can be the right decision. Finally, I will discuss the common challenges of running ZooKeeper as a service and things to look out for when architecting a deployment.
Spark provides two important benefits compared to MapReduce. First, its performance is significantly better than MapReduce’s. We’ll discuss why. Second, because Spark is implemented in Scala and rooted in the world of functional programming, it provides better, more composable primitives that make it easier for developers to create a wide variety of high-performance applications. We’ll discuss these primitives and look at some example applications.
From the abstract: ZooKeeper is everywhere these days. It’s a core component of the Hadoop ecosystem. Your favorite startup probably uses it internally. But as every good skeptic knows, just because something is popular doesn’t mean you should use it. In this talk I will go over the core uses of ZooKeeper in the wild …
In this talk, Hive and Cassandra author (and Hive committer and PMC member) Edward Capriolo will discuss common big-data software challenges and how they can be solved using both batch and stream processing. The technology focus will be primarily on Apache Kafka for publish-subscribe messaging, Storm for stream processing, and Apache Cassandra as a NoSQL data store.
We are holding an all-day event on October 30th, downtown in the Philadelphia Cira Centre, that shines a light on large-scale data processing and application management. In this article I’m going to explain a bit about the event’s goals and share some information on the speakers and talks we’ve been lining up.
This week we feature an interview with Toby DiPasquale of Invite Media. Toby and I discuss the MapReduce algorithm, which is the engine that powers Google’s indexing and data processing systems. We start off by discussing how Google started indexing pages, using traditional methods such as C/C++ routines. This quickly became unmanageable, as the amount …