Spark is an open-source computation platform for Big Data. Leaders in the Hadoop community, such as Cloudera, have embraced Spark as a replacement for MapReduce, the venerable standard for writing Hadoop jobs.
This talk explores why this change is needed. Spark provides two important benefits over MapReduce. First, it performs significantly better than MapReduce, and we’ll discuss why. Second, because Spark is implemented in Scala and rooted in functional programming, it provides better, more composable primitives that make it easier for developers to build a wide variety of high-performance applications. We’ll discuss these primitives and look at some example applications.