Spark is becoming a data processing giant, but it leaves much as an exercise for the user. Developers need to write specialized logic to move between batch and streaming modes, manually deal with late or out-of-order data, and explicitly wire complex flows together. This talk looks at how we tackled these problems over a multi-petabyte dataset at Cerner.
In today’s world of exploding big and fast data, developers who want both streaming analytics and ad hoc, OLAP-like analysis have often had to build complex architectures such as Lambda, which pairs a fast streaming path using NoSQL stores such as Cassandra and HBase with a separate batch path using HDFS and Parquet. While this approach works, it involves too many moving parts, too many technologies for ops to manage, and too many engineering hours. Helena Edelson and Evan Chan highlight a much simpler approach to combining streaming and ad hoc/batch analysis using what they call the NoLambda stack (Apache Spark/Scala, Mesos, Akka, Cassandra, Kafka), plus FiloDB, a new entrant to the distributed-database world that combines streaming and ad hoc analytics.
In this talk we present a business use case in which Capital One needs to process customer activities in real time and react to events appropriately. We then present our experience building a real-time analytics application that serves the business, using a set of open source software frameworks with Apache Flink at its core as the real-time stream processing engine.
This talk presents Apache Spark, Spark Streaming, Apache Kafka, Apache Cassandra and Akka as supporting the Lambda architecture in the context of a fault-tolerant, streaming big data pipeline. We will walk through the fault-tolerance story these technologies offer for building applications. We will also show how to implement and integrate them in a Scala Akka application for real-time delivery of meaning at high velocity, in highly distributed and concurrent environments.
While we keep our eye on all kinds of emerging technologies, here are five in particular we’ll be paying attention to in 2015:
Spark provides two important benefits over MapReduce. First, its performance is significantly better than that of MapReduce, and we’ll discuss why. Second, because Spark is implemented in Scala and rooted in the world of functional programming, it provides better, more composable primitives that make it easier for developers to create a wide variety of high-performance applications. We’ll discuss these primitives and look at some example applications.
Today’s Spring is easy to get started with, easy to learn, and embraces convention over configuration. Join Spring developer David Turanski as he takes you on a tour of today’s Spring, including the Spring.IO platform, Spring Boot, WebSocket support, Spring HATEOAS, and more! This is a Spring you may not have seen yet.
Tracey Welson-Rossman talks to Anita Garamella Andrews, VP of Client Analytics Services at R.J. Metrics, about analytics and actionable data.
Sujan Kapadia writes: “This year I’ve started going to the DataPhilly meetups, and I think I’m hooked. The bottom line is that DataPhilly talks are very intriguing, expose you to topics you don’t encounter every day, and give you the chance to meet ‘non-traditional’ developers (scientists and statisticians), whose ranks are rapidly growing.”