Philly ETE 2016 – Evan Chan – NoLambda: A new architecture combining streaming, ad hoc, machine learning, and batch analytics

Abstract:

In today’s world of exploding big and fast data, developers who want both streaming analytics and ad hoc, OLAP-like analysis have often had to build complex architectures such as Lambda: a fast path for streaming analytics using NoSQL stores such as Cassandra and HBase, plus a separate batch path involving HDFS and Parquet. While this approach works, it involves too many moving parts, too many technologies for ops, and too many engineering hours. Helena Edelson and Evan Chan highlight a much simpler approach to combining streaming and ad hoc/batch analysis using what they call the NoLambda stack (Apache Spark/Scala, Mesos, Akka, Cassandra, Kafka), plus FiloDB, a new entrant to the distributed-database world that combines streaming and ad hoc analytics.
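
To make the contrast with Lambda concrete, here is a toy sketch in plain Python. It is not FiloDB, Spark, or Cassandra code, and every name in it (`Event`, `UnifiedStore`, `window_average`, `totals_by_sensor`) is hypothetical. It only illustrates the general idea of one write path feeding a single store that answers both a recent-window query and a full-history ad hoc query, rather than maintaining separate speed and batch paths.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """A hypothetical time-series reading (illustrative only)."""
    sensor: str
    timestamp: int  # seconds since an arbitrary epoch
    value: float


class UnifiedStore:
    """Toy in-memory stand-in for one store serving both query styles."""

    def __init__(self):
        self._events = []

    def ingest(self, event):
        # Single write path: each event is stored once, in one place.
        self._events.append(event)

    def window_average(self, sensor, now, window_seconds):
        # Streaming-style query: average over the most recent time window.
        recent = [
            e.value
            for e in self._events
            if e.sensor == sensor and now - window_seconds < e.timestamp <= now
        ]
        return sum(recent) / len(recent) if recent else None

    def totals_by_sensor(self):
        # Ad hoc query: aggregate over the full history of the same data.
        totals = defaultdict(float)
        for e in self._events:
            totals[e.sensor] += e.value
        return dict(totals)


store = UnifiedStore()
for ev in [
    Event("a", 10, 1.0),
    Event("a", 70, 3.0),
    Event("b", 75, 4.0),
    Event("a", 80, 5.0),
]:
    store.ingest(ev)

recent_avg = store.window_average("a", now=80, window_seconds=30)
history_totals = store.totals_by_sensor()
```

Both queries read the same stored events, so there is no second copy of the data to keep in sync. A real system would of course add partitioning, persistence, and a columnar layout, none of which this sketch attempts.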

Topics include:

  • Modern streaming and batch/ad-hoc architectures
  • Precise and scalable streaming ingestion using Apache Kafka, Akka, Spark Streaming, Cassandra, and FiloDB
  • How a unified streaming + batch stack can lower your TCO
  • What FiloDB is and how it enables fast analytics with competitive storage cost
  • Use cases involving time series, smart cities, and event data
  • Machine learning using Spark MLlib, without the need to export to HDFS
  • Combining streaming and historical/ad-hoc data analysis, including efficient longer-time window analysis

About Evan:

Evan loves to design, build, and improve bleeding-edge distributed data and backend systems using the latest in open source technologies. He is the creator of the FiloDB open-source distributed analytical database, as well as the Spark Job Server. He has led the design and implementation of multiple big data platforms based on Storm, Spark, Kafka, Cassandra, and Scala/Akka, including a columnar real-time distributed query engine. He is an active contributor to the Apache Spark project and a DataStax Cassandra MVP. He has built applications on Spark since version 0.8 and on Cassandra since version 0.6. He is a big believer in GitHub, open source, and meetups, and has given talks at various conferences including Spark Summit, Cassandra Summit, FOSS4G, and Scala Days. He holds bachelor's and master's degrees in Electrical Engineering, with distinction, from Stanford University. In his spare time he is a family man, photographer, foodie, avid Oakland Athletics fan, and committed follower of Jesus.