kafka

Philly ETE 2017 #25 – Stream All Things: Patterns of Modern Data Integration – Gwen Shapira

Data integration is a difficult problem. We know this because 80% of the time in every project is spent getting the data you want the way you want it. We know this because this problem remains challenging despite 40 years of attempts to solve it. All we want is a service that will be reliable, …

Philly ETE 2016 #8 – Ewen Cheslack-Postava – Demystifying Stream Processing with Apache Kafka

Kafka Streams represents a new design point in the stream processing space. Where most frameworks provide a service for running stream processing applications, Kafka Streams emphasizes low-overhead development that feels more like developing any other application.

Philly ETE 2016 #4 – Evan Chan – NoLambda: A new architecture combining streaming, ad hoc, machine learning, and batch analytics

In today’s world of exploding big and fast data, developers who want both streaming analytics and ad hoc, OLAP-like analysis have often had to develop complex architectures such as Lambda—a path for fast streaming analytics using NoSQL stores such as Cassandra and HBase with a separate batch path involving HDFS and Parquet. While this approach works, it involves too many moving parts, too many technologies for ops, and too many engineering hours. Helena Edelson and Evan Chan highlight a much simpler approach to combine streaming and ad hoc/batch analysis using what they call the NoLambda stack (Apache Spark/Scala, Mesos, Akka, Cassandra, Kafka), plus FiloDB, a new entrant to the distributed-database world that combines streaming and ad hoc analytics.

Philly ETE 2016 – Ewen Cheslack-Postava – Demystifying Stream Processing with Apache Kafka

Philly ETE 2016 – Evan Chan – NoLambda: A new architecture combining streaming, ad hoc, machine learning, and batch analytics

Webinar: Typesafe and Chariot – Building the Real-Time Organization

This webinar will describe a reference architecture using the Typesafe Reactive Platform and other tools such as Cassandra, Kafka, and Spark that can be used to build out the real-time organization.

Data I/O 2013 – Web-scale Data Processing: Practical approaches for low-latency and batch – Edward Capriolo

In this talk, Edward Capriolo, an author on Hive and Cassandra and a Hive committer and PMC member, will discuss common big-data software challenges and how they can be solved using both batch and stream processing. The technology focus will be primarily on Apache Kafka for publish-subscribe messaging, Storm for stream processing, and Apache Cassandra as a NoSQL data store.