big data

That’s not a Data Lake, THIS is a Data Lake – IoT on AWS – A Philly Cloud Computing Event

This talk will review two common use cases for captured metric data: 1) real-time analysis, visualization, and quality assurance, and 2) ad-hoc analysis. The most common open source streaming options will be mentioned; however, this talk will focus on Apache Flink specifically. A brief discussion of Apache Beam will also be included in the context of the larger discussion of a unified data processing model.

Philly ETE 2017 #40 – Scaling with Apache Spark (or a lesson in unintended consequences) – H. Karau

Apache Spark is one of the most popular general purpose distributed systems of the past few years. Apache Spark has APIs in Scala, Java, and Python, and more recently there have been a few different attempts to provide support for R, C#, and Julia. This talk looks at Apache Spark from a performance/scaling point of view and the work we …

PHLAI – Comcast's Artificial Intelligence Conference

I was lucky enough last week to attend PHLAI, a Comcast-sponsored conference on machine learning and artificial intelligence. The dreary weather did not dampen our spirits as practitioners and business stakeholders met to discuss one of the most important trends in our lifetime.

The O'Reilly AI Conference

I recently attended the O’Reilly AI Conference in New York, where artificial intelligence practitioners showcased the impressive strides they’ve made so far in using AI for real-world applications.

Philly ETE 2017 #38 – Build a Better Monster: Morality, Machine Learning and Mass Surveillance – M. Ceglowski

The tech industry is in the middle of a massive, uncontrolled social experiment. Having made commercial mass surveillance the economic foundation of our industry, we are now learning how indiscriminate collections of personal data, and the machine learning algorithms they fuel, can be put to effective political use. Unfortunately, these experiments are being run in …

Philly ETE 2017 #15 – Scio: Moving Big Data to Google Cloud, a Spotify Story – Neville Li

We will talk about Spotify’s story of migrating our big data infrastructure to Google Cloud. Over the past year or so we moved away from maintaining our own 2500+ node Hadoop cluster to managed services in the cloud. We replaced two key components in our data processing stack, Hive and Scalding, with BigQuery and Scio …

TechCast #102 – Sameer Farooqui and Brian Clapper on Spark

Today’s podcast features Ken Rimple’s interview with Sameer Farooqui and Brian Clapper of Databricks, the creators of the Spark Big Data engine.

Philly ETE 2016 #10 – Ryan Brush – Untangling Healthcare with Spark and Dataflow

Spark is becoming a data processing giant, but it leaves much as an exercise for the user. Developers need to write specialized logic to move between batch and streaming modes, manually deal with late or out-of-order data, and explicitly wire complex flows together. This talk looks at how we tackled these problems over a multi-petabyte dataset at Cerner.

Philly ETE 2016 #9 – Srinivas Palthepu – Emergence of Real-Time Analytics: Real-time Analysis of Customer Financial Activities With Apache Flink

In this talk we present a business use case where Capital One needs to process customer activities in real time and react to events appropriately as needed. We then present our experience in building a real-time analytics application that serves the business using a set of open source software frameworks, with Apache Flink at its core as the real-time stream processing engine.

Philly ETE 2016 – Ewen Cheslack-Postava – Demystifying Stream Processing with Apache Kafka

Kafka Streams represents a new design point in the stream processing space. Where most frameworks provide a service for running stream processing applications, Kafka Streams emphasizes low-overhead development that feels more like developing any other application.