Apache Spark

IoT on AWS – That’s Not A Data Lake…

This talk will review two common use cases for captured metric data: 1) real-time analysis, visualization, and quality assurance, and 2) ad-hoc analysis.

Real World Spark Lessons

I recently built a Spark job that runs every morning to collect the previous day's data from a few different data sources, join it with some reference data, perform a few aggregations, and write all of the results to Cassandra. All in roughly three minutes (not too shabby).
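The daily batch pattern described above has four steps: read yesterday's data, join it with reference data, aggregate, and write the results out. It can be sketched in plain Python. This is a minimal, hypothetical illustration, not the author's Spark job. The following are all assumptions made for illustration:

* the field names `sensor_id`, `ts` and `value`;
* the per-region aggregation;
* the function names;
* the in-memory dict standing in for a Cassandra table.

A real Spark job would express the same steps with DataFrame filters, joins and `groupBy` aggregations.

```python
from collections import defaultdict
from datetime import date, datetime, timedelta


def previous_day_window(today):
    """Return the half-open [start, end) interval covering the day before `today`."""
    start = datetime.combine(today - timedelta(days=1), datetime.min.time())
    return start, start + timedelta(days=1)


def run_daily_job(sources, reference, today, sink):
    """Hypothetical sketch: field names, the region lookup and the dict sink
    (standing in for Cassandra) are assumptions, not the author's code."""
    start, end = previous_day_window(today)
    totals = defaultdict(float)
    counts = defaultdict(int)
    for source in sources:            # several independent data sources
        for row in source:
            if not (start <= row["ts"] < end):
                continue              # keep only the previous day's records
            region = reference.get(row["sensor_id"])
            if region is None:
                continue              # inner-join semantics: drop unmatched rows
            totals[region] += row["value"]
            counts[region] += 1
    # "Write" one aggregate row per (day, region) key to the sink.
    for region, total in totals.items():
        sink[(start.date(), region)] = {"total": total, "count": counts[region]}
    return sink


# Illustrative sample data (made up for this example).
sources = [
    [{"sensor_id": "a", "ts": datetime(2015, 6, 1, 8), "value": 2.0},
     {"sensor_id": "b", "ts": datetime(2015, 6, 2, 1), "value": 5.0}],    # today: skipped
    [{"sensor_id": "a", "ts": datetime(2015, 6, 1, 23, 59), "value": 3.0},
     {"sensor_id": "z", "ts": datetime(2015, 6, 1, 12), "value": 9.0}],   # no reference row
]
reference = {"a": "east", "b": "west"}
result = run_daily_job(sources, reference, date(2015, 6, 2), {})
print(result)
```

Filtering on a half-open `[start, end)` window means a record stamped exactly at midnight is counted on one day only.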

TechCast #102 – Sameer Farooqui and Brian Clapper on Spark

Duration: 25:36 (35.9 MB)

Today's podcast features Ken Rimple's interview with Sameer Farooqui and Brian Clapper of Databricks, the company founded by the creators of the Apache Spark big data engine.

SBT: Group annotated tests to run in forked JVMs

Running tests that use a HiveContext

On our current project, we use Spark SQL and have several ScalaTest-based suites that require a SparkContext and a HiveContext. These are started before a suite runs and shut down after it completes via ScalaTest's BeforeAndAfterAll mixin trait. Unfortunately, due to this bug (also see this related pull …

Philly ETE 2015 #9 – Helena Edelson – Streaming Big Data with Spark, Spark Streaming, Kafka, Cassandra and Akka

This talk presents Apache Spark, Spark Streaming, Apache Kafka, Apache Cassandra and Akka as components supporting a Lambda architecture in the context of a fault-tolerant, streaming big data pipeline.
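A Lambda architecture has three parts:

* a batch layer that periodically recomputes views from the complete, immutable event log;
* a speed layer that covers only the events since the last batch run;
* a serving step that merges the two views at query time.

The toy sketch below assumes a simple per-key event count. The class and method names are hypothetical. The comments map each piece to Kafka, Spark, Spark Streaming or Cassandra; that mapping is an assumption about typical usage of these tools, not a description of the talk's pipeline.

```python
from collections import Counter


class LambdaCounter:
    """Toy Lambda architecture counting events per key (hypothetical sketch)."""

    def __init__(self):
        self.master_log = []         # append-only master dataset (assumed Kafka/Cassandra role)
        self.batch_view = Counter()  # recomputed by the batch layer (assumed Spark role)
        self.speed_view = Counter()  # incremental recent counts (assumed Spark Streaming role)

    def ingest(self, key):
        # Every event goes to the immutable log and to the low-latency speed view.
        self.master_log.append(key)
        self.speed_view[key] += 1

    def run_batch(self):
        # Recompute from scratch over the whole log. This toy batch run is
        # synchronous, so the new batch view covers every event the speed view
        # held and the speed view can be reset.
        self.batch_view = Counter(self.master_log)
        self.speed_view = Counter()

    def query(self, key):
        # Serving layer: merge the batch and real-time views.
        return self.batch_view[key] + self.speed_view[key]


lc = LambdaCounter()
for k in ["temp", "temp", "humidity"]:
    lc.ingest(k)
before_batch = lc.query("temp")   # served entirely from the speed view
lc.run_batch()
lc.ingest("temp")
after_batch = lc.query("temp")    # batch view plus one new speed-layer event
print(before_batch, after_batch)
```

Recomputing the batch view from the immutable log is what gives the pattern its fault tolerance. Any error in the incremental speed view is overwritten at the next batch run.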