IoT on AWS – That’s Not A Data Lake…
This talk will review two common use cases for the use of captured metric data: 1) Real-time analysis, visualization, and quality assurance, and 2) Ad-hoc analysis.
This talk will review two common use cases for the use of captured metric data: 1) Real-time analysis, visualization, and quality assurance, and 2) Ad-hoc analysis.
I recently built a Spark job that runs every morning to collect the previous day’s data from a few different datasources, join some reference data, perform a few aggregations and write all of the results to Cassandra. All in roughly three minutes (not too shabby).
Podcast: Play in new window | Download (Duration: 25:36 — 35.9MB) | Embed
Today’s podcast features Ken Rimple’s interview with Sameer Farooqui and Brian Clapper of DataBricks, the creators of the Spark Big Data engine.
SbtTestGrouping Running tests that use a HiveContext On our current project, we utilize Spark SQL and have several ScalaTest based suites which require a SparkContext and HiveContext. These are started before a suite runs and shut down after it completes via the BeforeAfterAll mixin trait. Unfortunately due to this bug (also see this related pull … Read More
This talk presents Apache Spark, Spark Streaming, Apache Kafka, Apache Cassandra and Akka as supporting Lambda architecture in the context of a fault tolerant, streaming big data pipeline.