hadoop

Philly ETE 2015 – Jay Kreps – Putting Apache Kafka to Use

I will discuss the experience at LinkedIn and elsewhere of moving from batch-oriented ETL to real-time streams using Apache Kafka. I’ll talk about how the design and implementation of Kafka were driven by this goal of acting as a real-time platform for event data.

Philly ETE #30 – Deconstructing the Lambda Architecture. A Small, Fast Data Geek’s Journey Through Big, Slow Data – Darach Ennis

From the abstract: MapReduce begat Hadoop begat Big Data. NoSQL moved us away from the stricture of monolithic storage architectures to fit-for-purpose designs. But, Houston, we still have a problem – architects are still designing systems like they did in the ’70s. Yet most systems are still designed for store-then-compute rather than to observe, … Read More

Philly ETE 2014 – Camille Fournier – ZooKeeper for the Skeptical Architect

Learn the core uses of ZooKeeper in the wild and why it is suited to these use cases. I will also talk about systems that don’t use ZooKeeper and why that can be the right decision. Finally, I will discuss the common challenges of running ZooKeeper as a service and things to look out for when architecting a deployment.

Philly ETE 2014 – Dean Wampler – Why Spark Is the Next Top (Compute) Model

Spark provides two important benefits compared to MapReduce. First, its performance is significantly better than that of MapReduce. We’ll discuss why. Second, because Spark is implemented in Scala and rooted in the world of functional programming, it provides better, more composable primitives that make it easier for developers to create a wide variety of high-performance applications. We’ll discuss these primitives and look at some example applications.

Data I/O 2013 – Web-scale Data Processing: Practical approaches for low-latency and batch – Edward Capriolo

In this talk, Edward Capriolo, author of books on Hive and Cassandra (and Hive committer and PMC member), will discuss common big-data software challenges and how they can be solved using both batch and stream processing. The technology focus will be primarily on Apache Kafka for publish-subscribe messaging, Storm for stream processing, and Apache Cassandra as a NoSQL data store.

DevNews #20 – Start, Finish, or Play the Game

Amongst our weaponry…
Understanding iOS 4 Backgrounding and Delegate Messaging @ Dr. Touch
mxcl’s homebrew at master – GitHub
Homebrew: OS X’s Missing Package Manager | Engine Yard Ruby on Rails Blog
amf.js – A Pure JavaScript AMF Implementation
The Incredible, Growing, Commercial Hadoop Market – GigaOM Pro
Another Chance To Win a TShirt: What … Read More

TechCast #13 – Toby DiPasquale on Google, Map-Reduce, Hadoop, Amazon EC2 and more

This week we feature an interview with Toby DiPasquale of Invite Media. Toby and I discuss the Map-Reduce algorithm, which is the engine that powers Google’s indexing and data processing systems. We start off by discussing how Google started indexing pages using traditional methods such as C/C++ routines. Quickly this became unmanageable, as the amount … Read More