map reduce

Data I/O 2013 – Web-scale Data Processing: Practical approaches for low-latency and batch – Edward Capriolo

In this talk, Edward Capriolo, a Hive and Cassandra author as well as a Hive committer and PMC member, will discuss common big-data software challenges and how they can be solved using both batch and stream processing. The technology focus will be primarily on Apache Kafka for publish-subscribe messaging, Storm for stream processing, and Apache Cassandra as a NoSQL data store.

TechCast #13 – Toby DiPasquale on Google, Map-Reduce, Hadoop, Amazon EC2 and more

This week we feature an interview with Toby DiPasquale of Invite Media. Toby and I discuss the Map-Reduce algorithm, the engine that powers Google’s indexing and data processing systems. We start off by discussing how Google started indexing pages using traditional methods such as C/C++ routines. This quickly became unmanageable, as the amount …