TechCast #13 – Toby DiPasquale on Google, Map-Reduce, Hadoop, Amazon EC2 and more


This week we feature an interview with Toby DiPasquale of Invite Media.  Toby and I discuss the Map-Reduce algorithm, which is the engine that powers Google’s indexing and data processing systems.  We start off by discussing how Google started indexing pages, using traditional methods such as C/C++ routines.  This quickly became unmanageable, as the amount of data to index outstripped both the available processing power and traditional data transformation paradigms.

Toby and I then discuss Map-Reduce, which was originally posited as a thesis and then published as a seminal paper in the community.  Map-Reduce has been implemented by Google, and as we’ll see in the podcast, others followed suit and created the Hadoop engine, a Java-based Map-Reduce solution.

We talk about Hadoop and its various subprojects, and then get into a discussion on Amazon EC2 and the Cloud Computing movement, including why it is valuable to organizations that want to scale from one to potentially dozens of CPUs.

I’ll post the show notes early next week.  Until then, enjoy the show; comments are always welcome.

Note:  the podcast audio got a bit distorted on Toby’s side, but I don’t think it distracts too much.  Rather than re-record the interview I’m presenting it as-is.