In this talk, Hive and Cassandra author (and Hive committer and PMC member) Edward Capriolo will discuss common big-data software challenges and how they can be solved using both batch and stream processing. The talk will focus primarily on Apache Kafka for publish-subscribe messaging, Apache Storm for stream processing, and Apache Cassandra as a NoSQL data store.
This week we feature an interview with Toby DiPasquale of Invite Media. Toby and I discuss the MapReduce algorithm, which is the engine that powers Google’s indexing and data processing systems. We start off by discussing how Google started indexing pages using traditional methods such as C/C++ routines. This quickly became unmanageable, as the amount …