
IoT on AWS – That’s Not A Data Lake…

This talk will review two common use cases for captured metric data: 1) real-time analysis, visualization, and quality assurance, and 2) ad-hoc analysis. To support these use cases, metric data must be ingested properly once it is generated, using a robust and fault-tolerant streaming framework. The most common open source streaming options will be mentioned; however, this talk will focus on Apache Flink specifically. A brief discussion of Apache Beam will also be included in the context of the larger discussion of a unified data processing model.

Best practices around data persistence will be discussed, with the aim of eliminating confusion about the format data should take when it is ‘at rest’. Different serialization formats will be compared in the context of the most typical analysis use cases. Finally, fully managed solutions such as AWS Data Lake will be mentioned briefly, along with their relative advantages and disadvantages.

By Eric Snyder, Software Architect at Chariot Solutions