Data integration is a hard problem. We know this because, by common estimates, 80% of the effort in a data project goes into getting the data you want into the shape you want it. We know this because the problem remains challenging despite 40 years of attempts to solve it. All we want is a service that is reliable, handles all kinds of data, and integrates with all kinds of systems, especially stream processing applications. A service that is easy to manage and that scales as our systems grow. Oh, and we want low latency too. Is that too much to ask?
In this presentation, we’ll discuss the basic challenges of data integration, separate what is truly important from what is merely nice to have, and introduce design and architecture patterns used to tackle these challenges. We will then explore how these patterns can be implemented with Apache Kafka. We offer no silver bullets; rather, we will share pragmatic solutions that many engineering organizations have used to build fast, scalable, and manageable data pipelines.