Hi, We are planning to build a real-time monitoring system with Apache Kafka. The overall idea is to push data from multiple data sources to Kafka and perform data quality checks on it. I have a few questions about this architecture:
1. What are the best approaches for streaming data into Apache Kafka from multiple sources, mainly Java applications, Oracle databases, REST APIs, and log files? Note that each client deployment includes every one of these data sources, so the number of sources pushing data to Kafka would be (number of customers) * x, where x is the number of source types listed above. Ideally a push approach would suit us better than a pull approach: with pull, the target system would have to be configured with the credentials of every source system, which would not be practical.

2. How do we handle failures?

3. How do we perform data quality checks on the incoming messages? For example, if a message does not have all the required attributes, it could be discarded and an alert raised for the maintenance team to check.

Kindly let me know your expert inputs.

Thanks!
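
P.S. To make question 3 concrete, here is a rough sketch of the kind of check we have in mind. It assumes messages arrive as JSON objects; the field names in REQUIRED_FIELDS and the helper names check_message and route are made up for illustration, and plain Python lists stand in for the Kafka topics and the alerting channel, so this is not tied to any particular Kafka client.

```python
import json

# Hypothetical required attributes; the real list depends on the message schema.
REQUIRED_FIELDS = ("customer_id", "source_type", "timestamp", "payload")


def check_message(raw):
    """Parse one raw message and report which required attributes are missing.

    Returns (record, missing). record is None when the message is not a
    JSON object, in which case every required field counts as missing.
    """
    try:
        record = json.loads(raw)
    except (ValueError, TypeError):
        return None, list(REQUIRED_FIELDS)
    if not isinstance(record, dict):
        return None, list(REQUIRED_FIELDS)
    missing = [f for f in REQUIRED_FIELDS if record.get(f) is None]
    return record, missing


def route(raw, accepted, rejected, alerts):
    """Keep valid messages; discard invalid ones and record an alert.

    The three lists are placeholders for an output topic, a
    rejected-message topic, and an alerting channel.
    """
    record, missing = check_message(raw)
    if missing:
        rejected.append(raw)
        alerts.append("missing attributes: " + ", ".join(missing))
    else:
        accepted.append(record)
```

In a real deployment we imagine route would be called once per consumed message, with the rejected messages kept somewhere the maintenance team can inspect them rather than being dropped silently.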