Hi,

We are planning to build a real-time monitoring system with Apache Kafka. The
overall idea is to push data from multiple data sources into Kafka and perform
data quality checks on it. I have a few questions about this architecture:

1. What are the best approaches for streaming data to Apache Kafka from
multiple sources, mainly Java applications, an Oracle database, REST APIs, and
log files? Note that each client deployment includes all of these source
types, so the number of data sources pushing to Kafka would be the number of
customers * x, where x is the number of source types listed above. Ideally a
push approach would suit us better than a pull approach: with pull, the
central system would have to be configured with credentials for every source
system, which would not be practical. (A rough producer sketch of what I have
in mind follows this list.)
2. How do we handle failures, for example when a source temporarily cannot
reach the Kafka cluster or a message fails to be delivered?
3. How do we perform data quality checks on the incoming messages? For
example, if a message does not have all the required attributes, it could be
discarded and an alert raised for the maintenance team to investigate. (A
rough validation sketch also follows below.)

Kindly let me know your expert input. Thanks!




