I think this architecture design has the following problems:

 

1. It uses the Kafka Java API to consume the Kafka data; if the data volume is large, this will cause a backlog.

2. If the downstream is a Flink job, consuming Kafka with the plain Kafka Java API is not fault-tolerant: once a failure occurs, data will be lost, and exactly-once semantics cannot be guaranteed (see the sketch after this list).

3. We consume data from the original Kafka topic and write it to a new topic, which causes data redundancy and duplication.
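
To illustrate point 2: instead of a plain Kafka Java API consumer, the downstream Flink job could read Kafka directly through Flink's Kafka connector, so the consumed offsets become part of Flink's checkpointed state. Below is a minimal sketch assuming the newer KafkaSource connector (Flink 1.12+); the broker address, topic name, and group id are placeholders.

    import org.apache.flink.api.common.eventtime.WatermarkStrategy;
    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.connector.kafka.source.KafkaSource;
    import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class KafkaSourceJob {

        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            // Checkpointing makes the consumed offsets part of Flink's fault-tolerance
            // state, so a failed job restarts from the last checkpoint instead of losing data.
            env.enableCheckpointing(60_000);

            // Placeholder broker address, topic name and group id.
            KafkaSource<String> source = KafkaSource.<String>builder()
                    .setBootstrapServers("localhost:9092")
                    .setTopics("input-topic")
                    .setGroupId("flink-pipeline")
                    .setStartingOffsets(OffsetsInitializer.earliest())
                    .setValueOnlyDeserializer(new SimpleStringSchema())
                    .build();

            // The source participates in checkpointing, so records are processed with
            // exactly-once state semantics inside the job.
            env.fromSource(source, WatermarkStrategy.noWatermarks(), "kafka-source")
               .print();

            env.execute("flink-kafka-source-job");
        }
    }

Note that end-to-end exactly-once delivery additionally requires a transactional sink; checkpointing alone guarantees exactly-once state consistency within the job.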


From: <[email protected]> on behalf of zhangjun 
<[email protected]>
Reply-To: <[email protected]>
Date: Wednesday, August 4, 2021 at 5:36 PM
To: <[email protected]>, <[email protected]>
Subject: Some questions about kafka adapter


Hi all,

I found that when I create a pipeline with a Kafka adapter, it will first consume data from Kafka, write it to a new, randomly generated topic, and then consume data from that new topic. I think this causes data redundancy. I would like to know why it is done this way.

 

Is there any relevant design documentation for the StreamPipes project?

 

Thanks.
