I think this architecture design has the following problems:
1. Using the Kafka Java API to consume Kafka data: if the data volume is large, it will cause a data backlog.
2. If the downstream is a Flink job, consuming Kafka data with the Kafka Java API is not fault-tolerant. Once a problem occurs, data will be lost, and exactly-once semantics cannot be guaranteed (see the sketch at the end of this message).
3. Consuming data from the original Kafka topic and writing it to a new topic causes data redundancy and duplication.

From: <[email protected]> on behalf of zhangjun <[email protected]>
Reply-To: <[email protected]>
Date: Wednesday, August 4, 2021, 5:36 PM
To: <[email protected]>, <[email protected]>
Subject: Some questions about kafka adapter

Hi, all:

I found that when I create a pipeline with a Kafka adapter, it will first consume data from Kafka, write it to a new randomly generated topic, and then consume data from this new topic. I think this will cause data redundancy. I want to know why it is done this way. Is there any relevant design document for the StreamPipes project?

Thanks.
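To illustrate point 2, here is a minimal sketch of what I mean by letting the downstream Flink job consume the original topic directly through Flink's Kafka connector (KafkaSource, Flink 1.12+), so that offsets are tracked as part of Flink checkpoints instead of going through a plain Kafka Java API consumer and an intermediate topic. The broker address, topic name, and consumer group below are placeholders, not actual StreamPipes configuration:

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class DirectKafkaConsumption {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Checkpointing makes the Kafka offsets part of Flink's fault-tolerant state,
        // which is what allows exactly-once processing semantics on failure/recovery.
        env.enableCheckpointing(60_000);

        KafkaSource<String> source = KafkaSource.<String>builder()
                .setBootstrapServers("localhost:9092")      // placeholder broker address
                .setTopics("original-topic")                // placeholder: the original topic, no intermediate copy
                .setGroupId("downstream-flink-job")         // placeholder consumer group
                .setStartingOffsets(OffsetsInitializer.earliest())
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();

        env.fromSource(source, WatermarkStrategy.noWatermarks(), "kafka-source")
           .print(); // stand-in for the actual pipeline logic

        env.execute("direct-kafka-consumption");
    }
}

With this approach there is no extra produce/consume hop, so the redundancy described in point 3 also goes away.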
