accuracy validation of streaming pipeline

2022-05-20 Thread vtygoss
Hi community! I'm working on migrating from a full-data pipeline (with Spark) to an incremental data pipeline (with Flink CDC), and I've run into a problem with accuracy validation between the Flink-based and Spark-based pipelines. For bounded data, it's simple to validate whether the two result sets are consistent.

Re: accuracy validation of streaming pipeline

2022-05-23 Thread Shengkai Fang
It's a good question. Let me ping @Leonard to share more thoughts.

Best,
Shengkai

vtygoss wrote on Fri, May 20, 2022 at 16:04:
> Hi community!
>
> I'm working on migrating from full-data-pipeline(with spark) to incremental-data-pipeline(with flink cdc), and i met a problem about accuracy validation bet

Re: accuracy validation of streaming pipeline

2022-05-24 Thread Shengkai Fang
Hi, all. From my understanding, validating the accuracy of the sync pipeline requires snapshotting the source and sink at some point. It is just like having a checkpoint that contains all the data at some time for both sink and source. Then we can compare the content in the checkpoint and find the differ
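
As a rough illustration of the compare step described above, here is a minimal Python sketch. It assumes both snapshots were taken at the same logical point and have already been loaded into memory as dictionaries keyed by primary key; the function name `diff_snapshots` and the sample rows are hypothetical and do not come from the thread.

```python
from typing import Any, Dict, Hashable, List, Tuple

Row = Tuple[Any, ...]


def diff_snapshots(
    source: Dict[Hashable, Row], sink: Dict[Hashable, Row]
) -> Dict[str, List[Hashable]]:
    """Compare two keyed snapshots and report which keys differ."""
    # Keys present in the source snapshot but absent from the sink.
    missing = sorted(k for k in source if k not in sink)
    # Keys present in the sink snapshot but absent from the source.
    unexpected = sorted(k for k in sink if k not in source)
    # Keys present on both sides whose row contents differ.
    mismatched = sorted(k for k in source if k in sink and source[k] != sink[k])
    return {
        "missing_in_sink": missing,
        "unexpected_in_sink": unexpected,
        "mismatched": mismatched,
    }


# Hypothetical sample data: primary key -> remaining columns.
source_snapshot = {1: ("alice", 10), 2: ("bob", 20), 3: ("carol", 30)}
sink_snapshot = {1: ("alice", 10), 2: ("bob", 25), 4: ("dave", 40)}

report = diff_snapshots(source_snapshot, sink_snapshot)
print(report)
# {'missing_in_sink': [3], 'unexpected_in_sink': [4], 'mismatched': [2]}
```

For tables too large to hold in memory, the same comparison would likely be pushed down to the engine (for example as a full outer join on the primary key), but the three categories of difference stay the same.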

Re: accuracy validation of streaming pipeline

2022-05-24 Thread Leonard Xu
Hi, vtygoss

> I'm working on migrating from full-data-pipeline(with spark) to incremental-data-pipeline(with flink cdc), and i met a problem about accuracy validation between pipeline based flink and spark.

Glad to hear that!

> For bounded data, it's simple to validate the two result se