I’ve been investigating a data duplication issue in a Kinesis -> Flink -> Kafka
exactly-once setup.

I found that on stop-with-savepoint the following things happen:

  *   The Kafka transaction is committed, so the last processed events are
written to Kafka.
  *   The Kinesis sequence number is written to the savepoint _metadata file.

The problem is that the Kinesis connector does not write to the savepoint file
the sequence number corresponding to the last event written to Kafka! Instead,
it writes the same sequence number as the one from the last checkpoint.

As a result, after resuming the job from the savepoint, it duplicates all the
records between the last checkpoint and the savepoint (which were already
committed to Kafka at the savepoint).
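
To make the sequence concrete, here is a toy Python model of what I think is
happening. It is not Flink or connector code: the function, its parameters and
the record numbers are all made up for illustration, and it only assumes that
the sink commits at every checkpoint/savepoint while the source resumes right
after the stored sequence number.

```python
def run(stream, checkpoint_at, stop_at, savepoint_uses_stale_seq):
    """Toy model (hypothetical, not the real connector).

    Returns every record committed to the sink across the first run
    and the run resumed from the savepoint.
    """
    committed = []

    # First run: the checkpoint commits everything processed so far
    # and stores the source position.
    committed.extend(stream[:checkpoint_at])
    checkpoint_seq = checkpoint_at

    # Stop with savepoint: the records processed since the checkpoint
    # are committed to the sink as well.
    committed.extend(stream[checkpoint_at:stop_at])

    # Position stored in the savepoint: either the stale checkpoint
    # position (the behavior described above) or the real one.
    savepoint_seq = checkpoint_seq if savepoint_uses_stale_seq else stop_at

    # Second run: resume reading right after the stored position.
    committed.extend(stream[savepoint_seq:])
    return committed


stream = list(range(1, 13))  # records 1..12
buggy = run(stream, checkpoint_at=6, stop_at=10, savepoint_uses_stale_seq=True)
correct = run(stream, checkpoint_at=6, stop_at=10, savepoint_uses_stale_seq=False)

# Records that reached the sink more than once in the stale-position case.
duplicates = sorted(r for r in set(buggy) if buggy.count(r) > 1)
print("duplicates:", duplicates)
print("no duplicates with the correct position:", correct == stream)
```

In this model the duplicated records are exactly the ones between the last
checkpoint and the savepoint, which matches what I see in Kafka.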

I believe this is a Kinesis connector issue, because I tried a Kafka -> Flink
-> Kafka setup and the erroneous behavior does not reproduce there.

Has anybody else run into this situation?
