Hi Aljoscha, I gather you guys aren't able to reproduce this.
Here are the answers to your questions:

> How do you ensure that you only shut down the brokers once Flink has read all the data that you expect it to read?

Ninad: I am able to see the number of messages received in the Flink job UI.

> And how do you ensure that the offset that Flink checkpoints in step 3) is the offset that corresponds to the end of your test data?

Ninad: I haven't explicitly verified which offsets were checkpointed. When I say that a checkpoint was successful, I am going by the Flink logs: Flink reports that my last successful checkpoint was, say, #7, and on recovery it restores its state from checkpoint #7.

> What is the difference between steps 3) and 5)?

Ninad: I didn't realize that windows are merged eagerly. I have a session window with a gap of 30 seconds. Once I see from the UI that all the messages have been received, I don't see the merge logs for another 30 seconds, which is why I thought the windows are merged only once the window trigger fires.

For example: I verified from the UI that all messages were received, and then saw this checkpoint in the logs:

2017-06-01 20:21:49,012 DEBUG org.apache.flink.streaming.runtime.tasks.StreamTask - Notification of complete checkpoint for task TriggerWindow(ProcessingTimeSessionWindows(30000), ListStateDescriptor{serializer=org.apache.flink.api.java.typeutils.runtime.TupleSerializer@e56b3293}, ProcessingTimeTrigger(), WindowedStream.apply(WindowedStream.java:521)) -> Sink: sink.http.sep (1/1)

A few seconds later, I see the windows being merged:

2017-06-01 20:22:14,300 DEBUG org.apache.flink.streaming.runtime.operators.windowing.MergingWindowSet - Merging [TimeWindow{start=1496348534287, end=1496348564287}, TimeWindow{start=1496348534300, end=1496348564300}] into TimeWindow{start=1496348534287, end=1496348564300}

So point 3) is referring to these "MergingWindowSet - Merging ..." logs, and point 4) is referring to the data in the windows being evaluated.

Hope this helps. Thanks.
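For anyone following along, here is a minimal sketch (not Flink's actual MergingWindowSet code; the function names are my own) of the session-window merge semantics that the second log line shows: each element initially gets its own window of [timestamp, timestamp + gap), and windows that overlap are merged into one window covering both. Plugging in the two timestamps from the log reproduces the merged window in the log output.

```python
GAP_MS = 30_000  # session gap from ProcessingTimeSessionWindows(30000)

def window_for(ts):
    """Each element initially gets its own window [ts, ts + gap)."""
    return (ts, ts + GAP_MS)

def merge_windows(windows):
    """Merge overlapping windows into single covering windows."""
    merged = []
    for start, end in sorted(windows):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous window: extend it to cover both.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

# The two element timestamps visible in the log excerpt:
w1 = window_for(1496348534287)  # TimeWindow{start=1496348534287, end=1496348564287}
w2 = window_for(1496348534300)  # TimeWindow{start=1496348534300, end=1496348564300}
print(merge_windows([w1, w2]))  # [(1496348534287, 1496348564300)]
```

The result matches the merged TimeWindow{start=1496348534287, end=1496348564300} in the log, which illustrates that merging happens as elements arrive, independently of when the trigger fires.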
-- View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Fink-KafkaProducer-Data-Loss-tp11413p13805.html Sent from the Apache Flink User Mailing List archive at Nabble.com.