Hi Tony, I think your analyses are correct. Especially, yes, if you re-read the data the (ts=3) data should still be considered late if both consumers read with the same speed. If, however, (ts=3) is read before the other consumer reads (ts=8) then it should not be considered late, as you said.
Best, Aljoscha > On 24. Aug 2017, at 15:49, Tony Wei <tony19920...@gmail.com> wrote: > > Hi, > > Recently, I studied about watermark from Flink documents and blogs. > > I have some question about this scenario below. > > Suppose there are five clients sending events with different time to the > topic on Kafka. > Topic has two partitions and five events' timestamp are (ts=1), (ts=2), > (ts=3), (ts=8), (ts=9). > The Flink streaming job uses the following setting: > 1. use AscendingTimestampExtractor > 2. client time as timestamp > 3. use tumbling window with 5 unit window size > 4. not allow late event > > If the client events out of order like this. > Partition A [(ts=1), (ts=8)] > Partition B [(ts=2), (ts=9)] <= (ts=3) delay > Should the window function emit [(ts=1), (ts=2)], keep [(ts=8), (ts=9] in > state and drop out (ts=3) ? > > If all events has come, and then replay the job from the beginning, the > partition state would be > Partition A [(ts=1), (ts=8)] > Partition B [(ts=2), (ts=9), (ts=3)] > Suppose two consumers fetch events with same speed, should the result be the > same as above? > If consumer B reads (ts=3) earlier than consumer A reads (ts=8), would (ts=3) > be placed in the window before watermark becomes to 8 and then emit [(ts=1), > (ts=2), (ts=3)] as result? > > I wonder if those questions are all correct. If not, is there any mechanisms > about watermark and window in Flink that I missed. > > Thank for your help. > > Best Regards, > Tony Wei >