Hi Tony,

I think your analyses are correct. Especially, yes, if you re-read the data the 
(ts=3) data should still be considered late if both consumers read with the 
same speed. If, however, (ts=3) is read before the other consumer reads (ts=8) 
then it should not be considered late, as you said.

Best,
Aljoscha

> On 24. Aug 2017, at 15:49, Tony Wei <tony19920...@gmail.com> wrote:
> 
> Hi,
> 
> Recently, I studied about watermark from Flink documents and blogs.
> 
> I have some question about this scenario below.
> 
> Suppose there are five clients sending events with different time to the 
> topic on Kafka.
> Topic has two partitions and five events' timestamp are (ts=1), (ts=2), 
> (ts=3), (ts=8), (ts=9).
> The Flink streaming job uses the following setting:
> 1. use AscendingTimestampExtractor
> 2. client time as timestamp
> 3. use tumbling window with 5 unit window size
> 4. not allow late event
> 
> If the client events out of order like this.
>   Partition A [(ts=1), (ts=8)]
>   Partition B [(ts=2), (ts=9)]  <= (ts=3) delay
> Should the window function emit [(ts=1), (ts=2)], keep [(ts=8), (ts=9] in 
> state and drop out (ts=3) ?
> 
> If all events has come, and then replay the job from the beginning, the 
> partition state would be
>   Partition A [(ts=1), (ts=8)]
>   Partition B [(ts=2), (ts=9), (ts=3)]
> Suppose two consumers fetch events with same speed, should the result be the 
> same as above?
> If consumer B reads (ts=3) earlier than consumer A reads (ts=8), would (ts=3) 
> be placed in the window before watermark becomes to 8 and then emit [(ts=1), 
> (ts=2), (ts=3)] as result?
> 
> I wonder if those questions are all correct. If not, is there any mechanisms 
> about watermark and window in Flink that I missed.
> 
> Thank for your help.
> 
> Best Regards,
> Tony Wei
> 

Reply via email to