Hi,

Recently, I studied about watermark from Flink documents and blogs.

I have some question about this scenario below.

Suppose there are five clients sending events with different time to the
topic on Kafka.
Topic has two partitions and five events' timestamp are (ts=1), (ts=2),
(ts=3), (ts=8), (ts=9).
The Flink streaming job uses the following setting:
1. use AscendingTimestampExtractor
2. client time as timestamp
3. use tumbling window with 5 unit window size
4. not allow late event

If the client events out of order like this.
  Partition A [(ts=1), (ts=8)]
  Partition B [(ts=2), (ts=9)]  <= (ts=3) delay
Should the window function emit [(ts=1), (ts=2)], keep [(ts=8), (ts=9] in
state and drop out (ts=3) ?

If all events has come, and then replay the job from the beginning, the
partition state would be
  Partition A [(ts=1), (ts=8)]
  Partition B [(ts=2), (ts=9), (ts=3)]
Suppose two consumers fetch events with same speed, should the result be
the same as above?
If consumer B reads (ts=3) earlier than consumer A reads (ts=8), would
(ts=3) be placed in the window before watermark becomes to 8 and then emit
[(ts=1), (ts=2), (ts=3)] as result?

I wonder if those questions are all correct. If not, is there any
mechanisms about watermark and window in Flink that I missed.

Thank for your help.

Best Regards,
Tony Wei

Reply via email to