Did you look at the watermark metrics? Do you know what the current watermark is when the windows are firing. You could also get the current watemark when using a ProcessWindowFunction and also emit that in the records that you're printing, for debugging.

What is that TimestampAssigner you're using for your timestamp assigner/watermark extractor?

Best,
Aljoscha

On 16.06.20 14:10, Ori Popowski wrote:
Okay, so I created a simple stream (similar to the original stream), where
I just write the timestamps of each evaluated window to S3.
The session gap is 30 minutes, and this is one of the sessions:
(first-event, last-event, num-events)

11:23-11:23 11 events
11:25-11:26 51 events
11:28-11:29 74 events
11:31-11:31 13 events

Again, this is one session. How can we explain this? Why does Flink create
4 distinct windows within 8 minutes? I'm really lost here, I'd appreciate
some help.

On Tue, Jun 16, 2020 at 2:17 PM Ori Popowski <ori....@gmail.com> wrote:

Hi, thanks for answering.

I guess you consume from Kafka from the earliest offset, so you consume
historical data and Flink is catching-up.
Yes, it's what's happening. But Kafka is partitioned on sessionId, so skew
between partitions cannot explain it.
I think the only way it can happen is when when suddenly there's one event
with very late timestamp

Just to verify, if you do keyBy sessionId, do you check the gaps of
events from the same session?
Good point. sessionId is unique in this case, and even if it's not - every
single session suffers from this problem of early triggering so it's very
unlikely that all millions sessions within that hour had duplicates.

I'm suspecting that the fact I have two ProcessWindowFunctions one after
the other somehow causes this.
I deployed a version with one window function which just prints the
timestamps to S3 (to find out if I have event-time jumps) and suddenly it
doesn't trigger early (I'm running for 10 minutes and not a single event
has arrived to the sink)

On Tue, Jun 16, 2020 at 12:01 PM Rafi Aroch <rafi.ar...@gmail.com> wrote:

Hi Ori,

I guess you consume from Kafka from the earliest offset, so you consume
historical data and Flink is catching-up.

Regarding: *My event-time timestamps also do not have big gaps*

Just to verify, if you do keyBy sessionId, do you check the gaps of
events from the same session?

Rafi


On Tue, Jun 16, 2020 at 9:36 AM Ori Popowski <ori....@gmail.com> wrote:

So why is it happening? I have no clue at the moment.
My event-time timestamps also do not have big gaps between them that
would explain the window triggering.


On Mon, Jun 15, 2020 at 9:21 PM Robert Metzger <rmetz...@apache.org>
wrote:

If you are using event time in Flink, it is disconnected from the real
world wall clock time.
You can process historical data in a streaming program as if it was
real-time data (potentially reading through (event time) years of data in a
few (wall clock) minutes)

On Mon, Jun 15, 2020 at 4:58 PM Yichao Yang <1048262...@qq.com> wrote:

Hi

I think it maybe you use the event time, and the timestamp between
your event data is bigger than 30minutes, maybe you can check the source
data timestamp.

Best,
Yichao Yang

------------------------------
发自我的iPhone


------------------ Original ------------------
*From:* Ori Popowski <ori....@gmail.com>
*Date:* Mon,Jun 15,2020 10:50 PM
*To:* user <user@flink.apache.org>
*Subject:* Re: EventTimeSessionWindow firing too soon




Reply via email to