Hi @aljoscha

The watermark metrics look fine. (attached screenshot)
[image: image.png]

This is the extractor:
class TimestampExtractor[A, B <: AbstractEvent] extends
BoundedOutOfOrdernessTimestampExtractor[(A, B)](Time.minutes(5)) {
  override def extractTimestamp(element: (A, B)): Long =
    Instant.now.toEpochMilli.min(element._2.sequence / 1000)
}

I'll try to output the watermark and report my findings

On Tue, Jun 16, 2020 at 3:21 PM Aljoscha Krettek <aljos...@apache.org>
wrote:

> Did you look at the watermark metrics? Do you know what the current
> watermark is when the windows are firing. You could also get the current
> watemark when using a ProcessWindowFunction and also emit that in the
> records that you're printing, for debugging.
>
> What is that TimestampAssigner you're using for your timestamp
> assigner/watermark extractor?
>
> Best,
> Aljoscha
>
> On 16.06.20 14:10, Ori Popowski wrote:
> > Okay, so I created a simple stream (similar to the original stream),
> where
> > I just write the timestamps of each evaluated window to S3.
> > The session gap is 30 minutes, and this is one of the sessions:
> > (first-event, last-event, num-events)
> >
> > 11:23-11:23 11 events
> > 11:25-11:26 51 events
> > 11:28-11:29 74 events
> > 11:31-11:31 13 events
> >
> > Again, this is one session. How can we explain this? Why does Flink
> create
> > 4 distinct windows within 8 minutes? I'm really lost here, I'd appreciate
> > some help.
> >
> > On Tue, Jun 16, 2020 at 2:17 PM Ori Popowski <ori....@gmail.com> wrote:
> >
> >> Hi, thanks for answering.
> >>
> >>> I guess you consume from Kafka from the earliest offset, so you consume
> >> historical data and Flink is catching-up.
> >> Yes, it's what's happening. But Kafka is partitioned on sessionId, so
> skew
> >> between partitions cannot explain it.
> >> I think the only way it can happen is when when suddenly there's one
> event
> >> with very late timestamp
> >>
> >>> Just to verify, if you do keyBy sessionId, do you check the gaps of
> >> events from the same session?
> >> Good point. sessionId is unique in this case, and even if it's not -
> every
> >> single session suffers from this problem of early triggering so it's
> very
> >> unlikely that all millions sessions within that hour had duplicates.
> >>
> >> I'm suspecting that the fact I have two ProcessWindowFunctions one after
> >> the other somehow causes this.
> >> I deployed a version with one window function which just prints the
> >> timestamps to S3 (to find out if I have event-time jumps) and suddenly
> it
> >> doesn't trigger early (I'm running for 10 minutes and not a single event
> >> has arrived to the sink)
> >>
> >> On Tue, Jun 16, 2020 at 12:01 PM Rafi Aroch <rafi.ar...@gmail.com>
> wrote:
> >>
> >>> Hi Ori,
> >>>
> >>> I guess you consume from Kafka from the earliest offset, so you consume
> >>> historical data and Flink is catching-up.
> >>>
> >>> Regarding: *My event-time timestamps also do not have big gaps*
> >>>
> >>> Just to verify, if you do keyBy sessionId, do you check the gaps of
> >>> events from the same session?
> >>>
> >>> Rafi
> >>>
> >>>
> >>> On Tue, Jun 16, 2020 at 9:36 AM Ori Popowski <ori....@gmail.com>
> wrote:
> >>>
> >>>> So why is it happening? I have no clue at the moment.
> >>>> My event-time timestamps also do not have big gaps between them that
> >>>> would explain the window triggering.
> >>>>
> >>>>
> >>>> On Mon, Jun 15, 2020 at 9:21 PM Robert Metzger <rmetz...@apache.org>
> >>>> wrote:
> >>>>
> >>>>> If you are using event time in Flink, it is disconnected from the
> real
> >>>>> world wall clock time.
> >>>>> You can process historical data in a streaming program as if it was
> >>>>> real-time data (potentially reading through (event time) years of
> data in a
> >>>>> few (wall clock) minutes)
> >>>>>
> >>>>> On Mon, Jun 15, 2020 at 4:58 PM Yichao Yang <1048262...@qq.com>
> wrote:
> >>>>>
> >>>>>> Hi
> >>>>>>
> >>>>>> I think it maybe you use the event time, and the timestamp between
> >>>>>> your event data is bigger than 30minutes, maybe you can check the
> source
> >>>>>> data timestamp.
> >>>>>>
> >>>>>> Best,
> >>>>>> Yichao Yang
> >>>>>>
> >>>>>> ------------------------------
> >>>>>> 发自我的iPhone
> >>>>>>
> >>>>>>
> >>>>>> ------------------ Original ------------------
> >>>>>> *From:* Ori Popowski <ori....@gmail.com>
> >>>>>> *Date:* Mon,Jun 15,2020 10:50 PM
> >>>>>> *To:* user <user@flink.apache.org>
> >>>>>> *Subject:* Re: EventTimeSessionWindow firing too soon
> >>>>>>
> >>>>>>
> >
>
>

Reply via email to