Re: [DISCUSS] FLIP-328: Allow source operators to determine isProcessingBacklog based on watermark lag

2023-09-25 Thread Dong Lin
Hi Piotr, Thanks for the comments. Let me try to explain it below. Overall, the two competing options differ in how an invocation of `#setIsProcessingBacklog(false)` affects the backlog status for the given source (corresponding to the SplitEnumeratorContext instance on which this method is

Re: [DISCUSS] FLIP-328: Allow source operators to determine isProcessingBacklog based on watermark lag

2023-09-25 Thread Piotr Nowojski
Hi Jarl and Dong, I'm a bit confused about the difference between the two competing options. Could one of you elaborate what's the difference between: > 2) The semantics of `#setIsProcessingBacklog(false)` is that it overrides > the effect of the previous invocation (if any) of >

Re: [DISCUSS] FLIP-328: Allow source operators to determine isProcessingBacklog based on watermark lag

2023-09-21 Thread Dong Lin
Hi all, Jark and I discussed this FLIP offline and I will summarize our discussion below. It would be great if you could provide your opinion of the proposed options. Regarding the target use-cases: - We both agreed that MySQL CDC should have backlog=true when watermarkLag is large during the

Re: [DISCUSS] FLIP-328: Allow source operators to determine isProcessingBacklog based on watermark lag

2023-09-18 Thread Dong Lin
Hi Jark, Thanks for the reply. Please see my comments inline. On Tue, Sep 19, 2023 at 10:12 AM Jark Wu wrote: > Hi Dong, > > Sorry for the late reply. > > > The rationale is that if there is any strategy that is triggered and says > > backlog=true, then job's backlog should be true. Otherwise,

Re: [DISCUSS] FLIP-328: Allow source operators to determine isProcessingBacklog based on watermark lag

2023-09-18 Thread Jark Wu
Hi Dong, Sorry for the late reply. > The rationale is that if there is any strategy that is triggered and says > backlog=true, then job's backlog should be true. Otherwise, the job's > backlog status is false. I'm quite confused about this. Does that mean, if the source is in the changelog

Re: [DISCUSS] FLIP-328: Allow source operators to determine isProcessingBacklog based on watermark lag

2023-09-18 Thread Dong Lin
Hi Jark, Do you have time to comment on whether the current design looks good? I plan to start voting in 3 days if there is no follow-up comment. Thanks, Dong On Fri, Sep 15, 2023 at 2:01 PM Jark Wu wrote: > Hi Dong, > > > Note that we can not simply enforce the semantics of "any invocation

Re: [DISCUSS] FLIP-328: Allow source operators to determine isProcessingBacklog based on watermark lag

2023-09-15 Thread Dong Lin
Hi Jark, Please see my reply inline. On Fri, Sep 15, 2023 at 2:01 PM Jark Wu wrote: > Hi Dong, > > > Note that we can not simply enforce the semantics of "any invocation of > > setIsProcessingBacklog(false) will set the job's backlog status to > false". > > Suppose we have a job with two

Re: [DISCUSS] FLIP-328: Allow source operators to determine isProcessingBacklog based on watermark lag

2023-09-15 Thread Jark Wu
Hi Dong, > Note that we can not simply enforce the semantics of "any invocation of > setIsProcessingBacklog(false) will set the job's backlog status to false". > Suppose we have a job with two operators, where operatorA invokes > setIsProcessingBacklog(false) and operatorB invokes >

Re: [DISCUSS] FLIP-328: Allow source operators to determine isProcessingBacklog based on watermark lag

2023-09-14 Thread Dong Lin
Hi Jark, Please see my comments inline. On Fri, Sep 15, 2023 at 10:35 AM Jark Wu wrote: > Hi Dong, > > Please see my comments inline below. > > Hmm.. can you explain what you mean by "different watermark delay > > definitions for each source"? > > For example, "table1" defines a watermark

Re: [DISCUSS] FLIP-328: Allow source operators to determine isProcessingBacklog based on watermark lag

2023-09-14 Thread Jark Wu
;>>>> increasing if no data is generated in Kafka. The > > watermark > > > > > > >> lag and > > > > > > >>>>>>>> backlog > > > > > > >>>>>>>>>>

Re: [DISCUSS] FLIP-328: Allow source operators to determine isProcessingBacklog based on watermark lag

2023-09-14 Thread Dong Lin
; >>>>>> those > > > > > >>>>>>>>>> records can span from 1 day to 1 hour to 1 second. If > the > > > > > >> records > > > > > >>>>>> span > > > > > >>>>

Re: [DISCUSS] FLIP-328: Allow source operators to determine isProcessingBacklog based on watermark lag

2023-09-12 Thread Dong Lin
gt; > > >> records, > > > > > >>>>>> those > > > > > >>>>>>>>>> records can span from 1 day to 1 hour to 1 second. If > the > > > > > >> records > > > > > >>>>>> span > > &g

Re: [DISCUSS] FLIP-328: Allow source operators to determine isProcessingBacklog based on watermark lag

2023-09-11 Thread Jark Wu
> > >>>> If > > > > >>>>>> we > > > > >>>>>>>>>> encounter such a use case, we can create another FLIP to > > > > >> address > > > > >>>> those > > > >

Re: [DISCUSS] FLIP-328: Allow source operators to determine isProcessingBacklog based on watermark lag

2023-09-10 Thread Dong Lin
; cases. > > > >>>>>>>>>>> However, it seems this is not a general solution for > sources > > > >> to > > > >>>>>>>> determine > > > >>>>>>>>>>> "isProce

Re: [DISCUSS] FLIP-328: Allow source operators to determine isProcessingBacklog based on watermark lag

2023-09-09 Thread Jark Wu
creases > > >>>> unlimited > > >>>>>> if > > >>>>>>>> no > > >>>>>>>>>>> data is generated in the Kafka. > > >>>>>>>>>>> But in this case, there i

Re: [DISCUSS] FLIP-328: Allow source operators to determine isProcessingBacklog based on watermark lag

2023-09-08 Thread Xuannan Su
;>>>>>>> What we need is something that reflects the number of records > >>>>>>>> unprocessed > >>>>>>>>>>> by the job. > >>>>>>>>>>> Actually, that is the "pendingRecords&

Re: [DISCUSS] FLIP-328: Allow source operators to determine isProcessingBacklog based on watermark lag

2023-09-07 Thread Leonard Xu
t;> >>>>>>>>>>> Best, >>>>>>>>>>> Jark >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> [1] >>>>>>>>>>> >>>>>

Re: [DISCUSS] FLIP-328: Allow source operators to determine isProcessingBacklog based on watermark lag

2023-09-07 Thread Jark Wu
t;>> > > > > > >>>>>> > > > > > >>>>>> On Tue, Aug 15, 2023 at 11:30 AM Xuannan Su < > > > suxuanna...@gmail.com > > > > > >>>> <mailto:suxuanna...@gmail.com>> wrote: > > > > > >>>>>

Re: [DISCUSS] FLIP-328: Allow source operators to determine isProcessingBacklog based on watermark lag

2023-09-06 Thread Xuannan Su
configuration key > > provides > > > > us > > > > >>>>>> with > > > > >>>>>>> the flexibility to switch to other approaches to calculate the > > lag > > > > in > > > > >

Re: [DISCUSS] FLIP-328: Allow source operators to determine isProcessingBacklog based on watermark lag

2023-09-06 Thread Jing Ge
With > > > the > > > >>>>>>> potential introduction of the Generalized Watermark mechanism > in > > > the > > > >>>>>>> future, if I understand correctly, a watermark won't > necessarily > > > need > > > >>>> to >

Re: [DISCUSS] FLIP-328: Allow source operators to determine isProcessingBacklog based on watermark lag

2023-09-06 Thread Xuannan Su
gt;>> For the reasons above, I prefer introducing the configuration as > > is, > > >>>> and > > >>>>>>> change it later with the a deprecation process or migration > > process. > > >>>> What > > >>>>&

Re: [DISCUSS] FLIP-328: Allow source operators to determine isProcessingBacklog based on watermark lag

2023-09-04 Thread Jing Ge
not expose this detail via the > >>>>>>> configuration > >>>>>>>> option. To be specific, I suggest not mentioning the "watermark" > >>>>>> keyword > >>>>>>> in > >>>>>>>> th

Re: [DISCUSS] FLIP-328: Allow source operators to determine isProcessingBacklog based on watermark lag

2023-08-31 Thread Xuannan Su
for the reasons you've already mentioned. However, it's > > >>>> not > > >>>>>> the only possible option. Hiding this detail from users would give > > us > > >>>> the > > >>>>>> flexibility to switch to other

Re: [DISCUSS] FLIP-328: Allow source operators to determine isProcessingBacklog based on watermark lag

2023-08-30 Thread Xuannan Su
e given threshold, Flink will consider latency of >>>>>>>> individual records as less important and prioritize throughput over >>>> it. >>>>>>>> They don't really need the details of how the lags are calculated. >>>>>>>> -

Re: [DISCUSS] FLIP-328: Allow source operators to determine isProcessingBacklog based on watermark lag

2023-08-30 Thread Jing Ge
>>> Generalized Watermark mechanism, where basically the watermark can > be > >>>>>> anything that needs to travel along the data-flow with certain > >>>> alignment > >>>>>> strategies, and event time watermark would be one specific case o

Re: [DISCUSS] FLIP-328: Allow source operators to determine isProcessingBacklog based on watermark lag

2023-08-30 Thread Hang Ruan
gt;>>> Generalized Watermark mechanism, where basically the watermark can > be > >>>>>> anything that needs to travel along the data-flow with certain > >>>> alignment > >>>>>> strategies, and event time watermark would be one specific cas

Re: [DISCUSS] FLIP-328: Allow source operators to determine isProcessingBacklog based on watermark lag

2023-08-29 Thread Xuannan Su
t; >>>>>> I do not intend to block the FLIP on this. I'd also be fine with >>>>>> introducing the configuration as is, and changing it later, if needed, >>>>> with >>>>>> a regular deprecation and migration process. Just making

Re: [DISCUSS] FLIP-328: Allow source operators to determine isProcessingBacklog based on watermark lag

2023-08-27 Thread Jing Ge
>>>> > >>>> Xintong > >>>> > >>>> > >>>> > >>>> On Mon, Aug 14, 2023 at 12:00 PM Xuannan Su <mailto:suxuanna...@gmail.com>> > >>> wrote: > >>>> > >>>>> Hi Xintong, > >>>

Re: [DISCUSS] FLIP-328: Allow source operators to determine isProcessingBacklog based on watermark lag

2023-08-21 Thread Xuannan Su
inition, >>>>> watermark is the time progress indication in the data stream. It >>> indicates >>>>> the stream’s event time has progressed to some specific time. On the >>> other >>>>> hand, timestamp in the records is usually used

Re: [DISCUSS] FLIP-328: Allow source operators to determine isProcessingBacklog based on watermark lag

2023-08-20 Thread Jark Wu
ppears more appropriate and intuitive to calculate the > > event > > > > time lag by watermark and determine the backlog status. And by using > > the > > > > watermark, we can easily deal with the out-of-order and the idleness > > of the > > > > dat

Re: [DISCUSS] FLIP-328: Allow source operators to determine isProcessingBacklog based on watermark lag

2023-08-14 Thread Xintong Song
gt; > > Xuannan > > > On Aug 10, 2023, 20:23 +0800, Xintong Song , > wrote: > > > > Thanks for preparing the FLIP, Xuannan. > > > > > > > > +1 in general. > > > > > > > > A quick question, could you explain why we are relying on the >

Re: [DISCUSS] FLIP-328: Allow source operators to determine isProcessingBacklog based on watermark lag

2023-08-14 Thread Xuannan Su
any concern in using watermarks. Just wondering if there's any > > > deep considerations behind this. > > > > > > Best, > > > > > > Xintong > > > > > > > > > > > > On Thu, Aug 3, 2023 at 3:03 PM Xuannan Su wro

Re: [DISCUSS] FLIP-328: Allow source operators to determine isProcessingBacklog based on watermark lag

2023-08-14 Thread Xintong Song
termark > for > > emitting the record attribute? Why not use timestamps in the records? I > > don't see any concern in using watermarks. Just wondering if there's any > > deep considerations behind this. > > > > Best, > > > > Xintong > > &g

Re: [DISCUSS] FLIP-328: Allow source operators to determine isProcessingBacklog based on watermark lag

2023-08-13 Thread Xuannan Su
an Su wrote: > > > Hi all, > > > > I am opening this thread to discuss FLIP-328: Allow source operators to > > determine isProcessingBacklog based on watermark lag[1]. We had a several > > discussions with Dong Ling about the design, and thanks for all the > &g

Re: [DISCUSS] FLIP-328: Allow source operators to determine isProcessingBacklog based on watermark lag

2023-08-10 Thread Xintong Song
behind this. Best, Xintong On Thu, Aug 3, 2023 at 3:03 PM Xuannan Su wrote: > Hi all, > > I am opening this thread to discuss FLIP-328: Allow source operators to > determine isProcessingBacklog based on watermark lag[1]. We had a several > discussions with Dong Ling a

[DISCUSS] FLIP-328: Allow source operators to determine isProcessingBacklog based on watermark lag

2023-08-03 Thread Xuannan Su
Hi all, I am opening this thread to discuss FLIP-328: Allow source operators to determine isProcessingBacklog based on watermark lag[1]. We had a several discussions with Dong Ling about the design, and thanks for all the valuable advice. The FLIP aims to target the use-case where user want