Hi Piotr,
Thanks for the comments. Let me try to explain it below.
Overall, the two competing options differ in how an invocation of
`#setIsProcessingBacklog(false)` affects the backlog status for the given
source (corresponding to the SplitEnumeratorContext instance on which this
method is
Hi Jarl and Dong,
I'm a bit confused about the difference between the two competing options.
Could one of you elaborate what's the difference between:
> 2) The semantics of `#setIsProcessingBacklog(false)` is that it overrides
> the effect of the previous invocation (if any) of
>
Hi all,
Jark and I discussed this FLIP offline and I will summarize our discussion
below. It would be great if you could provide your opinion of the proposed
options.
Regarding the target use-cases:
- We both agreed that MySQL CDC should have backlog=true when watermarkLag
is large during the
Hi Jark,
Thanks for the reply. Please see my comments inline.
On Tue, Sep 19, 2023 at 10:12 AM Jark Wu wrote:
> Hi Dong,
>
> Sorry for the late reply.
>
> > The rationale is that if there is any strategy that is triggered and says
> > backlog=true, then job's backlog should be true. Otherwise,
Hi Dong,
Sorry for the late reply.
> The rationale is that if there is any strategy that is triggered and says
> backlog=true, then job's backlog should be true. Otherwise, the job's
> backlog status is false.
I'm quite confused about this. Does that mean, if the source is in the
changelog
Hi Jark,
Do you have time to comment on whether the current design looks good?
I plan to start voting in 3 days if there is no follow-up comment.
Thanks,
Dong
On Fri, Sep 15, 2023 at 2:01 PM Jark Wu wrote:
> Hi Dong,
>
> > Note that we can not simply enforce the semantics of "any invocation
Hi Jark,
Please see my reply inline.
On Fri, Sep 15, 2023 at 2:01 PM Jark Wu wrote:
> Hi Dong,
>
> > Note that we can not simply enforce the semantics of "any invocation of
> > setIsProcessingBacklog(false) will set the job's backlog status to
> false".
> > Suppose we have a job with two
Hi Dong,
> Note that we can not simply enforce the semantics of "any invocation of
> setIsProcessingBacklog(false) will set the job's backlog status to false".
> Suppose we have a job with two operators, where operatorA invokes
> setIsProcessingBacklog(false) and operatorB invokes
>
Hi Jark,
Please see my comments inline.
On Fri, Sep 15, 2023 at 10:35 AM Jark Wu wrote:
> Hi Dong,
>
> Please see my comments inline below.
> > Hmm.. can you explain what you mean by "different watermark delay
> > definitions for each source"?
>
> For example, "table1" defines a watermark
;>>>> increasing if no data is generated in Kafka. The
> > watermark
> > > > > > >> lag and
> > > > > > >>>>>>>> backlog
> > > > > > >>>>>>>>>>
; >>>>>> those
> > > > > >>>>>>>>>> records can span from 1 day to 1 hour to 1 second. If
> the
> > > > > >> records
> > > > > >>>>>> span
> > > > > >>>>
gt; > > >> records,
> > > > > >>>>>> those
> > > > > >>>>>>>>>> records can span from 1 day to 1 hour to 1 second. If
> the
> > > > > >> records
> > > > > >>>>>> span
> > &g
> > >>>> If
> > > > >>>>>> we
> > > > >>>>>>>>>> encounter such a use case, we can create another FLIP to
> > > > >> address
> > > > >>>> those
> > > >
; cases.
> > > >>>>>>>>>>> However, it seems this is not a general solution for
> sources
> > > >> to
> > > >>>>>>>> determine
> > > >>>>>>>>>>> "isProce
creases
> > >>>> unlimited
> > >>>>>> if
> > >>>>>>>> no
> > >>>>>>>>>>> data is generated in the Kafka.
> > >>>>>>>>>>> But in this case, there i
;>>>>>>> What we need is something that reflects the number of records
> >>>>>>>> unprocessed
> >>>>>>>>>>> by the job.
> >>>>>>>>>>> Actually, that is the "pendingRecords&
t;>
>>>>>>>>>>> Best,
>>>>>>>>>>> Jark
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> [1]
>>>>>>>>>>>
>>>>>
t;>>
> > > > > >>>>>>
> > > > > >>>>>> On Tue, Aug 15, 2023 at 11:30 AM Xuannan Su <
> > > suxuanna...@gmail.com
> > > > > >>>> <mailto:suxuanna...@gmail.com>> wrote:
> > > > > >>>>>
configuration key
> > provides
> > > > us
> > > > >>>>>> with
> > > > >>>>>>> the flexibility to switch to other approaches to calculate the
> > lag
> > > > in
> > > > >
With
> > > the
> > > >>>>>>> potential introduction of the Generalized Watermark mechanism
> in
> > > the
> > > >>>>>>> future, if I understand correctly, a watermark won't
> necessarily
> > > need
> > > >>>> to
>
gt;>> For the reasons above, I prefer introducing the configuration as
> > is,
> > >>>> and
> > >>>>>>> change it later with the a deprecation process or migration
> > process.
> > >>>> What
> > >>>>&
not expose this detail via the
> >>>>>>> configuration
> >>>>>>>> option. To be specific, I suggest not mentioning the "watermark"
> >>>>>> keyword
> >>>>>>> in
> >>>>>>>> th
for the reasons you've already mentioned. However, it's
> > >>>> not
> > >>>>>> the only possible option. Hiding this detail from users would give
> > us
> > >>>> the
> > >>>>>> flexibility to switch to other
e given threshold, Flink will consider latency of
>>>>>>>> individual records as less important and prioritize throughput over
>>>> it.
>>>>>>>> They don't really need the details of how the lags are calculated.
>>>>>>>> -
>>> Generalized Watermark mechanism, where basically the watermark can
> be
> >>>>>> anything that needs to travel along the data-flow with certain
> >>>> alignment
> >>>>>> strategies, and event time watermark would be one specific case o
gt;>>> Generalized Watermark mechanism, where basically the watermark can
> be
> >>>>>> anything that needs to travel along the data-flow with certain
> >>>> alignment
> >>>>>> strategies, and event time watermark would be one specific cas
t;
>>>>>> I do not intend to block the FLIP on this. I'd also be fine with
>>>>>> introducing the configuration as is, and changing it later, if needed,
>>>>> with
>>>>>> a regular deprecation and migration process. Just making
>>>>
> >>>> Xintong
> >>>>
> >>>>
> >>>>
> >>>> On Mon, Aug 14, 2023 at 12:00 PM Xuannan Su <mailto:suxuanna...@gmail.com>>
> >>> wrote:
> >>>>
> >>>>> Hi Xintong,
> >>>
inition,
>>>>> watermark is the time progress indication in the data stream. It
>>> indicates
>>>>> the stream’s event time has progressed to some specific time. On the
>>> other
>>>>> hand, timestamp in the records is usually used
ppears more appropriate and intuitive to calculate the
> > event
> > > > time lag by watermark and determine the backlog status. And by using
> > the
> > > > watermark, we can easily deal with the out-of-order and the idleness
> > of the
> > > > dat
gt; > > Xuannan
> > > On Aug 10, 2023, 20:23 +0800, Xintong Song ,
> wrote:
> > > > Thanks for preparing the FLIP, Xuannan.
> > > >
> > > > +1 in general.
> > > >
> > > > A quick question, could you explain why we are relying on the
>
any concern in using watermarks. Just wondering if there's any
> > > deep considerations behind this.
> > >
> > > Best,
> > >
> > > Xintong
> > >
> > >
> > >
> > > On Thu, Aug 3, 2023 at 3:03 PM Xuannan Su wro
termark
> for
> > emitting the record attribute? Why not use timestamps in the records? I
> > don't see any concern in using watermarks. Just wondering if there's any
> > deep considerations behind this.
> >
> > Best,
> >
> > Xintong
> >
&g
an Su wrote:
>
> > Hi all,
> >
> > I am opening this thread to discuss FLIP-328: Allow source operators to
> > determine isProcessingBacklog based on watermark lag[1]. We had a several
> > discussions with Dong Ling about the design, and thanks for all the
> &g
behind this.
Best,
Xintong
On Thu, Aug 3, 2023 at 3:03 PM Xuannan Su wrote:
> Hi all,
>
> I am opening this thread to discuss FLIP-328: Allow source operators to
> determine isProcessingBacklog based on watermark lag[1]. We had a several
> discussions with Dong Ling a
Hi all,
I am opening this thread to discuss FLIP-328: Allow source operators to
determine isProcessingBacklog based on watermark lag[1]. We had a several
discussions with Dong Ling about the design, and thanks for all the valuable
advice.
The FLIP aims to target the use-case where user want
36 matches
Mail list logo