Do you have advice on how to determine why a checkpoint failed?

1. Timeouts are easy to discover, since the UI logs them.
2. Other errors are not so easy to find.

How can I find those other errors?  Are they shown in the UI, or do I need
good old-fashioned logging?

On Fri, Jan 29, 2021 at 3:11 AM Congxian Qiu <qcx978132...@gmail.com> wrote:

> Hi Marco
>      You need to figure out why the checkpoint timed out (you can see the
> time consumed by each phase of a checkpoint in the UI). If it really does
> need that long to complete the checkpoint, then you need to configure a
> longer timeout.
>      If there are checkpoint errors, we first need to figure out what
> the problem is. In general, a checkpoint can be split into parts such as
> barrier alignment (maybe there is backpressure or something else, so that
> some barriers can't be received in time), sync duration (the thread is too
> busy ...), and async duration (too much IO/network processing ...), etc.
>
> Best,
> Congxian
>
>
> Marco Villalobos <mvillalo...@kineteque.com> wrote on Fri, Jan 29, 2021 at 7:19 AM:
>
>> I am kind of stuck in determining how large a checkpoint interval should
>> be.
>>
>> Is there a guide for that?  If the timeout is 10 minutes and we time out,
>> what is a good strategy for adjusting that?
>>
>> What is a good starting point for the checkpoint interval? How should it
>> be adjusted?
>>
>> We often see checkpoint errors during our onTimer calls; I don't know if
>> that's related.
>>
>> Marco A. Villalobos
>>
>>
>>
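
The quoted reply splits a checkpoint into barrier alignment, sync duration, and async duration. As a rough sketch of how to read those numbers, the plain-Java helper below takes the three durations as they might be copied from the checkpoint details in the UI and names the dominant phase, along with the cause the reply suggests for it. The class `CheckpointPhaseHint`, its methods, and the sample durations are hypothetical and only for illustration; none of this is part of Flink's API.

```java
// Hypothetical helper, NOT part of Flink: classifies which checkpoint phase
// dominates, given per-phase durations read manually from the web UI.
final class CheckpointPhaseHint {

    enum Phase { ALIGNMENT, SYNC, ASYNC }

    // Returns the phase with the largest duration (ties favor the earlier phase).
    static Phase dominantPhase(long alignmentMs, long syncMs, long asyncMs) {
        if (alignmentMs >= syncMs && alignmentMs >= asyncMs) {
            return Phase.ALIGNMENT;
        }
        if (syncMs >= asyncMs) {
            return Phase.SYNC;
        }
        return Phase.ASYNC;
    }

    // Maps each phase to the likely cause mentioned in the quoted reply.
    static String hint(Phase phase) {
        switch (phase) {
            case ALIGNMENT:
                return "barriers arrive late: look for backpressure";
            case SYNC:
                return "the task thread is too busy during the sync part";
            default:
                return "the async part is slow: look at IO/network load";
        }
    }

    public static void main(String[] args) {
        // Made-up sample durations in milliseconds.
        Phase phase = dominantPhase(540_000L, 1_200L, 35_000L);
        System.out.println(phase + ": " + hint(phase));
    }
}
```

If alignment dominates, raising the timeout alone may only hide backpressure; if the async part dominates and the time is genuinely needed, a longer timeout is the remedy the reply describes.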
