Re: Flink failure rate restart not work as expect

2022-03-02 Thread Zhilong Hong
…n, thanks for taking the time to reply; that helps us a lot. > -------------- > *From:* 刘 家锹 > *Sent:* March 1, 2022 23:06 > *To:* Matthias Pohl ; user ; > David Morávek > *Subject:* Re: Flink failure rate restart not work as expect > > I realized I missed mentioning something above,…

Re: Flink failure rate restart not work as expect

2022-03-01 Thread 刘 家锹
…3:06 To: Matthias Pohl ; user ; David Morávek Subject: Re: Flink failure rate restart not work as expect I realized I forgot to mention something above: the container exit code is 163, which is not a normal exit code; at least, I can't find any explanation for it on Google. So my test didn't cover this situation, I…

Re: Flink failure rate restart not work as expect

2022-03-01 Thread 刘 家锹
…kef> From: 刘 家锹 Sent: Tuesday, March 1, 2022 10:23:50 PM To: Matthias Pohl ; user ; David Morávek Subject: Re: Flink failure rate restart not work as expect We didn't find any obvious configuration issues in our cluster. As far as I know, it works fine in most ca…

Re: Flink failure rate restart not work as expect

2022-03-01 Thread 刘 家锹
…vek Subject: Re: Flink failure rate restart not work as expect The YARN node manager logs support my observation: the container exits with a failure which, if I understand it correctly, should cause a container restart on the YARN side. In HA mode, Flink expects the underlying resource management to…

Re: Flink failure rate restart not work as expect

2022-03-01 Thread Matthias Pohl
…link.apache.org < > user@flink.apache.org> > *Subject:* Re: Flink failure rate restart not work as expect > > Hi, > I second Alex's observation: based on the logs, it looks like the task > restart functionality worked as expected. It tried to restart the tasks > until it r…

Re: Flink failure rate restart not work as expect

2022-03-01 Thread Matthias Pohl
Hi, I second Alex's observation: based on the logs, it looks like the task restart functionality worked as expected. It tried to restart the tasks until it reached the limit of 4 attempts due to the missing TaskManager. The job cluster then shut down with an error code. At this point, YARN should pick it…

Re: Flink failure rate restart not work as expect

2022-02-28 Thread Alexander Preuß
Hi, at first glance it looks like the exception was thrown very rapidly, so it exceeded the maxFailuresPerInterval and the FailureRestartStrategy decided not to restart. Why do you think this is different from the expected behavior? Best, Alex On Tue, Mar 1, 2022 at 3:23 AM 刘 家锹 wrote: > Hi,…
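Alex's explanation can be illustrated with a small Python model of a failure-rate check. This is a hypothetical sketch, not Flink source code: it assumes the strategy remembers the timestamps of the most recent maxFailuresPerInterval failures and refuses to restart once that many failures all fall within the failure interval. The names `FailureRateSketch` and `simulate` are made up for illustration, and the 60 s interval used below is an illustrative value, not the one from the original post.

```python
from collections import deque


class FailureRateSketch:
    """Hypothetical model of a failure-rate restart decision (not Flink's code).

    Assumption: only the timestamps of the last `max_failures_per_interval`
    failures are kept; once the queue is full, a restart is allowed only if
    the oldest remembered failure lies outside the interval.
    """

    def __init__(self, max_failures_per_interval, failures_interval_ms):
        self.max_failures = max_failures_per_interval
        self.interval_ms = failures_interval_ms
        # Bounded queue: appending to a full deque drops the oldest timestamp.
        self.timestamps = deque(maxlen=max_failures_per_interval)

    def notify_failure(self, now_ms):
        # Record the time of the failure that just happened.
        self.timestamps.append(now_ms)

    def can_restart(self, now_ms):
        if len(self.timestamps) < self.max_failures:
            return True  # fewer failures recorded than the limit
        # Queue is full: restart only if the oldest failure is outside the interval.
        return now_ms - self.timestamps[0] > self.interval_ms


def simulate(failure_times_ms, max_failures, interval_ms):
    """Feed failures in order; return the index of the first refused restart, or None."""
    strategy = FailureRateSketch(max_failures, interval_ms)
    for i, t in enumerate(failure_times_ms):
        strategy.notify_failure(t)
        if not strategy.can_restart(t):
            return i
    return None


# Burst: four failures within a few milliseconds (e.g. several tasks lost together).
print(simulate([0, 10, 20, 30], max_failures=4, interval_ms=60_000))
# Spaced out: one failure every 30 s never puts four failures inside 60 s.
print(simulate([0, 30_000, 60_000, 90_000, 120_000], max_failures=4, interval_ms=60_000))
```

Under these assumptions, a burst of failures exhausts the budget on the fourth failure even though the backoff between restarts is long, which matches the "thrown very rapidly" reading of the logs; the same number of failures spread out over time keeps the job restarting.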

Flink failure rate restart not work as expect

2022-02-28 Thread 刘 家锹
Hi all, we have encountered a problem with FailureRateRestartStrategy, which confuses us, and we don't know how to solve it. Here's the situation: Flink version: 1.10.1 Development environment: on YARN FailureRateRestartStrategy: failuresIntervalMS=6,backoffTimeMS=15000,maxFailuresPerInterval=4 One of our…