n, thanks for taking the time to reply; that helps us a lot.
> --------------
> *From:* 刘 家锹
> *Sent:* March 1, 2022 23:06
> *To:* Matthias Pohl ; user ;
> David Morávek
> *Subject:* Re: Flink failure rate restart not work as expect
>
> I realized I missed mentioning something above,
3:06
To: Matthias Pohl ; user ;
David Morávek
Subject: Re: Flink failure rate restart not work as expect
I realized I missed mentioning something above: the container exit code is 163,
which is not a standard exit code; at least, I couldn't find its meaning on Google.
So my test didn't cover this situation, I
kef>
From: 刘 家锹
Sent: Tuesday, March 1, 2022 10:23:50 PM
To: Matthias Pohl ; user ;
David Morávek
Subject: Re: Flink failure rate restart not work as expect
We didn't find any obvious configuration issues in our cluster. As far as I
know, it works fine in most ca
vek
Subject: Re: Flink failure rate restart not work as expect
The YARN node manager logs support my observation: The container exits with a
failure which, if I understand it correctly, should cause a container restart
on the YARN side. In HA mode, Flink expects the underlying resource management
to
link.apache.org <
> user@flink.apache.org>
> *Subject:* Re: Flink failure rate restart not work as expect
>
> Hi,
> I second Alex' observation - based on the logs it looks like the task
> restart functionality worked as expected: It tried to restart the tasks
> until it r
Hi,
I second Alex' observation - based on the logs it looks like the task
restart functionality worked as expected: It tried to restart the tasks
until it reached the limit of 4 attempts due to the missing TaskManager.
The job-cluster shut down with an error code. At this point, YARN should
pick it
Hi,
At first glance, it looks like the exception was thrown so rapidly that
it exceeded maxFailuresPerInterval, so the FailureRateRestartStrategy
decided not to restart. Why do you think this is different from the
expected behavior?
Best,
Alex
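
To illustrate the point above about failures arriving too quickly, here is a minimal sketch of how a failure-rate policy decides whether to restart. It is a simplified model, not Flink's actual implementation: the class name FailureRateTracker and its method are hypothetical, and the 60000 ms window is an assumed value chosen for the example (only maxFailuresPerInterval=4 comes from the thread).

```python
from collections import deque


class FailureRateTracker:
    """Simplified sliding-window model of a failure-rate restart policy.

    Hypothetical helper for illustration only; not Flink's code.
    """

    def __init__(self, max_failures_per_interval, failures_interval_ms):
        self.max_failures = max_failures_per_interval
        self.interval_ms = failures_interval_ms
        self.failures = deque()  # timestamps (ms) of failures inside the window

    def record_failure(self, now_ms):
        """Record a failure; return True if a restart is still allowed."""
        self.failures.append(now_ms)
        # Forget failures that are older than the measuring interval.
        while self.failures and now_ms - self.failures[0] > self.interval_ms:
            self.failures.popleft()
        # Restarting is allowed until the window holds more than the maximum.
        return len(self.failures) <= self.max_failures


# Five failures one second apart, with an assumed 60 s window and a limit of 4.
tracker = FailureRateTracker(max_failures_per_interval=4, failures_interval_ms=60_000)
decisions = [tracker.record_failure(ms) for ms in (0, 1000, 2000, 3000, 4000)]
print(decisions)  # [True, True, True, True, False]
```

In this model the first four failures still lead to a restart, and the fifth failure inside the same window makes the policy give up, which matches the "too many failures in a short time" reading of the logs.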
On Tue, Mar 1, 2022 at 3:23 AM 刘 家锹 wrote:
> Hi,
Hi, all
We have encountered a problem with FailureRateRestartStrategy that confuses us, and
we don't know how to solve it. Here's the situation:
Flink version: 1.10.1
Development env: on Yarn
FailureRateRestartStrategy:
failuresIntervalMS=6,backoffTimeMS=15000,maxFailuresPerInterval=4
One of our