Re: Kubernetes HA: New jobs stuck in Initializing for a long time after a certain number of existing jobs are running

2021-11-22 Thread Yang Wang

I believe this issue [1] is related; it has been fixed in 1.13.0 and 1.12.3.
[1] https://issues.apache.org/jira/browse/FLINK-22006
Best,
Yang
On Mon, Nov 22, 2021 at 9:19 PM, Matthias Pohl wrote:
> Hi Joey,
> that looks like a cluster configuration issue. The 192.168.100.79:6123 is
> not accessible from th

Re: Kubernetes HA: New jobs stuck in Initializing for a long time after a certain number of existing jobs are running

2021-11-22 Thread Joey L

Hi Matthias,
Thanks for the response. I actually found the root issue a while after posting the question, and it is related to this JIRA ticket: https://issues.apache.org/jira/browse/FLINK-22006
It appears to be a limit on the concurrent ConfigMaps K8s can watch, and adding this to my config work

Re: Kubernetes HA: New jobs stuck in Initializing for a long time after a certain number of existing jobs are running

2021-11-22 Thread Matthias Pohl

Hi Joey,
that looks like a cluster configuration issue. The 192.168.100.79:6123 is not accessible from the JobManager pod (see line 1224f in the provided JM logs):
2021-11-19 04:06:45,049 WARN akka.remote.transport.netty.NettyTransport [] - Remote connection to [null] failed w