Re: [EXTERNAL] Re: Flink Kubernetes Operator - Deadlock when Cluster Cleanup Fails

2024-02-13 Thread Niklas Wilcke
Hi Mate, thanks for creating the issue and pointing it out. I think the issue you created is a bit more specific than my overall point. It rather focuses on the TaskManagers, which is of course fine. From my point of view, the following two things are the low-hanging fruit: 1. Improving the

Re: Flink Kubernetes Operator - Deadlock when Cluster Cleanup Fails

2024-02-13 Thread Mate Czagany
Hi, I have opened a JIRA [1], as I had the same error (AlreadyExists) last week and could pinpoint the problem to the TaskManagers still being alive when creating the new Deployment. In native mode we only check for the JobManagers when we wait for the cluster to shut down, in contrast to
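
For reference, a minimal sketch of what such a check could look like with the fabric8 Kubernetes client, i.e. also waiting for the TaskManager pods to disappear before the new Deployment is created. This is not the operator's actual shutdown logic, and the label selectors used here ("app" = cluster id, "component" = "taskmanager") are assumptions that may differ from what the native integration really sets:

    import io.fabric8.kubernetes.api.model.Pod;
    import io.fabric8.kubernetes.client.KubernetesClient;
    import io.fabric8.kubernetes.client.KubernetesClientBuilder;

    import java.time.Duration;
    import java.util.List;

    public class TaskManagerShutdownCheck {

        // Returns true once no TaskManager pods of the given cluster remain,
        // false if the timeout elapses first.
        static boolean waitForTaskManagersGone(
                KubernetesClient client, String namespace, String clusterId, Duration timeout)
                throws InterruptedException {
            long deadline = System.currentTimeMillis() + timeout.toMillis();
            while (System.currentTimeMillis() < deadline) {
                // Assumed labels: the cluster id under "app" and a "component" label.
                // Adjust to whatever the deployment actually uses.
                List<Pod> tmPods = client.pods()
                        .inNamespace(namespace)
                        .withLabel("app", clusterId)
                        .withLabel("component", "taskmanager")
                        .list()
                        .getItems();
                if (tmPods.isEmpty()) {
                    return true;
                }
                Thread.sleep(1_000);
            }
            return false;
        }

        public static void main(String[] args) throws InterruptedException {
            try (KubernetesClient client = new KubernetesClientBuilder().build()) {
                boolean gone = waitForTaskManagersGone(
                        client, "flink", "my-flink-cluster", Duration.ofMinutes(2));
                System.out.println("TaskManager pods gone: " + gone);
            }
        }
    }

A real fix would presumably hook into the operator's existing watches rather than polling like this, but the sketch shows the missing condition: the old cluster is only really gone once both JobManager and TaskManager pods have terminated.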

Re: [EXTERNAL] Re: Flink Kubernetes Operator - Deadlock when Cluster Cleanup Fails

2024-02-13 Thread Niklas Wilcke
Hi Gyula, thanks for the advice. I requested a Jira account and will try to open a ticket as soon as I get access. Cheers, Niklas > On 13. Feb 2024, at 09:13, Gyula Fóra wrote: > > Hi Niklas! > > The best way to report the issue would be to open a JIRA ticket with the same > detailed

Re: Flink Kubernetes Operator - Deadlock when Cluster Cleanup Fails

2024-02-13 Thread Gyula Fóra
Hi Niklas! The best way to report the issue would be to open a JIRA ticket with the same detailed information. Otherwise I think your observations are correct; this is indeed a frequent problem that comes up, and it would be good to improve on it. In addition to improving logging we could also

Flink Kubernetes Operator - Deadlock when Cluster Cleanup Fails

2024-02-12 Thread Niklas Wilcke
Hi Flink Kubernetes Operator Community, I hope this is the right way to report an issue with the Apache Flink Kubernetes Operator. We are experiencing problems with some streaming job clusters that end up in a terminated state because the operator is not behaving as expected. The problem is