We're currently deploying the operator with the Helm chart, and the image I'm using was downloaded from Docker Hub: https://hub.docker.com/r/apache/flink-kubernetes-operator/tags. Unless I'm missing something, I wouldn't be able to use the release-1.6 branch (https://github.com/apache/flink-kubernetes-operator/commits/release-1.6) to pick up the fix.
I was attempting to roll back the operator version to 1.4 today, and I ran into the following issues on some operator pods. I was wondering if you have seen these Lease issues before.

2023-10-18 21:01:15,251 i.f.k.c.e.l.LeaderElector [ERROR] Exception occurred while releasing lock 'LeaseLock: flink-kubernetes-operator - flink-operator-lease (flink-kubernetes-operator-74f9688dd-bcqr2)'
io.fabric8.kubernetes.client.extended.leaderelection.resourcelock.LockException: Unable to update LeaseLock
    at io.fabric8.kubernetes.client.extended.leaderelection.resourcelock.LeaseLock.update(LeaseLock.java:102)
    at io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector.release(LeaderElector.java:139)
    at io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector.stopLeading(LeaderElector.java:120)
    at io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector.lambda$start$2(LeaderElector.java:104)
    at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(Unknown Source)
    at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(Unknown Source)
    at java.base/java.util.concurrent.CompletableFuture.postComplete(Unknown Source)
    at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(Unknown Source)
    at io.fabric8.kubernetes.client.utils.Utils.lambda$null$12(Utils.java:523)
    at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(Unknown Source)
    at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(Unknown Source)
    at java.base/java.util.concurrent.CompletableFuture.postComplete(Unknown Source)
    at java.base/java.util.concurrent.CompletableFuture$AsyncRun.run(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.base/java.lang.Thread.run(Unknown Source)
Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: PUT at: https://10.241.0.1/apis/coordination.k8s.io/v1/namespaces/flink-kubernetes-operator/leases/flink-operator-lease. Message: Operation cannot be fulfilled on leases.coordination.k8s.io "flink-operator-lease": the object has been modified; please apply your changes to the latest version and try again. Received status: Status(apiVersion=v1, code=409, details=StatusDetails(causes=[], group=coordination.k8s.io, kind=leases, name=flink-operator-lease, retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, message=Operation cannot be fulfilled on leases.coordination.k8s.io "flink-operator-lease": the object has been modified; please apply your changes to the latest version and try again, metadata=ListMeta(_continue=null, remainingItemCount=null, resourceVersion=null, selfLink=null, additionalProperties={}), reason=Conflict, status=Failure, additionalProperties={}).

On Wed, Oct 18, 2023 at 2:55 PM Gyula Fóra <gyula.f...@gmail.com> wrote:

> Hi!
> Not sure if it’s the same but could you try picking up the fix from the
> release branch and confirming that it solves the problem?
>
> If it does we may consider a quick bug fix release.
>
> Cheers
> Gyula
>
> On Wed, 18 Oct 2023 at 18:09, Tony Chen <tony.ch...@robinhood.com> wrote:
>
>> Hi Flink Community,
>>
>> Most of the Flink applications run on 1.14 at my company. After upgrading
>> the Flink Operator to 1.6, we've seen many jobmanager pods show
>> "JobManagerDeploymentStatus: MISSING".
>>
>> Here are some logs from the operator pod on one of our Flink applications:
>>
>> 2023-10-18 02:02:40,823 o.a.f.k.o.l.AuditUtils [INFO ][nemo/nemo-streaming-users-identi-updates] >>> Event | Warning | SAVEPOINTERROR | Savepoint failed for savepointTriggerNonce: null
>> ...
>> 2023-10-18 02:02:40,883 o.a.f.k.o.l.AuditUtils [INFO ][nemo/nemo-streaming-users-identi-updates] >>> Event | Warning | CLUSTERDEPLOYMENTEXCEPTION | Status have been modified externally in version 17447422864 Previous: <redacted>
>> ...
>> 2023-10-18 02:02:40,919 i.j.o.p.e.ReconciliationDispatcher [ERROR][nemo/nemo-streaming-users-identi-updates] Error during event processing ExecutionScope{ resource id: ResourceID{name='nemo-streaming-users-identi-updates', namespace='nemo'}, version: 17447420285} failed.
>> ...
>> org.apache.flink.kubernetes.operator.exception.ReconciliationException: org.apache.flink.kubernetes.operator.exception.StatusConflictException: Status have been modified externally in version 17447422864 Previous: <redacted>
>> ...
>> 2023-10-18 02:03:03,273 o.a.f.k.o.o.d.ApplicationObserver [ERROR][nemo/nemo-streaming-users-identi-updates] Missing JobManager deployment
>> ...
>> 2023-10-18 02:03:03,295 o.a.f.k.o.l.AuditUtils [INFO ][nemo/nemo-streaming-users-identi-updates] >>> Event | Warning | MISSING | Missing JobManager deployment
>> 2023-10-18 02:03:03,295 o.a.f.c.Configuration [WARN ][nemo/nemo-streaming-users-identi-updates] Config uses deprecated configuration key 'high-availability' instead of proper key 'high-availability.type'
>>
>>
>> This seems related to this email thread:
>> https://www.mail-archive.com/user@flink.apache.org/msg51439.html.
>> However, I believe that we're not seeing the HA metadata getting deleted.
>>
>> What could cause the JobManagerDeploymentStatus to be MISSING?
>>
>> Thanks,
>> Tony
>>
>> --
>>
>> <http://www.robinhood.com/>
>>
>> Tony Chen
>>
>> Software Engineer
>>
>> Menlo Park, CA
>>
>> Don't copy, share, or use this email without permission. If you received
>> it by accident, please let us know and then delete it right away.
>>
>

--

<http://www.robinhood.com/>

Tony Chen

Software Engineer

Menlo Park, CA

Don't copy, share, or use this email without permission. If you received it by accident, please let us know and then delete it right away.