We're currently deploying the operator with the Helm chart, and the image
I'm using was pulled from Docker Hub:
https://hub.docker.com/r/apache/flink-kubernetes-operator/tags. Since we
rely on the published images, I don't think I can use the release-1.6 branch (
https://github.com/apache/flink-kubernetes-operator/commits/release-1.6) to
pick up the fix, unless I'm missing something.

I attempted to roll back the operator to version 1.4 today and ran into the
following error on some operator pods. Have you seen these Lease issues
before?

2023-10-18 21:01:15,251 i.f.k.c.e.l.LeaderElector      [ERROR] Exception occurred while releasing lock 'LeaseLock: flink-kubernetes-operator - flink-operator-lease (flink-kubernetes-operator-74f9688dd-bcqr2)'
io.fabric8.kubernetes.client.extended.leaderelection.resourcelock.LockException: Unable to update LeaseLock
    at io.fabric8.kubernetes.client.extended.leaderelection.resourcelock.LeaseLock.update(LeaseLock.java:102)
    at io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector.release(LeaderElector.java:139)
    at io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector.stopLeading(LeaderElector.java:120)
    at io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector.lambda$start$2(LeaderElector.java:104)
    at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(Unknown Source)
    at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(Unknown Source)
    at java.base/java.util.concurrent.CompletableFuture.postComplete(Unknown Source)
    at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(Unknown Source)
    at io.fabric8.kubernetes.client.utils.Utils.lambda$null$12(Utils.java:523)
    at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(Unknown Source)
    at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(Unknown Source)
    at java.base/java.util.concurrent.CompletableFuture.postComplete(Unknown Source)
    at java.base/java.util.concurrent.CompletableFuture$AsyncRun.run(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.base/java.lang.Thread.run(Unknown Source)
Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: PUT at: https://10.241.0.1/apis/coordination.k8s.io/v1/namespaces/flink-kubernetes-operator/leases/flink-operator-lease.
Message: Operation cannot be fulfilled on leases.coordination.k8s.io "flink-operator-lease": the object has been modified; please apply your changes to the latest version and try again.
Received status: Status(apiVersion=v1, code=409, details=StatusDetails(causes=[], group=coordination.k8s.io, kind=leases, name=flink-operator-lease, retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, message=Operation cannot be fulfilled on leases.coordination.k8s.io "flink-operator-lease": the object has been modified; please apply your changes to the latest version and try again, metadata=ListMeta(_continue=null, remainingItemCount=null, resourceVersion=null, selfLink=null, additionalProperties={}), reason=Conflict, status=Failure, additionalProperties={}).
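
For context on the 409 above: the Kubernetes API server rejects an update
whose resourceVersion is older than the stored object's. The sketch below is
only a toy model of that check in plain Python; ToyLeaseStore, Lease and the
pod names are made up for illustration and are not fabric8 or operator code.
It assumes, without confirmation from the logs, that another pod updated the
lease between this pod's read and its attempt to release it.

```python
from dataclasses import dataclass


class Conflict(Exception):
    """Stands in for an HTTP 409 Conflict response (hypothetical name)."""


@dataclass
class Lease:
    holder: str
    resource_version: int


class ToyLeaseStore:
    """Toy stand-in for the API server's stored Lease object."""

    def __init__(self, holder):
        self._lease = Lease(holder, 1)

    def get(self):
        # Return a copy, as a client would receive from a GET.
        return Lease(self._lease.holder, self._lease.resource_version)

    def update(self, lease):
        # Reject writes that were based on an older version of the object.
        if lease.resource_version != self._lease.resource_version:
            raise Conflict("the object has been modified")
        self._lease = Lease(lease.holder, lease.resource_version + 1)
        return self.get()


store = ToyLeaseStore("pod-a")
stale = store.get()        # pod-a reads the lease at version 1
fresh = store.get()        # pod-b reads the same version
fresh.holder = "pod-b"
store.update(fresh)        # pod-b's write succeeds; version is now 2

try:
    stale.holder = ""      # pod-a tries to release using its old copy
    store.update(stale)
    outcome = "released"
except Conflict:
    outcome = "conflict"   # analogous to the 409 in the trace above
```

If that assumption holds, the failed release would mean the pod's copy of
the lease was out of date, not that the lease object itself is broken.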

On Wed, Oct 18, 2023 at 2:55 PM Gyula Fóra <gyula.f...@gmail.com> wrote:

> Hi!
> Not sure if it’s the same but could you try picking up the fix from the
> release branch and confirming that it solves the problem?
>
> If it does we may consider a quick bug fix release.
>
> Cheers
> Gyula
>
> On Wed, 18 Oct 2023 at 18:09, Tony Chen <tony.ch...@robinhood.com> wrote:
>
>> Hi Flink Community,
>>
>> Most of the Flink applications run on 1.14 at my company. After upgrading
>> the Flink Operator to 1.6, we've seen many jobmanager pods show
>> "JobManagerDeploymentStatus: MISSING".
>>
>> Here are some logs from the operator pod on one of our Flink applications:
>>
>> 2023-10-18 02:02:40,823 o.a.f.k.o.l.AuditUtils [INFO
>> ][nemo/nemo-streaming-users-identi-updates] >>> Event | Warning |
>> SAVEPOINTERROR | Savepoint failed for savepointTriggerNonce: null
>> ...
>> 2023-10-18 02:02:40,883 o.a.f.k.o.l.AuditUtils [INFO
>> ][nemo/nemo-streaming-users-identi-updates] >>> Event | Warning |
>> CLUSTERDEPLOYMENTEXCEPTION | Status have been modified externally in
>> version 17447422864 Previous: <redacted>
>> ...
>> 2023-10-18 02:02:40,919 i.j.o.p.e.ReconciliationDispatcher
>> [ERROR][nemo/nemo-streaming-users-identi-updates] Error during
>> event processing ExecutionScope{ resource id:
>> ResourceID{name='nemo-streaming-users-identi-updates', namespace='nemo'},
>> version: 17447420285} failed.
>> ...
>> org.apache.flink.kubernetes.operator.exception.ReconciliationException:
>> org.apache.flink.kubernetes.operator.exception.StatusConflictException:
>> Status have been modified externally in version 17447422864 Previous:
>> <redacted>
>> ...
>> 2023-10-18 02:03:03,273 o.a.f.k.o.o.d.ApplicationObserver
>> [ERROR][nemo/nemo-streaming-users-identi-updates] Missing
>> JobManager deployment
>> ...
>> 2023-10-18 02:03:03,295 o.a.f.k.o.l.AuditUtils [INFO
>> ][nemo/nemo-streaming-users-identi-updates] >>> Event | Warning | MISSING |
>> Missing JobManager deployment
>> 2023-10-18 02:03:03,295 o.a.f.c.Configuration [WARN
>> ][nemo/nemo-streaming-users-identi-updates] Config uses deprecated
>> configuration key 'high-availability' instead of proper key
>> 'high-availability.type'
>>
>>
>> This seems related to this email thread:
>> https://www.mail-archive.com/user@flink.apache.org/msg51439.html.
>> However, I believe that we're not seeing the HA metadata getting deleted.
>>
>> What could cause the JobManagerDeploymentStatus to be MISSING?
>>
>> Thanks,
>> Tony
>>
>> --
>>
>> <http://www.robinhood.com/>
>>
>> Tony Chen
>>
>> Software Engineer
>>
>> Menlo Park, CA
>>
>> Don't copy, share, or use this email without permission. If you received
>> it by accident, please let us know and then delete it right away.
>>
>
