Hi,
We are using Flink 1.14.4 with the Flink Kubernetes Operator.

Sometimes when we update a job, it fails on startup. Flink then removes all HA
metadata and the JobManager exits.


2022-09-14 14:54:44,534 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator    [] - Restoring job 00000000000000000000000000000000 from Checkpoint 30829 @ 1663167158684 for 00000000000000000000000000000000 located at s3p://flink-checkpoints/k8s-checkpoint-job-name/00000000000000000000000000000000/chk-30829.
2022-09-14 14:54:44,638 INFO  org.apache.flink.runtime.dispatcher.StandaloneDispatcher     [] - Job 00000000000000000000000000000000 reached terminal state FAILED.
org.apache.flink.runtime.client.JobInitializationException: Could not start the JobMaster.
Caused by: java.util.concurrent.CompletionException: java.lang.IllegalStateException: There is no operator for the state 4e1d9dde287c33a35e7970cbe64a40fe
2022-09-14 14:54:44,930 ERROR org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] - Fatal error occurred in the cluster entrypoint.
2022-09-14 14:54:45,020 INFO  org.apache.flink.kubernetes.highavailability.KubernetesHaServices [] - Clean up the high availability data for job 00000000000000000000000000000000.
2022-09-14 14:54:45,020 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] - Shutting KubernetesApplicationClusterEntrypoint down with application status UNKNOWN. Diagnostics Cluster entrypoint has been closed externally..
2022-09-14 14:54:45,026 INFO  org.apache.flink.runtime.jobmaster.MiniDispatcherRestEndpoint [] - Shutting down rest endpoint.
2022-09-14 14:54:46,122 INFO  akka.remote.RemoteActorRefProvider$RemotingTerminator        [] - Shutting down remote daemon.
2022-09-14 14:54:46,321 INFO  akka.remote.RemoteActorRefProvider$RemotingTerminator        [] - Remoting shut down.


Kubernetes then restarts the JobManager pod, and the new instance, not finding
the HA metadata, starts the job from an empty state.
Is there an option to prevent the JobManager from exiting after a job reaches
the FAILED state?

