If you can find the message "Deregistering Flink Kubernetes cluster, clusterId"
in the JobManager log, then this is not the expected behavior.

Having the full logs of the JobManager Pod from before it restarted would help a lot.
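
As a rough illustration of the check above, here is a minimal sketch that scans JobManager log lines for the deregistration message. It assumes the log has already been saved to a local file (for example with kubectl logs); the function name and the file name jobmanager.log are hypothetical and not part of Flink.

```python
# Marker text quoted from the message above; everything else is illustrative.
MARKER = "Deregistering Flink Kubernetes cluster, clusterId"


def find_deregistration(lines):
    """Return (line_number, text) for every log line that contains MARKER."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if MARKER in line:
            hits.append((number, line.rstrip("\n")))
    return hits


def scan_log_file(path):
    """Open a saved JobManager log (hypothetical path) and scan it."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return find_deregistration(handle)
```

For example, scan_log_file("jobmanager.log") returns an empty list if the message never appears, and otherwise the line numbers and lines where it was logged.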



Best,
Yang

On Fri, Feb 2, 2024 at 1:26 PM Liting Liu (litiliu) via user <
user@flink.apache.org> wrote:

> Hi, community:
> I'm running a Flink 1.14.3 job with flink-kubernetes-operator 1.6.0 on
> AWS. I found that my Flink JobManager container's thread restarted after this
> FlinkDeployment was requested to stop. Here is the JobManager log:
>
> 2024-02-01 21:57:48,977 tn="flink-akka.actor.default-dispatcher-107478"
> INFO
>  org.apache.flink.client.deployment.application.ApplicationDispatcherBootstrap
> [] - Application CANCELED:
> java.util.concurrent.CompletionException:
> org.apache.flink.client.deployment.application.UnsuccessfulExecutionException:
> Application Status: CANCELED
> at
> org.apache.flink.client.deployment.application.ApplicationDispatcherBootstrap.lambda$unwrapJobResultException$6(ApplicationDispatcherBootstrap.java:353)
> ~[flink-dist_2.11-1.14.3.jar:1.14.3]
> at
> java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
> ~[?:1.8.0_322]
> 2024-02-01 21:57:48,984 tn="flink-akka.actor.default-dispatcher-107484"
> INFO  org.apache.flink.runtime.jobmaster.MiniDispatcherRestEndpoint [] -
> Shutting down rest endpoint.
> 2024-02-01 21:57:49,103 tn="flink-akka.actor.default-dispatcher-107478"
> INFO
>  org.apache.flink.runtime.entrypoint.component.DispatcherResourceManagerComponent
> [] - Closing components.
> 2024-02-01 21:57:49,105 tn="flink-akka.actor.default-dispatcher-107484"
> INFO  org.apache.flink.runtime.dispatcher.StandaloneDispatcher     [] -
> Stopped dispatcher akka.tcp://flink@
> 2024-02-01 21:57:49,112
> tn="AkkaRpcService-Supervisor-Termination-Future-Executor-thread-1" INFO
>  org.apache.flink.runtime.rpc.akka.AkkaRpcService             [] - Stopping
> Akka RPC service.
> 2024-02-01 21:57:49,286 tn="flink-metrics-15" INFO
>  akka.remote.RemoteActorRefProvider$RemotingTerminator        [] - Remoting
> shut down.
> 2024-02-01 21:57:49,387 tn="main" INFO
>  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] -
> Terminating cluster entrypoint process
> KubernetesApplicationClusterEntrypoint with exit code 0.
> 2024-02-01 21:57:53,828 tn="main" INFO
>  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] -
> -Dlog4j.configurationFile=file:/opt/flink/conf/log4j-console.properties
> 2024-02-01 21:57:54,287 tn="main" INFO
>  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] - Starting
> KubernetesApplicationClusterEntrypoint.
>
>
> I found that the JM main container's containerId remains the same after the
> JM auto-restart.
> Why did this process start running again after it had been requested to stop?
>
>
