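Hello, I have just started using Flink 1.12.1 with HA on k8s, and I see the
JobManager restart roughly every half hour, always with the error below. Have you run into this? Thanks!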

2021-01-17 04:52:12,399 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl     [] - Suspending SlotPool.
2021-01-17 04:52:12,399 INFO  org.apache.flink.runtime.jobmaster.JobMaster                 [] - Close ResourceManager connection 28ed7c84e7f395c5a34880df91b251c6: Stopping JobMaster for job p_port_traffic_5m@hive->mysql @2021-01-17 11:40:00(67fb9b15d0deff998e287aa7e2cf1c7b)..
2021-01-17 04:52:12,399 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl     [] - Stopping SlotPool.
2021-01-17 04:52:12,399 INFO  org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Disconnect job manager 8c450d0051eff8c045adb76cb9ec4...@akka.tcp://flink@flink-jobmanager:6123/user/rpc/jobmanager_32 for job 67fb9b15d0deff998e287aa7e2cf1c7b from the resource manager.
2021-01-17 04:52:12,399 INFO  org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService [] - Stopping DefaultLeaderElectionService.
2021-01-17 04:52:12,399 INFO  org.apache.flink.kubernetes.highavailability.KubernetesLeaderElectionDriver [] - Closing KubernetesLeaderElectionDriver{configMapName='test-flink-etl-67fb9b15d0deff998e287aa7e2cf1c7b-jobmanager-leader'}.
2021-01-17 04:52:12,399 INFO  org.apache.flink.kubernetes.kubeclient.resources.KubernetesConfigMapWatcher [] - The watcher is closing.
2021-01-17 04:52:12,416 INFO  org.apache.flink.runtime.jobmanager.DefaultJobGraphStore     [] - Removed job graph 67fb9b15d0deff998e287aa7e2cf1c7b from KubernetesStateHandleStore{configMapName='test-flink-etl-dispatcher-leader'}.
2021-01-17 04:52:30,686 ERROR org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Fatal error occurred in ResourceManager.
org.apache.flink.runtime.leaderretrieval.LeaderRetrievalException: Error while watching the ConfigMap test-flink-etl-12c0ac13184d3d98af71dadbc4a81d03-jobmanager-leader
        at org.apache.flink.kubernetes.highavailability.KubernetesLeaderRetrievalDriver$ConfigMapCallbackHandlerImpl.handleFatalError(KubernetesLeaderRetrievalDriver.java:120) [flink-dist_2.11-1.12.1.jar:1.12.1]
        at org.apache.flink.kubernetes.kubeclient.resources.AbstractKubernetesWatcher.onClose(AbstractKubernetesWatcher.java:48) [flink-dist_2.11-1.12.1.jar:1.12.1]
        at io.fabric8.kubernetes.client.utils.WatcherToggle.onClose(WatcherToggle.java:56) [flink-dist_2.11-1.12.1.jar:1.12.1]
        at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.closeEvent(WatchConnectionManager.java:367) [flink-dist_2.11-1.12.1.jar:1.12.1]
        at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.access$700(WatchConnectionManager.java:50) [flink-dist_2.11-1.12.1.jar:1.12.1]
        at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$1.onMessage(WatchConnectionManager.java:259) [flink-dist_2.11-1.12.1.jar:1.12.1]
        at org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket.onReadMessage(RealWebSocket.java:323) [flink-dist_2.11-1.12.1.jar:1.12.1]
        at org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.WebSocketReader.readMessageFrame(WebSocketReader.java:219) [flink-dist_2.11-1.12.1.jar:1.12.1]
        at org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:105) [flink-dist_2.11-1.12.1.jar:1.12.1]
        at org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket.loopReader(RealWebSocket.java:274) [flink-dist_2.11-1.12.1.jar:1.12.1]
        at org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:214) [flink-dist_2.11-1.12.1.jar:1.12.1]
        at org.apache.flink.kubernetes.shaded.okhttp3.RealCall$AsyncCall.execute(RealCall.java:206) [flink-dist_2.11-1.12.1.jar:1.12.1]
        at org.apache.flink.kubernetes.shaded.okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32) [flink-dist_2.11-1.12.1.jar:1.12.1]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_275]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_275]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_275]
2021-01-17 04:52:30,691 ERROR org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] - Fatal error occurred in the cluster entrypoint.
org.apache.flink.runtime.leaderretrieval.LeaderRetrievalException: Error while watching the ConfigMap test-flink-etl-12c0ac13184d3d98af71dadbc4a81d03-jobmanager-leader
        at org.apache.flink.kubernetes.highavailability.KubernetesLeaderRetrievalDriver$ConfigMapCallbackHandlerImpl.handleFatalError(KubernetesLeaderRetrievalDriver.java:120) [flink-dist_2.11-1.12.1.jar:1.12.1]
        at org.apache.flink.kubernetes.kubeclient.resources.AbstractKubernetesWatcher.onClose(AbstractKubernetesWatcher.java:48) [flink-dist_2.11-1.12.1.jar:1.12.1]
        at io.fabric8.kubernetes.client.utils.WatcherToggle.onClose(WatcherToggle.java:56) [flink-dist_2.11-1.12.1.jar:1.12.1]
        at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.closeEvent(WatchConnectionManager.java:367) [flink-dist_2.11-1.12.1.jar:1.12.1]
        at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.access$700(WatchConnectionManager.java:50) [flink-dist_2.11-1.12.1.jar:1.12.1]
        at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$1.onMessage(WatchConnectionManager.java:259) [flink-dist_2.11-1.12.1.jar:1.12.1]
        at org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket.onReadMessage(RealWebSocket.java:323) [flink-dist_2.11-1.12.1.jar:1.12.1]
        at org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.WebSocketReader.readMessageFrame(WebSocketReader.java:219) [flink-dist_2.11-1.12.1.jar:1.12.1]
        at org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:105) [flink-dist_2.11-1.12.1.jar:1.12.1]
        at org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket.loopReader(RealWebSocket.java:274) [flink-dist_2.11-1.12.1.jar:1.12.1]
        at org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:214) [flink-dist_2.11-1.12.1.jar:1.12.1]
        at org.apache.flink.kubernetes.shaded.okhttp3.RealCall$AsyncCall.execute(RealCall.java:206) [flink-dist_2.11-1.12.1.jar:1.12.1]
        at org.apache.flink.kubernetes.shaded.okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32) [flink-dist_2.11-1.12.1.jar:1.12.1]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_275]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_275]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_275]
2021-01-17 04:52:30,693 INFO  org.apache.flink.runtime.blob.BlobServer                     [] - Stopped BLOB server at 0.0.0.0:6124



