There are indeed a lot of "Connecting websocket" and "Scheduling reconnect task" entries in the log.
I still think the network between your Pod and the APIServer is not very stable.
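
One way to confirm that is to probe the APIServer directly from inside the
JobManager pod and watch whether the probe is slow or flaky. A minimal sketch,
assuming curl exists in the image; the pod name is a placeholder, and the token
path and env vars are the standard in-cluster defaults:

    kubectl exec -it <jobmanager-pod> -- sh -c '
      TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
      # /healthz should answer "ok" quickly and consistently on a stable network
      curl -sk -H "Authorization: Bearer $TOKEN" \
        https://$KUBERNETES_SERVICE_HOST:$KUBERNETES_SERVICE_PORT/healthz'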

Also, if possible, please send the complete DEBUG-level JobManager log.
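
In case it helps: in the default Flink 1.12 distribution the level is set in
conf/log4j.properties, which uses the log4j2 properties syntax. Turning
everything up to DEBUG, or only the watch client that shows up in your grep,
looks roughly like this (the logger key "watch" is just an arbitrary name):

    rootLogger.level = DEBUG
    # or, narrower, only the Kubernetes watch client:
    logger.watch.name = io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager
    logger.watch.level = DEBUG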

Best,
Yang

macdoor <macd...@gmail.com> wrote on Tue, Jan 19, 2021 at 9:31 AM:

> Thanks! I turned on DEBUG logging, and there is still only that final ERROR,
> but before it there are quite a few entries containing
> kubernetes.client.dsl.internal.WatchConnectionManager. I grepped out a
> portion; can you tell anything from it?
>
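> (For reference, the excerpt below was pulled with something like the
> following; the file name is the one visible in the output, and -H forces
> grep to print the file-name prefix:
>
>     grep -H 'WatchConnectionManager' job-debug-0118.log
> )
>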
> job-debug-0118.log:2021-01-19 02:12:25,551 DEBUG
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] -
> WebSocket successfully opened
> job-debug-0118.log:2021-01-19 02:12:25,646 DEBUG
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] -
> Connecting websocket ...
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager@2553d42c
> job-debug-0118.log:2021-01-19 02:12:25,647 DEBUG
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] -
> WebSocket successfully opened
> job-debug-0118.log:2021-01-19 02:12:30,128 DEBUG
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] -
> Connecting websocket ...
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager@5a9fa83e
> job-debug-0118.log:2021-01-19 02:12:30,176 DEBUG
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] -
> WebSocket successfully opened
> job-debug-0118.log:2021-01-19 02:12:39,028 DEBUG
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - Force
> closing the watch
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager@2553d42c
> job-debug-0118.log:2021-01-19 02:12:39,028 DEBUG
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] -
> Closing websocket
> org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket@15b15029
> job-debug-0118.log:2021-01-19 02:12:39,030 DEBUG
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] -
> WebSocket close received. code: 1000, reason:
> job-debug-0118.log:2021-01-19 02:12:39,030 DEBUG
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] -
> Ignoring onClose for already closed/closing websocket
> job-debug-0118.log:2021-01-19 02:12:39,031 DEBUG
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - Force
> closing the watch
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager@2cdbe5a0
> job-debug-0118.log:2021-01-19 02:12:39,031 DEBUG
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] -
> Closing websocket
> org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket@1e3f5396
> job-debug-0118.log:2021-01-19 02:12:39,033 DEBUG
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] -
> WebSocket close received. code: 1000, reason:
> job-debug-0118.log:2021-01-19 02:12:39,033 DEBUG
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] -
> Ignoring onClose for already closed/closing websocket
> job-debug-0118.log:2021-01-19 02:12:42,677 DEBUG
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] -
> Connecting websocket ...
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager@210aab4b
> job-debug-0118.log:2021-01-19 02:12:42,678 DEBUG
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] -
> WebSocket successfully opened
> job-debug-0118.log:2021-01-19 02:12:42,920 DEBUG
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] -
> Connecting websocket ...
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager@278d8398
> job-debug-0118.log:2021-01-19 02:12:42,921 DEBUG
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] -
> WebSocket successfully opened
> job-debug-0118.log:2021-01-19 02:12:45,130 DEBUG
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] -
> Connecting websocket ...
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager@4b318628
> job-debug-0118.log:2021-01-19 02:12:45,132 DEBUG
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] -
> WebSocket successfully opened
> job-debug-0118.log:2021-01-19 02:13:05,927 DEBUG
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - Force
> closing the watch
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager@278d8398
> job-debug-0118.log:2021-01-19 02:13:05,927 DEBUG
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] -
> Closing websocket
> org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket@69d1ebd2
> job-debug-0118.log:2021-01-19 02:13:05,930 DEBUG
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] -
> WebSocket close received. code: 1000, reason:
> job-debug-0118.log:2021-01-19 02:13:05,930 DEBUG
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] -
> Ignoring onClose for already closed/closing websocket
> job-debug-0118.log:2021-01-19 02:13:05,940 DEBUG
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - Force
> closing the watch
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager@210aab4b
> job-debug-0118.log:2021-01-19 02:13:05,940 DEBUG
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] -
> Closing websocket
> org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket@3db9d8d8
> job-debug-0118.log:2021-01-19 02:13:05,942 DEBUG
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] -
> WebSocket close received. code: 1000, reason:
> job-debug-0118.log:2021-01-19 02:13:05,942 DEBUG
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] -
> Ignoring onClose for already closed/closing websocket
> job-debug-0118.log:2021-01-19 02:13:08,378 DEBUG
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] -
> Connecting websocket ...
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager@4dcf905
> job-debug-0118.log:2021-01-19 02:13:08,381 DEBUG
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] -
> WebSocket successfully opened
> job-debug-0118.log:2021-01-19 02:13:08,471 DEBUG
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] -
> Connecting websocket ...
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager@428ca061
> job-debug-0118.log:2021-01-19 02:13:08,472 DEBUG
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] -
> WebSocket successfully opened
> job-debug-0118.log:2021-01-19 02:13:10,127 DEBUG
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] -
> Connecting websocket ...
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager@46b49e58
> job-debug-0118.log:2021-01-19 02:13:10,128 DEBUG
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] -
> WebSocket successfully opened
> job-debug-0118.log:2021-01-19 02:13:21,625 DEBUG
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - Force
> closing the watch
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager@428ca061
> job-debug-0118.log:2021-01-19 02:13:21,625 DEBUG
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] -
> Closing websocket
> org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket@14e16427
> job-debug-0118.log:2021-01-19 02:13:21,627 DEBUG
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] -
> WebSocket close received. code: 1000, reason:
> job-debug-0118.log:2021-01-19 02:13:21,627 DEBUG
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] -
> Ignoring onClose for already closed/closing websocket
> job-debug-0118.log:2021-01-19 02:13:21,628 DEBUG
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - Force
> closing the watch
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager@4dcf905
> job-debug-0118.log:2021-01-19 02:13:21,628 DEBUG
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] -
> Closing websocket
> org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket@11708e54
> job-debug-0118.log:2021-01-19 02:13:21,630 DEBUG
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] -
> WebSocket close received. code: 1000, reason:
> job-debug-0118.log:2021-01-19 02:13:21,630 DEBUG
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] -
> Ignoring onClose for already closed/closing websocket
> job-debug-0118.log:2021-01-19 02:13:25,680 DEBUG
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] -
> Connecting websocket ...
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager@3ba4abd7
> job-debug-0118.log:2021-01-19 02:13:25,681 DEBUG
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] -
> WebSocket successfully opened
> job-debug-0118.log:2021-01-19 02:13:25,908 DEBUG
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] -
> Connecting websocket ...
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager@23fe4bdd
> job-debug-0118.log:2021-01-19 02:13:25,909 DEBUG
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] -
> WebSocket successfully opened
> job-debug-0118.log:2021-01-19 02:13:30,128 DEBUG
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] -
> Connecting websocket ...
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager@5cf8bd92
> job-debug-0118.log:2021-01-19 02:13:30,175 DEBUG
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] -
> WebSocket successfully opened
> job-debug-0118.log:2021-01-19 02:13:46,104 DEBUG
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] -
> WebSocket close received. code: 1000, reason:
> job-debug-0118.log:2021-01-19 02:13:46,105 DEBUG
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] -
> Submitting reconnect task to the executor
> job-debug-0118.log:2021-01-19 02:13:46,113 DEBUG
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] -
> Scheduling reconnect task
> job-debug-0118.log:2021-01-19 02:13:46,117 DEBUG
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] -
> Current reconnect backoff is 1000 milliseconds (T0)
> job-debug-0118.log:2021-01-19 02:13:47,117 DEBUG
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] -
> Connecting websocket ...
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager@23f03575
> job-debug-0118.log:2021-01-19 02:13:47,120 DEBUG
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] -
> WebSocket successfully opened
> job-debug-0118.log:     at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.closeEvent(WatchConnectionManager.java:367) [flink-dist_2.11-1.12.1.jar:1.12.1]
> job-debug-0118.log:     at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.access$700(WatchConnectionManager.java:50) [flink-dist_2.11-1.12.1.jar:1.12.1]
> job-debug-0118.log:     at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$1.onMessage(WatchConnectionManager.java:259) [flink-dist_2.11-1.12.1.jar:1.12.1]
> job-debug-0118.log:     at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.closeEvent(WatchConnectionManager.java:367) [flink-dist_2.11-1.12.1.jar:1.12.1]
> job-debug-0118.log:     at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.access$700(WatchConnectionManager.java:50) [flink-dist_2.11-1.12.1.jar:1.12.1]
> job-debug-0118.log:     at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$1.onMessage(WatchConnectionManager.java:259) [flink-dist_2.11-1.12.1.jar:1.12.1]
>
>
> The final ERROR looks like this:
>
> 2021-01-19 02:13:47,094 DEBUG
> org.apache.flink.runtime.source.coordinator.SourceCoordinator [] - Handling
> event from subtask 406 of source Source:
> HiveSource-snmpprobe.p_snmp_ifXTable: RequestSplitEvent (host='172.0.37.8')
> 2021-01-19 02:13:47,094 INFO
> org.apache.flink.connector.file.src.impl.StaticFileSplitEnumerator [] -
> Subtask 406 (on host '172.0.37.8') is requesting a file source split
> 2021-01-19 02:13:47,094 INFO
> org.apache.flink.connector.file.src.impl.StaticFileSplitEnumerator [] - No
> more splits available for subtask 406
> 2021-01-19 02:13:47,097 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Source:
> HiveSource-snmpprobe.p_snmp_ifXTable (318/458)
> (710557b37a1e03f0f462ab5303842489) switched from RUNNING to FINISHED.
> 2021-01-19 02:13:47,097 DEBUG
> org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Ignoring
> transition of vertex Source: HiveSource-snmpprobe.p_snmp_ifXTable (318/458)
> - execution #0 to FAILED while being FINISHED.
> 2021-01-19 02:13:47,097 DEBUG
> org.apache.flink.runtime.scheduler.SharedSlot
> [] - Remove logical slot (SlotRequestId{988c43b8a7b427ea962685f057438880})
> for execution vertex (id 605b35e407e90cda15ad084365733fdd_317) from the
> physical slot (SlotRequestId{37b03b71035c9d8c564bb7c299ee9b3d})
> 2021-01-19 02:13:47,097 DEBUG
> org.apache.flink.runtime.scheduler.SharedSlot
> [] - Release shared slot externally
> (SlotRequestId{37b03b71035c9d8c564bb7c299ee9b3d})
> 2021-01-19 02:13:47,097 DEBUG
> org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl     [] - Releasing
> slot [SlotRequestId{37b03b71035c9d8c564bb7c299ee9b3d}] because: Slot is
> being returned from SlotSharingExecutionSlotAllocator.
> 2021-01-19 02:13:47,097 DEBUG
> org.apache.flink.runtime.scheduler.SharedSlot
> [] - Release shared slot (SlotRequestId{37b03b71035c9d8c564bb7c299ee9b3d})
> 2021-01-19 02:13:47,097 DEBUG
> org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl     [] -
> Fulfilling
> pending slot request [SlotRequestId{bb5a48db898111288c811359cc2d7f51}] with
> slot [385153f7c5efff54be584439258f7352]
> 2021-01-19 02:13:47,097 DEBUG
> org.apache.flink.runtime.scheduler.SharedSlot
> [] - Allocated logical slot
> (SlotRequestId{78f370c05403ab3d703a8d89c19d23c8}) for execution vertex (id
> 605b35e407e90cda15ad084365733fdd_419) from the physical slot
> (SlotRequestId{bb5a48db898111288c811359cc2d7f51})
> 2021-01-19 02:13:47,097 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Source:
> HiveSource-snmpprobe.p_snmp_ifXTable (420/458)
> (d04edd6e11b7cdc9e88c0ab6d756fed2) switched from SCHEDULED to DEPLOYING.
> 2021-01-19 02:13:47,097 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Deploying
> Source: HiveSource-snmpprobe.p_snmp_ifXTable (420/458) (attempt #0) with
> attempt id d04edd6e11b7cdc9e88c0ab6d756fed2 to 172.0.42.250:6122-5d505f @
> 172-0-42-250.flink-taskmanager-query-state.gem-flink.svc.cluster.local
> (dataPort=40697) with allocation id 385153f7c5efff54be584439258f7352
> 2021-01-19 02:13:47,097 DEBUG
> org.apache.flink.runtime.resourcemanager.slotmanager.SlotManagerImpl [] -
> Cancel slot request 4da96bc97ef9b47ba7e408c78835d75a.
> 2021-01-19 02:13:47,100 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Source:
> HiveSource-snmpprobe.p_snmp_ifXTable (413/458)
> (2124587d1641d6cb05c05dfc742e8423) switched from DEPLOYING to RUNNING.
> 2021-01-19 02:13:47,100 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Source:
> HiveSource-snmpprobe.p_snmp_ifXTable (414/458)
> (4aa88cc1b61dc5b056ad59d373392c2f) switched from DEPLOYING to RUNNING.
> 2021-01-19 02:13:47,100 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Source:
> HiveSource-snmpprobe.p_snmp_ifXTable (415/458)
> (225d9a604e6b852f2ea6e87ebcf3107c) switched from DEPLOYING to RUNNING.
> 2021-01-19 02:13:47,112 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Source:
> HiveSource-snmpprobe.p_snmp_ifXTable (417/458)
> (c1a78898e76b3f1761cd5be1913dd24c) switched from DEPLOYING to RUNNING.
> 2021-01-19 02:13:47,112 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Source:
> HiveSource-snmpprobe.p_snmp_ifXTable (418/458)
> (6deb899dbd8bf373b349b980d1e78506) switched from DEPLOYING to RUNNING.
> 2021-01-19 02:13:47,113 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Source:
> HiveSource-snmpprobe.p_snmp_ifXTable (416/458)
> (5a99fa4a1d8bdbf93345a9b20ae1fa91) switched from DEPLOYING to RUNNING.
> 2021-01-19 02:13:47,117 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Source:
> HiveSource-snmpprobe.p_snmp_ifXTable (406/458)
> (d1f9edd1bfdd80eef6b32b8850020130) switched from RUNNING to FINISHED.
> 2021-01-19 02:13:47,117 DEBUG
> org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Ignoring
> transition of vertex Source: HiveSource-snmpprobe.p_snmp_ifXTable (406/458)
> - execution #0 to FAILED while being FINISHED.
> 2021-01-19 02:13:47,117 DEBUG
> org.apache.flink.runtime.scheduler.SharedSlot
> [] - Remove logical slot (SlotRequestId{037efe676c5cec5fe6b549d3ebd5f72b})
> for execution vertex (id 605b35e407e90cda15ad084365733fdd_405) from the
> physical slot (SlotRequestId{a840e61c33cb3f250cfb54652c87aa64})
> 2021-01-19 02:13:47,117 DEBUG
> org.apache.flink.runtime.scheduler.SharedSlot
> [] - Release shared slot externally
> (SlotRequestId{a840e61c33cb3f250cfb54652c87aa64})
> 2021-01-19 02:13:47,117 DEBUG
> org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl     [] - Releasing
> slot [SlotRequestId{a840e61c33cb3f250cfb54652c87aa64}] because: Slot is
> being returned from SlotSharingExecutionSlotAllocator.
> 2021-01-19 02:13:47,117 DEBUG
> org.apache.flink.runtime.scheduler.SharedSlot
> [] - Release shared slot (SlotRequestId{a840e61c33cb3f250cfb54652c87aa64})
> 2021-01-19 02:13:47,117 DEBUG
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] -
> Connecting websocket ...
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager@23f03575
> 2021-01-19 02:13:47,117 DEBUG
> org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl     [] -
> Fulfilling
> pending slot request [SlotRequestId{538f35a507cd0949bf547588eb436b49}] with
> slot [ebf6ebbb9abe3a9e6ccb56e235b00b53]
> 2021-01-19 02:13:47,117 DEBUG
> org.apache.flink.runtime.scheduler.SharedSlot
> [] - Allocated logical slot
> (SlotRequestId{a9bac8016853f9e86963b8ee11dea18f}) for execution vertex (id
> 605b35e407e90cda15ad084365733fdd_420) from the physical slot
> (SlotRequestId{538f35a507cd0949bf547588eb436b49})
> 2021-01-19 02:13:47,117 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Source:
> HiveSource-snmpprobe.p_snmp_ifXTable (421/458)
> (706d012e7a572e1e5786536df9ab3bbb) switched from SCHEDULED to DEPLOYING.
> 2021-01-19 02:13:47,117 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Deploying
> Source: HiveSource-snmpprobe.p_snmp_ifXTable (421/458) (attempt #0) with
> attempt id 706d012e7a572e1e5786536df9ab3bbb to 172.0.37.8:6122-694869 @
> 172-0-37-8.flink-taskmanager-query-state.gem-flink.svc.cluster.local
> (dataPort=32959) with allocation id ebf6ebbb9abe3a9e6ccb56e235b00b53
> 2021-01-19 02:13:47,117 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Source:
> HiveSource-snmpprobe.p_snmp_ifXTable (407/458)
> (396cb3fdd115a31d8575407fa9ee6e07) switched from RUNNING to FINISHED.
> 2021-01-19 02:13:47,117 DEBUG
> org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Ignoring
> transition of vertex Source: HiveSource-snmpprobe.p_snmp_ifXTable (407/458)
> - execution #0 to FAILED while being FINISHED.
> 2021-01-19 02:13:47,117 DEBUG
> org.apache.flink.runtime.scheduler.SharedSlot
> [] - Remove logical slot (SlotRequestId{88e116a3dd0a40ef692734548aac9682})
> for execution vertex (id 605b35e407e90cda15ad084365733fdd_406) from the
> physical slot (SlotRequestId{9bb6a1762363d3996aded34c82abab54})
> 2021-01-19 02:13:47,117 DEBUG
> org.apache.flink.runtime.scheduler.SharedSlot
> [] - Release shared slot externally
> (SlotRequestId{9bb6a1762363d3996aded34c82abab54})
> 2021-01-19 02:13:47,117 DEBUG
> org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl     [] - Releasing
> slot [SlotRequestId{9bb6a1762363d3996aded34c82abab54}] because: Slot is
> being returned from SlotSharingExecutionSlotAllocator.
> 2021-01-19 02:13:47,117 DEBUG
> org.apache.flink.runtime.scheduler.SharedSlot
> [] - Release shared slot (SlotRequestId{9bb6a1762363d3996aded34c82abab54})
> 2021-01-19 02:13:47,117 DEBUG
> org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl     [] -
> Fulfilling
> pending slot request [SlotRequestId{88a7127cc4a86be0a962b9aa68d4feff}] with
> slot [a30c9937af2c6de7ab471086cc9268f5]
> 2021-01-19 02:13:47,117 DEBUG
> org.apache.flink.runtime.scheduler.SharedSlot
> [] - Allocated logical slot
> (SlotRequestId{da65200b5d50dcaaaff3f4373dd824c4}) for execution vertex (id
> 605b35e407e90cda15ad084365733fdd_421) from the physical slot
> (SlotRequestId{88a7127cc4a86be0a962b9aa68d4feff})
> 2021-01-19 02:13:47,117 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Source:
> HiveSource-snmpprobe.p_snmp_ifXTable (422/458)
> (f12be47a0d11892e411d1afcb928b55a) switched from SCHEDULED to DEPLOYING.
> 2021-01-19 02:13:47,117 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Deploying
> Source: HiveSource-snmpprobe.p_snmp_ifXTable (422/458) (attempt #0) with
> attempt id f12be47a0d11892e411d1afcb928b55a to 172.0.37.8:6122-694869 @
> 172-0-37-8.flink-taskmanager-query-state.gem-flink.svc.cluster.local
> (dataPort=32959) with allocation id a30c9937af2c6de7ab471086cc9268f5
> 2021-01-19 02:13:47,117 DEBUG
> org.apache.flink.runtime.resourcemanager.slotmanager.SlotManagerImpl [] -
> Cancel slot request 0a1cfa83b0d664615e0b9e1f938d7dee.
> 2021-01-19 02:13:47,117 DEBUG
> org.apache.flink.runtime.resourcemanager.slotmanager.SlotManagerImpl [] -
> Cancel slot request c1d63c4ffdf6e17212c4ca6be4071850.
> 2021-01-19 02:13:47,120 DEBUG
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] -
> WebSocket successfully opened
> 2021-01-19 02:13:47,123 ERROR
> org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] -
> Fatal error occurred in ResourceManager.
> org.apache.flink.runtime.leaderretrieval.LeaderRetrievalException: Error while watching the ConfigMap test-flink-etl-cb1c647ea7488765fd3e8cc1dc691e46-jobmanager-leader
>         at org.apache.flink.kubernetes.highavailability.KubernetesLeaderRetrievalDriver$ConfigMapCallbackHandlerImpl.handleFatalError(KubernetesLeaderRetrievalDriver.java:120) [flink-dist_2.11-1.12.1.jar:1.12.1]
>         at org.apache.flink.kubernetes.kubeclient.resources.AbstractKubernetesWatcher.onClose(AbstractKubernetesWatcher.java:48) [flink-dist_2.11-1.12.1.jar:1.12.1]
>         at io.fabric8.kubernetes.client.utils.WatcherToggle.onClose(WatcherToggle.java:56) [flink-dist_2.11-1.12.1.jar:1.12.1]
>         at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.closeEvent(WatchConnectionManager.java:367) [flink-dist_2.11-1.12.1.jar:1.12.1]
>         at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.access$700(WatchConnectionManager.java:50) [flink-dist_2.11-1.12.1.jar:1.12.1]
>         at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$1.onMessage(WatchConnectionManager.java:259) [flink-dist_2.11-1.12.1.jar:1.12.1]
>         at org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket.onReadMessage(RealWebSocket.java:323) [flink-dist_2.11-1.12.1.jar:1.12.1]
>         at org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.WebSocketReader.readMessageFrame(WebSocketReader.java:219) [flink-dist_2.11-1.12.1.jar:1.12.1]
>         at org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:105) [flink-dist_2.11-1.12.1.jar:1.12.1]
>         at org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket.loopReader(RealWebSocket.java:274) [flink-dist_2.11-1.12.1.jar:1.12.1]
>         at org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:214) [flink-dist_2.11-1.12.1.jar:1.12.1]
>         at org.apache.flink.kubernetes.shaded.okhttp3.RealCall$AsyncCall.execute(RealCall.java:206) [flink-dist_2.11-1.12.1.jar:1.12.1]
>         at org.apache.flink.kubernetes.shaded.okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32) [flink-dist_2.11-1.12.1.jar:1.12.1]
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_275]
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_275]
>         at java.lang.Thread.run(Thread.java:748) [?:1.8.0_275]
> 2021-01-19 02:13:47,124 ERROR
> org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] - Fatal
> error occurred in the cluster entrypoint.
> org.apache.flink.runtime.leaderretrieval.LeaderRetrievalException: Error while watching the ConfigMap test-flink-etl-cb1c647ea7488765fd3e8cc1dc691e46-jobmanager-leader
>         at org.apache.flink.kubernetes.highavailability.KubernetesLeaderRetrievalDriver$ConfigMapCallbackHandlerImpl.handleFatalError(KubernetesLeaderRetrievalDriver.java:120) [flink-dist_2.11-1.12.1.jar:1.12.1]
>         at org.apache.flink.kubernetes.kubeclient.resources.AbstractKubernetesWatcher.onClose(AbstractKubernetesWatcher.java:48) [flink-dist_2.11-1.12.1.jar:1.12.1]
>         at io.fabric8.kubernetes.client.utils.WatcherToggle.onClose(WatcherToggle.java:56) [flink-dist_2.11-1.12.1.jar:1.12.1]
>         at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.closeEvent(WatchConnectionManager.java:367) [flink-dist_2.11-1.12.1.jar:1.12.1]
>         at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.access$700(WatchConnectionManager.java:50) [flink-dist_2.11-1.12.1.jar:1.12.1]
>         at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$1.onMessage(WatchConnectionManager.java:259) [flink-dist_2.11-1.12.1.jar:1.12.1]
>         at org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket.onReadMessage(RealWebSocket.java:323) [flink-dist_2.11-1.12.1.jar:1.12.1]
>         at org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.WebSocketReader.readMessageFrame(WebSocketReader.java:219) [flink-dist_2.11-1.12.1.jar:1.12.1]
>         at org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:105) [flink-dist_2.11-1.12.1.jar:1.12.1]
>         at org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket.loopReader(RealWebSocket.java:274) [flink-dist_2.11-1.12.1.jar:1.12.1]
>         at org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:214) [flink-dist_2.11-1.12.1.jar:1.12.1]
>         at org.apache.flink.kubernetes.shaded.okhttp3.RealCall$AsyncCall.execute(RealCall.java:206) [flink-dist_2.11-1.12.1.jar:1.12.1]
>         at org.apache.flink.kubernetes.shaded.okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32) [flink-dist_2.11-1.12.1.jar:1.12.1]
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_275]
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_275]
>         at java.lang.Thread.run(Thread.java:748) [?:1.8.0_275]
> 2021-01-19 02:13:47,125 DEBUG
> org.apache.flink.runtime.source.coordinator.SourceCoordinator [] - Handling
> event from subtask 365 of source Source:
> HiveSource-snmpprobe.p_snmp_ifXTable: ReaderRegistrationEvent[subtaskId =
> 365, location = 172.0.37.16)
> 2021-01-19 02:13:47,125 DEBUG
> org.apache.flink.runtime.source.coordinator.SourceCoordinator [] - Handling
> event from subtask 365 of source Source:
> HiveSource-snmpprobe.p_snmp_ifXTable: RequestSplitEvent
> (host='172.0.37.16')
> 2021-01-19 02:13:47,125 INFO
> org.apache.flink.connector.file.src.impl.StaticFileSplitEnumerator [] -
> Subtask 365 (on host '172.0.37.16') is requesting a file source split
> 2021-01-19 02:13:47,125 INFO
> org.apache.flink.connector.file.src.impl.StaticFileSplitEnumerator [] - No
> more splits available for subtask 365
> 2021-01-19 02:13:47,125 DEBUG
> org.apache.flink.runtime.source.coordinator.SourceCoordinator [] - Handling
> event from subtask 379 of source Source:
> HiveSource-snmpprobe.p_snmp_ifXTable: ReaderRegistrationEvent[subtaskId =
> 379, location = 172.0.37.16)
> 2021-01-19 02:13:47,125 DEBUG
> org.apache.flink.runtime.source.coordinator.SourceCoordinator [] - Handling
> event from subtask 379 of source Source:
> HiveSource-snmpprobe.p_snmp_ifXTable: RequestSplitEvent
> (host='172.0.37.16')
> 2021-01-19 02:13:47,125 INFO
> org.apache.flink.connector.file.src.impl.StaticFileSplitEnumerator [] -
> Subtask 379 (on host '172.0.37.16') is requesting a file source split
> 2021-01-19 02:13:47,125 INFO
> org.apache.flink.connector.file.src.impl.StaticFileSplitEnumerator [] - No
> more splits available for subtask 379
> 2021-01-19 02:13:47,130 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Source:
> HiveSource-snmpprobe.p_snmp_ifXTable (389/458)
> (b0d8b877b1911ffca609f818693b68ad) switched from DEPLOYING to RUNNING.
> 2021-01-19 02:13:47,131 INFO  org.apache.flink.runtime.blob.BlobServer [] - Stopped BLOB server at 0.0.0.0:6124
> 2021-01-19 02:13:47,132 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Source:
> HiveSource-snmpprobe.p_snmp_ifXTable (421/458)
> (706d012e7a572e1e5786536df9ab3bbb) switched from DEPLOYING to RUNNING.
>
>
>
>
>
> --
> Sent from: http://apache-flink.147419.n8.nabble.com/
>
