Hi Yang Wang

First, the versions in my environment:


Kubernetes: 1.17.4, CNI: Weave


Items 1, 2 and 3 are questions I still have.

Item 4 is the JobManager (JM) log.


1. After removing taskmanager-query-state-service.yaml, reverse lookup does indeed fail. nslookup output:

kubectl exec -it busybox2 -- /bin/sh
/ # nslookup 10.47.96.2
Server:          10.96.0.10
Address:     10.96.0.10:53

** server can't find 2.96.47.10.in-addr.arpa: NXDOMAIN
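For context, the WARN lines in the JM log of item 4 ("No hostname could be resolved for the IP address ..., using IP address as host name") appear to be what this failing PTR lookup looks like from Flink's side. Below is a rough Python analogue of that fallback, assuming the JobManager simply attempts a reverse lookup and keeps the IP string when none exists. `host_name_for` is a hypothetical helper, not Flink code; the `lookup` parameter only exists so it can be exercised without a real DNS server.

```python
import socket
from typing import Callable, List, Tuple

# Same shape as socket.gethostbyaddr: ip -> (hostname, aliases, addresses)
Lookup = Callable[[str], Tuple[str, List[str], List[str]]]


def host_name_for(ip: str, lookup: Lookup = socket.gethostbyaddr) -> str:
    """Hypothetical helper: reverse-resolve ip, falling back to the IP itself.

    Assumption: this only approximates the JobManager behaviour that logs
    "No hostname could be resolved for the IP address ..."; it is not Flink code.
    """
    try:
        fqdn = lookup(ip)[0]
    except OSError:
        # socket.herror (e.g. NXDOMAIN for 2.96.47.10.in-addr.arpa) derives from OSError
        fqdn = ip
    if fqdn == ip:
        print(f"No hostname could be resolved for the IP address {ip}, "
              "using IP address as host name.")
    return fqdn
```

Run inside a pod (for example `host_name_for("10.47.96.2")`), it should print the warning and return the plain IP whenever the cluster DNS has no PTR record for that address, which is the same outcome as the NXDOMAIN above.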



2. Flink 1.11 vs. Flink 1.10

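In the job detail view (Subtasks / TaskManagers), the "xxx x" row now shows 172-20-0-50 in 1.11, whereas in 1.10 it showed flink-taskmanager-7b5d6958b6-sfzlk:36459. What changed here? (This cluster currently runs both 1.10 and 1.11, and 1.10 works normally. If CoreDNS were the problem, shouldn't Flink 1.10 run into the same issue?)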
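One possible reading of the 1.11 display, under an assumption I have not checked against the Flink source: if the UI shows the reverse-lookup result cut at the first dot, then a PTR name whose first label is the dashed pod IP would be displayed as 172-20-0-50, while a missing PTR record would be displayed as the raw dotted IP. The FQDN used below is hypothetical (a dashed-IP name of the kind CoreDNS can generate for Service endpoints), and `short_host_name` / `displayed_host` are illustrative helpers only.

```python
from typing import Optional


def short_host_name(fqdn: str) -> str:
    """Return the first DNS label, e.g. 'a.b.c' -> 'a'; names without a dot are unchanged."""
    dot = fqdn.find(".")
    return fqdn if dot == -1 else fqdn[:dot]


def displayed_host(ip: str, ptr_name: Optional[str]) -> str:
    """Hypothetical model of the TaskManager name shown in the web UI.

    ptr_name is the reverse-lookup result for ip, or None when there is no PTR record.
    """
    if ptr_name is None:
        return ip  # fallback: the dotted IP, as in the WARN lines of the JM log
    return short_host_name(ptr_name)


# Hypothetical dashed-IP PTR name: displayed as "172-20-0-50"
print(displayed_host("172.20.0.50", "172-20-0-50.flink-taskmanager.default.svc.cluster.local"))
# No PTR record: displayed as the plain IP "10.47.96.2"
print(displayed_host("10.47.96.2", None))
```

This sketch does not account for the 1.10 display (pod name plus port, flink-taskmanager-7b5d6958b6-sfzlk:36459); that may come from how each version registers TaskManager addresses, which is not verified here.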

3. Does CoreDNS need any special configuration?

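Forward name resolution inside the containers works fine; only reverse lookups fail, and only when there is no Service. Is there anything that needs to be configured in CoreDNS?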
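Besides whether a PTR record exists, it may also matter how long a failing reverse lookup takes: in the JM log of item 4 the slot requests are issued at 13:55:45, while the "No hostname could be resolved" warnings only appear between 13:58:18 and 13:58:36, around the JobManager heartbeat timeout logged at 13:58:29. A quick way to check is to time the lookups for the TaskManager IPs from inside a pod. `time_reverse_lookups` is a hypothetical helper; the lookup and clock are injectable only so it can be tested without DNS.

```python
import socket
import time
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# Same shape as socket.gethostbyaddr: ip -> (hostname, aliases, addresses)
Lookup = Callable[[str], Tuple[str, List[str], List[str]]]


def time_reverse_lookups(
    ips: Iterable[str],
    lookup: Lookup = socket.gethostbyaddr,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, Tuple[Optional[str], float]]:
    """Map each IP to (PTR name or None, seconds the lookup took)."""
    results: Dict[str, Tuple[Optional[str], float]] = {}
    for ip in ips:
        start = clock()
        try:
            name: Optional[str] = lookup(ip)[0]
        except OSError:
            # socket.herror, socket.gaierror and socket.timeout all derive from OSError
            name = None
        results[ip] = (name, clock() - start)
    return results
```

For example, `time_reverse_lookups(["10.47.96.2", "10.34.64.14", "10.34.128.9"])` run in a pod with Python shows, per IP, whether a PTR name came back and how many seconds it took. If each failing lookup takes several seconds, those delays could plausibly add up across a dozen TaskManagers; if they fail instantly, the delay is more likely elsewhere.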


4. JM log at the time of the timeout:



2020-07-23 13:53:00,228 INFO  org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - ResourceManager akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_0 was granted leadership with fencing token 00000000000000000000000000000000
2020-07-23 13:53:00,232 INFO  org.apache.flink.runtime.rpc.akka.AkkaRpcService [] - Starting RPC endpoint for org.apache.flink.runtime.dispatcher.StandaloneDispatcher at akka://flink/user/rpc/dispatcher_1 .
2020-07-23 13:53:00,233 INFO  org.apache.flink.runtime.resourcemanager.slotmanager.SlotManagerImpl [] - Starting the SlotManager.
2020-07-23 13:53:03,472 INFO  org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Registering TaskManager with ResourceID 1f9ae0cd95a28943a73be26323588696 (akka.tcp://flink@10.34.128.9:6122/user/rpc/taskmanager_0) at ResourceManager
2020-07-23 13:53:03,777 INFO  org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Registering TaskManager with ResourceID cac09e751264e61615329c20713a84b4 (akka.tcp://flink@10.32.160.6:6122/user/rpc/taskmanager_0) at ResourceManager
2020-07-23 13:53:03,787 INFO  org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Registering TaskManager with ResourceID 93c72d01d09f9ae427c5fc980ed4c1e4 (akka.tcp://flink@10.39.0.8:6122/user/rpc/taskmanager_0) at ResourceManager
2020-07-23 13:53:04,044 INFO  org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Registering TaskManager with ResourceID 8adf2f8e81b77a16d5418a9e252c61e2 (akka.tcp://flink@10.38.64.7:6122/user/rpc/taskmanager_0) at ResourceManager
2020-07-23 13:53:04,099 INFO  org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Registering TaskManager with ResourceID 23e9d2358f6eb76b9ae718d879d4f330 (akka.tcp://flink@10.42.160.6:6122/user/rpc/taskmanager_0) at ResourceManager
2020-07-23 13:53:04,146 INFO  org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Registering TaskManager with ResourceID 092f8dee299e32df13db3111662b61f8 (akka.tcp://flink@10.33.192.14:6122/user/rpc/taskmanager_0) at ResourceManager


2020-07-23 13:55:44,220 INFO  org.apache.flink.runtime.dispatcher.StandaloneDispatcher [] - Received JobGraph submission 99a030d0e3f428490a501c0132f27a56 (JobTest).
2020-07-23 13:55:44,222 INFO  org.apache.flink.runtime.dispatcher.StandaloneDispatcher [] - Submitting job 99a030d0e3f428490a501c0132f27a56 (JobTest).
2020-07-23 13:55:44,251 INFO  org.apache.flink.runtime.rpc.akka.AkkaRpcService [] - Starting RPC endpoint for org.apache.flink.runtime.jobmaster.JobMaster at akka://flink/user/rpc/jobmanager_2 .
2020-07-23 13:55:44,260 INFO  org.apache.flink.runtime.jobmaster.JobMaster [] - Initializing job JobTest (99a030d0e3f428490a501c0132f27a56).
2020-07-23 13:55:44,278 INFO  org.apache.flink.runtime.jobmaster.JobMaster [] - Using restart back off time strategy NoRestartBackoffTimeStrategy for JobTest (99a030d0e3f428490a501c0132f27a56).
2020-07-23 13:55:44,319 INFO  org.apache.flink.runtime.jobmaster.JobMaster [] - Running initialization on master for job JobTest (99a030d0e3f428490a501c0132f27a56).
2020-07-23 13:55:44,319 INFO  org.apache.flink.runtime.jobmaster.JobMaster [] - Successfully ran initialization on master in 0 ms.
2020-07-23 13:55:44,428 INFO  org.apache.flink.runtime.scheduler.adapter.DefaultExecutionTopology [] - Built 1 pipelined regions in 25 ms
2020-07-23 13:55:44,437 INFO  org.apache.flink.runtime.jobmaster.JobMaster [] - Loading state backend via factory org.apache.flink.contrib.streaming.state.RocksDBStateBackendFactory
2020-07-23 13:55:44,456 INFO  org.apache.flink.contrib.streaming.state.RocksDBStateBackend [] - Using predefined options: DEFAULT.
2020-07-23 13:55:44,457 INFO  org.apache.flink.contrib.streaming.state.RocksDBStateBackend [] - Using default options factory: DefaultConfigurableOptionsFactory{configuredOptions={}}.
2020-07-23 13:55:44,466 WARN  org.apache.flink.runtime.util.HadoopUtils [] - Could not find Hadoop configuration via any of the supported methods (Flink configuration, environment variables).
2020-07-23 13:55:45,276 INFO  org.apache.flink.runtime.jobmaster.JobMaster [] - Using failover strategy org.apache.flink.runtime.executiongraph.failover.flip1.RestartPipelinedRegionFailoverStrategy@72bd8533 for JobTest (99a030d0e3f428490a501c0132f27a56).
2020-07-23 13:55:45,280 INFO  org.apache.flink.runtime.jobmaster.JobManagerRunnerImpl [] - JobManager runner for job JobTest (99a030d0e3f428490a501c0132f27a56) was granted leadership with session id 00000000-0000-0000-0000-000000000000 at akka.tcp://flink@flink-jobmanager:6123/user/rpc/jobmanager_2.
2020-07-23 13:55:45,286 INFO  org.apache.flink.runtime.jobmaster.JobMaster [] - Starting scheduling with scheduling strategy [org.apache.flink.runtime.scheduler.strategy.EagerSchedulingStrategy]



2020-07-23 13:55:45,436 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Cannot serve slot request, no ResourceManager connected. Adding as pending request [SlotRequestId{e092b12b96b0a98bbf057e71b9705c23}]
2020-07-23 13:55:45,436 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Cannot serve slot request, no ResourceManager connected. Adding as pending request [SlotRequestId{4ad15f417716c9e07fca383990c0f52a}]
2020-07-23 13:55:45,436 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Cannot serve slot request, no ResourceManager connected. Adding as pending request [SlotRequestId{345fdb427a893b7fc3f4f040f93445d2}]
2020-07-23 13:55:45,437 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Cannot serve slot request, no ResourceManager connected. Adding as pending request [SlotRequestId{e559485ea7b0b7e17367816882538d90}]
2020-07-23 13:55:45,437 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Cannot serve slot request, no ResourceManager connected. Adding as pending request [SlotRequestId{7be8f6c1aedb27b04e7feae68078685c}]
2020-07-23 13:55:45,437 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Cannot serve slot request, no ResourceManager connected. Adding as pending request [SlotRequestId{582a86197884206652dff3aea2306bb3}]
2020-07-23 13:55:45,437 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Cannot serve slot request, no ResourceManager connected. Adding as pending request [SlotRequestId{0cc24260eda3af299a0b321feefaf2cb}]
2020-07-23 13:55:45,437 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Cannot serve slot request, no ResourceManager connected. Adding as pending request [SlotRequestId{240ca6f3d3b5ece6a98243ec8cadf616}]
2020-07-23 13:55:45,438 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Cannot serve slot request, no ResourceManager connected. Adding as pending request [SlotRequestId{c35033d598a517acc108424bb9f809fb}]
2020-07-23 13:55:45,438 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Cannot serve slot request, no ResourceManager connected. Adding as pending request [SlotRequestId{ad35013c3b532d4b4df1be62395ae0cf}]
2020-07-23 13:55:45,438 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Cannot serve slot request, no ResourceManager connected. Adding as pending request [SlotRequestId{c929bd5e8daf432d01fad1ece3daec1a}]
2020-07-23 13:55:45,487 INFO  org.apache.flink.runtime.jobmaster.JobMaster [] - Connecting to ResourceManager akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*(00000000000000000000000000000000)
2020-07-23 13:55:45,492 INFO  org.apache.flink.runtime.jobmaster.JobMaster [] - Resolved ResourceManager address, beginning registration
2020-07-23 13:55:45,493 INFO  org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Registering job manager 00000000000000000000000000000...@akka.tcp://flink@flink-jobmanager:6123/user/rpc/jobmanager_2 for job 99a030d0e3f428490a501c0132f27a56.
2020-07-23 13:55:45,499 INFO  org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Registered job manager 00000000000000000000000000000...@akka.tcp://flink@flink-jobmanager:6123/user/rpc/jobmanager_2 for job 99a030d0e3f428490a501c0132f27a56.
2020-07-23 13:55:45,501 INFO  org.apache.flink.runtime.jobmaster.JobMaster [] - JobManager successfully registered at ResourceManager, leader id: 00000000000000000000000000000000.
2020-07-23 13:55:45,501 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Requesting new slot [SlotRequestId{15fd2a9565c2b080748c1d1592b1cbbc}] and profile ResourceProfile{UNKNOWN} from resource manager.
2020-07-23 13:55:45,502 INFO  org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Request slot with profile ResourceProfile{UNKNOWN} for job 99a030d0e3f428490a501c0132f27a56 with allocation id d420d08bf2654d9ea76955c70db18b69.
2020-07-23 13:55:45,502 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Requesting new slot [SlotRequestId{8cd72cc16f0e319d915a9a096a1096d7}] and profile ResourceProfile{UNKNOWN} from resource manager.
2020-07-23 13:55:45,503 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Requesting new slot [SlotRequestId{e7e422409acebdb385014a9634af6a90}] and profile ResourceProfile{UNKNOWN} from resource manager.
2020-07-23 13:55:45,503 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Requesting new slot [SlotRequestId{cef1af73546ca1fc27ca7a3322e9e815}] and profile ResourceProfile{UNKNOWN} from resource manager.
2020-07-23 13:55:45,503 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Requesting new slot [SlotRequestId{108fe0b3086567ad79275eccef2fdaf8}] and profile ResourceProfile{UNKNOWN} from resource manager.
2020-07-23 13:55:45,503 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Requesting new slot [SlotRequestId{265e67985eab7a6dc08024e53bf2708d}] and profile ResourceProfile{UNKNOWN} from resource manager.
2020-07-23 13:55:45,503 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Requesting new slot [SlotRequestId{7087497a17c441f1a1d6fefcbc7cd0ea}] and profile ResourceProfile{UNKNOWN} from resource manager.
2020-07-23 13:55:45,503 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Requesting new slot [SlotRequestId{14ac08438e79c8db8d25d93b99d62725}] and profile ResourceProfile{UNKNOWN} from resource manager.

2020-07-23 13:55:45,514 INFO  org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Request slot with profile ResourceProfile{UNKNOWN} for job 99a030d0e3f428490a501c0132f27a56 with allocation id fce526bbe3e1be91caa3e4b536b20e35.
2020-07-23 13:55:45,514 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Requesting new slot [SlotRequestId{40c7abbb12514c405323b0569fb21647}] and profile ResourceProfile{UNKNOWN} from resource manager.
2020-07-23 13:55:45,514 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Requesting new slot [SlotRequestId{a4985a9647b65b30a571258b45c8f2ce}] and profile ResourceProfile{UNKNOWN} from resource manager.
2020-07-23 13:55:45,515 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Requesting new slot [SlotRequestId{c52a6eb2fa58050e71e7903590019fd1}] and profile ResourceProfile{UNKNOWN} from resource manager.

2020-07-23 13:55:45,517 INFO  org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Request slot with profile ResourceProfile{UNKNOWN} for job 99a030d0e3f428490a501c0132f27a56 with allocation id 18ac7ec802ebfcfed8c05ee9324a55a4.

2020-07-23 13:55:45,518 INFO  org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Request slot with profile ResourceProfile{UNKNOWN} for job 99a030d0e3f428490a501c0132f27a56 with allocation id 7ec76cbe689eb418b63599e90ade19be.
2020-07-23 13:55:45,518 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Requesting new slot [SlotRequestId{46d65692a8b5aad11b51f9a74a666a74}] and profile ResourceProfile{UNKNOWN} from resource manager.
2020-07-23 13:55:45,518 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Requesting new slot [SlotRequestId{3670bb4f345eedf941cc18e477ba1e9d}] and profile ResourceProfile{UNKNOWN} from resource manager.
2020-07-23 13:55:45,518 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Requesting new slot [SlotRequestId{4a12467d76b9e3df8bc3412c0be08e14}] and profile ResourceProfile{UNKNOWN} from resource manager.
2020-07-23 13:55:45,518 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Requesting new slot [SlotRequestId{e092b12b96b0a98bbf057e71b9705c23}] and profile ResourceProfile{UNKNOWN} from resource manager.
2020-07-23 13:55:45,518 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Requesting new slot [SlotRequestId{4ad15f417716c9e07fca383990c0f52a}] and profile ResourceProfile{UNKNOWN} from resource manager.
2020-07-23 13:55:45,518 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Requesting new slot [SlotRequestId{345fdb427a893b7fc3f4f040f93445d2}] and profile ResourceProfile{UNKNOWN} from resource manager.
2020-07-23 13:55:45,519 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Requesting new slot [SlotRequestId{e559485ea7b0b7e17367816882538d90}] and profile ResourceProfile{UNKNOWN} from resource manager.
2020-07-23 13:55:45,519 INFO  org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Request slot with profile ResourceProfile{UNKNOWN} for job 99a030d0e3f428490a501c0132f27a56 with allocation id b78837a29b4032924ac25be70ed21a3c.


2020-07-23 13:58:18,037 WARN  org.apache.flink.runtime.taskmanager.TaskManagerLocation [] - No hostname could be resolved for the IP address 10.47.96.2, using IP address as host name. Local input split assignment (such as for HDFS files) may be impacted.
2020-07-23 13:58:22,192 WARN  org.apache.flink.runtime.taskmanager.TaskManagerLocation [] - No hostname could be resolved for the IP address 10.34.64.14, using IP address as host name. Local input split assignment (such as for HDFS files) may be impacted.
2020-07-23 13:58:22,358 WARN  org.apache.flink.runtime.taskmanager.TaskManagerLocation [] - No hostname could be resolved for the IP address 10.34.128.9, using IP address as host name. Local input split assignment (such as for HDFS files) may be impacted.
2020-07-23 13:58:24,562 WARN  org.apache.flink.runtime.taskmanager.TaskManagerLocation [] - No hostname could be resolved for the IP address 10.32.160.6, using IP address as host name. Local input split assignment (such as for HDFS files) may be impacted.
2020-07-23 13:58:25,487 WARN  org.apache.flink.runtime.taskmanager.TaskManagerLocation [] - No hostname could be resolved for the IP address 10.38.64.7, using IP address as host name. Local input split assignment (such as for HDFS files) may be impacted.
2020-07-23 13:58:27,636 WARN  org.apache.flink.runtime.taskmanager.TaskManagerLocation [] - No hostname could be resolved for the IP address 10.42.160.6, using IP address as host name. Local input split assignment (such as for HDFS files) may be impacted.
2020-07-23 13:58:27,767 WARN  org.apache.flink.runtime.taskmanager.TaskManagerLocation [] - No hostname could be resolved for the IP address 10.43.64.12, using IP address as host name. Local input split assignment (such as for HDFS files) may be impacted.
2020-07-23 13:58:29,651 INFO  org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - The heartbeat of JobManager with id 456a18b6c404cb11a359718e16de1c6b timed out.
2020-07-23 13:58:29,651 INFO  org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Disconnect job manager 00000000000000000000000000000...@akka.tcp://flink@flink-jobmanager:6123/user/rpc/jobmanager_2 for job 99a030d0e3f428490a501c0132f27a56 from the resource manager.
2020-07-23 13:58:29,854 WARN  org.apache.flink.runtime.taskmanager.TaskManagerLocation [] - No hostname could be resolved for the IP address 10.39.0.8, using IP address as host name. Local input split assignment (such as for HDFS files) may be impacted.
2020-07-23 13:58:33,623 WARN  org.apache.flink.runtime.taskmanager.TaskManagerLocation [] - No hostname could be resolved for the IP address 10.35.0.10, using IP address as host name. Local input split assignment (such as for HDFS files) may be impacted.
2020-07-23 13:58:35,756 WARN  org.apache.flink.runtime.taskmanager.TaskManagerLocation [] - No hostname could be resolved for the IP address 10.36.32.8, using IP address as host name. Local input split assignment (such as for HDFS files) may be impacted.
2020-07-23 13:58:36,694 WARN  org.apache.flink.runtime.taskmanager.TaskManagerLocation [] - No hostname could be resolved for the IP address 10.42.128.6, using IP address as host name. Local input split assignment (such as for HDFS files) may be impacted.


2020-07-23 14:01:17,814 INFO  org.apache.flink.runtime.jobmaster.JobMaster [] - Close ResourceManager connection 83b1ff14900abfd54418e7fa3efb3f8a: The heartbeat of JobManager with id 456a18b6c404cb11a359718e16de1c6b timed out..
2020-07-23 14:01:17,815 INFO  org.apache.flink.runtime.jobmaster.JobMaster [] - Connecting to ResourceManager akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*(00000000000000000000000000000000)
2020-07-23 14:01:17,816 INFO  org.apache.flink.runtime.jobmaster.JobMaster [] - Resolved ResourceManager address, beginning registration
2020-07-23 14:01:17,816 INFO  org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Registering job manager 00000000000000000000000000000...@akka.tcp://flink@flink-jobmanager:6123/user/rpc/jobmanager_2 for job 99a030d0e3f428490a501c0132f27a56.
2020-07-23 14:01:17,836 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Source: host_relation -> Timestamps/Watermarks -> Map (1/1) (302ca9640e2d209a543d843f2996ccd2) switched from SCHEDULED to FAILED on not deployed.
org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Could not allocate the required slot within slot request timeout. Please make sure that the cluster has enough resources.
    at org.apache.flink.runtime.scheduler.DefaultScheduler.maybeWrapWithNoResourceAvailableException(DefaultScheduler.java:441) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.runtime.scheduler.DefaultScheduler.lambda$assignResourceOrHandleError$6(DefaultScheduler.java:422) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:836) ~[?:1.8.0_242]
    at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:811) ~[?:1.8.0_242]
    at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_242]
    at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_242]
    at org.apache.flink.runtime.jobmaster.slotpool.SchedulerImpl.lambda$internalAllocateSlot$0(SchedulerImpl.java:168) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774) ~[?:1.8.0_242]
    at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750) ~[?:1.8.0_242]
    at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_242]
    at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_242]
    at org.apache.flink.runtime.jobmaster.slotpool.SlotSharingManager$SingleTaskSlot.release(SlotSharingManager.java:726) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.runtime.jobmaster.slotpool.SlotSharingManager$MultiTaskSlot.release(SlotSharingManager.java:537) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.runtime.jobmaster.slotpool.SlotSharingManager$MultiTaskSlot.lambda$new$0(SlotSharingManager.java:432) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:836) ~[?:1.8.0_242]
    at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:811) ~[?:1.8.0_242]
    at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_242]
    at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_242]
    at org.apache.flink.runtime.concurrent.FutureUtils.lambda$forwardTo$21(FutureUtils.java:1120) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774) ~[?:1.8.0_242]
    at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750) ~[?:1.8.0_242]
    at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_242]
    at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_242]
    at org.apache.flink.runtime.concurrent.FutureUtils$Timeout.run(FutureUtils.java:1036) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:402) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:195) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:152) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.actor.Actor$class.aroundReceive(Actor.scala:517) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.actor.ActorCell.invoke(ActorCell.scala:561) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.dispatch.Mailbox.run(Mailbox.scala:225) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.dispatch.Mailbox.exec(Mailbox.scala:235) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) [flink-dist_2.11-1.11.1.jar:1.11.1]
Caused by: java.util.concurrent.CompletionException: java.util.concurrent.TimeoutException
    at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292) ~[?:1.8.0_242]
    at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308) ~[?:1.8.0_242]
    at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:607) ~[?:1.8.0_242]
    at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591) ~[?:1.8.0_242]
    ... 25 more
Caused by: java.util.concurrent.TimeoutException
    ... 23 more
2020-07-23 14:01:17,848 INFO  org.apache.flink.runtime.executiongraph.failover.flip1.RestartPipelinedRegionFailoverStrategy [] - Calculating tasks to restart to recover the failed task cbc357ccb763df2852fee8c4fc7d55f2_0.
2020-07-23 14:01:17,910 INFO  org.apache.flink.runtime.executiongraph.failover.flip1.RestartPipelinedRegionFailoverStrategy [] - 902 tasks should be restarted to recover the failed task cbc357ccb763df2852fee8c4fc7d55f2_0.
2020-07-23 14:01:17,913 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Job JobTest (99a030d0e3f428490a501c0132f27a56) switched from state RUNNING to FAILING.
org.apache.flink.runtime.JobException: Recovery is suppressed by NoRestartBackoffTimeStrategy
    at org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:116) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.getFailureHandlingResult(ExecutionFailureHandler.java:78) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskFailure(DefaultScheduler.java:192) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.runtime.scheduler.DefaultScheduler.maybeHandleTaskFailure(DefaultScheduler.java:185) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.runtime.scheduler.DefaultScheduler.updateTaskExecutionStateInternal(DefaultScheduler.java:179) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.runtime.scheduler.SchedulerBase.updateTaskExecutionState(SchedulerBase.java:503) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.runtime.scheduler.UpdateSchedulerNgOnInternalFailuresListener.notifyTaskFailure(UpdateSchedulerNgOnInternalFailuresListener.java:49) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.runtime.executiongraph.ExecutionGraph.notifySchedulerNgAboutInternalTaskFailure(ExecutionGraph.java:1710) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.runtime.executiongraph.Execution.processFail(Execution.java:1287) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.runtime.executiongraph.Execution.processFail(Execution.java:1255) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.runtime.executiongraph.Execution.markFailed(Execution.java:1086) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.runtime.executiongraph.ExecutionVertex.markFailed(ExecutionVertex.java:748) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.runtime.scheduler.DefaultExecutionVertexOperations.markFailed(DefaultExecutionVertexOperations.java:41) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskDeploymentFailure(DefaultScheduler.java:435) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.runtime.scheduler.DefaultScheduler.lambda$assignResourceOrHandleError$6(DefaultScheduler.java:422) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:836) ~[?:1.8.0_242]
    at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:811) ~[?:1.8.0_242]
    at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_242]
    at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_242]
    at org.apache.flink.runtime.jobmaster.slotpool.SchedulerImpl.lambda$internalAllocateSlot$0(SchedulerImpl.java:168) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774) ~[?:1.8.0_242]
    at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750) ~[?:1.8.0_242]
    at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_242]
    at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_242]
    at org.apache.flink.runtime.jobmaster.slotpool.SlotSharingManager$SingleTaskSlot.release(SlotSharingManager.java:726) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.runtime.jobmaster.slotpool.SlotSharingManager$MultiTaskSlot.release(SlotSharingManager.java:537) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.runtime.jobmaster.slotpool.SlotSharingManager$MultiTaskSlot.lambda$new$0(SlotSharingManager.java:432) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:836) ~[?:1.8.0_242]
    at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:811) ~[?:1.8.0_242]
    at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_242]
    at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_242]
    at org.apache.flink.runtime.concurrent.FutureUtils.lambda$forwardTo$21(FutureUtils.java:1120) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774) ~[?:1.8.0_242]
    at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750) ~[?:1.8.0_242]
    at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_242]
    at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_242]
    at org.apache.flink.runtime.concurrent.FutureUtils$Timeout.run(FutureUtils.java:1036) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:402) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:195) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:152) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.actor.Actor$class.aroundReceive(Actor.scala:517) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.actor.ActorCell.invoke(ActorCell.scala:561) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.dispatch.Mailbox.run(Mailbox.scala:225) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.dispatch.Mailbox.exec(Mailbox.scala:235) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) [flink-dist_2.11-1.11.1.jar:1.11.1]
Caused by: org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Could not allocate the required slot within slot request timeout. Please make sure that the cluster has enough resources.
    at org.apache.flink.runtime.scheduler.DefaultScheduler.maybeWrapWithNoResourceAvailableException(DefaultScheduler.java:441)
 ~[flink-dist_2.11-1.11.1.jar:1.11.1]
     ... 45 more
Caused by: java.util.concurrent.CompletionException: 
java.util.concurrent.TimeoutException
     at 
java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
 ~[?:1.8.0_242]
     at 
java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
 ~[?:1.8.0_242]
     at 
java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:607) 
~[?:1.8.0_242]
     at 
java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
 ~[?:1.8.0_242]
     ... 25 more
Caused by: java.util.concurrent.TimeoutException
     ... 23 more
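Read outermost-first, the cause chain at the end of this trace goes NoResourceAvailableException → CompletionException → a bare TimeoutException. The frames above show FutureUtils$Timeout.run completing the slot future exceptionally. The sketch below walks such a chain to its innermost cause. Flink's NoResourceAvailableException is not on the classpath here, so a RuntimeException carrying the same message stands in for it. That substitution is an assumption made only for illustration.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeoutException;

public class RootCause {
    // Follows getCause() links to the innermost Throwable. The identity-based
    // "seen" set stops the walk if a malformed cause chain ever loops.
    static Throwable rootCause(Throwable t) {
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        Throwable current = t;
        while (current.getCause() != null && seen.add(current)) {
            current = current.getCause();
        }
        return current;
    }

    public static void main(String[] args) {
        // Stand-in for the logged chain: the outer RuntimeException plays the role
        // of NoResourceAvailableException, which is not on this classpath.
        Throwable chain = new RuntimeException(
                "Could not allocate the required slot within slot request timeout.",
                new CompletionException(new TimeoutException()));
        System.out.println(rootCause(chain).getClass().getName()); // prints java.util.concurrent.TimeoutException
    }
}
```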



2020-07-23 14:01:18,109 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Discarding the results produced by task execution 1809eb912d69854f2babedeaf879df6a.
2020-07-23 14:01:18,110 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Job JobTest (99a030d0e3f428490a501c0132f27a56) switched from state FAILING to FAILED.
org.apache.flink.runtime.JobException: Recovery is suppressed by NoRestartBackoffTimeStrategy
     at org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:116) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
     at org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.getFailureHandlingResult(ExecutionFailureHandler.java:78) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
     at org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskFailure(DefaultScheduler.java:192) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
     at org.apache.flink.runtime.scheduler.DefaultScheduler.maybeHandleTaskFailure(DefaultScheduler.java:185) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
     at org.apache.flink.runtime.scheduler.DefaultScheduler.updateTaskExecutionStateInternal(DefaultScheduler.java:179) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
     at org.apache.flink.runtime.scheduler.SchedulerBase.updateTaskExecutionState(SchedulerBase.java:503) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
     at org.apache.flink.runtime.scheduler.UpdateSchedulerNgOnInternalFailuresListener.notifyTaskFailure(UpdateSchedulerNgOnInternalFailuresListener.java:49) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
     at org.apache.flink.runtime.executiongraph.ExecutionGraph.notifySchedulerNgAboutInternalTaskFailure(ExecutionGraph.java:1710) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
     at org.apache.flink.runtime.executiongraph.Execution.processFail(Execution.java:1287) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
     at org.apache.flink.runtime.executiongraph.Execution.processFail(Execution.java:1255) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
     at org.apache.flink.runtime.executiongraph.Execution.markFailed(Execution.java:1086) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
     at org.apache.flink.runtime.executiongraph.ExecutionVertex.markFailed(ExecutionVertex.java:748) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
     at org.apache.flink.runtime.scheduler.DefaultExecutionVertexOperations.markFailed(DefaultExecutionVertexOperations.java:41) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
     at org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskDeploymentFailure(DefaultScheduler.java:435) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
     at org.apache.flink.runtime.scheduler.DefaultScheduler.lambda$assignResourceOrHandleError$6(DefaultScheduler.java:422) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
     at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:836) ~[?:1.8.0_242]
     at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:811) ~[?:1.8.0_242]
     at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_242]
     at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_242]
     at org.apache.flink.runtime.jobmaster.slotpool.SchedulerImpl.lambda$internalAllocateSlot$0(SchedulerImpl.java:168) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
     at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774) ~[?:1.8.0_242]
     at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750) ~[?:1.8.0_242]
     at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_242]
     at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_242]
     at org.apache.flink.runtime.jobmaster.slotpool.SlotSharingManager$SingleTaskSlot.release(SlotSharingManager.java:726) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
     at org.apache.flink.runtime.jobmaster.slotpool.SlotSharingManager$MultiTaskSlot.release(SlotSharingManager.java:537) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
     at org.apache.flink.runtime.jobmaster.slotpool.SlotSharingManager$MultiTaskSlot.lambda$new$0(SlotSharingManager.java:432) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
     at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:836) ~[?:1.8.0_242]
     at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:811) ~[?:1.8.0_242]
     at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_242]
     at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_242]
     at org.apache.flink.runtime.concurrent.FutureUtils.lambda$forwardTo$21(FutureUtils.java:1120) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
     at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774) ~[?:1.8.0_242]
     at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750) ~[?:1.8.0_242]
     at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_242]
     at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_242]
     at org.apache.flink.runtime.concurrent.FutureUtils$Timeout.run(FutureUtils.java:1036) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
     at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:402) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
     at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:195) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
     at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
     at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:152) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
     at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) [flink-dist_2.11-1.11.1.jar:1.11.1]
     at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) [flink-dist_2.11-1.11.1.jar:1.11.1]
     at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) [flink-dist_2.11-1.11.1.jar:1.11.1]
     at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21) [flink-dist_2.11-1.11.1.jar:1.11.1]
     at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170) [flink-dist_2.11-1.11.1.jar:1.11.1]
     at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) [flink-dist_2.11-1.11.1.jar:1.11.1]
     at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) [flink-dist_2.11-1.11.1.jar:1.11.1]
     at akka.actor.Actor$class.aroundReceive(Actor.scala:517) [flink-dist_2.11-1.11.1.jar:1.11.1]
     at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225) [flink-dist_2.11-1.11.1.jar:1.11.1]
     at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592) [flink-dist_2.11-1.11.1.jar:1.11.1]
     at akka.actor.ActorCell.invoke(ActorCell.scala:561) [flink-dist_2.11-1.11.1.jar:1.11.1]
     at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258) [flink-dist_2.11-1.11.1.jar:1.11.1]
     at akka.dispatch.Mailbox.run(Mailbox.scala:225) [flink-dist_2.11-1.11.1.jar:1.11.1]
     at akka.dispatch.Mailbox.exec(Mailbox.scala:235) [flink-dist_2.11-1.11.1.jar:1.11.1]
     at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) [flink-dist_2.11-1.11.1.jar:1.11.1]
     at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) [flink-dist_2.11-1.11.1.jar:1.11.1]
     at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) [flink-dist_2.11-1.11.1.jar:1.11.1]
     at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) [flink-dist_2.11-1.11.1.jar:1.11.1]
Caused by: org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Could not allocate the required slot within slot request timeout. Please make sure that the cluster has enough resources.
     at org.apache.flink.runtime.scheduler.DefaultScheduler.maybeWrapWithNoResourceAvailableException(DefaultScheduler.java:441) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
     ... 45 more
Caused by: java.util.concurrent.CompletionException: java.util.concurrent.TimeoutException
     at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292) ~[?:1.8.0_242]
     at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308) ~[?:1.8.0_242]
     at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:607) ~[?:1.8.0_242]
     at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591) ~[?:1.8.0_242]
     ... 25 more
Caused by: java.util.concurrent.TimeoutException
     ... 23 more
2020-07-23 14:01:18,114 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator    [] - Stopping checkpoint coordinator for job 99a030d0e3f428490a501c0132f27a56.
2020-07-23 14:01:18,117 INFO  org.apache.flink.runtime.checkpoint.StandaloneCompletedCheckpointStore [] - Shutting down
2020-07-23 14:01:18,118 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Discarding the results produced by task execution 302ca9640e2d209a543d843f2996ccd2.
2020-07-23 14:01:18,120 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl     [] - Pending slot request [SlotRequestId{15fd2a9565c2b080748c1d1592b1cbbc}] timed out.
2020-07-23 14:01:18,120 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl     [] - Pending slot request [SlotRequestId{8cd72cc16f0e319d915a9a096a1096d7}] timed out.
2020-07-23 14:01:18,120 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl     [] - Pending slot request [SlotRequestId{e7e422409acebdb385014a9634af6a90}] timed out.
2020-07-23 14:01:18,121 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl     [] - Pending slot request [SlotRequestId{cef1af73546ca1fc27ca7a3322e9e815}] timed out.
2020-07-23 14:01:18,121 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl     [] - Pending slot request [SlotRequestId{108fe0b3086567ad79275eccef2fdaf8}] timed out.
2020-07-23 14:01:18,121 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl     [] - Pending slot request [SlotRequestId{265e67985eab7a6dc08024e53bf2708d}] timed out.
2020-07-23 14:01:18,122 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl     [] - Pending slot request [SlotRequestId{7087497a17c441f1a1d6fefcbc7cd0ea}] timed out.
2020-07-23 14:01:18,122 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl     [] - Pending slot request [


2020-07-23 14:01:18,151 INFO  org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Registering job manager 00000000000000000000000000000...@akka.tcp://flink@flink-jobmanager:6123/user/rpc/jobmanager_2 for job 99a030d0e3f428490a501c0132f27a56.
2020-07-23 14:01:18,157 INFO  org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Registered job manager 00000000000000000000000000000...@akka.tcp://flink@flink-jobmanager:6123/user/rpc/jobmanager_2 for job 99a030d0e3f428490a501c0132f27a56.
2020-07-23 14:01:18,157 INFO  org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Registered job manager 00000000000000000000000000000...@akka.tcp://flink@flink-jobmanager:6123/user/rpc/jobmanager_2 for job 99a030d0e3f428490a501c0132f27a56.
2020-07-23 14:01:18,157 INFO  org.apache.flink.runtime.dispatcher.StandaloneDispatcher     [] - Job 99a030d0e3f428490a501c0132f27a56 reached globally terminal state FAILED.
2020-07-23 14:01:18,162 INFO  org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Registered job manager 00000000000000000000000000000...@akka.tcp://flink@flink-jobmanager:6123/user/rpc/jobmanager_2 for job 99a030d0e3f428490a501c0132f27a56.
2020-07-23 14:01:18,162 INFO  org.apache.flink.runtime.jobmaster.JobMaster                 [] - JobManager successfully registered at ResourceManager, leader id: 00000000000000000000000000000000.
2020-07-23 14:01:18,225 INFO  org.apache.flink.runtime.jobmaster.JobMaster                 [] - Stopping the JobMaster for job JobTest(99a030d0e3f428490a501c0132f27a56).
2020-07-23 14:01:18,381 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl     [] - Suspending SlotPool.
2020-07-23 14:01:18,382 INFO  org.apache.flink.runtime.jobmaster.JobMaster                 [] - Close ResourceManager connection 83b1ff14900abfd54418e7fa3efb3f8a: JobManager is shutting down..
2020-07-23 14:01:18,382 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl     [] - Stopping SlotPool.
2020-07-23 14:01:18,382 INFO  org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Disconnect job manager 00000000000000000000000000000...@akka.tcp://flink@flink-jobmanager:6123/user/rpc/jobmanager_2 for job 99a030d0e3f428490a501c0132f27a56 from the resource manager.


a511955993
Email: a511955...@163.com

On 07/23/2020 13:26, Yang Wang wrote:
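I'm glad your problem is solved, but I don't think the root cause has anything to do with adding taskmanager-query-state-service.yaml.
On my side everything works without creating this service, and nslookup {tm_ip_address} correctly reverse-resolves to the hostname.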

Note that this does not resolve the hostname. It verifies a reverse lookup starting from the IP address.


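To answer your two questions:
1. It is not required. In my verification the cluster runs jobs normally without creating it, and exposing the REST service as ClusterIP, NodePort or LoadBalancer all work fine.
2. If taskmanager.bind-host is not configured, the two JIRAs [FLINK-15911][FLINK-15154] do not affect the address the TM uses when registering with the RM.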

If you want to find the root cause, please provide the complete JM/TM logs so they can be analyzed.


Best,
Yang
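Yang's check is a reverse lookup (IP to name), not a forward one. The same check can also be run from a JVM inside a pod, which goes through the JVM's resolver rather than busybox's nslookup. The sketch below assumes Java 8+ and takes pod IPs as command-line arguments. `InetAddress.getCanonicalHostName()` returns the textual IP when the reverse lookup fails. That matches the "using IP address as host name" fallback in the WARN lines of this thread, although this thread does not confirm that Flink's TaskManagerLocation makes exactly this call. The sketch also prints how long each lookup took, because the WARN message looks the same whether the lookup failed quickly or slowly.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class ReverseLookupCheck {
    // getCanonicalHostName() echoes the textual IP back when no name could be
    // found, so "resolved" means the result differs from the input address.
    static boolean resolved(String ip, String canonicalName) {
        return !canonicalName.equals(ip);
    }

    public static void main(String[] args) throws UnknownHostException {
        for (String ip : args) {
            // A literal IP is parsed locally; no forward DNS query happens here.
            InetAddress addr = InetAddress.getByName(ip);
            long start = System.nanoTime();
            String name = addr.getCanonicalHostName(); // reverse (PTR) lookup
            long millis = (System.nanoTime() - start) / 1_000_000;
            System.out.printf("%s -> %s (%s, %d ms)%n",
                    ip, name, resolved(ip, name) ? "resolved" : "NOT resolved", millis);
        }
    }
}
```

For example, compile it and run `java ReverseLookupCheck 10.47.96.2` from a pod on the same cluster network as the JobManager.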

SmileSmile <a511955...@163.com> wrote on Thu, Jul 23, 2020 at 11:30 AM:

>
> Hi Yang Wang
>
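> I just tested this in the test environment: the TaskManager cannot be resolved with nslookup, while the JM can. The difference between the two is whether there is a Service.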
>
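> Solution: I added taskmanager-query-state-service.yaml to the cluster (an optional service according to the official documentation [1]). After that, the "No hostname could be resolved for ip address" messages stopped flooding the log. After I changed NodePort to ClusterIP, the job was submitted successfully without the timeout, so the problem is solved.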
>
>
> 1. If that is the case, is this config file required?
>
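> 2. In the 1.11 changes I noticed [FLINK-15911][FLINK-15154], which allow separately configuring the network interface used for local binding and the address and port used for external access. Is it because of this change that the JM now has to reverse-resolve a Service from the IP reported by the TM?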
>
>
> Best!
>
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/deployment/kubernetes.html
>
>
> On 07/23/2020 10:11, Yang Wang <danrtsey...@gmail.com> wrote:
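> What I mean is: while the Flink job is running, start a busybox pod in the cluster with the command below, run nslookup {ip_address} inside it, and check whether the address resolves. If it does not, it is most likely a CoreDNS problem.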
>
> kubectl run -i -t busybox --image=busybox --restart=Never
>
> Please also check that the cluster's CoreDNS pods are healthy. They are usually deployed in the kube-system namespace.
>
>
>
> Best,
> Yang
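For reference, `nslookup {ip_address}` does not query the IP itself. It sends a PTR query for the address's octets in reverse order under the in-addr.arpa zone, so an NXDOMAIN answer means CoreDNS has no PTR record for that name. The sketch below shows how that query name is formed. It handles IPv4 only, and the class name is illustrative.

```java
public class PtrName {
    // Builds the reverse-DNS query name for a dotted IPv4 address:
    // the four octets in reverse order, followed by the in-addr.arpa zone.
    static String ptrName(String ipv4) {
        String[] octets = ipv4.split("\\.");
        if (octets.length != 4) {
            throw new IllegalArgumentException("not a dotted IPv4 address: " + ipv4);
        }
        StringBuilder sb = new StringBuilder();
        for (int i = octets.length - 1; i >= 0; i--) {
            sb.append(octets[i]).append('.');
        }
        return sb.append("in-addr.arpa").toString();
    }

    public static void main(String[] args) {
        System.out.println(ptrName("10.47.96.2")); // prints 2.96.47.10.in-addr.arpa
    }
}
```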
>
>
> SmileSmile <a511955...@163.com> wrote on Wed, Jul 22, 2020 at 7:57 PM:
>
> >
> > Hi Yang Wang!
> >
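> > I'm glad to get your reply. It helped a lot and showed me which direction to look in. Here is some more information, in the hope that you can help me narrow down the root cause further.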
> >
> > Where the JM reports "No hostname could be resolved for ip address xxxxx", the reported IP is the internal IP that k8s assigned to the Flink pod, not the host machine's IP. Where might this problem come from?
> >
> > Best!
> >
> >
> >
> > On 07/22/2020 18:18, Yang Wang <danrtsey...@gmail.com> wrote:
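> > If your log keeps flooding with "No hostname could be resolved for the IP address", the cluster's CoreDNS most likely has a problem: the hostname cannot be found by a reverse lookup of the IP address. You can start a busybox pod to verify whether this IP fails to resolve. If it does, CoreDNS may be the culprit.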
> >
> > Best,
> > Yang
> >
> > Congxian Qiu <qcx978132...@gmail.com> wrote on Tue, Jul 21, 2020 at 7:29 PM:
> >
> > > Hi
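> > >    I'm not sure whether the complete pod logs are available in the k8s environment, similar to the NM logs on YARN. If they are, check the complete logs of this pod for anything suspicious.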
> > > Best,
> > > Congxian
> > >
> > >
> > > SmileSmile <a511955...@163.com> wrote on Tue, Jul 21, 2020 at 3:19 PM:
> > >
> > > > Hi Congxian,
> > > >
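> > > > Because this is a test environment, HA is not configured. All I can see so far is that the JM floods the log with "no hostname could be resolved", then the JM loses contact and job submission fails.
> > > > Setting the JM memory to 10g gives the same result (jobmanager.memory.process.size: 10240m).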
> > > >
> > > > After rolling back to 1.10 in the same environment, the problem does not occur and the error above is no longer logged.
> > > >
> > > >
> > > > Are there any other ways to troubleshoot this?
> > > >
> > > > Best!
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > On 07/16/2020 13:17, Congxian Qiu wrote:
> > > > Hi
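> > > >   If there are no exceptions and GC also looks normal, check the related pod logs, and if HA is enabled, the ZK logs as well. I once hit a similar symptom in a YARN environment that turned out to have a different cause, which I found through the NM and ZK logs.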
> > > >
> > > > Best,
> > > > Congxian
> > > >
> > > >
> > > > SmileSmile <a511955...@163.com> wrote on Wed, Jul 15, 2020 at 5:20 PM:
> > > >
> > > > > Hi Roc
> > > > >
> > > > > This does not happen in 1.10.1. It only appears in 1.11. What would be a good way to investigate it?
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On 07/15/2020 17:16, Roc Marshal wrote:
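> > > > > Hi SmileSmile,
> > > > > I have run into a similar host resolution problem before. You could investigate it from the angle of how k8s maps pod and node networking.
> > > > > Hope this helps.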
> > > > >
> > > > >
> > > > > Best regards,
> > > > > Roc Marshal
> > > > >
> > > > >
> > > > > On 2020-07-15 17:04:18, "SmileSmile" <a511955...@163.com> wrote:
> > > > > >
> > > > > >Hi
> > > > > >
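> > > > > >Environment: Flink 1.11, deployed as a Kubernetes session cluster, with 30 TMs of 4 slots each and a job parallelism of 120. On job submission the log fills with "No hostname could be resolved for the IP address", the JM times out and the submission fails. The web UI also hangs and becomes unresponsive.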
> > > > > >
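> > > > > >Submitting WordCount with parallelism 1 also triggers the warning: a few "no hostname" lines are logged, and then the job submits normally. Once the parallelism goes up, the submission times out.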
> > > > > >
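This setup leaves no headroom: 30 TMs × 4 slots = 120 slots, and a job with parallelism 120 needs all 120 of them. That holds if every operator stays in the default slot sharing group, in which case the job needs as many slots as its highest operator parallelism. A single TM that is slow to register or drops out would then already leave some slot requests pending. That is one possible reading of the timeout, not a diagnosis confirmed in this thread. The arithmetic as a small sketch:

```java
public class SlotBudget {
    // Slots offered once every TaskManager has registered.
    static int availableSlots(int taskManagers, int slotsPerTaskManager) {
        return taskManagers * slotsPerTaskManager;
    }

    // Whole TaskManagers that could be missing while the job is still covered.
    // Assumes the default slot sharing group, so the job needs `parallelism` slots.
    // A negative result means the cluster is short by that many TMs (rounded up).
    static int spareTaskManagers(int taskManagers, int slotsPerTaskManager, int parallelism) {
        int spareSlots = availableSlots(taskManagers, slotsPerTaskManager) - parallelism;
        return Math.floorDiv(spareSlots, slotsPerTaskManager);
    }

    public static void main(String[] args) {
        // Values from this thread: 30 TMs, 4 slots each, job parallelism 120.
        System.out.println("available slots: " + availableSlots(30, 4));    // 120
        System.out.println("spare TMs: " + spareTaskManagers(30, 4, 120)); // 0
    }
}
```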
> > > > > >
> > > > > >Part of the log:
> > > > > >
> > > > > >2020-07-15 16:58:46,460 WARN  org.apache.flink.runtime.taskmanager.TaskManagerLocation     [] - No hostname could be resolved for the IP address 10.32.160.7, using IP address as host name. Local input split assignment (such as for HDFS files) may be impacted.
> > > > > >2020-07-15 16:58:46,460 WARN  org.apache.flink.runtime.taskmanager.TaskManagerLocation     [] - No hostname could be resolved for the IP address 10.44.224.7, using IP address as host name. Local input split assignment (such as for HDFS files) may be impacted.
> > > > > >2020-07-15 16:58:46,461 WARN  org.apache.flink.runtime.taskmanager.TaskManagerLocation     [] - No hostname could be resolved for the IP address 10.40.32.9, using IP address as host name. Local input split assignment (such as for HDFS files) may be impacted.
> > > > > >
> > > > > >2020-07-15 16:59:10,236 INFO  org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - The heartbeat of JobManager with id 69a0d460de468888a9f41c770d963c0a timed out.
> > > > > >2020-07-15 16:59:10,236 INFO  org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Disconnect job manager 00000000000000000000000000000000@akka.tcp://flink@flink-jobmanager:6123/user/rpc/jobmanager_2 for job e1554c737e37ed79688a15c746b6e9ef from the resource manager.
> > > > > >
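To feed Yang's busybox check without copying addresses by hand, the distinct IPs can be pulled out of WARN lines like the ones above. The sketch below assumes the JM log has been saved to a local file, passed as the first argument, with each log record on a single line as in the original log file rather than wrapped as in this email. It prints one `nslookup` command per unresolved IP, in the order each IP first appears.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnresolvedIps {
    // Captures the IPv4 address in the "No hostname could be resolved ..." warning.
    private static final Pattern WARN = Pattern.compile(
            "No hostname could be resolved for the IP address (\\d{1,3}(?:\\.\\d{1,3}){3})");

    // Returns each distinct IP once, keeping first-seen order.
    static Set<String> unresolvedIps(List<String> logLines) {
        Set<String> ips = new LinkedHashSet<>();
        for (String line : logLines) {
            Matcher m = WARN.matcher(line);
            if (m.find()) {
                ips.add(m.group(1));
            }
        }
        return ips;
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        for (String ip : unresolvedIps(lines)) {
            System.out.println("nslookup " + ip);
        }
    }
}
```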
> > > > > >
> > > > > >How should I deal with this?
> > > > > >
> > > > > >
> > > > > >Best!
> > > > > >
> > > > >
> > > >
> > >
> >
> >
>
>
