Hi group, I have run into a problem: sometimes when I run my MapReduce job, some YARN processes go into state D (uninterruptible sleep). A process in this state cannot be killed, and only rebooting the server recovers it. While this is happening, commands such as ps, reboot, and lsof also hang, so I cannot investigate further. From what I have observed, this occurs on the server with the 8T hard disk.
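Since ps hangs in this situation, one possible way to still see which processes are in state D is to read /proc/<pid>/stat directly. The following is only a minimal sketch under assumptions: it assumes a Linux /proc filesystem and assumes that reading the stat file does not block the way ps does, which has not been verified on the affected server. The function names parse_stat and find_d_state are made up for illustration.

```python
import os


def parse_stat(text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    left = text.index("(")
    right = text.rindex(")")
    pid = int(text[:left].strip())
    comm = text[left + 1:right]
    state = text[right + 1:].split()[0]
    return pid, comm, state


def find_d_state(proc_root="/proc"):
    """List (pid, comm) for every process currently in state D."""
    stuck = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError):
            continue  # process exited in the meantime, or file unreadable
        if state == "D":
            stuck.append((pid, comm))
    return stuck


if __name__ == "__main__":
    for pid, comm in find_d_state():
        print(pid, comm)
```

For any PID it reports, the kernel stack in /proc/<pid>/stack (readable as root on kernels that expose it) may hint at where the process is blocked.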
Here is the output from the top command:

8382 yarn 20 0 3456408 277324 28040 D 0.0 0.2 11:16.21 java

Here is my environment:

· Cloudera CDH 5.9.0
· The disk on the server is an 8T volume
· My MapReduce job reads data from HBase and stores the result in HBase

I have used the fsck command to check the disk, and it found no errors. I am not sure whether there is some configuration that needs to be tuned in Hadoop / YARN for large-volume disks. Does anyone have an idea about this issue?

Thanks in advance.

Eric