sandflee created YARN-6854:
------------------------------

             Summary: many job failed if NM couldn't detect disk error
                 Key: YARN-6854
                 URL: https://issues.apache.org/jira/browse/YARN-6854
             Project: Hadoop YARN
          Issue Type: Bug
            Reporter: sandflee
            Priority: Critical

checkDiskHealthy is enabled, but it did not detect this error. As a result, containers on the node failed, and new containers assigned to the same node then failed as well.

The disk error appears to be a filesystem error: every I/O operation (such as ls) fails on $localdir/usercache/userFoo, while other directories are unaffected.

Any suggestions?
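
To make the failure mode concrete: a check that only looks at the top of the local dir would pass here, because only $localdir/usercache/userFoo is broken. Below is a minimal sketch of a probe that also tries to list subdirectories down to a fixed depth. The class SubdirHealthProbe and the method findUnlistableDirs are hypothetical names made up for illustration; this is not the NodeManager's existing disk checker and not a proposed patch, just one way the idea might look.

```java
import java.io.IOException;
import java.nio.file.DirectoryIteratorException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of YARN: walks a local dir down to a fixed
// depth and records every directory whose listing fails.
final class SubdirHealthProbe {

  private SubdirHealthProbe() {}

  /**
   * Returns the directories at or below root (at most maxDepth levels down)
   * that could not be listed. An empty result means every probed directory
   * was readable.
   */
  static List<Path> findUnlistableDirs(Path root, int maxDepth) {
    List<Path> failed = new ArrayList<>();
    probe(root, maxDepth, failed);
    return failed;
  }

  private static void probe(Path dir, int depthLeft, List<Path> failed) {
    List<Path> children = new ArrayList<>();
    // Listing the directory is the same kind of operation as the "ls" that
    // fails in the report, so an I/O error here marks the directory as bad.
    try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
      for (Path child : stream) {
        if (depthLeft > 0 && Files.isDirectory(child, LinkOption.NOFOLLOW_LINKS)) {
          children.add(child);
        }
      }
    } catch (IOException | DirectoryIteratorException e) {
      failed.add(dir);
      return; // do not descend into a directory that cannot be listed
    }
    for (Path child : children) {
      probe(child, depthLeft - 1, failed);
    }
  }
}
```

With a depth of 2, a call such as SubdirHealthProbe.findUnlistableDirs(localDir, 2) would reach $localdir/usercache/userFoo (local dir, then usercache, then the per-user directory). The depth limit is an assumption to keep the probe cheap; walking the full tree on every health check would likely be too expensive on a busy node.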