[ https://issues.apache.org/jira/browse/YARN-8381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
lujie updated YARN-8381:
------------------------

Job got stuck while node was unhealthy, but without log messages to indicate such case
--------------------------------------------------------------------------------------

Key: YARN-8381
URL: https://issues.apache.org/jira/browse/YARN-8381
Project: Hadoop YARN
Issue Type: Improvement
Reporter: lujie
Priority: Major

Description:

I started a fresh pseudo-distributed cluster on a single node and ran a job, but the job got stuck. My first step was to check the logs to locate the problem, but I found no error messages. After reading the logs for a long time, it finally occurred to me to check the node's health. The YARN web UI showed that the NodeManager was unhealthy because "local-dirs are bad: /tmp/hadoop-hduser/nm-local-dir". I raised "{{yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage}}" to 98%, which solved the problem.

{color:#d04437}*Even so, I strongly recommend adding error log messages when a NodeManager becomes unhealthy.*{color}
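The disk check that marked the local dir bad works by comparing each directory's disk utilization against a per-disk percentage cutoff. Below is a minimal sketch of that idea, not Hadoop's actual NodeManager code: the function names are hypothetical, "used" is assumed to mean total space minus the space usable by the current user, and the 90.0 default is assumed to match the value yarn-default.xml ships for the property above. The sketch also logs an ERROR for each bad directory and for the resulting unhealthy state, which is the kind of message this report asks for.

```python
import logging
import shutil
import tempfile

logging.basicConfig(level=logging.INFO)
LOG = logging.getLogger("nm-disk-check-sketch")  # hypothetical logger name

# Assumed default for
# yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage
DEFAULT_MAX_UTILIZATION_PERCENT = 90.0


def utilization_percent(total_bytes: int, usable_bytes: int) -> float:
    """Share of the disk that is not usable, as a percentage (0..100)."""
    if total_bytes <= 0:
        return 100.0  # treat an empty or unreadable filesystem as full
    return 100.0 * (total_bytes - usable_bytes) / total_bytes


def is_dir_bad(total_bytes: int, usable_bytes: int,
               cutoff_percent: float = DEFAULT_MAX_UTILIZATION_PERCENT) -> bool:
    """A dir is bad when utilization exceeds the cutoff or the disk is full."""
    used = utilization_percent(total_bytes, usable_bytes)
    return used > cutoff_percent or used >= 100.0


def check_local_dirs(dirs, cutoff_percent=DEFAULT_MAX_UTILIZATION_PERCENT):
    """Return the bad dirs, logging an ERROR for each one instead of failing silently."""
    bad = []
    for d in dirs:
        usage = shutil.disk_usage(d)  # usage.free = space usable by this user
        used = utilization_percent(usage.total, usage.free)
        if is_dir_bad(usage.total, usage.free, cutoff_percent):
            bad.append(d)
            LOG.error("local-dir %s is bad: %.1f%% used exceeds the %.1f%% cutoff",
                      d, used, cutoff_percent)
    if bad:
        LOG.error("Node marked UNHEALTHY: local-dirs are bad: %s", ",".join(bad))
    return bad


if __name__ == "__main__":
    check_local_dirs([tempfile.gettempdir()])
```

With the cutoff raised to 98, as in the report, a disk that is 95% used is no longer marked bad (`is_dir_bad(100, 5, 98.0)` is `False`), while at the assumed 90.0 default it is.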
-- This message was sent by Atlassian JIRA (v7.6.3#76005)