[jira] [Updated] (YARN-8381) Job got stuck while node was unhealthy, but without log messages to indicate such case
[ https://issues.apache.org/jira/browse/YARN-8381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

lujie updated YARN-8381:

Description: I started a fresh pseudo-distributed system on a node, then ran a job, but it got stuck. My first reaction was to check the log messages to locate the problem, but I found no error messages. After reading log messages for a long time, it occurred to me to check the node health. The YARN web UI showed that the NodeManager was unhealthy due to "local-dirs are bad: /tmp/hadoop-hduser/nm-local-dir". I reconfigured "{{yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage}}" to 98%, which solved the problem. {color:#d04437}*But I still strongly recommend adding error log messages for an unhealthy NodeManager (especially startup).*{color}

was: I started a fresh pseudo-distributed system on a node, then ran a job, but it got stuck. My first reaction was to check the log messages to locate the problem, but I found no error messages. After reading log messages for a long time, it occurred to me to check the node health. The YARN web UI showed that the NodeManager was unhealthy due to "local-dirs are bad: /tmp/hadoop-hduser/nm-local-dir". I reconfigured "{{yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage}}" to 98%, which solved the problem. {color:#d04437}*But I still strongly recommend adding error log messages for an unhealthy NodeManager (especially for startup).*{color}

> Job got stuck while node was unhealthy, but without log messages to indicate
> such case
> --
>
> Key: YARN-8381
> URL: https://issues.apache.org/jira/browse/YARN-8381
> Project: Hadoop YARN
> Issue Type: Improvement
> Reporter: lujie
> Priority: Major
>
> I started a fresh pseudo-distributed system on a node, then ran a job, but
> it got stuck. My first reaction was to check the log messages to locate the
> problem, but I found no error messages.
> After reading log messages for a long time, it occurred to me to check the
> node health.
> The YARN web UI showed that the NodeManager was unhealthy due to
> "local-dirs are bad: /tmp/hadoop-hduser/nm-local-dir". I reconfigured
> "{{yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage}}"
> to 98%, which solved the problem.
> {color:#d04437}*But I still strongly recommend adding error log messages for
> an unhealthy NodeManager (especially startup).*{color}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
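The fix described in the report hinges on how the NodeManager's disk health checker decides that a local dir is bad. What follows is a hypothetical, simplified Java sketch, not Hadoop's actual checker code: it computes per-disk utilization from `java.io.File`'s total and usable space, compares it with a cutoff standing in for `yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage` (90.0 by default in yarn-default.xml), and logs a warning when the cutoff is exceeded, which is roughly the kind of message the report asks for. The class `DiskHealthSketch`, its methods, and the strict `>` comparison are assumptions made for illustration.

```java
import java.io.File;
import java.util.logging.Logger;

/**
 * Hypothetical, simplified per-disk utilization check.
 * NOT Hadoop's implementation; all names here are illustrative.
 */
public class DiskHealthSketch {
    private static final Logger LOG =
        Logger.getLogger(DiskHealthSketch.class.getName());

    /** Percentage of the filesystem in use, from total and usable bytes. */
    static double usedPercent(long totalBytes, long usableBytes) {
        if (totalBytes <= 0) {
            // File.getTotalSpace() returns 0 for a missing path; treat it as full.
            return 100.0;
        }
        return 100.0 * (totalBytes - usableBytes) / totalBytes;
    }

    /** True when utilization is above the cutoff (strict comparison assumed). */
    static boolean isBad(long totalBytes, long usableBytes, double maxUtilizationPercent) {
        return usedPercent(totalBytes, usableBytes) > maxUtilizationPercent;
    }

    /** Checks a directory and logs a warning instead of failing silently. */
    static boolean checkAndLog(File dir, double maxUtilizationPercent) {
        long total = dir.getTotalSpace();
        long usable = dir.getUsableSpace();
        boolean bad = isBad(total, usable, maxUtilizationPercent);
        if (bad) {
            LOG.warning(String.format(
                "local-dir %s looks bad: %.1f%% used, above the %.1f%% cutoff",
                dir.getPath(), usedPercent(total, usable), maxUtilizationPercent));
        }
        return bad;
    }

    public static void main(String[] args) {
        // Defaults: the JVM temp dir and a 90% cutoff; both can be overridden.
        File dir = new File(args.length > 0 ? args[0] : System.getProperty("java.io.tmpdir"));
        double cutoff = args.length > 1 ? Double.parseDouble(args[1]) : 90.0;
        boolean bad = checkAndLog(dir, cutoff);
        System.out.printf("%s: bad=%b (cutoff %.1f%%)%n", dir.getPath(), bad, cutoff);
    }
}
```

For example, `java DiskHealthSketch /tmp/hadoop-hduser/nm-local-dir 98` shows how raising the cutoff can flip the verdict for the same disk. The real checker also looks at other conditions (for example, a minimum free-space setting and whether the directory is usable at all), which this sketch ignores.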
[jira] [Updated] (YARN-8381) Job got stuck while node was unhealthy, but without log messages to indicate such case
[ https://issues.apache.org/jira/browse/YARN-8381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

lujie updated YARN-8381:

Description: I started a fresh pseudo-distributed system on a node, then ran a job, but it got stuck. My first reaction was to check the log messages to locate the problem, but I found no error messages. After reading log messages for a long time, it occurred to me to check the node health. The YARN web UI showed that the NodeManager was unhealthy due to "local-dirs are bad: /tmp/hadoop-hduser/nm-local-dir". I reconfigured "{{yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage}}" to 98%, which solved the problem. {color:#d04437}*But I still strongly recommend adding error log messages for an unhealthy NodeManager (especially for startup).*{color}

was: I started a fresh pseudo-distributed system on a node, then ran a job, but it got stuck. My first reaction was to check the log messages to locate the problem, but I found no error messages. After reading log messages for a long time, it occurred to me to check the node health. The YARN web UI showed that the NodeManager was unhealthy due to "local-dirs are bad: /tmp/hadoop-hduser/nm-local-dir". I reconfigured "{{yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage}}" to 98%, which solved the problem. {color:#d04437}*But I still strongly recommend adding error log messages for an unhealthy NodeManager.*{color}

> Job got stuck while node was unhealthy, but without log messages to indicate
> such case
> --
>
> Key: YARN-8381
> URL: https://issues.apache.org/jira/browse/YARN-8381
> Project: Hadoop YARN
> Issue Type: Improvement
> Reporter: lujie
> Priority: Major
>
> I started a fresh pseudo-distributed system on a node, then ran a job, but
> it got stuck. My first reaction was to check the log messages to locate the
> problem, but I found no error messages.
> After reading log messages for a long time, it occurred to me to check the
> node health.
> The YARN web UI showed that the NodeManager was unhealthy due to
> "local-dirs are bad: /tmp/hadoop-hduser/nm-local-dir". I reconfigured
> "{{yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage}}"
> to 98%, which solved the problem.
> {color:#d04437}*But I still strongly recommend adding error log messages for
> an unhealthy NodeManager (especially for startup).*{color}
[jira] [Updated] (YARN-8381) Job got stuck while node was unhealthy, but without log messages to indicate such case
[ https://issues.apache.org/jira/browse/YARN-8381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

lujie updated YARN-8381:

Description: I started a fresh pseudo-distributed system on a node, then ran a job, but it got stuck. My first reaction was to check the log messages to locate the problem, but I found no error messages. After reading log messages for a long time, it occurred to me to check the node health. The YARN web UI showed that the NodeManager was unhealthy due to "local-dirs are bad: /tmp/hadoop-hduser/nm-local-dir". I reconfigured "{{yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage}}" to 98%, which solved the problem. {color:#d04437}*But I still strongly recommend adding error log messages for an unhealthy NodeManager.*{color}

was: I started a fresh pseudo-distributed system on a node, then ran a job, but it got stuck. My first reaction was to check the log messages to locate the problem, but I found no error messages. After reading log messages for a long time, it occurred to me to check the node health. The YARN web UI showed that the NodeManager was unhealthy due to the "local-dirs are bad: /tmp/hadoop-hduser/nm-local-dir". I reconfigured "{{yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage}}" to 98%, which solved the problem. {color:#d04437}*But I still strongly recommend adding error log messages for an unhealthy NodeManager.*{color}

> Job got stuck while node was unhealthy, but without log messages to indicate
> such case
> --
>
> Key: YARN-8381
> URL: https://issues.apache.org/jira/browse/YARN-8381
> Project: Hadoop YARN
> Issue Type: Improvement
> Reporter: lujie
> Priority: Major
>
> I started a fresh pseudo-distributed system on a node, then ran a job, but
> it got stuck. My first reaction was to check the log messages to locate the
> problem, but I found no error messages.
> After reading log messages for a long time, it occurred to me to check the
> node health.
> The YARN web UI showed that the NodeManager was unhealthy due to
> "local-dirs are bad: /tmp/hadoop-hduser/nm-local-dir". I reconfigured
> "{{yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage}}"
> to 98%, which solved the problem.
> {color:#d04437}*But I still strongly recommend adding error log messages for
> an unhealthy NodeManager.*{color}
[jira] [Updated] (YARN-8381) Job got stuck while node was unhealthy, but without log messages to indicate such case
[ https://issues.apache.org/jira/browse/YARN-8381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

lujie updated YARN-8381:

Description: I started a fresh pseudo-distributed system on a node, then ran a job, but it got stuck. My first reaction was to check the log messages to locate the problem, but I found no error messages. After reading log messages for a long time, it occurred to me to check the node health. The YARN web UI showed that the NodeManager was unhealthy due to the "l\{{ocal-dirs are bad: /tmp/hadoop-hduser/nm-local-dir}}". I reconfigured "{{yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage}}" to 98%, which solved the problem. {color:#d04437}*But I still strongly recommend adding error log messages for an unhealthy NodeManager.*{color}

was: I started a fresh pseudo-distributed system on a node, then ran a job, but it got stuck. My first reaction was to check the log messages to locate the problem, but I found no error messages. After reading log messages for a long time, it occurred to me to check the node health. The YARN web UI showed that the NodeManager was unhealthy due to the "l\{{ocal-dirs are bad: /tmp/hadoop-hduser/nm-local-dir}}". I reconfigured "{{yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage}}" to 98%, which solved the problem. But I still strongly recommend adding error log messages for an unhealthy NodeManager.

> Job got stuck while node was unhealthy, but without log messages to indicate
> such case
> --
>
> Key: YARN-8381
> URL: https://issues.apache.org/jira/browse/YARN-8381
> Project: Hadoop YARN
> Issue Type: Improvement
> Reporter: lujie
> Priority: Major
>
> I started a fresh pseudo-distributed system on a node, then ran a job, but
> it got stuck. My first reaction was to check the log messages to locate the
> problem, but I found no error messages.
> After reading log messages for a long time, it occurred to me to check the
> node health.
> The YARN web UI showed that the NodeManager was unhealthy due to the
> "l\{{ocal-dirs are bad: /tmp/hadoop-hduser/nm-local-dir}}". I reconfigured
> "{{yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage}}"
> to 98%, which solved the problem.
> {color:#d04437}*But I still strongly recommend adding error log messages for
> an unhealthy NodeManager.*{color}
[jira] [Updated] (YARN-8381) Job got stuck while node was unhealthy, but without log messages to indicate such case
[ https://issues.apache.org/jira/browse/YARN-8381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

lujie updated YARN-8381:

Description: I started a fresh pseudo-distributed system on a node, then ran a job, but it got stuck. My first reaction was to check the log messages to locate the problem, but I found no error messages. After reading log messages for a long time, it occurred to me to check the node health. The YARN web UI showed that the NodeManager was unhealthy due to the "local-dirs are bad: /tmp/hadoop-hduser/nm-local-dir". I reconfigured "{{yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage}}" to 98%, which solved the problem. {color:#d04437}*But I still strongly recommend adding error log messages for an unhealthy NodeManager.*{color}

was: I started a fresh pseudo-distributed system on a node, then ran a job, but it got stuck. My first reaction was to check the log messages to locate the problem, but I found no error messages. After reading log messages for a long time, it occurred to me to check the node health. The YARN web UI showed that the NodeManager was unhealthy due to the "l\{{ocal-dirs are bad: /tmp/hadoop-hduser/nm-local-dir}}". I reconfigured "{{yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage}}" to 98%, which solved the problem. {color:#d04437}*But I still strongly recommend adding error log messages for an unhealthy NodeManager.*{color}

> Job got stuck while node was unhealthy, but without log messages to indicate
> such case
> --
>
> Key: YARN-8381
> URL: https://issues.apache.org/jira/browse/YARN-8381
> Project: Hadoop YARN
> Issue Type: Improvement
> Reporter: lujie
> Priority: Major
>
> I started a fresh pseudo-distributed system on a node, then ran a job, but
> it got stuck. My first reaction was to check the log messages to locate the
> problem, but I found no error messages.
> After reading log messages for a long time, it occurred to me to check the
> node health.
> The YARN web UI showed that the NodeManager was unhealthy due to the
> "local-dirs are bad: /tmp/hadoop-hduser/nm-local-dir". I reconfigured
> "{{yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage}}"
> to 98%, which solved the problem.
> {color:#d04437}*But I still strongly recommend adding error log messages for
> an unhealthy NodeManager.*{color}
[jira] [Updated] (YARN-8381) Job got stuck while node was unhealthy, but without log messages to indicate such case
[ https://issues.apache.org/jira/browse/YARN-8381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

lujie updated YARN-8381:

Summary: Job got stuck while node was unhealthy, but without log messages to indicate such case (was: Job got stuck while node is unhealthy, but without log messages to indicate such case)

> Job got stuck while node was unhealthy, but without log messages to indicate
> such case
> --
>
> Key: YARN-8381
> URL: https://issues.apache.org/jira/browse/YARN-8381
> Project: Hadoop YARN
> Issue Type: Improvement
> Reporter: lujie
> Priority: Major
>
> I started a fresh pseudo-distributed system on a node, then ran a job, but
> it got stuck. My first reaction was to check the log messages to locate the
> problem, but I found no error messages. Then it occurred to me to check the
> node health, after reading log messages for a long time. The YARN web UI
> showed that the NodeManager was unhealthy due to the "l{{ocal-dirs are bad:
> /tmp/hadoop-hduser/nm-local-dir}}". I reconfigured
> "{{yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage}}"
> to 98%, which solved the problem. But I still strongly recommend adding error
> log messages for an unhealthy NodeManager.
[jira] [Updated] (YARN-8381) Job got stuck while node was unhealthy, but without log messages to indicate such case
[ https://issues.apache.org/jira/browse/YARN-8381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

lujie updated YARN-8381:

Description: I started a fresh pseudo-distributed system on a node, then ran a job, but it got stuck. My first reaction was to check the log messages to locate the problem, but I found no error messages. After reading log messages for a long time, it occurred to me to check the node health. The YARN web UI showed that the NodeManager was unhealthy due to the "l\{{ocal-dirs are bad: /tmp/hadoop-hduser/nm-local-dir}}". I reconfigured "{{yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage}}" to 98%, which solved the problem. But I still strongly recommend adding error log messages for an unhealthy NodeManager.

was: I started a fresh pseudo-distributed system on a node, then ran a job, but it got stuck. My first reaction was to check the log messages to locate the problem, but I found no error messages. Then it occurred to me to check the node health, after reading log messages for a long time. The YARN web UI showed that the NodeManager was unhealthy due to the "l{{ocal-dirs are bad: /tmp/hadoop-hduser/nm-local-dir}}". I reconfigured "{{yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage}}" to 98%, which solved the problem. But I still strongly recommend adding error log messages for an unhealthy NodeManager.

> Job got stuck while node was unhealthy, but without log messages to indicate
> such case
> --
>
> Key: YARN-8381
> URL: https://issues.apache.org/jira/browse/YARN-8381
> Project: Hadoop YARN
> Issue Type: Improvement
> Reporter: lujie
> Priority: Major
>
> I started a fresh pseudo-distributed system on a node, then ran a job, but
> it got stuck. My first reaction was to check the log messages to locate the
> problem, but I found no error messages.
> After reading log messages for a long time, it occurred to me to check the
> node health.
> The YARN web UI showed that the NodeManager was unhealthy due to the
> "l\{{ocal-dirs are bad: /tmp/hadoop-hduser/nm-local-dir}}". I reconfigured
> "{{yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage}}"
> to 98%, which solved the problem. But I still strongly recommend adding error
> log messages for an unhealthy NodeManager.
[jira] [Updated] (YARN-8381) Job got stuck while node is unhealthy, but without log messages to indicate such case
[ https://issues.apache.org/jira/browse/YARN-8381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

lujie updated YARN-8381:

Summary: Job got stuck while node is unhealthy, but without log messages to indicate such case (was: Job get stuck while node is unhealthy, but without log messages to indicate such case)

> Job got stuck while node is unhealthy, but without log messages to indicate
> such case
> -
>
> Key: YARN-8381
> URL: https://issues.apache.org/jira/browse/YARN-8381
> Project: Hadoop YARN
> Issue Type: Improvement
> Reporter: lujie
> Priority: Major
>
> I started a fresh pseudo-distributed system on a node, then ran a job, but
> it got stuck. My first reaction was to check the log messages to locate the
> problem, but I found no error messages. Then it occurred to me to check the
> node health, after reading log messages for a long time. The YARN web UI
> showed that the NodeManager was unhealthy due to the "l{{ocal-dirs are bad:
> /tmp/hadoop-hduser/nm-local-dir}}". I reconfigured
> "{{yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage}}"
> to 98%, which solved the problem. But I still strongly recommend adding error
> log messages for an unhealthy NodeManager.