[ https://issues.apache.org/jira/browse/YARN-9809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17139451#comment-17139451 ]
Jim Brennan commented on YARN-9809:
-----------------------------------

[~eyang], [~ebadger] changing the behavior of health-check scripts seems pretty dangerous.

We looked into this issue a few years ago, because we had some cases where the health-check scripts were not installed properly, and some bad nodes were erroneously reporting healthy status. Rather than try to change the contract for how health-check scripts behave, which has been around for a very long time, we instead added a wrapper script that we ship with hadoop. The wrapper checks that the real health-check script exists and is executable, and if it's not, it prints an "ERROR" message so the NM will mark the node unhealthy. If the health-check script is good, we just exec it.

I agree that changing the handling of health check script output/return value is beyond the scope of this Jira.

> NMs should supply a health status when registering with RM
> ----------------------------------------------------------
>
>                 Key: YARN-9809
>                 URL: https://issues.apache.org/jira/browse/YARN-9809
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Eric Badger
>            Assignee: Eric Badger
>            Priority: Major
>         Attachments: YARN-9809.001.patch, YARN-9809.002.patch,
> YARN-9809.003.patch, YARN-9809.004.patch
>
>
> Currently if the NM registers with the RM and it is unhealthy, it can be
> scheduled many containers before the first heartbeat. After the first
> heartbeat, the RM will mark the NM as unhealthy and kill all of the
> containers.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
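
For readers who want a concrete picture of the wrapper approach described in the comment, here is a rough sketch. It is not the script that ships with hadoop; the language (Python), the function names `check_script` and `main`, and the convention of passing the real script path as the first argument are all assumptions made for illustration. It relies only on the documented NM behavior that a stdout line beginning with ERROR marks the node unhealthy.

```python
#!/usr/bin/env python3
"""Hypothetical wrapper around a NodeManager health-check script (illustration only)."""
import os
import sys


def check_script(path):
    """Return a description of the problem if `path` cannot be run, else None."""
    if not os.path.isfile(path):
        return "health-check script not found: " + path
    if not os.access(path, os.X_OK):
        return "health-check script is not executable: " + path
    return None


def main(argv):
    # Assumption: the real health-check script path is the first argument.
    if len(argv) < 2:
        print("ERROR no health-check script given to wrapper")
        return
    script = argv[1]
    problem = check_script(script)
    if problem is not None:
        # A stdout line starting with ERROR is what makes the NM report unhealthy.
        print("ERROR " + problem)
        return
    sys.stdout.flush()
    # Script looks usable: replace this process with it, passing extra args through.
    os.execv(script, argv[1:])


if __name__ == "__main__":
    main(sys.argv)
```

With a wrapper like this, the NM would be configured to run the wrapper rather than the real script, so a missing or non-executable script shows up as an unhealthy node without changing the existing health-check contract.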