[ https://issues.apache.org/jira/browse/YARN-6715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Peter Bacsko updated YARN-6715:
-------------------------------
    Attachment: YARN-6715-002.patch

> Fix documentation about NodeHealthScriptRunner
> -----------------------------------------------
>
>                 Key: YARN-6715
>                 URL: https://issues.apache.org/jira/browse/YARN-6715
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: documentation, nodemanager
>            Reporter: Peter Bacsko
>            Assignee: Peter Bacsko
>            Priority: Major
>         Attachments: YARN-6715-001.patch, YARN-6715-002.patch
>
>
> NodeHealthScriptRunner does *not* report a bad health if the script exits
> with an exit code other than 0. Look at the {{FAILED_WITH_EXIT_CODE}} case:
> {noformat}
>   void reportHealthStatus(HealthCheckerExitStatus status) {
>     long now = System.currentTimeMillis();
>     switch (status) {
>     case SUCCESS:
>       setHealthStatus(true, "", now);
>       break;
>     case TIMED_OUT:
>       setHealthStatus(false, NODE_HEALTH_SCRIPT_TIMED_OUT_MSG);
>       break;
>     case FAILED_WITH_EXCEPTION:
>       setHealthStatus(false, exceptionStackTrace);
>       break;
>     case FAILED_WITH_EXIT_CODE:
>       setHealthStatus(true, "", now);
>       break;
>     case FAILED:
>       setHealthStatus(false, shexec.getOutput());
>       break;
>     }
>   }
> {noformat}
> Based on the discussion in YARN-5567, this is intentional, but conflicts with
> the upstream document, which says:
> "If the script *exits with a non-zero exit code*, times out or results in an
> exception being thrown, the node is marked as unhealthy"
> This statement can be extremely misleading and must be corrected. We might
> also add an extra comment to {{reportHealthStatus()}} which explains that
> {{FAILED_WITH_EXIT_CODE}} is not buggy.
> This case also lacks unit test coverage.
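
The status-to-health mapping described in the issue can be sketched as a small standalone Java class. This is only an illustration under stated assumptions, not the NodeManager code: the class name HealthStatusSketch and the helper isHealthy() are hypothetical, and the enum simply copies the constants that appear in the quoted switch statement. The return value is the boolean that the quoted reportHealthStatus() passes as the first argument of setHealthStatus().

```java
public class HealthStatusSketch {

    // Copies the constants named in the quoted switch statement.
    enum HealthCheckerExitStatus {
        SUCCESS, TIMED_OUT, FAILED_WITH_EXCEPTION, FAILED_WITH_EXIT_CODE, FAILED
    }

    // Hypothetical helper (not part of Hadoop): returns the healthy flag that
    // the quoted reportHealthStatus() would set for the given status.
    static boolean isHealthy(HealthCheckerExitStatus status) {
        switch (status) {
            case SUCCESS:
            case FAILED_WITH_EXIT_CODE: // non-zero exit code still counts as healthy
                return true;
            case TIMED_OUT:
            case FAILED_WITH_EXCEPTION:
            case FAILED:
                return false;
            default:
                throw new IllegalArgumentException("Unknown status: " + status);
        }
    }

    public static void main(String[] args) {
        // Print the mapping for every status.
        for (HealthCheckerExitStatus s : HealthCheckerExitStatus.values()) {
            System.out.println(s + " -> healthy=" + isHealthy(s));
        }
    }
}
```

A unit test for the uncovered case would presumably assert the same thing against the real class: after a run that ends in FAILED_WITH_EXIT_CODE, the node is still reported as healthy.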