Zhankun Tang created YARN-8823:
----------------------------------

             Summary: Monitor the healthy state of GPU
                 Key: YARN-8823
                 URL: https://issues.apache.org/jira/browse/YARN-8823
             Project: Hadoop YARN
          Issue Type: Sub-task
            Reporter: Zhankun Tang

GPU resources are discovered when the NM bootstraps, but they are not updated in later heartbeats with the RM. There should be a monitoring mechanism that checks GPU health status periodically, along with the corresponding handling.
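
To make the request concrete, here is a minimal sketch of what periodic GPU health checking could look like. It is an illustration under stated assumptions, not the NodeManager's actual design: the class name GpuHealthMonitor, the probe predicate (which might wrap a call to a tool such as nvidia-smi), and the use of GPU minor numbers as identifiers are all hypothetical. Reporting the usable set to the RM through the heartbeat is deliberately left out.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentSkipListSet;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.IntPredicate;

/** Hypothetical sketch of a periodic GPU health monitor; not a YARN API. */
public class GpuHealthMonitor {
  // GPU minor numbers discovered once at NM bootstrap (assumed identifiers).
  private final List<Integer> discovered;
  // Hypothetical health check: returns true if the given GPU looks healthy.
  private final IntPredicate probe;
  // GPUs currently considered usable; safe to read from other threads.
  private final Set<Integer> usable = new ConcurrentSkipListSet<>();
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor(r -> {
        Thread t = new Thread(r, "gpu-health-monitor");
        t.setDaemon(true); // do not keep the JVM alive for monitoring
        return t;
      });

  public GpuHealthMonitor(List<Integer> discovered, IntPredicate probe) {
    this.discovered = List.copyOf(discovered);
    this.probe = probe;
    this.usable.addAll(this.discovered); // assume healthy until probed
  }

  /** Probes every discovered GPU once and returns a snapshot of usable ones. */
  public Set<Integer> checkOnce() {
    for (Integer gpu : discovered) {
      boolean healthy;
      try {
        healthy = probe.test(gpu);
      } catch (RuntimeException e) {
        healthy = false; // a failing probe is treated as an unhealthy GPU
      }
      if (healthy) {
        usable.add(gpu);    // a recovered GPU becomes usable again
      } else {
        usable.remove(gpu); // an unhealthy GPU is withdrawn
      }
    }
    return new TreeSet<>(usable);
  }

  /** Starts checking immediately, then again after each fixed delay. */
  public void start(long periodSeconds) {
    scheduler.scheduleWithFixedDelay(this::checkOnce, 0, periodSeconds, TimeUnit.SECONDS);
  }

  public void stop() {
    scheduler.shutdownNow();
  }

  /** Snapshot of the GPUs that passed the most recent check. */
  public Set<Integer> usableGpus() {
    return new TreeSet<>(usable);
  }
}
```

In this sketch a GPU that fails a check is dropped from the usable set and re-added if a later check passes. Whether running containers on a failed GPU should be killed, and how the reduced capacity reaches the RM, are the "corresponding handling" questions this issue leaves open.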