[ https://issues.apache.org/jira/browse/YARN-8823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zhankun Tang reassigned YARN-8823:
----------------------------------

    Assignee: Zhankun Tang

> Monitor the healthy state of GPU
> --------------------------------
>
>                 Key: YARN-8823
>                 URL: https://issues.apache.org/jira/browse/YARN-8823
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Zhankun Tang
>            Assignee: Zhankun Tang
>            Priority: Major
>
> GPU resources are discovered when the NM bootstraps, but they are not updated
> through later heartbeats with the RM. There should be a monitoring mechanism
> that periodically checks GPU health status, along with corresponding handling
> of unhealthy GPUs.
> YARN-8851 will also handle device monitoring, so the two may share some
> common parts.