[ https://issues.apache.org/jira/browse/YARN-8729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tao Yang updated YARN-8729:
---------------------------
    Attachment: YARN-8729.001.patch

> Node status updater thread could be lost after it restarted
> -----------------------------------------------------------
>
>                 Key: YARN-8729
>                 URL: https://issues.apache.org/jira/browse/YARN-8729
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 3.2.0
>            Reporter: Tao Yang
>            Assignee: Tao Yang
>            Priority: Critical
>         Attachments: YARN-8729.001.patch, YARN-8729.001.patch
>
>
> Today I found a lost NM whose node status updater thread no longer existed
> after the thread was restarted. In
> {{NodeStatusUpdaterImpl#rebootNodeStatusUpdaterAndRegisterWithRM}}, the
> isStopped flag is not reset to false before {{statusUpdater.start()}} is
> called, so if the new thread starts immediately and sees isStopped==true, it
> exits without logging anything.
> Key code in
> {{NodeStatusUpdaterImpl#rebootNodeStatusUpdaterAndRegisterWithRM}}:
> {code:java}
> statusUpdater.join();
> registerWithRM();
> statusUpdater = new Thread(statusUpdaterRunnable, "Node Status Updater");
> statusUpdater.start();
> this.isStopped = false; //this line should be moved before statusUpdater.start();
> LOG.info("NodeStatusUpdater thread is reRegistered and restarted");
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
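The ordering problem described in the issue can be illustrated with a minimal, self-contained sketch. Assumptions: the class StatusUpdaterRestartSketch, its heartbeat counter and its start/reboot/stop methods are hypothetical stand-ins, not the real NodeStatusUpdaterImpl; only the "reset isStopped before start()" ordering mirrors the fix proposed in the issue. The worker loop exits as soon as it observes isStopped==true, so clearing the flag before calling start() means the restarted thread cannot read a stale true value.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical, simplified model of a status-updater restart.
 * NOT the real NodeStatusUpdaterImpl; names are illustrative only.
 */
public class StatusUpdaterRestartSketch {
    // volatile so the worker thread sees writes made by the restarting thread
    private volatile boolean isStopped = false;
    private final AtomicInteger heartbeats = new AtomicInteger();
    private Thread statusUpdater;

    // Worker loop: runs until it observes isStopped == true.
    private final Runnable statusUpdaterRunnable = () -> {
        while (!isStopped) {
            heartbeats.incrementAndGet(); // stands in for a heartbeat to the RM
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
        // If isStopped is still true on the first check, the loop body never
        // runs and the thread ends here without any log, as in the bug report.
    };

    public void start() {
        statusUpdater = new Thread(statusUpdaterRunnable, "Node Status Updater");
        statusUpdater.start();
    }

    /** Restart using the corrected ordering: clear the flag BEFORE start(). */
    public void reboot() throws InterruptedException {
        isStopped = true;       // ask the old thread to exit
        statusUpdater.join();   // wait for it to finish
        // hypothetical placeholder: re-registration with the RM would go here
        isStopped = false;      // reset first ...
        statusUpdater = new Thread(statusUpdaterRunnable, "Node Status Updater");
        statusUpdater.start();  // ... then start, so the new thread sees false
    }

    public void stop() throws InterruptedException {
        isStopped = true;
        statusUpdater.join();
    }

    public boolean isUpdaterAlive() {
        return statusUpdater.isAlive();
    }

    public int heartbeatCount() {
        return heartbeats.get();
    }

    public static void main(String[] args) throws InterruptedException {
        StatusUpdaterRestartSketch nsu = new StatusUpdaterRestartSketch();
        nsu.start();
        nsu.reboot();
        Thread.sleep(50);
        System.out.println("alive after reboot: " + nsu.isUpdaterAlive());
        System.out.println("heartbeats so far: " + nsu.heartbeatCount());
        nsu.stop();
    }
}
```

Setting the flag after start(), as in the quoted code, leaves a window in which the new thread can read the stale true value and return immediately. Declaring the flag volatile only makes writes visible across threads; it does not close that window, which is why the ordering itself has to change.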