[ https://issues.apache.org/jira/browse/YARN-8729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16598112#comment-16598112 ]
Tao Yang commented on YARN-8729:
--------------------------------

Thanks [~cheersyang] for the mention.
{quote}
That patch puts this.isStopped=false after statusUpdater.start()
{quote}
[~ebadger], [~cheersyang], I don't understand why YARN-4686 exchanged these two lines. Can you give a hint about that? Thanks.

> Node status updater thread could be lost after it restarted
> -----------------------------------------------------------
>
>                 Key: YARN-8729
>                 URL: https://issues.apache.org/jira/browse/YARN-8729
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 3.2.0
>            Reporter: Tao Yang
>            Assignee: Tao Yang
>            Priority: Critical
>         Attachments: YARN-8729.001.patch
>
>
> Today I found a lost NM whose node status updater thread no longer existed
> after the thread was restarted. In
> {{NodeStatusUpdaterImpl#rebootNodeStatusUpdaterAndRegisterWithRM}}, the
> isStopped flag is not set to false before {{statusUpdater.start()}} is
> executed, so if the thread starts immediately and finds isStopped==true, it
> exits without logging anything.
> Key code in
> {{NodeStatusUpdaterImpl#rebootNodeStatusUpdaterAndRegisterWithRM}}:
> {code:java}
> statusUpdater.join();
> registerWithRM();
> statusUpdater = new Thread(statusUpdaterRunnable, "Node Status Updater");
> statusUpdater.start();
> this.isStopped = false;   //this line should be moved before statusUpdater.start();
> LOG.info("NodeStatusUpdater thread is reRegistered and restarted");
> {code}
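
For illustration, here is a minimal standalone sketch of the ordering proposed in the issue description: clear the flag first, then start the thread. It is not the NodeManager code. The class StatusUpdaterRestartSketch, its heartbeat counter, its latch, and its method names are hypothetical; only the names isStopped, statusUpdater, and statusUpdaterRunnable mirror the quoted snippet, and the flag is assumed to be volatile here.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the restart logic; NOT the real NodeStatusUpdaterImpl.
class StatusUpdaterRestartSketch {
    // Assumed volatile so a write by the restarting thread is visible to the worker.
    private volatile boolean isStopped = true;
    // Counts loop iterations; stands in for real heartbeats to the RM.
    private final AtomicInteger heartbeats = new AtomicInteger();
    // Released once the current worker has completed its first loop iteration.
    private volatile CountDownLatch firstHeartbeat = new CountDownLatch(1);
    private Thread statusUpdater;

    // Worker loop: exits silently as soon as it observes isStopped == true.
    private final Runnable statusUpdaterRunnable = () -> {
        while (!isStopped) {
            heartbeats.incrementAndGet();
            firstHeartbeat.countDown();
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    };

    // Restart with the proposed ordering: flag first, then start().
    void restart() {
        firstHeartbeat = new CountDownLatch(1);
        statusUpdater = new Thread(statusUpdaterRunnable, "Node Status Updater");
        isStopped = false;      // cleared BEFORE start() ...
        statusUpdater.start();  // ... so the worker's first check cannot see a stale true
    }

    // Returns true if the worker ran at least one iteration within the timeout.
    boolean awaitFirstHeartbeat(long timeoutMillis) {
        try {
            return firstHeartbeat.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Asks the worker to exit and waits for it; returns true if it has terminated.
    boolean stop(long timeoutMillis) {
        isStopped = true;
        if (statusUpdater == null) {
            return true;
        }
        try {
            statusUpdater.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !statusUpdater.isAlive();
    }

    public static void main(String[] args) {
        StatusUpdaterRestartSketch sketch = new StatusUpdaterRestartSketch();
        sketch.restart();
        System.out.println("worker ran: " + sketch.awaitFirstHeartbeat(2000));
        System.out.println("worker stopped: " + sketch.stop(2000));
    }
}
```

Because Thread.start() happens-before every action in the started thread, a write to isStopped made before start() is guaranteed to be visible to the worker's first loop check. With the two lines in the opposite order, the worker may read the old value and return immediately, which is the silent exit described in the issue.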