[ https://issues.apache.org/jira/browse/YARN-4431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15048088#comment-15048088 ]
Hudson commented on YARN-4431:
------------------------------

FAILURE: Integrated in Hadoop-trunk-Commit #8945 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8945/])
YARN-4431. Not necessary to do unRegisterNM() if NM get stop due to (rohithsharmaks: rev 15c3e7ffe3d1c57ad36afd993f09fc47889c93bd)
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java

> Not necessary to do unRegisterNM() if NM get stop due to failed to connect to RM
> --------------------------------------------------------------------------------
>
>                 Key: YARN-4431
>                 URL: https://issues.apache.org/jira/browse/YARN-4431
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>            Reporter: Junping Du
>            Assignee: Junping Du
>             Fix For: 2.8.0
>
>         Attachments: YARN-4431.patch
>
>
> {noformat}
> 2015-12-07 12:16:57,873 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
> 2015-12-07 12:16:58,874 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
> 2015-12-07 12:16:58,876 WARN org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Unregistration of the Node 10.200.10.53:25454 failed.
> java.net.ConnectException: Call From jduMBP.local/10.200.10.53 to 0.0.0.0:8031 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
>         at sun.reflect.GeneratedConstructorAccessor30.newInstance(Unknown Source)
>         at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>         at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
>         at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
>         at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:732)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1452)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1385)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
>         at com.sun.proxy.$Proxy74.unRegisterNodeManager(Unknown Source)
>         at org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.unRegisterNodeManager(ResourceTrackerPBClientImpl.java:98)
>         at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:483)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:255)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
>         at com.sun.proxy.$Proxy75.unRegisterNodeManager(Unknown Source)
>         at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.unRegisterNM(NodeStatusUpdaterImpl.java:267)
>         at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.serviceStop(NodeStatusUpdaterImpl.java:245)
>         at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>         at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
>         at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
>         at org.apache.hadoop.service.CompositeService.stop(CompositeService.java:157)
>         at org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:131)
>         at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStop(NodeManager.java:377)
> {noformat}
> If the RM is down for some reason, the NM's NodeStatusUpdaterImpl retries the
> connection according to its retry policy. After the maximum number of retries
> (15 minutes by default), it sends NodeManagerEventType.SHUTDOWN to shut down
> the NM. But NM shutdown calls NodeStatusUpdaterImpl.serviceStop(), which calls
> unRegisterNM() to unregister the NM from the RM and so goes through the retries
> again (another 15 minutes). This is completely unnecessary; we should skip
> unRegisterNM() when the NM is shut down because of connection issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
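
The shutdown path described in the issue could be guarded roughly as in the sketch below. This is a minimal, self-contained Java illustration and not the attached patch: `StatusUpdaterSketch`, the `failedToConnect` flag, and the `onRegistered()` / `onConnectRetriesExhausted()` methods are hypothetical names chosen for the example. Only `serviceStop()` and `unRegisterNM()` mirror names that appear in the report, and the RPC itself is replaced by a log entry.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the NM status updater; NOT the real NodeStatusUpdaterImpl.
class StatusUpdaterSketch {
    private boolean registeredWithRM = false;
    // Assumed flag: set once the connection retries to the RM are exhausted.
    private boolean failedToConnect = false;
    // Records what happened, in place of real RPC calls and events.
    final List<String> log = new ArrayList<>();

    // Called after a successful registration with the RM.
    void onRegistered() {
        registeredWithRM = true;
    }

    // Called when the retry policy gives up; in the report this is the point
    // where NodeManagerEventType.SHUTDOWN is sent.
    void onConnectRetriesExhausted() {
        failedToConnect = true;
        log.add("SHUTDOWN requested");
    }

    // Stand-in for the unregister RPC, which would otherwise retry against the RM.
    private void unRegisterNM() {
        log.add("unRegisterNM called");
    }

    // Skip unregistration when the shutdown was caused by an unreachable RM,
    // since the call would only go through the same retries again.
    void serviceStop() {
        if (registeredWithRM && !failedToConnect) {
            unRegisterNM();
        } else {
            log.add("unRegisterNM skipped");
        }
    }

    public static void main(String[] args) {
        StatusUpdaterSketch updater = new StatusUpdaterSketch();
        updater.onRegistered();
        updater.onConnectRetriesExhausted();
        updater.serviceStop();
        System.out.println(updater.log);
    }
}
```

With a guard of this kind, a normal stop still unregisters the node, while a stop triggered by exhausted retries returns without contacting the RM a second time.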