[ https://issues.apache.org/jira/browse/YARN-6102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16098000#comment-16098000 ]
Hudson commented on YARN-6102: ------------------------------ SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #12047 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/12047/]) YARN-6102. RMActiveService context to be updated with new RMContext on (sunilg: rev e3153284288d6cfa7a28511dfefe1c8a7d6b4eda) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TimelineServiceV2Publisher.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisherForV2.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/AdminService.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/CuratorBasedElectorService.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMHA.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMActiveServiceContext.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ActiveStandbyElectorBasedElectorService.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContextImpl.java * (add) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMServiceContext.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMEmbeddedElector.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/timelineservice/RMTimelineCollectorManager.java > RMActiveService context to be updated with new RMContext on failover > -------------------------------------------------------------------- > > Key: YARN-6102 > URL: https://issues.apache.org/jira/browse/YARN-6102 > Project: Hadoop YARN > Issue Type: Bug > Affects Versions: 2.8.0, 2.7.3 > Reporter: Ajith S > Assignee: Rohith Sharma K S > Priority: Critical > Attachments: eventOrder.JPG, YARN-6102.01.patch, YARN-6102.02.patch, > YARN-6102.03.patch, YARN-6102.04.patch, YARN-6102.05.patch, > YARN-6102.06.patch, YARN-6102.07.patch > > > {code}2017-01-17 16:42:17,911 FATAL [AsyncDispatcher event handler] > event.AsyncDispatcher (AsyncDispatcher.java:dispatch(200)) - Error in > dispatcher thread > java.lang.Exception: No handler for registered for class > org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeEventType > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:196) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:120) > at java.lang.Thread.run(Thread.java:745) > 2017-01-17 16:42:17,914 INFO [AsyncDispatcher ShutDown handler] > event.AsyncDispatcher (AsyncDispatcher.java:run(303)) - Exiting, bbye..{code} > The same stack i was also noticed in {{TestResourceTrackerOnHA}} exits > abnormally, after some analysis, i was able to reproduce. > Once the nodeHeartBeat is sent to RM, inside > {{org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService.nodeHeartbeat(NodeHeartbeatRequest)}}, > before sending it to dispatcher through > {{this.rmContext.getDispatcher().getEventHandler().handle(nodeStatusEvent);}} > if RM failover is called, the dispatcher is reset > The new dispatcher is however first started and then the events are > registered at > {{org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.reinitialize(boolean)}} > So event order will look like > 1. Send Node heartbeat to {{ResourceTrackerService}} > 2. In {{ResourceTrackerService.nodeHeartbeat}}, before passing to dispatcher > call RM failover > 3. In RM Failover, current active will reset dispatcher @reinitialize i.e ( > {{resetDispatcher();}} + {{createAndInitActiveServices();}} ) > Now between {{resetDispatcher();}} and {{createAndInitActiveServices();}} , > the {{ResourceTrackerService.nodeHeartbeat}} invokes dipatcher > This will cause the above error as at point of time when {{STATUS_UPDATE}} > event is given to dispatcher in {{ResourceTrackerService}} , the new > dispatcher(from the failover) may be started but not yet registered for events > Using same steps(with pausing JVM at debug), i was able to reproduce this in > production cluster also. for {{STATUS_UPDATE}} active service event, when the > service is yet to forward the event to RM dispatcher but a failover is called > and dispatcher reset is between {{resetDispatcher();}} & > {{createAndInitActiveServices();}} -- This message was sent by Atlassian JIRA (v6.4.14#64029) --------------------------------------------------------------------- To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org