[ https://issues.apache.org/jira/browse/YARN-6102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16101643#comment-16101643 ]
Hadoop QA commented on YARN-6102:
---------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 28s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || branch-2 Compile Tests ||
| +1 | mvninstall | 7m 6s | branch-2 passed |
| +1 | compile | 0m 32s | branch-2 passed with JDK v1.8.0_131 |
| +1 | compile | 0m 33s | branch-2 passed with JDK v1.7.0_131 |
| +1 | checkstyle | 0m 24s | branch-2 passed |
| +1 | mvnsite | 0m 39s | branch-2 passed |
| +1 | findbugs | 1m 15s | branch-2 passed |
| +1 | javadoc | 0m 22s | branch-2 passed with JDK v1.8.0_131 |
| +1 | javadoc | 0m 24s | branch-2 passed with JDK v1.7.0_131 |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 32s | the patch passed |
| +1 | compile | 0m 26s | the patch passed with JDK v1.8.0_131 |
| +1 | javac | 0m 26s | the patch passed |
| +1 | compile | 0m 31s | the patch passed with JDK v1.7.0_131 |
| +1 | javac | 0m 31s | the patch passed |
| +1 | checkstyle | 0m 21s | the patch passed |
| +1 | mvnsite | 0m 39s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | findbugs | 1m 26s | the patch passed |
| +1 | javadoc | 0m 20s | the patch passed with JDK v1.8.0_131 |
| +1 | javadoc | 0m 24s | the patch passed with JDK v1.7.0_131 |
|| || || || Other Tests ||
| -1 | unit | 44m 11s | hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_131. |
| +1 | asflicense | 0m 19s | The patch does not generate ASF License warnings. |
| | | 106m 4s | |

|| Reason || Tests ||
| JDK v1.8.0_131 Failed junit tests | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFSAppStarvation |
| JDK v1.7.0_131 Failed junit tests | hadoop.yarn.server.resourcemanager.TestRMRestart |
| | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler |

|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:5e40efe |
| JIRA Issue | YARN-6102 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12878964/YARN-6102-branch-2.002-addednum.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux 91c0aaba5b95 3.13.0-119-generic #166-Ubuntu SMP Wed May 3 12:18:55 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | branch-2 / 554c3cd |
| Default Java | 1.7.0_131 |
| Multi-JDK versions | /usr/lib/jvm/java-8-oracle:1.8.0_131 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_131 |
| findbugs | v3.0.0 |
| unit | https://builds.apache.org/job/PreCommit-YARN-Build/16554/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.7.0_131.txt |
| JDK v1.7.0_131 Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/16554/testReport/ |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/16554/console |
| Powered by | Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.

> RMActiveService context to be updated with new RMContext on failover
> --------------------------------------------------------------------
>
> Key: YARN-6102
> URL: https://issues.apache.org/jira/browse/YARN-6102
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 2.8.0, 2.7.3
> Reporter: Ajith S
> Assignee: Rohith Sharma K S
> Priority: Critical
> Fix For: 2.9.0, 3.0.0-beta1
>
> Attachments: eventOrder.JPG, YARN-6102.01.patch, YARN-6102.02.patch,
> YARN-6102.03.patch, YARN-6102.04.patch, YARN-6102.05.patch,
> YARN-6102.06.patch, YARN-6102.07.patch, YARN-6102-branch-2.001.patch,
> YARN-6102-branch-2.002-addednum.patch, YARN-6102-branch-2.002.patch
>
>
> {code}
> 2017-01-17 16:42:17,911 FATAL [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:dispatch(200)) - Error in dispatcher thread
> java.lang.Exception: No handler for registered for class org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeEventType
>     at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:196)
>     at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:120)
>     at java.lang.Thread.run(Thread.java:745)
> 2017-01-17 16:42:17,914 INFO [AsyncDispatcher ShutDown handler] event.AsyncDispatcher (AsyncDispatcher.java:run(303)) - Exiting, bbye..
> {code}
> The same stack trace was also seen when {{TestResourceTrackerOnHA}} exited
> abnormally. After some analysis, I was able to reproduce it.
> Once the node heartbeat is sent to the RM, inside
> {{org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService.nodeHeartbeat(NodeHeartbeatRequest)}},
> if an RM failover is triggered before the event is passed to the dispatcher through
> {{this.rmContext.getDispatcher().getEventHandler().handle(nodeStatusEvent);}},
> the dispatcher is reset.
> However, the new dispatcher is started first, and only afterwards are the
> event handlers registered, in
> {{org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.reinitialize(boolean)}}.
> So the event order looks like this:
> 1. A node heartbeat is sent to {{ResourceTrackerService}}.
> 2. In {{ResourceTrackerService.nodeHeartbeat}}, before the event is passed to
> the dispatcher, an RM failover is triggered.
> 3. During the failover, the current active RM resets the dispatcher in
> {{reinitialize}}, i.e. ({{resetDispatcher();}} + {{createAndInitActiveServices();}}).
> Now, between {{resetDispatcher();}} and {{createAndInitActiveServices();}},
> {{ResourceTrackerService.nodeHeartbeat}} invokes the dispatcher.
> This causes the above error because, at the point when the {{STATUS_UPDATE}}
> event is handed to the dispatcher in {{ResourceTrackerService}}, the new
> dispatcher (from the failover) may be started but not yet have its event
> handlers registered.
> Using the same steps (pausing the JVM in a debugger), I was able to reproduce
> this in a production cluster as well: the service has yet to forward the
> {{STATUS_UPDATE}} active service event to the RM dispatcher when a failover
> is triggered, and the event then reaches the dispatcher between
> {{resetDispatcher();}} and {{createAndInitActiveServices();}}.
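
The ordering problem described in the issue can be illustrated with a small, self-contained sketch. It does not use any Hadoop classes: `MiniDispatcher`, `NodeEventType`, and `simulate` are hypothetical stand-ins invented for this illustration, and the error message is simplified. The sketch only models the window between creating a fresh dispatcher and registering its handlers; it runs the steps sequentially rather than reproducing the real thread interleaving.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

class DispatcherResetRace {

    // Hypothetical stand-in for RMNodeEventType; STATUS_UPDATE is the event named in the report.
    enum NodeEventType { STATUS_UPDATE }

    // Hypothetical, simplified dispatcher: routes an event to the handler registered
    // for the event's enum class and fails if no handler has been registered yet.
    static final class MiniDispatcher {
        private final Map<Class<?>, Consumer<Enum<?>>> handlers = new ConcurrentHashMap<>();

        void register(Class<?> eventType, Consumer<Enum<?>> handler) {
            handlers.put(eventType, handler);
        }

        void dispatch(Enum<?> event) {
            Class<?> type = event.getDeclaringClass();
            Consumer<Enum<?>> handler = handlers.get(type);
            if (handler == null) {
                throw new IllegalStateException("No handler registered for " + type.getSimpleName());
            }
            handler.accept(event);
        }
    }

    // Replays the reported ordering sequentially and records what happened.
    static List<String> simulate() {
        List<String> log = new ArrayList<>();

        // Failover, first half: the dispatcher is reset, so the new one has no handlers.
        MiniDispatcher dispatcher = new MiniDispatcher();

        // The in-flight heartbeat hands STATUS_UPDATE to the new dispatcher inside the window.
        try {
            dispatcher.dispatch(NodeEventType.STATUS_UPDATE);
        } catch (IllegalStateException e) {
            log.add("error: " + e.getMessage());
        }

        // Failover, second half: handlers are registered; later events are handled.
        dispatcher.register(NodeEventType.class, event -> log.add("handled " + event.name()));
        dispatcher.dispatch(NodeEventType.STATUS_UPDATE);
        return log;
    }

    public static void main(String[] args) {
        for (String line : simulate()) {
            System.out.println(line);
        }
    }
}
```

In this sketch the first dispatch fails only because it happens before `register` is called on the fresh dispatcher, which mirrors the gap between `resetDispatcher()` and `createAndInitActiveServices()` described in the report.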