[ https://issues.apache.org/jira/browse/YARN-9552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16839464#comment-16839464 ]
Hadoop QA commented on YARN-9552: --------------------------------- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 15s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 3s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 44s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 27s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 46s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 13s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 16s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 41s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 27s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 39s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 20s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 82m 1s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 28s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}130m 34s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:bdbca0e | | JIRA Issue | YARN-9552 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12968676/YARN-9552-002.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux dadf97efb955 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 6bcc1dc | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_191 | | findbugs | v3.1.0-RC1 | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/24089/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/24089/testReport/ | | Max. process+thread count | 927 (vs. ulimit of 10000) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/24089/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. > FairScheduler: NODE_UPDATE can cause NoSuchElementException > ----------------------------------------------------------- > > Key: YARN-9552 > URL: https://issues.apache.org/jira/browse/YARN-9552 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler > Reporter: Peter Bacsko > Assignee: Peter Bacsko > Priority: Major > Attachments: YARN-9552-001.patch, YARN-9552-002.patch > > > We observed a race condition inside YARN with the following stack trace: > {noformat} > 18/11/07 06:45:09.559 SchedulerEventDispatcher:Event Processor ERROR > EventDispatcher: Error in handling event type NODE_UPDATE to the Event > Dispatcher > java.util.NoSuchElementException > at > java.util.concurrent.ConcurrentSkipListMap.firstKey(ConcurrentSkipListMap.java:2036) > at > java.util.concurrent.ConcurrentSkipListSet.first(ConcurrentSkipListSet.java:396) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.getNextPendingAsk(AppSchedulingInfo.java:373) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.isOverAMShareLimit(FSAppAttempt.java:941) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:1373) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.assignContainer(FSLeafQueue.java:353) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.assignContainer(FSParentQueue.java:204) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.attemptScheduling(FairScheduler.java:1094) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.nodeUpdate(FairScheduler.java:961) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1183) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:132) > at > org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:66) > at java.lang.Thread.run(Thread.java:748) > {noformat} > This is basically the same as the one described in YARN-7382, but the root > cause is different. > When we create an application attempt, we create an {{FSAppAttempt}} object. > This contains an {{AppSchedulingInfo}} which contains a set of > {{SchedulerRequestKey}}. Initially, this set is empty and only initialized a > bit later on a separate thread during a state transition: > {noformat} > 2019-05-07 15:58:02,659 INFO [RM StateStore dispatcher] > recovery.RMStateStore (RMStateStore.java:transition(239)) - Storing info for > app: application_1557237478804_0001 > 2019-05-07 15:58:02,684 INFO [RM Event dispatcher] rmapp.RMAppImpl > (RMAppImpl.java:handle(903)) - application_1557237478804_0001 State change > from NEW_SAVING to SUBMITTED on event = APP_NEW_SAVED > 2019-05-07 15:58:02,690 INFO [SchedulerEventDispatcher:Event Processor] > fair.FairScheduler (FairScheduler.java:addApplication(490)) - Accepted > application application_1557237478804_0001 from user: bacskop, in queue: > root.bacskop, currently num of applications: 1 > 2019-05-07 15:58:02,698 INFO [RM Event dispatcher] rmapp.RMAppImpl > (RMAppImpl.java:handle(903)) - application_1557237478804_0001 State change > from SUBMITTED to ACCEPTED on event = APP_ACCEPTED > 2019-05-07 15:58:02,731 INFO [RM Event dispatcher] > resourcemanager.ApplicationMasterService > (ApplicationMasterService.java:registerAppAttempt(434)) - Registering app > attempt : appattempt_1557237478804_0001_000001 > 2019-05-07 15:58:02,732 INFO [RM Event dispatcher] attempt.RMAppAttemptImpl > (RMAppAttemptImpl.java:handle(920)) - appattempt_1557237478804_0001_000001 > State change from NEW to SUBMITTED on event = START > 2019-05-07 15:58:02,746 INFO [SchedulerEventDispatcher:Event Processor] > scheduler.SchedulerApplicationAttempt > (SchedulerApplicationAttempt.java:<init>(207)) - *** In the constructor of > SchedulerApplicationAttempt > 2019-05-07 15:58:02,747 INFO [SchedulerEventDispatcher:Event Processor] > scheduler.SchedulerApplicationAttempt > (SchedulerApplicationAttempt.java:<init>(230)) - *** Contents of > appSchedulingInfo: [] > 2019-05-07 15:58:02,752 INFO [SchedulerEventDispatcher:Event Processor] > fair.FairScheduler (FairScheduler.java:addApplicationAttempt(546)) - Added > Application Attempt appattempt_1557237478804_0001_000001 to scheduler from > user: bacskop > 2019-05-07 15:58:02,756 INFO [RM Event dispatcher] > scheduler.AppSchedulingInfo > (AppSchedulingInfo.java:updatePendingResources(257)) - *** Adding scheduler > key: SchedulerRequestKey{priority=0, allocationRequestId=-1, > containerToUpdate=null} for attempt: appattempt_1557237478804_0001_000001 > 2019-05-07 15:58:02,759 INFO [RM Event dispatcher] attempt.RMAppAttemptImpl > (RMAppAttemptImpl.java:handle(920)) - appattempt_1557237478804_0001_000001 > State change from SUBMITTED to SCHEDULED on event = ATTEMPT_ADDED > 2019-05-07 15:58:02,892 INFO [main] impl.YarnClientImpl > (YarnClientImpl.java:submitApplication(310)) - Submitted application > application_1557237478804_0001 > {noformat} > (some extra lines are printed with ***). > So at 15:58:02,747 the set is empty and populated with a single element at > 15:58:02,756 on "RM Event dispatcher". This means there's a tiny time window > during which a {{NODE_UPDATE}} can cause a {{NoSuchElementException}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005) --------------------------------------------------------------------- To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org