[ https://issues.apache.org/jira/browse/YARN-9552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16839356#comment-16839356 ]
Hadoop QA commented on YARN-9552:
---------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 12s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 10s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 31s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 3s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 82m 57s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 28s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}132m 46s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:bdbca0e |
| JIRA Issue | YARN-9552 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12968664/YARN-9552-001.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux a10208900547 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 6bcc1dc |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| findbugs | v3.1.0-RC1 |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/24088/testReport/ |
| Max. process+thread count | 884 (vs. ulimit of 10000) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/24088/console |
| Powered by | Apache Yetus 0.8.0 http://yetus.apache.org |

This message was automatically generated.

> FairScheduler: NODE_UPDATE can cause NoSuchElementException
> -----------------------------------------------------------
>
>                 Key: YARN-9552
>                 URL: https://issues.apache.org/jira/browse/YARN-9552
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: fairscheduler
>            Reporter: Peter Bacsko
>            Assignee: Peter Bacsko
>            Priority: Major
>         Attachments: YARN-9552-001.patch
>
>
> We observed a race condition inside YARN with the following stack trace:
> {noformat}
> 18/11/07 06:45:09.559 SchedulerEventDispatcher:Event Processor ERROR EventDispatcher: Error in handling event type NODE_UPDATE to the Event Dispatcher
> java.util.NoSuchElementException
>         at java.util.concurrent.ConcurrentSkipListMap.firstKey(ConcurrentSkipListMap.java:2036)
>         at java.util.concurrent.ConcurrentSkipListSet.first(ConcurrentSkipListSet.java:396)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.getNextPendingAsk(AppSchedulingInfo.java:373)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.isOverAMShareLimit(FSAppAttempt.java:941)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:1373)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.assignContainer(FSLeafQueue.java:353)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.assignContainer(FSParentQueue.java:204)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.attemptScheduling(FairScheduler.java:1094)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.nodeUpdate(FairScheduler.java:961)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1183)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:132)
>         at org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:66)
>         at java.lang.Thread.run(Thread.java:748)
> {noformat}
> This is basically the same exception as the one described in YARN-7382, but the root cause is different.
> When we create an application attempt, we create an {{FSAppAttempt}} object. This contains an {{AppSchedulingInfo}}, which contains a set of {{SchedulerRequestKey}}. Initially, this set is empty; it is only populated a bit later, on a separate thread, during a state transition:
> {noformat}
> 2019-05-07 15:58:02,659 INFO [RM StateStore dispatcher] recovery.RMStateStore (RMStateStore.java:transition(239)) - Storing info for app: application_1557237478804_0001
> 2019-05-07 15:58:02,684 INFO [RM Event dispatcher] rmapp.RMAppImpl (RMAppImpl.java:handle(903)) - application_1557237478804_0001 State change from NEW_SAVING to SUBMITTED on event = APP_NEW_SAVED
> 2019-05-07 15:58:02,690 INFO [SchedulerEventDispatcher:Event Processor] fair.FairScheduler (FairScheduler.java:addApplication(490)) - Accepted application application_1557237478804_0001 from user: bacskop, in queue: root.bacskop, currently num of applications: 1
> 2019-05-07 15:58:02,698 INFO [RM Event dispatcher] rmapp.RMAppImpl (RMAppImpl.java:handle(903)) - application_1557237478804_0001 State change from SUBMITTED to ACCEPTED on event = APP_ACCEPTED
> 2019-05-07 15:58:02,731 INFO [RM Event dispatcher] resourcemanager.ApplicationMasterService (ApplicationMasterService.java:registerAppAttempt(434)) - Registering app attempt : appattempt_1557237478804_0001_000001
> 2019-05-07 15:58:02,732 INFO [RM Event dispatcher] attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(920)) - appattempt_1557237478804_0001_000001 State change from NEW to SUBMITTED on event = START
> 2019-05-07 15:58:02,746 INFO [SchedulerEventDispatcher:Event Processor] scheduler.SchedulerApplicationAttempt (SchedulerApplicationAttempt.java:<init>(207)) - *** In the constructor of SchedulerApplicationAttempt
> 2019-05-07 15:58:02,747 INFO [SchedulerEventDispatcher:Event Processor] scheduler.SchedulerApplicationAttempt (SchedulerApplicationAttempt.java:<init>(230)) - *** Contents of appSchedulingInfo: []
> 2019-05-07 15:58:02,752 INFO [SchedulerEventDispatcher:Event Processor] fair.FairScheduler (FairScheduler.java:addApplicationAttempt(546)) - Added Application Attempt appattempt_1557237478804_0001_000001 to scheduler from user: bacskop
> 2019-05-07 15:58:02,756 INFO [RM Event dispatcher] scheduler.AppSchedulingInfo (AppSchedulingInfo.java:updatePendingResources(257)) - *** Adding scheduler key: SchedulerRequestKey{priority=0, allocationRequestId=-1, containerToUpdate=null} for attempt: appattempt_1557237478804_0001_000001
> 2019-05-07 15:58:02,759 INFO [RM Event dispatcher] attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(920)) - appattempt_1557237478804_0001_000001 State change from SUBMITTED to SCHEDULED on event = ATTEMPT_ADDED
> 2019-05-07 15:58:02,892 INFO [main] impl.YarnClientImpl (YarnClientImpl.java:submitApplication(310)) - Submitted application application_1557237478804_0001
> {noformat}
> (The lines marked with *** are extra log statements.)
> So at 15:58:02,747 the set is empty, and it is populated with a single element at 15:58:02,756 on the "RM Event dispatcher" thread. This means there's a tiny time window during which a {{NODE_UPDATE}} can cause a {{NoSuchElementException}}.
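
For readers unfamiliar with the failure mode, here is a minimal standalone sketch of what the stack trace shows: calling first() on an empty ConcurrentSkipListSet throws NoSuchElementException. The class name EmptyKeySetSketch, the helper firstOrNull, and the use of Integer in place of SchedulerRequestKey are all hypothetical and chosen for illustration only. This is not taken from YARN-9552-001.patch and may differ from the actual fix.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.concurrent.ConcurrentSkipListSet;

public class EmptyKeySetSketch {

    // Hypothetical helper: returns the smallest key, or null if the set is
    // (still) empty. It takes a single iterator instead of calling isEmpty()
    // followed by first(), so there is no check-then-act gap between the
    // emptiness test and the read.
    static <T> T firstOrNull(ConcurrentSkipListSet<T> keys) {
        Iterator<T> it = keys.iterator();
        return it.hasNext() ? it.next() : null;
    }

    public static void main(String[] args) {
        // Stand-in for the scheduler key set right after the attempt object
        // has been constructed: nothing has been added yet.
        ConcurrentSkipListSet<Integer> schedulerKeys = new ConcurrentSkipListSet<>();

        boolean threw = false;
        try {
            schedulerKeys.first(); // same call as in the stack trace
        } catch (NoSuchElementException e) {
            threw = true;
        }
        System.out.println("first() on empty set threw: " + threw);
        System.out.println("firstOrNull on empty set: " + firstOrNull(schedulerKeys));

        // Stand-in for the later population on the other thread.
        schedulerKeys.add(0);
        System.out.println("firstOrNull after add: " + firstOrNull(schedulerKeys));
    }
}
```

A caller using a helper like this would still have to decide what an empty key set means for the AM share check; the sketch only shows how to avoid the exception.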