[ https://issues.apache.org/jira/browse/YARN-10966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17432150#comment-17432150 ]
Wilfred Spiegelenburg commented on YARN-10966:
----------------------------------------------

I can see the issue occurring at a slightly different point than where we had seen it before. It is similar to what was seen in YARN-4677.

Instead of adding the RMNode to the method calls just to get the node ID, can we not pass in only the node ID? It is the only part needed by the further calls, and it can also be used for logging.

Can you also look at the other schedulers (Fifo and Fair)? The test you extended for the capacity scheduler also exists for those schedulers. We should not break them, and they should get the same test changes.

> nodeUpdate will make NPE when node decomissioning trans to decomissed at same time
> -----------------------------------------------------------------------------------
>
>                 Key: YARN-10966
>                 URL: https://issues.apache.org/jira/browse/YARN-10966
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler, resourcemanager
>    Affects Versions: 3.1.1, 3.2.1, 3.3.1
>            Reporter: tuyu
>            Priority: Major
>             Fix For: 3.1.1, 3.2.1
>
>         Attachments: YARN-10966.001.patch
>
>
> [YARN-4677|https://issues.apache.org/jira/browse/YARN-4677] fixed the race condition, but not completely: an NPE is still thrown when containerLaunchedOnNode calls node.getNodeID() and the node is null.
> {code:java}
> java.lang.NullPointerException
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.containerLaunchedOnNode(AbstractYarnScheduler.java:366)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.updateNewContainerInfo(AbstractYarnScheduler.java:1029)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.nodeUpdate(AbstractYarnScheduler.java:1130)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:1480)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1938)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:173)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler.testRemovedNodeDecomissioningNode
> {code}

--
This message was sent by Atlassian Jira (v8.3.4#803005)
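Here is a minimal sketch of the suggestion to pass only the node ID. It assumes a simplified scheduler that keeps its nodes in a map keyed by node ID. The launch path receives just the ID and looks the node up itself. A node removed concurrently (for example, by a DECOMMISSIONING to DECOMMISSIONED transition) is then detected and logged rather than dereferenced as null. All names below (`SketchNodeId`, `SketchSchedulerNode`, `SketchScheduler`, `containerLaunchedOnNode(String, SketchNodeId)`) are hypothetical stand-ins. They are not the Hadoop YARN classes or signatures.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for a YARN node ID (host:port); not the Hadoop NodeId class. */
final class SketchNodeId {
    private final String host;
    private final int port;

    SketchNodeId(String host, int port) {
        this.host = host;
        this.port = port;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof SketchNodeId)) return false;
        SketchNodeId other = (SketchNodeId) o;
        return port == other.port && host.equals(other.host);
    }

    @Override
    public int hashCode() {
        return Objects.hash(host, port);
    }

    @Override
    public String toString() {
        return host + ":" + port;
    }
}

/** Hypothetical per-node bookkeeping kept by the scheduler. */
final class SketchSchedulerNode {
    private int launchedContainers;

    void containerLaunched() {
        launchedContainers++;
    }

    int launchedContainers() {
        return launchedContainers;
    }
}

/** Hypothetical scheduler whose launch path takes only the node ID. */
final class SketchScheduler {
    private final Map<SketchNodeId, SketchSchedulerNode> nodes = new ConcurrentHashMap<>();

    void addNode(SketchNodeId id) {
        nodes.put(id, new SketchSchedulerNode());
    }

    // Simulates the node being removed, e.g. after DECOMMISSIONING -> DECOMMISSIONED.
    void removeNode(SketchNodeId id) {
        nodes.remove(id);
    }

    Optional<SketchSchedulerNode> getNode(SketchNodeId id) {
        return Optional.ofNullable(nodes.get(id));
    }

    /**
     * Records a container launch on the given node. Returns false instead of
     * throwing when the node has already been removed; the ID alone is enough
     * to log which node was missing.
     */
    boolean containerLaunchedOnNode(String containerId, SketchNodeId nodeId) {
        SketchSchedulerNode node = nodes.get(nodeId);
        if (node == null) {
            System.err.println("Container " + containerId
                + " launched on node " + nodeId + " which is no longer known; skipping");
            return false;
        }
        node.containerLaunched();
        return true;
    }
}
```

In this shape the null check sits right next to the lookup. The caller never holds a node object that may disappear between the node update and the launch bookkeeping. What the real scheduler should do for a missing node is left to the patch.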