[jira] [Commented] (YARN-10966) nodeUpdate will make NPE when node decomissioning trans to decomissed at same time

Wilfred Spiegelenburg (Jira) Wed, 20 Oct 2021 18:11:21 -0700


    [ 
https://issues.apache.org/jira/browse/YARN-10966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17432150#comment-17432150
 ]


Wilfred Spiegelenburg commented on YARN-10966:
----------------------------------------------

I can see the issue occurring at a slightly different point than where we saw 
it before. It is similar to what was seen in YARN-4677.

Instead of passing the RMNode into the method calls just to get the node ID, 
can we pass in only the node ID? It is the only part needed for further calls, 
and it can also be used for logging.
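
To illustrate the suggestion, here is a minimal sketch of a method that takes 
only the node ID and tolerates the node having been removed in the meantime. 
SketchScheduler, its map of tracked nodes, and the String-typed IDs are 
hypothetical stand-ins chosen for this example; they are not the real 
AbstractYarnScheduler, RMNode, or NodeId types, and the patch may well look 
different.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a scheduler; NOT the real Hadoop YARN API.
final class SketchScheduler {
    // nodeId -> host name. The entry is removed once the node is decommissioned.
    private final Map<String, String> trackedNodes = new ConcurrentHashMap<>();

    void addNode(String nodeId, String host) {
        trackedNodes.put(nodeId, host);
    }

    void removeNode(String nodeId) {
        trackedNodes.remove(nodeId);
    }

    // Takes only the node ID (assumption: the ID is all that later calls need).
    // The tracked node may already be gone, so the lookup result is checked
    // before use, and the ID alone is enough to build the log message.
    String containerLaunchedOnNode(String containerId, String nodeId) {
        String host = trackedNodes.get(nodeId);
        if (host == null) {
            return "Skipped " + containerId + ": node " + nodeId
                + " is no longer tracked";
        }
        return "Launched " + containerId + " on " + nodeId;
    }

    public static void main(String[] args) {
        SketchScheduler scheduler = new SketchScheduler();
        scheduler.addNode("host1:8041", "host1");
        System.out.println(
            scheduler.containerLaunchedOnNode("container_1", "host1:8041"));

        // Simulate the node being decommissioned before the next update.
        scheduler.removeNode("host1:8041");
        System.out.println(
            scheduler.containerLaunchedOnNode("container_1", "host1:8041"));
    }
}
```

Because the caller hands over the ID rather than an object that may be null, 
the removed-node case becomes an ordinary branch instead of an NPE.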

Can you also look at the other schedulers (Fifo and Fair)? The same test that 
you extended for the capacity scheduler also exists for those schedulers. We 
should not break them, and they should have the same tests.

> nodeUpdate will make NPE  when node decomissioning trans to decomissed at 
> same time
> -----------------------------------------------------------------------------------
>
>                 Key: YARN-10966
>                 URL: https://issues.apache.org/jira/browse/YARN-10966
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler, resourcemanager
>    Affects Versions: 3.1.1, 3.2.1, 3.3.1
>            Reporter: tuyu
>            Priority: Major
>             Fix For: 3.1.1, 3.2.1
>
>         Attachments: YARN-10966.001.patch
>
>
> [YARN-4677|https://issues.apache.org/jira/browse/YARN-4677] fixed a race 
> condition, but the fix is incomplete: an NPE is still thrown when 
> containerLaunchedOnNode calls node.getNodeID while the node is null.
> {code:java}
> java.lang.NullPointerException
>       at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.containerLaunchedOnNode(AbstractYarnScheduler.java:366)
>       at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.updateNewContainerInfo(AbstractYarnScheduler.java:1029)
>       at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.nodeUpdate(AbstractYarnScheduler.java:1130)
>       at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:1480)
>       at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1938)
>       at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:173)
>       at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler.testRemovedNodeDecomissioningNode
> {code}



