[ https://issues.apache.org/jira/browse/YARN-4386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15046954#comment-15046954 ]
Sunil G commented on YARN-4386:
-------------------------------

Hi [~kshukla]

Sorry for the late reply.

bq. Unless there are 2 refreshNodes done in parallel such that the first deactivateNodeTransition has not finished and the other refreshNodes is also trying to do the same transition

Since the transitions happen under the write lock, this should not happen.

I have one suggestion here. You could mark a node for GRACEFUL DECOMMISSION and ensure that the node is in DECOMMISSIONING state (you can try firing an event to RMNodeImpl directly to do this). Then invoke {{refreshNodesGracefully}} and verify that a RECOMMISSION event is raised to the dispatcher. Similarly, mark a node as DECOMMISSIONED, invoke {{refreshNodesGracefully}}, and verify that the RECOMMISSION event is *NOT* raised. In the second case it will not enter the *for* loop, but I feel this will clearly cover our case here, though it is not direct. Please correct me if I am wrong.

> refreshNodesGracefully() looks at active RMNode list for recommissioning
> decommissioned nodes
> ---------------------------------------------------------------------------------------------
>
>                 Key: YARN-4386
>                 URL: https://issues.apache.org/jira/browse/YARN-4386
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: graceful
>    Affects Versions: 3.0.0
>            Reporter: Kuhu Shukla
>            Assignee: Kuhu Shukla
>            Priority: Minor
>         Attachments: YARN-4386-v1.patch
>
>
> In refreshNodesGracefully(), during recommissioning, the entry set from
> getRMNodes(), which has only active nodes (RUNNING, DECOMMISSIONING etc.), is
> used for checking 'decommissioned' nodes, which are present only in the
> getInactiveRMNodes() map.
> {code}
> for (Entry<NodeId, RMNode> entry:rmContext.getRMNodes().entrySet()) {
> .........................
>   // Recommissioning the nodes
>   if (entry.getValue().getState() == NodeState.DECOMMISSIONING
>       || entry.getValue().getState() == NodeState.DECOMMISSIONED) {
>     this.rmContext.getDispatcher().getEventHandler()
>         .handle(new RMNodeEvent(nodeId, RMNodeEventType.RECOMMISSION));
>   }
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
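
The two checks suggested in the comment above can be illustrated with a minimal, self-contained sketch. It does not use the real Hadoop classes. {{RecommissionSketch}}, its two maps, the string node ids and the event list are hypothetical stand-ins for the RM context, {{getRMNodes()}}, {{getInactiveRMNodes()}} and the dispatcher. The elided part of the loop from the issue description is left out, so this only models the recommission branch and should not be read as the actual patch or test.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RecommissionSketch {
    // Simplified stand-in for YARN's NodeState (only the states used here).
    public enum NodeState { RUNNING, DECOMMISSIONING, DECOMMISSIONED }

    // Hypothetical stand-in for rmContext.getRMNodes(): active nodes only.
    public final Map<String, NodeState> activeNodes = new HashMap<>();
    // Hypothetical stand-in for rmContext.getInactiveRMNodes().
    public final Map<String, NodeState> inactiveNodes = new HashMap<>();
    // Records node ids for which a RECOMMISSION event would be dispatched.
    public final List<String> recommissionEvents = new ArrayList<>();

    // Mirrors the quoted loop: it walks the active map only, so a node that
    // lives solely in the inactive map is never visited.
    public void refreshNodesGracefully() {
        for (Map.Entry<String, NodeState> entry : activeNodes.entrySet()) {
            NodeState state = entry.getValue();
            if (state == NodeState.DECOMMISSIONING
                    || state == NodeState.DECOMMISSIONED) {
                recommissionEvents.add(entry.getKey());
            }
        }
    }

    public static void main(String[] args) {
        // Case 1: a node in DECOMMISSIONING state is still in the active map.
        RecommissionSketch first = new RecommissionSketch();
        first.activeNodes.put("host1:1234", NodeState.DECOMMISSIONING);
        first.refreshNodesGracefully();
        System.out.println("DECOMMISSIONING events: " + first.recommissionEvents);

        // Case 2: a DECOMMISSIONED node is present only in the inactive map.
        RecommissionSketch second = new RecommissionSketch();
        second.inactiveNodes.put("host2:1234", NodeState.DECOMMISSIONED);
        second.refreshNodesGracefully();
        System.out.println("DECOMMISSIONED events: " + second.recommissionEvents);
    }
}
```

In a real test the maps would be replaced by the RM context and the event list by a mocked or recording dispatcher. The second case never enters the loop, which is the indirect coverage described in the comment.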