[ https://issues.apache.org/jira/browse/YARN-6483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16256045#comment-16256045 ]
ASF GitHub Bot commented on YARN-6483:
--------------------------------------

Github user xslogic commented on a diff in the pull request:

    https://github.com/apache/hadoop/pull/289#discussion_r151552105

    --- Diff: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java ---
    @@ -1160,6 +1160,11 @@ public void transition(RMNodeImpl rmNode, RMNodeEvent event) {
           // Update NM metrics during graceful decommissioning.
           rmNode.updateMetricsForGracefulDecommission(initState, finalState);
           rmNode.decommissioningTimeout = timeout;
    +      // Notify NodesListManager to notify all RMApps so that each
    +      // Application Master can take any required actions.
    +      rmNode.context.getDispatcher().getEventHandler().handle(
    +          new NodesListManagerEvent(
    +              NodesListManagerEventType.NODE_USABLE, rmNode));
    --- End diff --

    I feel we should make the intention explicit: a separate event type would make the code cleaner and easier to follow than overloading NODE_USABLE (see the sketch below). It could also be that the assumption in the test case is wrong (I will have to double-check), in which case it is perfectly alright to modify the test case to expect the new event.

> Add nodes transitioning to DECOMMISSIONING state to the list of updated nodes
> returned by the Resource Manager as a response to the Application Master
> heartbeat
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-6483
>                 URL: https://issues.apache.org/jira/browse/YARN-6483
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: resourcemanager
>    Affects Versions: 2.8.0
>            Reporter: Juan Rodríguez Hortalá
>         Attachments: YARN-6483-v1.patch
>
> The DECOMMISSIONING node state is currently used as part of the graceful
> decommissioning mechanism to give tasks time to complete on a node that is
> scheduled for decommission, and to give reducer tasks time to read the
> shuffle blocks on that node. YARN also effectively blacklists nodes in the
> DECOMMISSIONING state by assigning them a capacity of 0, preventing
> additional containers from being launched on those nodes, so no more shuffle
> blocks are written to them. This blacklisting is not effective for
> applications like Spark, because a Spark executor running in a YARN container
> keeps receiving tasks after the corresponding node has been blacklisted at
> the YARN level. We would like to propose a modification of the YARN heartbeat
> mechanism so that nodes transitioning to DECOMMISSIONING are added to the
> list of updated nodes returned by the Resource Manager in its response to the
> Application Master heartbeat. This way a Spark application master would be
> able to blacklist a DECOMMISSIONING node at the Spark level.
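A minimal sketch of the separate-event-type suggestion from the review comment above. The NODE_DECOMMISSIONING value is hypothetical (it is not in the patch under review, which reuses NODE_USABLE), so this is illustrative rather than the committed change:

    // Hypothetical sketch: a dedicated event type instead of overloading
    // NODE_USABLE, per the review comment above. NODE_DECOMMISSIONING is
    // an assumed new value, not an existing constant.
    public enum NodesListManagerEventType {
      NODE_USABLE,
      NODE_UNUSABLE,
      NODE_DECOMMISSIONING  // hypothetical type for graceful decommission
    }

    // The DECOMMISSIONING transition in RMNodeImpl would then dispatch
    // the explicit type instead of NODE_USABLE:
    rmNode.context.getDispatcher().getEventHandler().handle(
        new NodesListManagerEvent(
            NodesListManagerEventType.NODE_DECOMMISSIONING, rmNode));

With a dedicated type, NodesListManager could handle decommissioning notifications on their own path when informing running applications, leaving the existing "node became usable" handling, and any tests that assert on it, untouched.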
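For the AM side of the proposal in the description, a hedged sketch of how an application master could consume the updated-node list once the RM includes DECOMMISSIONING nodes in it. AllocateResponse.getUpdatedNodes(), NodeReport.getNodeState(), and NodeState.DECOMMISSIONING are existing YARN APIs; onHeartbeat and blacklistNode are hypothetical application-level names used here for illustration:

    import java.util.List;
    import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
    import org.apache.hadoop.yarn.api.records.NodeReport;
    import org.apache.hadoop.yarn.api.records.NodeState;

    // Called for each AM heartbeat response; assumes the RM now reports
    // DECOMMISSIONING nodes in the updated-node list, as this issue proposes.
    void onHeartbeat(AllocateResponse response) {
      List<NodeReport> updatedNodes = response.getUpdatedNodes();
      for (NodeReport node : updatedNodes) {
        if (node.getNodeState() == NodeState.DECOMMISSIONING) {
          // Stop scheduling new work on this host (hypothetical helper;
          // e.g. a Spark AM would blacklist the host so running executors
          // there receive no further tasks).
          blacklistNode(node.getNodeId().getHost());
        }
      }
    }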