[ 
https://issues.apache.org/jira/browse/YARN-6483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16258890#comment-16258890
 ] 

Arun Suresh edited comment on YARN-6483 at 11/20/17 7:33 AM:
-------------------------------------------------------------

Thanks for working on this [~juanrh],

Apart from the checkstyle and findbugs warnings above, which need to be 
fixed, a couple of comments:
* The NodeReport class is unfortunately marked as Public and Stable. This means 
that clients built against an older version should not have to be re-compiled 
when they use the newer version of the class. With this patch that is no 
longer the case, since the new setter and getter are abstract. As a 
workaround, what we do in such situations is have default/no-op 
implementations in the base class (NodeReport here) and override them in the 
PBImpl class.
* Given that we are now taking the trouble to notify the AM of 
decommissioned / decommissioning nodes, maybe we should include the update 
type as well in the NodeReport?
* Looks like the only change in {{RMNodeDecommissioningEvent}} is from Integer 
to int. Can we revert this?
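
To illustrate the workaround from the first point, here is a minimal sketch. It does not use the real NodeReport or its PBImpl; the class names and the timeout accessor below are hypothetical stand-ins, chosen only to show that non-abstract defaults in the base class keep older subclasses compiling.

```java
// Stand-in for a Public/Stable abstract API class such as NodeReport.
// All names in this sketch are hypothetical, not taken from the patch.
abstract class ReportBase {
    // Pre-existing abstract accessor: every subclass already implements it.
    public abstract String getNodeId();

    // Newly added accessors are NOT abstract. They get harmless defaults so
    // that subclasses written against the older API still compile and load.
    public Integer getDecommissioningTimeout() {
        return null; // default: no value available
    }

    public void setDecommissioningTimeout(Integer timeout) {
        // default: no-op
    }
}

// Stand-in for the protobuf-backed implementation (the "PBImpl" class),
// which overrides the defaults with real behavior.
class ReportPBImplLike extends ReportBase {
    private final String nodeId;
    private Integer timeout;

    ReportPBImplLike(String nodeId) {
        this.nodeId = nodeId;
    }

    @Override
    public String getNodeId() {
        return nodeId;
    }

    @Override
    public Integer getDecommissioningTimeout() {
        return timeout;
    }

    @Override
    public void setDecommissioningTimeout(Integer timeout) {
        this.timeout = timeout;
    }
}

// A subclass written before the new accessors existed. It only implements
// the old abstract method and still compiles against the new base class.
class LegacyReport extends ReportBase {
    @Override
    public String getNodeId() {
        return "legacy-node";
    }
}
```

Had the two new accessors been declared abstract, LegacyReport would no longer compile, which is the compatibility problem described above.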


was (Author: asuresh):
Thanks for working on this [~juanrh],

Apart from the checkstyle and findbugs warnings above, which need to be 
fixed, a couple of comments:
* The NodeReport is unfortunately marked as Public and Stable. This means that 
older versions of the client should not have to be re-compiled even if they use 
the newer version of the class - which will happen now, since the new setter 
and getter are abstract. As a workaround, what we do now is have default/no-op 
implementations in the NodeReport class and override them in the PBImpl class.
* Given that we are now taking the trouble to notify the AM of 
decommissioned / decommissioning nodes, maybe we should include the update 
type as well in the NodeReport?
* Looks like the only change in {{RMNodeDecommissioningEvent}} is from Integer 
to int. Can we revert this?

> Add nodes transitioning to DECOMMISSIONING state to the list of updated nodes 
> returned by the Resource Manager as a response to the Application Master 
> heartbeat
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-6483
>                 URL: https://issues.apache.org/jira/browse/YARN-6483
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: resourcemanager
>    Affects Versions: 3.1.0
>            Reporter: Juan Rodríguez Hortalá
>            Assignee: Juan Rodríguez Hortalá
>         Attachments: YARN-6483-v1.patch, YARN-6483.002.patch
>
>
> The DECOMMISSIONING node state is currently used as part of the graceful 
> decommissioning mechanism to give time for tasks to complete in a node that 
> is scheduled for decommission, and for reducer tasks to read the shuffle 
> blocks in that node. Also, YARN effectively blacklists nodes in 
> DECOMMISSIONING state by assigning them a capacity of 0, to prevent 
> additional containers from being launched on those nodes, so no more shuffle 
> blocks are written to the node. This blacklisting is not effective for 
> applications like Spark, because a Spark executor running in a YARN container 
> will keep receiving more tasks after the corresponding node has been 
> blacklisted at the YARN level. We would like to propose a modification of the 
> YARN heartbeat mechanism so nodes transitioning to DECOMMISSIONING are added 
> to the list of updated nodes returned by the Resource Manager as a response 
> to the Application Master heartbeat. This way a Spark application master 
> would be able to blacklist a DECOMMISSIONING node at the Spark level.
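
As a rough sketch of how an application master might consume such a list of updated nodes, the example below keeps an application-level blacklist. It uses simplified stand-in types rather than the real YARN classes, and every name in it (SketchNodeState, SketchNodeReport, AppLevelBlacklist) is hypothetical. Removing a host again once it reports RUNNING is an assumption of this sketch, not something the issue specifies.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified stand-in for the node states relevant to this issue.
enum SketchNodeState { RUNNING, DECOMMISSIONING, DECOMMISSIONED }

// Simplified stand-in for one entry in the list of updated nodes.
final class SketchNodeReport {
    final String host;
    final SketchNodeState state;

    SketchNodeReport(String host, SketchNodeState state) {
        this.host = host;
        this.state = state;
    }
}

// Hypothetical application-level blacklist kept by an application master.
final class AppLevelBlacklist {
    private final Set<String> hosts = new HashSet<>();

    // Intended to be called with the updated-nodes list from each heartbeat
    // response.
    void onUpdatedNodes(List<SketchNodeReport> updated) {
        for (SketchNodeReport report : updated) {
            if (report.state == SketchNodeState.DECOMMISSIONING
                    || report.state == SketchNodeState.DECOMMISSIONED) {
                // Stop scheduling new tasks on this host.
                hosts.add(report.host);
            } else if (report.state == SketchNodeState.RUNNING) {
                // Assumption: a node reported as RUNNING is usable again.
                hosts.remove(report.host);
            }
        }
    }

    boolean isBlacklisted(String host) {
        return hosts.contains(host);
    }
}
```

With this shape, the scheduler inside the application only needs to consult isBlacklisted(host) before assigning a task to an executor on that host.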



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
