[ https://issues.apache.org/jira/browse/YARN-6483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16277895#comment-16277895 ]
Robert Kanter commented on YARN-6483:
-------------------------------------

[~asuresh], did you mean to commit this to branch-3.0? The fix version for this JIRA says 3.1.0. Also, the {{TestResourceTrackerService#testGracefulDecommissionDefaultTimeoutResolution}} added here relies on an XML excludes file, which is currently only supported in trunk (YARN-7162), so it fails when run in branch-3.0 because each line of the XML file is read as a separate host (e.g. {{<?xml}}, {{<name>host1</name>}}, etc.):

{noformat}
Running org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService
Tests run: 35, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 52.706 sec <<< FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService
testGracefulDecommissionDefaultTimeoutResolution(org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService)  Time elapsed: 23.913 sec  <<< FAILURE!
java.lang.AssertionError: Node state is not correct (timedout) expected:<DECOMMISSIONING> but was:<RUNNING>
	at org.junit.Assert.fail(Assert.java:88)
	at org.junit.Assert.failNotEquals(Assert.java:743)
	at org.junit.Assert.assertEquals(Assert.java:118)
	at org.apache.hadoop.yarn.server.resourcemanager.MockRM.waitForState(MockRM.java:908)
	at org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService.testGracefulDecommissionDefaultTimeoutResolution(TestResourceTrackerService.java:345)
{noformat}

> Add nodes transitioning to DECOMMISSIONING state to the list of updated nodes
> returned to the AM
> ------------------------------------------------------------------------------------------------
>
>                 Key: YARN-6483
>                 URL: https://issues.apache.org/jira/browse/YARN-6483
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: resourcemanager
>            Reporter: Juan Rodríguez Hortalá
>            Assignee: Juan Rodríguez Hortalá
>             Fix For: 3.1.0
>
>         Attachments: YARN-6483-v1.patch, YARN-6483.002.patch,
>                      YARN-6483.003.patch
>
>
> The DECOMMISSIONING node state is currently
> used as part of the graceful decommissioning mechanism to give time for tasks
> to complete on a node that is scheduled for decommission, and for reducer
> tasks to read the shuffle blocks on that node. YARN also effectively
> blacklists nodes in DECOMMISSIONING state by assigning them a capacity of 0,
> to prevent additional containers from being launched on those nodes, so no
> more shuffle blocks are written to the node. This blacklisting is not
> effective for applications like Spark, because a Spark executor running in a
> YARN container will keep receiving more tasks after the corresponding node
> has been blacklisted at the YARN level. We would like to propose a
> modification of the YARN heartbeat mechanism so that nodes transitioning to
> DECOMMISSIONING are added to the list of updated nodes returned by the
> Resource Manager in response to the Application Master heartbeat. This way a
> Spark application master would be able to blacklist a DECOMMISSIONING node
> at the Spark level.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
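The AM-side use of the proposal above (react to nodes that show up in the updated-nodes list with a DECOMMISSIONING state) can be sketched as follows. This is a minimal, self-contained model, not actual YARN API code: the {{NodeReport}}, {{NodeState}}, and {{hostsToBlacklist}} names below are simplified stand-ins for the real {{org.apache.hadoop.yarn.api.records}} types and for whatever blacklist handling a Spark application master would actually perform.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified stand-in for YARN's NodeState; only the states relevant here.
enum NodeState { RUNNING, DECOMMISSIONING, DECOMMISSIONED }

// Simplified stand-in for the node reports an AM would receive in the
// updated-nodes list of an allocate (heartbeat) response.
class NodeReport {
    final String host;
    final NodeState state;
    NodeReport(String host, NodeState state) {
        this.host = host;
        this.state = state;
    }
}

public class DecommissioningFilter {
    // Collect hosts the AM should stop scheduling on. With this JIRA, nodes
    // entering DECOMMISSIONING appear in the updated-nodes list, so the AM
    // can blacklist them before they reach DECOMMISSIONED.
    static List<String> hostsToBlacklist(List<NodeReport> updatedNodes) {
        List<String> result = new ArrayList<>();
        for (NodeReport report : updatedNodes) {
            if (report.state == NodeState.DECOMMISSIONING) {
                result.add(report.host);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<NodeReport> updated = Arrays.asList(
            new NodeReport("host1", NodeState.DECOMMISSIONING),
            new NodeReport("host2", NodeState.RUNNING));
        System.out.println(hostsToBlacklist(updated)); // prints [host1]
    }
}
```

In a real Spark AM the returned hosts would be fed into the task scheduler's blacklist so no new tasks are dispatched to executors on those nodes, mirroring the capacity-0 blacklisting YARN already does internally.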