[jira] [Commented] (YARN-6483) Add nodes transitioning to DECOMMISSIONING state to the list of updated nodes returned by the Resource Manager as a response to the Application Master heartbeat

2017-11-22 Thread JIRA

[ 
https://issues.apache.org/jira/browse/YARN-6483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263334#comment-16263334
 ] 

Juan Rodríguez Hortalá commented on YARN-6483:
--

Thanks again [~asuresh]. I have tried to reproduce the 
TestNodeLabelContainerAllocation failure in my local environment, and the test 
fails only intermittently. I have found 
[YARN-7507|https://issues.apache.org/jira/browse/YARN-7507], which is still 
unresolved and reports that the same test methods 
testPreferenceOfNeedyPrioritiesUnderSameAppTowardsNodePartitions, 
testPreferenceOfQueuesTowardsNodePartitions, testQueueMetricsWithLabels and 
testQueueMetricsWithLabelsOnDefaultLabelNode of 
TestNodeLabelContainerAllocation are failing in trunk. So it looks like that 
test failure is not related to this patch.

> Add nodes transitioning to DECOMMISSIONING state to the list of updated nodes 
> returned by the Resource Manager as a response to the Application Master 
> heartbeat
> 
>
> Key: YARN-6483
> URL: https://issues.apache.org/jira/browse/YARN-6483
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 3.1.0
>Reporter: Juan Rodríguez Hortalá
>Assignee: Juan Rodríguez Hortalá
> Attachments: YARN-6483-v1.patch, YARN-6483.002.patch, 
> YARN-6483.003.patch
>
>
> The DECOMMISSIONING node state is currently used as part of the graceful 
> decommissioning mechanism to give time for tasks to complete in a node that 
> is scheduled for decommission, and for reducer tasks to read the shuffle 
> blocks in that node. Also, YARN effectively blacklists nodes in 
> DECOMMISSIONING state by assigning them a capacity of 0, to prevent 
> additional containers from being launched in those nodes, so no more shuffle 
> blocks are written to the node. This blacklisting is not effective for 
> applications like Spark, because a Spark executor running in a YARN container 
> will keep receiving more tasks after the corresponding node has been 
> blacklisted at the YARN level. We would like to propose a modification of the 
> YARN heartbeat mechanism so that nodes transitioning to DECOMMISSIONING are 
> added to the list of updated nodes returned by the Resource Manager as a 
> response to the Application Master heartbeat. This way a Spark application 
> master would be able to blacklist a DECOMMISSIONING node at the Spark level.
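As a sketch of the consumer side of this proposal (illustrative only, not part 
of the patch): an application master could react to these reports in its 
heartbeat handling roughly as follows. {{AllocateResponse#getUpdatedNodes}}, 
{{NodeState#DECOMMISSIONING}} and {{AMRMClient#updateBlacklist}} are existing 
YARN APIs; the surrounding class is hypothetical.

{code:java}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
import org.apache.hadoop.yarn.api.records.NodeReport;
import org.apache.hadoop.yarn.api.records.NodeState;
import org.apache.hadoop.yarn.client.api.AMRMClient;

// Hypothetical AM-side helper: once DECOMMISSIONING nodes are included in
// the updated-nodes list, the AM can stop scheduling work on those hosts.
public class DecommissioningNodeTracker {

  public void onHeartbeat(AllocateResponse response,
      AMRMClient<AMRMClient.ContainerRequest> amRmClient) {
    List<String> draining = new ArrayList<>();
    for (NodeReport report : response.getUpdatedNodes()) {
      // With the proposed change, a node entering DECOMMISSIONING shows up
      // here instead of being silently zeroed out at the RM.
      if (report.getNodeState() == NodeState.DECOMMISSIONING) {
        draining.add(report.getNodeId().getHost());
      }
    }
    if (!draining.isEmpty()) {
      // Ask the RM not to allocate further containers on draining hosts;
      // a Spark AM would additionally blacklist them at the Spark level.
      amRmClient.updateBlacklist(draining, Collections.<String>emptyList());
    }
  }
}
{code}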



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6483) Add nodes transitioning to DECOMMISSIONING state to the list of updated nodes returned by the Resource Manager as a response to the Application Master heartbeat

2017-11-22 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16262983#comment-16262983
 ] 

Arun Suresh commented on YARN-6483:
---

Thanks [~juanrh]... The patch LGTM - I like the testcases as well.
The findbugs warning is extant and I believe is being tracked in another JIRA. 
I can fix the remaining checkstyle issues when committing.
Can you verify whether the lone test case failure is related to the patch?
+1 pending the above.




[jira] [Commented] (YARN-6483) Add nodes transitioning to DECOMMISSIONING state to the list of updated nodes returned by the Resource Manager as a response to the Application Master heartbeat

2017-11-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16262039#comment-16262039
 ] 

Hadoop QA commented on YARN-6483:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 46s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 10 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 20s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 5s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 56s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 10s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api in trunk has 1 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 42s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 6m 18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 18s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 4s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch generated 5 new + 763 unchanged - 20 fixed = 768 total (was 783) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 8s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 7 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 11s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 36s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 41s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 15s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 11s{color} | {color:green} hadoop-yarn-server-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 60m 48s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 22m 7s{color} | {color:red} hadoop-yarn-client in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 29s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}169m 7s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 

[jira] [Commented] (YARN-6483) Add nodes transitioning to DECOMMISSIONING state to the list of updated nodes returned by the Resource Manager as a response to the Application Master heartbeat

2017-11-21 Thread JIRA

[ 
https://issues.apache.org/jira/browse/YARN-6483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261927#comment-16261927
 ] 

Juan Rodríguez Hortalá commented on YARN-6483:
--

Hi [~asuresh],

I have added a new patch, and also updated the pull request for easier 
visualization. I have:
  - provided default implementations for the new methods of NodeReport
  - reverted the changes to RMNodeDecommissioningEvent so it uses an Integer again
  - defined a new enum `org.apache.hadoop.yarn.api.records.NodeUpdateType` that 
is used as a new optional field of NodeReport; it is only set to a non-null 
value for NodeReports corresponding to node transitions, and it is null for 
node reports not associated with node transitions, such as those requested by 
ClientRMService. I have added assertions for the former case to 
TestAMRMRPCNodeUpdates, and for the latter to TestClientRMService.

Thanks again for taking a look!
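For illustration, a sketch of the shape described above; the enum values 
listed here are assumptions modeled on the existing 
{{NodesListManagerEventType}} values, not a quote from the patch.

{code:java}
// Assumed shape of the new public record type; only node reports produced
// by a node state transition carry a non-null value of this enum, while
// reports served by ClientRMService leave it null.
package org.apache.hadoop.yarn.api.records;

public enum NodeUpdateType {
  NODE_USABLE,
  NODE_UNUSABLE,
  NODE_DECOMMISSIONING
}
{code}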




[jira] [Commented] (YARN-6483) Add nodes transitioning to DECOMMISSIONING state to the list of updated nodes returned by the Resource Manager as a response to the Application Master heartbeat

2017-11-20 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259749#comment-16259749
 ] 

Arun Suresh commented on YARN-6483:
---

bq. by "update type" you mean adding an additional field with type 
`RMAppNodeUpdateType` to `NodeReport`?
Yup - but not necessarily the 'RMAppNodeUpdateType' object itself. We should 
add another public-facing enum in the proto file that maps to the values of 
the RMAppNodeUpdateType enum, and add that as a field in the NodeReport.
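A hedged sketch of the PBImpl-side conversion this implies, following the 
convention other YARN enums use; {{NodeUpdateTypeProto}} and the 
{{NODE_UPDATE_TYPE_}} prefix are assumed names, not quoted from the patch.

{code:java}
import org.apache.hadoop.yarn.api.records.NodeUpdateType;
import org.apache.hadoop.yarn.proto.YarnProtos.NodeUpdateTypeProto;

// Hypothetical converters between the public-facing record enum and the
// proto enum; names here are assumptions for illustration.
final class NodeUpdateTypeConverter {
  private static final String PREFIX = "NODE_UPDATE_TYPE_";

  static NodeUpdateTypeProto convertToProtoFormat(NodeUpdateType type) {
    return NodeUpdateTypeProto.valueOf(PREFIX + type.name());
  }

  static NodeUpdateType convertFromProtoFormat(NodeUpdateTypeProto proto) {
    return NodeUpdateType.valueOf(proto.name().substring(PREFIX.length()));
  }
}
{code}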




[jira] [Commented] (YARN-6483) Add nodes transitioning to DECOMMISSIONING state to the list of updated nodes returned by the Resource Manager as a response to the Application Master heartbeat

2017-11-20 Thread JIRA

[ 
https://issues.apache.org/jira/browse/YARN-6483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259564#comment-16259564
 ] 

Juan Rodríguez Hortalá commented on YARN-6483:
--

Hi [~asuresh], 

Thanks a lot for all your comments, I'll send a new patch addressing all your 
suggestions. I just need a small clarification: by "update type" do you mean 
adding an additional field of type `RMAppNodeUpdateType` to `NodeReport`?

Thanks, 

Juan




[jira] [Commented] (YARN-6483) Add nodes transitioning to DECOMMISSIONING state to the list of updated nodes returned by the Resource Manager as a response to the Application Master heartbeat

2017-11-19 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258890#comment-16258890
 ] 

Arun Suresh commented on YARN-6483:
---

Thanks for working on this [~juanrh],

Apart from the above checkstyle and findbugs warnings that need to be 
fixed, a couple of comments:
* The NodeReport is unfortunately marked as Public and Stable. This means that 
older versions of the client should not have to be re-compiled even if they use 
the newer version of the class - which will happen now, since the new setter 
and getter are abstract. As a workaround, what we do in such cases is provide 
default/no-op implementations in the NodeReport class and override them in the 
PBImpl class; see the sketch after this list.
* Given that we are now taking the trouble of notifying the AM of 
decommissioned/decommissioning nodes, maybe we should include the update type 
in the NodeReport as well?
* It looks like the only change in {{RMNodeDecommissioningEvent}} is from 
Integer to int. Can we revert this?
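The workaround in the first bullet, sketched for the new timeout accessors (a 
minimal sketch: the real PBImpl reads and writes the proto builder, so the 
plain field below is only a stand-in):

{code:java}
// Sketch of the compatibility pattern: the Public/Stable abstract class
// gains concrete no-op accessors so existing subclasses still link and
// compile, and only the protobuf-backed implementation overrides them.
public abstract class NodeReport {

  public Integer getDecommissioningTimeout() {
    return null; // default for implementations that never set it
  }

  public void setDecommissioningTimeout(Integer decommissioningTimeout) {
    // no-op by default; overridden in NodeReportPBImpl
  }
}

class NodeReportPBImpl extends NodeReport {
  private Integer decommissioningTimeout; // proto builder in the real impl

  @Override
  public Integer getDecommissioningTimeout() {
    return decommissioningTimeout;
  }

  @Override
  public void setDecommissioningTimeout(Integer decommissioningTimeout) {
    this.decommissioningTimeout = decommissioningTimeout;
  }
}
{code}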




[jira] [Commented] (YARN-6483) Add nodes transitioning to DECOMMISSIONING state to the list of updated nodes returned by the Resource Manager as a response to the Application Master heartbeat

2017-11-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258614#comment-16258614
 ] 

Hadoop QA commented on YARN-6483:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 15m 47s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 8 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 57s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 8s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 22s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 6s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 11s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 11s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api in trunk has 1 extant Findbugs warnings. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 7s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager in trunk has 1 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 45s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 11s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 6m 17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 17s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 4s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch generated 33 new + 606 unchanged - 3 fixed = 639 total (was 609) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 9s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 1s{color} | {color:red} The patch has 32 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 6s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 15s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager generated 1 new + 1 unchanged - 0 fixed = 2 total (was 1) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 29s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 38s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 2m 53s{color} | {color:red} hadoop-yarn-common in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 4s{color} | {color:green} hadoop-yarn-server-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 60m 14s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 20m 58s{color} | {color:green} hadoop-yarn-client in the

[jira] [Commented] (YARN-6483) Add nodes transitioning to DECOMMISSIONING state to the list of updated nodes returned by the Resource Manager as a response to the Application Master heartbeat

2017-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257452#comment-16257452
 ] 

ASF GitHub Bot commented on YARN-6483:
--

Github user juanrh commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/289#discussion_r151770833
  
--- Diff: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java ---
@@ -1160,6 +1160,11 @@ public void transition(RMNodeImpl rmNode, RMNodeEvent event) {
       // Update NM metrics during graceful decommissioning.
       rmNode.updateMetricsForGracefulDecommission(initState, finalState);
       rmNode.decommissioningTimeout = timeout;
+      // Notify NodesListManager to notify all RMApp so that each
+      // Application Master could take any required actions.
+      rmNode.context.getDispatcher().getEventHandler().handle(
+          new NodesListManagerEvent(
+              NodesListManagerEventType.NODE_USABLE, rmNode));
--- End diff --

I have just attached the patch. Thanks a lot for taking a look!





[jira] [Commented] (YARN-6483) Add nodes transitioning to DECOMMISSIONING state to the list of updated nodes returned by the Resource Manager as a response to the Application Master heartbeat

2017-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257426#comment-16257426
 ] 

ASF GitHub Bot commented on YARN-6483:
--

Github user xslogic commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/289#discussion_r151767367
  
--- Diff: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java ---
@@ -1160,6 +1160,11 @@ public void transition(RMNodeImpl rmNode, RMNodeEvent event) {
       // Update NM metrics during graceful decommissioning.
       rmNode.updateMetricsForGracefulDecommission(initState, finalState);
       rmNode.decommissioningTimeout = timeout;
+      // Notify NodesListManager to notify all RMApp so that each
+      // Application Master could take any required actions.
+      rmNode.context.getDispatcher().getEventHandler().handle(
+          new NodesListManagerEvent(
+              NodesListManagerEventType.NODE_USABLE, rmNode));
--- End diff --

Thanks - looking at the patch. Can you also attach a consolidated patch on 
the JIRA, so as to kick Jenkins?





[jira] [Commented] (YARN-6483) Add nodes transitioning to DECOMMISSIONING state to the list of updated nodes returned by the Resource Manager as a response to the Application Master heartbeat

2017-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257393#comment-16257393
 ] 

ASF GitHub Bot commented on YARN-6483:
--

Github user juanrh commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/289#discussion_r151762680
  
--- Diff: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java ---
@@ -1160,6 +1160,11 @@ public void transition(RMNodeImpl rmNode, RMNodeEvent event) {
       // Update NM metrics during graceful decommissioning.
       rmNode.updateMetricsForGracefulDecommission(initState, finalState);
       rmNode.decommissioningTimeout = timeout;
+      // Notify NodesListManager to notify all RMApp so that each
+      // Application Master could take any required actions.
+      rmNode.context.getDispatcher().getEventHandler().handle(
+          new NodesListManagerEvent(
+              NodesListManagerEventType.NODE_USABLE, rmNode));
--- End diff --

Makes sense. I have added a new commit that adds a new value 
`NODE_DECOMMISSIONING` to `NodesListManagerEventType`.
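Roughly, a sketch of that change (assumed shape, not the actual commit):

{code:java}
// Sketch: the event type gains a dedicated value, so the DECOMMISSIONING
// transition no longer overloads NODE_USABLE. The transition in RMNodeImpl
// would then fire
//     new NodesListManagerEvent(
//         NodesListManagerEventType.NODE_DECOMMISSIONING, rmNode)
// instead of the NODE_USABLE event shown in the diff above.
public enum NodesListManagerEventType {
  NODE_USABLE,
  NODE_UNUSABLE,
  NODE_DECOMMISSIONING
}
{code}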





[jira] [Commented] (YARN-6483) Add nodes transitioning to DECOMMISSIONING state to the list of updated nodes returned by the Resource Manager as a response to the Application Master heartbeat

2017-11-17 Thread Brad (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257379#comment-16257379
 ] 

Brad commented on YARN-6483:


I think this would be useful for another Spark change I am working on, 
SPARK-21097. Looking forward to seeing how this turns out.




[jira] [Commented] (YARN-6483) Add nodes transitioning to DECOMMISSIONING state to the list of updated nodes returned by the Resource Manager as a response to the Application Master heartbeat

2017-11-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16256061#comment-16256061
 ] 

ASF GitHub Bot commented on YARN-6483:
--

Github user xslogic commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/289#discussion_r151554769
  
--- Diff: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/NodeReport.java ---
@@ -53,15 +53,15 @@ public static NodeReport newInstance(NodeId nodeId, NodeState nodeState,
       String httpAddress, String rackName, Resource used, Resource capability,
       int numContainers, String healthReport, long lastHealthReportTime) {
     return newInstance(nodeId, nodeState, httpAddress, rackName, used,
-        capability, numContainers, healthReport, lastHealthReportTime, null);
+        capability, numContainers, healthReport, lastHealthReportTime, null, null);
   }
 
   @Private
   @Unstable
   public static NodeReport newInstance(NodeId nodeId, NodeState nodeState,
       String httpAddress, String rackName, Resource used, Resource capability,
       int numContainers, String healthReport, long lastHealthReportTime,
-      Set<String> nodeLabels) {
+      Set<String> nodeLabels, Integer decommissioningTimeout) {
--- End diff --

Yeah - sorry, just noticed that. Maybe stick with Integer (although I don't 
like it much). We can raise another JIRA to fix it properly via your 
second suggestion.





[jira] [Commented] (YARN-6483) Add nodes transitioning to DECOMMISSIONING state to the list of updated nodes returned by the Resource Manager as a response to the Application Master heartbeat

2017-11-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16256045#comment-16256045
 ] 

ASF GitHub Bot commented on YARN-6483:
--

Github user xslogic commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/289#discussion_r151552105
  
--- Diff: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java ---
@@ -1160,6 +1160,11 @@ public void transition(RMNodeImpl rmNode, RMNodeEvent event) {
       // Update NM metrics during graceful decommissioning.
       rmNode.updateMetricsForGracefulDecommission(initState, finalState);
       rmNode.decommissioningTimeout = timeout;
+      // Notify NodesListManager to notify all RMApp so that each
+      // Application Master could take any required actions.
+      rmNode.context.getDispatcher().getEventHandler().handle(
+          new NodesListManagerEvent(
+              NodesListManagerEventType.NODE_USABLE, rmNode));
--- End diff --

I feel we should make intentions explicit - having a separate event type, 
rather than overloading an existing one, might make the code cleaner and 
easier to follow. It could be that the assumption in the testcase is wrong (I 
will have to double-check, though), in which case it is perfectly alright to 
modify the testcase to expect the new event.





[jira] [Commented] (YARN-6483) Add nodes transitioning to DECOMMISSIONING state to the list of updated nodes returned by the Resource Manager as a response to the Application Master heartbeat

2017-11-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16254382#comment-16254382
 ] 

ASF GitHub Bot commented on YARN-6483:
--

Github user juanrh commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/289#discussion_r151276382
  
--- Diff: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java ---
@@ -1160,6 +1160,11 @@ public void transition(RMNodeImpl rmNode, RMNodeEvent event) {
       // Update NM metrics during graceful decommissioning.
       rmNode.updateMetricsForGracefulDecommission(initState, finalState);
       rmNode.decommissioningTimeout = timeout;
+      // Notify NodesListManager to notify all RMApp so that each
+      // Application Master could take any required actions.
+      rmNode.context.getDispatcher().getEventHandler().handle(
+          new NodesListManagerEvent(
+              NodesListManagerEventType.NODE_USABLE, rmNode));
--- End diff --

I wasn't very sure about using `NODE_USABLE`, but while I was making the 
changes to follow your suggestion, I found that in the current code 
[`TestRMNodeTransitions.testResourceUpdateOnDecommissioningNode`](https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMNodeTransitions.java#L994)
 asserts that `NodesListManagerEventType.NODE_USABLE` is expected for a node 
that transitions to `DECOMMISSIONING`. Also, the NodesListManagerEventType is 
transformed into the corresponding `RMAppNodeUpdateType` in 
`NodesListManager.handle` to build a `RMAppNodeUpdateEvent`, which is 
processed in `RMAppImpl.processNodeUpdate`, which just uses the 
`RMAppNodeUpdateType` for logging.

So it looks like it is OK to use `NodesListManagerEventType.NODE_USABLE` 
for nodes in the decommissioning state. Do you still think it's worth adding 
an additional value to `NodesListManagerEventType` and `RMAppNodeUpdateType`?
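For reference, the flow just described, condensed into a sketch; method and 
constructor signatures are assumed from the surrounding discussion rather 
than quoted, and the helper fields (rmContext, LOG) belong to the enclosing 
NodesListManager.

{code:java}
// Sketch of NodesListManager.handle: the event type is translated into an
// RMAppNodeUpdateType and fanned out to every RMApp, whose
// RMAppImpl.processNodeUpdate currently only logs the update type.
public void handle(NodesListManagerEvent event) {
  RMNode eventNode = event.getNode();
  switch (event.getType()) {
  case NODE_UNUSABLE:
    sendNodeUpdateEventToApps(eventNode, RMAppNodeUpdateType.NODE_UNUSABLE);
    break;
  case NODE_USABLE:
    // Today this case also covers the DECOMMISSIONING transition.
    sendNodeUpdateEventToApps(eventNode, RMAppNodeUpdateType.NODE_USABLE);
    break;
  default:
    LOG.error("Ignoring invalid eventtype " + event.getType());
  }
}

private void sendNodeUpdateEventToApps(RMNode node,
    RMAppNodeUpdateType updateType) {
  for (RMApp app : rmContext.getRMApps().values()) {
    rmContext.getDispatcher().getEventHandler().handle(
        new RMAppNodeUpdateEvent(app.getApplicationId(), node, updateType));
  }
}
{code}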





[jira] [Commented] (YARN-6483) Add nodes transitioning to DECOMMISSIONING state to the list of updated nodes returned by the Resource Manager as a response to the Application Master heartbeat

2017-11-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16254274#comment-16254274
 ] 

ASF GitHub Bot commented on YARN-6483:
--

Github user juanrh commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/289#discussion_r151262546
  
--- Diff: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/NodeReport.java ---
@@ -53,15 +53,15 @@ public static NodeReport newInstance(NodeId nodeId, NodeState nodeState,
       String httpAddress, String rackName, Resource used, Resource capability,
       int numContainers, String healthReport, long lastHealthReportTime) {
     return newInstance(nodeId, nodeState, httpAddress, rackName, used,
-        capability, numContainers, healthReport, lastHealthReportTime, null);
+        capability, numContainers, healthReport, lastHealthReportTime, null, null);
   }
 
   @Private
   @Unstable
   public static NodeReport newInstance(NodeId nodeId, NodeState nodeState,
       String httpAddress, String rackName, Resource used, Resource capability,
       int numContainers, String healthReport, long lastHealthReportTime,
-      Set<String> nodeLabels) {
+      Set<String> nodeLabels, Integer decommissioningTimeout) {
--- End diff --

This `Integer` comes from 
[`RMNode.getDecommissioningTimeout()`](https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNode.java#L189), 
which was already returning an Integer before this patch, because only nodes 
in the DECOMMISSIONING state have an associated decommission timeout, so 
`null` is used to express an absent timeout. In this patch, 
`RMNode.getDecommissioningTimeout()` is used in 
`DefaultAMSProcessor.handleNodeUpdates` to get the argument 
`decommissioningTimeout` for `BuilderUtils.newNodeReport`. If we use an `int` 
for `decommissioningTimeout` in `NodeReport.newInstance`, I think we should 
also use an `int` for the same argument in `BuilderUtils.newNodeReport` for 
uniformity, which implies a conversion from `null` to `-1` in 
`DefaultAMSProcessor.handleNodeUpdates`.

So I think we should either keep using `Integer decommissioningTimeout` 
everywhere, encoding an absent timeout with `null`, or use `int 
decommissioningTimeout` everywhere, encoding an absent timeout with a negative 
value, which is consistent with `message NodeReportProto` using an unsigned 
int for `decommissioning_timeout`. What do you think about these two 
alternatives?
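The two encodings side by side, as a small illustrative helper (not proposed 
code, just the conversions the second alternative would imply):

{code:java}
// Illustrative helper contrasting the two alternatives discussed above.
final class DecommissionTimeoutEncoding {

  /** Alternative 2: map the boxed encoding (null = no timeout) to the
   *  primitive encoding (negative = no timeout). */
  static int toPrimitive(Integer decommissioningTimeout) {
    return decommissioningTimeout == null ? -1 : decommissioningTimeout;
  }

  /** And back again, e.g. when reading decommissioning_timeout from the
   *  proto, where an unsigned/absent field has no null to offer. */
  static Integer toBoxed(int decommissioningTimeout) {
    return decommissioningTimeout < 0 ? null : decommissioningTimeout;
  }
}
{code}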





[jira] [Commented] (YARN-6483) Add nodes transitioning to DECOMMISSIONING state to the list of updated nodes returned by the Resource Manager as a response to the Application Master heartbeat

2017-11-15 Thread JIRA

[ 
https://issues.apache.org/jira/browse/YARN-6483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16254273#comment-16254273
 ] 

Juan Rodríguez Hortalá commented on YARN-6483:
--

Thanks a lot for the comments [~asuresh], I'll answer in the pull request.




[jira] [Commented] (YARN-6483) Add nodes transitioning to DECOMMISSIONING state to the list of updated nodes returned by the Resource Manager as a response to the Application Master heartbeat

2017-11-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16254131#comment-16254131
 ] 

ASF GitHub Bot commented on YARN-6483:
--

Github user xslogic commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/289#discussion_r151244254
  
--- Diff: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java ---
@@ -1160,6 +1160,11 @@ public void transition(RMNodeImpl rmNode, RMNodeEvent event) {
       // Update NM metrics during graceful decommissioning.
       rmNode.updateMetricsForGracefulDecommission(initState, finalState);
       rmNode.decommissioningTimeout = timeout;
+      // Notify NodesListManager to notify all RMApp so that each
+      // Application Master could take any required actions.
+      rmNode.context.getDispatcher().getEventHandler().handle(
+          new NodesListManagerEvent(
+              NodesListManagerEventType.NODE_USABLE, rmNode));
--- End diff --

Hmmm... I don't think we should be sending a NODE_USABLE event here, since 
technically the node is not usable.
Maybe consider creating a new NODE_DECOMMISSIONING event?





[jira] [Commented] (YARN-6483) Add nodes transitioning to DECOMMISSIONING state to the list of updated nodes returned by the Resource Manager as a response to the Application Master heartbeat

2017-11-15 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16254132#comment-16254132
 ] 

Arun Suresh commented on YARN-6483:
---

Left some comments on the pull request.




[jira] [Commented] (YARN-6483) Add nodes transitioning to DECOMMISSIONING state to the list of updated nodes returned by the Resource Manager as a response to the Application Master heartbeat

2017-11-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16254103#comment-16254103
 ] 

ASF GitHub Bot commented on YARN-6483:
--

Github user xslogic commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/289#discussion_r151241885
  
--- Diff: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/NodeReport.java ---
@@ -53,15 +53,15 @@ public static NodeReport newInstance(NodeId nodeId, NodeState nodeState,
       String httpAddress, String rackName, Resource used, Resource capability,
       int numContainers, String healthReport, long lastHealthReportTime) {
     return newInstance(nodeId, nodeState, httpAddress, rackName, used,
-        capability, numContainers, healthReport, lastHealthReportTime, null);
+        capability, numContainers, healthReport, lastHealthReportTime, null, null);
   }

   @Private
   @Unstable
   public static NodeReport newInstance(NodeId nodeId, NodeState nodeState,
       String httpAddress, String rackName, Resource used, Resource capability,
       int numContainers, String healthReport, long lastHealthReportTime,
-      Set<String> nodeLabels) {
+      Set<String> nodeLabels, Integer decommissioningTimeout) {
--- End diff --

Can we use int ? I'm not comfortable using Integer, given the sometimes 
arbitrary boxing/unboxing rules.
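For instance, a small standalone illustration of the pitfalls (not code from 
the patch):

    // Unboxing a null Integer throws NullPointerException, and == compares
    // references outside the small-value cache. Purely illustrative.
    public class BoxingPitfalls {
      public static void main(String[] args) {
        Integer timeout = null;
        // int t = timeout;                       // NullPointerException
        int t = (timeout == null) ? -1 : timeout; // explicit default needed
        System.out.println(t);                    // -1

        Integer a = 1000, b = 1000;
        System.out.println(a == b);               // false: distinct boxed objects
        System.out.println(a.equals(b));          // true: value comparison
      }
    }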





[jira] [Commented] (YARN-6483) Add nodes transitioning to DECOMMISSIONING state to the list of updated nodes returned by the Resource Manager as a response to the Application Master heartbeat

2017-11-15 Thread JIRA

[ 
https://issues.apache.org/jira/browse/YARN-6483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16253912#comment-16253912
 ] 

Juan Rodríguez Hortalá commented on YARN-6483:
--

Hi, 

Any thoughts on this proposal? [~tgraves], do you think this could be useful 
for triggering the blacklisting of executors on decommissioning nodes, as in 
https://github.com/apache/spark/pull/19267/? 

Thanks,

Juan 




[jira] [Commented] (YARN-6483) Add nodes transitioning to DECOMMISSIONING state to the list of updated nodes returned by the Resource Manager as a response to the Application Master heartbeat

2017-11-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16242634#comment-16242634
 ] 

ASF GitHub Bot commented on YARN-6483:
--

GitHub user juanrh opened a pull request:

https://github.com/apache/hadoop/pull/289

[YARN-6483] Add nodes transitioning to DECOMMISSIONING state to the list of 
updated nodes returned by the Resource Manager as a response to the Application 
Master heartbeat

This is an alternative approach to 
https://issues.apache.org/jira/browse/YARN-3224 for notifying all affected 
application masters when a node transitions into the DECOMMISSIONING state. 
This change modifies the AllocateResponse that the YARN Resource Manager uses 
to respond to heartbeat requests from application masters, so that any node 
that has transitioned to the DECOMMISSIONING state since the last heartbeat is 
added to the list of NodeReport objects included in the AllocateResponse. We 
also add a new field to each NodeReport carrying the decommission timeout for 
DECOMMISSIONING nodes, thus covering the same functionality as the original 
proposal in YARN-3224.
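As a rough sketch of how an application master could consume this, assuming 
the new NodeReport field is exposed via a getDecommissioningTimeout() accessor 
(the blacklisting callback is a hypothetical placeholder):

    import java.util.List;
    import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
    import org.apache.hadoop.yarn.api.records.NodeReport;
    import org.apache.hadoop.yarn.api.records.NodeState;

    public class DecommissioningWatcher {
      // Called with each AM heartbeat response from the RM.
      void onHeartbeat(AllocateResponse response) {
        List<NodeReport> updated = response.getUpdatedNodes();
        for (NodeReport report : updated) {
          if (report.getNodeState() == NodeState.DECOMMISSIONING) {
            // Timeout may be null for nodes without a configured value.
            Integer timeout = report.getDecommissioningTimeout();
            blacklistAtAppLevel(report.getNodeId().getHost(), timeout);
          }
        }
      }

      // Hypothetical placeholder: e.g. stop scheduling new work on this host.
      private void blacklistAtAppLevel(String host, Integer timeoutSecs) {
        System.out.println("Decommissioning " + host + ", timeout=" + timeoutSecs);
      }
    }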

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/juanrh/hadoop hortala-YARN-6483

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hadoop/pull/289.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #289


commit c85b7012b90fff2cf323009664b14db1e998408a
Author: Juan Rodriguez Hortala 
Date:   2017-10-27T23:59:57Z

Notify affected Application Masters when a node enters DECOMMISSIONING state

commit ed2561e3bfb821f6b981ac5e76761c7ae8475e01
Author: Juan Rodriguez Hortala 
Date:   2017-11-02T00:41:27Z

Add decommission timeout field to NodeReport

commit 506f7defbf217c6b8b525a90604e2484573bba8d
Author: Juan Rodriguez Hortala 
Date:   2017-11-02T01:19:49Z

fix TestClientRMService

adapt test to decommission timeout checks being independent
of received heartbeats

commit 9a82f314feca6e06e067ceb1b64e0e4e4fad3882
Author: Juan Rodriguez Hortala 
Date:   2017-11-02T18:07:29Z

use xml format for excludes files with timeouts

in this version that is the only way to specify a timeout in the
excludes file

commit e17bc8c132ae3fb70fe273e6bee12956142b5345
Author: Juan Rodriguez Hortala 
Date:   2017-11-02T20:41:53Z

load dynamic timeout like in hadoop trunk

replace dynamic conf by using the configuration
passed by AdminService

cr https://cr.amazon.com/r/7919988/
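To illustrate the XML excludes format mentioned in the commits above, a hedged 
example of what such a file could look like (host names invented; the exact 
schema is an assumption based on the commit messages):

    <?xml version="1.0"?>
    <hosts>
      <host>
        <name>node-1.example.com</name>
        <timeout>3600</timeout>  <!-- assumed per-host decommission timeout, in seconds -->
      </host>
      <host>
        <name>node-2.example.com</name>  <!-- no timeout: fall back to the default -->
      </host>
    </hosts>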







[jira] [Commented] (YARN-6483) Add nodes transitioning to DECOMMISSIONING state to the list of updated nodes returned by the Resource Manager as a response to the Application Master heartbeat

2017-05-04 Thread JIRA

[ 
https://issues.apache.org/jira/browse/YARN-6483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15997111#comment-15997111
 ] 

Juan Rodríguez Hortalá commented on YARN-6483:
--

Hi [~Naganarasimha], sorry for the late answer. I have attached a patch for 
commit 103dfae7b9bbcec673dd7c53d38df034b78693a2, the current head of branch-2.8 
in https://github.com/apache/hadoop. However, since I opened this issue I have 
had the opportunity to study the decommissioning umbrella YARN-914, and I see 
it already has a sub-task, YARN-3224, for notifying the AM of the node 
decommission. The design proposed in YARN-914 uses the preemption framework 
described in the umbrella YARN-45. There has not been much activity lately in 
YARN-3224, and it seems to be blocked on YARN-3784, a sub-task of the 
preemption umbrella YARN-45. Meanwhile, my patch here is not able to send the 
decommission timeout, as it is based on `NodeState`, which is an enum without 
any fields. I don't think it would be a good idea to add a timeout field to 
`NodeState`, because a timeout doesn't make sense for all enum values. So I 
think the options here are: 
# Either modify `NodeState` to replace the enum with a class hierarchy, so that 
the subclass for decommissioning can carry a timeout (see the sketch below). 
This looks like a significant modification, and it diverges from the design of 
YARN-914 anyway.
# Or implement the missing issues for the preemption framework and the 
decommissioning umbrella, in particular YARN-3784 and then YARN-3224, which 
would replace this issue.

Which of those do you think would be the best option?
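To make option 1 concrete, a purely illustrative sketch (never implemented; 
all names are invented):

    // Replacing the NodeState enum with a small class hierarchy, so that
    // only the decommissioning state carries a timeout. Hypothetical names.
    public abstract class NodeStatus {
      public static final class Running extends NodeStatus { }
      public static final class Decommissioning extends NodeStatus {
        private final int timeoutSecs;
        public Decommissioning(int timeoutSecs) { this.timeoutSecs = timeoutSecs; }
        public int getTimeoutSecs() { return timeoutSecs; }
      }
    }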




[jira] [Commented] (YARN-6483) Add nodes transitioning to DECOMMISSIONING state to the list of updated nodes returned by the Resource Manager as a response to the Application Master heartbeat

2017-04-17 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15972101#comment-15972101
 ] 

Naganarasimha G R commented on YARN-6483:
-

Hi [~webxcreen], seems like this information is required for any long-running 
service. Would you please share a valid patch so that I can understand your 
approach?





[jira] [Commented] (YARN-6483) Add nodes transitioning to DECOMMISSIONING state to the list of updated nodes returned by the Resource Manager as a response to the Application Master heartbeat

2017-04-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15969326#comment-15969326
 ] 

Hadoop QA commented on YARN-6483:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 13s{color} | {color:red} YARN-6483 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-6483 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12863488/YARN-6483.patch |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/15641/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.


