[jira] [Commented] (YARN-4389) "yarn.am.blacklisting.enabled" and "yarn.am.blacklisting.disable-failure-threshold" should be app specific rather than a setting for whole YARN cluster

2015-11-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15026917#comment-15026917
 ] 

Hadoop QA commented on YARN-4389:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
44s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 18s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 27s 
{color} | {color:green} trunk passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
32s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 48s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
43s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 
21s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 40s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 5s 
{color} | {color:green} trunk passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
41s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 18s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 2m 18s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 18s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 28s 
{color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 2m 28s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 28s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 31s 
{color} | {color:red} Patch generated 7 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn (total was 160, now 167). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 52s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
48s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 0s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 48s 
{color} | {color:red} hadoop-yarn-api in the patch failed with JDK v1.8.0_66. 
{color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 8m 47s 
{color} | {color:red} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-api-jdk1.7.0_85 with JDK v1.7.0_85 
generated 1 new issues (was 0, now 1). {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 25s 
{color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 37s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_66. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 2m 29s {color} 
| {color:red} hadoop-yarn-common in the patch failed with JDK v1.8.0_66. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 66m 50s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 28s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.7.0_85. 
{color} |
| {color:red}-1{color} | {color:red} unit 

[jira] [Commented] (YARN-4393) TestResourceLocalizationService#testFailedDirsResourceRelease fails intermittently

2015-11-25 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027096#comment-15027096
 ] 

Tsuyoshi Ozawa commented on YARN-4393:
--

+1, checking this in.

> TestResourceLocalizationService#testFailedDirsResourceRelease fails 
> intermittently
> --
>
> Key: YARN-4393
> URL: https://issues.apache.org/jira/browse/YARN-4393
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Fix For: 2.7.3
>
> Attachments: YARN-4393.01.patch
>
>
> [~ozawa] pointed out this failure on YARN-4380.
> Check 
> https://issues.apache.org/jira/browse/YARN-4380?focusedCommentId=15023773=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15023773
> {noformat}
> Tests run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.518 sec <<< 
> FAILURE! - in 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService
> testFailedDirsResourceRelease(org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService)
>  Time elapsed: 0.093 sec <<< FAILURE!
> org.mockito.exceptions.verification.junit.ArgumentsAreDifferent:
> Argument(s) are different! Wanted:
> eventHandler.handle(
> 
> );
> -> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testFailedDirsResourceRelease(TestResourceLocalizationService.java:2632)
> Actual invocation has different arguments:
> eventHandler.handle(
> EventType: APPLICATION_INITED
> );
> -> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testFailedDirsResourceRelease(TestResourceLocalizationService.java:2632)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4386) refreshNodesGracefully() looks at active RMNode list for recommissioning decommissioned nodes

2015-11-25 Thread Kuhu Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15026916#comment-15026916
 ] 

Kuhu Shukla commented on YARN-4386:
---

Sure, that would help. I will update with a revised patch soon. Thank you!

> refreshNodesGracefully() looks at active RMNode list for recommissioning 
> decommissioned nodes
> -
>
> Key: YARN-4386
> URL: https://issues.apache.org/jira/browse/YARN-4386
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: graceful
>Affects Versions: 3.0.0
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Minor
> Attachments: YARN-4386-v1.patch
>
>
> In refreshNodesGracefully(), during recommissioning, the entryset from 
> getRMNodes() which has only active nodes (RUNNING, DECOMMISSIONING etc.) is 
> used for checking 'decommissioned' nodes which are present in 
> getInactiveRMNodes() map alone. 
> {code}
> for (Entry<NodeId, RMNode> entry : rmContext.getRMNodes().entrySet()) { 
> .
>  // Recommissioning the nodes
> if (entry.getValue().getState() == NodeState.DECOMMISSIONING
> || entry.getValue().getState() == NodeState.DECOMMISSIONED) {
>   this.rmContext.getDispatcher().getEventHandler()
>   .handle(new RMNodeEvent(nodeId, RMNodeEventType.RECOMMISSION));
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4386) refreshNodesGracefully() looks at active RMNode list for recommissioning decommissioned nodes

2015-11-25 Thread Kuhu Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15026926#comment-15026926
 ] 

Kuhu Shukla commented on YARN-4386:
---

[~djp], without the patch, the decommissioned node is looked up in the list 
returned by getRMNodes(), which will never contain a node with 
nodestate=DECOMMISSIONED. This means that currently a decommissioned node is 
not even considered for recommissioning, since it is part of the inactiveNodes 
list and not the getRMNodes() list. I will continue to think of a test case 
for this. I appreciate your comments and inputs.
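
For reference, a sketch of the direction this implies (recommissioning from the 
inactive map instead of the active one) could look like the following. This is 
illustrative only; {{includedHosts}} is a hypothetical set of hosts removed from 
the exclude list and is not taken from the attached patch:
{code}
// Illustrative sketch only, not the YARN-4386 patch: recommission
// DECOMMISSIONED nodes from the inactive map, where they actually live,
// instead of scanning getRMNodes().
for (RMNode rmNode : rmContext.getInactiveRMNodes().values()) {
  if (rmNode.getState() == NodeState.DECOMMISSIONED
      && includedHosts.contains(rmNode.getHostName())) {
    this.rmContext.getDispatcher().getEventHandler()
        .handle(new RMNodeEvent(rmNode.getNodeID(), RMNodeEventType.RECOMMISSION));
  }
}
{code}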

> refreshNodesGracefully() looks at active RMNode list for recommissioning 
> decommissioned nodes
> -
>
> Key: YARN-4386
> URL: https://issues.apache.org/jira/browse/YARN-4386
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: graceful
>Affects Versions: 3.0.0
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Minor
> Attachments: YARN-4386-v1.patch
>
>
> In refreshNodesGracefully(), during recommissioning, the entryset from 
> getRMNodes() which has only active nodes (RUNNING, DECOMMISSIONING etc.) is 
> used for checking 'decommissioned' nodes which are present in 
> getInactiveRMNodes() map alone. 
> {code}
> for (Entry<NodeId, RMNode> entry : rmContext.getRMNodes().entrySet()) { 
> .
>  // Recommissioning the nodes
> if (entry.getValue().getState() == NodeState.DECOMMISSIONING
> || entry.getValue().getState() == NodeState.DECOMMISSIONED) {
>   this.rmContext.getDispatcher().getEventHandler()
>   .handle(new RMNodeEvent(nodeId, RMNodeEventType.RECOMMISSION));
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4318) Test failure: TestAMAuthorization

2015-11-25 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15026936#comment-15026936
 ] 

Tsuyoshi Ozawa commented on YARN-4318:
--

[~kshukla] please go ahead :-)

> Test failure: TestAMAuthorization
> -
>
> Key: YARN-4318
> URL: https://issues.apache.org/jira/browse/YARN-4318
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Tsuyoshi Ozawa
>Assignee: Kuhu Shukla
>
> {quote}
> Tests run: 4, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 14.891 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization
> testUnauthorizedAccess[0](org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization)
>   Time elapsed: 3.208 sec  <<< ERROR!
> java.net.UnknownHostException: Invalid host name: local host is: (unknown); 
> destination host is: "b5a5dd9ec835":8030; java.net.UnknownHostException; For 
> more details see:  http://wiki.apache.org/hadoop/UnknownHost
>   at org.apache.hadoop.ipc.Client$Connection.<init>(Client.java:403)
>   at org.apache.hadoop.ipc.Client.getConnection(Client.java:1512)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1439)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1400)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
>   at com.sun.proxy.$Proxy15.registerApplicationMaster(Unknown Source)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.registerApplicationMaster(ApplicationMasterProtocolPBClientImpl.java:106)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization.testUnauthorizedAccess(TestAMAuthorization.java:273)
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (YARN-4393) TestResourceLocalizationService#testFailedDirsResourceRelease fails intermittently

2015-11-25 Thread Tsuyoshi Ozawa (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi Ozawa reopened YARN-4393:
--

Oops, commented on the wrong JIRA. Reopening.

> TestResourceLocalizationService#testFailedDirsResourceRelease fails 
> intermittently
> --
>
> Key: YARN-4393
> URL: https://issues.apache.org/jira/browse/YARN-4393
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Fix For: 2.7.3
>
> Attachments: YARN-4393.01.patch
>
>
> [~ozawa] pointed out this failure on YARN-4380.
> Check 
> https://issues.apache.org/jira/browse/YARN-4380?focusedCommentId=15023773=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15023773
> {noformat}
> Tests run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.518 sec <<< 
> FAILURE! - in 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService
> testFailedDirsResourceRelease(org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService)
>  Time elapsed: 0.093 sec <<< FAILURE!
> org.mockito.exceptions.verification.junit.ArgumentsAreDifferent:
> Argument(s) are different! Wanted:
> eventHandler.handle(
> 
> );
> -> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testFailedDirsResourceRelease(TestResourceLocalizationService.java:2632)
> Actual invocation has different arguments:
> eventHandler.handle(
> EventType: APPLICATION_INITED
> );
> -> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testFailedDirsResourceRelease(TestResourceLocalizationService.java:2632)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4380) TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails intermittently

2015-11-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027045#comment-15027045
 ] 

Hudson commented on YARN-4380:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8887 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8887/])
YARN-4380. (ozawa: rev 0656d2dc83af6a48a8d8d0e37cdf1f813124f366)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java


> TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails 
> intermittently
> 
>
> Key: YARN-4380
> URL: https://issues.apache.org/jira/browse/YARN-4380
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.8.0, 2.7.1
>Reporter: Tsuyoshi Ozawa
>Assignee: Varun Saxena
> Attachments: YARN-4380.01.patch, 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell-output.2.txt,
>  
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService-output.txt
>
>
> {quote}
> Tests run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.361 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService
> testDownloadingResourcesOnContainerKill(org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService)
>   Time elapsed: 0.109 sec  <<< FAILURE!
> org.mockito.exceptions.verification.junit.ArgumentsAreDifferent:
> Argument(s) are different! Wanted:
> deletionService.delete(
> "user0",
> null,
> 
> );
> -> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1322)
> Actual invocation has different arguments:
> deletionService.delete(
> "user0",
> 
> /home/ubuntu/hadoop-dev/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService/0/usercache/user0/appcache/application_314159265358979_0003/container_314159265358979_0003_01_42
> );
> -> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1296)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1322)
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3226) UI changes for decommissioning node

2015-11-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027088#comment-15027088
 ] 

Hadoop QA commented on YARN-3226:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
42s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 33s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 35s 
{color} | {color:green} trunk passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 41s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
16s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
22s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s 
{color} | {color:green} trunk passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
38s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 33s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 33s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 35s 
{color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 35s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 14s 
{color} | {color:red} Patch generated 5 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 (total was 104, now 107). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 40s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
16s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
30s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s 
{color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 65m 48s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 67m 55s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_85. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
25s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 154m 1s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
| JDK v1.7.0_85 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12774347/0003-YARN-3226.patch |
| JIRA 

[jira] [Updated] (YARN-4380) TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails intermittently

2015-11-25 Thread Tsuyoshi Ozawa (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi Ozawa updated YARN-4380:
-
Hadoop Flags: Reviewed

> TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails 
> intermittently
> 
>
> Key: YARN-4380
> URL: https://issues.apache.org/jira/browse/YARN-4380
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.8.0, 2.7.1
>Reporter: Tsuyoshi Ozawa
>Assignee: Varun Saxena
> Fix For: 2.7.3
>
> Attachments: YARN-4380.01.patch, 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell-output.2.txt,
>  
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService-output.txt
>
>
> {quote}
> Tests run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.361 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService
> testDownloadingResourcesOnContainerKill(org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService)
>   Time elapsed: 0.109 sec  <<< FAILURE!
> org.mockito.exceptions.verification.junit.ArgumentsAreDifferent:
> Argument(s) are different! Wanted:
> deletionService.delete(
> "user0",
> null,
> 
> );
> -> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1322)
> Actual invocation has different arguments:
> deletionService.delete(
> "user0",
> 
> /home/ubuntu/hadoop-dev/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService/0/usercache/user0/appcache/application_314159265358979_0003/container_314159265358979_0003_01_42
> );
> -> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1296)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1322)
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4393) TestResourceLocalizationService#testFailedDirsResourceRelease fails intermittently

2015-11-25 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027113#comment-15027113
 ] 

Tsuyoshi Ozawa commented on YARN-4393:
--

[~varun_saxena], before committing, I found that there are some missing 
dispatcher.await() calls:

testResourceRelease:
{code}
  //Send Cleanup Event
  spyService.handle(new ContainerLocalizationCleanupEvent(c, req)); // <-- here!
  verify(mockLocallilzerTracker)
.cleanupPrivLocalizers("container_314159265358979_0003_01_42");
  req2.remove(LocalResourceVisibility.PRIVATE);
  spyService.handle(new ContainerLocalizationCleanupEvent(c, req2));
  dispatcher.await();
{code}

testFailedDirsResourceRelease:
{code}
  // Send Cleanup Event
  spyService.handle(new ContainerLocalizationCleanupEvent(c, req)); // <- here!
  verify(mockLocallilzerTracker).cleanupPrivLocalizers(
"container_314159265358979_0003_01_42");
{code}

testRecovery:
{code}
  assertNotNull("Localization not started", privLr1.getLocalPath());
  privTracker1.handle(new ResourceLocalizedEvent(privReq1,
  privLr1.getLocalPath(), privLr1.getSize() + 5));
  assertNotNull("Localization not started", privLr2.getLocalPath());
  privTracker1.handle(new ResourceLocalizedEvent(privReq2,
  privLr2.getLocalPath(), privLr2.getSize() + 10));
  assertNotNull("Localization not started", appLr1.getLocalPath());
  appTracker1.handle(new ResourceLocalizedEvent(appReq1,
  appLr1.getLocalPath(), appLr1.getSize()));
  assertNotNull("Localization not started", appLr3.getLocalPath());
  appTracker2.handle(new ResourceLocalizedEvent(appReq3,
  appLr3.getLocalPath(), appLr3.getSize() + 7));
  assertNotNull("Localization not started", pubLr1.getLocalPath());
  pubTracker.handle(new ResourceLocalizedEvent(pubReq1,
  pubLr1.getLocalPath(), pubLr1.getSize() + 1000));
  assertNotNull("Localization not started", pubLr2.getLocalPath());
  pubTracker.handle(new ResourceLocalizedEvent(pubReq2,
  pubLr2.getLocalPath(), pubLr2.getSize() + 9));
{code}

Could you update them?
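
For instance, in testFailedDirsResourceRelease the missing await would go right 
after the cleanup event is sent; a sketch using the variable names from the 
snippet above:
{code}
  // Send Cleanup Event
  spyService.handle(new ContainerLocalizationCleanupEvent(c, req));
  dispatcher.await();  // drain the async dispatcher before verifying
  verify(mockLocallilzerTracker).cleanupPrivLocalizers(
      "container_314159265358979_0003_01_42");
{code}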

> TestResourceLocalizationService#testFailedDirsResourceRelease fails 
> intermittently
> --
>
> Key: YARN-4393
> URL: https://issues.apache.org/jira/browse/YARN-4393
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Fix For: 2.7.3
>
> Attachments: YARN-4393.01.patch
>
>
> [~ozawa] pointed out this failure on YARN-4380.
> Check 
> https://issues.apache.org/jira/browse/YARN-4380?focusedCommentId=15023773=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15023773
> {noformat}
> Tests run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.518 sec <<< 
> FAILURE! - in 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService
> testFailedDirsResourceRelease(org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService)
>  Time elapsed: 0.093 sec <<< FAILURE!
> org.mockito.exceptions.verification.junit.ArgumentsAreDifferent:
> Argument(s) are different! Wanted:
> eventHandler.handle(
> 
> );
> -> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testFailedDirsResourceRelease(TestResourceLocalizationService.java:2632)
> Actual invocation has different arguments:
> eventHandler.handle(
> EventType: APPLICATION_INITED
> );
> -> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testFailedDirsResourceRelease(TestResourceLocalizationService.java:2632)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4348) ZKRMStateStore.syncInternal should wait for zkResyncWaitTime instead of zkSessionTimeout

2015-11-25 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027145#comment-15027145
 ] 

Tsuyoshi Ozawa commented on YARN-4348:
--

Kicking Jenkins again.

> ZKRMStateStore.syncInternal should wait for zkResyncWaitTime instead of 
> zkSessionTimeout
> 
>
> Key: YARN-4348
> URL: https://issues.apache.org/jira/browse/YARN-4348
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.2, 2.6.2
>Reporter: Tsuyoshi Ozawa
>Assignee: Tsuyoshi Ozawa
>Priority: Blocker
> Attachments: YARN-4348-branch-2.7.002.patch, 
> YARN-4348-branch-2.7.003.patch, YARN-4348.001.patch, YARN-4348.001.patch, 
> log.txt
>
>
> Jian mentioned that the current internal ZK configuration of ZKRMStateStore 
> can cause a following situation:
> 1. syncInternal timeouts, 
> 2. but sync succeeded later on.
> We should use zkResyncWaitTime as the timeout value.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4371) "yarn application -kill" should take multiple application ids

2015-11-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15026923#comment-15026923
 ] 

Hadoop QA commented on YARN-4371:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
40s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 19s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 21s 
{color} | {color:green} trunk passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
10s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 25s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
39s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s 
{color} | {color:green} trunk passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
22s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 19s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 19s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 21s 
{color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 21s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 10s 
{color} | {color:red} Patch generated 2 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client (total was 15, now 16). 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
45s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s 
{color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 49m 39s {color} 
| {color:red} hadoop-yarn-client in the patch failed with JDK v1.8.0_66. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 49m 42s {color} 
| {color:red} hadoop-yarn-client in the patch failed with JDK v1.7.0_85. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
25s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 115m 32s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | hadoop.yarn.client.TestGetGroups |
| JDK v1.8.0_66 Timed out junit tests | 
org.apache.hadoop.yarn.client.api.impl.TestAMRMClient |
|   | org.apache.hadoop.yarn.client.api.impl.TestYarnClient |
|   | org.apache.hadoop.yarn.client.api.impl.TestNMClient |
| JDK v1.7.0_85 Failed junit tests | hadoop.yarn.client.TestGetGroups |
| JDK v1.7.0_85 Timed out junit tests | 
org.apache.hadoop.yarn.client.api.impl.TestAMRMClient |
|   | org.apache.hadoop.yarn.client.api.impl.TestYarnClient |
|   | 

[jira] [Commented] (YARN-4386) refreshNodesGracefully() looks at active RMNode list for recommissioning decommissioned nodes

2015-11-25 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027016#comment-15027016
 ] 

Sunil G commented on YARN-4386:
---

{{getRMNodes}} will have only active/decommissioning nodes. Hence, as you 
mentioned, it is highly unlikely that a node in the getRMNodes() list will also 
be DECOMMISSIONED.

For a test case, you could try forcefully adding a DECOMMISSIONED node to the 
active node list, but that again does not seem like a very valid case. [~djp], 
will this happen only if a race condition exists in the 
active->decommissioned window?

> refreshNodesGracefully() looks at active RMNode list for recommissioning 
> decommissioned nodes
> -
>
> Key: YARN-4386
> URL: https://issues.apache.org/jira/browse/YARN-4386
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: graceful
>Affects Versions: 3.0.0
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Minor
> Attachments: YARN-4386-v1.patch
>
>
> In refreshNodesGracefully(), during recommissioning, the entryset from 
> getRMNodes() which has only active nodes (RUNNING, DECOMMISSIONING etc.) is 
> used for checking 'decommissioned' nodes which are present in 
> getInactiveRMNodes() map alone. 
> {code}
> for (Entry<NodeId, RMNode> entry : rmContext.getRMNodes().entrySet()) { 
> .
>  // Recommissioning the nodes
> if (entry.getValue().getState() == NodeState.DECOMMISSIONING
> || entry.getValue().getState() == NodeState.DECOMMISSIONED) {
>   this.rmContext.getDispatcher().getEventHandler()
>   .handle(new RMNodeEvent(nodeId, RMNodeEventType.RECOMMISSION));
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3769) Consider user limit when calculating total pending resource for preemption policy in Capacity Scheduler

2015-11-25 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027198#comment-15027198
 ] 

Sangjin Lee commented on YARN-3769:
---

Could you check if the 2.7 commit applies cleanly to branch-2.6? If not, it 
would be great if you could post a 2.6 patch. Thanks.

> Consider user limit when calculating total pending resource for preemption 
> policy in Capacity Scheduler
> ---
>
> Key: YARN-3769
> URL: https://issues.apache.org/jira/browse/YARN-3769
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.6.0, 2.7.0, 2.8.0
>Reporter: Eric Payne
>Assignee: Eric Payne
> Fix For: 2.7.3
>
> Attachments: YARN-3769-branch-2.002.patch, 
> YARN-3769-branch-2.7.002.patch, YARN-3769-branch-2.7.003.patch, 
> YARN-3769-branch-2.7.005.patch, YARN-3769-branch-2.7.006.patch, 
> YARN-3769-branch-2.7.007.patch, YARN-3769.001.branch-2.7.patch, 
> YARN-3769.001.branch-2.8.patch, YARN-3769.003.patch, YARN-3769.004.patch, 
> YARN-3769.005.patch
>
>
> We are seeing the preemption monitor preempting containers from queue A and 
> then seeing the capacity scheduler giving them immediately back to queue A. 
> This happens quite often and causes a lot of churn.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4371) "yarn application -kill" should take multiple application ids

2015-11-25 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027005#comment-15027005
 ] 

Sunil G commented on YARN-4371:
---

The test case failures are not related to this patch; they happened because of 
a hostname problem. We can check one more report to see whether the same 
{{TestGetGroups}} tests are failing or not.
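
On the feature itself, the client-side change essentially boils down to 
iterating over the supplied application ids. A rough sketch (illustrative only; 
{{appIds}} and {{yarnClient}} stand for the parsed CLI arguments and an 
initialized client, and the actual patch may handle errors differently):
{code}
// Illustrative sketch, not the attached patch: kill every application id
// passed on the command line, continuing past ids that no longer exist.
for (String appIdStr : appIds) {
  ApplicationId applicationId = ConverterUtils.toApplicationId(appIdStr);
  try {
    yarnClient.killApplication(applicationId);
    System.out.println("Killing application " + appIdStr);
  } catch (ApplicationNotFoundException e) {
    System.err.println("Application " + appIdStr + " does not exist.");
  }
}
{code}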

> "yarn application -kill" should take multiple application ids
> -
>
> Key: YARN-4371
> URL: https://issues.apache.org/jira/browse/YARN-4371
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Tsuyoshi Ozawa
>Assignee: Sunil G
> Attachments: 0001-YARN-4371.patch, 0002-YARN-4371.patch
>
>
> Currently we cannot pass multiple applications to "yarn application -kill" 
> command. The command should take multiple application ids at the same time. 
> Each entries should be separated with whitespace like:
> {code}
> yarn application -kill application_1234_0001 application_1234_0007 
> application_1234_0012
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4318) Test failure: TestAMAuthorization

2015-11-25 Thread Tsuyoshi Ozawa (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi Ozawa updated YARN-4318:
-
Assignee: Kuhu Shukla

> Test failure: TestAMAuthorization
> -
>
> Key: YARN-4318
> URL: https://issues.apache.org/jira/browse/YARN-4318
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Tsuyoshi Ozawa
>Assignee: Kuhu Shukla
>
> {quote}
> Tests run: 4, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 14.891 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization
> testUnauthorizedAccess[0](org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization)
>   Time elapsed: 3.208 sec  <<< ERROR!
> java.net.UnknownHostException: Invalid host name: local host is: (unknown); 
> destination host is: "b5a5dd9ec835":8030; java.net.UnknownHostException; For 
> more details see:  http://wiki.apache.org/hadoop/UnknownHost
>   at org.apache.hadoop.ipc.Client$Connection.<init>(Client.java:403)
>   at org.apache.hadoop.ipc.Client.getConnection(Client.java:1512)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1439)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1400)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
>   at com.sun.proxy.$Proxy15.registerApplicationMaster(Unknown Source)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.registerApplicationMaster(ApplicationMasterProtocolPBClientImpl.java:106)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization.testUnauthorizedAccess(TestAMAuthorization.java:273)
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4292) ResourceUtilization should be a part of NodeInfo REST API

2015-11-25 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-4292:
--
Attachment: 0004-YARN-4292.patch

As YARN-3980 is in, making the necessary changes and updating a new version of 
the patch.
Also attaching the REST output (XML):
{noformat}
<nodes>
  <node>
    <rack>/default-rack</rack>
    <state>RUNNING</state>
    <id>localhost:25006</id>
    <nodeHostName>localhost</nodeHostName>
    <nodeHTTPAddress>localhost:25008</nodeHTTPAddress>
    <lastHealthUpdate>1448467127146</lastHealthUpdate>
    <version>3.0.0-SNAPSHOT</version>
    <healthReport></healthReport>
    <numContainers>0</numContainers>
    <usedMemoryMB>0</usedMemoryMB>
    <availMemoryMB>8192</availMemoryMB>
    <usedVirtualCores>0</usedVirtualCores>
    <availableVirtualCores>8</availableVirtualCores>
    <resourceUtilization>
      <nodePhysicalMemoryMB>4430</nodePhysicalMemoryMB>
      <nodeVirtualMemoryMB>4446</nodeVirtualMemoryMB>
      <nodeCPUUsage>18.49383544921875</nodeCPUUsage>
      <containersPhysicalMemoryMB>0</containersPhysicalMemoryMB>
      <containersVirtualMemoryMB>0</containersVirtualMemoryMB>
      <containersCPUUsage>0.0</containersCPUUsage>
    </resourceUtilization>
  </node>
</nodes>
{noformat}
normal o/p (JSON):
{noformat}
nodes: {
  node: [1]
  0: {
    rack: "/default-rack"
    state: "RUNNING"
    id: "localhost:25006"
    nodeHostName: "localhost"
    nodeHTTPAddress: "localhost:25008"
    lastHealthUpdate: 1448467007146
    version: "3.0.0-SNAPSHOT"
    healthReport: ""
    numContainers: 0
    usedMemoryMB: 0
    availMemoryMB: 8192
    usedVirtualCores: 0
    availableVirtualCores: 8
    resourceUtilization: {
      nodePhysicalMemoryMB: 4384
      nodeVirtualMemoryMB: 4399
      nodeCPUUsage: 6.99766731262207
      containersPhysicalMemoryMB: 0
      containersVirtualMemoryMB: 0
      containersCPUUsage: 0
    }
  }
}
{noformat}

[~leftnoteasy], could you please check?
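
For anyone wanting to reproduce the output above: the listing comes from the 
RM's {{/ws/v1/cluster/nodes}} REST endpoint. A minimal Java fetch could look 
like the sketch below (the RM web address uses the default 8088 port as a 
placeholder; the usual java.net/java.io imports are assumed):
{code}
// Minimal sketch: fetch the node listing shown above from the RM REST API.
// The RM web address below is a placeholder for the local test setup.
URL url = new URL("http://localhost:8088/ws/v1/cluster/nodes");
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
conn.setRequestProperty("Accept", "application/json"); // or application/xml
try (BufferedReader in = new BufferedReader(
    new InputStreamReader(conn.getInputStream()))) {
  String line;
  while ((line = in.readLine()) != null) {
    System.out.println(line);
  }
}
{code}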

> ResourceUtilization should be a part of NodeInfo REST API
> -
>
> Key: YARN-4292
> URL: https://issues.apache.org/jira/browse/YARN-4292
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Sunil G
> Attachments: 0001-YARN-4292.patch, 0002-YARN-4292.patch, 
> 0003-YARN-4292.patch, 0004-YARN-4292.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3769) Consider user limit when calculating total pending resource for preemption policy in Capacity Scheduler

2015-11-25 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027011#comment-15027011
 ] 

Eric Payne commented on YARN-3769:
--

bq. should this be backported to 2.6.x?
[~sjlee0], I would recommend it. We were seeing a lot of unnecessary preemption 
without this fix.

> Consider user limit when calculating total pending resource for preemption 
> policy in Capacity Scheduler
> ---
>
> Key: YARN-3769
> URL: https://issues.apache.org/jira/browse/YARN-3769
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.6.0, 2.7.0, 2.8.0
>Reporter: Eric Payne
>Assignee: Eric Payne
> Fix For: 2.7.3
>
> Attachments: YARN-3769-branch-2.002.patch, 
> YARN-3769-branch-2.7.002.patch, YARN-3769-branch-2.7.003.patch, 
> YARN-3769-branch-2.7.005.patch, YARN-3769-branch-2.7.006.patch, 
> YARN-3769-branch-2.7.007.patch, YARN-3769.001.branch-2.7.patch, 
> YARN-3769.001.branch-2.8.patch, YARN-3769.003.patch, YARN-3769.004.patch, 
> YARN-3769.005.patch
>
>
> We are seeing the preemption monitor preempting containers from queue A and 
> then seeing the capacity scheduler giving them immediately back to queue A. 
> This happens quite often and causes a lot of churn.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4390) Consider container request size during CS preemption

2015-11-25 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027020#comment-15027020
 ] 

Eric Payne commented on YARN-4390:
--

[~bikassaha], thank you for your comments.
{quote}
However, if YARN ends up preempting 8x1GB containers on different nodes then 
the under-allocated AM will not get its resources and may result in further 
avoidable preemptions. 
{quote}
This is the scenario I was documenting in the description.
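
Conceptually, the guard being discussed would tie each preemption candidate to 
the size of the pending ask, along the lines of the sketch below. This is 
purely illustrative; the identifiers are hypothetical and not taken from any 
attached patch:
{code}
// Illustrative sketch only: before marking a container for preemption, check
// that freeing it (together with the node's current headroom) can actually
// satisfy the pending request, so an 8 GB ask does not trigger preemption of
// 1 GB containers scattered across many nodes.
Resource afterPreemption = Resources.add(
    node.getUnallocatedResource(), candidate.getAllocatedResource());
if (Resources.fitsIn(requestedPerContainer, afterPreemption)) {
  containersToPreempt.add(candidate);
}
{code}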

> Consider container request size during CS preemption
> 
>
> Key: YARN-4390
> URL: https://issues.apache.org/jira/browse/YARN-4390
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 3.0.0, 2.8.0, 2.7.3
>Reporter: Eric Payne
>Assignee: Eric Payne
>
> There are multiple reasons why preemption could unnecessarily preempt 
> containers. One is that an app could be requesting a large container (say 
> 8-GB), and the preemption monitor could conceivably preempt multiple 
> containers (say 8, 1-GB containers) in order to fill the large container 
> request. These smaller containers would then be rejected by the requesting AM 
> and potentially given right back to the preempted app.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4380) TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails intermittently

2015-11-25 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027004#comment-15027004
 ] 

Tsuyoshi Ozawa commented on YARN-4380:
--

+1, checking this in.

> TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails 
> intermittently
> 
>
> Key: YARN-4380
> URL: https://issues.apache.org/jira/browse/YARN-4380
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.8.0, 2.7.1
>Reporter: Tsuyoshi Ozawa
>Assignee: Varun Saxena
> Attachments: YARN-4380.01.patch, 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell-output.2.txt,
>  
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService-output.txt
>
>
> {quote}
> Tests run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.361 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService
> testDownloadingResourcesOnContainerKill(org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService)
>   Time elapsed: 0.109 sec  <<< FAILURE!
> org.mockito.exceptions.verification.junit.ArgumentsAreDifferent:
> Argument(s) are different! Wanted:
> deletionService.delete(
> "user0",
> null,
> 
> );
> -> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1322)
> Actual invocation has different arguments:
> deletionService.delete(
> "user0",
> 
> /home/ubuntu/hadoop-dev/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService/0/usercache/user0/appcache/application_314159265358979_0003/container_314159265358979_0003_01_42
> );
> -> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1296)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1322)
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3849) Too much of preemption activity causing continuos killing of containers across queues

2015-11-25 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027026#comment-15027026
 ] 

Sunil G commented on YARN-3849:
---

Yes, I think so. This will be a good addition to the 2.6 line; I will try to 
backport the same to 2.6.

> Too much of preemption activity causing continuos killing of containers 
> across queues
> -
>
> Key: YARN-3849
> URL: https://issues.apache.org/jira/browse/YARN-3849
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.7.0
>Reporter: Sunil G
>Assignee: Sunil G
>Priority: Critical
> Fix For: 2.8.0, 2.7.3
>
> Attachments: 0001-YARN-3849.patch, 0002-YARN-3849.patch, 
> 0003-YARN-3849.patch, 0004-YARN-3849-branch2-7.patch, 0004-YARN-3849.patch
>
>
> Two queues are used. Each queue has given a capacity of 0.5. Dominant 
> Resource policy is used.
> 1. An app is submitted in QueueA which is consuming full cluster capacity
> 2. After submitting an app in QueueB, there are some demand  and invoking 
> preemption in QueueA
> 3. Instead of killing the excess of 0.5 guaranteed capacity, we observed that 
> all containers other than AM is getting killed in QueueA
> 4. Now the app in QueueB is trying to take over cluster with the current free 
> space. But there are some updated demand from the app in QueueA which lost 
> its containers earlier, and preemption is kicked in QueueB now.
> Scenario in step 3 and 4 continuously happening in loop. Thus none of the 
> apps are completing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3862) Support for fetching specific configs and metrics based on prefixes

2015-11-25 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027929#comment-15027929
 ] 

Allen Wittenauer commented on YARN-3862:


bq. do you know what's going on? Thanks.

Yes. As was announced on common-dev@, Yetus is now using the Dockerfile that 
ships with Hadoop. So, for at least the 2nd time I'm aware of, this branch is 
missing build fixes. This time it looks like 
0ca8df716a1bb8e7f894914fb0d740a1d14df8e3. FWIW, this is going to be happening 
a lot. It might make your lives easier to keep track of changes in files 
that are directly build related...

> Support for fetching specific configs and metrics based on prefixes
> ---
>
> Key: YARN-3862
> URL: https://issues.apache.org/jira/browse/YARN-3862
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-3862-YARN-2928.wip.01.patch, 
> YARN-3862-YARN-2928.wip.02.patch, YARN-3862-feature-YARN-2928.04.patch, 
> YARN-3862-feature-YARN-2928.wip.03.patch
>
>
> Currently, we will retrieve all the contents of the field if that field is 
> specified in the query API. In case of configs and metrics, this can become a 
> lot of data even though the user doesn't need it. So we need to provide a way 
> to query only a set of configs or metrics.
> As a comma spearated list of configs/metrics to be returned will be quite 
> cumbersome to specify, we have to support either of the following options :
> # Prefix match
> # Regex
> # Group the configs/metrics and query that group.
> We also need a facility to specify a metric time window to return metrics in 
> a that window. This may be useful in plotting graphs 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4304) AM max resource configuration per partition to be displayed/updated correctly in UI and in various partition related metrics

2015-11-25 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027907#comment-15027907
 ] 

Naganarasimha G R commented on YARN-4304:
-

Thanks [~wangda] for sharing your thoughts,
bq. Instead of putting max am to a separated object, I would prefer to put them 
to existing resourceUsageByPartition instead of introducing a new object. Even 
though max-am-limit is not usage, but it describes upper bound of usage. 
Thoughts?
By {{resourceUsageByPartition}} do you refer to {{ResourceUsageInfo}} present in 
the parent class {{CapacitySchedulerQueueInfo}}? If so, yes. In any case, as 
these new classes have not gone into any version yet, we can rename them 
appropriately, e.g. {{resourceUsagesByPartition}} => 
{{resourceInfoByPartition}}, {{ResourceUsageInfo}} => {{ResourceInfo}} & 
{{PartitionResourceUsageInfo}} => {{PartitionResourceInfo}}. Thoughts?


> AM max resource configuration per partition to be displayed/updated correctly 
> in UI and in various partition related metrics
> 
>
> Key: YARN-4304
> URL: https://issues.apache.org/jira/browse/YARN-4304
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: webapp
>Affects Versions: 2.7.1
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: 0001-YARN-4304.patch, 0002-YARN-4304.patch, 
> 0003-YARN-4304.patch, REST_and_UI.zip
>
>
> As we are supporting per-partition level max AM resource percentage 
> configuration, UI and various metrics also need to display correct 
> configurations related to same. 
> For eg: Current UI still shows am-resource percentage per queue level. This 
> is to be updated correctly when label config is used.
> - Display max-am-percentage per-partition in Scheduler UI (label also) and in 
> ClusterMetrics page
> - Update queue/partition related metrics w.r.t per-partition 
> am-resource-percentage



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4380) TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails intermittently

2015-11-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027243#comment-15027243
 ] 

Hudson commented on YARN-4380:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2664 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2664/])
YARN-4380. (ozawa: rev 0656d2dc83af6a48a8d8d0e37cdf1f813124f366)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java


> TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails 
> intermittently
> 
>
> Key: YARN-4380
> URL: https://issues.apache.org/jira/browse/YARN-4380
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.8.0, 2.7.1
>Reporter: Tsuyoshi Ozawa
>Assignee: Varun Saxena
> Fix For: 2.7.3
>
> Attachments: YARN-4380.01.patch, 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell-output.2.txt,
>  
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService-output.txt
>
>
> {quote}
> Tests run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.361 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService
> testDownloadingResourcesOnContainerKill(org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService)
>   Time elapsed: 0.109 sec  <<< FAILURE!
> org.mockito.exceptions.verification.junit.ArgumentsAreDifferent:
> Argument(s) are different! Wanted:
> deletionService.delete(
> "user0",
> null,
> 
> );
> -> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1322)
> Actual invocation has different arguments:
> deletionService.delete(
> "user0",
> 
> /home/ubuntu/hadoop-dev/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService/0/usercache/user0/appcache/application_314159265358979_0003/container_314159265358979_0003_01_42
> );
> -> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1296)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1322)
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3946) Allow fetching exact reason as to why a submitted app is in ACCEPTED state in CS

2015-11-25 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027863#comment-15027863
 ] 

Naganarasimha G R commented on YARN-3946:
-

Hi [~wangda],
The test case failures are not related to this jira; they pass locally with the 
patch modifications, and jiras have already been raised for some of the failures.

> Allow fetching exact reason as to why a submitted app is in ACCEPTED state in 
> CS
> 
>
> Key: YARN-3946
> URL: https://issues.apache.org/jira/browse/YARN-3946
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler, resourcemanager
>Affects Versions: 2.6.0
>Reporter: Sumit Nigam
>Assignee: Naganarasimha G R
> Attachments: 3946WebImages.zip, YARN-3946.v1.001.patch, 
> YARN-3946.v1.002.patch, YARN-3946.v1.003.Images.zip, YARN-3946.v1.003.patch, 
> YARN-3946.v1.004.patch
>
>
> Currently there is no direct way to get the exact reason as to why a 
> submitted app is still in ACCEPTED state. It should be possible to know 
> through the RM REST API what aspect is not being met - say, queue limits 
> being reached, core/memory requirements not being met, the AM limit being 
> reached, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4108) CapacityScheduler: Improve preemption to preempt only those containers that would satisfy the incoming request

2015-11-25 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15028104#comment-15028104
 ] 

Bikas Saha commented on YARN-4108:
--

These problems will be hard to solve without involving the scheduler in the 
decision cycle. The preemption policy can determine how much to preempt from a 
queue at a macro level. But the actual containers to preempt would be selected 
by the scheduler.
That is where using the global node picture will help. For a given container 
request, we can scan its nodes (if any) and make either an allocation or a 
preemption decision. 
Else, if we are doing container allocation on node heartbeat, then just like the 
delay scheduling logic, we can mark a node for preemption but not preempt it, and 
associate that node with the container request for which preemption is needed 
(request.nodeToPreempt). We can cycle through all nodes like this and change the 
request->node association when we find better nodes to preempt. After cycling 
through all nodes, when we again reach a node that matches request.nodeToPreempt, 
we can execute the decision of actually preempting on that node. If no node can 
satisfy the request (e.g. the request wants node A but preemptedQueue has no 
containers on node A), the scheduler should be able to call back to the preemption 
module and notify it so that some other queue can be picked to preempt.
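
A rough sketch of that mark-and-cycle idea (illustrative only; {{Node}}, 
{{Request}} and {{nodeToPreempt}} below are stand-ins, not existing scheduler 
classes):

{code}
// Illustrative sketch only: mark a candidate node on each heartbeat, and only
// preempt once we cycle back to the node we previously marked.
class PreemptionMarkSketch {
  static class Node { final String id; Node(String id) { this.id = id; } }

  static class Request {
    Node nodeToPreempt;                                // node currently marked for this request
    boolean canBeSatisfiedBy(Node n) { return true; }  // placeholder locality check
    boolean isBetterCandidate(Node n) { return nodeToPreempt == null; }  // placeholder scoring
  }

  // Called per node heartbeat, mirroring delay-scheduling style logic.
  static void onNodeHeartbeat(Node node, Request request) {
    if (!request.canBeSatisfiedBy(node)) {
      return;                                          // e.g. request wants node A, this is not it
    }
    if (node == request.nodeToPreempt) {
      System.out.println("preempting containers on " + node.id);  // cycled back to the mark: act now
      request.nodeToPreempt = null;
    } else if (request.isBetterCandidate(node)) {
      request.nodeToPreempt = node;                    // re-mark: found a better node to preempt
    }
  }

  public static void main(String[] args) {
    Request request = new Request();
    Node nodeA = new Node("nodeA");
    onNodeHeartbeat(nodeA, request);                   // first pass: nodeA gets marked
    onNodeHeartbeat(nodeA, request);                   // second pass hits the mark: preempt
  }
}
{code}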

> CapacityScheduler: Improve preemption to preempt only those containers that 
> would satisfy the incoming request
> --
>
> Key: YARN-4108
> URL: https://issues.apache.org/jira/browse/YARN-4108
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>
> This is a sibling JIRA of YARN-2154. We should make sure container preemption 
> is more effective.
> *Requirements:*
> 1) Can handle the case of user-limit preemption
> 2) Can handle resource placement requirements, such as: hard-locality 
> (I only want to use rack-1) / node-constraints (YARN-3409) / black-list (I 
> don't want to use rack1 and host\[1-3\])
> 3) Can handle preemption within a queue: cross-user preemption (YARN-2113), 
> cross-application preemption (such as priority-based (YARN-1963) / 
> fairness-based (YARN-3319)).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3862) Support for fetching specific configs and metrics based on prefixes

2015-11-25 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-3862:
---
Attachment: YARN-3862-feature-YARN-2928.04.patch

> Support for fetching specific configs and metrics based on prefixes
> ---
>
> Key: YARN-3862
> URL: https://issues.apache.org/jira/browse/YARN-3862
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-3862-YARN-2928.wip.01.patch, 
> YARN-3862-YARN-2928.wip.02.patch, YARN-3862-feature-YARN-2928.04.patch, 
> YARN-3862-feature-YARN-2928.wip.03.patch
>
>
> Currently, we will retrieve all the contents of the field if that field is 
> specified in the query API. In case of configs and metrics, this can become a 
> lot of data even though the user doesn't need it. So we need to provide a way 
> to query only a set of configs or metrics.
> As a comma-separated list of configs/metrics to be returned will be quite 
> cumbersome to specify, we have to support one of the following options:
> # Prefix match
> # Regex
> # Group the configs/metrics and query that group.
> We also need a facility to specify a metric time window to return metrics in 
> that window. This may be useful in plotting graphs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4392) ApplicationCreatedEvent event time resets after RM restart/failover

2015-11-25 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027335#comment-15027335
 ] 

Jonathan Eagles commented on YARN-4392:
---

[~xgong], Jason and I will be out until Monday and will take a look at it then.

> ApplicationCreatedEvent event time resets after RM restart/failover
> ---
>
> Key: YARN-4392
> URL: https://issues.apache.org/jira/browse/YARN-4392
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Xuan Gong
>Assignee: Xuan Gong
>Priority: Critical
> Attachments: YARN-4392-2015-11-24.patch, YARN-4392.1.patch
>
>
> {code}2015-09-01 12:39:09,852 WARN util.Times (Times.java:elapsed(53)) - 
> Finished time 1437453994768 is ahead of started time 1440308399674 
> 2015-09-01 12:39:09,852 WARN util.Times (Times.java:elapsed(53)) - Finished 
> time 1437454008244 is ahead of started time 1440308399676 
> 2015-09-01 12:39:09,852 WARN util.Times (Times.java:elapsed(53)) - Finished 
> time 1437444305171 is ahead of started time 1440308399653 
> 2015-09-01 12:39:09,852 WARN util.Times (Times.java:elapsed(53)) - Finished 
> time 1437444293115 is ahead of started time 1440308399647 
> 2015-09-01 12:39:09,852 WARN util.Times (Times.java:elapsed(53)) - Finished 
> time 1437444379645 is ahead of started time 1440308399656 
> 2015-09-01 12:39:09,852 WARN util.Times (Times.java:elapsed(53)) - Finished 
> time 1437444361234 is ahead of started time 1440308399655 
> 2015-09-01 12:39:09,852 WARN util.Times (Times.java:elapsed(53)) - Finished 
> time 1437444342029 is ahead of started time 1440308399654 
> 2015-09-01 12:39:09,852 WARN util.Times (Times.java:elapsed(53)) - Finished 
> time 1437444323447 is ahead of started time 1440308399654 
> 2015-09-01 12:39:09,853 WARN util.Times (Times.java:elapsed(53)) - Finished 
> time 143730006 is ahead of started time 1440308399660 
> 2015-09-01 12:39:09,853 WARN util.Times (Times.java:elapsed(53)) - Finished 
> time 143715698 is ahead of started time 1440308399659 
> 2015-09-01 12:39:09,853 WARN util.Times (Times.java:elapsed(53)) - Finished 
> time 143719060 is ahead of started time 1440308399658 
> 2015-09-01 12:39:09,853 WARN util.Times (Times.java:elapsed(53)) - Finished 
> time 1437444393931 is ahead of started time 1440308399657
> {code}
> From ATS logs, we would see a large number of 'stale alerts' messages 
> periodically.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4292) ResourceUtilization should be a part of NodeInfo REST API

2015-11-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027350#comment-15027350
 ] 

Hadoop QA commented on YARN-4292:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
30s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 33s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 34s 
{color} | {color:green} trunk passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 43s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
16s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
25s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s 
{color} | {color:green} trunk passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
39s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 37s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 37s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 35s 
{color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 35s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 15s 
{color} | {color:red} Patch generated 8 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 (total was 18, now 26). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 42s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
16s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
31s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s 
{color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 67m 48s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 67m 10s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_85. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
25s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 155m 18s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
|   | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler |
| JDK v1.7.0_85 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 

[jira] [Commented] (YARN-3862) Support for fetching specific configs and metrics based on prefixes

2015-11-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027365#comment-15027365
 ] 

Hadoop QA commented on YARN-3862:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | {color:red} docker {color} | {color:red} 16m 41s 
{color} | {color:red} Docker failed to build yetus/hadoop:123b3db. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12774383/YARN-3862-feature-YARN-2928.04.patch
 |
| JIRA Issue | YARN-3862 |
| Powered by | Apache Yetus   http://yetus.apache.org |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9799/console |


This message was automatically generated.



> Support for fetching specific configs and metrics based on prefixes
> ---
>
> Key: YARN-3862
> URL: https://issues.apache.org/jira/browse/YARN-3862
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-3862-YARN-2928.wip.01.patch, 
> YARN-3862-YARN-2928.wip.02.patch, YARN-3862-feature-YARN-2928.04.patch, 
> YARN-3862-feature-YARN-2928.wip.03.patch
>
>
> Currently, we will retrieve all the contents of the field if that field is 
> specified in the query API. In case of configs and metrics, this can become a 
> lot of data even though the user doesn't need it. So we need to provide a way 
> to query only a set of configs or metrics.
> As a comma-separated list of configs/metrics to be returned will be quite 
> cumbersome to specify, we have to support one of the following options:
> # Prefix match
> # Regex
> # Group the configs/metrics and query that group.
> We also need a facility to specify a metric time window to return metrics in 
> that window. This may be useful in plotting graphs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4293) ResourceUtilization should be a part of yarn node CLI

2015-11-25 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-4293:
--
Attachment: 0001-YARN-4293.patch

Attaching an initial version.

- This became a little trickier as {{ResourceUtilization}} was written in 
{{org.apache.hadoop.yarn.server.api.records}} whereas we need it in 
{{org.apache.hadoop.yarn.api.records}}, since we need to pull this info to the 
client as part of the node report. So I moved these classes to *yarn.api*; hence 
all source files which used {{ResourceUtilization}} needed an import change.
- This information is added to the "node -status" command only.

[~leftnoteasy], could you please help check this and let me know whether the 
approach is fine or not? Thank you.

> ResourceUtilization should be a part of yarn node CLI
> -
>
> Key: YARN-4293
> URL: https://issues.apache.org/jira/browse/YARN-4293
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Sunil G
> Attachments: 0001-YARN-4293.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4380) TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails intermittently

2015-11-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027275#comment-15027275
 ] 

Hudson commented on YARN-4380:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #733 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/733/])
YARN-4380. (ozawa: rev 0656d2dc83af6a48a8d8d0e37cdf1f813124f366)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java
* hadoop-yarn-project/CHANGES.txt


> TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails 
> intermittently
> 
>
> Key: YARN-4380
> URL: https://issues.apache.org/jira/browse/YARN-4380
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.8.0, 2.7.1
>Reporter: Tsuyoshi Ozawa
>Assignee: Varun Saxena
> Fix For: 2.7.3
>
> Attachments: YARN-4380.01.patch, 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell-output.2.txt,
>  
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService-output.txt
>
>
> {quote}
> Tests run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.361 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService
> testDownloadingResourcesOnContainerKill(org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService)
>   Time elapsed: 0.109 sec  <<< FAILURE!
> org.mockito.exceptions.verification.junit.ArgumentsAreDifferent:
> Argument(s) are different! Wanted:
> deletionService.delete(
> "user0",
> null,
> 
> );
> -> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1322)
> Actual invocation has different arguments:
> deletionService.delete(
> "user0",
> 
> /home/ubuntu/hadoop-dev/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService/0/usercache/user0/appcache/application_314159265358979_0003/container_314159265358979_0003_01_42
> );
> -> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1296)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1322)
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4297) TestJobHistoryEventHandler and TestRMContainerAllocator failing on YARN-2928 branch

2015-11-25 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027424#comment-15027424
 ] 

Sangjin Lee commented on YARN-4297:
---

+1. Committing shortly.

> TestJobHistoryEventHandler and TestRMContainerAllocator failing on YARN-2928 
> branch
> ---
>
> Key: YARN-4297
> URL: https://issues.apache.org/jira/browse/YARN-4297
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: YARN-4297-YARN-2928.01.patch, 
> YARN-4297-feature-YARN-2928.02.patch, YARN-4297-feature-YARN-2928.03.patch
>
>
> {noformat}
> Tests run: 13, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 16.09 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler
> testTimelineEventHandling(org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler)
>   Time elapsed: 0.11 sec  <<< ERROR!
> java.lang.ClassCastException: 
> org.apache.hadoop.mapreduce.v2.app.AppContext$$EnhancerByMockitoWithCGLIB$$95d3ddbe
>  cannot be cast to 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$RunningAppContext
>   at 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.serviceInit(JobHistoryEventHandler.java:271)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>   at 
> org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler.testTimelineEventHandling(TestJobHistoryEventHandler.java:495)
> {noformat}
> {noformat}
> testRMContainerAllocatorResendsRequestsOnRMRestart(org.apache.hadoop.mapreduce.v2.app.rm.TestRMContainerAllocator)
>   Time elapsed: 2.649 sec  <<< ERROR!
> java.lang.ClassCastException: 
> org.apache.hadoop.mapreduce.v2.app.AppContext$$EnhancerByMockitoWithCGLIB$$8e08559a
>  cannot be cast to 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$RunningAppContext
>   at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.getResources(RMContainerAllocator.java:802)
>   at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.heartbeat(RMContainerAllocator.java:269)
> Tests in error: 
>   TestRMContainerAllocator.testExcessReduceContainerAssign:669 » ClassCast 
> org.a...
>   TestRMContainerAllocator.testReportedAppProgress:970 » NullPointer
>   TestRMContainerAllocator.testBlackListedNodesWithSchedulingToThatNode:1578 
> » ClassCast
>   TestRMContainerAllocator.testBlackListedNodes:1292 » ClassCast 
> org.apache.hado...
>   TestRMContainerAllocator.testAMRMTokenUpdate:2691 » ClassCast 
> org.apache.hadoo...
>   TestRMContainerAllocator.testMapReduceAllocationWithNodeLabelExpression:722 
> » ClassCast
>   TestRMContainerAllocator.testReducerRampdownDiagnostics:443 » ClassCast 
> org.ap...
>   TestRMContainerAllocator.testReportedAppProgressWithOnlyMaps:1118 » 
> NullPointer
>   TestRMContainerAllocator.testMapReduceScheduling:819 » ClassCast 
> org.apache.ha...
>   TestRMContainerAllocator.testResource:390 » ClassCast 
> org.apache.hadoop.mapred...
>   TestRMContainerAllocator.testUpdatedNodes:1190 » ClassCast 
> org.apache.hadoop.m...
>   TestRMContainerAllocator.testCompletedTasksRecalculateSchedule:2249 » 
> ClassCast
>   TestRMContainerAllocator.testConcurrentTaskLimits:2779 » ClassCast 
> org.apache
>   TestRMContainerAllocator.testSimple:219 » ClassCast 
> org.apache.hadoop.mapreduc...
>   
> TestRMContainerAllocator.testIgnoreBlacklisting:1378->getContainerOnHost:1511 
> » ClassCast
>   TestRMContainerAllocator.testMapNodeLocality:310 » ClassCast 
> org.apache.hadoop...
>   
> TestRMContainerAllocator.testRMContainerAllocatorResendsRequestsOnRMRestart:2489
>  » ClassCast
> Tests run: 26, Failures: 0, Errors: 17, Skipped: 0
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4238) createdTime and modifiedTime is not reported while publishing entities to ATSv2

2015-11-25 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027316#comment-15027316
 ] 

Varun Saxena commented on YARN-4238:


[~sjlee0], kindly review.
MapReduce-related test failures will be fixed by YARN-4297.
The checkstyle issue is due to indentation inside the switch; it can only be fixed 
by changing earlier lines inside the switch case. Do you want me to fix that?

The RM-related test failures are not related; they have been carried over from 
trunk and have corresponding JIRAs.
The ASF license warnings will be fixed once we merge MAPREDUCE-6557 from trunk.

The 2 javac issues are not related to the code change either.

> createdTime and modifiedTime is not reported while publishing entities to 
> ATSv2
> ---
>
> Key: YARN-4238
> URL: https://issues.apache.org/jira/browse/YARN-4238
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4238-YARN-2928.01.patch, 
> YARN-4238-feature-YARN-2928.02.patch
>
>
> While publishing entities from RM and elsewhere we are not sending created 
> time. For instance, created time in TimelineServiceV2Publisher class and for 
> other entities in other such similar classes is not updated. We can easily 
> update created time when sending application created event. Likewise for 
> modification time on every write.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4395) Typo in comment in ClientServiceDelegate

2015-11-25 Thread Daniel Templeton (JIRA)
Daniel Templeton created YARN-4395:
--

 Summary: Typo in comment in ClientServiceDelegate
 Key: YARN-4395
 URL: https://issues.apache.org/jira/browse/YARN-4395
 Project: Hadoop YARN
  Issue Type: Task
Reporter: Daniel Templeton
Assignee: Daniel Templeton
Priority: Trivial


Line 337 in {{invoke()}} has the following comment:

{code}
// if it's AM shut down, do not decrement maxClientRetry as we wait for
// AM to be restarted.
{code}

Ideally it should be:

{code}
// If its AM shut down, do not decrement maxClientRetry while we wait
// for its AM to be restarted.
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3226) UI changes for decommissioning node

2015-11-25 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027384#comment-15027384
 ] 

Junping Du commented on YARN-3226:
--

The patch LGTM overall. Some nits:
{code}
+default :
+  LOG.debug("Unexpcted inital state");
...
+default :
+  LOG.debug("Unexpcted final state");
{code}
We should use warn as the log level because this is unexpected. Also, there is a 
typo here: "inital" => "initial".

{code}
 case DECOMMISSIONED:
-metrics.incrDecommisionedNMs();
+metrics.incrDecommisionedNMs();
   break;
{code}
There may be an indentation problem here.

Also, I need someone to review UI changes. [~xgong], can you take a look at it? 
Thanks!

> UI changes for decommissioning node
> ---
>
> Key: YARN-3226
> URL: https://issues.apache.org/jira/browse/YARN-3226
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful
>Reporter: Junping Du
>Assignee: Sunil G
> Attachments: 0001-YARN-3226.patch, 0002-YARN-3226.patch, 
> 0003-YARN-3226.patch, ClusterMetricsOnNodes_UI.png
>
>
> Some initial thoughts:
> decommissioning nodes should still show up in the active nodes list since 
> they are still running containers. 
> A separate decommissioning tab to filter for those nodes would be nice, 
> although I suppose users can also just use the jquery table to sort/search for
> nodes in that state from the active nodes list if it's too crowded to add yet 
> another node
> state tab (or maybe get rid of some effectively dead tabs like the reboot 
> state tab).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4297) TestJobHistoryEventHandler and TestRMContainerAllocator failing on YARN-2928 branch

2015-11-25 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027296#comment-15027296
 ] 

Varun Saxena commented on YARN-4297:


[~sjlee0], kindly review.

> TestJobHistoryEventHandler and TestRMContainerAllocator failing on YARN-2928 
> branch
> ---
>
> Key: YARN-4297
> URL: https://issues.apache.org/jira/browse/YARN-4297
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: YARN-4297-YARN-2928.01.patch, 
> YARN-4297-feature-YARN-2928.02.patch, YARN-4297-feature-YARN-2928.03.patch
>
>
> {noformat}
> Tests run: 13, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 16.09 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler
> testTimelineEventHandling(org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler)
>   Time elapsed: 0.11 sec  <<< ERROR!
> java.lang.ClassCastException: 
> org.apache.hadoop.mapreduce.v2.app.AppContext$$EnhancerByMockitoWithCGLIB$$95d3ddbe
>  cannot be cast to 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$RunningAppContext
>   at 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.serviceInit(JobHistoryEventHandler.java:271)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>   at 
> org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler.testTimelineEventHandling(TestJobHistoryEventHandler.java:495)
> {noformat}
> {noformat}
> testRMContainerAllocatorResendsRequestsOnRMRestart(org.apache.hadoop.mapreduce.v2.app.rm.TestRMContainerAllocator)
>   Time elapsed: 2.649 sec  <<< ERROR!
> java.lang.ClassCastException: 
> org.apache.hadoop.mapreduce.v2.app.AppContext$$EnhancerByMockitoWithCGLIB$$8e08559a
>  cannot be cast to 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$RunningAppContext
>   at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.getResources(RMContainerAllocator.java:802)
>   at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.heartbeat(RMContainerAllocator.java:269)
> Tests in error: 
>   TestRMContainerAllocator.testExcessReduceContainerAssign:669 » ClassCast 
> org.a...
>   TestRMContainerAllocator.testReportedAppProgress:970 » NullPointer
>   TestRMContainerAllocator.testBlackListedNodesWithSchedulingToThatNode:1578 
> » ClassCast
>   TestRMContainerAllocator.testBlackListedNodes:1292 » ClassCast 
> org.apache.hado...
>   TestRMContainerAllocator.testAMRMTokenUpdate:2691 » ClassCast 
> org.apache.hadoo...
>   TestRMContainerAllocator.testMapReduceAllocationWithNodeLabelExpression:722 
> » ClassCast
>   TestRMContainerAllocator.testReducerRampdownDiagnostics:443 » ClassCast 
> org.ap...
>   TestRMContainerAllocator.testReportedAppProgressWithOnlyMaps:1118 » 
> NullPointer
>   TestRMContainerAllocator.testMapReduceScheduling:819 » ClassCast 
> org.apache.ha...
>   TestRMContainerAllocator.testResource:390 » ClassCast 
> org.apache.hadoop.mapred...
>   TestRMContainerAllocator.testUpdatedNodes:1190 » ClassCast 
> org.apache.hadoop.m...
>   TestRMContainerAllocator.testCompletedTasksRecalculateSchedule:2249 » 
> ClassCast
>   TestRMContainerAllocator.testConcurrentTaskLimits:2779 » ClassCast 
> org.apache
>   TestRMContainerAllocator.testSimple:219 » ClassCast 
> org.apache.hadoop.mapreduc...
>   
> TestRMContainerAllocator.testIgnoreBlacklisting:1378->getContainerOnHost:1511 
> » ClassCast
>   TestRMContainerAllocator.testMapNodeLocality:310 » ClassCast 
> org.apache.hadoop...
>   
> TestRMContainerAllocator.testRMContainerAllocatorResendsRequestsOnRMRestart:2489
>  » ClassCast
> Tests run: 26, Failures: 0, Errors: 17, Skipped: 0
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3862) Support for fetching specific configs and metrics based on prefixes

2015-11-25 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027348#comment-15027348
 ] 

Sangjin Lee commented on YARN-3862:
---

Yes, I think that would be better. Thanks!

> Support for fetching specific configs and metrics based on prefixes
> ---
>
> Key: YARN-3862
> URL: https://issues.apache.org/jira/browse/YARN-3862
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-3862-YARN-2928.wip.01.patch, 
> YARN-3862-YARN-2928.wip.02.patch, YARN-3862-feature-YARN-2928.04.patch, 
> YARN-3862-feature-YARN-2928.wip.03.patch
>
>
> Currently, we will retrieve all the contents of the field if that field is 
> specified in the query API. In case of configs and metrics, this can become a 
> lot of data even though the user doesn't need it. So we need to provide a way 
> to query only a set of configs or metrics.
> As a comma-separated list of configs/metrics to be returned will be quite 
> cumbersome to specify, we have to support one of the following options:
> # Prefix match
> # Regex
> # Group the configs/metrics and query that group.
> We also need a facility to specify a metric time window to return metrics in 
> that window. This may be useful in plotting graphs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4394) ClientServiceDelegate doesn't handle retries during AM restart as intended

2015-11-25 Thread Daniel Templeton (JIRA)
Daniel Templeton created YARN-4394:
--

 Summary: ClientServiceDelegate doesn't handle retries during AM 
restart as intended
 Key: YARN-4394
 URL: https://issues.apache.org/jira/browse/YARN-4394
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Daniel Templeton
Assignee: Daniel Templeton


In the {{invoke()}} method, I found the following code:

{code}
  private AtomicBoolean usingAMProxy = new AtomicBoolean(false);
...
// if it's AM shut down, do not decrement maxClientRetry as we wait for
// AM to be restarted.
if (!usingAMProxy.get()) {
  maxClientRetry--;
}
usingAMProxy.set(false);
{code}

When we create the AM proxy, we set the flag to true.  If we fail to connect, 
the impact of the flag being true is that the code will try one extra time, 
giving it 400ms instead of just 300ms.  I can't imagine that's the intended 
behavior.  After any failure, the flag will forever more be false, but 
fortunately (?!?) the flag is otherwise unused.

Looks like I need to do some archeology to figure out how we ended up here.
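
For what it's worth, one possible reading of the intended behavior, sketched very 
loosely (this is not the actual ClientServiceDelegate logic; every name below is a 
stand-in):

{code}
// Loose sketch of the apparent intent: failures against the AM should not burn
// the normal retry budget, because the AM may simply be restarting.
// Stand-in names only; not the real ClientServiceDelegate code.
class RetryBudgetSketch {
  interface Target { String call() throws Exception; }

  static String invokeWithRetry(Target target, boolean targetIsAM) throws Exception {
    int maxClientRetry = 3;     // normal failure budget
    int amRestartWaits = 30;    // separate, larger budget while waiting for an AM restart
    while (maxClientRetry > 0 && amRestartWaits > 0) {
      try {
        return target.call();
      } catch (Exception e) {
        if (targetIsAM) {
          amRestartWaits--;     // waiting out an AM restart: keep the retry budget intact
        } else {
          maxClientRetry--;     // only non-AM failures consume maxClientRetry
        }
        Thread.sleep(100);      // back off before retrying
      }
    }
    throw new Exception("out of retries");
  }
}
{code}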



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4390) Consider container request size during CS preemption

2015-11-25 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027380#comment-15027380
 ] 

Carlo Curino commented on YARN-4390:


[~bikassaha], if the containers to be preempted (remember we respect the priority 
in the queue, and the priority of containers) are in one box, the 8GB freed should 
be bundled and given to the AM; however, the preemption policy does not at the 
moment try to free resources to satisfy an exact request. 

This is an important philosophical point: 
I am quite convinced that preemption should be used to fix "large imbalances" 
in fairness/capacity between queues/users (hence the dead-zone in which we do 
not trigger preemption even if we are off balance), and not to micro-manage 
allocations. Keep in mind that preemption will take a while to kick in (by 
design), as it allows the application to respond to a preemption signal, etc. As 
such, in many cases the 8GB container request will already have been satisfied by 
other means before this preemption kicks in. The current implementation follows 
this philosophy and only looks at the overall demand for resources, not at 
exactly which pending requests exist. 

I think this is correct and sufficient for large clusters with batch-mostly 
workloads (like the ones we were focusing on when we started preemption a few 
years back), since cluster conditions mutate too quickly for us to try to chase 
and micromanage allocations with preemption... In very small clusters, or in 
clusters running several long-running services, things might be different, as we 
can have potentially small but very persistent imbalances which we might want 
to address with more surgical preemption actions.

My 2 cents. 


> Consider container request size during CS preemption
> 
>
> Key: YARN-4390
> URL: https://issues.apache.org/jira/browse/YARN-4390
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 3.0.0, 2.8.0, 2.7.3
>Reporter: Eric Payne
>Assignee: Eric Payne
>
> There are multiple reasons why preemption could unnecessarily preempt 
> containers. One is that an app could be requesting a large container (say 
> 8-GB), and the preemption monitor could conceivably preempt multiple 
> containers (say 8, 1-GB containers) in order to fill the large container 
> request. These smaller containers would then be rejected by the requesting AM 
> and potentially given right back to the preempted app.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4318) Test failure: TestAMAuthorization

2015-11-25 Thread Kuhu Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027332#comment-15027332
 ] 

Kuhu Shukla commented on YARN-4318:
---

Although the tests pass locally, I see that the failure is coming from the server 
not being able to resolve the server hostname/address. I checked a few more 
pre-commits with this issue and somehow, instead of 'localhost/127.0.0.1', the 
hostnames are 48-bit hex values.
{code}
public Connection(ConnectionId remoteId, int serviceClass) throws IOException {
  this.remoteId = remoteId;
  this.server = remoteId.getAddress();
  if (server.isUnresolved()) {
throw NetUtils.wrapException(server.getHostName(),
server.getPort(),
null,
0,
new UnknownHostException());
  }
{code}
{code}
testUnauthorizedAccess[1](org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization)
  Time elapsed: 2.784 sec  <<< ERROR!
java.net.UnknownHostException: Invalid host name: local host is: (unknown); 
destination host is: "52b8ea35fca2":8030; java.net.UnknownHostException; For 
more details see:  http://wiki.apache.org/hadoop/UnknownHost
at org.apache.hadoop.ipc.Client$Connection.(Client.java:413)
{code}

These are not constant with respect to the machines that ran the pre-commit; that 
is, the H5 host had two such runs with different 48-bit hex values.
The {{serviceaddr}} is using the default config value, which means that instead of 
an IP, the 0.0.0.0 in the default config is picking up a hex value from the 
environment of the machine/VM.
Could this be related to our latest Docker/Yetus migration? Asking 
[~ste...@apache.org] if he has any inputs on this. Appreciate it.
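
The unresolved-address condition itself is easy to check in isolation; the 
container-ID-style hostname below is just taken from the stack trace above:

{code}
import java.net.InetSocketAddress;

public class ResolveCheck {
  public static void main(String[] args) {
    // A Docker-container-style hostname with no DNS entry comes back unresolved,
    // which is exactly what trips the check in Client$Connection above.
    InetSocketAddress addr = new InetSocketAddress("52b8ea35fca2", 8030);
    System.out.println(addr.getHostName() + " unresolved? " + addr.isUnresolved());
  }
}
{code}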

> Test failure: TestAMAuthorization
> -
>
> Key: YARN-4318
> URL: https://issues.apache.org/jira/browse/YARN-4318
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Tsuyoshi Ozawa
>Assignee: Kuhu Shukla
>
> {quote}
> Tests run: 4, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 14.891 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization
> testUnauthorizedAccess[0](org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization)
>   Time elapsed: 3.208 sec  <<< ERROR!
> java.net.UnknownHostException: Invalid host name: local host is: (unknown); 
> destination host is: "b5a5dd9ec835":8030; java.net.UnknownHostException; For 
> more details see:  http://wiki.apache.org/hadoop/UnknownHost
>   at org.apache.hadoop.ipc.Client$Connection.(Client.java:403)
>   at org.apache.hadoop.ipc.Client.getConnection(Client.java:1512)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1439)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1400)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
>   at com.sun.proxy.$Proxy15.registerApplicationMaster(Unknown Source)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.registerApplicationMaster(ApplicationMasterProtocolPBClientImpl.java:106)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization.testUnauthorizedAccess(TestAMAuthorization.java:273)
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3862) Support for fetching specific configs and metrics based on prefixes

2015-11-25 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027378#comment-15027378
 ] 

Varun Saxena commented on YARN-3862:


I am not sure why the build failed.
Can it be submitted again?

> Support for fetching specific configs and metrics based on prefixes
> ---
>
> Key: YARN-3862
> URL: https://issues.apache.org/jira/browse/YARN-3862
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-3862-YARN-2928.wip.01.patch, 
> YARN-3862-YARN-2928.wip.02.patch, YARN-3862-feature-YARN-2928.04.patch, 
> YARN-3862-feature-YARN-2928.wip.03.patch
>
>
> Currently, we will retrieve all the contents of the field if that field is 
> specified in the query API. In case of configs and metrics, this can become a 
> lot of data even though the user doesn't need it. So we need to provide a way 
> to query only a set of configs or metrics.
> As a comma-separated list of configs/metrics to be returned will be quite 
> cumbersome to specify, we have to support one of the following options:
> # Prefix match
> # Regex
> # Group the configs/metrics and query that group.
> We also need a facility to specify a metric time window to return metrics in 
> that window. This may be useful in plotting graphs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4318) Test failure: TestAMAuthorization

2015-11-25 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027544#comment-15027544
 ] 

Steve Loughran commented on YARN-4318:
--

Where's this happening? If Jenkins, put jenkins in the env (and the version too, 
please, + component==test). 

> Test failure: TestAMAuthorization
> -
>
> Key: YARN-4318
> URL: https://issues.apache.org/jira/browse/YARN-4318
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Tsuyoshi Ozawa
>Assignee: Kuhu Shukla
>
> {quote}
> Tests run: 4, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 14.891 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization
> testUnauthorizedAccess[0](org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization)
>   Time elapsed: 3.208 sec  <<< ERROR!
> java.net.UnknownHostException: Invalid host name: local host is: (unknown); 
> destination host is: "b5a5dd9ec835":8030; java.net.UnknownHostException; For 
> more details see:  http://wiki.apache.org/hadoop/UnknownHost
>   at org.apache.hadoop.ipc.Client$Connection.(Client.java:403)
>   at org.apache.hadoop.ipc.Client.getConnection(Client.java:1512)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1439)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1400)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
>   at com.sun.proxy.$Proxy15.registerApplicationMaster(Unknown Source)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.registerApplicationMaster(ApplicationMasterProtocolPBClientImpl.java:106)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization.testUnauthorizedAccess(TestAMAuthorization.java:273)
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4380) TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails intermittently

2015-11-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027668#comment-15027668
 ] 

Hudson commented on YARN-4380:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2580 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2580/])
YARN-4380. (ozawa: rev 0656d2dc83af6a48a8d8d0e37cdf1f813124f366)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java
* hadoop-yarn-project/CHANGES.txt


> TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails 
> intermittently
> 
>
> Key: YARN-4380
> URL: https://issues.apache.org/jira/browse/YARN-4380
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.8.0, 2.7.1
>Reporter: Tsuyoshi Ozawa
>Assignee: Varun Saxena
> Fix For: 2.7.3
>
> Attachments: YARN-4380.01.patch, 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell-output.2.txt,
>  
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService-output.txt
>
>
> {quote}
> Tests run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.361 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService
> testDownloadingResourcesOnContainerKill(org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService)
>   Time elapsed: 0.109 sec  <<< FAILURE!
> org.mockito.exceptions.verification.junit.ArgumentsAreDifferent:
> Argument(s) are different! Wanted:
> deletionService.delete(
> "user0",
> null,
> 
> );
> -> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1322)
> Actual invocation has different arguments:
> deletionService.delete(
> "user0",
> 
> /home/ubuntu/hadoop-dev/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService/0/usercache/user0/appcache/application_314159265358979_0003/container_314159265358979_0003_01_42
> );
> -> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1296)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1322)
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4318) Test failure: TestAMAuthorization

2015-11-25 Thread Kuhu Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027565#comment-15027565
 ] 

Kuhu Shukla commented on YARN-4318:
---

Thank you, Steve. I have updated the fields. Yes, it is seen on Jenkins and not 
locally.

> Test failure: TestAMAuthorization
> -
>
> Key: YARN-4318
> URL: https://issues.apache.org/jira/browse/YARN-4318
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Affects Versions: 3.0.0
> Environment: jenkins
>Reporter: Tsuyoshi Ozawa
>Assignee: Kuhu Shukla
>
> {quote}
> Tests run: 4, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 14.891 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization
> testUnauthorizedAccess[0](org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization)
>   Time elapsed: 3.208 sec  <<< ERROR!
> java.net.UnknownHostException: Invalid host name: local host is: (unknown); 
> destination host is: "b5a5dd9ec835":8030; java.net.UnknownHostException; For 
> more details see:  http://wiki.apache.org/hadoop/UnknownHost
>   at org.apache.hadoop.ipc.Client$Connection.(Client.java:403)
>   at org.apache.hadoop.ipc.Client.getConnection(Client.java:1512)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1439)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1400)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
>   at com.sun.proxy.$Proxy15.registerApplicationMaster(Unknown Source)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.registerApplicationMaster(ApplicationMasterProtocolPBClientImpl.java:106)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization.testUnauthorizedAccess(TestAMAuthorization.java:273)
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3862) Support for fetching specific configs and metrics based on prefixes

2015-11-25 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027596#comment-15027596
 ] 

Sangjin Lee commented on YARN-3862:
---

Failed for the same reason:

{noformat}
Step 16 : ADD hadoop_env_checks.sh /root/hadoop_env_checks.sh
hadoop_env_checks.sh: no such file or directory

Total Elapsed time:  10m  4s

ERROR: Docker failed to build image.
{noformat}

[~aw], do you know what's going on? Thanks.

> Support for fetching specific configs and metrics based on prefixes
> ---
>
> Key: YARN-3862
> URL: https://issues.apache.org/jira/browse/YARN-3862
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-3862-YARN-2928.wip.01.patch, 
> YARN-3862-YARN-2928.wip.02.patch, YARN-3862-feature-YARN-2928.04.patch, 
> YARN-3862-feature-YARN-2928.wip.03.patch
>
>
> Currently, we will retrieve all the contents of the field if that field is 
> specified in the query API. In case of configs and metrics, this can become a 
> lot of data even though the user doesn't need it. So we need to provide a way 
> to query only a set of configs or metrics.
> As a comma-separated list of configs/metrics to be returned will be quite 
> cumbersome to specify, we have to support one of the following options:
> # Prefix match
> # Regex
> # Group the configs/metrics and query that group.
> We also need a facility to specify a metric time window to return metrics in 
> that window. This may be useful in plotting graphs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4380) TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails intermittently

2015-11-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027640#comment-15027640
 ] 

Hudson commented on YARN-4380:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #723 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/723/])
YARN-4380. (ozawa: rev 0656d2dc83af6a48a8d8d0e37cdf1f813124f366)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java


> TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails 
> intermittently
> 
>
> Key: YARN-4380
> URL: https://issues.apache.org/jira/browse/YARN-4380
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.8.0, 2.7.1
>Reporter: Tsuyoshi Ozawa
>Assignee: Varun Saxena
> Fix For: 2.7.3
>
> Attachments: YARN-4380.01.patch, 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell-output.2.txt,
>  
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService-output.txt
>
>
> {quote}
> Tests run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.361 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService
> testDownloadingResourcesOnContainerKill(org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService)
>   Time elapsed: 0.109 sec  <<< FAILURE!
> org.mockito.exceptions.verification.junit.ArgumentsAreDifferent:
> Argument(s) are different! Wanted:
> deletionService.delete(
> "user0",
> null,
> 
> );
> -> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1322)
> Actual invocation has different arguments:
> deletionService.delete(
> "user0",
> 
> /home/ubuntu/hadoop-dev/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService/0/usercache/user0/appcache/application_314159265358979_0003/container_314159265358979_0003_01_42
> );
> -> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1296)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1322)
> {quote}
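The mismatch above is in the second argument: the verification expects {{null}} while the production call passes the container directory. As a hedged illustration only (not the actual fix in YARN-4380.01.patch), a Mockito argument matcher can make such a verification tolerant of the directory value; the {{Deletion}} interface below is a stand-in modeled on the log, not the real DeletionService API.

{code}
import static org.mockito.Mockito.any;
import static org.mockito.Mockito.eq;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.verify;

public class VerificationSketch {
  // Stand-in for the NodeManager DeletionService; only the two arguments
  // visible in the failure log above are modeled here.
  interface Deletion {
    void delete(String user, String subDir);
  }

  public static void main(String[] args) {
    Deletion del = mock(Deletion.class);
    del.delete("user0", "/some/container/dir");
    // Match any subDir instead of pinning it to null, so the check does not
    // depend on which directory the code under test happens to pass.
    verify(del).delete(eq("user0"), any(String.class));
  }
}
{code}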



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3862) Support for fetching specific configs and metrics based on prefixes

2015-11-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027590#comment-15027590
 ] 

Hadoop QA commented on YARN-3862:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | {color:red} docker {color} | {color:red} 10m 4s 
{color} | {color:red} Docker failed to build yetus/hadoop:123b3db. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12774383/YARN-3862-feature-YARN-2928.04.patch
 |
| JIRA Issue | YARN-3862 |
| Powered by | Apache Yetus   http://yetus.apache.org |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9801/console |


This message was automatically generated.



> Support for fetching specific configs and metrics based on prefixes
> ---
>
> Key: YARN-3862
> URL: https://issues.apache.org/jira/browse/YARN-3862
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-3862-YARN-2928.wip.01.patch, 
> YARN-3862-YARN-2928.wip.02.patch, YARN-3862-feature-YARN-2928.04.patch, 
> YARN-3862-feature-YARN-2928.wip.03.patch
>
>
> Currently, we will retrieve all the contents of a field if that field is 
> specified in the query API. In the case of configs and metrics, this can 
> become a lot of data even though the user doesn't need it. So we need to 
> provide a way to query only a set of configs or metrics.
> As a comma-separated list of configs/metrics to be returned would be quite 
> cumbersome to specify, we have to support one of the following options:
> # Prefix match
> # Regex
> # Group the configs/metrics and query that group.
> We also need a facility to specify a metric time window to return metrics in 
> that window. This may be useful in plotting graphs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4380) TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails intermittently

2015-11-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027623#comment-15027623
 ] 

Hudson commented on YARN-4380:
--

ABORTED: Integrated in Hadoop-Hdfs-trunk-Java8 #643 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/643/])
YARN-4380. (ozawa: rev 0656d2dc83af6a48a8d8d0e37cdf1f813124f366)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java


> TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails 
> intermittently
> 
>
> Key: YARN-4380
> URL: https://issues.apache.org/jira/browse/YARN-4380
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.8.0, 2.7.1
>Reporter: Tsuyoshi Ozawa
>Assignee: Varun Saxena
> Fix For: 2.7.3
>
> Attachments: YARN-4380.01.patch, 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell-output.2.txt,
>  
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService-output.txt
>
>
> {quote}
> Tests run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.361 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService
> testDownloadingResourcesOnContainerKill(org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService)
>   Time elapsed: 0.109 sec  <<< FAILURE!
> org.mockito.exceptions.verification.junit.ArgumentsAreDifferent:
> Argument(s) are different! Wanted:
> deletionService.delete(
> "user0",
> null,
> 
> );
> -> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1322)
> Actual invocation has different arguments:
> deletionService.delete(
> "user0",
> 
> /home/ubuntu/hadoop-dev/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService/0/usercache/user0/appcache/application_314159265358979_0003/container_314159265358979_0003_01_42
> );
> -> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1296)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1322)
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4184) Remove update reservation state api from state store as its not used by ReservationSystem

2015-11-25 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan updated YARN-4184:
-
Parent Issue: YARN-2573  (was: YARN-2572)

> Remove update reservation state api from state store as its not used by 
> ReservationSystem
> -
>
> Key: YARN-4184
> URL: https://issues.apache.org/jira/browse/YARN-4184
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, fairscheduler, resourcemanager
>Reporter: Anubhav Dhoot
>Assignee: Sean Po
> Fix For: 2.8.0
>
> Attachments: YARN-4184.v1.patch
>
>
> ReservationSystem uses remove/add for updates, and thus the update API in the 
> state store is not needed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4318) Test failure: TestAMAuthorization

2015-11-25 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated YARN-4318:
--
Affects Version/s: 3.0.0
  Environment: jenkins
  Component/s: test

> Test failure: TestAMAuthorization
> -
>
> Key: YARN-4318
> URL: https://issues.apache.org/jira/browse/YARN-4318
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Affects Versions: 3.0.0
> Environment: jenkins
>Reporter: Tsuyoshi Ozawa
>Assignee: Kuhu Shukla
>
> {quote}
> Tests run: 4, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 14.891 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization
> testUnauthorizedAccess[0](org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization)
>   Time elapsed: 3.208 sec  <<< ERROR!
> java.net.UnknownHostException: Invalid host name: local host is: (unknown); 
> destination host is: "b5a5dd9ec835":8030; java.net.UnknownHostException; For 
> more details see:  http://wiki.apache.org/hadoop/UnknownHost
>   at org.apache.hadoop.ipc.Client$Connection.(Client.java:403)
>   at org.apache.hadoop.ipc.Client.getConnection(Client.java:1512)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1439)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1400)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
>   at com.sun.proxy.$Proxy15.registerApplicationMaster(Unknown Source)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.registerApplicationMaster(ApplicationMasterProtocolPBClientImpl.java:106)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization.testUnauthorizedAccess(TestAMAuthorization.java:273)
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4380) TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails intermittently

2015-11-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027339#comment-15027339
 ] 

Hudson commented on YARN-4380:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #1454 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/1454/])
YARN-4380. (ozawa: rev 0656d2dc83af6a48a8d8d0e37cdf1f813124f366)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java


> TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails 
> intermittently
> 
>
> Key: YARN-4380
> URL: https://issues.apache.org/jira/browse/YARN-4380
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.8.0, 2.7.1
>Reporter: Tsuyoshi Ozawa
>Assignee: Varun Saxena
> Fix For: 2.7.3
>
> Attachments: YARN-4380.01.patch, 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell-output.2.txt,
>  
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService-output.txt
>
>
> {quote}
> Tests run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.361 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService
> testDownloadingResourcesOnContainerKill(org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService)
>   Time elapsed: 0.109 sec  <<< FAILURE!
> org.mockito.exceptions.verification.junit.ArgumentsAreDifferent:
> Argument(s) are different! Wanted:
> deletionService.delete(
> "user0",
> null,
> 
> );
> -> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1322)
> Actual invocation has different arguments:
> deletionService.delete(
> "user0",
> 
> /home/ubuntu/hadoop-dev/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService/0/usercache/user0/appcache/application_314159265358979_0003/container_314159265358979_0003_01_42
> );
> -> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1296)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1322)
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3224) Notify AM with containers (on decommissioning node) could be preempted after timeout.

2015-11-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15026873#comment-15026873
 ] 

Hadoop QA commented on YARN-3224:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
55s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 31s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 35s 
{color} | {color:green} trunk passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
16s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 41s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
16s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
23s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s 
{color} | {color:green} trunk passed with JDK v1.7.0_85 {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 33s 
{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. 
{color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 30s 
{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed 
with JDK v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 30s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 33s 
{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed 
with JDK v1.7.0_85. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 33s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_85. {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 15s 
{color} | {color:red} Patch generated 4 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 (total was 185, now 188). {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 0m 34s 
{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. 
{color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
16s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 34s 
{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s 
{color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 31s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 32s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_85. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
25s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 20m 14s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12761436/0002-YARN-3224.patch |
| JIRA Issue | YARN-3224 |
| Optional Tests |  asflicense  compile  javac  javadoc  

[jira] [Commented] (YARN-4386) refreshNodesGracefully() looks at active RMNode list for recommissioning decommissioned nodes

2015-11-25 Thread Kuhu Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15026890#comment-15026890
 ] 

Kuhu Shukla commented on YARN-4386:
---

Thank you [~djp] for your comments. 
TestClientRMTokens is failing regardless of the patch and is tracked through 
YARN-4306. 
TestAMAuthorization fails the same way and I believe it is tracked through 
YARN-4318.
No tests were attached since the final outcome with this patch remains 
unchanged: a decommissioned node stays in that state regardless.
[~sunilg], [~djp], requesting your review. Thanks a lot.

> refreshNodesGracefully() looks at active RMNode list for recommissioning 
> decommissioned nodes
> -
>
> Key: YARN-4386
> URL: https://issues.apache.org/jira/browse/YARN-4386
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: graceful
>Affects Versions: 3.0.0
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Minor
> Attachments: YARN-4386-v1.patch
>
>
> In refreshNodesGracefully(), during recommissioning, the entry set from 
> getRMNodes(), which holds only active nodes (RUNNING, DECOMMISSIONING, etc.), 
> is used to check for 'decommissioned' nodes, which are present only in the 
> getInactiveRMNodes() map. 
> {code}
> for (Entry entry:rmContext.getRMNodes().entrySet()) { 
> .
>  // Recommissioning the nodes
> if (entry.getValue().getState() == NodeState.DECOMMISSIONING
> || entry.getValue().getState() == NodeState.DECOMMISSIONED) {
>   this.rmContext.getDispatcher().getEventHandler()
>   .handle(new RMNodeEvent(nodeId, RMNodeEventType.RECOMMISSION));
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4318) Test failure: TestAMAuthorization

2015-11-25 Thread Kuhu Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15026895#comment-15026895
 ] 

Kuhu Shukla commented on YARN-4318:
---

[~ozawa], are you looking at this test failure? I can work on it if this is 
unassigned. Thanks a lot.

> Test failure: TestAMAuthorization
> -
>
> Key: YARN-4318
> URL: https://issues.apache.org/jira/browse/YARN-4318
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Tsuyoshi Ozawa
>
> {quote}
> Tests run: 4, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 14.891 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization
> testUnauthorizedAccess[0](org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization)
>   Time elapsed: 3.208 sec  <<< ERROR!
> java.net.UnknownHostException: Invalid host name: local host is: (unknown); 
> destination host is: "b5a5dd9ec835":8030; java.net.UnknownHostException; For 
> more details see:  http://wiki.apache.org/hadoop/UnknownHost
>   at org.apache.hadoop.ipc.Client$Connection.(Client.java:403)
>   at org.apache.hadoop.ipc.Client.getConnection(Client.java:1512)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1439)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1400)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
>   at com.sun.proxy.$Proxy15.registerApplicationMaster(Unknown Source)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.registerApplicationMaster(ApplicationMasterProtocolPBClientImpl.java:106)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization.testUnauthorizedAccess(TestAMAuthorization.java:273)
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4386) refreshNodesGracefully() looks at active RMNode list for recommissioning decommissioned nodes

2015-11-25 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15026913#comment-15026913
 ] 

Junping Du commented on YARN-4386:
--

Thanks [~kshukla] for the patch. I agree these test failures are not related. 
However, can we add a test to verify that no invalid-state exception is thrown 
when a decommissioned node is recommissioned via refreshNodesGracefully() with 
the patch applied? That test should fail without your code.

> refreshNodesGracefully() looks at active RMNode list for recommissioning 
> decommissioned nodes
> -
>
> Key: YARN-4386
> URL: https://issues.apache.org/jira/browse/YARN-4386
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: graceful
>Affects Versions: 3.0.0
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Minor
> Attachments: YARN-4386-v1.patch
>
>
> In refreshNodesGracefully(), during recommissioning, the entry set from 
> getRMNodes(), which holds only active nodes (RUNNING, DECOMMISSIONING, etc.), 
> is used to check for 'decommissioned' nodes, which are present only in the 
> getInactiveRMNodes() map. 
> {code}
> for (Entry entry:rmContext.getRMNodes().entrySet()) { 
> .
>  // Recommissioning the nodes
> if (entry.getValue().getState() == NodeState.DECOMMISSIONING
> || entry.getValue().getState() == NodeState.DECOMMISSIONED) {
>   this.rmContext.getDispatcher().getEventHandler()
>   .handle(new RMNodeEvent(nodeId, RMNodeEventType.RECOMMISSION));
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3862) Support for fetching specific configs and metrics based on prefixes

2015-11-25 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15026481#comment-15026481
 ] 

Varun Saxena commented on YARN-3862:


[~sjlee0]
Looking at the code again, I am actually using getColumnPrefixBytes in 
FlowRunEntityReader and passing the result to TimelineFilterUtils. However, in 
ApplicationEntityReader and GenericEntityReader I assume the column prefix is 
null, so I haven't used it there, which I agree is wrong.

However, I have exposed a method as below; here colPrefix takes the column 
prefix bytes as an argument.
{{public static FilterList createHBaseFilterList(byte[] colPrefix, 
TimelineFilterList filterList)}}
I think this should be enough. The caller can then pass any sequence of bytes 
as the prefix, but the current code will not break if the way column prefixes 
are encoded changes, as long as the caller of this method is correct.

Another alternative would be to pass the ColumnPrefix object to 
TimelineFilterUtils and call getColumnPrefixBytes from there.
Thoughts?
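To make the trade-off concrete, here is a self-contained sketch of the two shapes. The {{ColumnPrefix}} stand-in, the {{m!}} separator, and the space-to-underscore encoding are invented for illustration and do not mirror the actual YARN-2928 storage layout.

{code}
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class FilterListSketch {
  // Stand-in for the real ColumnPrefix: owns the encoding of column qualifiers.
  interface ColumnPrefix {
    byte[] getColumnPrefixBytes(String qualifier);
  }

  // Option A (current proposal): the caller hands over raw prefix bytes; any
  // byte sequence is accepted, so the caller must encode exactly as the store does.
  static byte[] filterKeyFromBytes(byte[] colPrefix, String filterName) {
    byte[] name = filterName.getBytes(StandardCharsets.UTF_8);
    byte[] key = Arrays.copyOf(colPrefix, colPrefix.length + name.length);
    System.arraycopy(name, 0, key, colPrefix.length, name.length);
    return key;
  }

  // Option B: the caller hands over the ColumnPrefix object; encoding (including
  // spaces in qualifiers) stays in one place, inside getColumnPrefixBytes.
  static byte[] filterKeyFromPrefix(ColumnPrefix colPrefix, String filterName) {
    return colPrefix.getColumnPrefixBytes(filterName);
  }

  public static void main(String[] args) {
    ColumnPrefix metric =
        q -> ("m!" + q.replace(' ', '_')).getBytes(StandardCharsets.UTF_8);
    // Option A: space is not encoded unless the caller remembers to do it.
    System.out.println(new String(
        filterKeyFromBytes("m!".getBytes(StandardCharsets.UTF_8), "cpu usage")));
    // Option B: the prefix object applies the same encoding as the columns.
    System.out.println(new String(filterKeyFromPrefix(metric, "cpu usage")));
  }
}
{code}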

> Support for fetching specific configs and metrics based on prefixes
> ---
>
> Key: YARN-3862
> URL: https://issues.apache.org/jira/browse/YARN-3862
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-3862-YARN-2928.wip.01.patch, 
> YARN-3862-YARN-2928.wip.02.patch, YARN-3862-feature-YARN-2928.wip.03.patch
>
>
> Currently, we will retrieve all the contents of a field if that field is 
> specified in the query API. In the case of configs and metrics, this can 
> become a lot of data even though the user doesn't need it. So we need to 
> provide a way to query only a set of configs or metrics.
> As a comma-separated list of configs/metrics to be returned would be quite 
> cumbersome to specify, we have to support one of the following options:
> # Prefix match
> # Regex
> # Group the configs/metrics and query that group.
> We also need a facility to specify a metric time window to return metrics in 
> that window. This may be useful in plotting graphs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2882) Introducing container types

2015-11-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15026488#comment-15026488
 ] 

Hadoop QA commented on YARN-2882:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 
6s {color} | {color:green} yarn-2877 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 45s 
{color} | {color:green} yarn-2877 passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 8s 
{color} | {color:green} yarn-2877 passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
30s {color} | {color:green} yarn-2877 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 26s 
{color} | {color:green} yarn-2877 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 
10s {color} | {color:green} yarn-2877 passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 14s 
{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common in 
yarn-2877 has 3 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 58s 
{color} | {color:green} yarn-2877 passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 22s 
{color} | {color:green} yarn-2877 passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 
13s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 50s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 1m 50s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 11m 38s 
{color} | {color:red} hadoop-yarn-project_hadoop-yarn-jdk1.8.0_66 with JDK 
v1.8.0_66 generated 1 new issues (was 14, now 14). {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 50s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 6s 
{color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 2m 6s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 13m 45s 
{color} | {color:red} hadoop-yarn-project_hadoop-yarn-jdk1.7.0_85 with JDK 
v1.7.0_85 generated 1 new issues (was 15, now 15). {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 6s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 26s 
{color} | {color:red} Patch generated 17 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn (total was 178, now 193). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 26s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 
4s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 17 line(s) that end in whitespace. Use git 
apply --whitespace=fix. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 6m 2s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 38s 
{color} | {color:red} hadoop-yarn-api in the patch failed with JDK v1.8.0_66. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 27s 
{color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 24s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_66. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 49m 21s {color} 
| {color:red} hadoop-yarn-client in the patch failed with JDK v1.8.0_66. 
{color} |
| {color:red}-1{color} | {color:red} unit 

[jira] [Updated] (YARN-3226) UI changes for decommissioning node

2015-11-25 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-3226:
--
Attachment: ClusterMetricsOnNodes_UI.png

> UI changes for decommissioning node
> ---
>
> Key: YARN-3226
> URL: https://issues.apache.org/jira/browse/YARN-3226
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Junping Du
>Assignee: Sunil G
> Attachments: 0001-YARN-3226.patch, ClusterMetricsOnNodes_UI.png
>
>
> Some initial thought is:
> decommissioning nodes should still show up in the active nodes list since 
> they are still running containers. 
> A separate decommissioning tab to filter for those nodes would be nice, 
> although I suppose users can also just use the jquery table to sort/search for
> nodes in that state from the active nodes list if it's too crowded to add yet 
> another node
> state tab (or maybe get rid of some effectively dead tabs like the reboot 
> state tab).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3226) UI changes for decommissioning node

2015-11-25 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-3226:
--
Attachment: (was: Decommissioning_MetricsPge.png)

> UI changes for decommissioning node
> ---
>
> Key: YARN-3226
> URL: https://issues.apache.org/jira/browse/YARN-3226
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Junping Du
>Assignee: Sunil G
> Attachments: 0001-YARN-3226.patch
>
>
> Some initial thought is:
> decommissioning nodes should still show up in the active nodes list since 
> they are still running containers. 
> A separate decommissioning tab to filter for those nodes would be nice, 
> although I suppose users can also just use the jquery table to sort/search for
> nodes in that state from the active nodes list if it's too crowded to add yet 
> another node
> state tab (or maybe get rid of some effectively dead tabs like the reboot 
> state tab).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4393) TestResourceLocalizationService#testFailedDirsResourceRelease fails intermittently

2015-11-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15026391#comment-15026391
 ] 

Hadoop QA commented on YARN-4393:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
48s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s 
{color} | {color:green} trunk passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
11s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 29s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
59s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s 
{color} | {color:green} trunk passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
27s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
11s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 29s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 2s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s 
{color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 29s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 2s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.7.0_85. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
26s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 34m 7s {color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12774282/YARN-4393.01.patch |
| JIRA Issue | YARN-4393 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux e1d35d909a40 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / b4c6b51 |
| findbugs | v3.0.0 |
| JDK v1.7.0_85  Test Results | 

[jira] [Updated] (YARN-3226) UI changes for decommissioning node

2015-11-25 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-3226:
--
Attachment: 0002-YARN-3226.patch

Thank you [~djp] for the comments. 
Uploading a new patch and also attaching a new UI screenshot with "Cluster 
Metrics on Nodes". Kindly help to check the same.

> UI changes for decommissioning node
> ---
>
> Key: YARN-3226
> URL: https://issues.apache.org/jira/browse/YARN-3226
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Junping Du
>Assignee: Sunil G
> Attachments: 0001-YARN-3226.patch, 0002-YARN-3226.patch, 
> ClusterMetricsOnNodes_UI.png
>
>
> Some initial thought is:
> decommissioning nodes should still show up in the active nodes list since 
> they are still running containers. 
> A separate decommissioning tab to filter for those nodes would be nice, 
> although I suppose users can also just use the jquery table to sort/search for
> nodes in that state from the active nodes list if it's too crowded to add yet 
> another node
> state tab (or maybe get rid of some effectively dead tabs like the reboot 
> state tab).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3862) Support for fetching specific configs and metrics based on prefixes

2015-11-25 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15026505#comment-15026505
 ] 

Varun Saxena commented on YARN-3862:


I think passing the ColumnPrefix object would be better, as we can then encode 
spaces as well.

> Support for fetching specific configs and metrics based on prefixes
> ---
>
> Key: YARN-3862
> URL: https://issues.apache.org/jira/browse/YARN-3862
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-3862-YARN-2928.wip.01.patch, 
> YARN-3862-YARN-2928.wip.02.patch, YARN-3862-feature-YARN-2928.wip.03.patch
>
>
> Currently, we will retrieve all the contents of a field if that field is 
> specified in the query API. In the case of configs and metrics, this can 
> become a lot of data even though the user doesn't need it. So we need to 
> provide a way to query only a set of configs or metrics.
> As a comma-separated list of configs/metrics to be returned would be quite 
> cumbersome to specify, we have to support one of the following options:
> # Prefix match
> # Regex
> # Group the configs/metrics and query that group.
> We also need a facility to specify a metric time window to return metrics in 
> that window. This may be useful in plotting graphs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4304) AM max resource configuration per partition to be displayed/updated correctly in UI and in various partition related metrics

2015-11-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15026402#comment-15026402
 ] 

Hadoop QA commented on YARN-4304:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 11m 
31s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 0s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 48s 
{color} | {color:green} trunk passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
20s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 54s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
20s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
47s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 39s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 36s 
{color} | {color:green} trunk passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
50s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 55s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 55s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 43s 
{color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 43s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 19s 
{color} | {color:red} Patch generated 5 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 (total was 224, now 225). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 53s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
19s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
59s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 39s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 38s 
{color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 76m 4s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 73m 1s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_85. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
37s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 176m 28s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestApplicationPriority |
|   | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
|   | hadoop.yarn.server.resourcemanager.scheduler.fair.TestSchedulingPolicy |
| JDK v1.7.0_85 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
\\
\\
|| Subsystem || 

[jira] [Commented] (YARN-3226) UI changes for decommissioning node

2015-11-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15026681#comment-15026681
 ] 

Hadoop QA commented on YARN-3226:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
52s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 31s 
{color} | {color:green} trunk passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 38s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
12s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s 
{color} | {color:green} trunk passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
35s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 32s 
{color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 32s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 12s 
{color} | {color:red} Patch generated 24 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 (total was 104, now 127). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 36s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
21s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 58m 12s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 59m 43s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_85. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
25s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 136m 7s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
|   | hadoop.yarn.server.resourcemanager.webapp.TestNodesPage |
| JDK v1.7.0_85 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices |
|   | 

[jira] [Updated] (YARN-4389) "yarn.am.blacklisting.enabled" and "yarn.am.blacklisting.disable-failure-threshold" should be app specific rather than a setting for whole YARN cluster

2015-11-25 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-4389:
--
Attachment: 0001-YARN-4389.patch

Hi [~djp]
Attaching an initial version of the patch. Kindly help to check the same.
Once this ticket is reviewed, I will raise a MapReduce ticket to pass this 
information from the RM side.

> "yarn.am.blacklisting.enabled" and 
> "yarn.am.blacklisting.disable-failure-threshold" should be app specific 
> rather than a setting for whole YARN cluster
> ---
>
> Key: YARN-4389
> URL: https://issues.apache.org/jira/browse/YARN-4389
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: applications
>Reporter: Junping Du
>Assignee: Sunil G
>Priority: Critical
> Attachments: 0001-YARN-4389.patch
>
>
> "yarn.am.blacklisting.enabled" and 
> "yarn.am.blacklisting.disable-failure-threshold" should be application 
> specific rather than a setting in cluster level, or we should't maintain 
> amBlacklistingEnabled and blacklistDisableThreshold in per rmApp level. We 
> should allow each am to override this config, i.e. via submissionContext.
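As a rough sketch of the direction (not this patch's API): today both knobs are read cluster-wide from the configuration, and the proposal is to let each application carry an override in its ApplicationSubmissionContext, falling back to these defaults. The default values used below are assumptions for illustration only.

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class AmBlacklistConfigSketch {
  public static void main(String[] args) {
    Configuration conf = new YarnConfiguration();
    // Today both knobs are cluster-wide settings read by the RM
    // (defaults below are illustrative, not the shipped values):
    boolean enabled = conf.getBoolean("yarn.am.blacklisting.enabled", false);
    float threshold =
        conf.getFloat("yarn.am.blacklisting.disable-failure-threshold", 0.8f);
    System.out.println("cluster-wide: enabled=" + enabled
        + ", disable-failure-threshold=" + threshold);
    // Per this JIRA, an application would instead be able to override these
    // two values through its ApplicationSubmissionContext (setter names to be
    // defined by the patch), with the values above acting only as defaults.
  }
}
{code}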



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4384) updateNodeResource CLI should not accept negative values for resource

2015-11-25 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15026736#comment-15026736
 ] 

Junping Du commented on YARN-4384:
--

Thanks [~leftnoteasy] for review and commit!

> updateNodeResource CLI should not accept negative values for resource
> -
>
> Key: YARN-4384
> URL: https://issues.apache.org/jira/browse/YARN-4384
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Sushmitha Sreenivasan
>Assignee: Junping Du
> Fix For: 2.8.0
>
> Attachments: YARN-4384.patch
>
>
> updateNodeResource CLI should not accept negative values for MemSize and 
> vCores.
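A minimal sketch of the kind of check intended here, kept separate from the actual RMAdminCLI code (the class and argument names are illustrative):

{code}
// Validate updateNodeResource-style arguments before building the request.
public class UpdateNodeResourceArgCheck {
  static void validate(int memSizeMb, int vCores) {
    if (memSizeMb < 0 || vCores < 0) {
      throw new IllegalArgumentException(
          "MemSize and vCores must be non-negative, got memSize="
              + memSizeMb + " vCores=" + vCores);
    }
  }

  public static void main(String[] args) {
    validate(4096, 8); // accepted
    try {
      validate(-1, 8);
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
{code}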



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4371) "yarn application -kill" should take multiple application ids

2015-11-25 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-4371:
--
Attachment: 0002-YARN-4371.patch

Hi [~ozawa]
Attaching an updated patch as per the latest comment. Since we kill the 
applications one by one, we need to wait until each one is killed. I think this 
is fine. Kindly help to check the same.
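To illustrate the kill-and-wait behavior described above, here is a standalone client-side sketch using YarnClient; it is not the patch itself, and the one-second polling interval is an arbitrary choice.

{code}
import java.util.EnumSet;

import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.YarnApplicationState;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.util.ConverterUtils;

public class KillApplicationsSketch {
  public static void main(String[] args) throws Exception {
    YarnClient client = YarnClient.createYarnClient();
    client.init(new YarnConfiguration());
    client.start();
    try {
      for (String idStr : args) {
        ApplicationId appId = ConverterUtils.toApplicationId(idStr);
        client.killApplication(appId);
        // Wait until the RM reports the application as finished.
        while (!EnumSet.of(YarnApplicationState.KILLED,
                YarnApplicationState.FINISHED,
                YarnApplicationState.FAILED)
            .contains(client.getApplicationReport(appId)
                .getYarnApplicationState())) {
          Thread.sleep(1000);
        }
        System.out.println("Killed " + idStr);
      }
    } finally {
      client.stop();
    }
  }
}
{code}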

> "yarn application -kill" should take multiple application ids
> -
>
> Key: YARN-4371
> URL: https://issues.apache.org/jira/browse/YARN-4371
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Tsuyoshi Ozawa
>Assignee: Sunil G
> Attachments: 0001-YARN-4371.patch, 0002-YARN-4371.patch
>
>
> Currently we cannot pass multiple applications to the "yarn application -kill" 
> command. The command should take multiple application IDs at the same time. 
> Entries should be separated with whitespace, like:
> {code}
> yarn application -kill application_1234_0001 application_1234_0007 
> application_1234_0012
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4386) refreshNodesGracefully() looks at active RMNode list for recommissioning decommissioned nodes

2015-11-25 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15026805#comment-15026805
 ] 

Junping Du commented on YARN-4386:
--

Thanks [~kshukla] for reporting this issue and [~sunilg] for the review!
I think the RECOMMISSION event shouldn't be applied to decommissioned nodes, as 
it won't have any effect, and we'd better stay consistent with the behavior 
that existed before graceful decommission was introduced.
Thus, I would prefer to change "if (entry.getValue().getState() == 
NodeState.DECOMMISSIONING || entry.getValue().getState() == 
NodeState.DECOMMISSIONED)" to "if (entry.getValue().getState() == 
NodeState.DECOMMISSIONING)" to get rid of the invalid-state exception in the 
state machine.
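In code, the suggested change shrinks the quoted condition to the DECOMMISSIONING case only, roughly as below (mirroring the snippet in the description; the elided lines are unchanged):

{code}
for (Entry<NodeId, RMNode> entry : rmContext.getRMNodes().entrySet()) {
  ...
  // Recommission only nodes that are still draining; DECOMMISSIONED nodes
  // live in the inactive map and keep their state.
  if (entry.getValue().getState() == NodeState.DECOMMISSIONING) {
    this.rmContext.getDispatcher().getEventHandler()
        .handle(new RMNodeEvent(nodeId, RMNodeEventType.RECOMMISSION));
  }
}
{code}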

> refreshNodesGracefully() looks at active RMNode list for recommissioning 
> decommissioned nodes
> -
>
> Key: YARN-4386
> URL: https://issues.apache.org/jira/browse/YARN-4386
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Minor
> Attachments: YARN-4386-v1.patch
>
>
> In refreshNodesGracefully(), during recommissioning, the entry set from 
> getRMNodes(), which holds only active nodes (RUNNING, DECOMMISSIONING, etc.), 
> is used to check for 'decommissioned' nodes, which are present only in the 
> getInactiveRMNodes() map. 
> {code}
> for (Entry entry:rmContext.getRMNodes().entrySet()) { 
> .
>  // Recommissioning the nodes
> if (entry.getValue().getState() == NodeState.DECOMMISSIONING
> || entry.getValue().getState() == NodeState.DECOMMISSIONED) {
>   this.rmContext.getDispatcher().getEventHandler()
>   .handle(new RMNodeEvent(nodeId, RMNodeEventType.RECOMMISSION));
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3226) UI changes for decommissioning node

2015-11-25 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-3226:
--
Attachment: 0003-YARN-3226.patch

Attaching a new patch to address the test failures and the relevant checkstyle 
warnings.

> UI changes for decommissioning node
> ---
>
> Key: YARN-3226
> URL: https://issues.apache.org/jira/browse/YARN-3226
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Junping Du
>Assignee: Sunil G
> Attachments: 0001-YARN-3226.patch, 0002-YARN-3226.patch, 
> 0003-YARN-3226.patch, ClusterMetricsOnNodes_UI.png
>
>
> Some initial thought is:
> decommissioning nodes should still show up in the active nodes list since 
> they are still running containers. 
> A separate decommissioning tab to filter for those nodes would be nice, 
> although I suppose users can also just use the jquery table to sort/search for
> nodes in that state from the active nodes list if it's too crowded to add yet 
> another node
> state tab (or maybe get rid of some effectively dead tabs like the reboot 
> state tab).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-914) (Umbrella) Support graceful decommission of nodemanager

2015-11-25 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-914:

Component/s: graceful

> (Umbrella) Support graceful decommission of nodemanager
> ---
>
> Key: YARN-914
> URL: https://issues.apache.org/jira/browse/YARN-914
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: graceful
>Affects Versions: 2.0.4-alpha
>Reporter: Luke Lu
>Assignee: Junping Du
> Attachments: Gracefully Decommission of NodeManager (v1).pdf, 
> Gracefully Decommission of NodeManager (v2).pdf, 
> GracefullyDecommissionofNodeManagerv3.pdf
>
>
> When NMs are decommissioned for non-fault reasons (capacity change etc.), 
> it's desirable to minimize the impact on running applications.
> Currently, if an NM is decommissioned, all running containers on the NM need to 
> be rescheduled on other NMs. Furthermore, for finished map tasks, if their 
> map outputs have not been fetched by the job's reducers, those map tasks will 
> need to be rerun as well.
> We propose to introduce a mechanism to optionally gracefully decommission a 
> node manager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1506) Replace set resource change on RMNode/SchedulerNode directly with event notification.

2015-11-25 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-1506:
-
Component/s: graceful

> Replace set resource change on RMNode/SchedulerNode directly with event 
> notification.
> -
>
> Key: YARN-1506
> URL: https://issues.apache.org/jira/browse/YARN-1506
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful, nodemanager, scheduler
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: YARN-1506-v1.patch, YARN-1506-v10.patch, 
> YARN-1506-v11.patch, YARN-1506-v12.patch, YARN-1506-v13.patch, 
> YARN-1506-v14.patch, YARN-1506-v15.patch, YARN-1506-v16.patch, 
> YARN-1506-v17.patch, YARN-1506-v2.patch, YARN-1506-v3.patch, 
> YARN-1506-v4.patch, YARN-1506-v5.patch, YARN-1506-v6.patch, 
> YARN-1506-v7.patch, YARN-1506-v8.patch, YARN-1506-v9.patch
>
>
> According to Vinod's comments on YARN-312 
> (https://issues.apache.org/jira/browse/YARN-312?focusedCommentId=13846087=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13846087),
>  we should replace RMNode.setResourceOption() with some resource change event.
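
As a rough illustration of the event-driven direction (the event class and its 
constructor here are assumptions for the sketch, not a committed API), the admin path 
would dispatch an event for the RMNode's state machine to handle instead of mutating 
the node directly:
{code}
// Hypothetical sketch: instead of calling rmNode.setResourceOption(option) directly,
// the admin service posts an event carrying the new ResourceOption; RMNodeImpl then
// applies it inside its own state machine.
ResourceOption option = ResourceOption.newInstance(
    Resource.newInstance(8192, 8), -1 /* -1 = no over-commit timeout */);
rmContext.getDispatcher().getEventHandler()
    .handle(new RMNodeResourceUpdateEvent(nodeId, option));
{code}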



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-999) In case of long running tasks, reduce node resource should balloon out resource quickly by calling preemption API and suspending running task.

2015-11-25 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-999:

Component/s: graceful

> In case of long running tasks, reduce node resource should balloon out 
> resource quickly by calling preemption API and suspending running task. 
> ---
>
> Key: YARN-999
> URL: https://issues.apache.org/jira/browse/YARN-999
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful, nodemanager, scheduler
>Reporter: Junping Du
>Assignee: Junping Du
>
> In the current design and implementation, when we decrease a node's resource to 
> less than the resource consumption of the currently running tasks, those tasks can 
> still run to completion; no new task gets assigned to this node (because 
> AvailableResource < 0) until some tasks finish and AvailableResource > 0 again. 
> This is good for most cases, but for long-running tasks it could be too slow for 
> the resource setting to actually take effect, so preemption could be employed here.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-998) Persistent resource change during NM/RM restart

2015-11-25 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-998:

Component/s: graceful

> Persistent resource change during NM/RM restart
> ---
>
> Key: YARN-998
> URL: https://issues.apache.org/jira/browse/YARN-998
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful, nodemanager, scheduler
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: YARN-998-sample.patch
>
>
> When an NM is restarted, whether planned or after a failure, the previous dynamic 
> resource setting should be kept for consistency.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-996) REST API support for node resource configuration

2015-11-25 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-996:

Component/s: graceful

> REST API support for node resource configuration
> 
>
> Key: YARN-996
> URL: https://issues.apache.org/jira/browse/YARN-996
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful, nodemanager, scheduler
>Reporter: Junping Du
>Assignee: Tsuyoshi Ozawa
> Attachments: YARN-996-2.patch, YARN-996-sample.patch
>
>
> Besides the admin protocol and CLI, a REST API should also be supported for node 
> resource configuration.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-313) Add Admin API for supporting node resource configuration in command line

2015-11-25 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-313:

Component/s: graceful

> Add Admin API for supporting node resource configuration in command line
> 
>
> Key: YARN-313
> URL: https://issues.apache.org/jira/browse/YARN-313
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: client, graceful
>Reporter: Junping Du
>Assignee: Inigo Goiri
>Priority: Critical
> Fix For: 2.8.0
>
> Attachments: YARN-313-sample.patch, YARN-313-v1.patch, 
> YARN-313-v10.patch, YARN-313-v11.patch, YARN-313-v2.patch, YARN-313-v3.patch, 
> YARN-313-v4.patch, YARN-313-v5.patch, YARN-313-v6.patch, YARN-313-v7.patch, 
> YARN-313-v8.patch, YARN-313-v9.patch
>
>
> We should provide some admin interface, e.g. "yarn rmadmin -refreshResources", 
> to support changes to a node's resources as specified in a config file.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-312) Add updateNodeResource in ResourceManagerAdministrationProtocol

2015-11-25 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-312:

Component/s: graceful

> Add updateNodeResource in ResourceManagerAdministrationProtocol
> ---
>
> Key: YARN-312
> URL: https://issues.apache.org/jira/browse/YARN-312
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, graceful
>Affects Versions: 2.2.0
>Reporter: Junping Du
>Assignee: Junping Du
> Fix For: 2.3.0
>
> Attachments: YARN-312-v1.patch, YARN-312-v10.patch, 
> YARN-312-v2.patch, YARN-312-v3.patch, YARN-312-v4.1.patch, YARN-312-v4.patch, 
> YARN-312-v5.1.patch, YARN-312-v5.patch, YARN-312-v6.patch, 
> YARN-312-v7.1.patch, YARN-312-v7.1.patch, YARN-312-v7.patch, 
> YARN-312-v8.patch, YARN-312-v9.patch
>
>
> Add the fundamental RPC (ResourceManagerAdministrationProtocol) to support node 
> resource changes. For design details, please refer to the parent JIRA: YARN-291.
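
For illustration, a minimal client-side sketch of invoking such an RPC through the 
admin protocol; the node address and resource values are made up, and the snippet is a 
sketch of the record-based API rather than the actual patch:
{code}
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.NodeId;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.api.records.ResourceOption;
import org.apache.hadoop.yarn.client.ClientRMProxy;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.server.api.ResourceManagerAdministrationProtocol;
import org.apache.hadoop.yarn.server.api.protocolrecords.UpdateNodeResourceRequest;

public class UpdateNodeResourceExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new YarnConfiguration();
    ResourceManagerAdministrationProtocol adminProtocol =
        ClientRMProxy.createRMProxy(conf, ResourceManagerAdministrationProtocol.class);

    // Map each node to its new ResourceOption (resource + over-commit timeout).
    Map<NodeId, ResourceOption> updates = new HashMap<NodeId, ResourceOption>();
    updates.put(NodeId.newInstance("node1.example.com", 45454),         // example node
        ResourceOption.newInstance(Resource.newInstance(8192, 8), -1)); // -1: no timeout

    adminProtocol.updateNodeResource(UpdateNodeResourceRequest.newInstance(updates));
  }
}
{code}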



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1000) Dynamic resource configuration feature can be configured to enable or disable and persistent on setting or not

2015-11-25 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-1000:
-
Component/s: graceful

> Dynamic resource configuration feature can be configured to enable or disable 
> and persistent on setting or not
> --
>
> Key: YARN-1000
> URL: https://issues.apache.org/jira/browse/YARN-1000
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful, nodemanager, scheduler
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: YARN-1000-sample.patch
>
>
> There are some configurations for the dynamic resource configuration feature:
> 1. enabled or not: if enabled, setting node resources at runtime through 
> CLI/REST/JMX can succeed; otherwise a "function not supported" exception 
> will be thrown. In the future, we may support enabling this feature on only a 
> subset of nodes that have resource flexibility (like virtual nodes).
> 2. dynamic resource setting is persistent or not: it depends on the user's 
> scenario whether a setting made at runtime should be kept after the 
> NM goes down and restarts.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-997) JMX support for node resource configuration

2015-11-25 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-997:

Component/s: graceful

> JMX support for node resource configuration
> ---
>
> Key: YARN-997
> URL: https://issues.apache.org/jira/browse/YARN-997
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful, nodemanager, scheduler
>Reporter: Junping Du
>
> Besides the YARN CLI and REST API, we can enable a JMX interface to change a 
> node's resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3224) Notify AM with containers (on decommissioning node) could be preempted after timeout.

2015-11-25 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-3224:
-
Component/s: graceful

> Notify AM with containers (on decommissioning node) could be preempted after 
> timeout.
> -
>
> Key: YARN-3224
> URL: https://issues.apache.org/jira/browse/YARN-3224
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful
>Reporter: Junping Du
>Assignee: Sunil G
> Attachments: 0001-YARN-3224.patch, 0002-YARN-3224.patch
>
>
> We should leverage the YARN preemption framework to notify the AM that some 
> containers will be preempted after a timeout.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1394) RM to inform AMs when a container completed due to NM going offline -planned or unplanned

2015-11-25 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-1394:
-
Component/s: graceful

> RM to inform AMs when a container completed due to NM going offline -planned 
> or unplanned
> -
>
> Key: YARN-1394
> URL: https://issues.apache.org/jira/browse/YARN-1394
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful
>Reporter: Steve Loughran
>Assignee: Rohith Sharma K S
>
> YARN-914 proposes graceful decommission of an NM, and NMs already have the 
> right to go offline.
> If AMs could be told whether a container completed because the NM went offline vs. 
> was decommissioned, the AM could use that in its future blacklisting and placement 
> policy. 
> This matters for long-lived services, which may want to place new instances 
> where they were placed before and track hosts' failure rates.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3212) RMNode State Transition Update with DECOMMISSIONING state

2015-11-25 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-3212:
-
Component/s: graceful

> RMNode State Transition Update with DECOMMISSIONING state
> -
>
> Key: YARN-3212
> URL: https://issues.apache.org/jira/browse/YARN-3212
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful, resourcemanager
>Reporter: Junping Du
>Assignee: Junping Du
> Fix For: 2.8.0
>
> Attachments: RMNodeImpl - new.png, YARN-3212-v1.patch, 
> YARN-3212-v2.patch, YARN-3212-v3.patch, YARN-3212-v4.1.patch, 
> YARN-3212-v4.patch, YARN-3212-v5.1.patch, YARN-3212-v5.patch, 
> YARN-3212-v6.1.patch, YARN-3212-v6.2.patch, YARN-3212-v6.patch
>
>
> As proposed in YARN-914, a new state, “DECOMMISSIONING”, will be added; a node 
> can transition into it from the “running” state, triggered by a new 
> “decommissioning” event. 
> This new state can transition to “decommissioned” on Resource_Update if there are 
> no running apps on this NM, when the NM reconnects after a restart, or when it 
> receives a DECOMMISSIONED event (after the timeout from the CLI).
> In addition, it can go back to “running” if the user decides to cancel the previous 
> decommission by calling recommission on the same node. The reaction to other 
> events is similar to the RUNNING state.
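
To make the proposal concrete, here is an illustrative fragment using YARN's 
StateMachineFactory; the event names and transition helper classes are assumptions 
for the sketch, not the actual patch:
{code}
// Sketch only: the kind of transitions the description implies for RMNodeImpl.
// Event names (GRACEFUL_DECOMMISSION, DECOMMISSION, RECOMMISSION) and the transition
// helper classes are assumed here for illustration.
new StateMachineFactory<RMNodeImpl, NodeState, RMNodeEventType, RMNodeEvent>(NodeState.NEW)
    // RUNNING -> DECOMMISSIONING when the graceful-decommission event arrives
    .addTransition(NodeState.RUNNING, NodeState.DECOMMISSIONING,
        RMNodeEventType.GRACEFUL_DECOMMISSION, new DecommissioningNodeTransition())
    // DECOMMISSIONING -> DECOMMISSIONED once no apps remain or the CLI timeout fires
    .addTransition(NodeState.DECOMMISSIONING, NodeState.DECOMMISSIONED,
        RMNodeEventType.DECOMMISSION, new DeactivateNodeTransition(NodeState.DECOMMISSIONED))
    // DECOMMISSIONING -> RUNNING if the admin recommissions the node before timeout
    .addTransition(NodeState.DECOMMISSIONING, NodeState.RUNNING,
        RMNodeEventType.RECOMMISSION, new RecommissionNodeTransition(NodeState.RUNNING))
    .installTopology();
{code}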



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3225) New parameter or CLI for decommissioning node gracefully in RMAdmin CLI

2015-11-25 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-3225:
-
Component/s: graceful

> New parameter or CLI for decommissioning node gracefully in RMAdmin CLI
> ---
>
> Key: YARN-3225
> URL: https://issues.apache.org/jira/browse/YARN-3225
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful
>Reporter: Junping Du
>Assignee: Devaraj K
> Fix For: 2.8.0
>
> Attachments: YARN-3225-1.patch, YARN-3225-2.patch, YARN-3225-3.patch, 
> YARN-3225-4.patch, YARN-3225-5.patch, YARN-3225.patch, YARN-914.patch
>
>
> A new CLI (or an existing CLI with new parameters) should put each node on the 
> decommission list into decommissioning status and track a timeout to terminate 
> the nodes that haven't finished.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3223) Resource update during NM graceful decommission

2015-11-25 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-3223:
-
Component/s: graceful

> Resource update during NM graceful decommission
> ---
>
> Key: YARN-3223
> URL: https://issues.apache.org/jira/browse/YARN-3223
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful, nodemanager, resourcemanager
>Affects Versions: 2.7.1
>Reporter: Junping Du
>Assignee: Brook Zhou
> Attachments: YARN-3223-v0.patch, YARN-3223-v1.patch, 
> YARN-3223-v2.patch
>
>
> During NM graceful decommission, we should handle resource updates properly, 
> including: making RMNode keep track of the old resource for possible rollback, 
> keeping the available resource at 0, and updating the used resource as 
> containers finish.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-666) [Umbrella] Support rolling upgrades in YARN

2015-11-25 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-666:

Component/s: rolling upgrade
 graceful

> [Umbrella] Support rolling upgrades in YARN
> ---
>
> Key: YARN-666
> URL: https://issues.apache.org/jira/browse/YARN-666
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: graceful, rolling upgrade
>Affects Versions: 2.0.4-alpha
>Reporter: Siddharth Seth
> Fix For: 2.6.0
>
> Attachments: YARN_Rolling_Upgrades.pdf, YARN_Rolling_Upgrades_v2.pdf
>
>
> Jira to track changes required in YARN to allow rolling upgrades, including 
> documentation and possible upgrade routes. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1197) Support changing resources of an allocated container

2015-11-25 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-1197:
-
Component/s: graceful

> Support changing resources of an allocated container
> 
>
> Key: YARN-1197
> URL: https://issues.apache.org/jira/browse/YARN-1197
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: api, graceful, nodemanager, resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Wangda Tan
> Attachments: YARN-1197 old-design-docs-patches-for-reference.zip, 
> YARN-1197_Design.2015.06.24.pdf, YARN-1197_Design.2015.07.07.pdf, 
> YARN-1197_Design.2015.08.21.pdf, YARN-1197_Design.pdf
>
>
> The current YARN resource management logic assumes the resource allocated to a 
> container is fixed during its lifetime. When users want to change the resource 
> of an allocated container, the only way is to release it and allocate a new 
> container with the expected size.
> Allowing run-time changes to the resources of an allocated container will give us 
> better control of resource usage on the application side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-556) [Umbrella] RM Restart phase 2 - Work preserving restart

2015-11-25 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-556:

Component/s: rolling upgrade
 graceful

> [Umbrella] RM Restart phase 2 - Work preserving restart
> ---
>
> Key: YARN-556
> URL: https://issues.apache.org/jira/browse/YARN-556
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: graceful, resourcemanager, rolling upgrade
>Reporter: Bikas Saha
> Attachments: Work Preserving RM Restart.pdf, 
> WorkPreservingRestartPrototype.001.patch, YARN-1372.prelim.patch
>
>
> YARN-128 covered storing the state needed for the RM to recover critical 
> information. This umbrella jira will track changes needed to recover the 
> running state of the cluster so that work can be preserved across RM restarts.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4386) refreshNodesGracefully() looks at active RMNode list for recommissioning decommissioned nodes

2015-11-25 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-4386:
-
Component/s: graceful

> refreshNodesGracefully() looks at active RMNode list for recommissioning 
> decommissioned nodes
> -
>
> Key: YARN-4386
> URL: https://issues.apache.org/jira/browse/YARN-4386
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: graceful
>Affects Versions: 3.0.0
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Minor
> Attachments: YARN-4386-v1.patch
>
>
> In refreshNodesGracefully(), during recommissioning, the entryset from 
> getRMNodes() which has only active nodes (RUNNING, DECOMMISSIONING etc.) is 
> used for checking 'decommissioned' nodes which are present in 
> getInactiveRMNodes() map alone. 
> {code}
> for (Entry<NodeId, RMNode> entry : rmContext.getRMNodes().entrySet()) { 
> .
>  // Recommissioning the nodes
> if (entry.getValue().getState() == NodeState.DECOMMISSIONING
> || entry.getValue().getState() == NodeState.DECOMMISSIONED) {
>   this.rmContext.getDispatcher().getEventHandler()
>   .handle(new RMNodeEvent(nodeId, RMNodeEventType.RECOMMISSION));
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-291) [Umbrella] Dynamic resource configuration

2015-11-25 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-291:

Component/s: graceful

> [Umbrella] Dynamic resource configuration
> -
>
> Key: YARN-291
> URL: https://issues.apache.org/jira/browse/YARN-291
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: graceful, nodemanager, scheduler
>Reporter: Junping Du
>Assignee: Junping Du
>  Labels: features
> Attachments: Elastic Resources for YARN-v0.2.pdf, 
> YARN-291-AddClientRMProtocolToSetNodeResource-03.patch, 
> YARN-291-CoreAndAdmin.patch, YARN-291-JMXInterfaceOnNM-02.patch, 
> YARN-291-OnlyUpdateWhenResourceChange-01-fix.patch, 
> YARN-291-YARNClientCommandline-04.patch, YARN-291-all-v1.patch, 
> YARN-291-core-HeartBeatAndScheduler-01.patch
>
>
> The current Hadoop YARN resource management logic assumes per-node resources 
> are static during the lifetime of the NM process. Allowing run-time 
> configuration of per-node resources will give us finer-grained resource 
> elasticity. This allows Hadoop workloads to coexist with other workloads on 
> the same hardware efficiently, whether or not the environment is virtualized. 
> More background and design details can be found in the attached proposal.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-311) Dynamic node resource configuration: core scheduler changes

2015-11-25 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-311:

Component/s: graceful

> Dynamic node resource configuration: core scheduler changes
> ---
>
> Key: YARN-311
> URL: https://issues.apache.org/jira/browse/YARN-311
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful, resourcemanager, scheduler
>Reporter: Junping Du
>Assignee: Junping Du
> Fix For: 2.3.0
>
> Attachments: YARN-311-v1.patch, YARN-311-v10.patch, 
> YARN-311-v11.patch, YARN-311-v12.patch, YARN-311-v12b.patch, 
> YARN-311-v13.patch, YARN-311-v2.patch, YARN-311-v3.patch, YARN-311-v4.patch, 
> YARN-311-v4.patch, YARN-311-v5.patch, YARN-311-v6.1.patch, 
> YARN-311-v6.2.patch, YARN-311-v6.patch, YARN-311-v7.patch, YARN-311-v8.patch, 
> YARN-311-v9.patch
>
>
> As the first step, we go for resource changes on the RM side and will expose admin 
> APIs (admin protocol, CLI, REST and JMX API) later. This JIRA will only 
> contain changes in the scheduler. 
> The flow for updating a node's resource and making scheduling aware of it is: 
> 1. The resource update goes through the admin API to the RM and takes effect on 
> RMNodeImpl.
> 2. When the next NM status heartbeat comes, the RMNode's resource change is picked 
> up and the delta resource is added to the SchedulerNode's availableResource before 
> actual scheduling happens (see the sketch below).
> 3. The scheduler allocates resources according to the new availableResource in 
> SchedulerNode.
> For more design details, please refer to the proposal and discussions in the parent 
> JIRA: YARN-291.
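
A minimal sketch of step 2, using the Resources helper class; the accessor names on 
RMNode/SchedulerNode and the direct mutation of the available resource are assumptions 
for illustration, not the actual patch:
{code}
// Sketch only: fold the node's new total capability into the scheduler's view
// before the next scheduling cycle runs.
Resource newTotal = rmNode.getTotalCapability();
Resource delta = Resources.subtract(newTotal, schedulerNode.getTotalResource());
if (!delta.equals(Resources.none())) {
  // Add the delta to what the scheduler considers available on this node.
  Resources.addTo(schedulerNode.getAvailableResource(), delta);
}
{code}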



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1949) Add admin ACL check to AdminService#updateNodeResource()

2015-11-25 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-1949:
-
Component/s: graceful

> Add admin ACL check to AdminService#updateNodeResource()
> 
>
> Key: YARN-1949
> URL: https://issues.apache.org/jira/browse/YARN-1949
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful, resourcemanager
>Reporter: Kenji Kikushima
>Assignee: Kenji Kikushima
> Attachments: YARN-1949.patch
>
>
> At present, updateNodeResource() doesn't check ACL. We should call 
> checkAcls() before setResourceOption().



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4160) Dynamic NM Resources Configuration file should be simplified.

2015-11-25 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-4160:
-
Component/s: graceful

> Dynamic NM Resources Configuration file should be simplified.
> -
>
> Key: YARN-4160
> URL: https://issues.apache.org/jira/browse/YARN-4160
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful, nodemanager, scheduler
>Reporter: Junping Du
>Assignee: Junping Du
>
> In YARN-313, we provide a CLI to refresh NMs' resources dynamically. The format 
> of dynamic-resources.xml is something like the following:
> {noformat}
> <configuration>
>   <property>
>     <name>yarn.resource.dynamic.node_id_1.vcores</name>
>     <value>16</value>
>   </property>
>   <property>
>     <name>yarn.resource.dynamic.node_id_1.memory</name>
>     <value>1024</value>
>   </property>
> </configuration>
> {noformat}
> Per the review comments on YARN-313, this looks too redundant. We should have a 
> better, more concise format.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4345) yarn rmadmin -updateNodeResource doesn't work

2015-11-25 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-4345:
-
Component/s: graceful

> yarn rmadmin -updateNodeResource doesn't work
> -
>
> Key: YARN-4345
> URL: https://issues.apache.org/jira/browse/YARN-4345
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful, resourcemanager
>Affects Versions: 2.8.0
>Reporter: Sushmitha Sreenivasan
>Assignee: Junping Du
>Priority: Critical
> Fix For: 2.8.0
>
> Attachments: YARN-4345-v2.patch, YARN-4345-v3.patch, YARN-4345.patch
>
>
> YARN-313 added a CLI to update node resources. It works fine for batch-mode 
> updates. However, for a single-node update, "yarn rmadmin -updateNodeResource" 
> fails to work because the resource is not set properly in the outgoing request.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4384) updateNodeResource CLI should not accept negative values for resource

2015-11-25 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-4384:
-
Component/s: graceful

> updateNodeResource CLI should not accept negative values for resource
> -
>
> Key: YARN-4384
> URL: https://issues.apache.org/jira/browse/YARN-4384
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful, resourcemanager
>Affects Versions: 2.8.0
>Reporter: Sushmitha Sreenivasan
>Assignee: Junping Du
> Fix For: 2.8.0
>
> Attachments: YARN-4384.patch
>
>
> updateNodeResource CLI should not accept negative values for MemSize and 
> vCores.
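
A minimal sketch of the kind of validation this implies in the CLI before the update 
request is built; the argument positions and message text are made up for illustration:
{code}
// Sketch only: reject negative values before building the update request.
int memSize = Integer.parseInt(args[i + 1]);   // hypothetical argument positions
int vCores  = Integer.parseInt(args[i + 2]);
if (memSize < 0 || vCores < 0) {
  throw new IllegalArgumentException("Memory and vcores given to -updateNodeResource"
      + " must be non-negative, but got " + memSize + " MB and " + vCores + " vcores");
}
{code}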



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2489) ResouceOption's overcommitTimeout should be respected during resource update on NM

2015-11-25 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-2489:
-
Component/s: graceful

> ResouceOption's overcommitTimeout should be respected during resource update 
> on NM
> --
>
> Key: YARN-2489
> URL: https://issues.apache.org/jira/browse/YARN-2489
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful, nodemanager, scheduler
>Reporter: Junping Du
>Assignee: Junping Du
>
> The ResourceOption used to update an NM's resource has two properties: Resource 
> and OvercommitTimeout. The latter property is used to guarantee the resource is 
> withdrawn after the timeout is hit when the resource is reduced to a value below 
> the current resource consumption. It currently uses the default value -1, 
> which means no timeout, and we should make this property work when updating 
> NM resources.
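
A small sketch of passing an explicit over-commit timeout instead of the -1 default 
when constructing the ResourceOption; the concrete values are examples only:
{code}
// Sketch only: pass an explicit, non-negative over-commit timeout instead of the
// default -1 ("no timeout"), so the timeout can actually be enforced.
ResourceOption option = ResourceOption.newInstance(
    Resource.newInstance(4096, 4),  // reduced node resource (example values)
    120);                           // example over-commit timeout instead of -1
{code}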



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1508) Rename ResourceOption and document resource over-commitment cases

2015-11-25 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-1508:
-
Component/s: graceful

> Rename ResourceOption and document resource over-commitment cases
> -
>
> Key: YARN-1508
> URL: https://issues.apache.org/jira/browse/YARN-1508
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful, nodemanager, scheduler
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Minor
>
> Per Vinod's comment in 
> YARN-312(https://issues.apache.org/jira/browse/YARN-312?focusedCommentId=13846087)
>  and Bikas' comment in 
> YARN-311(https://issues.apache.org/jira/browse/YARN-311?focusedCommentId=13848615),
>  the name ResourceOption is not easy to understand. Also, we 
> need to document more about resource over-commitment timing and use cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1336) [Umbrella] Work-preserving nodemanager restart

2015-11-25 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-1336:
-
Component/s: rolling upgrade
 graceful

> [Umbrella] Work-preserving nodemanager restart
> --
>
> Key: YARN-1336
> URL: https://issues.apache.org/jira/browse/YARN-1336
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: graceful, nodemanager, rolling upgrade
>Affects Versions: 2.3.0
>Reporter: Jason Lowe
>Assignee: Jason Lowe
> Attachments: NMRestartDesignOverview.pdf, YARN-1336-rollup-v2.patch, 
> YARN-1336-rollup.patch
>
>
> This serves as an umbrella ticket for tasks related to work-preserving 
> nodemanager restart.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3226) UI changes for decommissioning node

2015-11-25 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-3226:
-
Component/s: graceful

> UI changes for decommissioning node
> ---
>
> Key: YARN-3226
> URL: https://issues.apache.org/jira/browse/YARN-3226
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful
>Reporter: Junping Du
>Assignee: Sunil G
> Attachments: 0001-YARN-3226.patch, 0002-YARN-3226.patch, 
> 0003-YARN-3226.patch, ClusterMetricsOnNodes_UI.png
>
>
> Some initial thought is:
> decommissioning nodes should still show up in the active nodes list since 
> they are still running containers. 
> A separate decommissioning tab to filter for those nodes would be nice, 
> although I suppose users can also just use the jquery table to sort/search for
> nodes in that state from the active nodes list if it's too crowded to add yet 
> another node
> state tab (or maybe get rid of some effectively dead tabs like the reboot 
> state tab).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4377) Confusing logs when killing container process

2015-11-25 Thread Jaromir Vanek (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jaromir Vanek updated YARN-4377:

Environment: Debian 7  (was: Debian linux)
   Priority: Minor  (was: Critical)
Description: 
Debug logs seem to be confusing when stating {{Sending signal to pid 20748 as 
user _submitter_}}.

The nodemanager actually sends signals as the {{yarn}} user when using 
{{DefaultContainerExecutor}}.

Complete nodemanager log:
{quote}
2015-11-20 15:38:22,063 DEBUG 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
 Got pid 20748 for container container_1443786884805_2298_01_03
2015-11-20 15:38:22,063 DEBUG 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
 Sending signal to pid 20748 as user _submitter_ for container 
container_1443786884805_2298_01_03
2015-11-20 15:38:22,063 DEBUG 
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Sending 
signal 15 to pid 20748 as user _submitter_
2015-11-20 15:38:22,069 DEBUG 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
 Sent signal SIGTERM to pid 20748 as user _submitter_ for container 
container_1443786884805_2298_01_03, result=failed
2015-11-20 15:38:22,319 DEBUG 
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Sending 
signal 9 to pid 20748 as user _submitter_
{quote}

{{SIGTERM}} and the following {{SIGKILL}} signals seem to be sent with the 
*submitter* user's permissions, but this is not the case when the container process 
is running under the *yarn* user by default.

What is the purpose of having the submitter user in the logs?

  was:
It seems my processes in containers are not killed when the whole job is 
killed. Containers will hang in the {{KILLING}} state forever.

The root of this problem is that the signals sent to the container process are sent 
with the wrong user's permissions. 

From the nodemanager log:
{quote}
2015-11-20 15:38:22,063 DEBUG 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
 Got pid 20748 for container container_1443786884805_2298_01_03
2015-11-20 15:38:22,063 DEBUG 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
 Sending signal to pid 20748 as user _submitter_ for container 
container_1443786884805_2298_01_03
2015-11-20 15:38:22,063 DEBUG 
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Sending 
signal 15 to pid 20748 as user _submitter_
2015-11-20 15:38:22,069 DEBUG 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
 Sent signal SIGTERM to pid 20748 as user _submitter_ for container 
container_1443786884805_2298_01_03, result=failed
2015-11-20 15:38:22,319 DEBUG 
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Sending 
signal 9 to pid 20748 as user _submitter_
{quote}

{{SIGTERM}} and the following {{SIGKILL}} signals are sent with the *submitter* 
user's permissions, but the container process is running under the *yarn* user by 
default (when using {{DefaultContainerExecutor}}, which is true in my case). The 
result is that the signals are ignored and the container will run forever.

Am I doing something wrong or is it a bug?

Summary: Confusing logs when killing container process  (was: Container 
process not killed)

> Confusing logs when killing container process
> -
>
> Key: YARN-4377
> URL: https://issues.apache.org/jira/browse/YARN-4377
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.6.0
> Environment: Debian 7
>Reporter: Jaromir Vanek
>Priority: Minor
>
> Debug logs seem to be confusing when stating {{Sending signal to pid 20748 as 
> user _submitter_}}.
> The nodemanager actually sends signals as the {{yarn}} user when using 
> {{DefaultContainerExecutor}}.
> Complete nodemanager log:
> {quote}
> 2015-11-20 15:38:22,063 DEBUG 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
>  Got pid 20748 for container container_1443786884805_2298_01_03
> 2015-11-20 15:38:22,063 DEBUG 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
>  Sending signal to pid 20748 as user _submitter_ for container 
> container_1443786884805_2298_01_03
> 2015-11-20 15:38:22,063 DEBUG 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Sending 
> signal 15 to pid 20748 as user _submitter_
> 2015-11-20 15:38:22,069 DEBUG 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
>  Sent signal SIGTERM to pid 20748 as user _submitter_ for container 
> container_1443786884805_2298_01_03, result=failed
> 2015-11-20 15:38:22,319 DEBUG 
>