[jira] [Commented] (YARN-7581) HBase filters are not constructed correctly in ATSv2

2018-03-22 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16410835#comment-16410835
 ] 

genericqa commented on YARN-7581:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
30s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} branch-2 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 
51s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
21s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
14s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
28s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
19s{color} | {color:green} branch-2 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
15s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
24s{color} | {color:green} hadoop-yarn-server-timelineservice-hbase-client in 
the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
29s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 19m  2s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:dbd69cb |
| JIRA Issue | YARN-7581 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12915598/YARN-7581-branch-2.05.patch
 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 7c16b2ab8dec 4.4.0-89-generic #112-Ubuntu SMP Mon Jul 31 
19:38:41 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | branch-2 / 3db8c7f |
| maven | version: Apache Maven 3.3.9 
(bb52d8502b132ec0a5a3f4c09453c07478323dc5; 2015-11-10T16:41:47+00:00) |
| Default Java | 1.7.0_151 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/20060/testReport/ |
| Max. process+thread count | 75 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase/hadoop-yarn-server-timelineservice-hbase-client
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase/hadoop-yarn-server-timelineservice-hbase-client
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/20060/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> HBase filters are not constructed correctly in ATSv2
> 
>
> Key: YARN-7581
> URL: https://issues.apache.org/jira/browse/YARN-7581
>  

[jira] [Updated] (YARN-8041) Federation: Implement multiple interfaces(14 interfaces), routing REST invocations transparently to multiple RMs

2018-03-22 Thread Yiran Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yiran Wu updated YARN-8041:
---
Description: 
Implement routing of the 
getAppStatistics/getAppState/getNodeToLabels/getLabelsOnNode/updateApplicationPriority/getAppQueue/updateAppQueue/getAppTimeout/getAppTimeouts/updateApplicationTimeout/getAppAttempts/getAppAttempt/getContainers/getContainer
 REST invocations transparently to multiple RMs.


I think we need to add a new web protocol for the Router, something like:

{code:java}
public interface RouterWebServiceProtocol extends RMWebServiceProtocol {
  List<SubClusterInfo> getAllSubClusterInfo();
  ClusterInfo getSubClusterInfo(String clusterId);
  SchedulerInfoType getSchedulerInfo(String subClusterId);
}
{code}


This is because the Router needs some additional protocol methods, such as 
getAllSubClusterInfo(): List<SubClusterInfo>, getSubClusterInfo(clusterId): ClusterInfo, 
and getSchedulerInfo(subClusterId): SchedulerInfo. If needed, I can do it.
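As a rough illustration of "routing transparently to multiple RMs", below is a 
minimal fan-out sketch on the Router side; RMWebClient, clientFor() and 
getApps() are hypothetical stand-ins, not the API of the attached patches:

{code:java}
// Hypothetical sketch only: forward one REST invocation to every sub-cluster
// RM and merge the per-RM results into a single response.
public List<AppInfo> getAppsFromAllSubClusters() {
  List<AppInfo> merged = new ArrayList<>();
  for (SubClusterInfo subCluster : getAllSubClusterInfo()) {
    // clientFor() resolves this sub-cluster's RM web address and returns a
    // client that replays the same REST call against that RM (illustrative).
    RMWebClient rm = clientFor(subCluster);
    merged.addAll(rm.getApps());
  }
  return merged;
}
{code}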

  was:
Implement routing of the 
getAppStatistics/getAppState/getNodeToLabels/getLabelsOnNode/updateApplicationPriority/getAppQueue/updateAppQueue/getAppTimeout/getAppTimeouts/updateApplicationTimeout/getAppAttempts/getAppAttempt/getContainers/getContainer
 REST invocations transparently to multiple RMs.


I think we need to add a new web protocol for the Router, something like:
public interface RouterWebServiceProtocol extends RMWebServiceProtocol {
  List<SubClusterInfo> getAllSubClusterInfo();
  ClusterInfo getSubClusterInfo(String clusterId);
  SchedulerInfoType getSchedulerInfo(String subClusterId);
}

This is because the Router needs some additional protocol methods, such as 
getAllSubClusterInfo(): List<SubClusterInfo>, getSubClusterInfo(clusterId): ClusterInfo, 
and getSchedulerInfo(subClusterId): SchedulerInfo. If needed, I can do it.


> Federation: Implement multiple interfaces(14 interfaces), routing REST 
> invocations transparently to multiple RMs 
> -
>
> Key: YARN-8041
> URL: https://issues.apache.org/jira/browse/YARN-8041
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: federation
>Affects Versions: 2.9.0, 3.0.0
>Reporter: Yiran Wu
>Priority: Major
>  Labels: patch
> Attachments: YARN-8041.001.patch, YARN-8041.002.patch, 
> YARN-8041.003.patch
>
>
> Implement routing of the 
> getAppStatistics/getAppState/getNodeToLabels/getLabelsOnNode/updateApplicationPriority/getAppQueue/updateAppQueue/getAppTimeout/getAppTimeouts/updateApplicationTimeout/getAppAttempts/getAppAttempt/getContainers/getContainer
>  REST invocations transparently to multiple RMs.
> I think we need to add a new web protocol for the Router, something like:
> {code:java}
> public interface RouterWebServiceProtocol extends RMWebServiceProtocol {
>   List<SubClusterInfo> getAllSubClusterInfo();
>   ClusterInfo getSubClusterInfo(String clusterId);
>   SchedulerInfoType getSchedulerInfo(String subClusterId);
> }
> {code}
> This is because the Router needs some additional protocol methods, such as 
> getAllSubClusterInfo(): List<SubClusterInfo>, getSubClusterInfo(clusterId): ClusterInfo, 
> and getSchedulerInfo(subClusterId): SchedulerInfo. If needed, I can do it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8041) Federation: Implement multiple interfaces(14 interfaces), routing REST invocations transparently to multiple RMs

2018-03-22 Thread Yiran Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yiran Wu updated YARN-8041:
---
Description: 
Implement routing of the 
getAppStatistics/getAppState/getNodeToLabels/getLabelsOnNode/updateApplicationPriority/getAppQueue/updateAppQueue/getAppTimeout/getAppTimeouts/updateApplicationTimeout/getAppAttempts/getAppAttempt/getContainers/getContainer
 REST invocations transparently to multiple RMs.


I think we need to add a new web protocol for the Router, something like:
public interface RouterWebServiceProtocol extends RMWebServiceProtocol {
  List<SubClusterInfo> getAllSubClusterInfo();
  ClusterInfo getSubClusterInfo(String clusterId);
  SchedulerInfoType getSchedulerInfo(String subClusterId);
}

This is because the Router needs some additional protocol methods, such as 
getAllSubClusterInfo(): List<SubClusterInfo>, getSubClusterInfo(clusterId): ClusterInfo, 
and getSchedulerInfo(subClusterId): SchedulerInfo. If needed, I can do it.

  was:Implement routing of the 
getAppStatistics/getAppState/getNodeToLabels/getLabelsOnNode/updateApplicationPriority/getAppQueue/updateAppQueue/getAppTimeout/getAppTimeouts/updateApplicationTimeout/getAppAttempts/getAppAttempt/getContainers/getContainer
 REST invocations transparently to multiple RMs.


> Federation: Implement multiple interfaces(14 interfaces), routing REST 
> invocations transparently to multiple RMs 
> -
>
> Key: YARN-8041
> URL: https://issues.apache.org/jira/browse/YARN-8041
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: federation
>Affects Versions: 2.9.0, 3.0.0
>Reporter: Yiran Wu
>Priority: Major
>  Labels: patch
> Attachments: YARN-8041.001.patch, YARN-8041.002.patch, 
> YARN-8041.003.patch
>
>
> Implement routing of the 
> getAppStatistics/getAppState/getNodeToLabels/getLabelsOnNode/updateApplicationPriority/getAppQueue/updateAppQueue/getAppTimeout/getAppTimeouts/updateApplicationTimeout/getAppAttempts/getAppAttempt/getContainers/getContainer
>  REST invocations transparently to multiple RMs.
> I think we need to add a new web protocol for the Router, something like:
> public interface RouterWebServiceProtocol extends RMWebServiceProtocol {
>   List<SubClusterInfo> getAllSubClusterInfo();
>   ClusterInfo getSubClusterInfo(String clusterId);
>   SchedulerInfoType getSchedulerInfo(String subClusterId);
> }
> This is because the Router needs some additional protocol methods, such as 
> getAllSubClusterInfo(): List<SubClusterInfo>, getSubClusterInfo(clusterId): ClusterInfo, 
> and getSchedulerInfo(subClusterId): SchedulerInfo. If needed, I can do it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8062) yarn rmadmin -getGroups returns group from which the user has been removed

2018-03-22 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16409368#comment-16409368
 ] 

Sunil G edited comment on YARN-8062 at 3/23/18 5:21 AM:


Test steps:
{noformat}
[r...@abc.com hadoop-yarn]# sudo su - -c "groupadd testUser5" root
[r...@abc.com hadoop-yarn]# sudo su - -c "useradd testUser5 -g testUser5" root
[r...@abc.com hadoop-yarn]# sudo su - -c "groupadd Group5" root
[r...@abc.com hadoop-yarn]# sudo su - -c "usermod -a -G Group5 testUser5" root
[r...@abc.com hadoop-yarn]# groups testUser5
testUser5 : testUser5 Group5
[r...@abc.com hadoop-yarn]# sudo su - -c "yarn rmadmin 
-refreshUserToGroupsMappings" yarn
[r...@abc.com hadoop-yarn]# sudo su - -c "yarn rmadmin -getGroups testUser5" 
yarn
testUser5 : testUser5 Group5
[r...@abc.com hadoop-yarn]# sudo su - -c "gpasswd -d testUser5 Group5" root
Removing user testUser5 from group Group5
[r...@abc.com hadoop-yarn]# groups testUser5
testUser5 : testUser5
[r...@abc.com hadoop-yarn]# sudo su - -c "yarn rmadmin 
-refreshUserToGroupsMappings" yarn
[r...@abc.com hadoop-yarn]# sudo su - -c "yarn rmadmin -getGroups testUser5" 
yarn
testUser5 : testUser5{noformat}
In summary, *sudo su - -c "yarn rmadmin -getGroups testUser5" yarn* and *groups 
testUser5* give the same output.

 

[~leftnoteasy] please help review the patch. We also had to make a change in 
AdminService.java in addition to the RM init call.


was (Author: sunilg):
Test steps:
{noformat}
[r...@abc.com hadoop-yarn]# sudo su - -c "groupadd testUser5" root
[r...@abc.com hadoop-yarn]# sudo su - -c "useradd testUser5 -g testUser5" root
[r...@abc.com hadoop-yarn]# sudo su - -c "groupadd Group5" root
[r...@abc.com hadoop-yarn]# sudo su - -c "usermod -a -G Group5 testUser5" root
[r...@abc.com hadoop-yarn]# groups testUser5
testUser5 : testUser5 Group5
[r...@abc.com hadoop-yarn]# sudo su - -c "yarn rmadmin 
-refreshUserToGroupsMappings" yarn
[r...@abc.com hadoop-yarn]# sudo su - -c "yarn rmadmin -getGroups testUser5" 
yarn
testUser5 : testUser5 Group5
[r...@abc.com hadoop-yarn]# sudo su - -c "gpasswd -d testUser5 Group5" root
Removing user testUser5 from group Group5
[r...@abc.com hadoop-yarn]# groups testUser5
testUser5 : testUser5
[r...@abc.com hadoop-yarn]# sudo su - -c "yarn rmadmin 
-refreshUserToGroupsMappings" yarn
[r...@abc.com hadoop-yarn]# sudo su - -c "yarn rmadmin -getGroups testUser5" 
yarn
testUser5 : testUser5{noformat}
In summary, *sudo su - -c "/usr/hdp/current/hadoop-yarn-client/bin/yarn rmadmin 
-getGroups testUser5" yarn* and *groups testUser5* give the same output.

 

[~leftnoteasy] please help review the patch. We also had to make a change in 
AdminService.java in addition to the RM init call.

> yarn rmadmin -getGroups returns group from which the user has been removed
> --
>
> Key: YARN-8062
> URL: https://issues.apache.org/jira/browse/YARN-8062
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Sumana Sathish
>Assignee: Sunil G
>Priority: Critical
> Attachments: YARN-8062.001.patch, YARN-8062.002.patch
>
>
> {code:title= adding group hrt_yarn_rmadmin_test}
> sudo su - -c "groupadd hrt_yarn_rmadmin_test" root
> {code}
> {Code:title=adding user hrt_yarn_rmadmin_test to group hrt_yarn_rmadmin_test}
> sudo su - -c "useradd hrt_yarn_rmadmin_test -g hrt_yarn_rmadmin_test" root
> {Code}
> {Code:title= adding group hrt_yarn_rmadmin_test_group2 }
> sudo su - -c "groupadd hrt_yarn_rmadmin_test_group2" root
> {Code}
> {Code:title=adding user hrt_yarn_rmadmin_test to group 
> hrt_yarn_rmadmin_test_group2}
> sudo su - -c "usermod -a -G hrt_yarn_rmadmin_test_group2 
> hrt_yarn_rmadmin_test" root
> {Code}
> Refresh and getGroups
> {code}
> yarn rmadmin -refreshUserToGroupsMappings
> /usr/hdp/current/hadoop-yarn-client/bin/yarn rmadmin -getGroups 
> hrt_yarn_rmadmin_test
> hrt_yarn_rmadmin_test : hrt_yarn_rmadmin_test hrt_yarn_rmadmin_test_group2
> {code}
> Delete group hrt_yarn_rmadmin_test_group2 from user hrt_yarn_rmadmin_test  
> and refresh and do getGroups.
> We can still see group hrt_yarn_rmadmin_test_group2
> {code}
> sudo su - -c "gpasswd -d hrt_yarn_rmadmin_test hrt_yarn_rmadmin_test_group2" 
> root
> {code}
> Removing user hrt_yarn_rmadmin_test from group hrt_yarn_rmadmin_test_group2
> {code}
> bash-4.2$  /usr/hdp/current/hadoop-yarn-client/bin/yarn rmadmin 
> -refreshUserToGroupsMappings
> /usr/hdp/current/hadoop-yarn-client/bin/yarn rmadmin -getGroups 
> hrt_yarn_rmadmin_test
> hrt_yarn_rmadmin_test : hrt_yarn_rmadmin_test hrt_yarn_rmadmin_test_group2
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: 

[jira] [Commented] (YARN-8062) yarn rmadmin -getGroups returns group from which the user has been removed

2018-03-22 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16410820#comment-16410820
 ] 

Sunil G commented on YARN-8062:
---

Tested the same test case and it's working fine.

> yarn rmadmin -getGroups returns group from which the user has been removed
> --
>
> Key: YARN-8062
> URL: https://issues.apache.org/jira/browse/YARN-8062
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Sumana Sathish
>Assignee: Sunil G
>Priority: Critical
> Attachments: YARN-8062.001.patch, YARN-8062.002.patch
>
>
> {code:title= adding group hrt_yarn_rmadmin_test}
> sudo su - -c "groupadd hrt_yarn_rmadmin_test" root
> {code}
> {Code:title=adding user hrt_yarn_rmadmin_test to group hrt_yarn_rmadmin_test}
> sudo su - -c "useradd hrt_yarn_rmadmin_test -g hrt_yarn_rmadmin_test" root
> {Code}
> {Code:title= adding group hrt_yarn_rmadmin_test_group2 }
> sudo su - -c "groupadd hrt_yarn_rmadmin_test_group2" root
> {Code}
> {Code:title=adding user hrt_yarn_rmadmin_test to group 
> hrt_yarn_rmadmin_test_group2}
> sudo su - -c "usermod -a -G hrt_yarn_rmadmin_test_group2 
> hrt_yarn_rmadmin_test" root
> {Code}
> Refresh and getGroups
> {code}
> yarn rmadmin -refreshUserToGroupsMappings
> /usr/hdp/current/hadoop-yarn-client/bin/yarn rmadmin -getGroups 
> hrt_yarn_rmadmin_test
> hrt_yarn_rmadmin_test : hrt_yarn_rmadmin_test hrt_yarn_rmadmin_test_group2
> {code}
> Delete group hrt_yarn_rmadmin_test_group2 from user hrt_yarn_rmadmin_test  
> and refresh and do getGroups.
> We can still see group hrt_yarn_rmadmin_test_group2
> {code}
> sudo su - -c "gpasswd -d hrt_yarn_rmadmin_test hrt_yarn_rmadmin_test_group2" 
> root
> {code}
> Removing user hrt_yarn_rmadmin_test from group hrt_yarn_rmadmin_test_group2
> {code}
> bash-4.2$  /usr/hdp/current/hadoop-yarn-client/bin/yarn rmadmin 
> -refreshUserToGroupsMappings
> /usr/hdp/current/hadoop-yarn-client/bin/yarn rmadmin -getGroups 
> hrt_yarn_rmadmin_test
> hrt_yarn_rmadmin_test : hrt_yarn_rmadmin_test hrt_yarn_rmadmin_test_group2
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8062) yarn rmadmin -getGroups returns group from which the user has been removed

2018-03-22 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16410817#comment-16410817
 ] 

Sunil G commented on YARN-8062:
---

Thanks [~leftnoteasy].

Ideally, calling refresh from RM#init is not needed at all, as *refreshAll* will be 
invoked when the RM switches to active. So I think let's remove the call from 
RM#init and keep the admin service as it is. Updating the patch.
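For context, a minimal sketch of the group-mapping refresh that 
*-refreshUserToGroupsMappings* ultimately triggers on the RM side; it uses the 
stock Hadoop Groups API (Groups.getUserToGroupsMappingService(conf), refresh() 
and getGroups(user) are existing calls), with the surrounding admin-service 
plumbing omitted:

{code:java}
import java.io.IOException;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.Groups;

public class GroupsRefreshSketch {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    // The RM-side Groups service caches user->group lookups.
    Groups groups = Groups.getUserToGroupsMappingService(conf);
    // refresh() clears that cache, which is what the rmadmin command triggers;
    // the next lookup re-reads the underlying OS/LDAP mapping.
    groups.refresh();
    List<String> g = groups.getGroups("testUser5"); // reflects gpasswd -d changes
    System.out.println(g);
  }
}
{code}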

> yarn rmadmin -getGroups returns group from which the user has been removed
> --
>
> Key: YARN-8062
> URL: https://issues.apache.org/jira/browse/YARN-8062
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Sumana Sathish
>Assignee: Sunil G
>Priority: Critical
> Attachments: YARN-8062.001.patch, YARN-8062.002.patch
>
>
> {code:title= adding group hrt_yarn_rmadmin_test}
> sudo su - -c "groupadd hrt_yarn_rmadmin_test" root
> {code}
> {Code:title=adding user hrt_yarn_rmadmin_test to group hrt_yarn_rmadmin_test}
> sudo su - -c "useradd hrt_yarn_rmadmin_test -g hrt_yarn_rmadmin_test" root
> {Code}
> {Code:title= adding group hrt_yarn_rmadmin_test_group2 }
> sudo su - -c "groupadd hrt_yarn_rmadmin_test_group2" root
> {Code}
> {Code:title=adding user hrt_yarn_rmadmin_test to group 
> hrt_yarn_rmadmin_test_group2}
> sudo su - -c "usermod -a -G hrt_yarn_rmadmin_test_group2 
> hrt_yarn_rmadmin_test" root
> {Code}
> Refresh and getGroups
> {code}
> yarn rmadmin -refreshUserToGroupsMappings
> /usr/hdp/current/hadoop-yarn-client/bin/yarn rmadmin -getGroups 
> hrt_yarn_rmadmin_test
> hrt_yarn_rmadmin_test : hrt_yarn_rmadmin_test hrt_yarn_rmadmin_test_group2
> {code}
> Delete group hrt_yarn_rmadmin_test_group2 from user hrt_yarn_rmadmin_test  
> and refresh and do getGroups.
> We can still see group hrt_yarn_rmadmin_test_group2
> {code}
> sudo su - -c "gpasswd -d hrt_yarn_rmadmin_test hrt_yarn_rmadmin_test_group2" 
> root
> {code}
> Removing user hrt_yarn_rmadmin_test from group hrt_yarn_rmadmin_test_group2
> {code}
> bash-4.2$  /usr/hdp/current/hadoop-yarn-client/bin/yarn rmadmin 
> -refreshUserToGroupsMappings
> /usr/hdp/current/hadoop-yarn-client/bin/yarn rmadmin -getGroups 
> hrt_yarn_rmadmin_test
> hrt_yarn_rmadmin_test : hrt_yarn_rmadmin_test hrt_yarn_rmadmin_test_group2
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8062) yarn rmadmin -getGroups returns group from which the user has been removed

2018-03-22 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-8062:
--
Attachment: YARN-8062.002.patch

> yarn rmadmin -getGroups returns group from which the user has been removed
> --
>
> Key: YARN-8062
> URL: https://issues.apache.org/jira/browse/YARN-8062
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Sumana Sathish
>Assignee: Sunil G
>Priority: Critical
> Attachments: YARN-8062.001.patch, YARN-8062.002.patch
>
>
> {code:title= adding group hrt_yarn_rmadmin_test}
> sudo su - -c "groupadd hrt_yarn_rmadmin_test" root
> {code}
> {Code:title=adding user hrt_yarn_rmadmin_test to group hrt_yarn_rmadmin_test}
> sudo su - -c "useradd hrt_yarn_rmadmin_test -g hrt_yarn_rmadmin_test" root
> {Code}
> {Code:title= adding group hrt_yarn_rmadmin_test_group2 }
> sudo su - -c "groupadd hrt_yarn_rmadmin_test_group2" root
> {Code}
> {Code:title=adding user hrt_yarn_rmadmin_test to group 
> hrt_yarn_rmadmin_test_group2}
> sudo su - -c "usermod -a -G hrt_yarn_rmadmin_test_group2 
> hrt_yarn_rmadmin_test" root
> {Code}
> Refresh and getGroups
> {code}
> yarn rmadmin -refreshUserToGroupsMappings
> /usr/hdp/current/hadoop-yarn-client/bin/yarn rmadmin -getGroups 
> hrt_yarn_rmadmin_test
> hrt_yarn_rmadmin_test : hrt_yarn_rmadmin_test hrt_yarn_rmadmin_test_group2
> {code}
> Delete group hrt_yarn_rmadmin_test_group2 from user hrt_yarn_rmadmin_test  
> and refresh and do getGroups.
> We can still see group hrt_yarn_rmadmin_test_group2
> {code}
> sudo su - -c "gpasswd -d hrt_yarn_rmadmin_test hrt_yarn_rmadmin_test_group2" 
> root
> {code}
> Removing user hrt_yarn_rmadmin_test from group hrt_yarn_rmadmin_test_group2
> {code}
> bash-4.2$  /usr/hdp/current/hadoop-yarn-client/bin/yarn rmadmin 
> -refreshUserToGroupsMappings
> /usr/hdp/current/hadoop-yarn-client/bin/yarn rmadmin -getGroups 
> hrt_yarn_rmadmin_test
> hrt_yarn_rmadmin_test : hrt_yarn_rmadmin_test hrt_yarn_rmadmin_test_group2
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7988) Refactor FSNodeLabelStore code for attributes store support

2018-03-22 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16410801#comment-16410801
 ] 

genericqa commented on YARN-7988:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
33s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} YARN-3409 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
35s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
 5s{color} | {color:green} YARN-3409 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
35s{color} | {color:green} YARN-3409 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 6s{color} | {color:green} YARN-3409 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
44s{color} | {color:green} YARN-3409 passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 25s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
42s{color} | {color:green} YARN-3409 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
20s{color} | {color:green} YARN-3409 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
11s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m  
8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  7m  
8s{color} | {color:green} hadoop-yarn-project_hadoop-yarn generated 0 new + 86 
unchanged - 1 fixed = 86 total (was 87) {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m  6s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 24 new + 57 unchanged - 22 fixed = 81 total (was 79) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 21s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m 
39s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
22s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m  
9s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 62m 35s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 
37s{color} | {color:red} The patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}141m 53s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common |
|  |  
org.apache.hadoop.yarn.nodelabels.store.FSStoreOpHandler.registerLog(FSStoreOpHandler$StoreType,
 int, Class) invokes inefficient new Integer(int) constructor; use 
Integer.valueOf(int) instead  At FSStoreOpHandler.java:new Integer(int) 
constructor; use Integer.valueOf(int) instead  At FSStoreOpHandler.java:[line 
64] |
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing |
|   | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodeLabels |
|   | 
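The FindBugs warning quoted above is the standard boxing complaint 
(DM_NUMBER_CTOR); a minimal illustration of the flagged pattern and its fix, 
not the actual FSStoreOpHandler code:

{code:java}
int opCode = 64;
// Flagged pattern: new Integer(int) always allocates a fresh object.
Integer boxedOld = new Integer(opCode);      // findbugs: DM_NUMBER_CTOR
// Fix: Integer.valueOf(int) reuses cached instances for small values.
Integer boxedNew = Integer.valueOf(opCode);
{code}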

[jira] [Assigned] (YARN-7794) SLSRunner is not loading timeline service jars causing failure

2018-03-22 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S reassigned YARN-7794:
---

Assignee: Yufei Gu  (was: Rohith Sharma K S)

> SLSRunner is not loading timeline service jars causing failure
> --
>
> Key: YARN-7794
> URL: https://issues.apache.org/jira/browse/YARN-7794
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler-load-simulator
>Affects Versions: 3.1.0
>Reporter: Sunil G
>Assignee: Yufei Gu
>Priority: Blocker
> Attachments: YARN-7794.001.patch
>
>
> {code:java}
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.hadoop.yarn.server.timelineservice.collector.TimelineCollector
>         at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>         ... 13 more
> Exception in thread "pool-2-thread-390" java.lang.NoClassDefFoundError: 
> org/apache/hadoop/yarn/server/timelineservice/collector/TimelineCollector
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:443)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:321)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:641){code}
> We are getting this error while running SLS. The timelineservice jars under 
> share/hadoop/yarn are not loaded in the SLS JVM (verified from the SLSRunner 
> classpath).
> cc/ [~rohithsharma]
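A quick way to reproduce the class-visibility check from inside the SLS JVM is 
a small Class.forName probe; this is a diagnostic sketch, not part of the patch:

{code:java}
public class ClasspathProbe {
  public static void main(String[] args) {
    String cls =
        "org.apache.hadoop.yarn.server.timelineservice.collector.TimelineCollector";
    try {
      // Succeeds only if the timelineservice jars are on this JVM's classpath.
      Class.forName(cls);
      System.out.println("loadable: " + cls);
    } catch (ClassNotFoundException e) {
      // Matches the stack trace above: the jars under share/hadoop/yarn are
      // missing from the SLS launcher's classpath.
      System.out.println("NOT loadable: " + cls);
    }
  }
}
{code}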



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7794) SLSRunner is not loading timeline service jars causing failure

2018-03-22 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16410789#comment-16410789
 ] 

Rohith Sharma K S commented on YARN-7794:
-

{quote}can I take this?
{quote}
Sure, assigned to you.

> SLSRunner is not loading timeline service jars causing failure
> --
>
> Key: YARN-7794
> URL: https://issues.apache.org/jira/browse/YARN-7794
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler-load-simulator
>Affects Versions: 3.1.0
>Reporter: Sunil G
>Assignee: Yufei Gu
>Priority: Blocker
> Attachments: YARN-7794.001.patch
>
>
> {code:java}
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.hadoop.yarn.server.timelineservice.collector.TimelineCollector
>         at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>         ... 13 more
> Exception in thread "pool-2-thread-390" java.lang.NoClassDefFoundError: 
> org/apache/hadoop/yarn/server/timelineservice/collector/TimelineCollector
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:443)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:321)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:641){code}
> We are getting this error while running SLS. The timelineservice jars under 
> share/hadoop/yarn are not loaded in the SLS JVM (verified from the SLSRunner 
> classpath).
> cc/ [~rohithsharma]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7872) labeled node cannot be used to satisfy locality specified request

2018-03-22 Thread Yuqi Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16410746#comment-16410746
 ] 

Yuqi Wang commented on YARN-7872:
-

Thanks [~leftnoteasy] for YARN-6592.

Can the "rich placement constraints" also work for labeled nodes?

If yes, I think it may also help with this JIRA.

I will take a look at the details later.

> labeled node cannot be used to satisfy locality specified request
> -
>
> Key: YARN-7872
> URL: https://issues.apache.org/jira/browse/YARN-7872
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, capacityscheduler, resourcemanager
>Affects Versions: 2.7.2
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>Priority: Blocker
> Fix For: 2.7.2
>
> Attachments: YARN-7872-branch-2.7.2.001.patch
>
>
> *Issue summary:*
> A labeled node (i.e. a node with a non-empty node label) cannot be used to 
> satisfy a locality-specified request (i.e. a container request with a 'not ANY' 
> resource name and relaxLocality set to false).
>  
> *For example:*
> The node with available resource:
> [Resource: [MemoryMB: [100] CpuNumber: [12]] {color:#14892c}NodeLabel: 
> [persistent]{color} {color:#f79232}HostName: \{SRG}{color} RackName: 
> \{/default-rack}]
> The container request:
>  [Priority: [1] Resource: [MemoryMB: [1] CpuNumber: [1]] 
> {color:#14892c}NodeLabel: [null]{color} {color:#f79232}HostNames: 
> \{SRG}{color} RackNames: {} {color:#59afe1}RelaxLocality: [false]{color}]
> The current RM capacity scheduler's behavior (at least for versions 2.7 
> and 2.8) is that the node cannot allocate a container for the request, because 
> the node label is not matched when the leaf queue assigns containers.
>  
> *Possible solution:*
> Node locality and node label should be two orthogonal dimensions for 
> selecting candidate nodes for a container request, and node-label matching 
> should only be executed for container requests with the ANY resource name, since 
> only this kind of container request is allowed to have a non-empty node label.
> So, for a container request with a 'not ANY' resource name (where we clearly know 
> it should not have a node label), we should use the requested resource name to 
> match against the node instead of the requested node label. This resource-name 
> matching should be safe, since a node whose node label is not accessible to the 
> queue will not be sent to the leaf queue.
>  
> *Discussion:*
> The attachment is a fix following this principle; please help review it.
> Without it, we cannot use locality to request containers on these labeled 
> nodes.
> If the fix is acceptable, we should also recheck whether the same issue 
> exists in trunk and other Hadoop versions.
> If it is not acceptable (i.e. the current behavior is by design), then how can 
> we use locality to request containers on these labeled nodes?
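The proposed rule boils down to a small matching predicate; a sketch of the 
principle described above, with illustrative types and getters rather than the 
attached patch's code:

{code:java}
// Locality-specified requests (resource name is a host/rack and
// relaxLocality=false) match on resource name only; node-label matching
// applies only to ANY requests, the only kind allowed to carry a non-empty
// label. Request/SchedulerNode accessors below are illustrative.
boolean canAssign(Request req, SchedulerNode node) {
  if (!ResourceRequest.ANY.equals(req.getResourceName())) {
    // Label is known to be empty here; the node was already filtered by
    // queue accessibility, so matching on host/rack name is safe.
    return req.getResourceName().equals(node.getHostName())
        || req.getResourceName().equals(node.getRackName());
  }
  // ANY request: node-label matching still applies.
  return req.getNodeLabel().equals(node.getNodeLabel());
}
{code}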



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8032) Yarn service should expose failuresValidityInterval to users and use it for launching containers

2018-03-22 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16410743#comment-16410743
 ] 

genericqa commented on YARN-8032:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 17m 
43s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
48s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 
21s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m  
2s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 7s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 38s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
38s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m  
9s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m  
2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  7m  
2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 52s{color} | {color:green} patch has no errors when building and testing our 
client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  4m 
35s{color} | {color:green} hadoop-yarn-services-core in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
22s{color} | {color:green} hadoop-yarn-site in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
29s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 90m  5s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8620d2b |
| JIRA Issue | YARN-8032 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12915793/YARN-8032.003.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  

[jira] [Commented] (YARN-8012) Support Unmanaged Container Cleanup

2018-03-22 Thread Yuqi Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16410738#comment-16410738
 ] 

Yuqi Wang commented on YARN-8012:
-

Thanks [~jlowe], I get your point in your last paragraph, so let me refine the 
patch according to your suggestions. Much appreciated.

> Support Unmanaged Container Cleanup
> ---
>
> Key: YARN-8012
> URL: https://issues.apache.org/jira/browse/YARN-8012
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>Priority: Major
> Attachments: YARN-8012 - Unmanaged Container Cleanup.pdf, 
> YARN-8012-branch-2.7.1.001.patch
>
>
> An *unmanaged container / leaked container* is a container which is no longer 
> managed by the NM. Thus, it cannot be managed by YARN either, i.e. it is 
> leaked from YARN's perspective.
> *There are many cases in which a YARN-managed container can become unmanaged, such as:*
>  * The NM service is disabled or removed on the node.
>  * The NM is unable to start up again on the node, e.g. because a dependent 
> configuration or resource cannot be made ready.
>  * The NM local leveldb store is corrupted or lost, e.g. due to bad disk sectors.
>  * The NM has bugs, e.g. wrongly marking a live container as complete.
> Note that these cases arise, or get worse, when work-preserving NM restart is 
> enabled; see YARN-1336.
> *Bad impacts of an unmanaged container include:*
>  # Resources cannot be managed for YARN on the node:
>  ** It causes a YARN resource leak on the node.
>  ** The container cannot be killed to release its YARN resources and free them 
> up for other urgent computations on the node.
>  # Container and App killing is not eventually consistent for the App user:
>  ** An App with bugs can still produce bad impacts on the outside even long 
> after the App has been killed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6578) Return container resource utilization from NM ContainerStatus call

2018-03-22 Thread Yang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Wang updated YARN-6578:

Description: 
When the ApplicationMaster wants to change (increase/decrease) the resources of an 
allocated container, resource utilization is an important reference indicator 
for decision making. So, when the AM calls NMClient.getContainerStatus, resource 
utilization needs to be returned.

Container resource utilization also needs to be reported to the RM to enable 
better scheduling.
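For illustration, an AM-side sketch of the proposed flow: 
NMClient.getContainerStatus(containerId, nodeId) is the existing API, while 
getResourceUtilization() is the hypothetical accessor this JIRA would add.

{code:java}
// Poll the NM for a container's status and use the proposed utilization
// field to decide on an increase/decrease request.
ContainerStatus status = nmClient.getContainerStatus(containerId, nodeId);
ResourceUtilization used = status.getResourceUtilization(); // proposed addition
if (used.getPhysicalMemory() < allocatedMemMB / 2) {
  // Container is using less than half its allocation: a candidate for a
  // decrease request via the AM-RM protocol.
}
{code}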

  was:When the ApplicationMaster wants to change (increase/decrease) the resources 
of an allocated container, resource utilization is an important reference 
indicator for decision making. So, when the AM calls NMClient.getContainerStatus, 
resource utilization needs to be returned.


> Return container resource utilization from NM ContainerStatus call
> --
>
> Key: YARN-6578
> URL: https://issues.apache.org/jira/browse/YARN-6578
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Yang Wang
>Assignee: Yang Wang
>Priority: Major
> Attachments: YARN-6578.001.patch
>
>
> When the ApplicationMaster wants to change (increase/decrease) the resources of 
> an allocated container, resource utilization is an important reference indicator 
> for decision making. So, when the AM calls NMClient.getContainerStatus, resource 
> utilization needs to be returned.
> Container resource utilization also needs to be reported to the RM to enable 
> better scheduling.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-6578) Return container resource utilization from NM ContainerStatus call

2018-03-22 Thread Yang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Wang reassigned YARN-6578:
---

Assignee: Yang Wang

> Return container resource utilization from NM ContainerStatus call
> --
>
> Key: YARN-6578
> URL: https://issues.apache.org/jira/browse/YARN-6578
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Yang Wang
>Assignee: Yang Wang
>Priority: Major
> Attachments: YARN-6578.001.patch
>
>
> When the ApplicationMaster wants to change (increase/decrease) the resources of 
> an allocated container, resource utilization is an important reference indicator 
> for decision making. So, when the AM calls NMClient.getContainerStatus, resource 
> utilization needs to be returned.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7956) HOME/Services/<ServiceName> and HOME/Services/<ServiceName>/Components refer to same page

2018-03-22 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16410718#comment-16410718
 ] 

Sunil G commented on YARN-7956:
---

Looks straightforward. +1

Committing shortly. Thanks [~yeshavora]

> HOME/Services/<ServiceName> and HOME/Services/<ServiceName>/Components refer 
> to same page
> -
>
> Key: YARN-7956
> URL: https://issues.apache.org/jira/browse/YARN-7956
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Affects Versions: 3.1.0
>Reporter: Yesha Vora
>Assignee: Yesha Vora
>Priority: Major
> Attachments: Screen Shot 2018-03-20 at 4.19.37 PM.png, 
> YARN-7956.001.patch
>
>
> Scenario:
> 1) Start a YARN service
> 2) Click on a running YARN service (example: yesha-sleeper)
> http://:8088/ui2/#/yarn-app/application_1518804855867_0002/components?service=yesha-sleeper
> 3) Now click on the yesha-sleeper [application_1518804855867_0002] link
> Both the Components link and the yesha-sleeper 
> [application_1518804855867_0002] link point to one page. 
> HOME/Services/<ServiceName> and HOME/Services/<ServiceName>/Components refer 
> to the same page.
> We should not need two links that refer to one page.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7988) Refactor FSNodeLabelStore code for attributes store support

2018-03-22 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16410683#comment-16410683
 ] 

Bibin A Chundatt commented on YARN-7988:


[~sunilg]
Thank you for the comment. Personally, I would have preferred the existing 
implementation; the code looked cleaner in the previous implementation.
Currently the id seems plugged in for compatibility.

> Refactor FSNodeLabelStore code for attributes store support
> ---
>
> Key: YARN-7988
> URL: https://issues.apache.org/jira/browse/YARN-7988
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Major
> Attachments: YARN-7988-YARN-3409.002.patch, 
> YARN-7988-YARN-3409.003.patch, YARN-7988-YARN-3409.004.patch, 
> YARN-7988-YARN-3409.005.patch, YARN-7988.001.patch
>
>
> # Abstract out the FileSystemStore operations
> # Define EditLog operations and Mirror operation
> # Support compatibility with the old nodelabel store
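One way to read items 1 and 2 above: each store mutation becomes a typed 
edit-log record that can be replayed on recovery, plus a mirror (checkpoint) 
operation that serializes full state. A hypothetical sketch of that shape; 
names are illustrative, not the patch's actual classes:

{code:java}
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

interface FSStoreOp<M> {
  void write(DataOutput out, M manager) throws IOException;   // append to edit log
  void recover(DataInput in, M manager) throws IOException;   // replay on restart
  int getOpCode();                                            // identifies the record type
}
{code}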



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-7988) Refactor FSNodeLabelStore code for attributes store support

2018-03-22 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16410683#comment-16410683
 ] 

Bibin A Chundatt edited comment on YARN-7988 at 3/23/18 1:52 AM:
-

[~sunilg]
Thank you for the comment. Personally, I would have preferred the existing 
implementation.
Currently the id seems plugged in for compatibility.


was (Author: bibinchundatt):
[~sunilg]
Thank you for the comment. Personally, I would have preferred the existing 
implementation; the code looked cleaner in the previous implementation.
Currently the id seems plugged in for compatibility.

> Refactor FSNodeLabelStore code for attributes store support
> ---
>
> Key: YARN-7988
> URL: https://issues.apache.org/jira/browse/YARN-7988
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Major
> Attachments: YARN-7988-YARN-3409.002.patch, 
> YARN-7988-YARN-3409.003.patch, YARN-7988-YARN-3409.004.patch, 
> YARN-7988-YARN-3409.005.patch, YARN-7988.001.patch
>
>
> # Abstract out the FileSystemStore operations
> # Define EditLog operations and Mirror operation
> # Support compatibility with the old nodelabel store



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7988) Refactor FSNodeLabelStore code for attributes store support

2018-03-22 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-7988:
---
Attachment: YARN-7988-YARN-3409.005.patch

> Refactor FSNodeLabelStore code for attributes store support
> ---
>
> Key: YARN-7988
> URL: https://issues.apache.org/jira/browse/YARN-7988
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Major
> Attachments: YARN-7988-YARN-3409.002.patch, 
> YARN-7988-YARN-3409.003.patch, YARN-7988-YARN-3409.004.patch, 
> YARN-7988-YARN-3409.005.patch, YARN-7988.001.patch
>
>
> # Abstract out the FileSystemStore operations
> # Define EditLog operations and Mirror operation
> # Support compatibility with the old nodelabel store



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7221) Add security check for privileged docker container

2018-03-22 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16410664#comment-16410664
 ] 

genericqa commented on YARN-7221:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
24s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 
18s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 18s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
19s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  0m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 53s{color} | {color:green} patch has no errors when building and testing our 
client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
18s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 19m 58s{color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
19s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 69m  7s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.nodemanager.containermanager.TestContainerManager |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8620d2b |
| JIRA Issue | YARN-7221 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12915788/YARN-7221.010.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  cc  |
| uname | Linux 8d3ef2cb6eae 4.4.0-116-generic #140-Ubuntu SMP Mon Feb 12 
21:23:04 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 8620d2b |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/20057/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/20057/testReport/ |
| Max. process+thread count | 408 (vs. ulimit of 1) |
| modules | C: 

[jira] [Updated] (YARN-8032) Yarn service should expose failuresValidityInterval to users and use it for launching containers

2018-03-22 Thread Chandni Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated YARN-8032:

Attachment: YARN-8032.003.patch

> Yarn service should expose failuresValidityInterval to users and use it for 
> launching containers
> 
>
> Key: YARN-8032
> URL: https://issues.apache.org/jira/browse/YARN-8032
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-8032.001.patch, YARN-8032.002.patch, 
> YARN-8032.003.patch
>
>
> With YARN-5015 the support for sliding window retry policy was added. Yarn 
> service should expose it via the api for the users to take advantage of it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8016) Refine PlacementRule interface and add an app-name queue mapping rule as an example

2018-03-22 Thread Zian Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16410631#comment-16410631
 ] 

Zian Chen commented on YARN-8016:
-

Quickly investigated the failed case; it is not related to the latest patch. 

[~leftnoteasy], [~yufeigu], any thoughts or suggestions on the latest patch? 
Thanks!

> Refine PlacementRule interface and add an app-name queue mapping rule as an 
> example
> --
>
> Key: YARN-8016
> URL: https://issues.apache.org/jira/browse/YARN-8016
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Zian Chen
>Assignee: Zian Chen
>Priority: Major
> Attachments: YARN-8016.001.patch, YARN-8016.002.patch, 
> YARN-8016.003.patch, YARN-8016.004.patch, YARN-8016.005.patch
>
>
> After YARN-3635/YARN-6689, PlacementRule becomes a common interface which can 
> be used by the scheduler and dynamically updated by the scheduler according to 
> configs. Some work remains: 
> - There's no way to initialize a PlacementRule.
> - There's no example of a PlacementRule except the user-group mapping one.
> This JIRA is targeted at refining the PlacementRule interface and adding 
> another PlacementRule example.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8032) Yarn service should expose failuresValidityInterval to users and use it for launching containers

2018-03-22 Thread Chandni Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16410630#comment-16410630
 ] 

Chandni Singh commented on YARN-8032:
-

Also please see:
https://github.com/apache/hadoop/blob/8620d2bdf9ddeaa28fd0f5ddce984c44025e02a7/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ContainerRetryContext.java#L53

> Yarn service should expose failuresValidityInterval to users and use it for 
> launching containers
> 
>
> Key: YARN-8032
> URL: https://issues.apache.org/jira/browse/YARN-8032
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-8032.001.patch, YARN-8032.002.patch
>
>
> With YARN-5015 the support for sliding window retry policy was added. Yarn 
> service should expose it via the api for the users to take advantage of it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7654) Support ENTRY_POINT for docker container

2018-03-22 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16410627#comment-16410627
 ] 

Eric Yang commented on YARN-7654:
-

[~jlowe] Patch 005 requires YARN-7221 patch 10.  This patch has been rebased to 
current trunk with all of your recommendations included.  Let me know if this 
works for you.  Thanks

> Support ENTRY_POINT for docker container
> 
>
> Key: YARN-7654
> URL: https://issues.apache.org/jira/browse/YARN-7654
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 3.1.0
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Blocker
> Attachments: YARN-7654.001.patch, YARN-7654.002.patch, 
> YARN-7654.003.patch, YARN-7654.004.patch, YARN-7654.005.patch
>
>
> A Docker image may have ENTRY_POINT predefined, but this is not supported in 
> the current implementation.  It would be nice if we could detect the existence 
> of {{launch_command}} and, based on this variable, launch the docker container 
> in different ways:
> h3. Launch command exists
> {code}
> docker run [image]:[version]
> docker exec [container_id] [launch_command]
> {code}
> h3. Use ENTRY_POINT
> {code}
> docker run [image]:[version]
> {code}
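
To make the proposed detection concrete, here is a minimal Java sketch of the
branch described above; the class and method names are hypothetical, not the
actual container runtime code:
{code}
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: pick docker commands based on launch_command presence. */
public class DockerLaunchPlanner {
  /** Returns the docker CLI invocations to run for one container. */
  static List<String> planCommands(String image, String version,
      String containerId, String launchCommand) {
    List<String> commands = new ArrayList<>();
    commands.add("docker run " + image + ":" + version);
    if (launchCommand != null && !launchCommand.isEmpty()) {
      // launch_command exists: exec it inside the running container.
      commands.add("docker exec " + containerId + " " + launchCommand);
    }
    // Otherwise rely on the image's ENTRY_POINT: docker run alone suffices.
    return commands;
  }
}
{code}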



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7654) Support ENTRY_POINT for docker container

2018-03-22 Thread Eric Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated YARN-7654:

Attachment: YARN-7654.005.patch

> Support ENTRY_POINT for docker container
> 
>
> Key: YARN-7654
> URL: https://issues.apache.org/jira/browse/YARN-7654
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 3.1.0
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Blocker
> Attachments: YARN-7654.001.patch, YARN-7654.002.patch, 
> YARN-7654.003.patch, YARN-7654.004.patch, YARN-7654.005.patch
>
>
> A Docker image may have ENTRY_POINT predefined, but this is not supported in 
> the current implementation.  It would be nice if we could detect the existence 
> of {{launch_command}} and, based on this variable, launch the docker container 
> in different ways:
> h3. Launch command exists
> {code}
> docker run [image]:[version]
> docker exec [container_id] [launch_command]
> {code}
> h3. Use ENTRY_POINT
> {code}
> docker run [image]:[version]
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8032) Yarn service should expose failuresValidityInterval to users and use it for launching containers

2018-03-22 Thread Chandni Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16410624#comment-16410624
 ] 

Chandni Singh commented on YARN-8032:
-

[~eyang]
{quote}if this is exposed to user, shouldn't it add delay to failure of 
containers before retry?
{quote}
No, this property doesn't add a delay. I'll explain the behavior with an 
example. Let's say {{failuresValidityInterval}} is set to 2 seconds and 
{{maxRetries}} = 2. 
At time t1, the container fails. The NM tries to restart the container 
automatically since maxRetries = 2. After t1 + 2 seconds (the failures validity 
interval), the failure count no longer includes the failure at t1, since 2 
seconds have elapsed. 
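
A minimal sketch of that sliding-window accounting, with hypothetical names
(this is illustrative only, not the NM implementation):
{code}
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch of sliding-window failure counting. */
public class FailureWindowSketch {
  private final long failuresValidityIntervalMs;
  private final int maxRetries;
  private final Deque<Long> failureTimes = new ArrayDeque<>();

  FailureWindowSketch(long failuresValidityIntervalMs, int maxRetries) {
    this.failuresValidityIntervalMs = failuresValidityIntervalMs;
    this.maxRetries = maxRetries;
  }

  /** Record a failure and decide whether another restart is allowed. */
  synchronized boolean shouldRetry(long nowMs) {
    failureTimes.addLast(nowMs);
    // Drop failures that fell out of the validity interval, e.g. the
    // failure at t1 once t1 + interval has passed.
    while (!failureTimes.isEmpty()
        && nowMs - failureTimes.peekFirst() > failuresValidityIntervalMs) {
      failureTimes.removeFirst();
    }
    return failureTimes.size() <= maxRetries;
  }
}
{code}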

> Yarn service should expose failuresValidityInterval to users and use it for 
> launching containers
> 
>
> Key: YARN-8032
> URL: https://issues.apache.org/jira/browse/YARN-8032
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-8032.001.patch, YARN-8032.002.patch
>
>
> With YARN-5015 the support for sliding window retry policy was added. Yarn 
> service should expose it via the api for the users to take advantage of it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8032) Yarn service should expose failuresValidityInterval to users and use it for launching containers

2018-03-22 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16410611#comment-16410611
 ] 

Eric Yang commented on YARN-8032:
-

{quote}
All the configurations in DockerContainers.md are related to docker 
configurations, so I don't think it should be added there. 
I will add it to 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/yarn-service/Configurations.md,
 which has other such configurations.
{quote}

[~csingh] Yes, you are correct about the location for the documentation update.  Thanks

> Yarn service should expose failuresValidityInterval to users and use it for 
> launching containers
> 
>
> Key: YARN-8032
> URL: https://issues.apache.org/jira/browse/YARN-8032
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-8032.001.patch, YARN-8032.002.patch
>
>
> With YARN-5015 the support for sliding window retry policy was added. Yarn 
> service should expose it via the api for the users to take advantage of it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8032) Yarn service should expose failuresValidityInterval to users and use it for launching containers

2018-03-22 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16410610#comment-16410610
 ] 

Eric Yang commented on YARN-8032:
-

If this is exposed to the user, shouldn't it add a delay before a failed 
container is retried?  I don't see this happening.  I see that a failed 
container is immediately retried without waiting for the failure validity 
interval.  Am I missing something?

> Yarn service should expose failuresValidityInterval to users and use it for 
> launching containers
> 
>
> Key: YARN-8032
> URL: https://issues.apache.org/jira/browse/YARN-8032
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-8032.001.patch, YARN-8032.002.patch
>
>
> With YARN-5015 the support for sliding window retry policy was added. Yarn 
> service should expose it via the api for the users to take advantage of it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8032) Yarn service should expose failuresValidityInterval to users and use it for launching containers

2018-03-22 Thread Chandni Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16410609#comment-16410609
 ] 

Chandni Singh commented on YARN-8032:
-

{quote}
Can you also add usage of this property into: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/DockerContainers.md
 file?
{quote}

All the configurations in DockerContainers.md are related to docker 
configurations, so I don't think it should be added there. 
I will add it to 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/yarn-service/Configurations.md,
 which has other such configurations.

> Yarn service should expose failuresValidityInterval to users and use it for 
> launching containers
> 
>
> Key: YARN-8032
> URL: https://issues.apache.org/jira/browse/YARN-8032
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-8032.001.patch, YARN-8032.002.patch
>
>
> With YARN-5015 the support for sliding window retry policy was added. Yarn 
> service should expose it via the api for the users to take advantage of it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8016) Refine PlacementRule interface and add an app-name queue mapping rule as an example

2018-03-22 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16410605#comment-16410605
 ] 

genericqa commented on YARN-8016:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 10m 
25s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
14s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 28m 
59s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  9m 
49s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m  
9s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 36s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m 
36s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api in 
trunk has 1 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
41s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
15s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  8m 
46s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m 35s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 14 new + 364 unchanged - 0 fixed = 378 total (was 364) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m  
4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 32s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
30s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
50s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 63m  5s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
24s{color} | {color:green} hadoop-yarn-site in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
35s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}168m  4s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.reservation.TestCapacityOverTimePolicy |
\\
\\
|| Subsystem || Report/Notes 

[jira] [Comment Edited] (YARN-8032) Yarn service should expose failuresValidityInterval to users and use it for launching containers

2018-03-22 Thread Chandni Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16410600#comment-16410600
 ] 

Chandni Singh edited comment on YARN-8032 at 3/23/18 12:07 AM:
---

[~eyang] The failure validity interval is used by the Node Manager to keep 
track of container failures in a sliding time window. Please see YARN-5015. You 
will not see any change in behavior when you submit an app. This Jira just 
exposes it to yarn service users.


was (Author: csingh):
[~eyang] failure validity interval is used by Node Manager to keep track of 
containers failure. Please see YARN-5015. You will not see any change in 
behavior when you submit an app. This Jira just exposes this to yarn service 
users.

> Yarn service should expose failuresValidityInterval to users and use it for 
> launching containers
> 
>
> Key: YARN-8032
> URL: https://issues.apache.org/jira/browse/YARN-8032
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-8032.001.patch, YARN-8032.002.patch
>
>
> With YARN-5015 the support for sliding window retry policy was added. Yarn 
> service should expose it via the api for the users to take advantage of it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8032) Yarn service should expose failuresValidityInterval to users and use it for launching containers

2018-03-22 Thread Chandni Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16410600#comment-16410600
 ] 

Chandni Singh commented on YARN-8032:
-

[~eyang] The failure validity interval is used by the Node Manager to keep 
track of container failures. Please see YARN-5015. You will not see any change 
in behavior when you submit an app. This Jira just exposes it to yarn service 
users.

> Yarn service should expose failuresValidityInterval to users and use it for 
> launching containers
> 
>
> Key: YARN-8032
> URL: https://issues.apache.org/jira/browse/YARN-8032
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-8032.001.patch, YARN-8032.002.patch
>
>
> With YARN-5015 the support for sliding window retry policy was added. Yarn 
> service should expose it via the api for the users to take advantage of it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8032) Yarn service should expose failuresValidityInterval to users and use it for launching containers

2018-03-22 Thread Chandni Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16410600#comment-16410600
 ] 

Chandni Singh edited comment on YARN-8032 at 3/23/18 12:05 AM:
---

[~eyang] The failure validity interval is used by the Node Manager to keep 
track of container failures. Please see YARN-5015. You will not see any change 
in behavior when you submit an app. This Jira just exposes it to yarn service 
users.


was (Author: csingh):
[~eyang] failure validity interval is used by Node Manager to keep track of 
containers failure. Please see YARN-5015. You will not see any change is 
behavior when you submit an app. This Jira just exposes this to yarn service 
users.

> Yarn service should expose failuresValidityInterval to users and use it for 
> launching containers
> 
>
> Key: YARN-8032
> URL: https://issues.apache.org/jira/browse/YARN-8032
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-8032.001.patch, YARN-8032.002.patch
>
>
> With YARN-5015 the support for sliding window retry policy was added. Yarn 
> service should expose it via the api for the users to take advantage of it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7221) Add security check for privileged docker container

2018-03-22 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16410577#comment-16410577
 ] 

Eric Yang commented on YARN-7221:
-

- Patch 10: fixed the formatting issue.

> Add security check for privileged docker container
> --
>
> Key: YARN-7221
> URL: https://issues.apache.org/jira/browse/YARN-7221
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: security
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
> Attachments: YARN-7221.001.patch, YARN-7221.002.patch, 
> YARN-7221.003.patch, YARN-7221.004.patch, YARN-7221.005.patch, 
> YARN-7221.006.patch, YARN-7221.007.patch, YARN-7221.008.patch, 
> YARN-7221.009.patch, YARN-7221.010.patch
>
>
> When a Docker container runs with privileges, the majority use case is to have 
> a program start as root and then drop privileges to another user, e.g. 
> httpd starts privileged to bind to port 80, then drops privileges to the 
> www user.  
> # We should add a security check for submitting users, to verify they have 
> "sudo" access to run a privileged container.  
> # We should remove --user=uid:gid for privileged containers.  
>  
> Docker can be launched with the --privileged=true and --user=uid:gid flags.  
> With this parameter combination, the user will not have access to become root.  
> All docker exec commands will drop to the uid:gid user instead of being 
> granted privileges.  A user can gain root privileges if the container file 
> system contains files that give the user extra power, but this type of image 
> is considered dangerous.  A non-privileged user can launch a container with 
> special bits to acquire the same level of root power.  Hence, we lose control 
> of which images should be run with --privileged, and who has sudo rights to 
> use privileged container images.  As a result, we should check for sudo access 
> and then decide whether to parameterize --privileged=true OR --user=uid:gid.  
> This will avoid leading developers down the wrong path.
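
A rough Java sketch of the check-then-parameterize decision described above,
with hypothetical names (not the actual container runtime code; the sudo check
itself is a stand-in):
{code}
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: choose --privileged vs --user based on a sudo check. */
public class PrivilegedDockerCheck {
  /** Stand-in for the proposed "does this user have sudo rights?" check. */
  interface SudoChecker {
    boolean hasSudoAccess(String user);
  }

  static List<String> dockerRunFlags(String user, String uidGid,
      boolean privilegedRequested, SudoChecker sudo) {
    List<String> flags = new ArrayList<>();
    if (privilegedRequested) {
      if (!sudo.hasSudoAccess(user)) {
        throw new SecurityException(
            user + " is not allowed to run privileged containers");
      }
      // Privileged container: no --user, so root inside is actually usable.
      flags.add("--privileged=true");
    } else {
      // Unprivileged container: pin the container to the submitting user.
      flags.add("--user=" + uidGid);
    }
    return flags;
  }
}
{code}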



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7221) Add security check for privileged docker container

2018-03-22 Thread Eric Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated YARN-7221:

Attachment: YARN-7221.010.patch

> Add security check for privileged docker container
> --
>
> Key: YARN-7221
> URL: https://issues.apache.org/jira/browse/YARN-7221
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: security
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
> Attachments: YARN-7221.001.patch, YARN-7221.002.patch, 
> YARN-7221.003.patch, YARN-7221.004.patch, YARN-7221.005.patch, 
> YARN-7221.006.patch, YARN-7221.007.patch, YARN-7221.008.patch, 
> YARN-7221.009.patch, YARN-7221.010.patch
>
>
> When a Docker container runs with privileges, the majority use case is to have 
> a program start as root and then drop privileges to another user, e.g. 
> httpd starts privileged to bind to port 80, then drops privileges to the 
> www user.  
> # We should add a security check for submitting users, to verify they have 
> "sudo" access to run a privileged container.  
> # We should remove --user=uid:gid for privileged containers.  
>  
> Docker can be launched with the --privileged=true and --user=uid:gid flags.  
> With this parameter combination, the user will not have access to become root.  
> All docker exec commands will drop to the uid:gid user instead of being 
> granted privileges.  A user can gain root privileges if the container file 
> system contains files that give the user extra power, but this type of image 
> is considered dangerous.  A non-privileged user can launch a container with 
> special bits to acquire the same level of root power.  Hence, we lose control 
> of which images should be run with --privileged, and who has sudo rights to 
> use privileged container images.  As a result, we should check for sudo access 
> and then decide whether to parameterize --privileged=true OR --user=uid:gid.  
> This will avoid leading developers down the wrong path.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-8054) Improve robustness of the LocalDirsHandlerService MonitoringTimerTask thread

2018-03-22 Thread Zian Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zian Chen reassigned YARN-8054:
---

Assignee: Jason Lowe  (was: Zian Chen)

> Improve robustness of the LocalDirsHandlerService MonitoringTimerTask thread
> 
>
> Key: YARN-8054
> URL: https://issues.apache.org/jira/browse/YARN-8054
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jonathan Eagles
>Assignee: Jason Lowe
>Priority: Major
> Fix For: 2.10.0, 2.9.1, 2.8.4, 3.0.2, 3.1.1
>
> Attachments: YARN-8054.001.patch, YARN-8054.002.patch
>
>
> The DeprecatedRawLocalFileStatus#loadPermissionInfo can throw a 
> RuntimeException which can kill the MonitoringTimerTask thread. This can 
> leave the node in a bad state where all NM local directories are marked "bad" 
> and there is no automatic recovery. In the case below the error was "too many 
> open files", but it could be a number of other recoverable states.
> {noformat}
> 2018-03-18 02:37:42,960 [DiskHealthMonitor-Timer] ERROR 
> yarn.YarnUncaughtExceptionHandler: Thread 
> Thread[DiskHealthMonitor-Timer,5,main] threw an Exception.
> java.lang.RuntimeException: Error while running command to get file 
> permissions : java.io.IOException: Cannot run program "ls": error=24, Too 
> many open files
> at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:942)
> at org.apache.hadoop.util.Shell.run(Shell.java:898)
> at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1213)
> at org.apache.hadoop.util.Shell.execCommand(Shell.java:1307)
> at org.apache.hadoop.util.Shell.execCommand(Shell.java:1289)
> at org.apache.hadoop.fs.FileUtil.execCommand(FileUtil.java:1078)
> at 
> org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:697)
> at 
> org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.getPermission(RawLocalFileSystem.java:672)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.checkLocalDir(ResourceLocalizationService.java:1556)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.checkAndInitializeLocalDirs(ResourceLocalizationService.java:1521)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$1.onDirsChanged(ResourceLocalizationService.java:271)
> at 
> org.apache.hadoop.yarn.server.nodemanager.DirectoryCollection.checkDirs(DirectoryCollection.java:381)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.checkDirs(LocalDirsHandlerService.java:449)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.access$500(LocalDirsHandlerService.java:52)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService$MonitoringTimerTask.run(LocalDirsHandlerService.java:166)
> at java.util.TimerThread.mainLoop(Timer.java:555)
> at java.util.TimerThread.run(Timer.java:505)
> Caused by: java.io.IOException: error=24, Too many open files
> at java.lang.UNIXProcess.forkAndExec(Native Method)
> at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
> at java.lang.ProcessImpl.start(ProcessImpl.java:134)
> at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
> ... 17 more
> at 
> org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:737)
> at 
> org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.getPermission(RawLocalFileSystem.java:672)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.checkLocalDir(ResourceLocalizationService.java:1556)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.checkAndInitializeLocalDirs(ResourceLocalizationService.java:1521)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$1.onDirsChanged(ResourceLocalizationService.java:271)
> at 
> org.apache.hadoop.yarn.server.nodemanager.DirectoryCollection.checkDirs(DirectoryCollection.java:381)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.checkDirs(LocalDirsHandlerService.java:449)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.access$500(LocalDirsHandlerService.java:52)
> at 
> 
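
The robustness improvement presumably boils down to not letting one failed
check kill the timer thread. A minimal sketch of that kind of guard, with
hypothetical names (not the committed patch):
{code}
import java.util.TimerTask;

/** Hypothetical sketch: a disk-check timer task that survives runtime errors. */
public class RobustMonitoringTimerTask extends TimerTask {
  @Override
  public void run() {
    try {
      checkDirs(); // the periodic disk health check
    } catch (Throwable t) {
      // Log and carry on; an uncaught RuntimeException here would kill
      // the timer thread and leave the dirs permanently marked bad.
      System.err.println("Disk check failed, will retry next period: " + t);
    }
  }

  private void checkDirs() {
    // placeholder for the real LocalDirsHandlerService logic
  }
}

// usage: new java.util.Timer("DiskHealthMonitor-Timer", true)
//     .schedule(new RobustMonitoringTimerTask(), 0, 2 * 60 * 1000);
{code}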

[jira] [Assigned] (YARN-8054) Improve robustness of the LocalDirsHandlerService MonitoringTimerTask thread

2018-03-22 Thread Zian Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zian Chen reassigned YARN-8054:
---

Assignee: Zian Chen  (was: Jonathan Eagles)

> Improve robustness of the LocalDirsHandlerService MonitoringTimerTask thread
> 
>
> Key: YARN-8054
> URL: https://issues.apache.org/jira/browse/YARN-8054
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jonathan Eagles
>Assignee: Zian Chen
>Priority: Major
> Fix For: 2.10.0, 2.9.1, 2.8.4, 3.0.2, 3.1.1
>
> Attachments: YARN-8054.001.patch, YARN-8054.002.patch
>
>
> The DeprecatedRawLocalFileStatus#loadPermissionInfo can throw a 
> RuntimeException which can kill the MonitoringTimerTask thread. This can 
> leave the node in a bad state where all NM local directories are marked "bad" 
> and there is no automatic recovery. In the case below the error was "too many 
> open files", but it could be a number of other recoverable states.
> {noformat}
> 2018-03-18 02:37:42,960 [DiskHealthMonitor-Timer] ERROR 
> yarn.YarnUncaughtExceptionHandler: Thread 
> Thread[DiskHealthMonitor-Timer,5,main] threw an Exception.
> java.lang.RuntimeException: Error while running command to get file 
> permissions : java.io.IOException: Cannot run program "ls": error=24, Too 
> many open files
> at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:942)
> at org.apache.hadoop.util.Shell.run(Shell.java:898)
> at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1213)
> at org.apache.hadoop.util.Shell.execCommand(Shell.java:1307)
> at org.apache.hadoop.util.Shell.execCommand(Shell.java:1289)
> at org.apache.hadoop.fs.FileUtil.execCommand(FileUtil.java:1078)
> at 
> org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:697)
> at 
> org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.getPermission(RawLocalFileSystem.java:672)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.checkLocalDir(ResourceLocalizationService.java:1556)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.checkAndInitializeLocalDirs(ResourceLocalizationService.java:1521)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$1.onDirsChanged(ResourceLocalizationService.java:271)
> at 
> org.apache.hadoop.yarn.server.nodemanager.DirectoryCollection.checkDirs(DirectoryCollection.java:381)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.checkDirs(LocalDirsHandlerService.java:449)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.access$500(LocalDirsHandlerService.java:52)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService$MonitoringTimerTask.run(LocalDirsHandlerService.java:166)
> at java.util.TimerThread.mainLoop(Timer.java:555)
> at java.util.TimerThread.run(Timer.java:505)
> Caused by: java.io.IOException: error=24, Too many open files
> at java.lang.UNIXProcess.forkAndExec(Native Method)
> at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
> at java.lang.ProcessImpl.start(ProcessImpl.java:134)
> at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
> ... 17 more
> at 
> org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:737)
> at 
> org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.getPermission(RawLocalFileSystem.java:672)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.checkLocalDir(ResourceLocalizationService.java:1556)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.checkAndInitializeLocalDirs(ResourceLocalizationService.java:1521)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$1.onDirsChanged(ResourceLocalizationService.java:271)
> at 
> org.apache.hadoop.yarn.server.nodemanager.DirectoryCollection.checkDirs(DirectoryCollection.java:381)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.checkDirs(LocalDirsHandlerService.java:449)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.access$500(LocalDirsHandlerService.java:52)
> at 
> 

[jira] [Commented] (YARN-7794) SLSRunner is not loading timeline service jars causing failure

2018-03-22 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16410561#comment-16410561
 ] 

genericqa commented on YARN-7794:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 11m 
10s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 
10s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 28s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} shellcheck {color} | {color:green}  0m 
 0s{color} | {color:green} There were no new shellcheck issues. {color} |
| {color:green}+1{color} | {color:green} shelldocs {color} | {color:green}  0m 
10s{color} | {color:green} There were no new shelldocs issues. {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 24s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
19s{color} | {color:green} hadoop-sls in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
18s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 56m  7s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8620d2b |
| JIRA Issue | YARN-7794 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12915763/YARN-7794.001.patch |
| Optional Tests |  asflicense  mvnsite  unit  shellcheck  shelldocs  |
| uname | Linux a1c52cbbdce4 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 
13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 8620d2b |
| maven | version: Apache Maven 3.3.9 |
| shellcheck | v0.4.6 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/20055/testReport/ |
| Max. process+thread count | 408 (vs. ulimit of 1) |
| modules | C: hadoop-tools/hadoop-sls U: hadoop-tools/hadoop-sls |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/20055/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> SLSRunner is not loading timeline service jars causing failure
> --
>
> Key: YARN-7794
> URL: https://issues.apache.org/jira/browse/YARN-7794
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler-load-simulator
>Affects Versions: 3.1.0
>Reporter: Sunil G
>Assignee: Rohith Sharma K S
>Priority: Blocker
> Attachments: YARN-7794.001.patch
>
>
> {code:java}
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.hadoop.yarn.server.timelineservice.collector.TimelineCollector
>         at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>         ... 13 more
> Exception in thread "pool-2-thread-390" java.lang.NoClassDefFoundError: 
> org/apache/hadoop/yarn/server/timelineservice/collector/TimelineCollector
>         at 
> 

[jira] [Commented] (YARN-8032) Yarn service should expose failuresValidityInterval to users and use it for launching containers

2018-03-22 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16410554#comment-16410554
 ] 

Eric Yang commented on YARN-8032:
-

[~csingh] Can you also add usage of this property into: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/DockerContainers.md
 file?  I tried to launch application, but the failure validity interval does 
not seem to do anything.  This is how I submitted my job:

{code}
{
  "name": "sleeper-service",
  "kerberos_principal" : {
"principal_name" : "hbase/_h...@example.com",
"keytab" : "file:///etc/security/keytabs/hbase.service.keytab"
  },
  "version": "1",
  "components" :
  [
{
  "name": "ping",
  "number_of_containers": 2,
  "artifact": {
"id": "hadoop/centos:latest",
"type": "DOCKER"
  },
  "launch_command": "sleep,90",
  "resource": {
"cpus": 1,
"memory": "256"
  },
  "configuration": {
"env": {
  "YARN_CONTAINER_RUNTIME_DOCKER_DELAYED_REMOVAL":"true",
  "YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE":"true"
},
"properties": {
  "docker.network": "host",
  "yarn.service.container-failure.validity-interval-ms": 3
}
  }
}
  ]
}
{code}

> Yarn service should expose failuresValidityInterval to users and use it for 
> launching containers
> 
>
> Key: YARN-8032
> URL: https://issues.apache.org/jira/browse/YARN-8032
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-8032.001.patch, YARN-8032.002.patch
>
>
> With YARN-5015 the support for sliding window retry policy was added. Yarn 
> service should expose it via the api for the users to take advantage of it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7574) Add support for Node Labels on Auto Created Leaf Queue Template

2018-03-22 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16410533#comment-16410533
 ] 

Wangda Tan commented on YARN-7574:
--

Thanks [~suma.shivaprasad], in general the patch looks good. Just a few minor 
comments: 

1) getAvailableCapacity: can be expanded for better readability.

2) This else may not be necessary:
{code}
else {
  throw new SchedulerDynamicEditException("Child queue absolute capacity "
      + "is initialized to 0. Check parent queue's " + managedParentQueue
      .getQueueName() + " leaf queue template configuration");
}
{code}
We should check childQueueAbsoluteCapacity > 0 prior to this, correct?
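
For illustration, that reordering could look roughly like this (a sketch only,
reusing the names from the snippet above; not the actual scheduler code):
{code}
/** Sketch of the suggested reordering: validate up front, no trailing else. */
static void validateChildQueueCapacity(float childQueueAbsoluteCapacity,
    String parentQueueName) throws SchedulerDynamicEditException {
  if (childQueueAbsoluteCapacity <= 0) {
    throw new SchedulerDynamicEditException("Child queue absolute capacity "
        + "is initialized to 0. Check parent queue's " + parentQueueName
        + " leaf queue template configuration");
  }
  // callers proceed with the normal path once this returns
}
{code}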

> Add support for Node Labels on Auto Created Leaf Queue Template
> ---
>
> Key: YARN-7574
> URL: https://issues.apache.org/jira/browse/YARN-7574
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Suma Shivaprasad
>Assignee: Suma Shivaprasad
>Priority: Major
> Attachments: YARN-7574.1.patch, YARN-7574.2.patch, YARN-7574.3.patch, 
> YARN-7574.4.patch, YARN-7574.5.patch
>
>
> YARN-7473 adds support for auto created leaf queues to inherit node label 
> capacities from parent queues. However, there is no support in the leaf queue 
> template for different configured capacities for different node labels. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5268) DShell AM fails java.lang.InterruptedException

2018-03-22 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16410531#comment-16410531
 ] 

genericqa commented on YARN-5268:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  4s{color} 
| {color:red} YARN-5268 does not apply to trunk. Rebase required? Wrong Branch? 
See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-5268 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12811411/YARN-5268.1.patch |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/20056/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> DShell AM fails java.lang.InterruptedException
> --
>
> Key: YARN-5268
> URL: https://issues.apache.org/jira/browse/YARN-5268
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Sumana Sathish
>Assignee: Zian Chen
>Priority: Critical
>  Labels: oct16-easy
> Attachments: YARN-5268.1.patch
>
>
> Distributed Shell AM failed with the following error
> {Code}
> 16/06/16 11:08:10 INFO impl.NMClientAsyncImpl: NMClient stopped.
> 16/06/16 11:08:10 INFO distributedshell.ApplicationMaster: Application 
> completed. Signalling finish to RM
> 16/06/16 11:08:10 INFO distributedshell.ApplicationMaster: Diagnostics., 
> total=16, completed=19, allocated=21, failed=4
> 16/06/16 11:08:10 INFO impl.AMRMClientImpl: Waiting for application to be 
> successfully unregistered.
> 16/06/16 11:08:10 INFO distributedshell.ApplicationMaster: Application Master 
> failed. exiting
> 16/06/16 11:08:10 INFO impl.AMRMClientAsyncImpl: Interrupted while waiting 
> for queue
> java.lang.InterruptedException
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2048)
>   at 
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>   at 
> org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:287)
> End of LogType:AppMaster.stderr
> {Code}
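
For context, the usual remedy for this pattern is to treat the interrupt as an
expected shutdown signal rather than an error. A generic Java sketch with
hypothetical names (not the attached patch):
{code}
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical sketch: a callback thread that treats interrupt as shutdown. */
public class CallbackDrainThread extends Thread {
  private final LinkedBlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
  private volatile boolean keepRunning = true;

  void post(Runnable callback) {
    queue.add(callback);
  }

  @Override
  public void run() {
    while (keepRunning) {
      try {
        queue.take().run();
      } catch (InterruptedException e) {
        // Expected during unregister/stop: restore the interrupt flag and
        // exit quietly instead of logging a scary stack trace.
        Thread.currentThread().interrupt();
        return;
      }
    }
  }

  void shutdown() {
    keepRunning = false;
    interrupt();
  }
}
{code}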



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8064) Docker ".cmd" files should not be put in hadoop.tmp.dir

2018-03-22 Thread Eric Badger (JIRA)
Eric Badger created YARN-8064:
-

 Summary: Docker ".cmd" files should not be put in hadoop.tmp.dir
 Key: YARN-8064
 URL: https://issues.apache.org/jira/browse/YARN-8064
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Eric Badger
Assignee: Eric Badger


Currently all of the docker command files are being put into 
{{hadoop.tmp.dir}}, which doesn't get cleaned up. So eventually the inodes will 
fill up and no more tasks will be able to run.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Issue Comment Deleted] (YARN-7574) Add support for Node Labels on Auto Created Leaf Queue Template

2018-03-22 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-7574:
-
Comment: was deleted

(was: Thanks [~suma.shivaprasad], 
In general the patch is very nice, some comments: 

1) Changes in SingleConstraintAppPlacementAllocator:
- Before the patch only SELF was supported; after this patch several 
ALLOCATION_TAG namespaces are supported. I'm not sure if we should do that, 
since we haven't validated whether inter-app targets work for 
SingleConstraintAppPlacementAllocator.

2) TargetApplications:
- Remove unused methods.

3) aggregateAllocationTagsByApps/aggregateAllocationTagsByRack 
- The two methods can be merged.
- The appIds != null check is redundant (since 
{{AllocationTagNamespace#getNamespaceScope}} always returns non-null).

4) getNamespaceScope:
- There are several "scope" related methods; I suggest renaming them to 
get/setApplicationIds. The name "scope" is confusing to me. 
- The following methods are used by tests only; I suggest removing them:
{code}
/**
 * @return true if the namespace is effective in all applications
 * in this cluster. Specifically the namespace prefix should be
 * "all".
 */
public boolean isGlobal() {
  return AllocationTagNamespaceType.ALL.equals(getNamespaceType());
}

/**
 * @return true if the namespace is effective within a single application
 * by its application ID, the namespace prefix should be "app-id";
 * false otherwise.
 */
public boolean isSingleInterApp() {
  return AllocationTagNamespaceType.APP_ID.equals(getNamespaceType());
}

/**
 * @return true if the namespace is effective to the application itself,
 * the namespace prefix should be "self"; false otherwise.
 */
public boolean isIntraApp() {
  return AllocationTagNamespaceType.SELF.equals(getNamespaceType());
}

/**
 * @return true if the namespace is effective to all applications except
 * itself, the namespace prefix should be "not-self"; false otherwise.
 */
public boolean isNotSelf() {
  return AllocationTagNamespaceType.NOT_SELF.equals(getNamespaceType());
}
{code}
And actually I suggest removing all the is... methods, since we can check 
{{AllocationTagNamespaceType.XYZ.equals(getNamespaceType)}}. This is optional 
to me; depends on you.
- Several checks test for unsupported namespaces:
{code} 
// TODO: Completely remove this check once we support app-label.
if (namespace.isAppLabel()) {
  throw new InvalidAllocationTagsQueryException(
      namespace.toString() + " is not supported yet!");
}
{code} 
I suggest adding a method like {{throwExceptionIfNamespaceTypeNotSupported}} so 
we don't need to change all the places in the future.

 )
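
The helper suggested at the end of that (deleted) review might look roughly
like this; the supported-type set is illustrative, and the types are the
AllocationTagNamespaceType and InvalidAllocationTagsQueryException referenced
in the snippet above:
{code}
import java.util.EnumSet;

/** Hypothetical sketch of the suggested centralized namespace check. */
final class NamespaceSupport {
  // Illustrative: the set of namespace types currently implemented.
  private static final EnumSet<AllocationTagNamespaceType> SUPPORTED =
      EnumSet.of(AllocationTagNamespaceType.SELF,
          AllocationTagNamespaceType.NOT_SELF,
          AllocationTagNamespaceType.APP_ID,
          AllocationTagNamespaceType.ALL);

  static void throwExceptionIfNamespaceTypeNotSupported(
      AllocationTagNamespaceType type)
      throws InvalidAllocationTagsQueryException {
    if (!SUPPORTED.contains(type)) {
      throw new InvalidAllocationTagsQueryException(
          type + " is not supported yet!");
    }
  }

  private NamespaceSupport() {
  }
}
{code}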

> Add support for Node Labels on Auto Created Leaf Queue Template
> ---
>
> Key: YARN-7574
> URL: https://issues.apache.org/jira/browse/YARN-7574
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Suma Shivaprasad
>Assignee: Suma Shivaprasad
>Priority: Major
> Attachments: YARN-7574.1.patch, YARN-7574.2.patch, YARN-7574.3.patch, 
> YARN-7574.4.patch, YARN-7574.5.patch
>
>
> YARN-7473 adds support for auto created leaf queues to inherit node label 
> capacities from parent queues. However, there is no support in the leaf queue 
> template for different configured capacities for different node labels. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7142) Support placement policy in yarn native services

2018-03-22 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16410503#comment-16410503
 ] 

Wangda Tan commented on YARN-7142:
--

Thanks [~gsaha], my comments: 
1) Placement policy was removed from Service? How does a user specify common 
placement policies for components?
2) Compatibility of the placement policy?
3) validatePlacementPolicy:
- Should we enforce target tag name == self component name?
4) Remove the expression name from Example.md:
- {{"name": "CA1"}}
5) Mark unsupported APIs in the documentation/definition.
6) Add node partition to PlacementConstraint (see the sketch below). Ref: 
org.apache.hadoop.yarn.api.resource.PlacementConstraints.PlacementTargets#nodePartition
7) tags => targetTags
8) Use SchedulingRequest when ANY component uses a placement policy. 
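
For point 6, a sketch of attaching a node partition to a constraint using the
PlacementConstraints helpers referenced above; the tag and partition names are
made up, and this is not the service-AM code:
{code}
import static org.apache.hadoop.yarn.api.resource.PlacementConstraints.NODE;
import static org.apache.hadoop.yarn.api.resource.PlacementConstraints.build;
import static org.apache.hadoop.yarn.api.resource.PlacementConstraints.targetIn;
import static org.apache.hadoop.yarn.api.resource.PlacementConstraints.PlacementTargets.allocationTag;
import static org.apache.hadoop.yarn.api.resource.PlacementConstraints.PlacementTargets.nodePartition;

import org.apache.hadoop.yarn.api.resource.PlacementConstraint;

public class PlacementSketch {
  /** Affinity to the "hbase" allocation tag, restricted to the "gpu" partition. */
  static PlacementConstraint hbaseOnGpuPartition() {
    return build(targetIn(NODE, allocationTag("hbase"), nodePartition("gpu")));
  }
}
{code}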

> Support placement policy in yarn native services
> 
>
> Key: YARN-7142
> URL: https://issues.apache.org/jira/browse/YARN-7142
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-native-services
>Reporter: Billie Rinaldi
>Assignee: Gour Saha
>Priority: Major
> Attachments: YARN-7142.001.patch
>
>
> Placement policy exists in the API but is not implemented yet.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5268) DShell AM fails java.lang.InterruptedException

2018-03-22 Thread Zian Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16410491#comment-16410491
 ] 

Zian Chen commented on YARN-5268:
-

[~leftnoteasy], I'm interested in this issue. I will pick it up and continue 
the work. Thanks

 

> DShell AM fails java.lang.InterruptedException
> --
>
> Key: YARN-5268
> URL: https://issues.apache.org/jira/browse/YARN-5268
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Sumana Sathish
>Assignee: Zian Chen
>Priority: Critical
>  Labels: oct16-easy
> Attachments: YARN-5268.1.patch
>
>
> Distributed Shell AM failed with the following error
> {Code}
> 16/06/16 11:08:10 INFO impl.NMClientAsyncImpl: NMClient stopped.
> 16/06/16 11:08:10 INFO distributedshell.ApplicationMaster: Application 
> completed. Signalling finish to RM
> 16/06/16 11:08:10 INFO distributedshell.ApplicationMaster: Diagnostics., 
> total=16, completed=19, allocated=21, failed=4
> 16/06/16 11:08:10 INFO impl.AMRMClientImpl: Waiting for application to be 
> successfully unregistered.
> 16/06/16 11:08:10 INFO distributedshell.ApplicationMaster: Application Master 
> failed. exiting
> 16/06/16 11:08:10 INFO impl.AMRMClientAsyncImpl: Interrupted while waiting 
> for queue
> java.lang.InterruptedException
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2048)
>   at 
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>   at 
> org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:287)
> End of LogType:AppMaster.stderr
> {Code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-5268) DShell AM fails java.lang.InterruptedException

2018-03-22 Thread Zian Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zian Chen reassigned YARN-5268:
---

Assignee: Zian Chen

> DShell AM fails java.lang.InterruptedException
> --
>
> Key: YARN-5268
> URL: https://issues.apache.org/jira/browse/YARN-5268
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Sumana Sathish
>Assignee: Zian Chen
>Priority: Critical
>  Labels: oct16-easy
> Attachments: YARN-5268.1.patch
>
>
> Distributed Shell AM failed with the following error
> {Code}
> 16/06/16 11:08:10 INFO impl.NMClientAsyncImpl: NMClient stopped.
> 16/06/16 11:08:10 INFO distributedshell.ApplicationMaster: Application 
> completed. Signalling finish to RM
> 16/06/16 11:08:10 INFO distributedshell.ApplicationMaster: Diagnostics., 
> total=16, completed=19, allocated=21, failed=4
> 16/06/16 11:08:10 INFO impl.AMRMClientImpl: Waiting for application to be 
> successfully unregistered.
> 16/06/16 11:08:10 INFO distributedshell.ApplicationMaster: Application Master 
> failed. exiting
> 16/06/16 11:08:10 INFO impl.AMRMClientAsyncImpl: Interrupted while waiting 
> for queue
> java.lang.InterruptedException
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2048)
>   at 
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>   at 
> org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:287)
> End of LogType:AppMaster.stderr
> {Code}






[jira] [Commented] (YARN-5590) Add support for increase and decrease of container resources with resource profiles

2018-03-22 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16410480#comment-16410480
 ] 

genericqa commented on YARN-5590:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
33s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 5 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
13s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 
29s{color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red} 14m  
1s{color} | {color:red} hadoop-yarn in trunk failed. {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 4s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
10s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 11s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
36s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
12s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  7m 
40s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m 10s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 2 new + 30 unchanged - 0 fixed = 32 total (was 30) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
15s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 32 line(s) with tabs. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
3s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 34s{color} | {color:green} patch has no errors when building and testing our 
client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
51s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 66m 
27s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 28m  
3s{color} | {color:green} hadoop-yarn-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 
35s{color} | {color:red} The patch generated 3 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}163m 52s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:d4cc50f |
| JIRA Issue | YARN-5590 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12915738/YARN-5590.002.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  xml  |
| uname | Linux 21fdc3c5e0a7 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 
11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / f738d75 |
| maven | version: Apache 

[jira] [Commented] (YARN-6830) Support quoted strings for environment variables

2018-03-22 Thread Jim Brennan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16410365#comment-16410365
 ] 

Jim Brennan commented on YARN-6830:
---

One question about [~aw]'s approach - while this would solve the problem for 
variables specified via {{-Dmapreduce.map.env}}, it would not work for lists of 
environment variables specified in configuration parameters, like 
{{yarn.nodemanager.admin-env}}, which is currently parsed via the same method 
as {{-Dmapreduce.map.env}}.


> Support quoted strings for environment variables
> 
>
> Key: YARN-6830
> URL: https://issues.apache.org/jira/browse/YARN-6830
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Shane Kumpf
>Assignee: Jim Brennan
>Priority: Major
> Attachments: YARN-6830.001.patch, YARN-6830.002.patch, 
> YARN-6830.003.patch, YARN-6830.004.patch
>
>
> There are cases where it is necessary to allow for quoted string literals 
> within environment variable values when passed via the yarn command line 
> interface.
> For example, consider the following environment variables for an MR map task.
> {{MODE=bar}}
> {{IMAGE_NAME=foo}}
> {{MOUNTS=/tmp/foo,/tmp/bar}}
> When running the MR job, these environment variables are supplied as a comma 
> delimited string.
> {{-Dmapreduce.map.env="MODE=bar,IMAGE_NAME=foo,MOUNTS=/tmp/foo,/tmp/bar"}}
> In this case, {{MOUNTS}} will be parsed and added to the task environment as 
> {{MOUNTS=/tmp/foo}}. Any attempt to quote the embedded comma-separated value 
> results in the quote characters becoming part of the value, and parsing still 
> breaks down at the comma.
> This issue is to allow for quoting the comma-separated value (escaped double 
> or single quote). This was mentioned on YARN-4595 and will impact YARN-5534 
> as well.
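
A self-contained sketch of the kind of quote-aware splitting this issue asks for (illustrative only, not the YARN-6830 patch; the regex and class name are assumptions):

{code:java}
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: split KEY=VALUE pairs on commas while honoring quoted values,
// so MOUNTS="/tmp/foo,/tmp/bar" survives as a single value.
public class EnvSpecParser {
  private static final Pattern PAIR =
      Pattern.compile("([^=,]+)=(\"[^\"]*\"|'[^']*'|[^,]*),?");

  public static Map<String, String> parse(String spec) {
    Map<String, String> env = new LinkedHashMap<>();
    Matcher m = PAIR.matcher(spec);
    while (m.find()) {
      String value = m.group(2);
      if (value.length() >= 2
          && (value.startsWith("\"") || value.startsWith("'"))) {
        value = value.substring(1, value.length() - 1); // strip the quotes
      }
      env.put(m.group(1).trim(), value);
    }
    return env;
  }

  public static void main(String[] args) {
    // Prints {MODE=bar, IMAGE_NAME=foo, MOUNTS=/tmp/foo,/tmp/bar}
    System.out.println(
        parse("MODE=bar,IMAGE_NAME=foo,MOUNTS=\"/tmp/foo,/tmp/bar\""));
  }
}
{code}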






[jira] [Commented] (YARN-8018) Yarn service: Add support for initiating service upgrade

2018-03-22 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16410364#comment-16410364
 ] 

genericqa commented on YARN-8018:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
32s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 5 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
12s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 
53s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  9m 
49s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 40s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
2s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
10s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  6m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  6m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  6m 
56s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m 15s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 26 new + 138 unchanged - 3 fixed = 164 total (was 141) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 40s{color} | {color:green} patch has no errors when building and testing our 
client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
49s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 27m 
37s{color} | {color:green} hadoop-yarn-client in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  5m 
14s{color} | {color:green} hadoop-yarn-services-core in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
27s{color} | {color:green} hadoop-yarn-services-api in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
28s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}100m  4s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:d4cc50f |
| JIRA Issue | YARN-8018 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12915748/YARN-8018.004.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  cc  |
| uname | Linux 30f234cc7834 4.4.0-116-generic #140-Ubuntu SMP Mon Feb 12 
21:23:04 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build 

[jira] [Commented] (YARN-1151) Ability to configure auxiliary services from HDFS-based JAR files

2018-03-22 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16410355#comment-16410355
 ] 

Vinod Kumar Vavilapalli commented on YARN-1151:
---

From a config perspective, why not interpret an explicit scheme for the existing 
config .%s.classpath instead of adding a new one called remote-classpath?
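
To make the suggestion concrete, a hedged sketch of what that could look like for a user (the "myservice" name and jar paths are made up; the property name pattern follows the existing per-service classpath config):

{code:java}
import org.apache.hadoop.conf.Configuration;

// Illustrative only: with an explicit scheme interpreted on the existing
// per-service classpath key, no separate remote-classpath key is needed.
public class AuxClasspathExample {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // A local classpath, as today:
    conf.set("yarn.nodemanager.aux-services.myservice.classpath",
        "/usr/lib/myservice/myservice.jar");
    // Under the suggestion, an hdfs: scheme on the same key would select a
    // remote source instead:
    conf.set("yarn.nodemanager.aux-services.myservice.classpath",
        "hdfs:///apps/aux-services/myservice.jar");
  }
}
{code}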

> Ability to configure auxiliary services from HDFS-based JAR files
> -
>
> Key: YARN-1151
> URL: https://issues.apache.org/jira/browse/YARN-1151
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.1.0-beta, 2.9.0
>Reporter: john lilley
>Assignee: Xuan Gong
>Priority: Major
>  Labels: auxiliary-service, yarn
> Attachments: YARN-1151.1.patch, YARN-1151.branch-2.poc.patch, 
> [YARN-1151] [Design] Configure auxiliary services from HDFS-based JAR 
> files.pdf
>
>
> I would like to install an auxiliary service in Hadoop YARN without actually 
> installing files/services on every node in the system.  Discussions on the 
> user@ list indicate that this is not easily done.  The reason we want an 
> auxiliary service is that our application has some persistent-data components 
> that are not appropriate for HDFS.  In fact, they are somewhat analogous to 
> the mapper output of MapReduce's shuffle, which is what led me to 
> auxiliary-services in the first place.  It would be much easier if we could 
> just place our service's JARs in HDFS.






[jira] [Commented] (YARN-1151) Ability to configure auxiliary services from HDFS-based JAR files

2018-03-22 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16410330#comment-16410330
 ] 

Jason Lowe commented on YARN-1151:
--

+1 for Robert's comment about security.  We need to be very careful about 
consuming files from distributed sources.

bq. Probably we should check that the file permission of the tars is something 
like 600?

We need to check for jarfile owner == NM user and {{(permbits & 0022) == 0}} 
(i.e.: it's not writable by group or other).  IMO from the NM perspective, it's 
not critical to ensure the files aren't readable by group or other.  It is 
critical to make sure the files are not writable.
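
A minimal sketch of that check against the Hadoop FileSystem API, assuming {{fs}} points at the remote filesystem and {{jarPath}} at the jar (the class name and exception messages are illustrative):

{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

// Sketch only: refuse jars that are not owned by the NM user or that are
// writable by group/other.
final class AuxJarChecks {
  static void checkJarSource(FileSystem fs, Path jarPath) throws IOException {
    FileStatus st = fs.getFileStatus(jarPath);
    String nmUser = UserGroupInformation.getCurrentUser().getShortUserName();
    if (!st.getOwner().equals(nmUser)) {
      throw new IOException(jarPath + " is not owned by the NM user " + nmUser);
    }
    // permbits & 0022 must be 0, i.e. no group/other write permission
    if ((st.getPermission().toShort() & 0022) != 0) {
      throw new IOException(jarPath + " is writable by group or other");
    }
  }
}
{code}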




> Ability to configure auxiliary services from HDFS-based JAR files
> -
>
> Key: YARN-1151
> URL: https://issues.apache.org/jira/browse/YARN-1151
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.1.0-beta, 2.9.0
>Reporter: john lilley
>Assignee: Xuan Gong
>Priority: Major
>  Labels: auxiliary-service, yarn
> Attachments: YARN-1151.1.patch, YARN-1151.branch-2.poc.patch, 
> [YARN-1151] [Design] Configure auxiliary services from HDFS-based JAR 
> files.pdf
>
>
> I would like to install an auxiliary service in Hadoop YARN without actually 
> installing files/services on every node in the system.  Discussions on the 
> user@ list indicate that this is not easily done.  The reason we want an 
> auxiliary service is that our application has some persistent-data components 
> that are not appropriate for HDFS.  In fact, they are somewhat analogous to 
> the mapper output of MapReduce's shuffle, which is what led me to 
> auxiliary-services in the first place.  It would be much easier if we could 
> just place our service's JARs in HDFS.






[jira] [Updated] (YARN-7794) SLSRunner is not loading timeline service jars causing failure

2018-03-22 Thread Yufei Gu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yufei Gu updated YARN-7794:
---
Attachment: YARN-7794.001.patch

> SLSRunner is not loading timeline service jars causing failure
> --
>
> Key: YARN-7794
> URL: https://issues.apache.org/jira/browse/YARN-7794
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler-load-simulator
>Affects Versions: 3.1.0
>Reporter: Sunil G
>Assignee: Rohith Sharma K S
>Priority: Blocker
> Attachments: YARN-7794.001.patch
>
>
> {code:java}
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.hadoop.yarn.server.timelineservice.collector.TimelineCollector
>         at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>         ... 13 more
> Exception in thread "pool-2-thread-390" java.lang.NoClassDefFoundError: 
> org/apache/hadoop/yarn/server/timelineservice/collector/TimelineCollector
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:443)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:321)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:641){code}
> We are getting this error while running SLS: the new timelineservice jars 
> under share/hadoop/yarn are not loaded into the SLS JVM (verified from the 
> slsrunner classpath).
> cc/ [~rohithsharma]
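
A throwaway check along those lines (not part of any patch here) to confirm from inside the SLS JVM whether the class is visible:

{code:java}
// Sketch only: verify whether the timelineservice jars are visible to the
// running JVM before looking elsewhere for the cause.
public class ClasspathCheck {
  public static void main(String[] args) {
    System.out.println(System.getProperty("java.class.path"));
    try {
      Class.forName("org.apache.hadoop.yarn.server.timelineservice"
          + ".collector.TimelineCollector");
      System.out.println("TimelineCollector is on the classpath");
    } catch (ClassNotFoundException e) {
      System.out.println("TimelineCollector is NOT on the classpath");
    }
  }
}
{code}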






[jira] [Commented] (YARN-7794) SLSRunner is not loading timeline service jars causing failure

2018-03-22 Thread Yufei Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16410327#comment-16410327
 ] 

Yufei Gu commented on YARN-7794:


Thanks [~vrushalic]. It works. Posted the patch v1. [~rohithsharma], can I take 
this?

> SLSRunner is not loading timeline service jars causing failure
> --
>
> Key: YARN-7794
> URL: https://issues.apache.org/jira/browse/YARN-7794
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler-load-simulator
>Affects Versions: 3.1.0
>Reporter: Sunil G
>Assignee: Rohith Sharma K S
>Priority: Blocker
> Attachments: YARN-7794.001.patch
>
>
> {code:java}
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.hadoop.yarn.server.timelineservice.collector.TimelineCollector
>         at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>         ... 13 more
> Exception in thread "pool-2-thread-390" java.lang.NoClassDefFoundError: 
> org/apache/hadoop/yarn/server/timelineservice/collector/TimelineCollector
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:443)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:321)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:641){code}
> We are getting this error while running SLS: the new timelineservice jars 
> under share/hadoop/yarn are not loaded into the SLS JVM (verified from the 
> slsrunner classpath).
> cc/ [~rohithsharma]






[jira] [Commented] (YARN-1151) Ability to configure auxiliary services from HDFS-based JAR files

2018-03-22 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16410305#comment-16410305
 ] 

Wangda Tan commented on YARN-1151:
--

[~rkanter], 
bq. but I think it would be a good idea to enforce that the source of the jars 
we download (i.e. the HDFS location) is owned by the NM user and can't be 
overwritten by some other user if the admin set up the permissions incorrectly
That makes sense. So maybe my previous comment can be moved out of this scope:
bq. Instead of hard-coding remoteFs, is it better to load the FS according to 
the URI? For example, a user can put aux tars on HDFS/S3/WASB, etc.

Probably we should check that the file permission of the tars is something like 
600?

> Ability to configure auxiliary services from HDFS-based JAR files
> -
>
> Key: YARN-1151
> URL: https://issues.apache.org/jira/browse/YARN-1151
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.1.0-beta, 2.9.0
>Reporter: john lilley
>Assignee: Xuan Gong
>Priority: Major
>  Labels: auxiliary-service, yarn
> Attachments: YARN-1151.1.patch, YARN-1151.branch-2.poc.patch, 
> [YARN-1151] [Design] Configure auxiliary services from HDFS-based JAR 
> files.pdf
>
>
> I would like to install an auxiliary service in Hadoop YARN without actually 
> installing files/services on every node in the system.  Discussions on the 
> user@ list indicate that this is not easily done.  The reason we want an 
> auxiliary service is that our application has some persistent-data components 
> that are not appropriate for HDFS.  In fact, they are somewhat analogous to 
> the mapper output of MapReduce's shuffle, which is what led me to 
> auxiliary-services in the first place.  It would be much easier if we could 
> just place our service's JARs in HDFS.






[jira] [Commented] (YARN-7574) Add support for Node Labels on Auto Created Leaf Queue Template

2018-03-22 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16410291#comment-16410291
 ] 

Wangda Tan commented on YARN-7574:
--

Thanks [~suma.shivaprasad], 
In general the patch is very nice; some comments: 

1) Changes in SingleConstraintAppPlacementAllocator:
- Before the patch only SELF was supported; after this patch several 
ALLOCATION_TAG namespaces are supported. I'm not sure if we should do that, 
since we haven't validated whether inter-app targets work for 
SingleConstraintAppPlacementAllocator.

2) TargetApplications:
- Remove the unused methods.

3) aggregateAllocationTagsByApps/aggregateAllocationTagsByRack:
- The two methods can be merged.
- The appIds != null check is redundant (since 
{{AllocationTagNamespace#getNamespaceScope}} always returns non-null).

4) getNamespaceScope:
- There are several "scope"-related methods; I suggest renaming them to 
get/setApplicationIds. The name "scope" is confusing to me. 
- The following methods are used by tests only; I suggest removing them:
{code}
/**
 * @return true if the namespace is effective in all applications
 * in this cluster. Specifically the namespace prefix should be
 * "all".
 */
public boolean isGlobal() {
  return AllocationTagNamespaceType.ALL.equals(getNamespaceType());
}

/**
 * @return true if the namespace is effective within a single application
 * by its application ID, the namespace prefix should be "app-id";
 * false otherwise.
 */
public boolean isSingleInterApp() {
  return AllocationTagNamespaceType.APP_ID.equals(getNamespaceType());
}

/**
 * @return true if the namespace is effective to the application itself,
 * the namespace prefix should be "self"; false otherwise.
 */
public boolean isIntraApp() {
  return AllocationTagNamespaceType.SELF.equals(getNamespaceType());
}

/**
 * @return true if the namespace is effective to all applications except
 * itself, the namespace prefix should be "not-self"; false otherwise.
 */
public boolean isNotSelf() {
  return AllocationTagNamespaceType.NOT_SELF.equals(getNamespaceType());
}
{code}
Actually, I suggest removing all the is... methods, since callers can check 
{{AllocationTagNamespaceType.XYZ.equals(getNamespaceType())}} directly. This is 
optional to me; it depends on you.
- Several checks test for unsupported namespaces:
{code} 
// TODO: Completely remove this check once we support app-label.
if (namespace.isAppLabel()) {
  throw new InvalidAllocationTagsQueryException(
      namespace.toString() + " is not supported yet!");
}
{code} 
I suggest adding a method like {{throwExceptionIfNamespaceTypeNotSupported}} so 
we don't need to change all these places in the future; a sketch follows.
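
One way the suggested helper could look, reusing the check quoted above (a sketch; only the method name comes from the suggestion):

{code:java}
// Sketch only: centralize the "unsupported namespace" check so adding a new
// namespace type requires a change in one place.
private static void throwExceptionIfNamespaceTypeNotSupported(
    AllocationTagNamespace namespace)
    throws InvalidAllocationTagsQueryException {
  // TODO: drop the app-label clause once app-label is supported.
  if (namespace.isAppLabel()) {
    throw new InvalidAllocationTagsQueryException(
        namespace.toString() + " is not supported yet!");
  }
}
{code}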

 

> Add support for Node Labels on Auto Created Leaf Queue Template
> ---
>
> Key: YARN-7574
> URL: https://issues.apache.org/jira/browse/YARN-7574
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Suma Shivaprasad
>Assignee: Suma Shivaprasad
>Priority: Major
> Attachments: YARN-7574.1.patch, YARN-7574.2.patch, YARN-7574.3.patch, 
> YARN-7574.4.patch, YARN-7574.5.patch
>
>
> YARN-7473 adds support for auto created leaf queues to inherit node labels 
> capacities from parent queues. However, there is no support in the leaf queue 
> template for configuring different capacities for different node labels. 






[jira] [Commented] (YARN-7931) [atsv2 read acls] Include domain table creation as part of schema creator

2018-03-22 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16410277#comment-16410277
 ] 

Haibo Chen commented on YARN-7931:
--

Thanks [~vrushalic] for the patch. The patch looks good overall. Some minor 
comments:

1) Some comments in the DomainRowKey class are still referring to 
app_flow_table.

2) The DomainTable is missing writers as one of its columns. Also, let's add a 
link to the TimelineDomain documentation (possibly the official Apache doc 
where it is introduced).

3) I recall that we want to do compression on the domain table. Is that 
something done at table creation time?

The other thing I am thinking about is whether we want to consider the case 
where one application modifies domains posted by another application. That 
seems to be allowed at the moment.
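
On 3), compression is a column-family property set at table creation time in HBase; a hedged sketch with the HBase 1.x admin API (the table name and column family here are assumptions, not the actual ATSv2 schema):

{code:java}
import java.io.IOException;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.io.compress.Compression;

// Sketch only: create a domain table with GZ compression on its column
// family. Names are illustrative, not the actual schema creator code.
final class DomainTableSketch {
  static void createDomainTable(Admin admin) throws IOException {
    HTableDescriptor table =
        new HTableDescriptor(TableName.valueOf("prod.timelineservice.domain"));
    HColumnDescriptor info = new HColumnDescriptor("i");
    info.setCompressionType(Compression.Algorithm.GZ);  // creation-time option
    table.addFamily(info);
    admin.createTable(table);
  }
}
{code}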

> [atsv2 read acls] Include domain table creation as part of schema creator
> -
>
> Key: YARN-7931
> URL: https://issues.apache.org/jira/browse/YARN-7931
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Vrushali C
>Assignee: Vrushali C
>Priority: Major
> Attachments: YARN-7391.0001.patch
>
>
>  
> Update the schema creator to create a domain table to store timeline entity 
> domain info. 






[jira] [Commented] (YARN-1151) Ability to configure auxiliary services from HDFS-based JAR files

2018-03-22 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16410276#comment-16410276
 ] 

Robert Kanter commented on YARN-1151:
-

I'll try to take a closer look sometime next week, but I think it would be a 
good idea to enforce that the source of the jars we download (i.e. the HDFS 
location) is owned by the NM user and can't be overwritten by some other user 
if the admin set up the permissions incorrectly.  We're going to be running 
these jars in the NM, so we need to be careful.

> Ability to configure auxiliary services from HDFS-based JAR files
> -
>
> Key: YARN-1151
> URL: https://issues.apache.org/jira/browse/YARN-1151
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.1.0-beta, 2.9.0
>Reporter: john lilley
>Assignee: Xuan Gong
>Priority: Major
>  Labels: auxiliary-service, yarn
> Attachments: YARN-1151.1.patch, YARN-1151.branch-2.poc.patch, 
> [YARN-1151] [Design] Configure auxiliary services from HDFS-based JAR 
> files.pdf
>
>
> I would like to install an auxiliary service in Hadoop YARN without actually 
> installing files/services on every node in the system.  Discussions on the 
> user@ list indicate that this is not easily done.  The reason we want an 
> auxiliary service is that our application has some persistent-data components 
> that are not appropriate for HDFS.  In fact, they are somewhat analogous to 
> the mapper output of MapReduce's shuffle, which is what led me to 
> auxiliary-services in the first place.  It would be much easier if we could 
> just place our service's JARs in HDFS.






[jira] [Commented] (YARN-1151) Ability to configure auxiliary services from HDFS-based JAR files

2018-03-22 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16410208#comment-16410208
 ] 

Wangda Tan commented on YARN-1151:
--

I would prefer to have another set of eyes look at the patch, maybe 
[~rkanter]/[~jlowe]?

> Ability to configure auxiliary services from HDFS-based JAR files
> -
>
> Key: YARN-1151
> URL: https://issues.apache.org/jira/browse/YARN-1151
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.1.0-beta, 2.9.0
>Reporter: john lilley
>Assignee: Xuan Gong
>Priority: Major
>  Labels: auxiliary-service, yarn
> Attachments: YARN-1151.1.patch, YARN-1151.branch-2.poc.patch, 
> [YARN-1151] [Design] Configure auxiliary services from HDFS-based JAR 
> files.pdf
>
>
> I would like to install an auxiliary service in Hadoop YARN without actually 
> installing files/services on every node in the system.  Discussions on the 
> user@ list indicate that this is not easily done.  The reason we want an 
> auxiliary service is that our application has some persistent-data components 
> that are not appropriate for HDFS.  In fact, they are somewhat analogous to 
> the mapper output of MapReduce's shuffle, which is what led me to 
> auxiliary-services in the first place.  It would be much easier if we could 
> just place our service's JARs in HDFS.






[jira] [Commented] (YARN-1151) Ability to configure auxiliary services from HDFS-based JAR files

2018-03-22 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16410206#comment-16410206
 ] 

Wangda Tan commented on YARN-1151:
--

Thanks [~xgong],

Inside AuxServices#init:
- Rename {{appClassPath}} to {{appLocalClassPath}}.
- Instead of warning:
{code}
// load AuxiliaryService from remote class path
if (appClassPath != null && !appClassPath.isEmpty()) {
  LOG.warn("The aux service:" + sName + " has local classpath:"
      + appClassPath + " and remote classpath:"
      + appRemoteClassPath + ". Using remote classpath.");
}
{code}
Is it better to throw an exception? 
- Do we require this directory (the parent dir of all aux service classpaths) 
to be created prior to using this feature? 
{code} 
Path(dirsHandler.getLocalPathForWrite(
"." + Path.SEPARATOR + NM_AUX_SERVICE_DIR)
{code}
- Instead of hard-coding remoteFs, is it better to load the FS according to 
the URI? For example, a user can put aux tars on HDFS/S3/WASB, etc. (see the 
sketch after this list):
{code}
  this.remoteLFS = getRemoteFileContext(this.conf);
{code}
- Could you add a comment about why {{LocalResourceVisibility.APPLICATION}} is 
being used? I can understand it is not PUBLIC for sure. But maybe {{PRIVATE}} 
should be used? 
- Maybe it's better to catch the exception here:
{code}
  Path dest = new Path(download.call() + Path.SEPARATOR + "*");
{code}
And add some information, like "exception happened while downloading files for 
aux-service=x and remote-file-path=y, etc."
- The following code is commented out: 
{code}
// createAuxServiceDir(className);
{code}
Added by mistake? 
- IIUC, the local directories of AuxServices are initialized inside 
{{ResourceLocalizationService}}, so we need to make sure 
{{ResourceLocalizationService}} is initialized prior to AuxServices.
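
On the remoteFs bullet above, a minimal sketch of resolving the FileContext from the remote path's own URI, so HDFS/S3/WASB all resolve naturally (the wrapper class is illustrative):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileContext;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.UnsupportedFileSystemException;

// Sketch only: pick the FileContext from the remote path's URI instead of
// hard-coding a single remote filesystem.
final class RemoteFs {
  static FileContext forPath(Path remotePath, Configuration conf)
      throws UnsupportedFileSystemException {
    return FileContext.getFileContext(remotePath.toUri(), conf);
  }
}
{code}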


> Ability to configure auxiliary services from HDFS-based JAR files
> -
>
> Key: YARN-1151
> URL: https://issues.apache.org/jira/browse/YARN-1151
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.1.0-beta, 2.9.0
>Reporter: john lilley
>Assignee: Xuan Gong
>Priority: Major
>  Labels: auxiliary-service, yarn
> Attachments: YARN-1151.1.patch, YARN-1151.branch-2.poc.patch, 
> [YARN-1151] [Design] Configure auxiliary services from HDFS-based JAR 
> files.pdf
>
>
> I would like to install an auxiliary service in Hadoop YARN without actually 
> installing files/services on every node in the system.  Discussions on the 
> user@ list indicate that this is not easily done.  The reason we want an 
> auxiliary service is that our application has some persistent-data components 
> that are not appropriate for HDFS.  In fact, they are somewhat analogous to 
> the mapper output of MapReduce's shuffle, which is what led me to 
> auxiliary-services in the first place.  It would be much easier if we could 
> just place our service's JARs in HDFS.






[jira] [Commented] (YARN-8016) Refine PlacementRule interface and add a app-name queue mapping rule as an example

2018-03-22 Thread Zian Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16410197#comment-16410197
 ] 

Zian Chen commented on YARN-8016:
-

[~leftnoteasy], just fixed the checkstyle issues and resubmitted the patch. Any 
suggestions for the latest one? Thanks!

> Refine PlacementRule interface and add a app-name queue mapping rule as an 
> example
> --
>
> Key: YARN-8016
> URL: https://issues.apache.org/jira/browse/YARN-8016
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Zian Chen
>Assignee: Zian Chen
>Priority: Major
> Attachments: YARN-8016.001.patch, YARN-8016.002.patch, 
> YARN-8016.003.patch, YARN-8016.004.patch, YARN-8016.005.patch
>
>
> After YARN-3635/YARN-6689, PlacementRule became a common interface which can 
> be used by the scheduler and can be dynamically updated by the scheduler 
> according to configs. There is some remaining work: 
> - There's no way to initialize PlacementRule.
> - There is no example of PlacementRule except the user-group mapping one.
> This JIRA is targeted at refining the PlacementRule interfaces and adding 
> another PlacementRule example.
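
For a sense of what another example rule might involve, a purely hypothetical skeleton (every signature below is an assumption; the real interface shape is exactly what this JIRA is refining):

{code:java}
// Hypothetical skeleton only: an app-name-to-queue mapping rule. Treat the
// method names as assumptions, not the actual PlacementRule interface.
public class AppNameMappingPlacementRule {
  private java.util.Map<String, String> appNameToQueue;

  public void initialize(java.util.Map<String, String> mappings) {
    this.appNameToQueue = mappings;   // e.g. loaded from scheduler configs
  }

  /** @return the mapped queue, or null to fall through to the next rule. */
  public String getQueueForApp(String appName) {
    return appNameToQueue.get(appName);
  }
}
{code}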






[jira] [Updated] (YARN-8016) Refine PlacementRule interface and add a app-name queue mapping rule as an example

2018-03-22 Thread Zian Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zian Chen updated YARN-8016:

Attachment: YARN-8016.005.patch

> Refine PlacementRule interface and add a app-name queue mapping rule as an 
> example
> --
>
> Key: YARN-8016
> URL: https://issues.apache.org/jira/browse/YARN-8016
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Zian Chen
>Assignee: Zian Chen
>Priority: Major
> Attachments: YARN-8016.001.patch, YARN-8016.002.patch, 
> YARN-8016.003.patch, YARN-8016.004.patch, YARN-8016.005.patch
>
>
> After YARN-3635/YARN-6689, PlacementRule became a common interface which can 
> be used by the scheduler and can be dynamically updated by the scheduler 
> according to configs. There is some remaining work: 
> - There's no way to initialize PlacementRule.
> - There is no example of PlacementRule except the user-group mapping one.
> This JIRA is targeted at refining the PlacementRule interfaces and adding 
> another PlacementRule example.






[jira] [Commented] (YARN-8037) CGroupsResourceCalculator logs excessive warnings on container relaunch

2018-03-22 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16410191#comment-16410191
 ] 

Haibo Chen commented on YARN-8037:
--

I still think we should not add back the filtering. The problem reported here 
is the effect, and I'd argue that the excessive logging helped surface the 
cause, which is YARN-8035.

Any problem that is causing excessive logging in CGroupsResourceCalculator 
should be addressed, instead of being hidden.

> CGroupsResourceCalculator logs excessive warnings on container relaunch
> ---
>
> Key: YARN-8037
> URL: https://issues.apache.org/jira/browse/YARN-8037
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Shane Kumpf
>Priority: Major
>
> When a container is relaunched, the old process no longer exists. When using 
> the {{CGroupsResourceCalculator}} this results in the warning and exception 
> below being logged every second until the relaunch occurs, which is excessive 
> and filling up the logs.
> {code:java}
> 2018-03-16 14:30:33,438 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator:
>  Failed to parse 12844
> org.apache.hadoop.yarn.exceptions.YarnException: The process vanished in the 
> interim 12844
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.processFile(CGroupsResourceCalculator.java:336)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.readTotalProcessJiffies(CGroupsResourceCalculator.java:252)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.updateProcessTree(CGroupsResourceCalculator.java:181)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CombinedResourceCalculator.updateProcessTree(CombinedResourceCalculator.java:52)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl$MonitoringThread.run(ContainersMonitorImpl.java:457)
> Caused by: java.io.FileNotFoundException: 
> /sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_e01_1521209613260_0002_01_02/cpuacct.stat
>  (No such file or directory)
> at java.io.FileInputStream.open0(Native Method)
> at java.io.FileInputStream.open(FileInputStream.java:195)
> at java.io.FileInputStream.(FileInputStream.java:138)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.processFile(CGroupsResourceCalculator.java:320)
> ... 4 more
> 2018-03-16 14:30:33,438 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator:
>  Failed to parse cgroups 
> /sys/fs/cgroup/memory/hadoop-yarn/container_e01_1521209613260_0002_01_02/memory.memsw.usage_in_bytes
> org.apache.hadoop.yarn.exceptions.YarnException: The process vanished in the 
> interim 12844
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.processFile(CGroupsResourceCalculator.java:336)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.getMemorySize(CGroupsResourceCalculator.java:238)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.updateProcessTree(CGroupsResourceCalculator.java:187)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CombinedResourceCalculator.updateProcessTree(CombinedResourceCalculator.java:52)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl$MonitoringThread.run(ContainersMonitorImpl.java:457)
> Caused by: java.io.FileNotFoundException: 
> /sys/fs/cgroup/memory/hadoop-yarn/container_e01_1521209613260_0002_01_02/memory.usage_in_bytes
>  (No such file or directory)
> at java.io.FileInputStream.open0(Native Method)
> at java.io.FileInputStream.open(FileInputStream.java:195)
> at java.io.FileInputStream.(FileInputStream.java:138)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.processFile(CGroupsResourceCalculator.java:320)
> ... 4 more{code}
> We should consider moving the exception to debug to reduce the noise at a 
> minimum. Alternatively, it may make sense to stop the existing 
> {{MonitoringThread}} during relaunch.
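
The first suggestion in the description, sketched (assuming an SLF4J-style logger; not the actual patch):

{code:java}
// Sketch only: keep a one-line WARN so the condition stays visible, but move
// the full stack trace to DEBUG to stop flooding the NM log every second.
try {
  processFile(file);
} catch (YarnException e) {
  LOG.warn("Failed to parse cgroups {}: {}", file, e.getMessage());
  if (LOG.isDebugEnabled()) {
    LOG.debug("Full stack trace for cgroups parse failure", e);
  }
}
{code}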






[jira] [Commented] (YARN-8054) Improve robustness of the LocalDirsHandlerService MonitoringTimerTask thread

2018-03-22 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16410180#comment-16410180
 ] 

Wangda Tan commented on YARN-8054:
--

Thanks [~jlowe], will pick it up if we do a RC1.

> Improve robustness of the LocalDirsHandlerService MonitoringTimerTask thread
> 
>
> Key: YARN-8054
> URL: https://issues.apache.org/jira/browse/YARN-8054
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
>Priority: Major
> Fix For: 2.10.0, 2.9.1, 2.8.4, 3.0.2, 3.1.1
>
> Attachments: YARN-8054.001.patch, YARN-8054.002.patch
>
>
> The DeprecatedRawLocalFileStatus#loadPermissionInfo can throw a 
> RuntimeException which can kill the MonitoringTimerTask thread. This can 
> leave the node in a bad state where all NM local directories are marked "bad" 
> and there is no automatic recovery. In the case below, the error was "too 
> many open files", but it could be a number of other recoverable states.
> {noformat}
> 2018-03-18 02:37:42,960 [DiskHealthMonitor-Timer] ERROR 
> yarn.YarnUncaughtExceptionHandler: Thread 
> Thread[DiskHealthMonitor-Timer,5,main] threw an Exception.
> java.lang.RuntimeException: Error while running command to get file 
> permissions : java.io.IOException: Cannot run program "ls": error=24, Too 
> many open files
> at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:942)
> at org.apache.hadoop.util.Shell.run(Shell.java:898)
> at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1213)
> at org.apache.hadoop.util.Shell.execCommand(Shell.java:1307)
> at org.apache.hadoop.util.Shell.execCommand(Shell.java:1289)
> at org.apache.hadoop.fs.FileUtil.execCommand(FileUtil.java:1078)
> at 
> org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:697)
> at 
> org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.getPermission(RawLocalFileSystem.java:672)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.checkLocalDir(ResourceLocalizationService.java:1556)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.checkAndInitializeLocalDirs(ResourceLocalizationService.java:1521)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$1.onDirsChanged(ResourceLocalizationService.java:271)
> at 
> org.apache.hadoop.yarn.server.nodemanager.DirectoryCollection.checkDirs(DirectoryCollection.java:381)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.checkDirs(LocalDirsHandlerService.java:449)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.access$500(LocalDirsHandlerService.java:52)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService$MonitoringTimerTask.run(LocalDirsHandlerService.java:166)
> at java.util.TimerThread.mainLoop(Timer.java:555)
> at java.util.TimerThread.run(Timer.java:505)
> Caused by: java.io.IOException: error=24, Too many open files
> at java.lang.UNIXProcess.forkAndExec(Native Method)
> at java.lang.UNIXProcess.(UNIXProcess.java:247)
> at java.lang.ProcessImpl.start(ProcessImpl.java:134)
> at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
> ... 17 more
> at 
> org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:737)
> at 
> org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.getPermission(RawLocalFileSystem.java:672)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.checkLocalDir(ResourceLocalizationService.java:1556)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.checkAndInitializeLocalDirs(ResourceLocalizationService.java:1521)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$1.onDirsChanged(ResourceLocalizationService.java:271)
> at 
> org.apache.hadoop.yarn.server.nodemanager.DirectoryCollection.checkDirs(DirectoryCollection.java:381)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.checkDirs(LocalDirsHandlerService.java:449)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.access$500(LocalDirsHandlerService.java:52)
> at 
> 

[jira] [Commented] (YARN-6629) NPE occurred when container allocation proposal is applied but its resource requests are removed before

2018-03-22 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16410177#comment-16410177
 ] 

Wangda Tan commented on YARN-6629:
--

Thanks [~Tao Yang], 

I'm a bit concerned about the following logic:
{code}
  if (appPlacementAllocator != null) {
    return appPlacementAllocator.allocate(schedulerKey, type, node);
  } else {
    return null;
  }
{code}

The AppSchedulingInfo#allocate method is called in many places, so this change 
may break other parts. Instead of doing this, is it sufficient to move the 
logic to FiCaSchedulerApp#apply (at the beginning, under the writeLock, since 
the application has a chance to remove the resource request between accept and 
apply)?
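
A hypothetical sketch of that suggestion; the {{getAppPlacementAllocator}} lookup is an assumption about the internals, not verbatim code:

{code:java}
// Sketch only: inside FiCaSchedulerApp#apply, re-validate under the writeLock
// that the request still exists, since the AM may have removed it between
// accept and apply.
writeLock.lock();
try {
  if (appSchedulingInfo.getAppPlacementAllocator(schedulerKey) == null) {
    // Request was removed in the interim: reject the proposal instead of NPE.
    return false;
  }
  // ... proceed with the original apply logic ...
} finally {
  writeLock.unlock();
}
{code}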

> NPE occurred when container allocation proposal is applied but its resource 
> requests are removed before
> ---
>
> Key: YARN-6629
> URL: https://issues.apache.org/jira/browse/YARN-6629
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.0, 3.0.0-alpha2
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Critical
> Attachments: YARN-6629.001.patch, YARN-6629.002.patch, 
> YARN-6629.003.patch
>
>
> I wrote a test case to reproduce another problem for branch-2 and found a new 
> NPE error; log: 
> {code}
> FATAL event.EventDispatcher (EventDispatcher.java:run(75)) - Error in 
> handling event type NODE_UPDATE to the Event Dispatcher
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocate(AppSchedulingInfo.java:446)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.apply(FiCaSchedulerApp.java:516)
> at 
> org.apache.hadoop.yarn.client.TestNegativePendingResource$1.answer(TestNegativePendingResource.java:225)
> at 
> org.mockito.internal.stubbing.StubbedInvocationMatcher.answer(StubbedInvocationMatcher.java:31)
> at org.mockito.internal.MockHandler.handle(MockHandler.java:97)
> at 
> org.mockito.internal.creation.MethodInterceptorFilter.intercept(MethodInterceptorFilter.java:47)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp$$EnhancerByMockitoWithCGLIB$$29eb8afc.apply()
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.tryCommit(CapacityScheduler.java:2396)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.submitResourceCommitRequest(CapacityScheduler.java:2281)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateOrReserveNewContainers(CapacityScheduler.java:1247)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainerOnSingleNode(CapacityScheduler.java:1236)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1325)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1112)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:987)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1367)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:143)
> at 
> org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:66)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> Reproduce this error in chronological order:
> 1. AM started and requested 1 container with schedulerRequestKey#1 : 
> ApplicationMasterService#allocate -->  CapacityScheduler#allocate --> 
> SchedulerApplicationAttempt#updateResourceRequests --> 
> AppSchedulingInfo#updateResourceRequests 
> Added schedulerRequestKey#1 into schedulerKeyToPlacementSets
> 2. Scheduler allocated 1 container for this request and accepted the proposal
> 3. AM removed this request
> ApplicationMasterService#allocate -->  CapacityScheduler#allocate --> 
> SchedulerApplicationAttempt#updateResourceRequests --> 
> AppSchedulingInfo#updateResourceRequests --> 
> AppSchedulingInfo#addToPlacementSets --> 
> AppSchedulingInfo#updatePendingResources
> Removed schedulerRequestKey#1 from schedulerKeyToPlacementSets
> 4. Scheduler applied this proposal
> CapacityScheduler#tryCommit --> FiCaSchedulerApp#apply --> 
> AppSchedulingInfo#allocate 
> Throw NPE when called 
> 

[jira] [Updated] (YARN-6629) NPE occurred when container allocation proposal is applied but its resource requests are removed before

2018-03-22 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-6629:
-
Priority: Critical  (was: Major)

> NPE occurred when container allocation proposal is applied but its resource 
> requests are removed before
> ---
>
> Key: YARN-6629
> URL: https://issues.apache.org/jira/browse/YARN-6629
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.0, 3.0.0-alpha2
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Critical
> Attachments: YARN-6629.001.patch, YARN-6629.002.patch, 
> YARN-6629.003.patch
>
>
> I wrote a test case to reproduce another problem for branch-2 and found a new 
> NPE error; log: 
> {code}
> FATAL event.EventDispatcher (EventDispatcher.java:run(75)) - Error in 
> handling event type NODE_UPDATE to the Event Dispatcher
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocate(AppSchedulingInfo.java:446)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.apply(FiCaSchedulerApp.java:516)
> at 
> org.apache.hadoop.yarn.client.TestNegativePendingResource$1.answer(TestNegativePendingResource.java:225)
> at 
> org.mockito.internal.stubbing.StubbedInvocationMatcher.answer(StubbedInvocationMatcher.java:31)
> at org.mockito.internal.MockHandler.handle(MockHandler.java:97)
> at 
> org.mockito.internal.creation.MethodInterceptorFilter.intercept(MethodInterceptorFilter.java:47)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp$$EnhancerByMockitoWithCGLIB$$29eb8afc.apply()
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.tryCommit(CapacityScheduler.java:2396)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.submitResourceCommitRequest(CapacityScheduler.java:2281)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateOrReserveNewContainers(CapacityScheduler.java:1247)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainerOnSingleNode(CapacityScheduler.java:1236)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1325)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1112)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:987)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1367)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:143)
> at 
> org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:66)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> Reproduce this error in chronological order:
> 1. AM started and requested 1 container with schedulerRequestKey#1 : 
> ApplicationMasterService#allocate -->  CapacityScheduler#allocate --> 
> SchedulerApplicationAttempt#updateResourceRequests --> 
> AppSchedulingInfo#updateResourceRequests 
> Added schedulerRequestKey#1 into schedulerKeyToPlacementSets
> 2. Scheduler allocated 1 container for this request and accepted the proposal
> 3. AM removed this request
> ApplicationMasterService#allocate -->  CapacityScheduler#allocate --> 
> SchedulerApplicationAttempt#updateResourceRequests --> 
> AppSchedulingInfo#updateResourceRequests --> 
> AppSchedulingInfo#addToPlacementSets --> 
> AppSchedulingInfo#updatePendingResources
> Removed schedulerRequestKey#1 from schedulerKeyToPlacementSets
> 4. Scheduler applied this proposal
> CapacityScheduler#tryCommit --> FiCaSchedulerApp#apply --> 
> AppSchedulingInfo#allocate 
> Throw NPE when called 
> schedulerKeyToPlacementSets.get(schedulerRequestKey).allocate(schedulerKey, 
> type, node);






[jira] [Updated] (YARN-5523) Yarn running container log fetching causes OutOfMemoryError

2018-03-22 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-5523:

Description: 
Fetching a 256MB log from a container caused an OOM on the client
{code:java}
$ yarn logs -applicationId application_1470931023753_0001 -log_files 
log_2016-08-11-21_3.done > logs
16/08/11 21:58:11 INFO impl.TimelineClientImpl: Timeline service address: 
http://:8188/ws/v1/timeline/
16/08/11 21:58:11 INFO client.RMProxy: Connecting to ResourceManager at 
:8050
16/08/11 21:58:12 INFO client.AHSProxy: Connecting to Application History 
server at :10200
Can not find any log file matching the pattern: [log_2016-08-11-21_3.done] for 
the container: container_e04_1470931023753_0001_01_01 within the 
application: application_1470931023753_0001
Can not find any log file matching the pattern: [log_2016-08-11-21_3.done] for 
the container: container_e04_1470931023753_0001_01_02 within the 
application: application_1470931023753_0001
Can not find any log file matching the pattern: [log_2016-08-11-21_3.done] for 
the container: container_e04_1470931023753_0001_01_03 within the 
application: application_1470931023753_0001
Can not find any log file matching the pattern: [log_2016-08-11-21_3.done] for 
the container: container_e04_1470931023753_0001_01_04 within the 
application: application_1470931023753_0001
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
  at java.util.Arrays.copyOf(Arrays.java:3332)
  at 
java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:137)
  at 
java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:121)
  at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:569)
  at java.lang.StringBuilder.append(StringBuilder.java:190)
  at 
com.sun.jersey.core.util.ReaderWriter.readFromAsString(ReaderWriter.java:172)
  at 
com.sun.jersey.core.util.ReaderWriter.readFromAsString(ReaderWriter.java:157)
  at 
com.sun.jersey.core.provider.AbstractMessageReaderWriterProvider.readFromAsString(AbstractMessageReaderWriterProvider.java:114)
  at 
com.sun.jersey.core.impl.provider.entity.StringProvider.readFrom(StringProvider.java:73)
  at 
com.sun.jersey.core.impl.provider.entity.StringProvider.readFrom(StringProvider.java:58)
  at com.sun.jersey.api.client.ClientResponse.getEntity(ClientResponse.java:553)
  at com.sun.jersey.api.client.ClientResponse.getEntity(ClientResponse.java:506)
  at 
org.apache.hadoop.yarn.client.cli.LogsCLI.printContainerLogsFromRunningApplication(LogsCLI.java:477)
  at 
org.apache.hadoop.yarn.client.cli.LogsCLI.fetchApplicationLogs(LogsCLI.java:950)
  at org.apache.hadoop.yarn.client.cli.LogsCLI.runCommand(LogsCLI.java:280)
  at org.apache.hadoop.yarn.client.cli.LogsCLI.run(LogsCLI.java:102)
  at org.apache.hadoop.yarn.client.cli.LogsCLI.main(LogsCLI.java:307)
{code}
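One possible direction (a sketch only, assuming LogsCLI keeps using the Jersey 
1.x client seen in the stack trace): stream the response body instead of 
materializing it with {{ClientResponse#getEntity(String.class)}}.

{code:java}
// Sketch: copy the log stream to the CLI's output in fixed-size chunks so heap
// usage stays bounded regardless of log size. 'response' is the
// com.sun.jersey.api.client.ClientResponse and 'out' the target OutputStream;
// both are assumed from the surrounding LogsCLI code.
try (InputStream is = response.getEntityInputStream()) {
  byte[] buf = new byte[64 * 1024];
  int n;
  while ((n = is.read(buf)) != -1) {
    out.write(buf, 0, n);
  }
  out.flush();
}
{code}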

  was:
Fetching a 256MB log from container caused OOM on client

{code}
[hive@ctr-e20-1468887904486-0007-01-03 ~]$ yarn logs -applicationId 
application_1470931023753_0001 -log_files log_2016-08-11-21_3.done > logs
16/08/11 21:58:11 INFO impl.TimelineClientImpl: Timeline service address: 
http://ctr-e20-1468887904486-0007-01-03.hwx.site:8188/ws/v1/timeline/
16/08/11 21:58:11 INFO client.RMProxy: Connecting to ResourceManager at 
ctr-e20-1468887904486-0007-01-03.hwx.site/172.27.8.192:8050
16/08/11 21:58:12 INFO client.AHSProxy: Connecting to Application History 
server at ctr-e20-1468887904486-0007-01-03.hwx.site/172.27.8.192:10200
Can not find any log file matching the pattern: [log_2016-08-11-21_3.done] for 
the container: container_e04_1470931023753_0001_01_01 within the 
application: application_1470931023753_0001
Can not find any log file matching the pattern: [log_2016-08-11-21_3.done] for 
the container: container_e04_1470931023753_0001_01_02 within the 
application: application_1470931023753_0001
Can not find any log file matching the pattern: [log_2016-08-11-21_3.done] for 
the container: container_e04_1470931023753_0001_01_03 within the 
application: application_1470931023753_0001
Can not find any log file matching the pattern: [log_2016-08-11-21_3.done] for 
the container: container_e04_1470931023753_0001_01_04 within the 
application: application_1470931023753_0001
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
  at java.util.Arrays.copyOf(Arrays.java:3332)
  at 
java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:137)
  at 
java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:121)
  at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:569)
  at java.lang.StringBuilder.append(StringBuilder.java:190)
  at 
com.sun.jersey.core.util.ReaderWriter.readFromAsString(ReaderWriter.java:172)
  at 
com.sun.jersey.core.util.ReaderWriter.readFromAsString(ReaderWriter.java:157)
{code}

[jira] [Commented] (YARN-7872) labeled node cannot be used to satisfy locality specified request

2018-03-22 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16410161#comment-16410161
 ] 

Wangda Tan commented on YARN-7872:
--

For the above behavior, you can take a look at 
{{LocalityAppPlacementAllocator#updateNodeLabels}}:

{code}
  private void updateNodeLabels(ResourceRequest request) {
String resourceName = request.getResourceName();
if (resourceName.equals(ResourceRequest.ANY)) {
  ResourceRequest previousAnyRequest =
  getResourceRequest(resourceName);

  // When there is change in ANY request label expression, we should
  // update label for all resource requests already added of same
  // priority as ANY resource request.
  if ((null == previousAnyRequest) || hasRequestLabelChanged(
  previousAnyRequest, request)) {
for (ResourceRequest r : resourceRequestMap.values()) {
  if (!r.getResourceName().equals(ResourceRequest.ANY)) {
r.setNodeLabelExpression(request.getNodeLabelExpression());
  }
}
  }
} else{
  ResourceRequest anyRequest = getResourceRequest(ResourceRequest.ANY);
  if (anyRequest != null) {
request.setNodeLabelExpression(anyRequest.getNodeLabelExpression());
  }
}
  }
{code}
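To spell out the behavior this implements (a hand-written trace, not program 
output):

{code}
// Assume priority 1 already has: ANY(label="A"), host1(label="A")
// AM updates: ANY(label="B")
//   -> hasRequestLabelChanged(previousAnyRequest, request) == true
//   -> host1's label is rewritten from "A" to "B"
// AM later adds: /rack1(label=null)
//   -> non-ANY branch: /rack1 inherits the current ANY label "B"
{code}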

> labeled node cannot be used to satisfy locality specified request
> -
>
> Key: YARN-7872
> URL: https://issues.apache.org/jira/browse/YARN-7872
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, capacityscheduler, resourcemanager
>Affects Versions: 2.7.2
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>Priority: Blocker
> Fix For: 2.7.2
>
> Attachments: YARN-7872-branch-2.7.2.001.patch
>
>
> *Issue summary:*
> A labeled node (i.e. a node with a non-empty node label) cannot be used to 
> satisfy a locality-specified request (i.e. a container request with a non-ANY 
> resource name and relaxLocality set to false).
>  
> *For example:*
> The node with available resource:
> [Resource: [MemoryMB: [100] CpuNumber: [12]] {color:#14892c}NodeLabel: 
> [persistent]{color} {color:#f79232}HostName: \{SRG}{color} RackName: 
> \{/default-rack}]
> The container request:
>  [Priority: [1] Resource: [MemoryMB: [1] CpuNumber: [1]] 
> {color:#14892c}NodeLabel: [null]{color} {color:#f79232}HostNames: 
> \{SRG}{color} RackNames: {} {color:#59afe1}RelaxLocality: [false]{color}]
> The current RM capacity scheduler's behavior (at least in versions 2.7 and 
> 2.8) is that the node cannot allocate a container for the request, because the 
> node label does not match when the leaf queue assigns a container.
>  
> *Possible solution:*
> However, node locality and node label should be two orthogonal dimensions for 
> selecting candidate nodes for a container request. Node label matching should 
> only be executed for container requests with the ANY resource name, since only 
> that kind of container request is allowed to have a non-empty node label.
> So, for a container request with a non-ANY resource name (which we know should 
> not have a node label), we should match the node against the requested 
> resource name instead of the requested node label. This resource-name matching 
> should be safe, since a node whose label is not accessible to the queue will 
> never be sent to the leaf queue.
>  
> *Discussion:*
> The attached patch is a fix following this principle; please help to review.
> Without it, we cannot use locality to request containers on these labeled 
> nodes.
> If the fix is acceptable, we should also recheck whether the same issue 
> happens in trunk and other Hadoop versions.
> If it is not acceptable (i.e. the current behavior is by design), then how can 
> we use locality to request containers on these labeled nodes?






[jira] [Commented] (YARN-7872) labeled node cannot be used to satisfy locality specified request

2018-03-22 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16410158#comment-16410158
 ] 

Wangda Tan commented on YARN-7872:
--

[~yqwang], 
{code:java}
// we don't allow specify label expression other than resourceName=ANY now
if (!ResourceRequest.ANY.equals(resReq.getResourceName())
&& labelExp != null && !labelExp.trim().isEmpty()) {
  throw new InvalidLabelResourceRequestException(
  "Invalid resource request, queue=" + queueInfo.getQueueName()
  + " specified node label expression in a "
  + "resource request has resource name = "
  + resReq.getResourceName());
}{code}

Actually this is a misunderstanding; we may need to improve the comment a bit. 
Currently we support using locality + partition at the same time, but the 
partition (nodeLabelExpression) should only be set on the resourceName == * 
request. Let's say:
{code}
Priority = 1
  ResourceName = ANY, labelExpression = "A" 
  ResourceName = "/rack1", labelExpression = null
  ResourceName = "host1", labelExpression = null, relaxLocality = false
{code} 

In this case, the "host1" hard locality will be respected when host1 is under 
partition==A. 
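For illustration, a sketch of how an AM could build that request set (assuming 
the {{ResourceRequest.newInstance}} overload that takes relaxLocality and a 
label expression; imports omitted):

{code:java}
// Partition "A" is set only on the ANY request; the rack and host requests
// leave the label null and inherit it via updateNodeLabels shown earlier.
Priority pri = Priority.newInstance(1);
Resource cap = Resource.newInstance(1024, 1);

ResourceRequest any = ResourceRequest.newInstance(
    pri, ResourceRequest.ANY, cap, 1, true, "A");
ResourceRequest rack = ResourceRequest.newInstance(
    pri, "/rack1", cap, 1, true, null);
ResourceRequest host = ResourceRequest.newInstance(
    pri, "host1", cap, 1, false, null); // relaxLocality = false: hard locality

List<ResourceRequest> asks = Arrays.asList(any, rack, host);
{code}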

We should not silently ignore fields in ResourceRequest. For your requirement, 
you may take a look at YARN-6592 (which plans to support node attributes, 
YARN-3409), delayed_or (see the YARN-6592 design doc), etc. If the user 
specifies a conflicting requirement (like hard locality to a node that is not 
under the specified node partition), the scheduler should either reject the 
resource request (ideal) or keep it pending (current behavior). 

Please let me know your thoughts.

> labeled node cannot be used to satisfy locality specified request
> -
>
> Key: YARN-7872
> URL: https://issues.apache.org/jira/browse/YARN-7872
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, capacityscheduler, resourcemanager
>Affects Versions: 2.7.2
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>Priority: Blocker
> Fix For: 2.7.2
>
> Attachments: YARN-7872-branch-2.7.2.001.patch
>
>
> *Issue summary:*
> A labeled node (i.e. a node with a non-empty node label) cannot be used to 
> satisfy a locality-specified request (i.e. a container request with a non-ANY 
> resource name and relaxLocality set to false).
>  
> *For example:*
> The node with available resource:
> [Resource: [MemoryMB: [100] CpuNumber: [12]] {color:#14892c}NodeLabel: 
> [persistent]{color} {color:#f79232}HostName: \{SRG}{color} RackName: 
> \{/default-rack}]
> The container request:
>  [Priority: [1] Resource: [MemoryMB: [1] CpuNumber: [1]] 
> {color:#14892c}NodeLabel: [null]{color} {color:#f79232}HostNames: 
> \{SRG}{color} RackNames: {} {color:#59afe1}RelaxLocality: [false]{color}]
> The current RM capacity scheduler's behavior (at least in versions 2.7 and 
> 2.8) is that the node cannot allocate a container for the request, because the 
> node label does not match when the leaf queue assigns a container.
>  
> *Possible solution:*
> However, node locality and node label should be two orthogonal dimensions for 
> selecting candidate nodes for a container request. Node label matching should 
> only be executed for container requests with the ANY resource name, since only 
> that kind of container request is allowed to have a non-empty node label.
> So, for a container request with a non-ANY resource name (which we know should 
> not have a node label), we should match the node against the requested 
> resource name instead of the requested node label. This resource-name matching 
> should be safe, since a node whose label is not accessible to the queue will 
> never be sent to the leaf queue.
>  
> *Discussion:*
> The attached patch is a fix following this principle; please help to review.
> Without it, we cannot use locality to request containers on these labeled 
> nodes.
> If the fix is acceptable, we should also recheck whether the same issue 
> happens in trunk and other Hadoop versions.
> If it is not acceptable (i.e. the current behavior is by design), then how can 
> we use locality to request containers on these labeled nodes?






[jira] [Commented] (YARN-7974) Allow updating application tracking url after registration

2018-03-22 Thread Jonathan Hung (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16410143#comment-16410143
 ] 

Jonathan Hung commented on YARN-7974:
-

002 fixes the javac deprecation issue and related unit tests.

The findbugs warning is not related.

> Allow updating application tracking url after registration
> --
>
> Key: YARN-7974
> URL: https://issues.apache.org/jira/browse/YARN-7974
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
> Attachments: YARN-7974.001.patch, YARN-7974.002.patch
>
>
> Normally an application's tracking url is set on AM registration. We have a 
> use case for updating the tracking url after registration (e.g. the UI is 
> hosted on one of the containers).
> We have added an {{updateTrackingUrl}} API to ApplicationClientProtocol.
> We'll post the patch soon, assuming there are no issues with this.
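For context, a purely hypothetical sketch of how a client might invoke such an 
API (the request and method names below are invented for illustration and are 
not taken from the patch):

{code:java}
// Hypothetical only: illustrates the intended flow, not the actual protocol
// additions. UpdateTrackingUrlRequest and updateTrackingUrl() are assumed
// names for this sketch; rmClient is an ApplicationClientProtocol proxy and
// appId the running application's id, both taken as given here.
String newUrl = "http://container-host:8080/ui";
rmClient.updateTrackingUrl(UpdateTrackingUrlRequest.newInstance(appId, newUrl));
{code}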






[jira] [Updated] (YARN-7974) Allow updating application tracking url after registration

2018-03-22 Thread Jonathan Hung (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung updated YARN-7974:

Attachment: YARN-7974.001.patch

> Allow updating application tracking url after registration
> --
>
> Key: YARN-7974
> URL: https://issues.apache.org/jira/browse/YARN-7974
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
> Attachments: YARN-7974.001.patch, YARN-7974.002.patch
>
>
> Normally an application's tracking url is set on AM registration. We have a 
> use case for updating the tracking url after registration (e.g. the UI is 
> hosted on one of the containers).
> We have added an {{updateTrackingUrl}} API to ApplicationClientProtocol.
> We'll post the patch soon, assuming there are no issues with this.






[jira] [Updated] (YARN-7974) Allow updating application tracking url after registration

2018-03-22 Thread Jonathan Hung (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung updated YARN-7974:

Attachment: (was: YARN-7974.001.patch)

> Allow updating application tracking url after registration
> --
>
> Key: YARN-7974
> URL: https://issues.apache.org/jira/browse/YARN-7974
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
> Attachments: YARN-7974.001.patch, YARN-7974.002.patch
>
>
> Normally an application's tracking url is set on AM registration. We have a 
> use case for updating the tracking url after registration (e.g. the UI is 
> hosted on one of the containers).
> We have added an {{updateTrackingUrl}} API to ApplicationClientProtocol.
> We'll post the patch soon, assuming there are no issues with this.






[jira] [Updated] (YARN-7974) Allow updating application tracking url after registration

2018-03-22 Thread Jonathan Hung (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung updated YARN-7974:

Attachment: YARN-7974.002.patch

> Allow updating application tracking url after registration
> --
>
> Key: YARN-7974
> URL: https://issues.apache.org/jira/browse/YARN-7974
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
> Attachments: YARN-7974.001.patch, YARN-7974.002.patch
>
>
> Normally an application's tracking url is set on AM registration. We have a 
> use case for updating the tracking url after registration (e.g. the UI is 
> hosted on one of the containers).
> We have added an {{updateTrackingUrl}} API to ApplicationClientProtocol.
> We'll post the patch soon, assuming there are no issues with this.






[jira] [Updated] (YARN-8018) Yarn service: Add support for initiating service upgrade

2018-03-22 Thread Chandni Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated YARN-8018:

Attachment: YARN-8018.004.patch

> Yarn service: Add support for initiating service upgrade
> 
>
> Key: YARN-8018
> URL: https://issues.apache.org/jira/browse/YARN-8018
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-8018.001.patch, YARN-8018.002.patch, 
> YARN-8018.003.patch, YARN-8018.004.patch
>
>
> Add support for initiating service upgrade which includes the following main 
> changes:
>  # Service API to initiate upgrade
>  # Persist service version on hdfs
>  # Start the upgraded version of service






[jira] [Updated] (YARN-5590) Add support for increase and decrease of container resources with resource profiles

2018-03-22 Thread Manikandan R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R updated YARN-5590:
---
Attachment: YARN-5590.002.patch

> Add support for increase and decrease of container resources with resource 
> profiles
> ---
>
> Key: YARN-5590
> URL: https://issues.apache.org/jira/browse/YARN-5590
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Varun Vasudev
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-5590.001.patch, YARN-5590.002.patch
>
>







[jira] [Commented] (YARN-8062) yarn rmadmin -getGroups returns group from which the user has been removed

2018-03-22 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16410035#comment-16410035
 ] 

Wangda Tan commented on YARN-8062:
--

Thanks [~sunilg] for working on the patch.

I'm not sure if this patch is correct:

Groups#getUserToGroupsMappingServiceWithLoadedConfiguration always creates a new 
Groups instance and replaces the static variable. I think we should avoid 
invoking that method; instead of changing AdminService, I think we can change 
RM#init to use

{{Groups getUserToGroupsMappingService(Configuration conf)}}

instead.
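A minimal sketch of that suggestion (assuming the standard 
{{org.apache.hadoop.security.Groups}} API):

{code:java}
// Sketch: during RM#init, obtain (or lazily create) the singleton
// user-to-groups mapping service with the RM's configuration, instead of
// force-replacing the static instance via the *WithLoadedConfiguration variant.
import org.apache.hadoop.security.Groups;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

YarnConfiguration conf = new YarnConfiguration();
Groups groups = Groups.getUserToGroupsMappingService(conf);
groups.refresh(); // what -refreshUserToGroupsMappings ultimately triggers
{code}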

Thoughts? 

 

> yarn rmadmin -getGroups returns group from which the user has been removed
> --
>
> Key: YARN-8062
> URL: https://issues.apache.org/jira/browse/YARN-8062
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Sumana Sathish
>Assignee: Sunil G
>Priority: Critical
> Attachments: YARN-8062.001.patch
>
>
> {code:title=adding group hrt_yarn_rmadmin_test}
> sudo su - -c "groupadd hrt_yarn_rmadmin_test" root
> {code}
> {code:title=adding user hrt_yarn_rmadmin_test to group hrt_yarn_rmadmin_test}
> sudo su - -c "useradd hrt_yarn_rmadmin_test -g hrt_yarn_rmadmin_test" root
> {code}
> {code:title=adding group hrt_yarn_rmadmin_test_group2}
> sudo su - -c "groupadd hrt_yarn_rmadmin_test_group2" root
> {code}
> {code:title=adding user hrt_yarn_rmadmin_test to group 
> hrt_yarn_rmadmin_test_group2}
> sudo su - -c "usermod -a -G hrt_yarn_rmadmin_test_group2 
> hrt_yarn_rmadmin_test" root
> {code}
> Refresh and getGroups
> {code}
> yarn rmadmin -refreshUserToGroupsMappings
> /usr/hdp/current/hadoop-yarn-client/bin/yarn rmadmin -getGroups 
> hrt_yarn_rmadmin_test
> hrt_yarn_rmadmin_test : hrt_yarn_rmadmin_test hrt_yarn_rmadmin_test_group2
> {code}
> Delete group hrt_yarn_rmadmin_test_group2 from user hrt_yarn_rmadmin_test, 
> then refresh and do getGroups.
> We can still see group hrt_yarn_rmadmin_test_group2:
> {code}
> sudo su - -c "gpasswd -d hrt_yarn_rmadmin_test hrt_yarn_rmadmin_test_group2" 
> root
> {code}
> Removing user hrt_yarn_rmadmin_test from group hrt_yarn_rmadmin_test_group2
> {code}
> bash-4.2$  /usr/hdp/current/hadoop-yarn-client/bin/yarn rmadmin 
> -refreshUserToGroupsMappings
> /usr/hdp/current/hadoop-yarn-client/bin/yarn rmadmin -getGroups 
> hrt_yarn_rmadmin_test
> hrt_yarn_rmadmin_test : hrt_yarn_rmadmin_test hrt_yarn_rmadmin_test_group2
> {code}






[jira] [Resolved] (YARN-6136) YARN registry service should avoid scanning whole ZK tree for every container/application finish

2018-03-22 Thread Billie Rinaldi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Billie Rinaldi resolved YARN-6136.
--
Resolution: Invalid

This issue is caused by YARN-2571, which has not been committed and is resolved 
as won't fix.

> YARN registry service should avoid scanning whole ZK tree for every 
> container/application finish
> 
>
> Key: YARN-6136
> URL: https://issues.apache.org/jira/browse/YARN-6136
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Critical
>
> In the existing registry service implementation, a purge operation is 
> triggered by every container finish event:
> {code}
>   public void onContainerFinished(ContainerId id) throws IOException {
> LOG.info("Container {} finished, purging container-level records",
> id);
> purgeRecordsAsync("/",
> id.toString(),
> PersistencePolicies.CONTAINER);
>   }
> {code} 
> Since this happens on every container finish, it essentially scans all (or 
> almost all) ZK nodes from the root. 
> We have a cluster which has hundreds of ZK nodes for the service registry and 
> 20K+ ZK nodes for other purposes. The existing implementation can generate 
> massive numbers of ZK operations and internal Java objects 
> (RegistryPathStatus). The RM becomes very unstable when there are batches of 
> container finish events, because of full GC pauses and ZK connection failures.
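For contrast, a sketch of the narrower shape such a purge could take (the path 
layout and variables here are illustrative, not the actual registry schema):

{code:java}
// Illustrative only: scope the purge to the znodes registered under this
// container's application instead of scanning from the registry root ("/").
String appScopedPath = "/users/" + user + "/services/" + serviceClass
    + "/" + applicationId; // hypothetical layout
purgeRecordsAsync(appScopedPath, id.toString(), PersistencePolicies.CONTAINER);
{code}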






[jira] [Comment Edited] (YARN-7032) [ATSv2] NPE while starting hbase co-processor when HBase authorization is enabled.

2018-03-22 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129976#comment-16129976
 ] 

Rohith Sharma K S edited comment on YARN-7032 at 3/22/18 6:17 PM:
--

Full stack trace is 
{noformat}
2017-08-17 05:53:13,535 ERROR 
[RpcServer.FifoWFPBQ.priority.handler=18,queue=0,port=16020] 
coprocessor.CoprocessorHost: The coprocessor 
org.apache.hadoop.yarn.server.timelineservice.storage.flow.FlowRunCoprocessor 
threw java.lang.NullPointerException
java.lang.NullPointerException
at org.apache.hadoop.hbase.Tag.fromList(Tag.java:187)
at 
org.apache.hadoop.yarn.server.timelineservice.storage.flow.FlowRunCoprocessor.prePut(FlowRunCoprocessor.java:102)
at 
org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$30.call(RegionCoprocessorHost.java:885)
at 
org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$RegionOperation.call(RegionCoprocessorHost.java:1660)
at 
org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1734)
at 
org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1692)
at 
org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.prePut(RegionCoprocessorHost.java:881)
at 
org.apache.hadoop.hbase.regionserver.HRegion.doPreMutationHook(HRegion.java:3036)
at 
org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3011)
at 
org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2957)
at 
org.apache.hadoop.hbase.regionserver.RSRpcServices.doBatchOp(RSRpcServices.java:750)
at 
org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:710)
at 
org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2137)
at 
org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32393)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2150)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
at 
org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:187)
at 
org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:167)
localhost,16020,1502949162490: The coprocessor 
org.apache.hadoop.yarn.server.timelineservice.storage.flow.FlowRunCoprocessor 
threw java.lang.NullPointerException
java.lang.NullPointerException
at org.apache.hadoop.hbase.Tag.fromList(Tag.java:187)
at 
org.apache.hadoop.yarn.server.timelineservice.storage.flow.FlowRunCoprocessor.prePut(FlowRunCoprocessor.java:102)
at 
org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$30.call(RegionCoprocessorHost.java:885)
at 
org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$RegionOperation.call(RegionCoprocessorHost.java:1660)
at 
org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1734)
at 
org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1692)
at 
org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.prePut(RegionCoprocessorHost.java:881)
at 
org.apache.hadoop.hbase.regionserver.HRegion.doPreMutationHook(HRegion.java:3036)
at 
org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3011) 
   at 
org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2957)
at 
org.apache.hadoop.hbase.regionserver.RSRpcServices.doBatchOp(RSRpcServices.java:750)
at 
org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:710)
at 
org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2137)
at 
org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32393)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2150)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
at 
org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:187)
at 
org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:167)
2017-08-17 05:53:13,536 FATAL 
[RpcServer.FifoWFPBQ.priority.handler=18,queue=0,port=16020] 
regionserver.HRegionServer: RegionServer abort: loaded coprocessors are: 
[org.apache.hadoop.hbase.security.access.AccessController, 
org.apache.hadoop.yarn.server.timelineservice.storage.flow.FlowRunCoprocessor, 
org.apache.hadoop.hbase.security.token.TokenProvider, 
org.apache.hadoop.hbase.security.access.SecureBulkLoadEndpoint, 
org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint]
{noformat}


was (Author: rohithsharma):
Full stack trace is 
{noformat}
2017-08-17 05:53:13,535 ERROR 
{noformat}

[jira] [Updated] (YARN-7032) [ATSv2] NPE while starting hbase co-processor when HBase authorization is enabled.

2018-03-22 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-7032:

Attachment: (was: 
hbase-yarn-regionserver-ctr-e136-1513029738776-1405-01-02.hwx.site.log)

> [ATSv2] NPE while starting hbase co-processor when HBase authorization is 
> enabled.
> --
>
> Key: YARN-7032
> URL: https://issues.apache.org/jira/browse/YARN-7032
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
>Priority: Critical
> Fix For: 3.1.0, 2.10.0, 3.0.1
>
> Attachments: YARN-7032.01.patch
>
>
> It is seen randomly that the HBase co-processor fails to start with an NPE, 
> but starting the RegionServer again succeeds. 
> {noformat}
> 2017-08-17 05:53:13,535 ERROR 
> [RpcServer.FifoWFPBQ.priority.handler=18,queue=0,port=16020] 
> coprocessor.CoprocessorHost: The coprocessor 
> org.apache.hadoop.yarn.server.timelineservice.storage.flow.FlowRunCoprocessor 
> threw java.lang.NullPointerException
> java.lang.NullPointerException
> at org.apache.hadoop.hbase.Tag.fromList(Tag.java:187)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.flow.FlowRunCoprocessor.prePut(FlowRunCoprocessor.java:102)
> at 
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$30.call(RegionCoprocessorHost.java:885)
> {noformat}






[jira] [Commented] (YARN-8016) Refine PlacementRule interface and add a app-name queue mapping rule as an example

2018-03-22 Thread Zian Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16409990#comment-16409990
 ] 

Zian Chen commented on YARN-8016:
-

[~leftnoteasy], sure, let me quickly fix the checkstyle issue and resubmit.

> Refine PlacementRule interface and add a app-name queue mapping rule as an 
> example
> --
>
> Key: YARN-8016
> URL: https://issues.apache.org/jira/browse/YARN-8016
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Zian Chen
>Assignee: Zian Chen
>Priority: Major
> Attachments: YARN-8016.001.patch, YARN-8016.002.patch, 
> YARN-8016.003.patch, YARN-8016.004.patch
>
>
> After YARN-3635/YARN-6689, PlacementRule became a common interface which can 
> be used by the scheduler and dynamically updated by the scheduler according to 
> configs. Some work remains:
> - There's no way to initialize a PlacementRule.
> - There's no example of a PlacementRule except the user-group mapping one.
> This JIRA is targeted at refining the PlacementRule interface and adding 
> another PlacementRule example.






[jira] [Commented] (YARN-5590) Add support for increase and decrease of container resources with resource profiles

2018-03-22 Thread Manikandan R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16409962#comment-16409962
 ] 

Manikandan R commented on YARN-5590:


Had an offline discussion with [~sunilg] about the next steps. The summary is 
to add more test cases on top of the earlier patch (for example, increasing and 
decreasing the same container) to ensure this feature works for resource types 
without any issues. Added a patch for the same. Also, YARN-4175 is still open 
and can help us in this context.

> Add support for increase and decrease of container resources with resource 
> profiles
> ---
>
> Key: YARN-5590
> URL: https://issues.apache.org/jira/browse/YARN-5590
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Varun Vasudev
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-5590.001.patch
>
>







[jira] [Commented] (YARN-7988) Refactor FSNodeLabelStore code for attributes store support

2018-03-22 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16409928#comment-16409928
 ] 

Sunil G commented on YARN-7988:
---

[~bibinchundatt]

Thanks for the patch. As discussed offline, could we use a register API model 
to add ops to the store? Then we could have a simple map model instead of an 
enum.

> Refactor FSNodeLabelStore code for attributes store support
> ---
>
> Key: YARN-7988
> URL: https://issues.apache.org/jira/browse/YARN-7988
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Major
> Attachments: YARN-7988-YARN-3409.002.patch, 
> YARN-7988-YARN-3409.003.patch, YARN-7988-YARN-3409.004.patch, 
> YARN-7988.001.patch
>
>
> # Abstract out the FileSystemStore operations
> # Define the EditLog operations and mirror operation
> # Support compatibility with the old node label store






[jira] [Commented] (YARN-7654) Support ENTRY_POINT for docker container

2018-03-22 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16409923#comment-16409923
 ] 

Eric Yang commented on YARN-7654:
-

[~jlowe] YARN-6830 appears to focus on the Java side of serialization.  It may 
take a while to settle.  I don't think we want to make YARN-6830 a prerequisite 
of this jira, which would introduce extra delays.  For now, environment 
key/value pairs are delimited by the pipe character, which is replaced with = 
prior to docker launch.  I think this is the least intrusive approach to solving 
the environment variable delimiter problem in the .cmd file for now, with some 
extra comments indicating a future fix to improve this by changing the .cmd 
file format to JSON.
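A sketch of that interim encoding (illustrative, not the exact .cmd writer 
code):

{code:java}
// Each env entry is written to the .cmd file as KEY|VALUE, then the first pipe
// is swapped back to '=' just before invoking docker. This assumes values
// never contain a pipe, while commas in values survive intact.
String cmdFileEntry = "MOUNTS|/tmp/foo,/tmp/bar"; // as stored in the .cmd file
String dockerEnvArg = cmdFileEntry.replaceFirst("\\|", "="); // MOUNTS=/tmp/foo,/tmp/bar
{code}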

> Support ENTRY_POINT for docker container
> 
>
> Key: YARN-7654
> URL: https://issues.apache.org/jira/browse/YARN-7654
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 3.1.0
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Blocker
> Attachments: YARN-7654.001.patch, YARN-7654.002.patch, 
> YARN-7654.003.patch, YARN-7654.004.patch
>
>
> A Docker image may have an ENTRY_POINT predefined, but this is not supported 
> in the current implementation.  It would be nice if we could detect the 
> existence of {{launch_command}} and, based on this variable, launch the docker 
> container in different ways:
> h3. Launch command exists
> {code}
> docker run [image]:[version]
> docker exec [container_id] [launch_command]
> {code}
> h3. Use ENTRY_POINT
> {code}
> docker run [image]:[version]
> {code}






[jira] [Comment Edited] (YARN-8062) yarn rmadmin -getGroups returns group from which the user has been removed

2018-03-22 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16409368#comment-16409368
 ] 

Sunil G edited comment on YARN-8062 at 3/22/18 4:55 PM:


Test steps:
{noformat}
[r...@abc.com hadoop-yarn]# sudo su - -c "groupadd testUser5" root
[r...@abc.com hadoop-yarn]# sudo su - -c "useradd testUser5 -g testUser5" root
[r...@abc.com hadoop-yarn]# sudo su - -c "groupadd Group5" root
[r...@abc.com hadoop-yarn]# sudo su - -c "usermod -a -G Group5 testUser5" root
[r...@abc.com hadoop-yarn]# groups testUser5
testUser5 : testUser5 Group5
[r...@abc.com hadoop-yarn]# sudo su - -c "yarn rmadmin 
-refreshUserToGroupsMappings" yarn
[r...@abc.com hadoop-yarn]# sudo su - -c "yarn rmadmin -getGroups testUser5" 
yarn
testUser5 : testUser5 Group5
[r...@abc.com hadoop-yarn]# sudo su - -c "gpasswd -d testUser5 Group5" root
Removing user testUser5 from group Group5
[r...@abc.com hadoop-yarn]# groups testUser5
testUser5 : testUser5
[r...@abc.com hadoop-yarn]# sudo su - -c "yarn rmadmin 
-refreshUserToGroupsMappings" yarn
[r...@abc.com hadoop-yarn]# sudo su - -c "yarn rmadmin -getGroups testUser5" 
yarn
testUser5 : testUser5{noformat}
In summary, *sudo su - -c "/usr/hdp/current/hadoop-yarn-client/bin/yarn rmadmin 
-getGroups testUser5" yarn* and *groups testUser5* give the same output.

 

[~leftnoteasy] please help to review the patch. We also had to make a change in 
AdminService.java in addition to the RM init call.


was (Author: sunilg):
Test steps:
{noformat}
[r...@abc.com hadoop-yarn]# sudo su - -c "groupadd testUser5" root
[r...@abc.com hadoop-yarn]# sudo su - -c "useradd testUser5 -g testUser5" root
[r...@abc.com hadoop-yarn]# sudo su - -c "groupadd Group5" root
[r...@abc.com hadoop-yarn]# sudo su - -c "usermod -a -G Group5 testUser5" root
[r...@abc.com hadoop-yarn]# groups testUser5
testUser5 : testUser5 Group5
[r...@abc.com hadoop-yarn]# sudo su - -c 
"/usr/hdp/current/hadoop-yarn-client/bin/yarn rmadmin 
-refreshUserToGroupsMappings" yarn
[r...@abc.com hadoop-yarn]# sudo su - -c 
"/usr/hdp/current/hadoop-yarn-client/bin/yarn rmadmin -getGroups testUser5" yarn
testUser5 : testUser5 Group5
[r...@abc.com hadoop-yarn]# sudo su - -c "gpasswd -d testUser5 Group5" root
Removing user testUser5 from group Group5
[r...@abc.com hadoop-yarn]# groups testUser5
testUser5 : testUser5
[r...@abc.com hadoop-yarn]# sudo su - -c 
"/usr/hdp/current/hadoop-yarn-client/bin/yarn rmadmin 
-refreshUserToGroupsMappings" yarn
[r...@abc.com hadoop-yarn]# sudo su - -c 
"/usr/hdp/current/hadoop-yarn-client/bin/yarn rmadmin -getGroups testUser5" yarn
testUser5 : testUser5{noformat}
In Summary, *sudo su - -c "/usr/hdp/current/hadoop-yarn-client/bin/yarn rmadmin 
-getGroups testUser5" yarn* and *groups testUser5* gives same o/p.

 

[~leftnoteasy] pls help to review the patch. We also had to make change in 
AdminService.java in addition to RM init call.

> yarn rmadmin -getGroups returns group from which the user has been removed
> --
>
> Key: YARN-8062
> URL: https://issues.apache.org/jira/browse/YARN-8062
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Sumana Sathish
>Assignee: Sunil G
>Priority: Critical
> Attachments: YARN-8062.001.patch
>
>
> {code:title=adding group hrt_yarn_rmadmin_test}
> sudo su - -c "groupadd hrt_yarn_rmadmin_test" root
> {code}
> {code:title=adding user hrt_yarn_rmadmin_test to group hrt_yarn_rmadmin_test}
> sudo su - -c "useradd hrt_yarn_rmadmin_test -g hrt_yarn_rmadmin_test" root
> {code}
> {code:title=adding group hrt_yarn_rmadmin_test_group2}
> sudo su - -c "groupadd hrt_yarn_rmadmin_test_group2" root
> {code}
> {code:title=adding user hrt_yarn_rmadmin_test to group 
> hrt_yarn_rmadmin_test_group2}
> sudo su - -c "usermod -a -G hrt_yarn_rmadmin_test_group2 
> hrt_yarn_rmadmin_test" root
> {code}
> Refresh and getGroups
> {code}
> yarn rmadmin -refreshUserToGroupsMappings
> /usr/hdp/current/hadoop-yarn-client/bin/yarn rmadmin -getGroups 
> hrt_yarn_rmadmin_test
> hrt_yarn_rmadmin_test : hrt_yarn_rmadmin_test hrt_yarn_rmadmin_test_group2
> {code}
> Delete group hrt_yarn_rmadmin_test_group2 from user hrt_yarn_rmadmin_test, 
> then refresh and do getGroups.
> We can still see group hrt_yarn_rmadmin_test_group2:
> {code}
> sudo su - -c "gpasswd -d hrt_yarn_rmadmin_test hrt_yarn_rmadmin_test_group2" 
> root
> {code}
> Removing user hrt_yarn_rmadmin_test from group hrt_yarn_rmadmin_test_group2
> {code}
> bash-4.2$  /usr/hdp/current/hadoop-yarn-client/bin/yarn rmadmin 
> -refreshUserToGroupsMappings
> /usr/hdp/current/hadoop-yarn-client/bin/yarn rmadmin -getGroups 
> hrt_yarn_rmadmin_test
> hrt_yarn_rmadmin_test : hrt_yarn_rmadmin_test hrt_yarn_rmadmin_test_group2
> {code}





[jira] [Comment Edited] (YARN-8062) yarn rmadmin -getGroups returns group from which the user has been removed

2018-03-22 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16409368#comment-16409368
 ] 

Sunil G edited comment on YARN-8062 at 3/22/18 4:53 PM:


Test steps:
{noformat}
[r...@abc.com hadoop-yarn]# sudo su - -c "groupadd testUser5" root
[r...@abc.com hadoop-yarn]# sudo su - -c "useradd testUser5 -g testUser5" root
[r...@abc.com hadoop-yarn]# sudo su - -c "groupadd Group5" root
[r...@abc.com hadoop-yarn]# sudo su - -c "usermod -a -G Group5 testUser5" root
[r...@abc.com hadoop-yarn]# groups testUser5
testUser5 : testUser5 Group5
[r...@abc.com hadoop-yarn]# sudo su - -c 
"/usr/hdp/current/hadoop-yarn-client/bin/yarn rmadmin 
-refreshUserToGroupsMappings" yarn
[r...@abc.com hadoop-yarn]# sudo su - -c 
"/usr/hdp/current/hadoop-yarn-client/bin/yarn rmadmin -getGroups testUser5" yarn
testUser5 : testUser5 Group5
[r...@abc.com hadoop-yarn]# sudo su - -c "gpasswd -d testUser5 Group5" root
Removing user testUser5 from group Group5
[r...@abc.com hadoop-yarn]# groups testUser5
testUser5 : testUser5
[r...@abc.com hadoop-yarn]# sudo su - -c 
"/usr/hdp/current/hadoop-yarn-client/bin/yarn rmadmin 
-refreshUserToGroupsMappings" yarn
[r...@abc.com hadoop-yarn]# sudo su - -c 
"/usr/hdp/current/hadoop-yarn-client/bin/yarn rmadmin -getGroups testUser5" yarn
testUser5 : testUser5{noformat}
In summary, *sudo su - -c "/usr/hdp/current/hadoop-yarn-client/bin/yarn rmadmin 
-getGroups testUser5" yarn* and *groups testUser5* give the same output.

 

[~leftnoteasy] please help to review the patch. We also had to make a change in 
AdminService.java in addition to the RM init call.


was (Author: sunilg):
Test steps:
{noformat}
[root@ctr-e138-1518143905142-105679-01-02 hadoop-yarn]# sudo su - -c 
"groupadd testUser5" root
[root@ctr-e138-1518143905142-105679-01-02 hadoop-yarn]# sudo su - -c 
"useradd testUser5 -g testUser5" root
[root@ctr-e138-1518143905142-105679-01-02 hadoop-yarn]# sudo su - -c 
"groupadd Group5" root
[root@ctr-e138-1518143905142-105679-01-02 hadoop-yarn]# sudo su - -c 
"usermod -a -G Group5 testUser5" root
[root@ctr-e138-1518143905142-105679-01-02 hadoop-yarn]# groups testUser5
testUser5 : testUser5 Group5
[root@ctr-e138-1518143905142-105679-01-02 hadoop-yarn]# sudo su - -c 
"/usr/hdp/current/hadoop-yarn-client/bin/yarn rmadmin 
-refreshUserToGroupsMappings" yarn
WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of 
YARN_LOG_DIR.
WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of 
YARN_LOGFILE.
WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of 
YARN_PID_DIR.
WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.
18/03/22 10:36:37 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
18/03/22 10:36:37 INFO client.RMProxy: Connecting to ResourceManager at 
ctr-e138-1518143905142-105679-01-02.hwx.site/172.27.14.27:8141
[root@ctr-e138-1518143905142-105679-01-02 hadoop-yarn]# sudo su - -c 
"/usr/hdp/current/hadoop-yarn-client/bin/yarn rmadmin -getGroups testUser5" yarn
WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of 
YARN_LOG_DIR.
WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of 
YARN_LOGFILE.
WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of 
YARN_PID_DIR.
WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.
18/03/22 10:36:48 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
18/03/22 10:36:48 INFO client.RMProxy: Connecting to ResourceManager at 
ctr-e138-1518143905142-105679-01-02.hwx.site/172.27.14.27:8141
testUser5 : testUser5 Group5
[root@ctr-e138-1518143905142-105679-01-02 hadoop-yarn]# sudo su - -c 
"gpasswd -d testUser5 Group5" root
Removing user testUser5 from group Group5
[root@ctr-e138-1518143905142-105679-01-02 hadoop-yarn]# groups testUser5
testUser5 : testUser5
[root@ctr-e138-1518143905142-105679-01-02 hadoop-yarn]# sudo su - -c 
"/usr/hdp/current/hadoop-yarn-client/bin/yarn rmadmin 
-refreshUserToGroupsMappings" yarn
WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of 
YARN_LOG_DIR.
WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of 
YARN_LOGFILE.
WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of 
YARN_PID_DIR.
WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.
18/03/22 10:37:09 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
18/03/22 10:37:09 INFO client.RMProxy: Connecting to ResourceManager at 
ctr-e138-1518143905142-105679-01-02.hwx.site/172.27.14.27:8141
[root@ctr-e138-1518143905142-105679-01-02 hadoop-yarn]# sudo su - 
{noformat}

[jira] [Commented] (YARN-7654) Support ENTRY_POINT for docker container

2018-03-22 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16409856#comment-16409856
 ] 

Jason Lowe commented on YARN-7654:
--

bq. Can you share how spark does tokenizer for environment variables that we 
talked about in the meeting? The current approach is a bit adhoc in this jira, 
and I like to address it, if possible.

That is a separate, mostly unrelated thing having to do with 
Configuration.getStrings behavior when the comma-separated values themselves 
contain commas.  See the discussion in YARN-6830; 
http://spark.apache.org/docs/latest/running-on-yarn.html#configuration 
discusses how Spark handles environment variables passed by the user (see 
spark.yarn.appMasterEnv on that page).

The clean way to handle this would be to either avoid using 
ShellCommandExecutor in PrivilegedOperationExecutor and use ProcessBuilder 
directly or fix Shell#runCommand to not smash all of its arguments together 
into one big string.  It's a shame that we actually do go through all the 
motions to build up an array of separate arguments only to smash them all 
together with space separators before running the child process.  
ProcessBuilder supports an array of arguments which is what we need to avoid 
marshalling multiple values into strings and the pitfalls that arise from 
trying to decode it.  That's clearly a separate JIRA which could be a 
prerequisite if we want to leverage it here.
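For illustration, a minimal self-contained sketch of the ProcessBuilder 
approach (not tied to the actual PrivilegedOperationExecutor code):

{code:java}
import java.io.IOException;
import java.util.Arrays;

public class LaunchSketch {
  public static void main(String[] args) throws IOException, InterruptedException {
    // Each argument stays a separate array element, so values containing
    // spaces, commas, or quotes survive intact -- nothing is re-tokenized
    // from one big command string.
    ProcessBuilder pb = new ProcessBuilder(Arrays.asList(
        "docker", "run", "--env", "MOUNTS=/tmp/foo,/tmp/bar", "myimage"));
    pb.inheritIO(); // wire the child's stdio to the parent
    Process p = pb.start();
    System.out.println("exit code: " + p.waitFor());
  }
}
{code}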

bq.  Let me know if there is a subset that would be interesting to have. I will 
modify code accordingly.

Yeah, we can start out with no variables being passed through from the 
nodemanager and see if that's going to be problematic for the entry point use 
cases.  In essence we're bundling "my container has an entry point" with "I 
don't need to inherit any environment variables."  We may need a way to 
distinguish those two later, but we can try combining them for now.  I would 
add a comment to the relevant code explaining why we're not putting in those 
variables to help document the decision to combine these two concepts.


> Support ENTRY_POINT for docker container
> 
>
> Key: YARN-7654
> URL: https://issues.apache.org/jira/browse/YARN-7654
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 3.1.0
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Blocker
> Attachments: YARN-7654.001.patch, YARN-7654.002.patch, 
> YARN-7654.003.patch, YARN-7654.004.patch
>
>
> A Docker image may have an ENTRY_POINT predefined, but this is not supported 
> in the current implementation.  It would be nice if we could detect the 
> existence of {{launch_command}} and, based on this variable, launch the docker 
> container in different ways:
> h3. Launch command exists
> {code}
> docker run [image]:[version]
> docker exec [container_id] [launch_command]
> {code}
> h3. Use ENTRY_POINT
> {code}
> docker run [image]:[version]
> {code}






[jira] [Updated] (YARN-8036) Memory Available shows a negative value after running updateNodeResource

2018-03-22 Thread Charan Hebri (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charan Hebri updated YARN-8036:
---
Attachment: (was: Memory_Available.jpg)

> Memory Available shows a negative value after running updateNodeResource
> 
>
> Key: YARN-8036
> URL: https://issues.apache.org/jira/browse/YARN-8036
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Charan Hebri
>Priority: Major
>
> Running updateNodeResource for a node that already has applications running 
> on it doesn't update Memory Available with the right values. It may end up 
> showing negative values based on the requirements of the application. 
> Attached a screenshot for reference.






[jira] [Commented] (YARN-6830) Support quoted strings for environment variables

2018-03-22 Thread Shane Kumpf (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16409822#comment-16409822
 ] 

Shane Kumpf commented on YARN-6830:
---

Thanks for taking this over, and for the patch, [~Jim_Brennan]!

{quote}I'm leaning towards Allen Wittenauer's proposal to support separate 
properties for environment variables in MapReduce.{quote}
I am also in favor of this approach for the reasons [~jlowe] mentions and to 
avoid risky regex changes.

{quote}
 All that would remain is defining the semantics of what happens when both 
mapreduce.map.env and mapreduce.map.env.SOMEVAR are defined and they conflict. 
I lean towards the latter overriding any conflicting value in the former.
{quote}
Initially I was thinking the opposite, to maintain backwards compatibility; 
however, support for mapreduce.map.env.SOMEVAR would be a new feature that a 
user would need to opt in to. If the user has gone out of their way to add this 
new configuration, it seems likely that they want the new value to be used, so 
I tend to agree with the approach you outlined, [~jlowe].
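A sketch of the override semantics being discussed (property names as proposed 
above; the resolution logic is an assumption, not committed code):

{code:java}
// Assumed resolution order: parse the legacy comma-separated mapreduce.map.env
// (which still cannot carry commas inside values), then let any per-variable
// mapreduce.map.env.SOMEVAR entries override conflicting keys.
// 'conf' is an org.apache.hadoop.conf.Configuration, assumed from context.
Map<String, String> env = new HashMap<>();
for (String kv : conf.getTrimmedStrings("mapreduce.map.env")) {
  int eq = kv.indexOf('=');
  if (eq > 0) {
    env.put(kv.substring(0, eq), kv.substring(eq + 1));
  }
}
for (Map.Entry<String, String> e
    : conf.getPropsWithPrefix("mapreduce.map.env.").entrySet()) {
  env.put(e.getKey(), e.getValue()); // per-variable form wins on conflict
}
{code}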

> Support quoted strings for environment variables
> 
>
> Key: YARN-6830
> URL: https://issues.apache.org/jira/browse/YARN-6830
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Shane Kumpf
>Assignee: Jim Brennan
>Priority: Major
> Attachments: YARN-6830.001.patch, YARN-6830.002.patch, 
> YARN-6830.003.patch, YARN-6830.004.patch
>
>
> There are cases where it is necessary to allow for quoted string literals 
> within environment variables values when passed via the yarn command line 
> interface.
> For example, consider the following environment variables for an MR map task.
> {{MODE=bar}}
> {{IMAGE_NAME=foo}}
> {{MOUNTS=/tmp/foo,/tmp/bar}}
> When running the MR job, these environment variables are supplied as a comma 
> delimited string.
> {{-Dmapreduce.map.env="MODE=bar,IMAGE_NAME=foo,MOUNTS=/tmp/foo,/tmp/bar"}}
> In this case, {{MOUNTS}} will be parsed and added to the task environment as 
> {{MOUNTS=/tmp/foo}}. Any attempts to quote the embedded comma separated value 
> results in quote characters becoming part of the value, and parsing still 
> breaks down at the comma.
> This issue is to allow for quoting the comma separated value (escaped double 
> or single quote). This was mentioned on YARN-4595 and will impact YARN-5534 
> as well.






[jira] [Commented] (YARN-7581) HBase filters are not constructed correctly in ATSv2

2018-03-22 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16409772#comment-16409772
 ] 

Rohith Sharma K S commented on YARN-7581:
-

bq. The compilation fails even with JDK8
I agree; I definitely compiled the wrong branch :-( I just compiled in branch-2 
and it failed. Thanks for pointing this out; it's high time for me to be more 
careful with branch-2 commits.

> HBase filters are not constructed correctly in ATSv2
> 
>
> Key: YARN-7581
> URL: https://issues.apache.org/jira/browse/YARN-7581
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: ATSv2
>Affects Versions: 3.0.0-beta1
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>Priority: Major
> Fix For: 3.1.0, yarn-7055, 3.2.0
>
> Attachments: YARN-7581-YARN-7055.04.patch, 
> YARN-7581-branch-2.05.patch, YARN-7581.00.patch, YARN-7581.01.patch, 
> YARN-7581.02.patch, YARN-7581.03.patch, YARN-7581.04.patch, YARN-7581.05.patch
>
>
> Post YARN-7346,
> TestTimelineReaderWebServicesHBaseStorage.testGetEntitiesConfigFilters() and 
> TestTimelineReaderWebServicesHBaseStorage.testGetEntitiesMetricFilters() 
> start to fail when hbase.profile is set to 2.0.
> *Error Message*
>  [ERROR] Failures:
>  [ERROR] 
> TestTimelineReaderWebServicesHBaseStorage.testGetEntitiesConfigFilters:1266 
> expected:<2> but was:<0>
>  [ERROR] 
> TestTimelineReaderWebServicesHBaseStorage.testGetEntitiesMetricFilters:1523 
> expected:<1> but was:<0>






[jira] [Updated] (YARN-8012) Support Unmanaged Container Cleanup

2018-03-22 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated YARN-8012:
-
Target Version/s: 3.0.0, 2.7.1  (was: 2.7.1, 3.0.0)
   Fix Version/s: (was: 2.7.1)

> Support Unmanaged Container Cleanup
> ---
>
> Key: YARN-8012
> URL: https://issues.apache.org/jira/browse/YARN-8012
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>Priority: Major
> Attachments: YARN-8012 - Unmanaged Container Cleanup.pdf, 
> YARN-8012-branch-2.7.1.001.patch
>
>
> An *unmanaged container / leaked container* is a container which is no longer 
> managed by the NM and thus can no longer be managed by YARN either, i.e. it 
> is leaked.
> *There are many cases in which a YARN-managed container can become unmanaged, 
> such as:*
>  * The NM service is disabled or removed on the node.
>  * The NM is unable to start up again on the node, e.g. because depended-on 
> configuration or resources cannot be made ready.
>  * The NM local leveldb store is corrupted or lost, e.g. due to bad disk 
> sectors.
>  * The NM has bugs, e.g. wrongly marking a live container as complete.
> Note: these cases arise, or get worse, when work-preserving NM restart is 
> enabled; see YARN-1336.
> *Bad impacts of an unmanaged container include:*
>  # Resources on the node can no longer be managed by YARN:
>  ** YARN resources leak on the node.
>  ** The container cannot be killed to release YARN resources on the node and 
> free up resources for other urgent computations.
>  # Container and app killing is not eventually consistent for the app user:
>  ** A buggy app can still produce bad external impacts even long after the 
> app has been killed.






[jira] [Commented] (YARN-6830) Support quoted strings for environment variables

2018-03-22 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16409762#comment-16409762
 ] 

Jason Lowe commented on YARN-6830:
--

Sorry to show up late on this.  I'm leaning towards [~aw]'s proposal to support 
separate properties for environment variables in MapReduce.  To me it has the 
clearest path forward without the possibility of breaking backwards 
compatibility.  We don't touch Configuration methods and risk breaking any use 
cases that rely on the existing semantics, yet we can still support environment 
variables with commas, quotes, or any other weird characters in their value.  
With YARN-5714 ordering the environment variables based on their 
inter-references, we shouldn't need to rely on any magical ordering within a 
single property.  All that would remain is defining the semantics of what 
happens when _both_ mapreduce.map.env and mapreduce.map.env.SOMEVAR are defined 
and they conflict.  I lean towards the latter overriding any conflicting value 
in the former.

Thoughts?  Objections?
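
For concreteness, a hypothetical sketch of how the two forms could coexist 
under those override semantics (the per-variable property names here are an 
assumption, not a committed API):
{code}
# Bulk form, as today (commas separate variables, so values with commas break):
-Dmapreduce.map.env="MODE=bar,IMAGE_NAME=foo"
# Per-variable form, immune to commas or quotes in the value:
-Dmapreduce.map.env.MOUNTS=/tmp/foo,/tmp/bar
# If both define the same variable, the per-variable property would override:
-Dmapreduce.map.env.MODE=baz   # effective value: MODE=baz
{code}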

> Support quoted strings for environment variables
> 
>
> Key: YARN-6830
> URL: https://issues.apache.org/jira/browse/YARN-6830
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Shane Kumpf
>Assignee: Jim Brennan
>Priority: Major
> Attachments: YARN-6830.001.patch, YARN-6830.002.patch, 
> YARN-6830.003.patch, YARN-6830.004.patch
>
>
> There are cases where it is necessary to allow for quoted string literals 
> within environment variables values when passed via the yarn command line 
> interface.
> For example, consider the following environment variables for an MR map task.
> {{MODE=bar}}
> {{IMAGE_NAME=foo}}
> {{MOUNTS=/tmp/foo,/tmp/bar}}
> When running the MR job, these environment variables are supplied as a comma 
> delimited string.
> {{-Dmapreduce.map.env="MODE=bar,IMAGE_NAME=foo,MOUNTS=/tmp/foo,/tmp/bar"}}
> In this case, {{MOUNTS}} will be parsed and added to the task environment as 
> {{MOUNTS=/tmp/foo}}. Any attempt to quote the embedded comma-separated value 
> results in the quote characters becoming part of the value, and parsing still 
> breaks down at the comma.
> This issue is to allow quoting the comma-separated value (with an escaped 
> double or single quote). This was mentioned on YARN-4595 and will impact 
> YARN-5534 as well.
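> A hypothetical sketch of the desired usage once quoting is supported (the 
> exact escaping syntax is an assumption, not the final design):
> {code}
> -Dmapreduce.map.env="MODE=bar,IMAGE_NAME=foo,MOUNTS='/tmp/foo,/tmp/bar'"
> {code}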



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8012) Support Unmanaged Container Cleanup

2018-03-22 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16409737#comment-16409737
 ] 

Jason Lowe commented on YARN-8012:
--

{quote}Agree. The configuration is Windows-specific now because, for this 
patch, I only implemented the feature for Windows. We can expand it after the 
first stage. However, we should also consider that on Windows it depends on 
DefaultContainerExecutor, while on Linux it depends on LinuxContainerExecutor.
{quote}
If we know we're going to get rid of the system-specific configs then we should 
not advertise them even in the initial commit. Otherwise we then have to deal 
with migrating users when we remove those configs. Better to simply use the 
final config names up front and document the systems that are or are not 
supported, IMHO.
{quote}Do you mean Secure Container Executor?
{quote}
No, I mean when the unmanaged container monitor is trying to connect with a 
nodemanager running in a secure cluster. In a secure cluster setup, RPC and 
REST endpoints are authenticated to prevent literally anyone from just seeing 
the information available at those APIs. How will the unmanaged container 
monitor authenticate with the REST endpoint? Is it running as the NM user and 
leveraging the NM's Kerberos keytab, using tokens, or ..? I was under the 
impression it runs as the user running the container.
{quote}Since the YARN NM does not even retry the container executor process on 
unexpected exit, and this happens rarely, we can skip retrying the ucc process 
in the first stage. If really required, we can add a retry policy to the batch 
script start-yarn-ucc.cmd instead of winutils.
{quote}
I'm still not following here. We're admitting this is a problem, and it has a 
fairly straightforward fix which is to have winutils relaunch the command if it 
fails. It's already launching it today, right? If so, what's the concern again? 
I did not see an explanation in the design or in this JIRA why that's not going 
to work.
{quote}Agree, but it needs an outside party to retry the process.
{quote}
Again, I don't understand the concern with retrying the process.
{quote}Any thoughts for the whole feature?
{quote}
As I said above, I'm OK with the overall approach of a per-container monitor, 
especially since we sort of already have one today (monitoring for the 
container exit code instead of NM existence, but a per-container monitor 
nonetheless). However, I'm not comfortable reviewing most of the patch, since 
it's Windows code that I'm not going to be able to review properly. I'm just 
raising specific concerns about the design and how it will work on other, 
non-Windows systems and on secure clusters, but I don't have major concerns 
about the high-level approach.

> Support Unmanaged Container Cleanup
> ---
>
> Key: YARN-8012
> URL: https://issues.apache.org/jira/browse/YARN-8012
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>Priority: Major
> Fix For: 2.7.1
>
> Attachments: YARN-8012 - Unmanaged Container Cleanup.pdf, 
> YARN-8012-branch-2.7.1.001.patch
>
>
> An *unmanaged container / leaked container* is a container which is no longer 
> managed by the NM. Thus, it can no longer be managed by YARN either, i.e. it 
> is leaked.
> *There are many cases in which a YARN-managed container can become unmanaged, 
> such as:*
>  * The NM service is disabled or removed on the node.
>  * The NM is unable to start up again on the node, e.g. because required 
> configuration or resources cannot be made ready.
>  * The NM local leveldb store is corrupted or lost, e.g. due to bad disk 
> sectors.
>  * The NM has bugs, such as wrongly marking a live container as complete.
> Note that these cases arise, or become worse, when work-preserving NM restart 
> is enabled; see YARN-1336.
> *Bad impacts of an unmanaged container include:*
>  # Resources cannot be managed for YARN on the node:
>  ** YARN leaks resources on the node.
>  ** The container cannot be killed to release its YARN resources and free 
> them up for other urgent computations on the node.
>  # Container and app killing is not eventually consistent for the app user:
>  ** An app which has bugs can still produce bad impacts on the outside world 
> long after the app has been killed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8054) Improve robustness of the LocalDirsHandlerService MonitoringTimerTask thread

2018-03-22 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16409713#comment-16409713
 ] 

Jason Lowe commented on YARN-8054:
--

Ah, sorry, I missed that branch-3.1.0 was created.  IMHO this would be a good 
addition to branch-3.1.0 as well.

> Improve robustness of the LocalDirsHandlerService MonitoringTimerTask thread
> 
>
> Key: YARN-8054
> URL: https://issues.apache.org/jira/browse/YARN-8054
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
>Priority: Major
> Fix For: 2.10.0, 2.9.1, 2.8.4, 3.0.2, 3.1.1
>
> Attachments: YARN-8054.001.patch, YARN-8054.002.patch
>
>
> DeprecatedRawLocalFileStatus#loadPermissionInfo can throw a 
> RuntimeException which can kill the MonitoringTimerTask thread. This can 
> leave the node in a bad state where all NM local directories are marked "bad" 
> and there is no automatic recovery. In the stack trace below (after the 
> sketch), the error was "too many open files", but it could be any of a number 
> of other recoverable states.
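> A minimal sketch of the kind of guard that keeps the timer thread alive (a 
> hypothetical illustration, not the attached patch; LOG is assumed to be the 
> service's logger):
> {code}
> private class MonitoringTimerTask extends TimerTask {
>   @Override
>   public void run() {
>     try {
>       checkDirs();
>     } catch (Throwable t) {
>       // Log and swallow so one failed check cannot kill the timer thread;
>       // the next scheduled run can recover once the condition clears.
>       LOG.error("Disk health check failed", t);
>     }
>   }
> }
> {code}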
> {noformat}
> 2018-03-18 02:37:42,960 [DiskHealthMonitor-Timer] ERROR 
> yarn.YarnUncaughtExceptionHandler: Thread 
> Thread[DiskHealthMonitor-Timer,5,main] threw an Exception.
> java.lang.RuntimeException: Error while running command to get file 
> permissions : java.io.IOException: Cannot run program "ls": error=24, Too 
> many open files
> at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:942)
> at org.apache.hadoop.util.Shell.run(Shell.java:898)
> at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1213)
> at org.apache.hadoop.util.Shell.execCommand(Shell.java:1307)
> at org.apache.hadoop.util.Shell.execCommand(Shell.java:1289)
> at org.apache.hadoop.fs.FileUtil.execCommand(FileUtil.java:1078)
> at 
> org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:697)
> at 
> org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.getPermission(RawLocalFileSystem.java:672)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.checkLocalDir(ResourceLocalizationService.java:1556)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.checkAndInitializeLocalDirs(ResourceLocalizationService.java:1521)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$1.onDirsChanged(ResourceLocalizationService.java:271)
> at 
> org.apache.hadoop.yarn.server.nodemanager.DirectoryCollection.checkDirs(DirectoryCollection.java:381)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.checkDirs(LocalDirsHandlerService.java:449)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.access$500(LocalDirsHandlerService.java:52)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService$MonitoringTimerTask.run(LocalDirsHandlerService.java:166)
> at java.util.TimerThread.mainLoop(Timer.java:555)
> at java.util.TimerThread.run(Timer.java:505)
> Caused by: java.io.IOException: error=24, Too many open files
> at java.lang.UNIXProcess.forkAndExec(Native Method)
> at java.lang.UNIXProcess.(UNIXProcess.java:247)
> at java.lang.ProcessImpl.start(ProcessImpl.java:134)
> at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
> ... 17 more
> at 
> org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:737)
> at 
> org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.getPermission(RawLocalFileSystem.java:672)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.checkLocalDir(ResourceLocalizationService.java:1556)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.checkAndInitializeLocalDirs(ResourceLocalizationService.java:1521)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$1.onDirsChanged(ResourceLocalizationService.java:271)
> at 
> org.apache.hadoop.yarn.server.nodemanager.DirectoryCollection.checkDirs(DirectoryCollection.java:381)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.checkDirs(LocalDirsHandlerService.java:449)
> at 
> 

[jira] [Commented] (YARN-7581) HBase filters are not constructed correctly in ATSv2

2018-03-22 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16409706#comment-16409706
 ] 

Jason Lowe commented on YARN-7581:
--

bq. I did compilation for branch-2 and committed. Looks like my workspace was 
loaded with JDK 1.8, which I missed while compiling branch-2.

I'm building with JDK8 as well.  The compilation fails even with JDK8 because 
branch-2's version of hadoop-project/pom.xml explicitly asks the compiler to 
enforce JDK7:
{code}
<javac.version>1.7</javac.version>
[...]
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-compiler-plugin</artifactId>
  <version>${maven-compiler-plugin.version}</version>
  <configuration>
    <source>${javac.version}</source>
    <target>${javac.version}</target>
    <useIncrementalCompilation>false</useIncrementalCompilation>
  </configuration>
</plugin>
{code}
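
To illustrate (a hypothetical example, not code from the patch): with 
source/target pinned to 1.7, any Java 8 language construct fails to compile 
even under a JDK8 toolchain.
{code}
// javac -source 1.7 rejects this file with
// "error: lambda expressions are not supported in -source 1.7".
public class Jdk8Only {
  public static void main(String[] args) {
    Runnable r = () -> System.out.println("needs the Java 8 language level");
    r.run();
  }
}
{code}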


> HBase filters are not constructed correctly in ATSv2
> 
>
> Key: YARN-7581
> URL: https://issues.apache.org/jira/browse/YARN-7581
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: ATSv2
>Affects Versions: 3.0.0-beta1
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>Priority: Major
> Fix For: 3.1.0, yarn-7055, 3.2.0
>
> Attachments: YARN-7581-YARN-7055.04.patch, 
> YARN-7581-branch-2.05.patch, YARN-7581.00.patch, YARN-7581.01.patch, 
> YARN-7581.02.patch, YARN-7581.03.patch, YARN-7581.04.patch, YARN-7581.05.patch
>
>
> Post YARN-7346,
> TestTimelineReaderWebServicesHBaseStorage.testGetEntitiesConfigFilters() and 
> TestTimelineReaderWebServicesHBaseStorage.testGetEntitiesMetricFilters() 
> start to fail when hbase.profile is set to 2.0.
> *Error Message*
>  [ERROR] Failures:
>  [ERROR] 
> TestTimelineReaderWebServicesHBaseStorage.testGetEntitiesConfigFilters:1266 
> expected:<2> but was:<0>
>  [ERROR] 
> TestTimelineReaderWebServicesHBaseStorage.testGetEntitiesMetricFilters:1523 
> expected:<1> but was:<0>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8063) DistributedShellTimelinePlugin wrongly check for entityId instead of entityType

2018-03-22 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16409653#comment-16409653
 ] 

Sunil G commented on YARN-8063:
---

Changes look straightforward. +1

> DistributedShellTimelinePlugin wrongly check for entityId instead of 
> entityType
> ---
>
> Key: YARN-8063
> URL: https://issues.apache.org/jira/browse/YARN-8063
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
>Priority: Major
> Attachments: YARN-8063.01.patch
>
>
> DistributedShellTimelinePlugin#getTimelineEntityGroupId compares against 
> entityId rather than entityType, which causes getTimelineEntityGroupId to 
> fail.
> {code}
> public Set<TimelineEntityGroupId> getTimelineEntityGroupId(String entityId,
>     String entityType) {
>   // BUG: the DS_CONTAINER entity *type* constant is compared against the
>   // entity *id* argument, so the check never matches.
>   if (ApplicationMaster.DSEntity.DS_CONTAINER.toString().equals(entityId)) {
>     ContainerId containerId = ContainerId.fromString(entityId);
>     ApplicationId appId = containerId.getApplicationAttemptId()
>         .getApplicationId();
>     return toEntityGroupId(appId.toString());
>   }
>   return null;
> }
> {code}
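> The presumed one-line fix (a sketch inferred from the issue summary, not the 
> attached patch) is to compare against the entity type instead:
> {code}
> if (ApplicationMaster.DSEntity.DS_CONTAINER.toString().equals(entityType)) {
> {code}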



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7986) ATSv2 REST API queries do not return results for uppercase application tags

2018-03-22 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-7986:
-
Issue Type: Sub-task  (was: Bug)
Parent: YARN-7055

> ATSv2 REST API queries do not return results for uppercase application tags
> ---
>
> Key: YARN-7986
> URL: https://issues.apache.org/jira/browse/YARN-7986
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Charan Hebri
>Assignee: Charan Hebri
>Priority: Critical
> Attachments: YARN-7986.001.patch
>
>
> When applications are submitted to YARN with application tags, the tags are 
> converted to lowercase; this can be seen in both the old and new UIs. But 
> ATSv2 REST API queries using the original tags do not return results, because 
> the query URL is expected to contain the tags in lowercase.
> This is additional work for the client, because each tag needs to be 
> lowercased before running a query.
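> A hypothetical sketch of that client-side step (the class and method names 
> are illustrative only):
> {code}
> import java.util.List;
> import java.util.stream.Collectors;
>
> class TagNormalizer {
>   // YARN lowercases tags at submission time, so reader queries must
>   // supply the lowercase form until this issue is fixed.
>   static List<String> normalize(List<String> tags) {
>     return tags.stream().map(String::toLowerCase).collect(Collectors.toList());
>   }
> }
> {code}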



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8063) DistributedShellTimelinePlugin wrongly check for entityId instead of entityType

2018-03-22 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16409531#comment-16409531
 ] 

Rohith Sharma K S commented on YARN-8063:
-

Failed test case _TestDistributedShell#testDSShellWithoutDomainV2DefaultFlow_ 
is flaky and will be tracked as part of YARN-7771. 

> DistributedShellTimelinePlugin wrongly check for entityId instead of 
> entityType
> ---
>
> Key: YARN-8063
> URL: https://issues.apache.org/jira/browse/YARN-8063
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
>Priority: Major
> Attachments: YARN-8063.01.patch
>
>
> DistributedShellTimelinePlugin#getTimelineEntityGroupId compares against 
> entityId rather than entityType, which causes getTimelineEntityGroupId to 
> fail.
> {code}
> public Set<TimelineEntityGroupId> getTimelineEntityGroupId(String entityId,
>     String entityType) {
>   // BUG: the DS_CONTAINER entity *type* constant is compared against the
>   // entity *id* argument, so the check never matches.
>   if (ApplicationMaster.DSEntity.DS_CONTAINER.toString().equals(entityId)) {
>     ContainerId containerId = ContainerId.fromString(entityId);
>     ApplicationId appId = containerId.getApplicationAttemptId()
>         .getApplicationId();
>     return toEntityGroupId(appId.toString());
>   }
>   return null;
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8062) yarn rmadmin -getGroups returns group from which the user has been removed

2018-03-22 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16409518#comment-16409518
 ] 

genericqa commented on YARN-8062:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
21s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
49s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 54s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
28s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 40s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
27s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 63m  4s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
24s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}115m 30s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestIncreaseAllocationExpirer
 |
|   | hadoop.yarn.server.resourcemanager.TestRMAdminService |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:d4cc50f |
| JIRA Issue | YARN-8062 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12915645/YARN-8062.001.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 3908c59ad497 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 
14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 8d898ab |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/20048/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/20048/testReport/ |
| Max. 

[jira] [Commented] (YARN-8063) DistributedShellTimelinePlugin wrongly check for entityId instead of entityType

2018-03-22 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16409513#comment-16409513
 ] 

genericqa commented on YARN-8063:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
21s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 
 6s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
18s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
12s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
8m 46s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
12s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
 8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 18s{color} | {color:green} patch has no errors when building and testing our 
client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
13s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 12m  7s{color} 
| {color:red} hadoop-yarn-applications-distributedshell in the patch failed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
16s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 49m 20s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.applications.distributedshell.TestDistributedShell |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:d4cc50f |
| JIRA Issue | YARN-8063 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12915671/YARN-8063.01.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 46ec33b49412 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 
13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 8d898ab |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/20049/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-applications_hadoop-yarn-applications-distributedshell.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/20049/testReport/ |
| Max. process+thread count | 657 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell
 U: 

[jira] [Updated] (YARN-8063) DistributedShellTimelinePlugin wrongly check for entityId instead of entityType

2018-03-22 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-8063:

Attachment: YARN-8063.01.patch

> DistributedShellTimelinePlugin wrongly check for entityId instead of 
> entityType
> ---
>
> Key: YARN-8063
> URL: https://issues.apache.org/jira/browse/YARN-8063
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
>Priority: Major
> Attachments: YARN-8063.01.patch
>
>
> DistributedShellTimelinePlugin#getTimelineEntityGroupId compares against 
> entityId rather than entityType, which causes getTimelineEntityGroupId to 
> fail.
> {code}
> public Set<TimelineEntityGroupId> getTimelineEntityGroupId(String entityId,
>     String entityType) {
>   // BUG: the DS_CONTAINER entity *type* constant is compared against the
>   // entity *id* argument, so the check never matches.
>   if (ApplicationMaster.DSEntity.DS_CONTAINER.toString().equals(entityId)) {
>     ContainerId containerId = ContainerId.fromString(entityId);
>     ApplicationId appId = containerId.getApplicationAttemptId()
>         .getApplicationId();
>     return toEntityGroupId(appId.toString());
>   }
>   return null;
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8063) DistributedShellTimelinePlugin wrongly check for entityId instead of entityType

2018-03-22 Thread Rohith Sharma K S (JIRA)
Rohith Sharma K S created YARN-8063:
---

 Summary: DistributedShellTimelinePlugin wrongly check for entityId 
instead of entityType
 Key: YARN-8063
 URL: https://issues.apache.org/jira/browse/YARN-8063
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Rohith Sharma K S
Assignee: Rohith Sharma K S


DistributedShellTimelinePlugin#getTimelineEntityGroupId compares against 
entityId rather than entityType, which causes getTimelineEntityGroupId to 
fail.
{code}
public Set<TimelineEntityGroupId> getTimelineEntityGroupId(String entityId,
    String entityType) {
  // BUG: the DS_CONTAINER entity *type* constant is compared against the
  // entity *id* argument, so the check never matches.
  if (ApplicationMaster.DSEntity.DS_CONTAINER.toString().equals(entityId)) {
    ContainerId containerId = ContainerId.fromString(entityId);
    ApplicationId appId = containerId.getApplicationAttemptId()
        .getApplicationId();
    return toEntityGroupId(appId.toString());
  }
  return null;
}
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org


