[jira] [Commented] (YARN-4837) User facing aspects of 'AM blacklisting' feature need fixing

2016-08-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427654#comment-15427654
 ] 

Hadoop QA commented on YARN-4837:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 20s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:blue}0{color} | {color:blue} patch {color} | {color:blue} 0m 2s 
{color} | {color:blue} The patch file was not named according to hadoop's 
naming conventions. Please see https://wiki.apache.org/hadoop/HowToContribute 
for instructions. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 11 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 21s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
59s {color} | {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 11s 
{color} | {color:green} branch-2.8 passed with JDK v1.8.0_101 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 24s 
{color} | {color:green} branch-2.8 passed with JDK v1.7.0_101 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
46s {color} | {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 42s 
{color} | {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
43s {color} | {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 
29s {color} | {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 4s 
{color} | {color:green} branch-2.8 passed with JDK v1.8.0_101 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 15s 
{color} | {color:green} branch-2.8 passed with JDK v1.7.0_101 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
22s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 54s 
{color} | {color:green} the patch passed with JDK v1.8.0_101 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 1m 54s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 54s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 14s 
{color} | {color:green} the patch passed with JDK v1.7.0_101 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 2m 14s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 14s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 44s 
{color} | {color:red} hadoop-yarn-project/hadoop-yarn: The patch generated 6 
new + 947 unchanged - 10 fixed = 953 total (was 957) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 33s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
38s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 0s 
{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 4s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 19s 
{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed 
with JDK v1.8.0_101. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 22s 
{color} | {color:red} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.7.0_101
 with JDK v1.7.0_101 generated 1 new + 2 unchanged - 0 fixed = 3 total (was 2) 
{color} |
| {color:green}+1{color} | 

[jira] [Commented] (YARN-4307) Display blacklisted nodes for AM container in the RM web UI

2016-08-18 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427624#comment-15427624
 ] 

Naganarasimha G R commented on YARN-4307:
-

[~vinodkv], oops, I did not read your comment properly and only looked at the uploaded 
patch ... thanks for closing it anyway!

> Display blacklisted nodes for AM container in the RM web UI
> ---
>
> Key: YARN-4307
> URL: https://issues.apache.org/jira/browse/YARN-4307
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, webapp
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Fix For: 2.8.0
>
> Attachments: AppInfoPage.png, RMappAttempt.png, 
> YARN-4307-branch-2.8.txt, YARN-4307.v1.001.patch, YARN-4307.v1.002.patch, 
> YARN-4307.v1.003.patch, YARN-4307.v1.004.patch, YARN-4307.v1.005.patch, 
> webpage.png, yarn-capacity-scheduler-debug.log
>
>
> In a pseudo-distributed cluster with 2 NMs, an app was launched with the incorrect 
> configuration *./hadoop org.apache.hadoop.mapreduce.SleepJob 
> -Dmapreduce.job.node-label-expression=labelX  
> -Dyarn.app.mapreduce.am.env=JAVA_HOME=/no/jvm/here  -m 5 -mt 1200*.
> The first attempt failed and a 2nd attempt was launched, but the application 
> hung. The scheduler logs showed that localhost was blacklisted, but in the 
> UI (app & apps listing pages) the count was shown as zero and no hosts were 
> listed on the app page. 






[jira] [Commented] (YARN-4307) Display blacklisted nodes for AM container in the RM web UI

2016-08-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427584#comment-15427584
 ] 

Hadoop QA commented on YARN-4307:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:blue}0{color} | {color:blue} patch {color} | {color:blue} 0m 5s 
{color} | {color:blue} The patch file was not named according to hadoop's 
naming conventions. Please see https://wiki.apache.org/hadoop/HowToContribute 
for instructions. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 10s {color} 
| {color:red} YARN-4307 does not apply to branch-2.8. Rebase required? Wrong 
Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12824437/YARN-4307-branch-2.8.txt
 |
| JIRA Issue | YARN-4307 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/12829/console |
| Powered by | Apache Yetus 0.3.0   http://yetus.apache.org |


This message was automatically generated.



> Display blacklisted nodes for AM container in the RM web UI
> ---
>
> Key: YARN-4307
> URL: https://issues.apache.org/jira/browse/YARN-4307
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, webapp
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Fix For: 2.8.0
>
> Attachments: AppInfoPage.png, RMappAttempt.png, 
> YARN-4307-branch-2.8.txt, YARN-4307.v1.001.patch, YARN-4307.v1.002.patch, 
> YARN-4307.v1.003.patch, YARN-4307.v1.004.patch, YARN-4307.v1.005.patch, 
> webpage.png, yarn-capacity-scheduler-debug.log
>
>
> In a pseudo-distributed cluster with 2 NMs, an app was launched with the incorrect 
> configuration *./hadoop org.apache.hadoop.mapreduce.SleepJob 
> -Dmapreduce.job.node-label-expression=labelX  
> -Dyarn.app.mapreduce.am.env=JAVA_HOME=/no/jvm/here  -m 5 -mt 1200*.
> The first attempt failed and a 2nd attempt was launched, but the application 
> hung. The scheduler logs showed that localhost was blacklisted, but in the 
> UI (app & apps listing pages) the count was shown as zero and no hosts were 
> listed on the app page. 






[jira] [Reopened] (YARN-4307) Display blacklisted nodes for AM container in the RM web UI

2016-08-18 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R reopened YARN-4307:
-

Reopening the issue to apply the patch for 2.8...

> Display blacklisted nodes for AM container in the RM web UI
> ---
>
> Key: YARN-4307
> URL: https://issues.apache.org/jira/browse/YARN-4307
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, webapp
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Fix For: 2.8.0
>
> Attachments: AppInfoPage.png, RMappAttempt.png, 
> YARN-4307-branch-2.8.txt, YARN-4307.v1.001.patch, YARN-4307.v1.002.patch, 
> YARN-4307.v1.003.patch, YARN-4307.v1.004.patch, YARN-4307.v1.005.patch, 
> webpage.png, yarn-capacity-scheduler-debug.log
>
>
> In a pseudo-distributed cluster with 2 NMs, an app was launched with the incorrect 
> configuration *./hadoop org.apache.hadoop.mapreduce.SleepJob 
> -Dmapreduce.job.node-label-expression=labelX  
> -Dyarn.app.mapreduce.am.env=JAVA_HOME=/no/jvm/here  -m 5 -mt 1200*.
> The first attempt failed and a 2nd attempt was launched, but the application 
> hung. The scheduler logs showed that localhost was blacklisted, but in the 
> UI (app & apps listing pages) the count was shown as zero and no hosts were 
> listed on the app page. 






[jira] [Commented] (YARN-3388) Allocation in LeafQueue could get stuck because DRF calculator isn't well supported when computing user-limit

2016-08-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427549#comment-15427549
 ] 

Hadoop QA commented on YARN-3388:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
47s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 39s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
28s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 40s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
18s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 3s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
35s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 34s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 34s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 26s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 3 new + 514 unchanged - 3 fixed = 517 total (was 517) 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 39s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 8s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 38m 18s 
{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
15s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 54m 44s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12824402/YARN-3388-v7.patch |
| JIRA Issue | YARN-3388 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 09eecc8d8e6f 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / c5c3e81 |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/12828/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/12828/testReport/ |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/12828/console |
| Powered by | Apache Yetus 0.3.0   http://yetus.apache.org |


This message was automatically generated.



> Allocation in LeafQueue could get stuck because DRF calculator isn't well 
> supported when computing user-limit
> 

[jira] [Commented] (YARN-3673) Create a FailoverProxy for Federation services

2016-08-18 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427536#comment-15427536
 ] 

Jian He commented on YARN-3673:
---

- If FederationRMFailoverProxyProvider is used with a standalone RM, we only 
have a single RM, so how does it do failover?
- In the test, can you also validate that the RM address in the conf is updated 
properly on failover? Also, add logging in the performFailover method.
bq.  For pooling, you need Singleton
My point is that the cache in the singleton does not look like it is used much, 
because it always removes the object from the cache before doing failover. 
Please correct me if I'm wrong.
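
As a hedged illustration of the logging being requested in performFailover, here is a 
minimal stand-in class; it is not the actual FederationRMFailoverProxyProvider, and the 
RM-address list and SLF4J logger are assumptions for illustration only.

{code:java}
import java.util.List;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Stand-in provider, not Hadoop's FailoverProxyProvider: illustrates logging
// the RM address transition inside performFailover so a test can verify it.
public class LoggingFailoverSketch<T> {
  private static final Logger LOG =
      LoggerFactory.getLogger(LoggingFailoverSketch.class);

  private final List<String> rmAddresses; // hypothetical list of RM addresses
  private int current = 0;

  public LoggingFailoverSketch(List<String> rmAddresses) {
    this.rmAddresses = rmAddresses;
  }

  /** Move to the next RM and log the transition. */
  public void performFailover(T currentProxy) {
    String from = rmAddresses.get(current);
    current = (current + 1) % rmAddresses.size();
    String to = rmAddresses.get(current);
    LOG.info("Failing over RM proxy from {} to {}", from, to);
    // A test could now assert that the RM address in the conf equals 'to'.
  }
}
{code}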

> Create a FailoverProxy for Federation services
> --
>
> Key: YARN-3673
> URL: https://issues.apache.org/jira/browse/YARN-3673
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Subru Krishnan
>Assignee: Subru Krishnan
> Attachments: YARN-3673-YARN-2915-v1.patch
>
>
> This JIRA proposes creating a failover proxy for Federation based on the 
> cluster membership information in the StateStore that can be used by both 
> Router & AMRMProxy






[jira] [Commented] (YARN-5533) Jmx AM Used metrics for queue wrong when app submitted to partition

2016-08-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427501#comment-15427501
 ] 

Hadoop QA commented on YARN-5533:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 19s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
9s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 34s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
22s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 40s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
18s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 3s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
31s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 31s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 31s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
20s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 37s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 5s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 37m 50s 
{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
15s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 53m 8s {color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12824478/YARN-5533.0002.patch |
| JIRA Issue | YARN-5533 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux a0bea8b68d2b 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / c5c3e81 |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/12826/testReport/ |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/12826/console |
| Powered by | Apache Yetus 0.3.0   http://yetus.apache.org |


This message was automatically generated.



> Jmx AM Used metrics for queue wrong when app submitted to partition
> --
>
> Key: YARN-5533
> URL: https://issues.apache.org/jira/browse/YARN-5533
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Attachments: YARN-5533.0001.patch, YARN-5533.0002.patch
>
>
> # Configure cluster with node label 
> # Configure default and 

[jira] [Updated] (YARN-5533) Jmx AM Used metrics for queue wrong when app submitted to partition

2016-08-18 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-5533:
---
Attachment: YARN-5533.0002.patch

Attaching a patch after updating the test case.

> Jmx AM Used metrics for queue wrong when app submitted to partition
> --
>
> Key: YARN-5533
> URL: https://issues.apache.org/jira/browse/YARN-5533
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Attachments: YARN-5533.0001.patch, YARN-5533.0002.patch
>
>
> # Configure the cluster with a node label 
> # Configure the default and root queues with label capacity 100
> # Submit an application to labelx and the default queue
> # Wait for application completion
> # Check the queue JMX metrics
> {noformat}
>  "name" : 
> "Hadoop:service=ResourceManager,name=QueueMetrics,q0=root,q1=default",
> "modelerType" : "QueueMetrics,q0=root,q1=default",
> "tag.Queue" : "root.default",
> "tag.Context" : "yarn",
> "tag.Hostname" : "localhost",
> "running_0" : 0,
> "running_60" : 0,
> "running_300" : 0,
> "running_1440" : 0,
> "AMResourceLimitMB" : 512,
> "AMResourceLimitVCores" : 1,
> "UsedAMResourceMB" : 3072,
> "UsedAMResourceVCores" : 2,
> {noformat}






[jira] [Commented] (YARN-4837) User facing aspects of 'AM blacklisting' feature need fixing

2016-08-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427400#comment-15427400
 ] 

Hadoop QA commented on YARN-4837:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 21s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:blue}0{color} | {color:blue} patch {color} | {color:blue} 0m 5s 
{color} | {color:blue} The patch file was not named according to hadoop's 
naming conventions. Please see https://wiki.apache.org/hadoop/HowToContribute 
for instructions. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 11 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 30s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
48s {color} | {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 56s 
{color} | {color:green} branch-2.8 passed with JDK v1.8.0_101 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 18s 
{color} | {color:green} branch-2.8 passed with JDK v1.7.0_101 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
46s {color} | {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 40s 
{color} | {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
45s {color} | {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 
33s {color} | {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 6s 
{color} | {color:green} branch-2.8 passed with JDK v1.8.0_101 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 16s 
{color} | {color:green} branch-2.8 passed with JDK v1.7.0_101 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 11s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
23s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 54s 
{color} | {color:green} the patch passed with JDK v1.8.0_101 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 1m 54s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 54s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 16s 
{color} | {color:green} the patch passed with JDK v1.7.0_101 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 2m 16s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 16s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 43s 
{color} | {color:red} hadoop-yarn-project/hadoop-yarn: The patch generated 6 
new + 947 unchanged - 10 fixed = 953 total (was 957) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 33s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
39s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s 
{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 7s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 19s 
{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed 
with JDK v1.8.0_101. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 22s 
{color} | {color:red} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.7.0_101
 with JDK v1.7.0_101 generated 1 new + 2 unchanged - 0 fixed = 3 total (was 2) 
{color} |
| {color:green}+1{color} | 

[jira] [Commented] (YARN-1547) Prevent DoS of ApplicationMasterProtocol by putting in limits

2016-08-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427327#comment-15427327
 ] 

Hadoop QA commented on YARN-1547:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 15s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 7 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
59s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 30s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
42s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 20s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
38s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
28s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 57s 
{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
17s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 48s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 48s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 44s 
{color} | {color:red} hadoop-yarn-project/hadoop-yarn: The patch generated 96 
new + 208 unchanged - 0 fixed = 304 total (was 208) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 11s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
33s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 182 line(s) that end in whitespace. Use git 
apply --whitespace=fix. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 48s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
generated 2 new + 0 unchanged - 0 fixed = 2 total (was 0) {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 13s 
{color} | {color:red} hadoop-yarn-server-common in the patch failed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 25s {color} 
| {color:red} hadoop-yarn-api in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 38s 
{color} | {color:green} hadoop-yarn-server-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 14m 1s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 17s 
{color} | {color:red} The patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 42m 48s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | 
module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common
 |
|  |  Boxed value is unboxed and then immediately reboxed in 
org.apache.hadoop.yarn.server.preventdos.SlotBasedAccumulator.getCounts()  At 
SlotBasedAccumulator.java:then immediately reboxed in 
org.apache.hadoop.yarn.server.preventdos.SlotBasedAccumulator.getCounts()  At 
SlotBasedAccumulator.java:[line 85] |
|  |  
org.apache.hadoop.yarn.server.preventdos.SlotBasedAccumulator.updateTotal(Object,
 long) invokes inefficient new Long(long) constructor; use Long.valueOf(long) 
instead  At SlotBasedAccumulator.java:Long(long) constructor; use 
Long.valueOf(long) instead  At SlotBasedAccumulator.java:[line 122] |
| Failed junit tests | hadoop.yarn.conf.TestYarnConfigurationFields |
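
For context on the two FindBugs warnings above, here is a small sketch of the general 
patterns they flag and the usual fixes; it is illustrative only and is not the code from 
SlotBasedAccumulator.

{code:java}
import java.util.HashMap;
import java.util.Map;

// Illustrates the two FindBugs patterns reported above, with the usual fixes.
public class BoxingExamples {

  // "invokes inefficient new Long(long) constructor": prefer Long.valueOf,
  // which can reuse cached instances instead of always allocating.
  static Long total(long value) {
    // return new Long(value);      // flagged by FindBugs
    return Long.valueOf(value);     // preferred
  }

  // "Boxed value is unboxed and then immediately reboxed": copying a Long
  // through a long local forces an unbox followed by a rebox on put().
  // Keeping the boxed reference avoids it.
  static Map<String, Long> copyCounts(Map<String, Long> counts) {
    Map<String, Long> copy = new HashMap<>();
    for (Map.Entry<String, Long> e : counts.entrySet()) {
      // long v = e.getValue(); copy.put(e.getKey(), v);  // unbox + rebox
      copy.put(e.getKey(), e.getValue());                 // keep boxed value
    }
    return copy;
  }
}
{code}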

[jira] [Commented] (YARN-1529) Add Localization overhead metrics to NM

2016-08-18 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427301#comment-15427301
 ] 

Jason Lowe commented on YARN-1529:
--

bq. One comment that I have is we are adding a new API, albeit a small one, for 
YARN application developers.

That's a great point, and actually I'd be perfectly happy if this JIRA simply 
added the NM-level metric source and skipped the container API part for now.  
If we're moving towards doing this via the ATS anyway, we may not want/need the 
env variable API.  It might be worth splitting the patch so the less 
controversial NM-level metrics can go in earlier and we can discuss the 
per-container metrics API in another.  If the consensus is that this patch 
should include the per-container metrics API via the container env as well then 
I'm OK with that too.  I also agree that hiding the implementation details of 
that API would be important, whether that's in this JIRA or another.

Either way the patch needs an update, and please feel free to do so.
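
A minimal sketch of what an NM-level metric source for the counters described in this 
issue could look like, using plain atomics instead of Hadoop's metrics2 annotations; the 
counter names follow the issue description, and everything else is an assumption rather 
than the attached patch.

{code:java}
import java.util.concurrent.atomic.AtomicLong;

// Sketch of the NM-level localization counters from the issue description,
// using plain atomics rather than the Hadoop metrics2 framework.
public class LocalizationMetricsSketch {
  private final AtomicLong filesMissed = new AtomicLong();   // cache misses
  private final AtomicLong filesCached = new AtomicLong();   // cache hits
  private final AtomicLong bytesMissed = new AtomicLong();
  private final AtomicLong bytesCached = new AtomicLong();
  private final AtomicLong downloadNanos = new AtomicLong();

  public void recordMiss(long bytes, long nanos) {
    filesMissed.incrementAndGet();
    bytesMissed.addAndGet(bytes);
    downloadNanos.addAndGet(nanos);
  }

  public void recordHit(long bytes) {
    filesCached.incrementAndGet();
    bytesCached.addAndGet(bytes);
  }

  /** ratio = 100 * caches / (caches + misses), as in the description. */
  public long filesCachedRatio() {
    long hits = filesCached.get();
    long total = hits + filesMissed.get();
    return total == 0 ? 0 : 100 * hits / total;
  }
}
{code}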

> Add Localization overhead metrics to NM
> ---
>
> Key: YARN-1529
> URL: https://issues.apache.org/jira/browse/YARN-1529
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: Gera Shegalov
>Assignee: Chris Trezzo
> Attachments: YARN-1529.v01.patch, YARN-1529.v02.patch, 
> YARN-1529.v03.patch, YARN-1529.v04.patch
>
>
> Users are often unaware of localization cost that their jobs incur. To 
> measure effectiveness of localization caches it is necessary to expose the 
> overhead in the form of metrics.
> We propose addition of the following metrics to NodeManagerMetrics.
> When a container is about to launch, its set of LocalResources has to be 
> fetched from a central location, typically on HDFS, that results in a number 
> of download requests for the files missing in caches.
> LocalizedFilesMissed: total files (requests) downloaded from DFS. Cache 
> misses.
> LocalizedFilesCached: total localization requests that were served from local 
> caches. Cache hits.
> LocalizedBytesMissed: total bytes downloaded from DFS due to cache misses.
> LocalizedBytesCached: total bytes satisfied from local caches.
> Localized(Files|Bytes)CachedRatio: percentage of localized (files|bytes) that 
> were served out of cache: ratio = 100 * caches / (caches + misses)
> LocalizationDownloadNanos: total elapsed time in nanoseconds for a container 
> to go from ResourceRequestTransition to LocalizedTransition






[jira] [Commented] (YARN-4685) AM blacklisting result in application to get hanged

2016-08-18 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427298#comment-15427298
 ] 

Wangda Tan commented on YARN-4685:
--

[~rohithsharma],

Discussed this with [~vinodkv]; one solution is to change 
DEFAULT_AM_BLACKLIST_ENABLED to false and the default threshold from 0.8 to 0.2. 
We can open a separate JIRA for a longer-term fix of this issue. Does that sound 
like a plan?
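
A sketch of what that default change could look like; the constant names follow the 
comment above, and the real fields live in YarnConfiguration and may be named slightly 
differently, so treat this as illustrative only.

{code:java}
// Hypothetical defaults, mirroring the proposal above: disable AM blacklisting
// by default and lower the disable-failure threshold from 0.8 to 0.2.
public final class AmBlacklistDefaultsSketch {
  public static final boolean DEFAULT_AM_BLACKLIST_ENABLED = false;          // was true
  public static final float DEFAULT_AM_BLACKLIST_DISABLE_THRESHOLD = 0.2f;   // was 0.8f

  private AmBlacklistDefaultsSketch() {
  }
}
{code}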

> AM blacklisting result in application to get hanged
> ---
>
> Key: YARN-4685
> URL: https://issues.apache.org/jira/browse/YARN-4685
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
>Priority: Critical
> Attachments: YARN-4685-workaround.patch
>
>
> AM blacklist additions and removals are updated only when the RMAppAttempt is 
> scheduled, i.e. in {{RMAppAttemptImpl#ScheduleTransition#transition}}. Once the 
> attempt is scheduled, any removeNode/addNode in the cluster is not propagated 
> to {{BlackListManager#refreshNodeHostCount}}, so the BlackListManager operates 
> on a stale NM count. The application then stays in the ACCEPTED state and waits 
> forever, even if the blacklisted nodes reconnect after clearing disk space.






[jira] [Commented] (YARN-5202) Dynamic Overcommit of Node Resources - POC

2016-08-18 Thread Ha Son Hai (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427275#comment-15427275
 ] 

Ha Son Hai commented on YARN-5202:
--

Hi [~nroberts]!

Would you mind explaining the "RM_OVERCOMMIT_MEM_MAX_FACTOR" parameter in a 
little more detail? I set it to a different value, but the ResourceManager log 
reports the value as 0. The same happens for vcoreFactor. I wonder if it's a 
bug?

By the way, is RM_OVERCOMMIT_MEM_MAX_FACTOR redundant with 
RM_OVERCOMMIT_MEM_INCREMENT, given that one is a ratio and the other is in MB?
If I have a node with 32 GB of RAM and set RM_OVERCOMMIT_MEM_MAX_FACTOR to 2, 
does that mean I can over-commit up to 2 times the total memory I have (when 
utilization is very low), i.e. 64 GB?

Sorry if this is a basic question; I have just started working with the Hadoop 
code, so a lot of this is new to me. Thanks a lot for your clarification. I 
have attached your explanation of the parameters below.

+ RM_OVERCOMMIT_MEM_INCREMENT: Specifies the largest memory increment in 
megabytes when enlarging a node's total resource for overcommit. Once 
incremented, at least one container must be launched on the node before the 
value can be increased further. A value <= 0 disables memory overcommit.
+ RM_OVERCOMMIT_MEM_MAX_FACTOR: Maximum amount of memory to overcommit, as a 
factor of the total node memory. A value <= 0 disables memory overcommit.


> Dynamic Overcommit of Node Resources - POC
> --
>
> Key: YARN-5202
> URL: https://issues.apache.org/jira/browse/YARN-5202
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, resourcemanager
>Affects Versions: 3.0.0-alpha1
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Attachments: YARN-5202-branch2.7-uber.patch, YARN-5202.patch
>
>
> This Jira is to present a proof-of-concept implementation (collaboration 
> between [~jlowe] and myself) of a dynamic over-commit implementation in YARN. 
>  The type of over-commit implemented in this jira is similar to but not as 
> full-featured as what's being implemented via YARN-1011. YARN-1011 is where 
> we see ourselves heading but we needed something quick and completely 
> transparent so that we could test it at scale with our varying workloads 
> (mainly MapReduce, Spark, and Tez). Doing so has shed some light on how much 
> additional capacity we can achieve with over-commit approaches, and has 
> fleshed out some of the problems these approaches will face.
> Primary design goals:
> - Avoid changing protocols, application frameworks, or core scheduler logic: 
> simply adjust individual nodes' available resources based on current node 
> utilization and then let the scheduler do what it normally does
> - Over-commit slowly, pull back aggressively - If things are looking good and 
> there is demand, slowly add resource. If memory starts to look over-utilized, 
> aggressively reduce the amount of over-commit.
> - Make sure the nodes protect themselves - i.e. if memory utilization on a 
> node gets too high, preempt something - preferably something from a 
> preemptable queue
> A patch against trunk will be attached shortly.  Some notes on the patch:
> - This feature was originally developed against something akin to 2.7.  Since 
> the patch is mainly to explain the approach, we didn't do any sort of testing 
> against trunk except for basic build and basic unit tests
> - The key pieces of functionality are in {{SchedulerNode}}, 
> {{AbstractYarnScheduler}}, and {{NodeResourceMonitorImpl}}. The remainder of 
> the patch is mainly UI, Config, Metrics, Tests, and some minor code 
> duplication (e.g. to optimize node resource changes we treat an over-commit 
> resource change differently than an updateNodeResource change - i.e. 
> remove_node/add_node is just too expensive for the frequency of over-commit 
> changes)
> - We only over-commit memory at this point. 
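
To make the "over-commit slowly, pull back aggressively" goal above concrete, here is a 
minimal sketch of a per-node adjustment rule; the thresholds, increments, and method 
names are assumptions for illustration and not the attached patch.

{code:java}
// Sketch of the node-level adjustment described above: grow the advertised
// memory in small increments while utilization is low, and shrink it sharply
// as soon as memory looks over-utilized. All numbers are illustrative.
public class OvercommitSketch {
  private final long physicalMemMb;
  private final long incrementMb;     // cf. RM_OVERCOMMIT_MEM_INCREMENT
  private final double maxFactor;     // cf. RM_OVERCOMMIT_MEM_MAX_FACTOR

  public OvercommitSketch(long physicalMemMb, long incrementMb, double maxFactor) {
    this.physicalMemMb = physicalMemMb;
    this.incrementMb = incrementMb;
    this.maxFactor = maxFactor;
  }

  /** Returns the next advertised memory for the node, given current state. */
  public long nextAdvertisedMb(long advertisedMb, double memUtilization) {
    long ceiling = (long) (physicalMemMb * maxFactor);
    if (memUtilization > 0.9) {
      // Pull back aggressively: drop straight back to the physical size.
      return physicalMemMb;
    }
    if (memUtilization < 0.6 && advertisedMb < ceiling) {
      // Over-commit slowly: one small increment at a time.
      return Math.min(advertisedMb + incrementMb, ceiling);
    }
    return advertisedMb;
  }
}
{code}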






[jira] [Updated] (YARN-1547) Prevent DoS of ApplicationMasterProtocol by putting in limits

2016-08-18 Thread Giovanni Matteo Fumarola (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giovanni Matteo Fumarola updated YARN-1547:
---
Attachment: YARN-1547.v1.patch

> Prevent DoS of ApplicationMasterProtocol by putting in limits
> -
>
> Key: YARN-1547
> URL: https://issues.apache.org/jira/browse/YARN-1547
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Giovanni Matteo Fumarola
> Attachments: YARN-1547.pdf, YARN-1547.prototype.v0.patch, 
> YARN-1547.v0.pdf, YARN-1547.v1.patch
>
>
> Points of DoS in ApplicationMasterProtocol
>  - Host and trackingURL in RegisterApplicationMasterRequest
>  - Diagnostics, final trackingURL in FinishApplicationMasterRequest
>  - Unlimited number of resourceAsks, containersToBeReleased and 
> resourceBlacklistRequest in AllocateRequest
> -- Unbounded number of priorities and/or resourceRequests in each ask.






[jira] [Comment Edited] (YARN-1529) Add Localization overhead metrics to NM

2016-08-18 Thread Chris Trezzo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427222#comment-15427222
 ] 

Chris Trezzo edited comment on YARN-1529 at 8/18/16 9:52 PM:
-

Thanks [~jlowe] for the rebased patch! I agree that it would be nice to not tie 
these localization metrics to ATS so that more people can leverage them earlier.

One comment that I have is we are adding a new API, albeit a small one, for 
YARN application developers. This API is the serialized data we put into the 
environment variable (LOCALIZATION_COUNTERS) to communicate the localization 
statistics to the application-level process. Currently, if a YARN developer 
wants to leverage these metrics, they have to figure out how information is 
serialized into this env var and hope it doesn't change. What do you think 
about adding a small class/method that defines this a little more formally and 
contains the deserialization logic? That way if another application, let's say 
TEZ, wants to leverage this data, they can just call the new deserialize method.

If you think this is a good idea, I can post another patch with the added 
class. Thanks!


was (Author: ctrezzo):
Thanks [~jlowe] for the rebased patch! I agree that it would be nice to not tie 
these localization metrics to ATS so that more people can leverage them earlier.

One comment that I have is we are adding a new API, albeit a small one, for 
YARN application developers. This API is the serialized data we put into the 
environment variable (LOCALIZATION_COUNTERS) to communicate the localization 
statistics to the application-level container. Currently, if a YARN developer 
wants to leverage these metrics, they have to figure out how information is 
serialized into this env var and hope it doesn't change. What do you think 
about adding a small class/method that defines this a little more formally and 
contains the deserialization logic? That way if another application, let's say 
TEZ, wants to leverage this data, they can just call the new deserialize method.

If you think this is a good idea, I can post another patch with the added 
class. Thanks!

> Add Localization overhead metrics to NM
> ---
>
> Key: YARN-1529
> URL: https://issues.apache.org/jira/browse/YARN-1529
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: Gera Shegalov
>Assignee: Chris Trezzo
> Attachments: YARN-1529.v01.patch, YARN-1529.v02.patch, 
> YARN-1529.v03.patch, YARN-1529.v04.patch
>
>
> Users are often unaware of localization cost that their jobs incur. To 
> measure effectiveness of localization caches it is necessary to expose the 
> overhead in the form of metrics.
> We propose addition of the following metrics to NodeManagerMetrics.
> When a container is about to launch, its set of LocalResources has to be 
> fetched from a central location, typically on HDFS, that results in a number 
> of download requests for the files missing in caches.
> LocalizedFilesMissed: total files (requests) downloaded from DFS. Cache 
> misses.
> LocalizedFilesCached: total localization requests that were served from local 
> caches. Cache hits.
> LocalizedBytesMissed: total bytes downloaded from DFS due to cache misses.
> LocalizedBytesCached: total bytes satisfied from local caches.
> Localized(Files|Bytes)CachedRatio: percentage of localized (files|bytes) that 
> were served out of cache: ratio = 100 * caches / (caches + misses)
> LocalizationDownloadNanos: total elapsed time in nanoseconds for a container 
> to go from ResourceRequestTransition to LocalizedTransition






[jira] [Commented] (YARN-1529) Add Localization overhead metrics to NM

2016-08-18 Thread Chris Trezzo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427222#comment-15427222
 ] 

Chris Trezzo commented on YARN-1529:


Thanks [~jlowe] for the rebased patch! I agree that it would be nice to not tie 
these localization metrics to ATS so that more people can leverage them earlier.

One comment that I have is we are adding a new API, albeit a small one, for 
YARN application developers. This API is the serialized data we put into the 
environment variable (LOCALIZATION_COUNTERS) to communicate the localization 
statistics to the application-level container. Currently, if a YARN developer 
wants to leverage these metrics, they have to figure out how information is 
serialized into this env var and hope it doesn't change. What do you think 
about adding a small class/method that defines this a little more formally and 
contains the deserialization logic? That way if another application, let's say 
TEZ, wants to leverage this data, they can just call the new deserialize method.

If you think this is a good idea, I can post another patch with the added 
class. Thanks!
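
A hedged sketch of the kind of helper class suggested here: the LOCALIZATION_COUNTERS 
name comes from the comment, while the comma-separated layout, field order, and class 
name are assumptions for illustration and not necessarily the format the patch uses.

{code:java}
// Hypothetical helper that hides how localization statistics are serialized
// into the LOCALIZATION_COUNTERS environment variable. The comma-separated
// layout assumed here is illustrative only.
public final class LocalizationCountersSketch {
  public final long filesMissed;
  public final long filesCached;
  public final long bytesMissed;
  public final long bytesCached;
  public final long downloadNanos;

  private LocalizationCountersSketch(long[] v) {
    this.filesMissed = v[0];
    this.filesCached = v[1];
    this.bytesMissed = v[2];
    this.bytesCached = v[3];
    this.downloadNanos = v[4];
  }

  /** Reads and parses the env var, e.g. "3,17,1048576,8388608,250000000". */
  public static LocalizationCountersSketch fromEnv() {
    String raw = System.getenv("LOCALIZATION_COUNTERS");
    if (raw == null || raw.isEmpty()) {
      return new LocalizationCountersSketch(new long[5]);
    }
    String[] parts = raw.split(",");
    long[] v = new long[5];
    for (int i = 0; i < v.length && i < parts.length; i++) {
      v[i] = Long.parseLong(parts[i].trim());
    }
    return new LocalizationCountersSketch(v);
  }
}
{code}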

> Add Localization overhead metrics to NM
> ---
>
> Key: YARN-1529
> URL: https://issues.apache.org/jira/browse/YARN-1529
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: Gera Shegalov
>Assignee: Chris Trezzo
> Attachments: YARN-1529.v01.patch, YARN-1529.v02.patch, 
> YARN-1529.v03.patch, YARN-1529.v04.patch
>
>
> Users are often unaware of localization cost that their jobs incur. To 
> measure effectiveness of localization caches it is necessary to expose the 
> overhead in the form of metrics.
> We propose addition of the following metrics to NodeManagerMetrics.
> When a container is about to launch, its set of LocalResources has to be 
> fetched from a central location, typically on HDFS, that results in a number 
> of download requests for the files missing in caches.
> LocalizedFilesMissed: total files (requests) downloaded from DFS. Cache 
> misses.
> LocalizedFilesCached: total localization requests that were served from local 
> caches. Cache hits.
> LocalizedBytesMissed: total bytes downloaded from DFS due to cache misses.
> LocalizedBytesCached: total bytes satisfied from local caches.
> Localized(Files|Bytes)CachedRatio: percentage of localized (files|bytes) that 
> were served out of cache: ratio = 100 * caches / (caches + misses)
> LocalizationDownloadNanos: total elapsed time in nanoseconds for a container 
> to go from ResourceRequestTransition to LocalizedTransition






[jira] [Updated] (YARN-4837) User facing aspects of 'AM blacklisting' feature need fixing

2016-08-18 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-4837:
--
Attachment: YARN-4837-branch-2.8.txt

Uploading a 2.8 patch, fixing conflicts, test issues, etc.

> User facing aspects of 'AM blacklisting' feature need fixing
> 
>
> Key: YARN-4837
> URL: https://issues.apache.org/jira/browse/YARN-4837
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
>Priority: Critical
> Attachments: YARN-4837-20160515.txt, YARN-4837-20160520.1.txt, 
> YARN-4837-20160520.txt, YARN-4837-20160527.txt, YARN-4837-20160604.txt, 
> YARN-4837-branch-2.005.patch, YARN-4837-branch-2.8.txt
>
>
> Was reviewing the user-facing aspects that we are releasing as part of 2.8.0.
> Looking at the 'AM blacklisting feature', I see several things to be fixed 
> before we release it in 2.8.0.






[jira] [Updated] (YARN-4307) Display blacklisted nodes for AM container in the RM web UI

2016-08-18 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-4307:
--
Attachment: YARN-4307-branch-2.8.txt

Uploading a patch that I just pushed in for branch-2.8. This patch is needed by 
YARN-4837 on branch-2.8.

> Display blacklisted nodes for AM container in the RM web UI
> ---
>
> Key: YARN-4307
> URL: https://issues.apache.org/jira/browse/YARN-4307
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, webapp
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Fix For: 2.8.0
>
> Attachments: AppInfoPage.png, RMappAttempt.png, 
> YARN-4307-branch-2.8.txt, YARN-4307.v1.001.patch, YARN-4307.v1.002.patch, 
> YARN-4307.v1.003.patch, YARN-4307.v1.004.patch, YARN-4307.v1.005.patch, 
> webpage.png, yarn-capacity-scheduler-debug.log
>
>
> In a pseudo-distributed cluster with 2 NMs, an app was launched with the incorrect 
> configuration *./hadoop org.apache.hadoop.mapreduce.SleepJob 
> -Dmapreduce.job.node-label-expression=labelX  
> -Dyarn.app.mapreduce.am.env=JAVA_HOME=/no/jvm/here  -m 5 -mt 1200*.
> The first attempt failed and a 2nd attempt was launched, but the application 
> hung. The scheduler logs showed that localhost was blacklisted, but in the 
> UI (app & apps listing pages) the count was shown as zero and no hosts were 
> listed on the app page. 






[jira] [Updated] (YARN-4307) Display blacklisted nodes for AM container in the RM web UI

2016-08-18 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-4307:
--
Fix Version/s: (was: 2.9.0)
   2.8.0

> Display blacklisted nodes for AM container in the RM web UI
> ---
>
> Key: YARN-4307
> URL: https://issues.apache.org/jira/browse/YARN-4307
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, webapp
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Fix For: 2.8.0
>
> Attachments: AppInfoPage.png, RMappAttempt.png, 
> YARN-4307-branch-2.8.txt, YARN-4307.v1.001.patch, YARN-4307.v1.002.patch, 
> YARN-4307.v1.003.patch, YARN-4307.v1.004.patch, YARN-4307.v1.005.patch, 
> webpage.png, yarn-capacity-scheduler-debug.log
>
>
> In a pseudo-distributed cluster with 2 NMs, an app was launched with the incorrect 
> configuration *./hadoop org.apache.hadoop.mapreduce.SleepJob 
> -Dmapreduce.job.node-label-expression=labelX  
> -Dyarn.app.mapreduce.am.env=JAVA_HOME=/no/jvm/here  -m 5 -mt 1200*.
> The first attempt failed and a 2nd attempt was launched, but the application 
> hung. The scheduler logs showed that localhost was blacklisted, but in the 
> UI (app & apps listing pages) the count was shown as zero and no hosts were 
> listed on the app page. 






[jira] [Updated] (YARN-3446) FairScheduler headroom calculation should exclude nodes in the blacklist

2016-08-18 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-3446:
--
Fix Version/s: (was: 2.9.0)
   2.8.0

> FairScheduler headroom calculation should exclude nodes in the blacklist
> 
>
> Key: YARN-3446
> URL: https://issues.apache.org/jira/browse/YARN-3446
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: zhihai xu
>Assignee: zhihai xu
> Fix For: 2.8.0
>
> Attachments: YARN-3446-branch-2.8.txt, YARN-3446.000.patch, 
> YARN-3446.001.patch, YARN-3446.002.patch, YARN-3446.003.patch, 
> YARN-3446.004.patch, YARN-3446.005.patch
>
>
> The FairScheduler headroom calculation should exclude nodes in the blacklist.
> MRAppMaster does not preempt reducers because the reducer-preemption 
> calculation uses a headroom that still counts blacklisted nodes. This makes 
> jobs hang forever (the ResourceManager does not assign any new containers on 
> blacklisted nodes, but the availableResource the AM gets from the RM still 
> includes the available resources of blacklisted nodes).
> This issue is similar to YARN-1680, which covers the Capacity Scheduler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-3446) FairScheduler headroom calculation should exclude nodes in the blacklist

2016-08-18 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-3446:
--
Attachment: YARN-3446-branch-2.8.txt

Uploading the patch that I just pushed in for branch-2.8. This patch is needed by 
YARN-4837 -> YARN-4307 on branch-2.8.

> FairScheduler headroom calculation should exclude nodes in the blacklist
> 
>
> Key: YARN-3446
> URL: https://issues.apache.org/jira/browse/YARN-3446
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: zhihai xu
>Assignee: zhihai xu
> Fix For: 2.8.0
>
> Attachments: YARN-3446-branch-2.8.txt, YARN-3446.000.patch, 
> YARN-3446.001.patch, YARN-3446.002.patch, YARN-3446.003.patch, 
> YARN-3446.004.patch, YARN-3446.005.patch
>
>
> FairScheduler headroom calculation should exclude nodes in the blacklist.
> MRAppMaster does not preempt the reducers because the reducer preemption 
> calculation includes blacklisted nodes in the headroom. This makes jobs hang 
> forever (the ResourceManager does not assign any new containers on blacklisted 
> nodes, but the availableResource the AM gets from the RM includes the available 
> resources of blacklisted nodes).
> This issue is similar to YARN-1680, which is for the Capacity Scheduler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5049) Extend NMStateStore to save queued container information

2016-08-18 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated YARN-5049:
-
Hadoop Flags: Incompatible change

Sorry for getting here late, as I just ran into this by accident.  This breaks 
rolling upgrades because it changes the major version of the NM state store 
schema.  Therefore when a new NM comes up on an old state store it crashes like 
this:
{noformat}
2016-08-18 19:37:26,713 INFO  [main] service.AbstractService 
(AbstractService.java:noteFailure(272)) - Service NodeManager failed in state 
INITED; cause: org.apache.hadoop.service.ServiceStateException: 
java.io.IOException: Incompatible version for NM state: expecting NM state 
version 2.0, but loading version 1.0
org.apache.hadoop.service.ServiceStateException: java.io.IOException: 
Incompatible version for NM state: expecting NM state version 2.0, but loading 
version 1.0
at 
org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
at 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartRecoveryStore(NodeManager.java:236)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:300)
at 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:762)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:823)
Caused by: java.io.IOException: Incompatible version for NM state: expecting NM 
state version 2.0, but loading version 1.0
at 
org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.checkVersion(NMLeveldbStateStoreService.java:1226)
at 
org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.initStorage(NMLeveldbStateStoreService.java:1085)
at 
org.apache.hadoop.yarn.server.nodemanager.recovery.NMStateStoreService.serviceInit(NMStateStoreService.java:249)
at 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
... 5 more
{noformat}

Breaking compatibility, while annoying, would not be terrible if this had only gone 
into trunk. However, it was also put into branch-2, so it breaks rolling upgrades 
from 2.8 to 2.9. As such, it should be reverted from branch-2 until there is a 
migration path from schema 1 to schema 2 that doesn't break NM startup.
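
To make the failure mode concrete, here is a minimal hedged sketch of the kind of 
check that produces the error above; it is not the actual 
{{NMLeveldbStateStoreService}} code, just the shape of a major-version gate:

{code}
import java.io.IOException;

// Sketch only: compatibility is gated on the major version alone, so bumping the
// schema from 1.x to 2.x makes an existing store unreadable by the new NM.
final class NMStateVersionCheckSketch {
  static void checkCompatible(int loadedMajor, int loadedMinor,
      int currentMajor, int currentMinor) throws IOException {
    if (loadedMajor != currentMajor) {
      throw new IOException("Incompatible version for NM state: expecting NM state"
          + " version " + currentMajor + "." + currentMinor
          + ", but loading version " + loadedMajor + "." + loadedMinor);
    }
    // a differing minor version is treated as compatible and can be upgraded in place
  }
}
{code}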


> Extend NMStateStore to save queued container information
> 
>
> Key: YARN-5049
> URL: https://issues.apache.org/jira/browse/YARN-5049
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Konstantinos Karanasos
>Assignee: Konstantinos Karanasos
> Fix For: 2.9.0
>
> Attachments: YARN-5049.001.patch, YARN-5049.002.patch, 
> YARN-5049.003.patch
>
>
> This JIRA is about extending the NMStateStore to save queued container 
> information whenever a new container is added to the NM queue. 
> It also removes the information from the state store when the queued 
> container starts its execution.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-3673) Create a FailoverProxy for Federation services

2016-08-18 Thread Subru Krishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15426990#comment-15426990
 ] 

Subru Krishnan commented on YARN-3673:
--

Thanks [~jianhe] for reviewing the patch and your detailed feedback.

 bq. I think for this patch the FEDERATION_FAILOVER_ENABLED and 
FEDERATION_ENABLED flag are not needed

There is a nuance here: unlike the other failover providers, which are only 
initialized when RM HA is enabled, {{FederationRMFailoverProxyProvider}} is used 
with both standalone RM and RM HA, since its main purpose is to bypass _yarn-site_ 
and obtain the RM connection address from {{FederationStateStore}}. 
The FEDERATION_ENABLED flag indicates that, irrespective of whether RM HA is 
enabled, we should configure the {{FederationRMFailoverProxyProvider}}.
When we have both Federation and RM HA, FEDERATION_FAILOVER_ENABLED indicates that 
we should configure failover {{RetryPolicies}}, as otherwise the normal ones 
(like _retryForeverWithFixedSleep_) would be initialized. This is needed because 
we override the HA-enabled setting in the conf, as otherwise it triggers the code 
path that looks up RMs from the XML using the rmId index.

bq. Why do we need to store the tokens and then re-add the tokens?

This covers a corner-case scenario that we hit during testing. IIRC, it was 
required for the UnmanagedAMs (used for transparently spanning a job across 
clusters) to reconnect to the RM after failover. Without it, I was getting an 
authentication error from the primary RM due to a missing token.
Do you think I should leave it in, or add it when we reach the e2e testing phase?
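
For readers of this thread, a hedged sketch of that token carry-over idea (an 
illustration only, not the patch's exact code):

{code}
// Capture the current UGI's tokens before failover and re-add them afterwards so
// the re-created proxy still authenticates against the primary RM.
UserGroupInformation user = UserGroupInformation.getCurrentUser();
Collection<Token<? extends TokenIdentifier>> savedTokens = user.getTokens();
// ... failover happens and a fresh proxy is created ...
for (Token<? extends TokenIdentifier> token : savedTokens) {
  user.addToken(token);
}
{code}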

bq. It appears to me the FederationStateStoreFacade is not needed for this patch

You are right, since we are using the in-memory store for the test case. The issue 
is with the SQL-based store (YARN-3663), which is what we want to deploy in 
production. We did some simulated scale testing with tens of thousands of nodes, 
and there was performance degradation without connection pooling. For pooling you 
need a singleton, as otherwise every proxy instance would create its own private 
pool, and that is what {{FederationStateStoreFacade}} provides.
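
A minimal hedged illustration of that singleton/pooling point (not the actual 
{{FederationStateStoreFacade}}; the pooling factory below is a placeholder):

{code}
import javax.sql.DataSource;

// Sketch only: a singleton facade guarantees that every proxy instance shares one
// pooled DataSource for the SQL-backed state store instead of creating its own pool.
public final class StateStoreFacadeSketch {

  private static final StateStoreFacadeSketch INSTANCE = new StateStoreFacadeSketch();

  private final DataSource dataSource;   // the shared connection pool

  private StateStoreFacadeSketch() {
    this.dataSource = createPooledDataSource();   // pool is configured exactly once
  }

  public static StateStoreFacadeSketch getInstance() {
    return INSTANCE;
  }

  public DataSource getDataSource() {
    return dataSource;
  }

  private static DataSource createPooledDataSource() {
    // placeholder: a real deployment would configure a pooling library (DBCP, etc.)
    // here exactly once; returning null keeps this sketch compilable on its own
    return null;
  }
}
{code}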

bq. Test: IIUC, I don't actually see FederationRMFailoverProxyProvider 
configured in the test, is it tested?

Yes! In the test, I create a proxy using {{FederationProxyProviderUtil}}:
{code}
ApplicationClientProtocol client = FederationProxyProviderUtil
    .createRMProxy(conf, ApplicationClientProtocol.class, subClusterId,
        UserGroupInformation.getCurrentUser());
{code}

And {{FederationProxyProviderUtil}} always sets 
{{FederationRMFailoverProxyProvider}} as the failover provider:
{code}
// updating the conf with the refreshed RM addresses as proxy creations
// are based out of conf
private static void updateConf(Configuration conf,
    SubClusterId subClusterId) {
  conf.set(YarnConfiguration.FEDERATION_SUBCLUSTER_ID, subClusterId.getId());
  conf.setBoolean(YarnConfiguration.FEDERATION_ENABLED, true);
  conf.setClass(YarnConfiguration.CLIENT_FAILOVER_PROXY_PROVIDER,
      FederationRMFailoverProxyProvider.class, RMFailoverProxyProvider.class);
  // we will failover using FederationStateStore so set that and skip
  // traditional HA
  if (HAUtil.isHAEnabled(conf)) {
    conf.setBoolean(YarnConfiguration.FEDERATION_FAILOVER_ENABLED, true);
    conf.setBoolean(YarnConfiguration.RM_HA_ENABLED, false);
  }
}
{code}

This is also how the FEDERATION_FAILOVER_ENABLED and FEDERATION_ENABLED flags are 
used.

I had to do multiple rounds of debugging and stepped through 
{{FederationRMFailoverProxyProvider}} many times while writing the tests, so it is 
most definitely being exercised :).


 



> Create a FailoverProxy for Federation services
> --
>
> Key: YARN-3673
> URL: https://issues.apache.org/jira/browse/YARN-3673
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Subru Krishnan
>Assignee: Subru Krishnan
> Attachments: YARN-3673-YARN-2915-v1.patch
>
>
> This JIRA proposes creating a failover proxy for Federation based on the 
> cluster membership information in the StateStore that can be used by both 
> Router & AMRMProxy



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5520) [Capacity Scheduler] Change the logic for when to trigger user/group mappings to queues

2016-08-18 Thread Min Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15426845#comment-15426845
 ] 

Min Shen commented on YARN-5520:


[~Ying Zhang],

Your proposal does add more flexibility to queue mappings.
However, my only concern is related to the added complexity for admins to 
configure these mapping rules.
If the secondary queues for most users/groups are the same, it seems reasonable 
to just use {{yarn.scheduler.capacity.queue-mappings.disabled.queues}}.
If the secondary queues vary a lot between users/groups, it might be difficult 
for admins to configure these rules in the first place.

> [Capacity Scheduler] Change the logic for when to trigger user/group mappings 
> to queues
> ---
>
> Key: YARN-5520
> URL: https://issues.apache.org/jira/browse/YARN-5520
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.6.0, 2.7.0, 2.6.1
>Reporter: Min Shen
>
> In YARN-2411, the feature in Capacity Scheduler to support user/group based 
> mappings to queues was introduced.
> In the original implementation, the configuration key 
> {{yarn.scheduler.capacity.queue-mappings-override.enable}} was added to 
> control when to enable overriding user requested queues.
> However, even if this configuration is set to false, queue overriding could 
> still happen if the user did not request any specific queue or chose to simply 
> submit the job to the "default" queue, according to the following if condition, 
> which triggers queue overriding:
> {code}
> if (queueName.equals(YarnConfiguration.DEFAULT_QUEUE_NAME)
>   || overrideWithQueueMappings)
> {code}
> This logic does not seem very reasonable, as there is no way to fully disable 
> queue overriding when mappings are configured inside capacity-scheduler.xml.
> In addition, in our environment we have set up a few organization-dedicated 
> queues as well as some "adhoc" queues. The organization-dedicated queues have 
> better resource guarantees, and we want to be able to route users to the 
> corresponding organization queues. On the other hand, the "adhoc" queues have 
> weaker resource guarantees, but everyone can use them to get some opportunistic 
> resources when the cluster is free.
> The current logic also prevents this type of use case: when queue overriding is 
> enabled, users cannot use these "adhoc" queues any more. They will always be 
> routed to the dedicated organization queues.
> To address the above 2 issues, I propose to change the implementation so that:
> * Admins can fully disable queue overriding even if mappings are already 
> configured.
> * Admins have finer-grained control to combine queue overriding with the 
> above-mentioned organization/adhoc queue setups.
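
One possible reading of the first proposal, sketched against the if condition 
quoted above (this is an illustration only; {{getMappedQueue}} and the surrounding 
context are hypothetical):

{code}
// Sketch only: overriding is gated on the explicit enable flag, so submissions to
// the "default" queue are no longer overridden when the feature is turned off.
if (overrideWithQueueMappings) {
  String mappedQueue = getMappedQueue(user);   // hypothetical mapping lookup
  if (mappedQueue != null) {
    queueName = mappedQueue;
  }
}
{code}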



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5524) Yarn live log aggregation does not throw if command line arg is wrong

2016-08-18 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15426839#comment-15426839
 ] 

Xuan Gong commented on YARN-5524:
-

[~vrushalic]

Instead of doing
{code}
String unknownOptions = StringUtils.join(commandLine.getArgs(), ' ');
if (StringUtils.isNotBlank(unknownOptions)) {
  System.err.println("Invalid option(s) specified " + unknownOptions);
  return -1;
}
{code}

we could change 
{code}
CommandLine commandLine = parser.parse(opts, args, true);
to
CommandLine commandLine = parser.parse(opts, args, false);
{code}
That way, the GnuParser would detect invalid options by itself.
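
A minimal hedged sketch of that suggestion against commons-cli (the option names 
here are illustrative only):

{code}
import org.apache.commons.cli.CommandLineParser;
import org.apache.commons.cli.GnuParser;
import org.apache.commons.cli.Options;
import org.apache.commons.cli.ParseException;
import org.apache.commons.cli.UnrecognizedOptionException;

public class StrictParseSketch {
  public static void main(String[] args) throws ParseException {
    Options opts = new Options();
    opts.addOption("applicationId", true, "application id");

    CommandLineParser parser = new GnuParser();
    try {
      // strict mode (stopAtNonOption = false): an unknown option such as -logFiles
      // is rejected by the parser instead of being silently collected
      parser.parse(opts,
          new String[] {"-applicationId", "app_1", "-logFiles", "x"}, false);
    } catch (UnrecognizedOptionException e) {
      System.err.println("Invalid option: " + e.getOption());
    }
  }
}
{code}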

> Yarn live log aggregation does not throw if command line arg is wrong
> -
>
> Key: YARN-5524
> URL: https://issues.apache.org/jira/browse/YARN-5524
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation
>Affects Versions: 2.9.0
>Reporter: Prasanth Jayachandran
>Assignee: Vrushali C
> Attachments: YARN-5524.001.patch, YARN-5524.002.patch
>
>
> When we used the wrong command line arg to specify the log file pattern, yarn 
> did not throw any exception; instead it pulled the entire log onto the client.
> {code}
> [hive@ctr-e20-1468887904486-0007-01-03 ~]$ yarn logs -applicationId 
> application_1470931023753_0001 -logFiles hive.*2016-08-11-21.* > logs
> {code}
> NOTE: we are using -logFiles instead of -log_files
> This query should have failed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5524) Yarn live log aggregation does not throw if command line arg is wrong

2016-08-18 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15426832#comment-15426832
 ] 

Naganarasimha G R commented on YARN-5524:
-

[~vrushalic],
IMO it should be the CommandLineParser's responsibility to detect invalid options. 
If that validation is not happening, would it not be better to support it there, 
so that it is applicable across all the CLI commands?


> Yarn live log aggregation does not throw if command line arg is wrong
> -
>
> Key: YARN-5524
> URL: https://issues.apache.org/jira/browse/YARN-5524
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation
>Affects Versions: 2.9.0
>Reporter: Prasanth Jayachandran
>Assignee: Vrushali C
> Attachments: YARN-5524.001.patch, YARN-5524.002.patch
>
>
> When we used the wrong command line arg to specify the log file pattern, yarn 
> did not throw any exception; instead it pulled the entire log onto the client.
> {code}
> [hive@ctr-e20-1468887904486-0007-01-03 ~]$ yarn logs -applicationId 
> application_1470931023753_0001 -logFiles hive.*2016-08-11-21.* > logs
> {code}
> NOTE: we are using -logFiles instead of -log_files
> This query should have failed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-2499) Respect labels in preemption policy of fair scheduler

2016-08-18 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-2499:

Assignee: (was: Naganarasimha G R)

> Respect labels in preemption policy of fair scheduler
> -
>
> Key: YARN-2499
> URL: https://issues.apache.org/jira/browse/YARN-2499
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-3388) Allocation in LeafQueue could get stuck because DRF calculator isn't well supported when computing user-limit

2016-08-18 Thread Nathan Roberts (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nathan Roberts updated YARN-3388:
-
Attachment: YARN-3388-v7.patch

Thanks [~leftnoteasy] for the comments. I took both suggestions in the latest 
patch and cleaned up some remaining checkstyle issues.


> Allocation in LeafQueue could get stuck because DRF calculator isn't well 
> supported when computing user-limit
> -
>
> Key: YARN-3388
> URL: https://issues.apache.org/jira/browse/YARN-3388
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.6.0, 2.8.0, 2.7.2, 3.0.0-alpha1
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Attachments: YARN-3388-v0.patch, YARN-3388-v1.patch, 
> YARN-3388-v2.patch, YARN-3388-v3.patch, YARN-3388-v4.patch, 
> YARN-3388-v5.patch, YARN-3388-v6.patch, YARN-3388-v7.patch
>
>
> When there are multiple active users in a queue, it should be possible for 
> those users to make use of capacity up to max_capacity (or close). The 
> resources should be fairly distributed among the active users in the queue. 
> This works pretty well when there is a single resource being scheduled.   
> However, when there are multiple resources the situation gets more complex 
> and the current algorithm tends to get stuck at capacity. 
> An example is illustrated in a subsequent comment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5445) Log aggregation configured to different namenode can fail fast

2016-08-18 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15426700#comment-15426700
 ] 

Daniel Templeton commented on YARN-5445:


Moving {{DFS_CLIENT_FAILOVER_MAX_ATTEMPTS_KEY}} to {{CommonConfigurationKeys}} 
doesn't sound like the right solution.  If the log aggregation code is going to 
depend on specific behavior of HDFS why shouldn't the project depend on 
hadoop-hdfs?

As you pointed out, there are other JIRAs that are attempting to resolve the 
base issue that is prompting your unusual cluster config.  I don't think 
introducing a new configuration parameter to deal with a temporary issue is a 
good idea.  Config params, like diamonds, are forever, and we already have 
entirely too many.

I haven't looked at the code for log aggregation.  What happens when the DFS 
connection fails?

> Log aggregation configured to different namenode can fail fast
> --
>
> Key: YARN-5445
> URL: https://issues.apache.org/jira/browse/YARN-5445
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: Chackaravarthy
> Attachments: YARN-5445-1.patch
>
>
> Log aggregation is enabled and configured to write app logs to a different 
> cluster or a different namespace (NN federation). In these cases, we would like 
> to have some configs for attempts or retries so that we fail fast in case the 
> other cluster is completely down.
> Currently it takes the default {{dfs.client.failover.max.attempts}} of 15 and 
> hence adds a latency of 2 to 2.5 minutes to each container launch (per node 
> manager).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5535) Remove RMDelegationToken make resourcemanager recovery very slow

2016-08-18 Thread tangshangwen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15426631#comment-15426631
 ] 

tangshangwen commented on YARN-5535:


I'm sorry, it is after recovery, and I found the event-queue size was very large
{noformat}
[2016-08-12T19:43:25.986+08:00] [INFO] 
yarn.event.AsyncDispatcher.handle(AsyncDispatcher.java:235) [AsyncDispatcher 
event handler] : Size of event-queue is 643000
[2016-08-12T19:43:25.986+08:00] [INFO] 
yarn.event.AsyncDispatcher.handle(AsyncDispatcher.java:235) [AsyncDispatcher 
event handler] : Size of event-queue is 644000
[2016-08-12T19:43:25.986+08:00] [INFO] 
yarn.event.AsyncDispatcher.handle(AsyncDispatcher.java:235) [AsyncDispatcher 
event handler] : Size of event-queue is 645000
[2016-08-12T19:43:25.986+08:00] [INFO] 
yarn.event.AsyncDispatcher.handle(AsyncDispatcher.java:235) [AsyncDispatcher 
event handler] : Size of event-queue is 646000
{noformat} 

> Remove RMDelegationToken make resourcemanager recovery very slow
> 
>
> Key: YARN-5535
> URL: https://issues.apache.org/jira/browse/YARN-5535
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: tangshangwen
>Assignee: tangshangwen
>
> In our cluster, I found that when restart RM, RM recovery is very slow, this 
> is my log
> {noformat}
> [2016-08-12T19:43:21.478+08:00] [INFO] 
> resourcemanager.security.RMDelegationTokenSecretManager.removeStoredToken(RMDelegationTokenSecretManager.java:136)
>  [Thread[Thread-26,5,main]] : removing RMDelegation token with sequence 
> number: 737879
> [2016-08-12T19:43:21.478+08:00] [INFO] 
> resourcemanager.recovery.RMStateStore.transition(RMStateStore.java:320) 
> [Thread[Thread-26,5,main]] : Removing RMDelegationToken and SequenceNumber
> [2016-08-12T19:43:21.486+08:00] [INFO] 
> resourcemanager.security.RMDelegationTokenSecretManager.removeStoredToken(RMDelegationTokenSecretManager.java:136)
>  [Thread[Thread-26,5,main]] : removing RMDelegation token with sequence 
> number: 737878
> [2016-08-12T19:43:21.486+08:00] [INFO] 
> resourcemanager.recovery.RMStateStore.transition(RMStateStore.java:320) 
> [Thread[Thread-26,5,main]] : Removing RMDelegationToken and SequenceNumber
> [2016-08-12T19:43:21.494+08:00] [INFO] 
> resourcemanager.security.RMDelegationTokenSecretManager.removeStoredToken(RMDelegationTokenSecretManager.java:136)
>  [Thread[Thread-26,5,main]] : removing RMDelegation token with sequence 
> number: 737877
> [2016-08-12T19:43:21.494+08:00] [INFO] 
> resourcemanager.recovery.RMStateStore.transition(RMStateStore.java:320) 
> [Thread[Thread-26,5,main]] : Removing RMDelegationToken and SequenceNumber
> [2016-08-12T19:43:21.503+08:00] [INFO] 
> resourcemanager.security.RMDelegationTokenSecretManager.removeStoredToken(RMDelegationTokenSecretManager.java:136)
>  [Thread[Thread-26,5,main]] : removing RMDelegation token with sequence 
> number: 737876
> [2016-08-12T19:43:21.503+08:00] [INFO] 
> resourcemanager.recovery.RMStateStore.transition(RMStateStore.java:320) 
> [Thread[Thread-26,5,main]] : Removing RMDelegationToken and SequenceNumber
> [2016-08-12T19:43:21.519+08:00] [INFO] 
> resourcemanager.security.RMDelegationTokenSecretManager.removeStoredToken(RMDelegationTokenSecretManager.java:136)
>  [Thread[Thread-26,5,main]] : removing RMDelegation token with sequence 
> number: 737875
> [2016-08-12T19:43:21.519+08:00] [INFO] 
> resourcemanager.recovery.RMStateStore.transition(RMStateStore.java:320) 
> [Thread[Thread-26,5,main]] : Removing RMDelegationToken and SequenceNumber
> [2016-08-12T19:43:21.533+08:00] [INFO] 
> security.authorize.ServiceAuthorizationManager.authorize(ServiceAuthorizationManager.java:148)
>  [Socket Reader #1 for port 8031] : Authorization successful for yarn 
> (auth:SIMPLE) for protocol=interface 
> org.apache.hadoop.yarn.server.api.ResourceTrackerPB
> [2016-08-12T19:43:21.536+08:00] [INFO] 
> resourcemanager.security.RMDelegationTokenSecretManager.removeStoredToken(RMDelegationTokenSecretManager.java:136)
>  [Thread[Thread-26,5,main]] : removing RMDelegation token with sequence 
> number: 737874
> [2016-08-12T19:43:21.536+08:00] [INFO] 
> resourcemanager.recovery.RMStateStore.transition(RMStateStore.java:320) 
> [Thread[Thread-26,5,main]] : Removing RMDelegationToken and SequenceNumber
> [2016-08-12T19:43:21.553+08:00] [INFO] 
> resourcemanager.security.RMDelegationTokenSecretManager.removeStoredToken(RMDelegationTokenSecretManager.java:136)
>  [Thread[Thread-26,5,main]] : removing RMDelegation token with sequence 
> number: 737873
> [2016-08-12T19:43:21.553+08:00] [INFO] 
> resourcemanager.recovery.RMStateStore.transition(RMStateStore.java:320) 
> [Thread[Thread-26,5,main]] : Removing RMDelegationToken and SequenceNumber
> [2016-08-12T19:43:21.568+08:00] [INFO] 
> 

[jira] [Commented] (YARN-5430) Return container's ip and host from NM ContainerStatus call

2016-08-18 Thread Billie Rinaldi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15426597#comment-15426597
 ] 

Billie Rinaldi commented on YARN-5430:
--

I have started trying this out with a patch I am working on for YARN-5505. It 
looks like the IP is not populated immediately, which is okay; I can retry 
getting the container status until the IP exists. When the IP doesn’t exist 
yet, getIPs throws an NPE on ContainerStatusPBImpl line 283. It seems like it 
would be better for it to return null in this case rather than throwing an 
exception. Another issue is that it looks like the hostname returned by getHost 
has a newline at the end. Perhaps this should be trimmed off in 
DockerLinuxContainerRuntime.getIpAndHost.
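
A hedged sketch of that retry approach, assuming the {{getIPs()}}/{{getHost()}} 
accessors added by this patch and the standard {{NMClient#getContainerStatus}} call:

{code}
// Poll the NM until the IP list is populated, then trim the trailing newline from
// the host. Illustration only; the null check guards the NPE case mentioned above.
private String waitForContainerHost(NMClient nmClient, ContainerId containerId,
    NodeId nodeId) throws Exception {
  ContainerStatus status = nmClient.getContainerStatus(containerId, nodeId);
  while (status.getIPs() == null || status.getIPs().isEmpty()) {
    Thread.sleep(500);   // the IP shows up asynchronously after container launch
    status = nmClient.getContainerStatus(containerId, nodeId);
  }
  return status.getHost().trim();
}
{code}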

> Return container's ip and host from NM ContainerStatus call
> ---
>
> Key: YARN-5430
> URL: https://issues.apache.org/jira/browse/YARN-5430
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-5430-branch-2.patch, YARN-5430.1.patch, 
> YARN-5430.2.patch, YARN-5430.3.patch, YARN-5430.4.patch, YARN-5430.5.patch, 
> YARN-5430.6.patch, YARN-5430.7.patch, YARN-5430.8.patch
>
>
> In YARN-4757, we introduced a DNS mechanism for containers. That's based on 
> the assumption that we can get the container's ip and host information and 
> store it in the registry-service. This jira aims to get the container's ip 
> and host from the NM, primarily for docker containers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4676) Automatic and Asynchronous Decommissioning Nodes Status Tracking

2016-08-18 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15426567#comment-15426567
 ] 

Junping Du commented on YARN-4676:
--

Forgot to mention: thanks [~rkanter], [~vvasudev] and [~mingma] for the review and 
comments, and [~ka...@cloudera.com] for many offline discussions!
There are still several pieces of work left after this patch goes in:
1. We need to make sure the timeout is persisted across RM failover/restart. 
[~rkanter] filed YARN-5464 and will work on this.
2. As mentioned by [~mingma] above, we need to support a JSON format consistent 
with DN decommission. I just filed YARN-5536 to address this issue.
3. YARN-5311 was filed earlier for the documentation effort of graceful 
decommission. [~danzhi], given you already have a patch for review, do you want 
to continue to work on that? If so, please feel free to reassign that JIRA to 
yourself.

> Automatic and Asynchronous Decommissioning Nodes Status Tracking
> 
>
> Key: YARN-4676
> URL: https://issues.apache.org/jira/browse/YARN-4676
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Daniel Zhi
>Assignee: Daniel Zhi
>  Labels: features
> Fix For: 2.9.0
>
> Attachments: GracefulDecommissionYarnNode.pdf, 
> GracefulDecommissionYarnNode.pdf, YARN-4676.004.patch, YARN-4676.005.patch, 
> YARN-4676.006.patch, YARN-4676.007.patch, YARN-4676.008.patch, 
> YARN-4676.009.patch, YARN-4676.010.patch, YARN-4676.011.patch, 
> YARN-4676.012.patch, YARN-4676.013.patch, YARN-4676.014.patch, 
> YARN-4676.015.patch, YARN-4676.016.patch, YARN-4676.017.patch, 
> YARN-4676.018.patch, YARN-4676.019.patch, YARN-4676.020.patch, 
> YARN-4676.021.patch, YARN-4676.022.patch, YARN-4676.023.patch, 
> YARN-4676.024.patch
>
>
> YARN-4676 implements an automatic, asynchronous and flexible mechanism to 
> gracefully decommission YARN nodes. After the user issues the refreshNodes 
> request, the ResourceManager automatically evaluates the status of all affected 
> nodes and kicks off decommission or recommission actions. The RM asynchronously 
> tracks container and application status related to DECOMMISSIONING nodes so 
> that nodes are decommissioned immediately once they are ready. A decommissioning 
> timeout at individual-node granularity is supported and can be dynamically 
> updated. The mechanism naturally supports multiple independent graceful 
> decommissioning “sessions” where each one involves different sets of nodes with 
> different timeout settings. Such support is ideal and necessary for graceful 
> decommission requests issued by external cluster management software instead of 
> a human.
> DecommissioningNodeWatcher inside ResourceTrackingService tracks the status of 
> DECOMMISSIONING nodes automatically and asynchronously after the client/admin 
> makes the graceful decommission request, and decides when a node, after all 
> running containers on it have completed, will be transitioned into the 
> DECOMMISSIONED state. NodesListManager detects and handles include and exclude 
> list changes to kick off decommission or recommission as necessary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5534) Allow whitelisted volume mounts

2016-08-18 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15426566#comment-15426566
 ] 

Daniel Templeton commented on YARN-5534:


A good use case for this is mounting in the Hadoop directories so that they 
don't have to be built into the container.  Another use case is mounting in the 
local tool chain.

> Allow whitelisted volume mounts 
> 
>
> Key: YARN-5534
> URL: https://issues.apache.org/jira/browse/YARN-5534
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: luhuichun
>Assignee: luhuichun
>
> Mounting arbitrary volumes into a Docker container can be a security risk. 
> One approach to providing safe volume mounts is to allow the cluster 
> administrator to configure a set of parent directories in the yarn-site.xml 
> from which volume mounts are allowed. Only these directories and their 
> sub-directories are allowed to be mounted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5534) Allow whitelisted volume mounts

2016-08-18 Thread Daniel Templeton (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Templeton updated YARN-5534:
---
Summary: Allow whitelisted volume mounts   (was: Prevent arbitrary volume 
mounts )

> Allow whitelisted volume mounts 
> 
>
> Key: YARN-5534
> URL: https://issues.apache.org/jira/browse/YARN-5534
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: luhuichun
>Assignee: luhuichun
>
> Mounting arbitrary volumes into a Docker container can be a security risk. 
> One approach to providing safe volume mounts is to allow the cluster 
> administrator to configure a set of parent directories in the yarn-site.xml 
> from which volume mounts are allowed. Only these directories and their 
> sub-directories are allowed to be mounted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4676) Automatic and Asynchronous Decommissioning Nodes Status Tracking

2016-08-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15426556#comment-15426556
 ] 

Hudson commented on YARN-4676:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #10300 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/10300/])
YARN-4676. Automatic and Asynchronous Decommissioning Nodes Status (junping_du: 
rev 0da69c324dee9baab0f0b9700db1cc5b623f8421)
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/DecommissioningNodesWatcher.java
* (edit) 
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/util/TestHostsFileReader.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/AdminService.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNodes.java
* (edit) hadoop-project/src/site/site.xml
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestDecommissioningNodesWatcher.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/server/yarn_server_resourcemanager_service_protos.proto
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/ClusterMetricsInfo.java
* (edit) 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/HostsFileReader.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeDecommissioningEvent.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/RefreshNodesRequest.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/resourcetracker/TestNMReconnect.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/YarnCommands.md
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/NodesListManager.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNode.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java
* (edit) 
hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/nodemanager/NodeInfo.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/RefreshNodesRequestPBImpl.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMNodeTransitions.java
* (edit) 
hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/RMNodeWrapper.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestRMAdminCLI.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMServerUtils.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/RMAdminCLI.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml


> Automatic and Asynchronous Decommissioning Nodes Status Tracking
> 
>
> Key: YARN-4676
> URL: https://issues.apache.org/jira/browse/YARN-4676
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  

[jira] [Commented] (YARN-4676) Automatic and Asynchronous Decommissioning Nodes Status Tracking

2016-08-18 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15426559#comment-15426559
 ] 

Junping Du commented on YARN-4676:
--

+1. I have committed the latest patch (024) to trunk and branch-2. Thanks 
[~danzhi] for the patch contribution and for being patient through the review 
process. Also, congratulations on your first Apache Hadoop patch contribution! 
:)

> Automatic and Asynchronous Decommissioning Nodes Status Tracking
> 
>
> Key: YARN-4676
> URL: https://issues.apache.org/jira/browse/YARN-4676
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Daniel Zhi
>Assignee: Daniel Zhi
>  Labels: features
> Attachments: GracefulDecommissionYarnNode.pdf, 
> GracefulDecommissionYarnNode.pdf, YARN-4676.004.patch, YARN-4676.005.patch, 
> YARN-4676.006.patch, YARN-4676.007.patch, YARN-4676.008.patch, 
> YARN-4676.009.patch, YARN-4676.010.patch, YARN-4676.011.patch, 
> YARN-4676.012.patch, YARN-4676.013.patch, YARN-4676.014.patch, 
> YARN-4676.015.patch, YARN-4676.016.patch, YARN-4676.017.patch, 
> YARN-4676.018.patch, YARN-4676.019.patch, YARN-4676.020.patch, 
> YARN-4676.021.patch, YARN-4676.022.patch, YARN-4676.023.patch, 
> YARN-4676.024.patch
>
>
> YARN-4676 implements an automatic, asynchronous and flexible mechanism to 
> gracefully decommission YARN nodes. After the user issues the refreshNodes 
> request, the ResourceManager automatically evaluates the status of all affected 
> nodes and kicks off decommission or recommission actions. The RM asynchronously 
> tracks container and application status related to DECOMMISSIONING nodes so 
> that nodes are decommissioned immediately once they are ready. A decommissioning 
> timeout at individual-node granularity is supported and can be dynamically 
> updated. The mechanism naturally supports multiple independent graceful 
> decommissioning “sessions” where each one involves different sets of nodes with 
> different timeout settings. Such support is ideal and necessary for graceful 
> decommission requests issued by external cluster management software instead of 
> a human.
> DecommissioningNodeWatcher inside ResourceTrackingService tracks the status of 
> DECOMMISSIONING nodes automatically and asynchronously after the client/admin 
> makes the graceful decommission request, and decides when a node, after all 
> running containers on it have completed, will be transitioned into the 
> DECOMMISSIONED state. NodesListManager detects and handles include and exclude 
> list changes to kick off decommission or recommission as necessary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-5536) Multiple format support (JSON, etc.) for exclude node file in NM graceful decommission with timeout

2016-08-18 Thread Junping Du (JIRA)
Junping Du created YARN-5536:


 Summary: Multiple format support (JSON, etc.) for exclude node 
file in NM graceful decommission with timeout
 Key: YARN-5536
 URL: https://issues.apache.org/jira/browse/YARN-5536
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Junping Du
Priority: Blocker


Per discussion in YARN-4676, we agree that multiple formats (other than XML) 
should be supported for decommissioning nodes with timeout values.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5535) Remove RMDelegationToken make resourcemanager recovery very slow

2016-08-18 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15426330#comment-15426330
 ] 

Naganarasimha G R commented on YARN-5535:
-

Hi [~tangshangwen], is this removal of the RM delegation token happening 
synchronously during recovery?
IIUC, recovery of an application doesn't involve removal of the delegation token 
(at least not synchronously); correct me if I am wrong or missing something!


> Remove RMDelegationToken make resourcemanager recovery very slow
> 
>
> Key: YARN-5535
> URL: https://issues.apache.org/jira/browse/YARN-5535
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: tangshangwen
>Assignee: tangshangwen
>
> In our cluster, I found that when restart RM, RM recovery is very slow, this 
> is my log
> {noformat}
> [2016-08-12T19:43:21.478+08:00] [INFO] 
> resourcemanager.security.RMDelegationTokenSecretManager.removeStoredToken(RMDelegationTokenSecretManager.java:136)
>  [Thread[Thread-26,5,main]] : removing RMDelegation token with sequence 
> number: 737879
> [2016-08-12T19:43:21.478+08:00] [INFO] 
> resourcemanager.recovery.RMStateStore.transition(RMStateStore.java:320) 
> [Thread[Thread-26,5,main]] : Removing RMDelegationToken and SequenceNumber
> [2016-08-12T19:43:21.486+08:00] [INFO] 
> resourcemanager.security.RMDelegationTokenSecretManager.removeStoredToken(RMDelegationTokenSecretManager.java:136)
>  [Thread[Thread-26,5,main]] : removing RMDelegation token with sequence 
> number: 737878
> [2016-08-12T19:43:21.486+08:00] [INFO] 
> resourcemanager.recovery.RMStateStore.transition(RMStateStore.java:320) 
> [Thread[Thread-26,5,main]] : Removing RMDelegationToken and SequenceNumber
> [2016-08-12T19:43:21.494+08:00] [INFO] 
> resourcemanager.security.RMDelegationTokenSecretManager.removeStoredToken(RMDelegationTokenSecretManager.java:136)
>  [Thread[Thread-26,5,main]] : removing RMDelegation token with sequence 
> number: 737877
> [2016-08-12T19:43:21.494+08:00] [INFO] 
> resourcemanager.recovery.RMStateStore.transition(RMStateStore.java:320) 
> [Thread[Thread-26,5,main]] : Removing RMDelegationToken and SequenceNumber
> [2016-08-12T19:43:21.503+08:00] [INFO] 
> resourcemanager.security.RMDelegationTokenSecretManager.removeStoredToken(RMDelegationTokenSecretManager.java:136)
>  [Thread[Thread-26,5,main]] : removing RMDelegation token with sequence 
> number: 737876
> [2016-08-12T19:43:21.503+08:00] [INFO] 
> resourcemanager.recovery.RMStateStore.transition(RMStateStore.java:320) 
> [Thread[Thread-26,5,main]] : Removing RMDelegationToken and SequenceNumber
> [2016-08-12T19:43:21.519+08:00] [INFO] 
> resourcemanager.security.RMDelegationTokenSecretManager.removeStoredToken(RMDelegationTokenSecretManager.java:136)
>  [Thread[Thread-26,5,main]] : removing RMDelegation token with sequence 
> number: 737875
> [2016-08-12T19:43:21.519+08:00] [INFO] 
> resourcemanager.recovery.RMStateStore.transition(RMStateStore.java:320) 
> [Thread[Thread-26,5,main]] : Removing RMDelegationToken and SequenceNumber
> [2016-08-12T19:43:21.533+08:00] [INFO] 
> security.authorize.ServiceAuthorizationManager.authorize(ServiceAuthorizationManager.java:148)
>  [Socket Reader #1 for port 8031] : Authorization successful for yarn 
> (auth:SIMPLE) for protocol=interface 
> org.apache.hadoop.yarn.server.api.ResourceTrackerPB
> [2016-08-12T19:43:21.536+08:00] [INFO] 
> resourcemanager.security.RMDelegationTokenSecretManager.removeStoredToken(RMDelegationTokenSecretManager.java:136)
>  [Thread[Thread-26,5,main]] : removing RMDelegation token with sequence 
> number: 737874
> [2016-08-12T19:43:21.536+08:00] [INFO] 
> resourcemanager.recovery.RMStateStore.transition(RMStateStore.java:320) 
> [Thread[Thread-26,5,main]] : Removing RMDelegationToken and SequenceNumber
> [2016-08-12T19:43:21.553+08:00] [INFO] 
> resourcemanager.security.RMDelegationTokenSecretManager.removeStoredToken(RMDelegationTokenSecretManager.java:136)
>  [Thread[Thread-26,5,main]] : removing RMDelegation token with sequence 
> number: 737873
> [2016-08-12T19:43:21.553+08:00] [INFO] 
> resourcemanager.recovery.RMStateStore.transition(RMStateStore.java:320) 
> [Thread[Thread-26,5,main]] : Removing RMDelegationToken and SequenceNumber
> [2016-08-12T19:43:21.568+08:00] [INFO] 
> yarn.util.RackResolver.coreResolve(RackResolver.java:109) [IPC Server handler 
> 0 on 8031] : Resolved -7056.hadoop.xxx.local to /rack/rack5118
> [2016-08-12T19:43:21.569+08:00] [INFO] 
> resourcemanager.security.RMDelegationTokenSecretManager.removeStoredToken(RMDelegationTokenSecretManager.java:136)
>  [Thread[Thread-26,5,main]] : removing RMDelegation token with sequence 
> number: 737872
> [2016-08-12T19:43:21.569+08:00] [INFO] 
> resourcemanager.recovery.RMStateStore.transition(RMStateStore.java:320) 
> 

[jira] [Commented] (YARN-3649) Allow configurable prefix for hbase table names (like prod, exp, test etc)

2016-08-18 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15426288#comment-15426288
 ] 

Varun Saxena commented on YARN-3649:


In my personal opinion, default values should be such that they can be used 
directly on a production cluster, unless they depend on the deployment (network, 
queue config, which features to enable/disable, etc.) or on some config that you 
know works well for your deployment. Config changes should be minimal once the 
feature is enabled.

Take, for instance, {{yarn.nodemanager.delete.debug-delay.sec}}. This is mainly 
a debug config to be used on a non-production cluster, but its default value is 0.

As ours is an alpha-quality feature release, "dev" makes sense in the short term, 
but "prod" might make more sense in the longer term IMO.
Let us see what others think of it.

BTW, you have missed the test class in the latest patch.
Other than that, the patch looks fine at a high level.

> Allow configurable prefix for hbase table names (like prod, exp, test etc)
> --
>
> Key: YARN-3649
> URL: https://issues.apache.org/jira/browse/YARN-3649
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
>  Labels: YARN-5355
> Attachments: YARN-3649-YARN-2928.01.patch, 
> YARN-3649-YARN-5355.002.patch, YARN-3649-YARN-5355.01.patch
>
>
> As per [~jrottinghuis]'s suggestion in YARN-3411, it would be a good idea to 
> have a configurable prefix for hbase table names.  
> This way we can easily run a staging, a test, a production and whatever other 
> setup in the same HBase instance without having to override every single table 
> in the config.
> One could simply overwrite the default prefix and be off and running.
> Potential prefix candidates are "tst", "prod", "exp", etc. One can then still 
> override an individual table name if needed, but managing the whole setup will 
> be easier.
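
A hedged sketch of the prefix idea (a fragment assuming a {{Configuration conf}} 
and the HBase {{TableName}} class; the property name below is a placeholder, not 
necessarily what the patch uses):

{code}
// Sketch only: derive the actual HBase table name from a configurable prefix so the
// same schema can be deployed as prod/tst/exp side by side in one HBase instance.
String prefix = conf.get("yarn.timeline-service.hbase-schema.prefix", "prod.");
TableName entityTable = TableName.valueOf(prefix + "timelineservice.entity");
{code}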



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5526) DrainDispacher#ServiceStop blocked if setDrainEventsOnStop invoked

2016-08-18 Thread sandflee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15426170#comment-15426170
 ] 

sandflee commented on YARN-5526:


thanks [~varun_saxena] for the suggestion and review!

> DrainDispacher#ServiceStop blocked if setDrainEventsOnStop invoked
> --
>
> Key: YARN-5526
> URL: https://issues.apache.org/jira/browse/YARN-5526
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: sandflee
>Assignee: sandflee
> Fix For: 2.9.0
>
> Attachments: YARN-5526.01.patch, YARN-5526.02.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Resolved] (YARN-5508) Exception should be recorded as caught rather than thrown

2016-08-18 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R resolved YARN-5508.
-
Resolution: Duplicate

YARN-5507 supersedes this issue.

> Exception should be recorded as caught rather than thrown
> -
>
> Key: YARN-5508
> URL: https://issues.apache.org/jira/browse/YARN-5508
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Nemo Chen
>  Labels: easyfix, easytest, trivial
>
> Similar to the fix to HADOOP-657. In file:
> hadoop-rel-release-2.7.2/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/CleanerTask.java
> In line 221, the exception is caught rather than thrown, but the log message 
> says it was thrown.
> {code: borderStyle=solid}
> try {
>   store.cleanResourceReferences(key);
> } catch (YarnException e) {
>   LOG.error("Exception thrown while removing dead appIds.", e);
> }
> {code}
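
For reference, a minimal sketch of the wording fix this issue asks for (same 
try/catch as quoted above; only the log message changes):

{code}
try {
  store.cleanResourceReferences(key);
} catch (YarnException e) {
  // the exception is handled here, so the message should say "caught", not "thrown"
  LOG.error("Exception caught while removing dead appIds.", e);
}
{code}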



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5508) Exception should be recorded as caught rather than thrown

2016-08-18 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-5508:

Labels: easyfix easytest trivial  (was: easyfix easytest)

> Exception should be recorded as caught rather than thrown
> -
>
> Key: YARN-5508
> URL: https://issues.apache.org/jira/browse/YARN-5508
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Nemo Chen
>  Labels: easyfix, easytest, trivial
>
> Similar to the fix to HADOOP-657. In file:
> hadoop-rel-release-2.7.2/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/CleanerTask.java
> In line 221, the exception is caught rather than thrown, but the log message 
> says it was thrown.
> {code: borderStyle=solid}
> try {
>   store.cleanResourceReferences(key);
> } catch (YarnException e) {
>   LOG.error("Exception thrown while removing dead appIds.", e);
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4995) FairScheduler: Display per-queue demand on the scheduler page

2016-08-18 Thread stefanlee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15426123#comment-15426123
 ] 

stefanlee commented on YARN-4995:
-

[~xupeng] thanks for your jira. I have a question: the demand resource you 
mentioned can exceed the max resource by a lot. IMO, the demand resource is the 
used resource plus the requested resource, and because of node locality and rack 
locality a task's resource request may include many containers, e.g. 
<20,"node1","memory:1G,cpu:1",1,true>, <20,"node2","memory:1G,cpu:1",1,true>, 
<20,"rack1","memory:1G,cpu:1",1,true>, <20,"rack2","memory:1G,cpu:1",1,true>, 
<20,"*","memory:1G,cpu:1",1,true>. An MR application may have 20 maps and need 
20*, but the demand resource will reach 5*20*, so the demand resource can't 
clearly help a user adjust a queue's min/max resource. I don't know whether my 
opinion is correct or not?

> FairScheduler: Display per-queue demand on the scheduler page
> -
>
> Key: YARN-4995
> URL: https://issues.apache.org/jira/browse/YARN-4995
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: xupeng
>Assignee: xupeng
>Priority: Minor
> Fix For: 2.8.0
>
> Attachments: YARN-4995.001.patch, YARN-4995.002.patch, 
> demo_screenshot.png
>
>
> For now there is no demand resource information for queues on the scheduler 
> page. 
> Using only used-resource information, it is hard for us to judge whether a 
> queue is needy (demand > used, but the cluster has no available resource). And 
> without demand resource information, modifying the min/max resource for a queue 
> is not accurate. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5535) Remove RMDelegationToken make resourcemanager recovery very slow

2016-08-18 Thread tangshangwen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15426107#comment-15426107
 ] 

tangshangwen commented on YARN-5535:


Thanks [~sunilg] for the comments. 
I think removing the RMDelegationToken and SequenceNumber may take a long time, 
which means other events can't be handled in the meantime.
{code:title=ZKRMStateStore.java|borderStyle=solid}
  @Override
  protected synchronized void removeRMDelegationTokenState(
      RMDelegationTokenIdentifier rmDTIdentifier) throws Exception {
    String nodeRemovePath =
        getNodePath(delegationTokensRootPath, DELEGATION_TOKEN_PREFIX
            + rmDTIdentifier.getSequenceNumber());
    if (LOG.isDebugEnabled()) {
      LOG.debug("Removing RMDelegationToken_"
          + rmDTIdentifier.getSequenceNumber());
    }
    if (existsWithRetries(nodeRemovePath, false) != null) {
      ArrayList<Op> opList = new ArrayList<Op>();
      opList.add(Op.delete(nodeRemovePath, -1));
      doDeleteMultiWithRetries(opList);
    } else {
      LOG.debug("Attempted to delete a non-existing znode " + nodeRemovePath);
    }
  }
{code}

> Remove RMDelegationToken make resourcemanager recovery very slow
> 
>
> Key: YARN-5535
> URL: https://issues.apache.org/jira/browse/YARN-5535
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: tangshangwen
>Assignee: tangshangwen
>
> In our cluster, I found that when restart RM, RM recovery is very slow, this 
> is my log
> {noformat}
> [2016-08-12T19:43:21.478+08:00] [INFO] 
> resourcemanager.security.RMDelegationTokenSecretManager.removeStoredToken(RMDelegationTokenSecretManager.java:136)
>  [Thread[Thread-26,5,main]] : removing RMDelegation token with sequence 
> number: 737879
> [2016-08-12T19:43:21.478+08:00] [INFO] 
> resourcemanager.recovery.RMStateStore.transition(RMStateStore.java:320) 
> [Thread[Thread-26,5,main]] : Removing RMDelegationToken and SequenceNumber
> [2016-08-12T19:43:21.486+08:00] [INFO] 
> resourcemanager.security.RMDelegationTokenSecretManager.removeStoredToken(RMDelegationTokenSecretManager.java:136)
>  [Thread[Thread-26,5,main]] : removing RMDelegation token with sequence 
> number: 737878
> [2016-08-12T19:43:21.486+08:00] [INFO] 
> resourcemanager.recovery.RMStateStore.transition(RMStateStore.java:320) 
> [Thread[Thread-26,5,main]] : Removing RMDelegationToken and SequenceNumber
> [2016-08-12T19:43:21.494+08:00] [INFO] 
> resourcemanager.security.RMDelegationTokenSecretManager.removeStoredToken(RMDelegationTokenSecretManager.java:136)
>  [Thread[Thread-26,5,main]] : removing RMDelegation token with sequence 
> number: 737877
> [2016-08-12T19:43:21.494+08:00] [INFO] 
> resourcemanager.recovery.RMStateStore.transition(RMStateStore.java:320) 
> [Thread[Thread-26,5,main]] : Removing RMDelegationToken and SequenceNumber
> [2016-08-12T19:43:21.503+08:00] [INFO] 
> resourcemanager.security.RMDelegationTokenSecretManager.removeStoredToken(RMDelegationTokenSecretManager.java:136)
>  [Thread[Thread-26,5,main]] : removing RMDelegation token with sequence 
> number: 737876
> [2016-08-12T19:43:21.503+08:00] [INFO] 
> resourcemanager.recovery.RMStateStore.transition(RMStateStore.java:320) 
> [Thread[Thread-26,5,main]] : Removing RMDelegationToken and SequenceNumber
> [2016-08-12T19:43:21.519+08:00] [INFO] 
> resourcemanager.security.RMDelegationTokenSecretManager.removeStoredToken(RMDelegationTokenSecretManager.java:136)
>  [Thread[Thread-26,5,main]] : removing RMDelegation token with sequence 
> number: 737875
> [2016-08-12T19:43:21.519+08:00] [INFO] 
> resourcemanager.recovery.RMStateStore.transition(RMStateStore.java:320) 
> [Thread[Thread-26,5,main]] : Removing RMDelegationToken and SequenceNumber
> [2016-08-12T19:43:21.533+08:00] [INFO] 
> security.authorize.ServiceAuthorizationManager.authorize(ServiceAuthorizationManager.java:148)
>  [Socket Reader #1 for port 8031] : Authorization successful for yarn 
> (auth:SIMPLE) for protocol=interface 
> org.apache.hadoop.yarn.server.api.ResourceTrackerPB
> [2016-08-12T19:43:21.536+08:00] [INFO] 
> resourcemanager.security.RMDelegationTokenSecretManager.removeStoredToken(RMDelegationTokenSecretManager.java:136)
>  [Thread[Thread-26,5,main]] : removing RMDelegation token with sequence 
> number: 737874
> [2016-08-12T19:43:21.536+08:00] [INFO] 
> resourcemanager.recovery.RMStateStore.transition(RMStateStore.java:320) 
> [Thread[Thread-26,5,main]] : Removing RMDelegationToken and SequenceNumber
> [2016-08-12T19:43:21.553+08:00] [INFO] 
> resourcemanager.security.RMDelegationTokenSecretManager.removeStoredToken(RMDelegationTokenSecretManager.java:136)
>  [Thread[Thread-26,5,main]] : removing RMDelegation token with sequence 
> number: 737873
> [2016-08-12T19:43:21.553+08:00] [INFO] 
> 

[jira] [Commented] (YARN-5535) Remove RMDelegationToken make resourcemanager recovery very slow

2016-08-18 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15426084#comment-15426084
 ] 

Sunil G commented on YARN-5535:
---

YARN-4041 made token recovery asynchronous during RM recovery. 
It has been merged into branch-2.7, but I think it is not present in 2.7.1.

With this optimization, recovery is much faster when the token renewer is slow.
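Not the actual YARN-4041 change, just a minimal sketch of the asynchronous idea: 
hand the state-store removal to a background thread so the dispatcher thread is not 
blocked on per-token round trips. Names below are illustrative.
{code:title=AsyncRemovalSketch.java|borderStyle=solid}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class AsyncRemovalSketch {
  // Single background thread for state-store operations.
  private final ExecutorService storeOps = Executors.newSingleThreadExecutor();

  // Called from the dispatcher thread; returns immediately.
  public void removeTokenAsync(final String znodePath) {
    storeOps.submit(new Runnable() {
      @Override
      public void run() {
        // The real ZooKeeper delete would run here, off the dispatcher thread.
        System.out.println("removing " + znodePath);
      }
    });
  }

  public void shutdown() {
    storeOps.shutdown();
  }
}
{code}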

> Remove RMDelegationToken make resourcemanager recovery very slow
> 
>
> Key: YARN-5535
> URL: https://issues.apache.org/jira/browse/YARN-5535
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: tangshangwen
>Assignee: tangshangwen
>
> In our cluster, I found that when restart RM, RM recovery is very slow, this 
> is my log
> {noformat}
> [2016-08-12T19:43:21.478+08:00] [INFO] 
> resourcemanager.security.RMDelegationTokenSecretManager.removeStoredToken(RMDelegationTokenSecretManager.java:136)
>  [Thread[Thread-26,5,main]] : removing RMDelegation token with sequence 
> number: 737879
> [2016-08-12T19:43:21.478+08:00] [INFO] 
> resourcemanager.recovery.RMStateStore.transition(RMStateStore.java:320) 
> [Thread[Thread-26,5,main]] : Removing RMDelegationToken and SequenceNumber
> [2016-08-12T19:43:21.486+08:00] [INFO] 
> resourcemanager.security.RMDelegationTokenSecretManager.removeStoredToken(RMDelegationTokenSecretManager.java:136)
>  [Thread[Thread-26,5,main]] : removing RMDelegation token with sequence 
> number: 737878
> [2016-08-12T19:43:21.486+08:00] [INFO] 
> resourcemanager.recovery.RMStateStore.transition(RMStateStore.java:320) 
> [Thread[Thread-26,5,main]] : Removing RMDelegationToken and SequenceNumber
> [2016-08-12T19:43:21.494+08:00] [INFO] 
> resourcemanager.security.RMDelegationTokenSecretManager.removeStoredToken(RMDelegationTokenSecretManager.java:136)
>  [Thread[Thread-26,5,main]] : removing RMDelegation token with sequence 
> number: 737877
> [2016-08-12T19:43:21.494+08:00] [INFO] 
> resourcemanager.recovery.RMStateStore.transition(RMStateStore.java:320) 
> [Thread[Thread-26,5,main]] : Removing RMDelegationToken and SequenceNumber
> [2016-08-12T19:43:21.503+08:00] [INFO] 
> resourcemanager.security.RMDelegationTokenSecretManager.removeStoredToken(RMDelegationTokenSecretManager.java:136)
>  [Thread[Thread-26,5,main]] : removing RMDelegation token with sequence 
> number: 737876
> [2016-08-12T19:43:21.503+08:00] [INFO] 
> resourcemanager.recovery.RMStateStore.transition(RMStateStore.java:320) 
> [Thread[Thread-26,5,main]] : Removing RMDelegationToken and SequenceNumber
> [2016-08-12T19:43:21.519+08:00] [INFO] 
> resourcemanager.security.RMDelegationTokenSecretManager.removeStoredToken(RMDelegationTokenSecretManager.java:136)
>  [Thread[Thread-26,5,main]] : removing RMDelegation token with sequence 
> number: 737875
> [2016-08-12T19:43:21.519+08:00] [INFO] 
> resourcemanager.recovery.RMStateStore.transition(RMStateStore.java:320) 
> [Thread[Thread-26,5,main]] : Removing RMDelegationToken and SequenceNumber
> [2016-08-12T19:43:21.533+08:00] [INFO] 
> security.authorize.ServiceAuthorizationManager.authorize(ServiceAuthorizationManager.java:148)
>  [Socket Reader #1 for port 8031] : Authorization successful for yarn 
> (auth:SIMPLE) for protocol=interface 
> org.apache.hadoop.yarn.server.api.ResourceTrackerPB
> [2016-08-12T19:43:21.536+08:00] [INFO] 
> resourcemanager.security.RMDelegationTokenSecretManager.removeStoredToken(RMDelegationTokenSecretManager.java:136)
>  [Thread[Thread-26,5,main]] : removing RMDelegation token with sequence 
> number: 737874
> [2016-08-12T19:43:21.536+08:00] [INFO] 
> resourcemanager.recovery.RMStateStore.transition(RMStateStore.java:320) 
> [Thread[Thread-26,5,main]] : Removing RMDelegationToken and SequenceNumber
> [2016-08-12T19:43:21.553+08:00] [INFO] 
> resourcemanager.security.RMDelegationTokenSecretManager.removeStoredToken(RMDelegationTokenSecretManager.java:136)
>  [Thread[Thread-26,5,main]] : removing RMDelegation token with sequence 
> number: 737873
> [2016-08-12T19:43:21.553+08:00] [INFO] 
> resourcemanager.recovery.RMStateStore.transition(RMStateStore.java:320) 
> [Thread[Thread-26,5,main]] : Removing RMDelegationToken and SequenceNumber
> [2016-08-12T19:43:21.568+08:00] [INFO] 
> yarn.util.RackResolver.coreResolve(RackResolver.java:109) [IPC Server handler 
> 0 on 8031] : Resolved -7056.hadoop.xxx.local to /rack/rack5118
> [2016-08-12T19:43:21.569+08:00] [INFO] 
> resourcemanager.security.RMDelegationTokenSecretManager.removeStoredToken(RMDelegationTokenSecretManager.java:136)
>  [Thread[Thread-26,5,main]] : removing RMDelegation token with sequence 
> number: 737872
> [2016-08-12T19:43:21.569+08:00] [INFO] 
> resourcemanager.recovery.RMStateStore.transition(RMStateStore.java:320) 
> [Thread[Thread-26,5,main]] : Removing RMDelegationToken and 

[jira] [Updated] (YARN-5535) Remove RMDelegationToken make resourcemanager recovery very slow

2016-08-18 Thread tangshangwen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

tangshangwen updated YARN-5535:
---
Description: 
In our cluster, I found that when restart RM, RM recovery is very slow, this is 
my log
{noformat}
[2016-08-12T19:43:21.478+08:00] [INFO] 
resourcemanager.security.RMDelegationTokenSecretManager.removeStoredToken(RMDelegationTokenSecretManager.java:136)
 [Thread[Thread-26,5,main]] : removing RMDelegation token with sequence number: 
737879
[2016-08-12T19:43:21.478+08:00] [INFO] 
resourcemanager.recovery.RMStateStore.transition(RMStateStore.java:320) 
[Thread[Thread-26,5,main]] : Removing RMDelegationToken and SequenceNumber
[2016-08-12T19:43:21.486+08:00] [INFO] 
resourcemanager.security.RMDelegationTokenSecretManager.removeStoredToken(RMDelegationTokenSecretManager.java:136)
 [Thread[Thread-26,5,main]] : removing RMDelegation token with sequence number: 
737878
[2016-08-12T19:43:21.486+08:00] [INFO] 
resourcemanager.recovery.RMStateStore.transition(RMStateStore.java:320) 
[Thread[Thread-26,5,main]] : Removing RMDelegationToken and SequenceNumber
[2016-08-12T19:43:21.494+08:00] [INFO] 
resourcemanager.security.RMDelegationTokenSecretManager.removeStoredToken(RMDelegationTokenSecretManager.java:136)
 [Thread[Thread-26,5,main]] : removing RMDelegation token with sequence number: 
737877
[2016-08-12T19:43:21.494+08:00] [INFO] 
resourcemanager.recovery.RMStateStore.transition(RMStateStore.java:320) 
[Thread[Thread-26,5,main]] : Removing RMDelegationToken and SequenceNumber
[2016-08-12T19:43:21.503+08:00] [INFO] 
resourcemanager.security.RMDelegationTokenSecretManager.removeStoredToken(RMDelegationTokenSecretManager.java:136)
 [Thread[Thread-26,5,main]] : removing RMDelegation token with sequence number: 
737876
[2016-08-12T19:43:21.503+08:00] [INFO] 
resourcemanager.recovery.RMStateStore.transition(RMStateStore.java:320) 
[Thread[Thread-26,5,main]] : Removing RMDelegationToken and SequenceNumber
[2016-08-12T19:43:21.519+08:00] [INFO] 
resourcemanager.security.RMDelegationTokenSecretManager.removeStoredToken(RMDelegationTokenSecretManager.java:136)
 [Thread[Thread-26,5,main]] : removing RMDelegation token with sequence number: 
737875
[2016-08-12T19:43:21.519+08:00] [INFO] 
resourcemanager.recovery.RMStateStore.transition(RMStateStore.java:320) 
[Thread[Thread-26,5,main]] : Removing RMDelegationToken and SequenceNumber
[2016-08-12T19:43:21.533+08:00] [INFO] 
security.authorize.ServiceAuthorizationManager.authorize(ServiceAuthorizationManager.java:148)
 [Socket Reader #1 for port 8031] : Authorization successful for yarn 
(auth:SIMPLE) for protocol=interface 
org.apache.hadoop.yarn.server.api.ResourceTrackerPB
[2016-08-12T19:43:21.536+08:00] [INFO] 
resourcemanager.security.RMDelegationTokenSecretManager.removeStoredToken(RMDelegationTokenSecretManager.java:136)
 [Thread[Thread-26,5,main]] : removing RMDelegation token with sequence number: 
737874
[2016-08-12T19:43:21.536+08:00] [INFO] 
resourcemanager.recovery.RMStateStore.transition(RMStateStore.java:320) 
[Thread[Thread-26,5,main]] : Removing RMDelegationToken and SequenceNumber
[2016-08-12T19:43:21.553+08:00] [INFO] 
resourcemanager.security.RMDelegationTokenSecretManager.removeStoredToken(RMDelegationTokenSecretManager.java:136)
 [Thread[Thread-26,5,main]] : removing RMDelegation token with sequence number: 
737873
[2016-08-12T19:43:21.553+08:00] [INFO] 
resourcemanager.recovery.RMStateStore.transition(RMStateStore.java:320) 
[Thread[Thread-26,5,main]] : Removing RMDelegationToken and SequenceNumber
[2016-08-12T19:43:21.568+08:00] [INFO] 
yarn.util.RackResolver.coreResolve(RackResolver.java:109) [IPC Server handler 0 
on 8031] : Resolved -7056.hadoop.xxx.local to /rack/rack5118
[2016-08-12T19:43:21.569+08:00] [INFO] 
resourcemanager.security.RMDelegationTokenSecretManager.removeStoredToken(RMDelegationTokenSecretManager.java:136)
 [Thread[Thread-26,5,main]] : removing RMDelegation token with sequence number: 
737872
[2016-08-12T19:43:21.569+08:00] [INFO] 
resourcemanager.recovery.RMStateStore.transition(RMStateStore.java:320) 
[Thread[Thread-26,5,main]] : Removing RMDelegationToken and SequenceNumber
[2016-08-12T19:43:21.570+08:00] [INFO] 
server.resourcemanager.ResourceTrackerService.registerNodeManager(ResourceTrackerService.java:343)
 [IPC Server handler 0 on 8031] : NodeManager from node 
x-7056.hadoop.xxx.local(cmPort: 50086 httpPort: 8042) registered with 
capability: , assigned nodeId 
xx-7056.hadoop.xxx.local:50086
[2016-08-12T19:43:21.572+08:00] [INFO] 
resourcemanager.rmnode.RMNodeImpl.handle(RMNodeImpl.java:424) [AsyncDispatcher 
event handler] : xx-7056.hadoop.xxx.local:50086 Node Transitioned from NEW 
to RUNNING
[2016-08-12T19:43:21.576+08:00] [INFO] 
yarn.event.AsyncDispatcher.handle(AsyncDispatcher.java:235) [AsyncDispatcher 
event handler] : Size of event-queue is 1000

[jira] [Updated] (YARN-5535) Remove RMDelegationToken make resourcemanager recovery very slow

2016-08-18 Thread tangshangwen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

tangshangwen updated YARN-5535:
---
Description: 
In our cluster, I found that when restart RM, RM recovery is very slow, this is 
my log
{noformat}
[2016-08-12T19:43:21.478+08:00] [INFO] 
resourcemanager.security.RMDelegationTokenSecretManager.removeStoredToken(RMDelegationTokenSecretManager.java:136)
 [Thread[Thread-26,5,main]] : removing RMDelegation token with sequence number: 
737879
[2016-08-12T19:43:21.478+08:00] [INFO] 
resourcemanager.recovery.RMStateStore.transition(RMStateStore.java:320) 
[Thread[Thread-26,5,main]] : Removing RMDelegationToken and SequenceNumber
[2016-08-12T19:43:21.486+08:00] [INFO] 
resourcemanager.security.RMDelegationTokenSecretManager.removeStoredToken(RMDelegationTokenSecretManager.java:136)
 [Thread[Thread-26,5,main]] : removing RMDelegation token with sequence number: 
737878
[2016-08-12T19:43:21.486+08:00] [INFO] 
resourcemanager.recovery.RMStateStore.transition(RMStateStore.java:320) 
[Thread[Thread-26,5,main]] : Removing RMDelegationToken and SequenceNumber
[2016-08-12T19:43:21.494+08:00] [INFO] 
resourcemanager.security.RMDelegationTokenSecretManager.removeStoredToken(RMDelegationTokenSecretManager.java:136)
 [Thread[Thread-26,5,main]] : removing RMDelegation token with sequence number: 
737877
[2016-08-12T19:43:21.494+08:00] [INFO] 
resourcemanager.recovery.RMStateStore.transition(RMStateStore.java:320) 
[Thread[Thread-26,5,main]] : Removing RMDelegationToken and SequenceNumber
[2016-08-12T19:43:21.503+08:00] [INFO] 
resourcemanager.security.RMDelegationTokenSecretManager.removeStoredToken(RMDelegationTokenSecretManager.java:136)
 [Thread[Thread-26,5,main]] : removing RMDelegation token with sequence number: 
737876
[2016-08-12T19:43:21.503+08:00] [INFO] 
resourcemanager.recovery.RMStateStore.transition(RMStateStore.java:320) 
[Thread[Thread-26,5,main]] : Removing RMDelegationToken and SequenceNumber
[2016-08-12T19:43:21.519+08:00] [INFO] 
resourcemanager.security.RMDelegationTokenSecretManager.removeStoredToken(RMDelegationTokenSecretManager.java:136)
 [Thread[Thread-26,5,main]] : removing RMDelegation token with sequence number: 
737875
[2016-08-12T19:43:21.519+08:00] [INFO] 
resourcemanager.recovery.RMStateStore.transition(RMStateStore.java:320) 
[Thread[Thread-26,5,main]] : Removing RMDelegationToken and SequenceNumber
[2016-08-12T19:43:21.533+08:00] [INFO] 
security.authorize.ServiceAuthorizationManager.authorize(ServiceAuthorizationManager.java:148)
 [Socket Reader #1 for port 8031] : Authorization successful for yarn 
(auth:SIMPLE) for protocol=interface 
org.apache.hadoop.yarn.server.api.ResourceTrackerPB
[2016-08-12T19:43:21.536+08:00] [INFO] 
resourcemanager.security.RMDelegationTokenSecretManager.removeStoredToken(RMDelegationTokenSecretManager.java:136)
 [Thread[Thread-26,5,main]] : removing RMDelegation token with sequence number: 
737874
[2016-08-12T19:43:21.536+08:00] [INFO] 
resourcemanager.recovery.RMStateStore.transition(RMStateStore.java:320) 
[Thread[Thread-26,5,main]] : Removing RMDelegationToken and SequenceNumber
[2016-08-12T19:43:21.553+08:00] [INFO] 
resourcemanager.security.RMDelegationTokenSecretManager.removeStoredToken(RMDelegationTokenSecretManager.java:136)
 [Thread[Thread-26,5,main]] : removing RMDelegation token with sequence number: 
737873
[2016-08-12T19:43:21.553+08:00] [INFO] 
resourcemanager.recovery.RMStateStore.transition(RMStateStore.java:320) 
[Thread[Thread-26,5,main]] : Removing RMDelegationToken and SequenceNumber
[2016-08-12T19:43:21.568+08:00] [INFO] 
yarn.util.RackResolver.coreResolve(RackResolver.java:109) [IPC Server handler 0 
on 8031] : Resolved -7056.hadoop.jd.local to /rack/rack5118
[2016-08-12T19:43:21.569+08:00] [INFO] 
resourcemanager.security.RMDelegationTokenSecretManager.removeStoredToken(RMDelegationTokenSecretManager.java:136)
 [Thread[Thread-26,5,main]] : removing RMDelegation token with sequence number: 
737872
[2016-08-12T19:43:21.569+08:00] [INFO] 
resourcemanager.recovery.RMStateStore.transition(RMStateStore.java:320) 
[Thread[Thread-26,5,main]] : Removing RMDelegationToken and SequenceNumber
[2016-08-12T19:43:21.570+08:00] [INFO] 
server.resourcemanager.ResourceTrackerService.registerNodeManager(ResourceTrackerService.java:343)
 [IPC Server handler 0 on 8031] : NodeManager from node 
x-7056.hadoop.jd.local(cmPort: 50086 httpPort: 8042) registered with 
capability: , assigned nodeId 
xx-7056.hadoop.jd.local:50086
[2016-08-12T19:43:21.572+08:00] [INFO] 
resourcemanager.rmnode.RMNodeImpl.handle(RMNodeImpl.java:424) [AsyncDispatcher 
event handler] : xx-7056.hadoop.jd.local:50086 Node Transitioned from NEW 
to RUNNING
[2016-08-12T19:43:21.576+08:00] [INFO] 
yarn.event.AsyncDispatcher.handle(AsyncDispatcher.java:235) [AsyncDispatcher 
event handler] : Size of event-queue is 1000

[jira] [Created] (YARN-5535) Remove RMDelegationToken make resourcemanager recovery very slow

2016-08-18 Thread tangshangwen (JIRA)
tangshangwen created YARN-5535:
--

 Summary: Remove RMDelegationToken make resourcemanager recovery 
very slow
 Key: YARN-5535
 URL: https://issues.apache.org/jira/browse/YARN-5535
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.7.1
Reporter: tangshangwen
Assignee: tangshangwen


In our cluster, I found that when restart RM, RM recovery is very slow, this is 
my log
{noformat}
[2016-08-12T19:43:21.478+08:00] [INFO] 
resourcemanager.security.RMDelegationTokenSecretManager.removeStoredToken(RMDelegationTokenSecretManager.java:136)
 [Thread[Thread-26,5,main]] : removing RMDelegation token with sequence number: 
737879
[2016-08-12T19:43:21.478+08:00] [INFO] 
resourcemanager.recovery.RMStateStore.transition(RMStateStore.java:320) 
[Thread[Thread-26,5,main]] : Removing RMDelegationToken and SequenceNumber
[2016-08-12T19:43:21.486+08:00] [INFO] 
resourcemanager.security.RMDelegationTokenSecretManager.removeStoredToken(RMDelegationTokenSecretManager.java:136)
 [Thread[Thread-26,5,main]] : removing RMDelegation token with sequence number: 
737878
[2016-08-12T19:43:21.486+08:00] [INFO] 
resourcemanager.recovery.RMStateStore.transition(RMStateStore.java:320) 
[Thread[Thread-26,5,main]] : Removing RMDelegationToken and SequenceNumber
[2016-08-12T19:43:21.494+08:00] [INFO] 
resourcemanager.security.RMDelegationTokenSecretManager.removeStoredToken(RMDelegationTokenSecretManager.java:136)
 [Thread[Thread-26,5,main]] : removing RMDelegation token with sequence number: 
737877
[2016-08-12T19:43:21.494+08:00] [INFO] 
resourcemanager.recovery.RMStateStore.transition(RMStateStore.java:320) 
[Thread[Thread-26,5,main]] : Removing RMDelegationToken and SequenceNumber
[2016-08-12T19:43:21.503+08:00] [INFO] 
resourcemanager.security.RMDelegationTokenSecretManager.removeStoredToken(RMDelegationTokenSecretManager.java:136)
 [Thread[Thread-26,5,main]] : removing RMDelegation token with sequence number: 
737876
[2016-08-12T19:43:21.503+08:00] [INFO] 
resourcemanager.recovery.RMStateStore.transition(RMStateStore.java:320) 
[Thread[Thread-26,5,main]] : Removing RMDelegationToken and SequenceNumber
[2016-08-12T19:43:21.519+08:00] [INFO] 
resourcemanager.security.RMDelegationTokenSecretManager.removeStoredToken(RMDelegationTokenSecretManager.java:136)
 [Thread[Thread-26,5,main]] : removing RMDelegation token with sequence number: 
737875
[2016-08-12T19:43:21.519+08:00] [INFO] 
resourcemanager.recovery.RMStateStore.transition(RMStateStore.java:320) 
[Thread[Thread-26,5,main]] : Removing RMDelegationToken and SequenceNumber
[2016-08-12T19:43:21.533+08:00] [INFO] 
security.authorize.ServiceAuthorizationManager.authorize(ServiceAuthorizationManager.java:148)
 [Socket Reader #1 for port 8031] : Authorization successful for yarn 
(auth:SIMPLE) for protocol=interface 
org.apache.hadoop.yarn.server.api.ResourceTrackerPB
[2016-08-12T19:43:21.536+08:00] [INFO] 
resourcemanager.security.RMDelegationTokenSecretManager.removeStoredToken(RMDelegationTokenSecretManager.java:136)
 [Thread[Thread-26,5,main]] : removing RMDelegation token with sequence number: 
737874
[2016-08-12T19:43:21.536+08:00] [INFO] 
resourcemanager.recovery.RMStateStore.transition(RMStateStore.java:320) 
[Thread[Thread-26,5,main]] : Removing RMDelegationToken and SequenceNumber
[2016-08-12T19:43:21.553+08:00] [INFO] 
resourcemanager.security.RMDelegationTokenSecretManager.removeStoredToken(RMDelegationTokenSecretManager.java:136)
 [Thread[Thread-26,5,main]] : removing RMDelegation token with sequence number: 
737873
[2016-08-12T19:43:21.553+08:00] [INFO] 
resourcemanager.recovery.RMStateStore.transition(RMStateStore.java:320) 
[Thread[Thread-26,5,main]] : Removing RMDelegationToken and SequenceNumber
[2016-08-12T19:43:21.568+08:00] [INFO] 
yarn.util.RackResolver.coreResolve(RackResolver.java:109) [IPC Server handler 0 
on 8031] : Resolved BJHC-Jmartad-7056.hadoop.jd.local to /rack/rack5118
[2016-08-12T19:43:21.569+08:00] [INFO] 
resourcemanager.security.RMDelegationTokenSecretManager.removeStoredToken(RMDelegationTokenSecretManager.java:136)
 [Thread[Thread-26,5,main]] : removing RMDelegation token with sequence number: 
737872
[2016-08-12T19:43:21.569+08:00] [INFO] 
resourcemanager.recovery.RMStateStore.transition(RMStateStore.java:320) 
[Thread[Thread-26,5,main]] : Removing RMDelegationToken and SequenceNumber
[2016-08-12T19:43:21.570+08:00] [INFO] 
server.resourcemanager.ResourceTrackerService.registerNodeManager(ResourceTrackerService.java:343)
 [IPC Server handler 0 on 8031] : NodeManager from node 
BJHC-Jmartad-7056.hadoop.jd.local(cmPort: 50086 httpPort: 8042) registered with 
capability: , assigned nodeId 
BJHC-Jmartad-7056.hadoop.jd.local:50086
[2016-08-12T19:43:21.572+08:00] [INFO] 
resourcemanager.rmnode.RMNodeImpl.handle(RMNodeImpl.java:424) [AsyncDispatcher 
event handler] : 

[jira] [Comment Edited] (YARN-5526) DrainDispacher#ServiceStop blocked if setDrainEventsOnStop invoked

2016-08-18 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15425996#comment-15425996
 ] 

Varun Saxena edited comment on YARN-5526 at 8/18/16 7:01 AM:
-

Committed to trunk, branch-2.
Thanks [~sandflee] for your contribution.


was (Author: varun_saxena):
Committed to trunk, branch-2.
Thanks 

> DrainDispacher#ServiceStop blocked if setDrainEventsOnStop invoked
> --
>
> Key: YARN-5526
> URL: https://issues.apache.org/jira/browse/YARN-5526
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: sandflee
>Assignee: sandflee
> Fix For: 2.9.0
>
> Attachments: YARN-5526.01.patch, YARN-5526.02.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5526) DrainDispacher#ServiceStop blocked if setDrainEventsOnStop invoked

2016-08-18 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15425996#comment-15425996
 ] 

Varun Saxena commented on YARN-5526:


Committed to trunk, branch-2.
Thanks 

> DrainDispacher#ServiceStop blocked if setDrainEventsOnStop invoked
> --
>
> Key: YARN-5526
> URL: https://issues.apache.org/jira/browse/YARN-5526
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: sandflee
>Assignee: sandflee
> Fix For: 2.9.0
>
> Attachments: YARN-5526.01.patch, YARN-5526.02.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5308) FairScheduler: Move continuous scheduling related tests to TestContinuousScheduling

2016-08-18 Thread Kai Sasaki (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15425974#comment-15425974
 ] 

Kai Sasaki commented on YARN-5308:
--

[~varun_saxena] Sure, I'll do that.

> FairScheduler: Move continuous scheduling related tests to 
> TestContinuousScheduling
> ---
>
> Key: YARN-5308
> URL: https://issues.apache.org/jira/browse/YARN-5308
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler, test
>Affects Versions: 2.8.0
>Reporter: Karthik Kambatla
>Assignee: Kai Sasaki
>  Labels: newbie
> Attachments: YARN-5308.01.patch, YARN-5308.02.patch, 
> YARN-5308.03.patch
>
>
> TestFairScheduler still has some tests on continuous scheduling. We should 
> move them to TestContinuousScheduling.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-3673) Create a FailoverProxy for Federation services

2016-08-18 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15425967#comment-15425967
 ] 

Jian He commented on YARN-3673:
---

Test: IIUC, I don't actually see FederationRMFailoverProxyProvider configured in 
the test. Is it tested?

> Create a FailoverProxy for Federation services
> --
>
> Key: YARN-3673
> URL: https://issues.apache.org/jira/browse/YARN-3673
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Subru Krishnan
>Assignee: Subru Krishnan
> Attachments: YARN-3673-YARN-2915-v1.patch
>
>
> This JIRA proposes creating a failover proxy for Federation based on the 
> cluster membership information in the StateStore that can be used by both 
> Router & AMRMProxy



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-3673) Create a FailoverProxy for Federation services

2016-08-18 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15425963#comment-15425963
 ] 

Jian He commented on YARN-3673:
---


- I think the FEDERATION_FAILOVER_ENABLED and FEDERATION_ENABLED flags are not 
needed for this patch. To use FederationRMFailoverProxyProvider, we just need to 
change the provider config to point at the FederationRMFailoverProxyProvider class 
(see the configuration sketch after this list). It looks to me like 
FederationRMFailoverProxyProvider is just one more way to retrieve the RM proxy 
object, alongside the existing providers. 
- Why do we need to store the tokens and then re-add the tokens?
{code}
private void addOriginalTokens(UserGroupInformation currentUser) {
  if (originalTokens == null || originalTokens.isEmpty()) {
    return;
  }
  for (Token token : originalTokens) {
    currentUser.addToken(token);
  }
}
{code}
- It appears to me that the FederationStateStoreFacade is not needed for this patch. 
It is only used in getProxyInternal(federationFailoverEnabled), and when 
federationFailoverEnabled is true (which should always be the case) the 
FederationStateStoreFacade always flushes its cache and asks the state store for 
the RM address. 
On the other hand, the getProxy() method already has a current variable that caches 
the current proxy object, so the object cached in the FederationStateStoreFacade is 
essentially never used?
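A minimal sketch of the configuration-only approach mentioned in the first point, 
assuming the standard yarn.client.failover-proxy-provider key is the provider 
config being referred to; the fully qualified class name is illustrative, since the 
actual package of FederationRMFailoverProxyProvider comes from the patch under 
review.
{code:title=FailoverProviderConfigSketch.java|borderStyle=solid}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class FailoverProviderConfigSketch {
  public static Configuration federationClientConf() {
    Configuration conf = new YarnConfiguration();
    // Point RM clients at the federation-aware provider instead of the default one.
    // Class name is illustrative; take the real FQN from the patch.
    conf.set("yarn.client.failover-proxy-provider",
        "org.apache.hadoop.yarn.server.federation.failover.FederationRMFailoverProxyProvider");
    return conf;
  }
}
{code}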


> Create a FailoverProxy for Federation services
> --
>
> Key: YARN-3673
> URL: https://issues.apache.org/jira/browse/YARN-3673
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Subru Krishnan
>Assignee: Subru Krishnan
> Attachments: YARN-3673-YARN-2915-v1.patch
>
>
> This JIRA proposes creating a failover proxy for Federation based on the 
> cluster membership information in the StateStore that can be used by both 
> Router & AMRMProxy



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-5534) Prevent arbitrary volume mounts

2016-08-18 Thread luhuichun (JIRA)
luhuichun created YARN-5534:
---

 Summary: Prevent arbitrary volume mounts 
 Key: YARN-5534
 URL: https://issues.apache.org/jira/browse/YARN-5534
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: luhuichun
Assignee: luhuichun


Mounting arbitrary volumes into a Docker container can be a security risk. One 
approach to providing safe volume mounts is to let the cluster administrator 
configure, in yarn-site.xml, a set of parent directories from which volume mounts 
are allowed. Only these directories and their sub-directories may be mounted.
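A minimal sketch of the whitelist check described above (not the eventual YARN 
implementation): a requested mount is allowed only if its canonical path falls 
under one of the admin-configured parent directories. Method and parameter names 
are illustrative.
{code:title=MountWhitelistSketch.java|borderStyle=solid}
import java.io.File;
import java.io.IOException;

public class MountWhitelistSketch {
  // Returns true if 'requested' is one of the allowed parents or lives under one.
  static boolean isAllowedMount(String requested, String[] allowedParents)
      throws IOException {
    // Canonicalize to defeat ".." and symlink tricks before comparing prefixes.
    String canonical = new File(requested).getCanonicalPath();
    for (String parent : allowedParents) {
      String parentCanonical = new File(parent).getCanonicalPath();
      if (canonical.equals(parentCanonical)
          || canonical.startsWith(parentCanonical + File.separator)) {
        return true;
      }
    }
    return false;
  }
}
{code}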



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5308) FairScheduler: Move continuous scheduling related tests to TestContinuousScheduling

2016-08-18 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15425953#comment-15425953
 ] 

Varun Saxena commented on YARN-5308:


[~lewuathe], the patch doesn't apply cleanly. Can you rebase it?

> FairScheduler: Move continuous scheduling related tests to 
> TestContinuousScheduling
> ---
>
> Key: YARN-5308
> URL: https://issues.apache.org/jira/browse/YARN-5308
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler, test
>Affects Versions: 2.8.0
>Reporter: Karthik Kambatla
>Assignee: Kai Sasaki
>  Labels: newbie
> Attachments: YARN-5308.01.patch, YARN-5308.02.patch, 
> YARN-5308.03.patch
>
>
> TestFairScheduler still has some tests on continuous scheduling. We should 
> move them to TestContinuousScheduling.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org