[jira] [Commented] (YARN-4155) TestLogAggregationService.testLogAggregationServiceWithInterval failing

2015-10-09 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14950006#comment-14950006
 ] 

Bibin A Chundatt commented on YARN-4155:


Hi [~rohithsharma]/[~ste...@apache.org],

Could you please review the attached patch?

> TestLogAggregationService.testLogAggregationServiceWithInterval failing
> ---
>
> Key: YARN-4155
> URL: https://issues.apache.org/jira/browse/YARN-4155
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Affects Versions: 3.0.0
> Environment: Jenkins
>Reporter: Steve Loughran
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: 0001-YARN-4155.patch, 0001-YARN-4155.patch, 
> 0003-YARN-4155.patch
>
>
> Test failing on Jenkins: 
> {{TestLogAggregationService.testLogAggregationServiceWithInterval}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4201) AMBlacklist does not work for minicluster

2015-10-09 Thread Jun Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Gong updated YARN-4201:
---
Attachment: YARN-4201.003.patch

> AMBlacklist does not work for minicluster
> -
>
> Key: YARN-4201
> URL: https://issues.apache.org/jira/browse/YARN-4201
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Jun Gong
>Assignee: Jun Gong
> Attachments: YARN-4021.001.patch, YARN-4201.002.patch, 
> YARN-4201.003.patch
>
>
> For the minicluster (scheduler.include-port-in-node-name is set to TRUE), 
> AMBlacklist does not work. This is because the RM puts only the host into the 
> AMBlacklist whether scheduler.include-port-in-node-name is set or not. In fact, 
> the RM should put "host + port" into the AMBlacklist when it is set.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4201) AMBlacklist does not work for minicluster

2015-10-09 Thread Jun Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14950015#comment-14950015
 ] 

Jun Gong commented on YARN-4201:


Thanks [~zxu] for the good catch. Attached a new patch to address the problem.

> AMBlacklist does not work for minicluster
> -
>
> Key: YARN-4201
> URL: https://issues.apache.org/jira/browse/YARN-4201
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Jun Gong
>Assignee: Jun Gong
> Attachments: YARN-4021.001.patch, YARN-4201.002.patch, 
> YARN-4201.003.patch
>
>
> For the minicluster (scheduler.include-port-in-node-name is set to TRUE), 
> AMBlacklist does not work. This is because the RM puts only the host into the 
> AMBlacklist whether scheduler.include-port-in-node-name is set or not. In fact, 
> the RM should put "host + port" into the AMBlacklist when it is set.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4235) FairScheduler PrimaryGroup does not handle empty groups returned for a user

2015-10-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14950014#comment-14950014
 ] 

Hudson commented on YARN-4235:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #503 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/503/])
YARN-4235. FairScheduler PrimaryGroup does not handle empty groups 
(rohithsharmaks: rev 8f195387a4a4a5a278119bf4c2f15cad61f0e2c7)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestQueuePlacementPolicy.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueuePlacementRule.java


> FairScheduler PrimaryGroup does not handle empty groups returned for a user 
> 
>
> Key: YARN-4235
> URL: https://issues.apache.org/jira/browse/YARN-4235
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Fix For: 2.8.0
>
> Attachments: YARN-4235.001.patch
>
>
> We see an NPE if empty groups are returned for a user, which causes the RM to 
> crash as below:
> {noformat}
> 2015-09-22 16:51:52,780  FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
> handling event type APP_ADDED to the scheduler
> java.lang.IndexOutOfBoundsException: Index: 0
>   at java.util.Collections$EmptyList.get(Collections.java:3212)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueuePlacementRule$PrimaryGroup.getQueueForApp(QueuePlacementRule.java:149)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueuePlacementRule.assignAppToQueue(QueuePlacementRule.java:74)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueuePlacementPolicy.assignAppToQueue(QueuePlacementPolicy.java:167)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.assignToQueue(FairScheduler.java:689)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.addApplication(FairScheduler.java:595)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1180)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:111)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:684)
>   at java.lang.Thread.run(Thread.java:745)
> 2015-09-22 16:51:52,797  INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
> {noformat}
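
A minimal sketch of the kind of guard discussed here, assuming a simplified primary-group 
rule (the class and message below are hypothetical, not the committed fix):

{code}
import java.util.Arrays;
import java.util.List;

// Minimal sketch of the guard: fail the single application cleanly when the
// user resolves to no groups, instead of letting groups.get(0) throw and
// crash the scheduler event dispatcher.
public class PrimaryGroupPlacementSketch {

  static String queueForPrimaryGroup(String user, List<String> groups) {
    if (groups == null || groups.isEmpty()) {
      throw new IllegalArgumentException(
          "No groups returned for user " + user + "; cannot place application");
    }
    return "root." + groups.get(0);
  }

  public static void main(String[] args) {
    System.out.println(queueForPrimaryGroup("alice", Arrays.asList("analytics")));
    // Calling this with an empty list now fails with a clear message rather
    // than with an IndexOutOfBoundsException from Collections$EmptyList.get.
  }
}
{code}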



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4245) Clean up container-executor invocation interface

2015-10-09 Thread Sidharta Seethana (JIRA)
Sidharta Seethana created YARN-4245:
---

 Summary: Clean up container-executor invocation interface
 Key: YARN-4245
 URL: https://issues.apache.org/jira/browse/YARN-4245
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 3.0.0, 2.8.0
Reporter: Sidharta Seethana
Assignee: Sidharta Seethana


The current container-executor invocation interface (especially for launching a 
container) is cumbersome to use. Launching a container now requires 13-15 
arguments. This becomes especially problematic when additional, potentially 
optional, arguments are required. We need a better mechanism to deal with this. 
One such mechanism could be to use a file containing key/value pairs (similar to 
container-executor.cfg) corresponding to the arguments each invocation needs. 
Such a mechanism would make it easier to add new optional arguments to 
container-executor and better manage existing ones. 
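
A minimal sketch of what such a mechanism could look like from the NodeManager side, 
assuming a hypothetical key/value command file and a --command-file flag (none of these 
names come from an actual patch):

{code}
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal sketch: instead of passing 13-15 positional arguments, write the
// launch parameters to a key/value file (similar in spirit to
// container-executor.cfg) and hand container-executor just the file path.
// New optional keys can then be added without changing a positional interface.
public class ContainerExecutorCommandFileSketch {

  static Path writeCommandFile(Map<String, String> params) throws IOException {
    Path cmdFile = Files.createTempFile("container-launch", ".cmd");
    try (Writer w = Files.newBufferedWriter(cmdFile, StandardCharsets.UTF_8)) {
      for (Map.Entry<String, String> e : params.entrySet()) {
        w.write(e.getKey() + "=" + e.getValue() + "\n");
      }
    }
    return cmdFile;
  }

  public static void main(String[] args) throws IOException {
    Map<String, String> params = new LinkedHashMap<>();
    params.put("user", "alice");
    params.put("app-id", "application_1444389134985_0001");
    params.put("container-id", "container_1444389134985_0001_01_000002");
    params.put("launch-script", "/tmp/launch_container.sh");

    Path cmdFile = writeCommandFile(params);
    // Hypothetical invocation: one argument instead of a long positional list.
    // (The command is only assembled here, not started.)
    new ProcessBuilder("/usr/local/bin/container-executor",
        "--command-file", cmdFile.toString()).inheritIO();
    System.out.println("Command file written to " + cmdFile);
  }
}
{code}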



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-261) Ability to fail AM attempts

2015-10-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14950020#comment-14950020
 ] 

Hadoop QA commented on YARN-261:


\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  24m  9s | Pre-patch trunk has 1 extant 
Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 6 new or modified test files. |
| {color:green}+1{color} | javac |   8m 20s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 38s | There were no new javadoc 
warning messages. |
| {color:red}-1{color} | release audit |   0m 19s | The applied patch generated 
1 release audit warnings. |
| {color:red}-1{color} | checkstyle |   3m  2s | The applied patch generated  5 
new checkstyle issues (total was 33, now 38). |
| {color:red}-1{color} | whitespace |   0m 36s | The patch has 27  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 39s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 35s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |  10m 36s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | mapreduce tests | 110m 33s | Tests failed in 
hadoop-mapreduce-client-jobclient. |
| {color:green}+1{color} | yarn tests |   0m 31s | Tests passed in 
hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests |   7m  7s | Tests passed in 
hadoop-yarn-client. |
| {color:green}+1{color} | yarn tests |   2m  7s | Tests passed in 
hadoop-yarn-common. |
| {color:green}+1{color} | yarn tests |   8m 57s | Tests passed in 
hadoop-yarn-server-nodemanager. |
| {color:green}+1{color} | yarn tests |  56m 39s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | | 246m 43s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.mapreduce.v2.TestRMNMInfo |
|   | hadoop.mapred.TestNetworkedJob |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12765524/0002-YARN-261.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / e1bf8b3 |
| Pre-patch Findbugs warnings | 
https://builds.apache.org/job/PreCommit-YARN-Build/9387/artifact/patchprocess/trunkFindbugsWarningshadoop-yarn-server-nodemanager.html
 |
| Release Audit | 
https://builds.apache.org/job/PreCommit-YARN-Build/9387/artifact/patchprocess/patchReleaseAuditProblems.txt
 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/9387/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt
 |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/9387/artifact/patchprocess/whitespace.txt
 |
| hadoop-mapreduce-client-jobclient test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9387/artifact/patchprocess/testrun_hadoop-mapreduce-client-jobclient.txt
 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9387/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-client test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9387/artifact/patchprocess/testrun_hadoop-yarn-client.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9387/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| hadoop-yarn-server-nodemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9387/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9387/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9387/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9387/console |


This message was automatically generated.

> Ability to fail AM attempts
> ---
>
> Key: YARN-261
> URL: https://issues.apache.org/jira/browse/YARN-261
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: api
>Affects Versions: 2.0.3-alpha
>Reporter: Jason Lowe
>Assignee: Rohith Sharma K S
> Attachments: 0001-YARN-261.patch, 0002-YARN-261.patch, 
> YARN-261--n2.patch, YARN-261--n3.patch, YARN-261--n4.patch, 
> YARN-261--n5.patch, YARN-261--n6.patch, YARN-261--n7.patch, YARN-261.patch
>
>
> It would be nice if clients could ask for an AM attempt to be killed.  This 
> is analogous to the task attempt kill support provided by MapReduce.
> This feature would be useful in a scenario where AM retries are enabled, the 
> AM supports recovery, and a particular AM attempt is stuck.  Currently if 
> this occurs the user's only recourse is to kill the entire application, 
> requiring them to resubmit a new application and potentially breaking 
> downstream dependent jobs if it's part of a bigger workflow.  Killing the 
> attempt would allow a new attempt to be started by the RM without killing the 
> entire application, and if the AM supports recovery it could potentially save 
> a lot of work.  It could also be useful in workflow scenarios where the 
> failure of the entire application kills the workflow, but the ability to kill 
> an attempt can keep the workflow going if the subsequent attempt succeeds.

[jira] [Commented] (YARN-4201) AMBlacklist does not work for minicluster

2015-10-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14950025#comment-14950025
 ] 

Hadoop QA commented on YARN-4201:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  17m 12s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 58s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 17s | There were no new javadoc 
warning messages. |
| {color:red}-1{color} | release audit |   0m 20s | The applied patch generated 
1 release audit warnings. |
| {color:red}-1{color} | checkstyle |   0m 54s | The applied patch generated  1 
new checkstyle issues (total was 176, now 176). |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 30s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 35s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 29s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |  56m 42s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | |  97m  0s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12765738/YARN-4201.002.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 8f19538 |
| Release Audit | 
https://builds.apache.org/job/PreCommit-YARN-Build/9388/artifact/patchprocess/patchReleaseAuditProblems.txt
 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/9388/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9388/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9388/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf901.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9388/console |


This message was automatically generated.

> AMBlacklist does not work for minicluster
> -
>
> Key: YARN-4201
> URL: https://issues.apache.org/jira/browse/YARN-4201
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Jun Gong
>Assignee: Jun Gong
> Attachments: YARN-4021.001.patch, YARN-4201.002.patch, 
> YARN-4201.003.patch
>
>
> For the minicluster (scheduler.include-port-in-node-name is set to TRUE), 
> AMBlacklist does not work. This is because the RM puts only the host into the 
> AMBlacklist whether scheduler.include-port-in-node-name is set or not. In fact, 
> the RM should put "host + port" into the AMBlacklist when it is set.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4017) container-executor overuses PATH_MAX

2015-10-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14950044#comment-14950044
 ] 

Hadoop QA commented on YARN-4017:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |   5m 39s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   7m 46s | There were no new javac warning 
messages. |
| {color:red}-1{color} | release audit |   0m 17s | The applied patch generated 
1 release audit warnings. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 32s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 36s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | yarn tests |   8m 41s | Tests passed in 
hadoop-yarn-server-nodemanager. |
| | |  24m 37s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12765750/YARN-4017.001.patch |
| Optional Tests | javac unit |
| git revision | trunk / 8f19538 |
| Release Audit | 
https://builds.apache.org/job/PreCommit-YARN-Build/9389/artifact/patchprocess/patchReleaseAuditProblems.txt
 |
| hadoop-yarn-server-nodemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9389/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9389/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9389/console |


This message was automatically generated.

> container-executor overuses PATH_MAX
> 
>
> Key: YARN-4017
> URL: https://issues.apache.org/jira/browse/YARN-4017
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0, 2.8.0
>Reporter: Allen Wittenauer
>Assignee: Sidharta Seethana
> Attachments: YARN-4017.001.patch
>
>
> Lots of places in container-executor are now using PATH_MAX, which is simply 
> too small on a lot of platforms.  We should use a larger buffer size and be 
> done with it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4235) FairScheduler PrimaryGroup does not handle empty groups returned for a user

2015-10-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14950056#comment-14950056
 ] 

Hudson commented on YARN-4235:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2447 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2447/])
YARN-4235. FairScheduler PrimaryGroup does not handle empty groups 
(rohithsharmaks: rev 8f195387a4a4a5a278119bf4c2f15cad61f0e2c7)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestQueuePlacementPolicy.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueuePlacementRule.java


> FairScheduler PrimaryGroup does not handle empty groups returned for a user 
> 
>
> Key: YARN-4235
> URL: https://issues.apache.org/jira/browse/YARN-4235
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Fix For: 2.8.0
>
> Attachments: YARN-4235.001.patch
>
>
> We see an NPE if empty groups are returned for a user, which causes the RM to 
> crash as below:
> {noformat}
> 2015-09-22 16:51:52,780  FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
> handling event type APP_ADDED to the scheduler
> java.lang.IndexOutOfBoundsException: Index: 0
>   at java.util.Collections$EmptyList.get(Collections.java:3212)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueuePlacementRule$PrimaryGroup.getQueueForApp(QueuePlacementRule.java:149)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueuePlacementRule.assignAppToQueue(QueuePlacementRule.java:74)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueuePlacementPolicy.assignAppToQueue(QueuePlacementPolicy.java:167)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.assignToQueue(FairScheduler.java:689)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.addApplication(FairScheduler.java:595)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1180)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:111)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:684)
>   at java.lang.Thread.run(Thread.java:745)
> 2015-09-22 16:51:52,797  INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4017) container-executor overuses PATH_MAX

2015-10-09 Thread Sidharta Seethana (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14950104#comment-14950104
 ] 

Sidharta Seethana commented on YARN-4017:
-

The release audit warning is unrelated to this JIRA. The patch simply uses a 
different constant (defined in configuration.h) instead of PATH_MAX - no 
additional tests are required.

[~vvasudev] , could you please review this patch? Thank you.

> container-executor overuses PATH_MAX
> 
>
> Key: YARN-4017
> URL: https://issues.apache.org/jira/browse/YARN-4017
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0, 2.8.0
>Reporter: Allen Wittenauer
>Assignee: Sidharta Seethana
> Attachments: YARN-4017.001.patch
>
>
> Lots of places in container-executor are now using PATH_MAX, which is simply 
> too small on a lot of platforms.  We should use a larger buffer size and be 
> done with it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4201) AMBlacklist does not work for minicluster

2015-10-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14950111#comment-14950111
 ] 

Hadoop QA commented on YARN-4201:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  16m 47s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 50s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 10s | There were no new javadoc 
warning messages. |
| {color:red}-1{color} | release audit |   0m 19s | The applied patch generated 
1 release audit warnings. |
| {color:red}-1{color} | checkstyle |   0m 51s | The applied patch generated  1 
new checkstyle issues (total was 176, now 176). |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 31s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 27s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |  56m 32s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | |  96m  5s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12765751/YARN-4201.003.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 8f19538 |
| Release Audit | 
https://builds.apache.org/job/PreCommit-YARN-Build/9390/artifact/patchprocess/patchReleaseAuditProblems.txt
 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/9390/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9390/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9390/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf901.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9390/console |


This message was automatically generated.

> AMBlacklist does not work for minicluster
> -
>
> Key: YARN-4201
> URL: https://issues.apache.org/jira/browse/YARN-4201
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Jun Gong
>Assignee: Jun Gong
> Attachments: YARN-4021.001.patch, YARN-4201.002.patch, 
> YARN-4201.003.patch
>
>
> For the minicluster (scheduler.include-port-in-node-name is set to TRUE), 
> AMBlacklist does not work. This is because the RM puts only the host into the 
> AMBlacklist whether scheduler.include-port-in-node-name is set or not. In fact, 
> the RM should put "host + port" into the AMBlacklist when it is set.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-261) Ability to fail AM attempts

2015-10-09 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14950118#comment-14950118
 ] 

Rohith Sharma K S commented on YARN-261:


*-1 pre-patch* and *-1 release audit* are not caused by this patch.

> Ability to fail AM attempts
> ---
>
> Key: YARN-261
> URL: https://issues.apache.org/jira/browse/YARN-261
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: api
>Affects Versions: 2.0.3-alpha
>Reporter: Jason Lowe
>Assignee: Rohith Sharma K S
> Attachments: 0001-YARN-261.patch, 0002-YARN-261.patch, 
> YARN-261--n2.patch, YARN-261--n3.patch, YARN-261--n4.patch, 
> YARN-261--n5.patch, YARN-261--n6.patch, YARN-261--n7.patch, YARN-261.patch
>
>
> It would be nice if clients could ask for an AM attempt to be killed.  This 
> is analogous to the task attempt kill support provided by MapReduce.
> This feature would be useful in a scenario where AM retries are enabled, the 
> AM supports recovery, and a particular AM attempt is stuck.  Currently if 
> this occurs the user's only recourse is to kill the entire application, 
> requiring them to resubmit a new application and potentially breaking 
> downstream dependent jobs if it's part of a bigger workflow.  Killing the 
> attempt would allow a new attempt to be started by the RM without killing the 
> entire application, and if the AM supports recovery it could potentially save 
> a lot of work.  It could also be useful in workflow scenarios where the 
> failure of the entire application kills the workflow, but the ability to kill 
> an attempt can keep the workflow going if the subsequent attempt succeeds.
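
A short usage sketch of the client-side call this feature adds; the failApplicationAttempt 
method name is inferred from the YarnClient changes listed in the commit and should be 
treated as an assumption, to be checked against the committed API:

{code}
import org.apache.hadoop.yarn.api.records.ApplicationAttemptId;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

// Minimal usage sketch: failing a single attempt lets the RM start a new one
// without killing the whole application.
public class FailAttemptSketch {
  public static void main(String[] args) throws Exception {
    YarnConfiguration conf = new YarnConfiguration();
    YarnClient client = YarnClient.createYarnClient();
    client.init(conf);
    client.start();
    try {
      ApplicationId appId = ApplicationId.newInstance(1444389134985L, 1);
      ApplicationAttemptId attemptId = ApplicationAttemptId.newInstance(appId, 1);
      // Ask the RM to fail just this attempt; with retries enabled and a
      // recoverable AM, a fresh attempt is started in its place.
      client.failApplicationAttempt(attemptId);
    } finally {
      client.stop();
    }
  }
}
{code}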



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3216) Max-AM-Resource-Percentage should respect node labels

2015-10-09 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-3216:
--
Attachment: 0004-YARN-3216.patch

Attaching an updated version of the patch addressing the comments.

I will upload another patch with more test cases to cover all possible error 
conditions.

> Max-AM-Resource-Percentage should respect node labels
> -
>
> Key: YARN-3216
> URL: https://issues.apache.org/jira/browse/YARN-3216
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Sunil G
>Priority: Critical
> Attachments: 0001-YARN-3216.patch, 0002-YARN-3216.patch, 
> 0003-YARN-3216.patch, 0004-YARN-3216.patch
>
>
> Currently, max-am-resource-percentage considers default_partition only. When 
> a queue can access multiple partitions, we should be able to compute 
> max-am-resource-percentage based on that.
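
A minimal sketch of the per-partition computation being proposed, assuming memory-only 
resources and hypothetical names (not from the attached patches):

{code}
import java.util.HashMap;
import java.util.Map;

// Minimal sketch: compute the AM resource limit per accessible partition
// instead of only for the default partition. Memory is in MB for simplicity.
public class PerPartitionAmLimitSketch {

  static Map<String, Long> amLimitPerPartition(
      Map<String, Long> partitionMemoryMb, double maxAmResourcePercent) {
    Map<String, Long> limits = new HashMap<>();
    for (Map.Entry<String, Long> e : partitionMemoryMb.entrySet()) {
      limits.put(e.getKey(), (long) (e.getValue() * maxAmResourcePercent));
    }
    return limits;
  }

  public static void main(String[] args) {
    Map<String, Long> partitions = new HashMap<>();
    partitions.put("", 100_000L);       // default partition
    partitions.put("labelX", 40_000L);  // labelled partition the queue can access
    // With max-am-resource-percent = 0.1, each partition gets its own AM limit.
    System.out.println(amLimitPerPartition(partitions, 0.1));
  }
}
{code}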



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-261) Ability to fail AM attempts

2015-10-09 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14950159#comment-14950159
 ] 

Rohith Sharma K S commented on YARN-261:


For the test failures, I have filed MAPREDUCE-6508 and MAPREDUCE-6507.

> Ability to fail AM attempts
> ---
>
> Key: YARN-261
> URL: https://issues.apache.org/jira/browse/YARN-261
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: api
>Affects Versions: 2.0.3-alpha
>Reporter: Jason Lowe
>Assignee: Rohith Sharma K S
> Attachments: 0001-YARN-261.patch, 0002-YARN-261.patch, 
> YARN-261--n2.patch, YARN-261--n3.patch, YARN-261--n4.patch, 
> YARN-261--n5.patch, YARN-261--n6.patch, YARN-261--n7.patch, YARN-261.patch
>
>
> It would be nice if clients could ask for an AM attempt to be killed.  This 
> is analogous to the task attempt kill support provided by MapReduce.
> This feature would be useful in a scenario where AM retries are enabled, the 
> AM supports recovery, and a particular AM attempt is stuck.  Currently if 
> this occurs the user's only recourse is to kill the entire application, 
> requiring them to resubmit a new application and potentially breaking 
> downstream dependent jobs if it's part of a bigger workflow.  Killing the 
> attempt would allow a new attempt to be started by the RM without killing the 
> entire application, and if the AM supports recovery it could potentially save 
> a lot of work.  It could also be useful in workflow scenarios where the 
> failure of the entire application kills the workflow, but the ability to kill 
> an attempt can keep the workflow going if the subsequent attempt succeeds.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4246) NPE while listing app attempt

2015-10-09 Thread Varun Saxena (JIRA)
Varun Saxena created YARN-4246:
--

 Summary: NPE while listing app attempt
 Key: YARN-4246
 URL: https://issues.apache.org/jira/browse/YARN-4246
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Varun Saxena
Assignee: nijel


{noformat}
Exception in thread "main" java.lang.NullPointerException
at 
org.apache.hadoop.yarn.client.cli.ApplicationCLI.listApplicationAttempts(ApplicationCLI.java:669)
at org.apache.hadoop.yarn.client.cli.ApplicationCLI.run(ApplicationCLI.java:233)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.hadoop.yarn.client.cli.ApplicationCLI.main(ApplicationCLI.java:89)
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4246) NPE while listing app attempt

2015-10-09 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-4246:
---
Description: 
{noformat}
Exception in thread "main" java.lang.NullPointerException
at 
org.apache.hadoop.yarn.client.cli.ApplicationCLI.listApplicationAttempts(ApplicationCLI.java:669)
at org.apache.hadoop.yarn.client.cli.ApplicationCLI.run(ApplicationCLI.java:233)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.hadoop.yarn.client.cli.ApplicationCLI.main(ApplicationCLI.java:89)
{noformat}

This is because the AM container id can be null if the AM container hasn't been 
allocated. In ApplicationCLI#listApplicationAttempts we should check whether the AM 
container ID is null instead of directly calling toString().
{code}
  writer.printf(APPLICATION_ATTEMPTS_PATTERN, appAttemptReport
  .getApplicationAttemptId(), appAttemptReport
  .getYarnApplicationAttemptState(), appAttemptReport
  .getAMContainerId().toString(), appAttemptReport.getTrackingUrl());
{code}

  was:
{noformat}
Exception in thread "main" java.lang.NullPointerException
at 
org.apache.hadoop.yarn.client.cli.ApplicationCLI.listApplicationAttempts(ApplicationCLI.java:669)
at org.apache.hadoop.yarn.client.cli.ApplicationCLI.run(ApplicationCLI.java:233)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.hadoop.yarn.client.cli.ApplicationCLI.main(ApplicationCLI.java:89)
{noformat}


> NPE while listing app attempt
> -
>
> Key: YARN-4246
> URL: https://issues.apache.org/jira/browse/YARN-4246
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Varun Saxena
>Assignee: nijel
>
> {noformat}
> Exception in thread "main" java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.client.cli.ApplicationCLI.listApplicationAttempts(ApplicationCLI.java:669)
> at 
> org.apache.hadoop.yarn.client.cli.ApplicationCLI.run(ApplicationCLI.java:233)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> at 
> org.apache.hadoop.yarn.client.cli.ApplicationCLI.main(ApplicationCLI.java:89)
> {noformat}
> This is because the AM container id can be null if the AM container hasn't been 
> allocated. In ApplicationCLI#listApplicationAttempts we should check whether 
> the AM container ID is null instead of directly calling toString().
> {code}
>   writer.printf(APPLICATION_ATTEMPTS_PATTERN, appAttemptReport
>   .getApplicationAttemptId(), appAttemptReport
>   .getYarnApplicationAttemptState(), appAttemptReport
>   .getAMContainerId().toString(), appAttemptReport.getTrackingUrl());
> {code}
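
A minimal sketch of the null-safe variant being suggested, assuming a hypothetical "N/A" 
placeholder (not the eventual patch):

{code}
import org.apache.hadoop.yarn.api.records.ApplicationAttemptReport;

// Minimal sketch: fall back to a placeholder when the AM container has not
// been allocated yet, instead of calling toString() on a null container id.
public class NullSafeAmContainerIdSketch {
  static String amContainerIdOrPlaceholder(ApplicationAttemptReport report) {
    return report.getAMContainerId() == null
        ? "N/A" : report.getAMContainerId().toString();
  }
}
{code}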



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4246) NPE while listing app attempt

2015-10-09 Thread nijel (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14950215#comment-14950215
 ] 

nijel commented on YARN-4246:
-

Thanks [~varun_saxena] for reporting this.
The same issue is there in applicationattempt -status also:

{noformat}
 ./yarn applicationattempt -status appattempt_1444389134985_0001_01
15/10/09 16:53:19 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
15/10/09 16:53:20 INFO impl.TimelineClientImpl: Timeline service address: 
http://10.18.130.110:55033/ws/v1/timeline/
15/10/09 16:53:20 INFO client.RMProxy: Connecting to ResourceManager at 
host-10-18-130-110/10.18.130.110:8032
15/10/09 16:53:21 INFO client.AHSProxy: Connecting to Application History 
server at /10.18.130.110:55034
Exception in thread "main" java.lang.NullPointerException
at 
org.apache.hadoop.yarn.client.cli.ApplicationCLI.printApplicationAttemptReport(ApplicationCLI.java:352)
at org.apache.hadoop.yarn.client.cli.ApplicationCLI.run(ApplicationCLI.java:182)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.hadoop.yarn.client.cli.ApplicationCLI.main(ApplicationCLI.java:89)
{noformat}

> NPE while listing app attempt
> -
>
> Key: YARN-4246
> URL: https://issues.apache.org/jira/browse/YARN-4246
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Varun Saxena
>Assignee: nijel
>
> {noformat}
> Exception in thread "main" java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.client.cli.ApplicationCLI.listApplicationAttempts(ApplicationCLI.java:669)
> at 
> org.apache.hadoop.yarn.client.cli.ApplicationCLI.run(ApplicationCLI.java:233)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> at 
> org.apache.hadoop.yarn.client.cli.ApplicationCLI.main(ApplicationCLI.java:89)
> {noformat}
> This is because the AM container id can be null if the AM container hasn't been 
> allocated. In ApplicationCLI#listApplicationAttempts we should check whether 
> the AM container ID is null instead of directly calling toString().
> {code}
>   writer.printf(APPLICATION_ATTEMPTS_PATTERN, appAttemptReport
>   .getApplicationAttemptId(), appAttemptReport
>   .getYarnApplicationAttemptState(), appAttemptReport
>   .getAMContainerId().toString(), appAttemptReport.getTrackingUrl());
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4201) AMBlacklist does not work for minicluster

2015-10-09 Thread Jun Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14950228#comment-14950228
 ] 

Jun Gong commented on YARN-4201:


"release audit" and "checkstyle"  errors are not related.

> AMBlacklist does not work for minicluster
> -
>
> Key: YARN-4201
> URL: https://issues.apache.org/jira/browse/YARN-4201
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Jun Gong
>Assignee: Jun Gong
> Attachments: YARN-4021.001.patch, YARN-4201.002.patch, 
> YARN-4201.003.patch
>
>
> For the minicluster (scheduler.include-port-in-node-name is set to TRUE), 
> AMBlacklist does not work. This is because the RM puts only the host into the 
> AMBlacklist whether scheduler.include-port-in-node-name is set or not. In fact, 
> the RM should put "host + port" into the AMBlacklist when it is set.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2902) Killing a container that is localizing can orphan resources in the DOWNLOADING state

2015-10-09 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14950399#comment-14950399
 ] 

Varun Saxena commented on YARN-2902:


[~jlowe], kindly review.

> Killing a container that is localizing can orphan resources in the 
> DOWNLOADING state
> 
>
> Key: YARN-2902
> URL: https://issues.apache.org/jira/browse/YARN-2902
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.5.0
>Reporter: Jason Lowe
>Assignee: Varun Saxena
> Attachments: YARN-2902.002.patch, YARN-2902.03.patch, 
> YARN-2902.04.patch, YARN-2902.05.patch, YARN-2902.06.patch, 
> YARN-2902.07.patch, YARN-2902.patch
>
>
> If a container is in the process of localizing when it is stopped/killed then 
> resources are left in the DOWNLOADING state.  If no other container comes 
> along and requests these resources they linger around with no reference 
> counts but aren't cleaned up during normal cache cleanup scans since it will 
> never delete resources in the DOWNLOADING state even if their reference count 
> is zero.
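
A minimal sketch of the kind of cleanup condition being discussed, assuming a hypothetical 
grace period for DOWNLOADING resources (this is not the attached patch, just an 
illustration of the problem):

{code}
// Minimal sketch: a cache-cleanup pass that is also allowed to remove
// DOWNLOADING resources once they have no references and have been idle long
// enough, so resources orphaned by a killed localizer do not linger forever.
public class DownloadingCleanupSketch {

  enum ResourceState { DOWNLOADING, LOCALIZED }

  static boolean eligibleForRemoval(ResourceState state, int refCount,
      long idleMillis, long downloadingGraceMillis) {
    if (refCount > 0) {
      return false;
    }
    if (state == ResourceState.LOCALIZED) {
      return true;
    }
    // DOWNLOADING: only reclaim after a grace period, in case a localizer is
    // still legitimately writing the resource.
    return idleMillis > downloadingGraceMillis;
  }

  public static void main(String[] args) {
    System.out.println(eligibleForRemoval(ResourceState.DOWNLOADING, 0,
        10 * 60_000L, 5 * 60_000L)); // true: orphaned download reclaimed
  }
}
{code}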



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-261) Ability to fail AM attempts

2015-10-09 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14950418#comment-14950418
 ] 

Jason Lowe commented on YARN-261:
-

+1 lgtm.  Will fix whitespace issues on commit.

> Ability to fail AM attempts
> ---
>
> Key: YARN-261
> URL: https://issues.apache.org/jira/browse/YARN-261
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: api
>Affects Versions: 2.0.3-alpha
>Reporter: Jason Lowe
>Assignee: Rohith Sharma K S
> Attachments: 0001-YARN-261.patch, 0002-YARN-261.patch, 
> YARN-261--n2.patch, YARN-261--n3.patch, YARN-261--n4.patch, 
> YARN-261--n5.patch, YARN-261--n6.patch, YARN-261--n7.patch, YARN-261.patch
>
>
> It would be nice if clients could ask for an AM attempt to be killed.  This 
> is analogous to the task attempt kill support provided by MapReduce.
> This feature would be useful in a scenario where AM retries are enabled, the 
> AM supports recovery, and a particular AM attempt is stuck.  Currently if 
> this occurs the user's only recourse is to kill the entire application, 
> requiring them to resubmit a new application and potentially breaking 
> downstream dependent jobs if it's part of a bigger workflow.  Killing the 
> attempt would allow a new attempt to be started by the RM without killing the 
> entire application, and if the AM supports recovery it could potentially save 
> a lot of work.  It could also be useful in workflow scenarios where the 
> failure of the entire application kills the workflow, but the ability to kill 
> an attempt can keep the workflow going if the subsequent attempt succeeds.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-261) Ability to fail AM attempts

2015-10-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14950505#comment-14950505
 ] 

Hudson commented on YARN-261:
-

FAILURE: Integrated in Hadoop-trunk-Commit #8600 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8600/])
YARN-261. Ability to fail AM attempts. Contributed by Andrey Klochkov (jlowe: 
rev a0bca2b5ad2344fda5411d910a3730c85f12a0df)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/YarnClientImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/FailApplicationAttemptRequestPBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/FailApplicationAttemptResponsePBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/applicationclient_protocol.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/YarnCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/event/RMAppAttemptFailedEvent.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestClientRedirect.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/service/ApplicationClientProtocolPBServiceImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/ResourceMgrDelegate.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/FailApplicationAttemptResponse.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptEventType.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/FailApplicationAttemptRequest.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/event/RMAppAttemptRegistrationEvent.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_service_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/client/ApplicationClientProtocolPBClientImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAuditLogger.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/YarnClient.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/ApplicationCLI.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationClientProtocol.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/MockResourceManagerFacade.java


> Ability to fail AM attempts
> ---
>
> Key: YARN-261
> URL: https://issues.apache.org/jira/browse/YARN-261
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: api
>Affects Versions: 2.0.3-alpha
>Reporter: Jason Lowe
>Assignee: Rohith Sharma K S
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-261.patch, 0002-YARN-261.patch, 
> YARN-261--n2.patch, YARN-261--n3.patch, YARN-261--n4.patch, 
> YARN-261--n5.patch, YARN-261--n6.patch, YARN-261--n7.patch, YARN-261.patch

[jira] [Commented] (YARN-261) Ability to fail AM attempts

2015-10-09 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14950561#comment-14950561
 ] 

Rohith Sharma K S commented on YARN-261:


Thanks [~jlowe] for quickly reviewing and committing the patch, and thanks [~aklochkov] 
for your contribution and patience :-)
Thanks to [~xgong] for the review!

> Ability to fail AM attempts
> ---
>
> Key: YARN-261
> URL: https://issues.apache.org/jira/browse/YARN-261
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: api
>Affects Versions: 2.0.3-alpha
>Reporter: Jason Lowe
>Assignee: Rohith Sharma K S
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-261.patch, 0002-YARN-261.patch, 
> YARN-261--n2.patch, YARN-261--n3.patch, YARN-261--n4.patch, 
> YARN-261--n5.patch, YARN-261--n6.patch, YARN-261--n7.patch, YARN-261.patch
>
>
> It would be nice if clients could ask for an AM attempt to be killed.  This 
> is analogous to the task attempt kill support provided by MapReduce.
> This feature would be useful in a scenario where AM retries are enabled, the 
> AM supports recovery, and a particular AM attempt is stuck.  Currently if 
> this occurs the user's only recourse is to kill the entire application, 
> requiring them to resubmit a new application and potentially breaking 
> downstream dependent jobs if it's part of a bigger workflow.  Killing the 
> attempt would allow a new attempt to be started by the RM without killing the 
> entire application, and if the AM supports recovery it could potentially save 
> a lot of work.  It could also be useful in workflow scenarios where the 
> failure of the entire application kills the workflow, but the ability to kill 
> an attempt can keep the workflow going if the subsequent attempt succeeds.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3216) Max-AM-Resource-Percentage should respect node labels

2015-10-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14950606#comment-14950606
 ] 

Hadoop QA commented on YARN-3216:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  17m 53s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   8m 15s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 27s | There were no new javadoc 
warning messages. |
| {color:red}-1{color} | release audit |   0m 20s | The applied patch generated 
1 release audit warnings. |
| {color:red}-1{color} | checkstyle |   0m 51s | The applied patch generated  
11 new checkstyle issues (total was 270, now 260). |
| {color:red}-1{color} | whitespace |   0m  6s | The patch has 2  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 50s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 40s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 31s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |  61m  0s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | | 102m 58s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12765766/0004-YARN-3216.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 8f19538 |
| Release Audit | 
https://builds.apache.org/job/PreCommit-YARN-Build/9391/artifact/patchprocess/patchReleaseAuditProblems.txt
 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/9391/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt
 |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/9391/artifact/patchprocess/whitespace.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9391/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9391/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9391/console |


This message was automatically generated.

> Max-AM-Resource-Percentage should respect node labels
> -
>
> Key: YARN-3216
> URL: https://issues.apache.org/jira/browse/YARN-3216
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Sunil G
>Priority: Critical
> Attachments: 0001-YARN-3216.patch, 0002-YARN-3216.patch, 
> 0003-YARN-3216.patch, 0004-YARN-3216.patch
>
>
> Currently, max-am-resource-percentage considers default_partition only. When 
> a queue can access multiple partitions, we should be able to compute 
> max-am-resource-percentage based on that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-261) Ability to fail AM attempts

2015-10-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14950608#comment-14950608
 ] 

Hudson commented on YARN-261:
-

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #514 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/514/])
YARN-261. Ability to fail AM attempts. Contributed by Andrey Klochkov (jlowe: 
rev a0bca2b5ad2344fda5411d910a3730c85f12a0df)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/applicationclient_protocol.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/FailApplicationAttemptResponsePBImpl.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/ResourceMgrDelegate.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/client/ApplicationClientProtocolPBClientImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/FailApplicationAttemptResponse.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_service_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/YarnClientImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/YarnClient.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/event/RMAppAttemptRegistrationEvent.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestClientRedirect.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/FailApplicationAttemptRequestPBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/YarnCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/MockResourceManagerFacade.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptEventType.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/event/RMAppAttemptFailedEvent.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationClientProtocol.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/FailApplicationAttemptRequest.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAuditLogger.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/ApplicationCLI.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/service/ApplicationClientProtocolPBServiceImpl.java


> Ability to fail AM attempts
> ---
>
> Key: YARN-261
> URL: https://issues.apache.org/jira/browse/YARN-261
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: api
>Affects Versions: 2.0.3-alpha
>Reporter: Jason Lowe
>Assignee: Rohith Sharma K S
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-261.patch, 0002-YARN-261.patch, 
> YARN-261--n2.patch, YARN-261--n3.patch, YARN-261--n4.patch, 
> YARN-261--n5.patch, YARN-261--n6.patch, YARN-261--n7.patch, YARN-261.patch

[jira] [Commented] (YARN-4243) Add retry on establishing Zookeeper connection in EmbeddedElectorService#serviceInit

2015-10-09 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14950630#comment-14950630
 ] 

Rohith Sharma K S commented on YARN-4243:
-

Thanks [~xgong] for working on this.
Some comments and suggestions:
# While initializing the elector service, createConnection() will retry as per the 
configured value, i.e. *maxRetryNum*, say 10. But if the session is closed and 
re-established, the number of retries becomes *maxRetryNum* * *maxRetryNum*, 
i.e. 10*10 = 100 times.
# The method {{reEstablishSession()}} can be reused rather than duplicating the same 
logic in the embedded elector. Instead of overriding the createConnection() method, 
reEstablishSession() could be called from the ActiveStandbyElector constructor. I'd 
prefer to make the change in hadoop-common rather than in the embedded elector service. 

> Add retry on establishing Zookeeper connection in 
> EmbeddedElectorService#serviceInit
> 
>
> Key: YARN-4243
> URL: https://issues.apache.org/jira/browse/YARN-4243
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4243.1.patch
>
>
> Right now, the RM would shut down if the zk connection is down when the RM does 
> the initialization. We need to add retries to this part.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4243) Add retry on establishing Zookeeper connection in EmbeddedElectorService#serviceInit

2015-10-09 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14950642#comment-14950642
 ] 

Junping Du commented on YARN-4243:
--

Thanks for reporting the issue and delivering the patch, [~xgong]! 
The patch makes sense overall. Some minor comments:
1. Since we are adding a new configuration here, we may want to add it to 
yarn-default.xml as well. That is only for documentation purposes, and we don't 
have to specify a default value there.
2. Do we need another configuration for the sleep interval between retries? 
Hard-coding it to 5 seconds sounds inflexible.
3. If the connection still fails after the maximum number of retries, shall we 
include the retry count in the error message as well, e.g. "Cannot establish Zookeeper 
Connection... after retrying x times"?
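
A minimal sketch of the kind of retry loop being discussed, with a configurable 
retry interval and the attempt count reported in the final error. The class name, 
the configuration handling, and the connect() hook below are illustrative 
placeholders, not the code in the attached patch.
{code:title=ZkConnectRetrySketch.java}
import java.io.IOException;
import java.util.concurrent.TimeUnit;

// Illustrative only: the connect() callback stands in for whatever actually
// establishes the ZooKeeper connection (e.g. a createConnection()-style call).
public class ZkConnectRetrySketch {

  public interface ZkConnector {
    void connect() throws IOException;
  }

  public static void connectWithRetry(ZkConnector zk, int maxRetries,
      long retryIntervalMs) throws IOException, InterruptedException {
    IOException last = null;
    for (int attempt = 1; attempt <= maxRetries; attempt++) {
      try {
        zk.connect();
        return;                               // connected, done
      } catch (IOException e) {
        last = e;                             // remember the latest failure
        if (attempt < maxRetries) {
          // interval read from configuration rather than hard-coded
          TimeUnit.MILLISECONDS.sleep(retryIntervalMs);
        }
      }
    }
    // Surface how many attempts were made, as suggested above.
    throw new IOException("Cannot establish Zookeeper connection after "
        + maxRetries + " attempts", last);
  }
}
{code}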

> Add retry on establishing Zookeeper connection in 
> EmbeddedElectorService#serviceInit
> 
>
> Key: YARN-4243
> URL: https://issues.apache.org/jira/browse/YARN-4243
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4243.1.patch
>
>
> Right now, the RM would shut down if the zk connection is down when the RM does 
> the initialization. We need to add retries to this part.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-261) Ability to fail AM attempts

2015-10-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14950663#comment-14950663
 ] 

Hudson commented on YARN-261:
-

FAILURE: Integrated in Hadoop-Hdfs-trunk #2414 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2414/])
YARN-261. Ability to fail AM attempts. Contributed by Andrey Klochkov (jlowe: 
rev a0bca2b5ad2344fda5411d910a3730c85f12a0df)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/event/RMAppAttemptRegistrationEvent.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/FailApplicationAttemptRequestPBImpl.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/YarnClient.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestClientRedirect.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/FailApplicationAttemptResponse.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/applicationclient_protocol.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_service_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAuditLogger.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/MockResourceManagerFacade.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/YarnCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/client/ApplicationClientProtocolPBClientImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/FailApplicationAttemptResponsePBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationClientProtocol.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/service/ApplicationClientProtocolPBServiceImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/FailApplicationAttemptRequest.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/event/RMAppAttemptFailedEvent.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/ResourceMgrDelegate.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptEventType.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/YarnClientImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/ApplicationCLI.java


> Ability to fail AM attempts
> ---
>
> Key: YARN-261
> URL: https://issues.apache.org/jira/browse/YARN-261
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: api
>Affects Versions: 2.0.3-alpha
>Reporter: Jason Lowe
>Assignee: Rohith Sharma K S
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-261.patch, 0002-YARN-261.patch, 
> YARN-261--n2.patch, YARN-261--n3.patch, YARN-261--n4.patch, 
> YARN-261-

[jira] [Updated] (YARN-4207) Add a non-judgemental YARN app completion status

2015-10-09 Thread Rich Haase (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rich Haase updated YARN-4207:
-
Attachment: YARN-4207.patch

Trivial patch that adds ENDED as a YARN application status.

> Add a non-judgemental YARN app completion status
> 
>
> Key: YARN-4207
> URL: https://issues.apache.org/jira/browse/YARN-4207
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>  Labels: trivial
> Attachments: YARN-4207.patch
>
>
> For certain applications, it doesn't make sense to have a SUCCEEDED or FAILED 
> end state. For example, Tez sessions may include multiple DAGs, some of which 
> have succeeded and some of which have failed; there's no clear status for the 
> session, either logically or from the user's perspective (users are confused 
> either way). There needs to be a status that does not imply success or failure, 
> such as "done"/"ended"/"finished".



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-3880) Writing more RM side app-level metrics

2015-10-09 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R reassigned YARN-3880:
---

Assignee: Naganarasimha G R  (was: Zhijie Shen)

> Writing more RM side app-level metrics
> --
>
> Key: YARN-3880
> URL: https://issues.apache.org/jira/browse/YARN-3880
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Naganarasimha G R
>
> In YARN-3044, we implemented an analog of the metrics publisher for ATS v1. While 
> it helps to write app/attempt/container life-cycle events, it doesn't write 
> many of the app-level system metrics that the RM now has. Here are the metrics 
> that I found missing:
> * runningContainers
> * memorySeconds
> * vcoreSeconds
> * preemptedResourceMB
> * preemptedResourceVCores
> * numNonAMContainerPreempted
> * numAMContainerPreempted
> Please feel free to add more to the list if you find something that's not covered.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-261) Ability to fail AM attempts

2015-10-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14950796#comment-14950796
 ] 

Hudson commented on YARN-261:
-

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2448 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2448/])
YARN-261. Ability to fail AM attempts. Contributed by Andrey Klochkov (jlowe: 
rev a0bca2b5ad2344fda5411d910a3730c85f12a0df)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/FailApplicationAttemptRequest.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptEventType.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/applicationclient_protocol.proto
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestClientRedirect.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/FailApplicationAttemptRequestPBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/event/RMAppAttemptFailedEvent.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/YarnClient.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/event/RMAppAttemptRegistrationEvent.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAuditLogger.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/FailApplicationAttemptResponsePBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/ApplicationCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/client/ApplicationClientProtocolPBClientImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/MockResourceManagerFacade.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_service_protos.proto
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/ResourceMgrDelegate.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/service/ApplicationClientProtocolPBServiceImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/FailApplicationAttemptResponse.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/YarnCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/YarnClientImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationClientProtocol.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java
* hadoop-yarn-project/CHANGES.txt


> Ability to fail AM attempts
> ---
>
> Key: YARN-261
> URL: https://issues.apache.org/jira/browse/YARN-261
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: api
>Affects Versions: 2.0.3-alpha
>Reporter: Jason Lowe
>Assignee: Rohith Sharma K S
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-261.patch, 0002-YARN-261.patch, 
> YARN-261--n2.patch, YARN-261--n3.patch, YARN-261--n4.patch, 
>

[jira] [Commented] (YARN-261) Ability to fail AM attempts

2015-10-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14950801#comment-14950801
 ] 

Hudson commented on YARN-261:
-

FAILURE: Integrated in Hadoop-Yarn-trunk #1241 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/1241/])
YARN-261. Ability to fail AM attempts. Contributed by Andrey Klochkov (jlowe: 
rev a0bca2b5ad2344fda5411d910a3730c85f12a0df)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/YarnClientImpl.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestClientRedirect.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_service_protos.proto
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/event/RMAppAttemptRegistrationEvent.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/ResourceMgrDelegate.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/service/ApplicationClientProtocolPBServiceImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/FailApplicationAttemptResponse.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptEventType.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/FailApplicationAttemptRequest.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/client/ApplicationClientProtocolPBClientImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/FailApplicationAttemptResponsePBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAuditLogger.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/event/RMAppAttemptFailedEvent.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/FailApplicationAttemptRequestPBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/MockResourceManagerFacade.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/applicationclient_protocol.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/YarnClient.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationClientProtocol.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/ApplicationCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/YarnCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java


> Ability to fail AM attempts
> ---
>
> Key: YARN-261
> URL: https://issues.apache.org/jira/browse/YARN-261
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: api
>Affects Versions: 2.0.3-alpha
>Reporter: Jason Lowe
>Assignee: Rohith Sharma K S
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-261.patch, 0002-YARN-261.patch, 
> YARN-261--n2.patch, YARN-261--n3.patch, YARN-261--n4.patch, 
> YARN-261-

[jira] [Moved] (YARN-4247) Deadlock in FSAppAttempt and RMAppAttemptImpl causes RM to stop processing events

2015-10-09 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot moved MAPREDUCE-6509 to YARN-4247:


Component/s: (was: resourcemanager)
 resourcemanager
 fairscheduler
Key: YARN-4247  (was: MAPREDUCE-6509)
Project: Hadoop YARN  (was: Hadoop Map/Reduce)

> Deadlock in FSAppAttempt and RMAppAttemptImpl causes RM to stop processing 
> events
> -
>
> Key: YARN-4247
> URL: https://issues.apache.org/jira/browse/YARN-4247
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler, resourcemanager
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
>
> We see this deadlock in our testing where events do not get processed and we 
> see this in the logs before the RM dies of OOM {noformat} 2015-10-08 
> 04:48:01,918 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of 
> event-queue is 1488000 2015-10-08 04:48:01,918 INFO 
> org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 1488000 
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1509) Make AMRMClient support send increase container request and get increased/decreased containers

2015-10-09 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14950835#comment-14950835
 ] 

Bikas Saha commented on YARN-1509:
--

A change-container request (maybe not supported now) could be "increase CPU + 
decrease memory". Hence a built-in concept of increase and decrease in the API 
is something I am wary of.
So how about
{code} public abstract void onContainersResourceChanged(
Map oldToNewContainers); 
OR 
public abstract void onContainersResourceChanged(
List updatedContainerInfo);{code}

Would there be a case (maybe not currently) where a change-container request can 
fail on the RM? Should the callback allow notifying about a failure to change 
the container?
What if the RM notifies the AMRMClient about a completed container, and that 
container happens to have a pending change request? What should happen in this 
case? Should the AMRMClient clear that pending request? Should it also notify 
the user that the pending container change request has failed, or just rely on 
onContainerCompleted() to let the AM get that information?

I would be wary of overloading cancel as a second kind of container change request. 
To be clear, here we are discussing user-facing semantics and API. Having clear 
semantics is important versus implicit or overloaded behavior. E.g., are there cases 
where an increase followed by a decrease request is a valid scenario, and how 
would that be different from an increase followed by a cancel? Should the 
RM do different things for increase-then-cancel versus increase-then-decrease?

AM restart does not need any handling since the AM is going to start from a 
clean slate. Sorry, my bad.

I missed the handling of the RM restart case. Is there an existing test for 
that code path that could be augmented to make sure the new changes are 
tested?
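
As a rough sketch of the two callback shapes being compared, something like the 
following. The names (onContainersResourceChanged, UpdatedContainerInfo and its 
fields) are placeholders for the discussion, not the API that ends up in the 
AMRMClient.
{code:title=ResourceChangeCallbackSketch.java}
import java.util.List;
import java.util.Map;

import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.Resource;

// Placeholder sketch; names and fields are illustrative only.
public abstract class ResourceChangeCallbackSketch {

  // Option 1: report the container before and after the change.
  public abstract void onContainersResourceChanged(
      Map<Container, Container> oldToNewContainers);

  // Option 2: one record per change, leaving room to report a failed change.
  public static final class UpdatedContainerInfo {
    private final Container container;        // container after the change
    private final Resource previousResource;  // resource before the change
    private final String errorDiagnostics;    // non-null if the change failed

    public UpdatedContainerInfo(Container container, Resource previousResource,
        String errorDiagnostics) {
      this.container = container;
      this.previousResource = previousResource;
      this.errorDiagnostics = errorDiagnostics;
    }

    public Container getContainer() { return container; }
    public Resource getPreviousResource() { return previousResource; }
    public String getErrorDiagnostics() { return errorDiagnostics; }
  }

  public abstract void onContainersResourceChanged(
      List<UpdatedContainerInfo> updates);
}
{code}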


> Make AMRMClient support send increase container request and get 
> increased/decreased containers
> --
>
> Key: YARN-1509
> URL: https://issues.apache.org/jira/browse/YARN-1509
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan (No longer used)
>Assignee: MENG DING
> Attachments: YARN-1509.1.patch, YARN-1509.2.patch, YARN-1509.3.patch, 
> YARN-1509.4.patch, YARN-1509.5.patch
>
>
> As described in YARN-1197, we need to add APIs in AMRMClient to support
> 1) adding an increase request
> 2) getting successfully increased/decreased containers from the RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-261) Ability to fail AM attempts

2015-10-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14950897#comment-14950897
 ] 

Hudson commented on YARN-261:
-

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #476 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/476/])
YARN-261. Ability to fail AM attempts. Contributed by Andrey Klochkov (jlowe: 
rev a0bca2b5ad2344fda5411d910a3730c85f12a0df)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/FailApplicationAttemptRequestPBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/client/ApplicationClientProtocolPBClientImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/FailApplicationAttemptResponse.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/event/RMAppAttemptRegistrationEvent.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/YarnCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/ApplicationCLI.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/FailApplicationAttemptRequest.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/service/ApplicationClientProtocolPBServiceImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAuditLogger.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestClientRedirect.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/event/RMAppAttemptFailedEvent.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/FailApplicationAttemptResponsePBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/MockResourceManagerFacade.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptEventType.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_service_protos.proto
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/ResourceMgrDelegate.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/YarnClient.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationClientProtocol.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/applicationclient_protocol.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/YarnClientImpl.java


> Ability to fail AM attempts
> ---
>
> Key: YARN-261
> URL: https://issues.apache.org/jira/browse/YARN-261
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: api
>Affects Versions: 2.0.3-alpha
>Reporter: Jason Lowe
>Assignee: Rohith Sharma K S
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-261.patch, 0002-YARN-261.patch, 
> YARN-261--n2.patch, YARN-261--n3.patch, YARN-261--n4.patch, 
>

[jira] [Commented] (YARN-261) Ability to fail AM attempts

2015-10-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14950914#comment-14950914
 ] 

Hudson commented on YARN-261:
-

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #504 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/504/])
YARN-261. Ability to fail AM attempts. Contributed by Andrey Klochkov (jlowe: 
rev a0bca2b5ad2344fda5411d910a3730c85f12a0df)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/YarnCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_service_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/applicationclient_protocol.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/FailApplicationAttemptRequest.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/ApplicationCLI.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestClientRedirect.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/MockResourceManagerFacade.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/ResourceMgrDelegate.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/client/ApplicationClientProtocolPBClientImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/event/RMAppAttemptFailedEvent.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAuditLogger.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/service/ApplicationClientProtocolPBServiceImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/event/RMAppAttemptRegistrationEvent.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/FailApplicationAttemptRequestPBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/YarnClient.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationClientProtocol.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/FailApplicationAttemptResponsePBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptEventType.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/FailApplicationAttemptResponse.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/YarnClientImpl.java


> Ability to fail AM attempts
> ---
>
> Key: YARN-261
> URL: https://issues.apache.org/jira/browse/YARN-261
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: api
>Affects Versions: 2.0.3-alpha
>Reporter: Jason Lowe
>Assignee: Rohith Sharma K S
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-261.patch, 0002-YARN-261.patch, 
> YARN-261--n2.patch, YARN-261--n3.patch, YARN-261--n4

[jira] [Commented] (YARN-2902) Killing a container that is localizing can orphan resources in the DOWNLOADING state

2015-10-09 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14950919#comment-14950919
 ] 

Jason Lowe commented on YARN-2902:
--

Thanks for updating the patch, Varun!

Sorry, I'm still a little confused on why we need to complicate the localizer 
protocol to fix this issue.  Seems like this is a hack to help the NM figure 
out what's going on, but it should already know this stuff.  That prompted me 
to dig around for an alternative solution, and I think I found one.

The NM knows the local path where a resource is localized, since it tells the 
localizer where to put it in the download request.  Also each localizer has a 
LocalizerRunner thread that is tracking it, and it knows which resources were 
pending when the localizer process exits.  That's tracked in the {{scheduled}} 
map so the runner thread can unlock every pending resource to allow a 
subsequent localizer to try downloading it again.  Seems to me all we need to 
do is have the LocalizerRunner issue a delete of the local path and temporary 
download path for each resource that was pending at the time the localizer 
process died, since we know any pending resources when a localizer exits must 
have been orphaned.  Resources that were successfully localized are pulled out 
of the {{scheduled}} map, so the only things left should be the ones we need to 
process for cleanup.

That seems like a much simpler implementation as it doesn't change any 
protocols and doesn't rely on the container localizer doing any cleanup.  The 
NM will automatically do so when it exits.  We also don't need delayed deletion 
support in DeletionService, since we know the container localizer process is 
dead.

Maybe I'm missing something and that approach can't work.  If it can then that 
seems like a preferable solution for 2.7 as it will be a smaller, simpler patch.

> Killing a container that is localizing can orphan resources in the 
> DOWNLOADING state
> 
>
> Key: YARN-2902
> URL: https://issues.apache.org/jira/browse/YARN-2902
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.5.0
>Reporter: Jason Lowe
>Assignee: Varun Saxena
> Attachments: YARN-2902.002.patch, YARN-2902.03.patch, 
> YARN-2902.04.patch, YARN-2902.05.patch, YARN-2902.06.patch, 
> YARN-2902.07.patch, YARN-2902.patch
>
>
> If a container is in the process of localizing when it is stopped/killed then 
> resources are left in the DOWNLOADING state.  If no other container comes 
> along and requests these resources they linger around with no reference 
> counts but aren't cleaned up during normal cache cleanup scans since it will 
> never delete resources in the DOWNLOADING state even if their reference count 
> is zero.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4247) Deadlock in FSAppAttempt and RMAppAttemptImpl causes RM to stop processing events

2015-10-09 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-4247:
---
Priority: Blocker  (was: Major)

> Deadlock in FSAppAttempt and RMAppAttemptImpl causes RM to stop processing 
> events
> -
>
> Key: YARN-4247
> URL: https://issues.apache.org/jira/browse/YARN-4247
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler, resourcemanager
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
>Priority: Blocker
>
> We see this deadlock in our testing where events do not get processed and we 
> see this in the logs before the RM dies of OOM {noformat} 2015-10-08 
> 04:48:01,918 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of 
> event-queue is 1488000 2015-10-08 04:48:01,918 INFO 
> org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 1488000 
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4243) Add retry on establishing Zookeeper connection in EmbeddedElectorService#serviceInit

2015-10-09 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14950960#comment-14950960
 ] 

Karthik Kambatla commented on YARN-4243:


I would like to review the patch before commit. 

> Add retry on establishing Zookeeper connection in 
> EmbeddedElectorService#serviceInit
> 
>
> Key: YARN-4243
> URL: https://issues.apache.org/jira/browse/YARN-4243
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4243.1.patch
>
>
> Right now, the RM would shut down if the zk connection is down when the RM does 
> the initialization. We need to add retries to this part.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4183) Enabling generic application history forces every job to get a timeline service delegation token

2015-10-09 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14950963#comment-14950963
 ] 

Jonathan Eagles commented on YARN-4183:
---

[~xgong], any thoughts on the current patch based on my above comments?

> Enabling generic application history forces every job to get a timeline 
> service delegation token
> 
>
> Key: YARN-4183
> URL: https://issues.apache.org/jira/browse/YARN-4183
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Mit Desai
>Assignee: Mit Desai
> Attachments: YARN-4183.1.patch
>
>
> When enabling just the Generic History Server and not the timeline server, 
> the system metrics publisher will not publish the events to the timeline 
> store as it checks if the timeline server and system metrics publisher are 
> enabled before creating a timeline client.
> To make it work, the timeline service flag has to be turned on, which forces 
> every YARN application to get a delegation token.
> Instead of checking if timeline service is enabled, we should be checking if 
> application history server is enabled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2902) Killing a container that is localizing can orphan resources in the DOWNLOADING state

2015-10-09 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14950989#comment-14950989
 ] 

Varun Saxena commented on YARN-2902:


[~jlowe], thanks for looking at the patch. 

The reason I added the delete-downloading flag to the protocol was to indicate 
to the localizer that the resources it reported to the NM in the last heartbeat 
were not processed by the NM, so the localizer needs to delete them. That is why 
an extra list of paths (the paths that have been reported to the NM for download) 
was maintained in the localizer.
I was primarily working on the principle that we should delete as much as we can 
in the localizer, so that if the NM crashes and is not work-preserving, the paths 
can still be deleted, and vice versa. Two points of deletion make it almost 
certain that downloading resources get deleted.

But yes, this does make it complex.

You are correct that the NM will know about these paths as well and can delete 
them. The extra flag in the localizer protocol can thus be removed.

I will update the patch.


> Killing a container that is localizing can orphan resources in the 
> DOWNLOADING state
> 
>
> Key: YARN-2902
> URL: https://issues.apache.org/jira/browse/YARN-2902
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.5.0
>Reporter: Jason Lowe
>Assignee: Varun Saxena
> Attachments: YARN-2902.002.patch, YARN-2902.03.patch, 
> YARN-2902.04.patch, YARN-2902.05.patch, YARN-2902.06.patch, 
> YARN-2902.07.patch, YARN-2902.patch
>
>
> If a container is in the process of localizing when it is stopped/killed then 
> resources are left in the DOWNLOADING state.  If no other container comes 
> along and requests these resources they linger around with no reference 
> counts but aren't cleaned up during normal cache cleanup scans since it will 
> never delete resources in the DOWNLOADING state even if their reference count 
> is zero.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4243) Add retry on establishing Zookeeper connection in EmbeddedElectorService#serviceInit

2015-10-09 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14950990#comment-14950990
 ] 

Junping Du commented on YARN-4243:
--

No worries. Nobody wants to commit it right now, as we have all left concrete 
review/improvement comments.

> Add retry on establishing Zookeeper connection in 
> EmbeddedElectorService#serviceInit
> 
>
> Key: YARN-4243
> URL: https://issues.apache.org/jira/browse/YARN-4243
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4243.1.patch
>
>
> Right now, the RM would shut down if the zk connection is down when the RM does 
> the initialization. We need to add retries to this part.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2902) Killing a container that is localizing can orphan resources in the DOWNLOADING state

2015-10-09 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14951010#comment-14951010
 ] 

Varun Saxena commented on YARN-2902:


Sorry, I meant that the NM crashes and recovery is not enabled.

> Killing a container that is localizing can orphan resources in the 
> DOWNLOADING state
> 
>
> Key: YARN-2902
> URL: https://issues.apache.org/jira/browse/YARN-2902
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.5.0
>Reporter: Jason Lowe
>Assignee: Varun Saxena
> Attachments: YARN-2902.002.patch, YARN-2902.03.patch, 
> YARN-2902.04.patch, YARN-2902.05.patch, YARN-2902.06.patch, 
> YARN-2902.07.patch, YARN-2902.patch
>
>
> If a container is in the process of localizing when it is stopped/killed then 
> resources are left in the DOWNLOADING state.  If no other container comes 
> along and requests these resources they linger around with no reference 
> counts but aren't cleaned up during normal cache cleanup scans since it will 
> never delete resources in the DOWNLOADING state even if their reference count 
> is zero.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2902) Killing a container that is localizing can orphan resources in the DOWNLOADING state

2015-10-09 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14951016#comment-14951016
 ] 

Varun Saxena commented on YARN-2902:


Just to let you know, here is one case where this won't work (after removal of 
the flag from the protocol):

1. NM recovery is disabled.
2. A container is killed. Its associated resources are stuck in the DOWNLOADING 
state and a deletion task is launched for them.
3. In the meantime the localizer downloads a resource and, on the next heartbeat, 
reports the downloaded resource to the NM. In the NM this will still be in the 
DOWNLOADING state.
4. The NM tells the localizer to DIE. The localizer won't delete the resource it 
just downloaded.
5. The NM crashes.
6. The NM would miss deleting the downloading resource, as recovery is disabled.

I agree, though, that this should be a very rare scenario and we can skip it.

> Killing a container that is localizing can orphan resources in the 
> DOWNLOADING state
> 
>
> Key: YARN-2902
> URL: https://issues.apache.org/jira/browse/YARN-2902
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.5.0
>Reporter: Jason Lowe
>Assignee: Varun Saxena
> Attachments: YARN-2902.002.patch, YARN-2902.03.patch, 
> YARN-2902.04.patch, YARN-2902.05.patch, YARN-2902.06.patch, 
> YARN-2902.07.patch, YARN-2902.patch
>
>
> If a container is in the process of localizing when it is stopped/killed then 
> resources are left in the DOWNLOADING state.  If no other container comes 
> along and requests these resources they linger around with no reference 
> counts but aren't cleaned up during normal cache cleanup scans since it will 
> never delete resources in the DOWNLOADING state even if their reference count 
> is zero.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4248) REST API for submit/update/delete Reservations

2015-10-09 Thread Carlo Curino (JIRA)
Carlo Curino created YARN-4248:
--

 Summary: REST API for submit/update/delete Reservations
 Key: YARN-4248
 URL: https://issues.apache.org/jira/browse/YARN-4248
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.8.0
Reporter: Carlo Curino
Assignee: Carlo Curino


This JIRA tracks work to extend the RMWebService to support REST APIs to 
submit/update/delete reservations. This will ease integration with external 
tools that are not java-based.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4248) REST API for submit/update/delete Reservations

2015-10-09 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino updated YARN-4248:
---
Attachment: YARN-4248.patch

First cut of the patch... 

> REST API for submit/update/delete Reservations
> --
>
> Key: YARN-4248
> URL: https://issues.apache.org/jira/browse/YARN-4248
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Attachments: YARN-4248.patch
>
>
> This JIRA tracks work to extend the RMWebService to support REST APIs to 
> submit/update/delete reservations. This will ease integration with external 
> tools that are not java-based.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2902) Killing a container that is localizing can orphan resources in the DOWNLOADING state

2015-10-09 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14951034#comment-14951034
 ] 

Varun Saxena commented on YARN-2902:


[~jlowe],
bq. We also don't need delayed deletion support in DeletionService, since we 
know the container localizer process is dead.
The localizer doesn't exit immediately when the container is killed, even though 
we interrupt the thread. We first issue a DIE on the next heartbeat, and only 
then does the localizer exit.
Between the time the container is killed and the localizer exits, a resource may 
be downloaded or may start downloading. We will delete the tmp directory and the 
main directory, but if a download is started by the localizer in the meantime, 
it will recreate the directories (in FSDownload#call).
That is why the delay has been introduced. Thoughts?
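
A minimal sketch of the delayed-deletion idea: issue the delete immediately and 
then once more after a grace period, so a directory recreated by an in-flight 
FSDownload still gets removed. The scheduling here uses plain 
java.util.concurrent as an assumption for illustration; the actual change would 
go through the NM DeletionService.
{code:title=DelayedDeleteSketch.java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.stream.Stream;

// Illustration only; not the NM DeletionService API.
public class DelayedDeleteSketch {

  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();

  /** Delete a directory tree now, and once more after a grace period. */
  public void deleteWithGracePeriod(Path dir, long delaySeconds) {
    deleteQuietly(dir);                           // immediate attempt
    scheduler.schedule(() -> deleteQuietly(dir),  // catch a late re-creation
        delaySeconds, TimeUnit.SECONDS);
  }

  private static void deleteQuietly(Path dir) {
    try (Stream<Path> walk = Files.walk(dir)) {
      walk.sorted(Comparator.reverseOrder())      // children before parents
          .forEach(p -> p.toFile().delete());
    } catch (IOException ignored) {
      // the directory may already be gone; nothing to do
    }
  }
}
{code}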

> Killing a container that is localizing can orphan resources in the 
> DOWNLOADING state
> 
>
> Key: YARN-2902
> URL: https://issues.apache.org/jira/browse/YARN-2902
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.5.0
>Reporter: Jason Lowe
>Assignee: Varun Saxena
> Attachments: YARN-2902.002.patch, YARN-2902.03.patch, 
> YARN-2902.04.patch, YARN-2902.05.patch, YARN-2902.06.patch, 
> YARN-2902.07.patch, YARN-2902.patch
>
>
> If a container is in the process of localizing when it is stopped/killed then 
> resources are left in the DOWNLOADING state.  If no other container comes 
> along and requests these resources they linger around with no reference 
> counts but aren't cleaned up during normal cache cleanup scans since it will 
> never delete resources in the DOWNLOADING state even if their reference count 
> is zero.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4179) [reader implementation] support flow activity queries based on time

2015-10-09 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-4179:
---
Attachment: YARN-4179-YARN-2928.01.patch

> [reader implementation] support flow activity queries based on time
> ---
>
> Key: YARN-4179
> URL: https://issues.apache.org/jira/browse/YARN-4179
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
>Priority: Minor
> Attachments: YARN-4179-YARN-2928.01.patch
>
>
> This came up as part of YARN-4074 and YARN-4075.
> Currently the only query pattern that's supported on the flow activity table 
> is by cluster only. But it might be useful to support queries by cluster and 
> certain date or dates.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4179) [reader implementation] support flow activity queries based on time

2015-10-09 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14951052#comment-14951052
 ] 

Varun Saxena commented on YARN-4179:


The approach chosen in the patch is that the date will be in the format ddMM and 
the timezone will be assumed to be GMT.
Now the issue here is: should we fix the format? The popular date format varies 
across geographies.

Another solution is that the date range can be specified as seconds since the epoch. 
The issue there is that, instead of a single date, any timestamp within a day can 
be specified.
We could, however, normalize the timestamps in the date range by converting each 
into a top-of-the-day timestamp before firing the query to the backend. 

So I would welcome views from others on which approach to follow. 
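
For comparison, a small sketch of the two alternatives, assuming a ddMMyyyy 
pattern resolved in GMT/UTC; the exact pattern and the class name are assumptions 
for illustration, since the format is precisely what is being decided here.
{code:title=FlowActivityDateSketch.java}
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Illustration only; the pattern "ddMMyyyy" is an assumption.
public class FlowActivityDateSketch {

  private static final DateTimeFormatter FMT =
      DateTimeFormatter.ofPattern("ddMMyyyy");

  /** Option 1: parse a fixed-format date string, interpreted in GMT. */
  static long dateStringToDayStartMillis(String date) {
    return LocalDate.parse(date, FMT)
        .atStartOfDay(ZoneOffset.UTC)
        .toInstant().toEpochMilli();
  }

  /** Option 2: accept any epoch timestamp and normalize it to top of day. */
  static long normalizeToDayStartMillis(long epochMillis) {
    return Instant.ofEpochMilli(epochMillis)
        .atZone(ZoneOffset.UTC)
        .toLocalDate()
        .atStartOfDay(ZoneOffset.UTC)
        .toInstant().toEpochMilli();
  }

  public static void main(String[] args) {
    // Both print the epoch millis for 9 Oct 2015, 00:00 UTC.
    System.out.println(dateStringToDayStartMillis("09102015"));
    System.out.println(normalizeToDayStartMillis(1444392000000L)); // noon that day
  }
}
{code}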

> [reader implementation] support flow activity queries based on time
> ---
>
> Key: YARN-4179
> URL: https://issues.apache.org/jira/browse/YARN-4179
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
>Priority: Minor
> Attachments: YARN-4179-YARN-2928.01.patch
>
>
> This came up as part of YARN-4074 and YARN-4075.
> Currently the only query pattern that's supported on the flow activity table 
> is by cluster only. But it might be useful to support queries by cluster and 
> certain date or dates.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2902) Killing a container that is localizing can orphan resources in the DOWNLOADING state

2015-10-09 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14951115#comment-14951115
 ] 

Jason Lowe commented on YARN-2902:
--

The key with the new proposal is that the LocalizerRunner thread is the one 
issuing the deletes and only after the localizer process exits.

bq. 2. Container is killed. Associated resources are stuck in downloading state 
and a deletion task is launched for them.
That deletion task would not be launched because the localizer has not exited.  
The LocalizerRunner will still be waiting on the localizer process to exit.

bq. Localizer doesnt exit immediately as of now when container is killed, even 
though we interrupt the thread.
Yes, and that's fine.  We won't issue the deletion requests until the localizer 
process eventually exits.

The key is this code in LocalizerRunner:
{code:title=LocalizerRunner}
public void run() {
  Path nmPrivateCTokensPath = null;
  Throwable exception = null;
  try {
[...localizer pre-startup code removed for brevity...]
if (dirsHandler.areDisksHealthy()) {
  exec.startLocalizer(new LocalizerStartContext.Builder()
  .setNmPrivateContainerTokens(nmPrivateCTokensPath)
  .setNmAddr(localizationServerAddress)
  .setUser(context.getUser())
  .setAppId(ConverterUtils.toString(context.getContainerId()
  .getApplicationAttemptId().getApplicationId()))
  .setLocId(localizerId)
  .setDirsHandler(dirsHandler)
  .build());
} else {
  throw new IOException("All disks failed. "
  + dirsHandler.getDisksHealthReport(false));
}
  // TODO handle ExitCodeException separately?
  } catch (FSError fe) {
exception = fe;
  } catch (Exception e) {
exception = e;
  } finally {
if (exception != null) {
  LOG.info("Localizer failed", exception);
  // On error, report failure to Container and signal ABORT
  // Notify resource of failed localization
  ContainerId cId = context.getContainerId();
  dispatcher.getEventHandler().handle(new ContainerResourceFailedEvent(
  cId, null, exception.getMessage()));
}
for (LocalizerResourceRequestEvent event : scheduled.values()) {
  event.getResource().unlock();
}
delService.delete(null, nmPrivateCTokensPath, new Path[] {});
  }
{code}

startLocalizer won't return until the localizer process exits, so when it 
iterates the {{scheduled}} map in the finally block to unlock the resources we 
can issue deletions for the local resource paths at the same time.
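
A simplified sketch of that cleanup step, with stand-in types for the entries of 
the {{scheduled}} map and for the deletion service; the real patch would use the 
NM's resource tracking and DeletionService rather than these placeholders.
{code:title=PendingResourceCleanupSketch.java}
import java.util.Map;
import java.util.function.Consumer;

// Placeholder sketch: once startLocalizer() has returned (the localizer
// process has exited), anything still in the "scheduled" map is an orphan,
// so its target path and temporary download path can be handed to deletion.
public class PendingResourceCleanupSketch {

  /** Stand-in for the per-resource bookkeeping the LocalizerRunner keeps. */
  public static final class PendingResource {
    final String targetPath;  // where the resource would have been localized
    final String tmpPath;     // the temporary download directory

    public PendingResource(String targetPath, String tmpPath) {
      this.targetPath = targetPath;
      this.tmpPath = tmpPath;
    }
  }

  /** Called from the finally block, alongside unlocking each resource. */
  static void cleanupOrphans(Map<String, PendingResource> scheduled,
                             Consumer<String> deletionService) {
    for (PendingResource r : scheduled.values()) {
      deletionService.accept(r.targetPath);  // delete the final location
      deletionService.accept(r.tmpPath);     // delete the partial download
    }
  }
}
{code}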


> Killing a container that is localizing can orphan resources in the 
> DOWNLOADING state
> 
>
> Key: YARN-2902
> URL: https://issues.apache.org/jira/browse/YARN-2902
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.5.0
>Reporter: Jason Lowe
>Assignee: Varun Saxena
> Attachments: YARN-2902.002.patch, YARN-2902.03.patch, 
> YARN-2902.04.patch, YARN-2902.05.patch, YARN-2902.06.patch, 
> YARN-2902.07.patch, YARN-2902.patch
>
>
> If a container is in the process of localizing when it is stopped/killed then 
> resources are left in the DOWNLOADING state.  If no other container comes 
> along and requests these resources they linger around with no reference 
> counts but aren't cleaned up during normal cache cleanup scans since it will 
> never delete resources in the DOWNLOADING state even if their reference count 
> is zero.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2902) Killing a container that is localizing can orphan resources in the DOWNLOADING state

2015-10-09 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14951148#comment-14951148
 ] 

Varun Saxena commented on YARN-2902:


[~jlowe],
bq. The key with the new proposal is that the LocalizerRunner thread is the one 
issuing the deletes and only after the localizer process exits.
+1. This makes sense. I can't think of any case where this won't work.

> Killing a container that is localizing can orphan resources in the 
> DOWNLOADING state
> 
>
> Key: YARN-2902
> URL: https://issues.apache.org/jira/browse/YARN-2902
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.5.0
>Reporter: Jason Lowe
>Assignee: Varun Saxena
> Attachments: YARN-2902.002.patch, YARN-2902.03.patch, 
> YARN-2902.04.patch, YARN-2902.05.patch, YARN-2902.06.patch, 
> YARN-2902.07.patch, YARN-2902.patch
>
>
> If a container is in the process of localizing when it is stopped/killed then 
> resources are left in the DOWNLOADING state.  If no other container comes 
> along and requests these resources they linger around with no reference 
> counts but aren't cleaned up during normal cache cleanup scans since it will 
> never delete resources in the DOWNLOADING state even if their reference count 
> is zero.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4247) Deadlock in FSAppAttempt and RMAppAttemptImpl causes RM to stop processing events

2015-10-09 Thread Anubhav Dhoot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14951161#comment-14951161
 ] 

Anubhav Dhoot commented on YARN-4247:
-

Looking at the jstack, here is the deadlock between FSAppAttempt and 
RMAppAttemptImpl:
The first thread holds the FSAppAttempt lock and is waiting on the 
RMAppAttemptImpl lock.
The second thread, in RMAppAttemptImpl.getApplicationResourceUsageReport, has 
taken a read lock and is waiting on the FSAppAttempt lock.
This causes other threads (e.g. the third thread), such as the AsyncDispatcher 
threads, to get blocked, so the RM stops processing events and then crashes 
with OOM because of the backlog of events.
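
As a toy, self-contained illustration of that lock-ordering cycle (not the 
actual RM code; the real RMAppAttemptImpl lock is a ReentrantReadWriteLock, 
which behaves like a plain monitor here once a writer is queued):
{code:title=Lock-ordering deadlock, toy example}
public class LockOrderDeadlockDemo {
  private final Object fsAppAttempt = new Object(); // stands in for the FSAppAttempt monitor
  private final Object rmAppAttempt = new Object(); // stands in for RMAppAttemptImpl's lock

  void allocate() {                     // FairScheduler.allocate() analogue
    synchronized (fsAppAttempt) {
      sleepQuietly(100);                // widen the race window
      synchronized (rmAppAttempt) { /* getMasterContainer() */ }
    }
  }

  void usageReport() {                  // getApplicationResourceUsageReport() analogue
    synchronized (rmAppAttempt) {
      sleepQuietly(100);
      synchronized (fsAppAttempt) { /* getResourceUsageReport() */ }
    }
  }

  private static void sleepQuietly(long ms) {
    try { Thread.sleep(ms); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
  }

  public static void main(String[] args) {
    LockOrderDeadlockDemo d = new LockOrderDeadlockDemo();
    new Thread(d::allocate, "allocate").start();
    new Thread(d::usageReport, "usageReport").start(); // these two will almost certainly deadlock
  }
}
{code}
The jstack excerpts below show the same cycle in the real RM threads: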

{noformat}
"IPC Server handler 49 on 8030" #239 daemon prio=5 os_prio=0 
tid=0x01093000 nid=0x8206 waiting on condition [0x7f930b2da000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0x00071719e0f0> (a 
java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
at 
java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.*RMAppAttemptImpl*.getMasterContainer(RMAppAttemptImpl.java:747)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.isWaitingForAMContainer(SchedulerApplicationAttempt.java:482)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.allocate(FairScheduler.java:938)
- locked <0x000715932d98> (a 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.*FSAppAttempt*)
at 
org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:529)
- locked <0x0007171a5328> (a 
org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService$AllocateResponseLock)
at 
org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
at 
org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)


"IPC Server handler 9 on 8032" #253 daemon prio=5 os_prio=0 
tid=0x00e2e800 nid=0x8214 waiting for monitor entry [0x7f930a4cd000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.getResourceUsageReport(SchedulerApplicationAttempt.java:570)
- waiting to lock <0x000715932d98> (a 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.getAppResourceUsageReport(AbstractYarnScheduler.java:241)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptMetrics.getAggregateAppResourceUsage(RMAppAttemptMetrics.java:114)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.*RMAppAttemptImpl*.getApplicationResourceUsageReport(RMAppAttemptImpl.java:798)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.createAndGetApplicationReport(RMAppImpl.java:655)
at 
org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:330)
at 
org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:170)
at 
org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:401)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
at org.apache.hadoop.ip

[jira] [Updated] (YARN-4247) Deadlock in FSAppAttempt and RMAppAttemptImpl causes RM to stop processing events

2015-10-09 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-4247:

Attachment: YARN-4247.001.patch

The fix removes the need to take the RMAppAttemptImpl lock while holding the 
FSAppAttempt lock.

> Deadlock in FSAppAttempt and RMAppAttemptImpl causes RM to stop processing 
> events
> -
>
> Key: YARN-4247
> URL: https://issues.apache.org/jira/browse/YARN-4247
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler, resourcemanager
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
>Priority: Blocker
> Attachments: YARN-4247.001.patch
>
>
> We see this deadlock in our testing where events do not get processed and we 
> see this in the logs before the RM dies of OOM {noformat} 2015-10-08 
> 04:48:01,918 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of 
> event-queue is 1488000 2015-10-08 04:48:01,918 INFO 
> org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 1488000 
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3912) Fix typos in hadoop-yarn-project module

2015-10-09 Thread Ray Chiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated YARN-3912:
-
Assignee: Neelesh Srinivas Salian  (was: Ray Chiang)

> Fix typos in hadoop-yarn-project module
> ---
>
> Key: YARN-3912
> URL: https://issues.apache.org/jira/browse/YARN-3912
> Project: Hadoop YARN
>  Issue Type: Task
>Affects Versions: 2.7.1
>Reporter: Ray Chiang
>Assignee: Neelesh Srinivas Salian
>Priority: Minor
>  Labels: supportability
> Attachments: YARN-3912.001.patch
>
>
> Fix a bunch of typos in comments, strings, variable names, and method names 
> in the hadoop-yarn-project module.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4247) Deadlock in FSAppAttempt and RMAppAttemptImpl causes RM to stop processing events

2015-10-09 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-4247:

Attachment: YARN-4247.001.patch

retrigger jenkins

> Deadlock in FSAppAttempt and RMAppAttemptImpl causes RM to stop processing 
> events
> -
>
> Key: YARN-4247
> URL: https://issues.apache.org/jira/browse/YARN-4247
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler, resourcemanager
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
>Priority: Blocker
> Attachments: YARN-4247.001.patch, YARN-4247.001.patch
>
>
> We see this deadlock in our testing where events do not get processed and we 
> see this in the logs before the RM dies of OOM {noformat} 2015-10-08 
> 04:48:01,918 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of 
> event-queue is 1488000 2015-10-08 04:48:01,918 INFO 
> org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 1488000 
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4235) FairScheduler PrimaryGroup does not handle empty groups returned for a user

2015-10-09 Thread Anubhav Dhoot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14951331#comment-14951331
 ] 

Anubhav Dhoot commented on YARN-4235:
-

Thanks [~rohithsharma] for review and commit!

> FairScheduler PrimaryGroup does not handle empty groups returned for a user 
> 
>
> Key: YARN-4235
> URL: https://issues.apache.org/jira/browse/YARN-4235
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Fix For: 2.8.0
>
> Attachments: YARN-4235.001.patch
>
>
> We see NPE if empty groups are returned for a user. This causes a NPE and 
> cause RM to crash as below
> {noformat}
> 2015-09-22 16:51:52,780  FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
> handling event type APP_ADDED to the scheduler
> java.lang.IndexOutOfBoundsException: Index: 0
>   at java.util.Collections$EmptyList.get(Collections.java:3212)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueuePlacementRule$PrimaryGroup.getQueueForApp(QueuePlacementRule.java:149)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueuePlacementRule.assignAppToQueue(QueuePlacementRule.java:74)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueuePlacementPolicy.assignAppToQueue(QueuePlacementPolicy.java:167)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.assignToQueue(FairScheduler.java:689)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.addApplication(FairScheduler.java:595)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1180)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:111)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:684)
>   at java.lang.Thread.run(Thread.java:745)
> 2015-09-22 16:51:52,797  INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3912) Fix typos in hadoop-yarn-project module

2015-10-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14951392#comment-14951392
 ] 

Hadoop QA commented on YARN-3912:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch |   0m  0s | The patch command could not apply 
the patch during dryrun. |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12744765/YARN-3912.001.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 4f6e842 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9395/console |


This message was automatically generated.

> Fix typos in hadoop-yarn-project module
> ---
>
> Key: YARN-3912
> URL: https://issues.apache.org/jira/browse/YARN-3912
> Project: Hadoop YARN
>  Issue Type: Task
>Affects Versions: 2.7.1
>Reporter: Ray Chiang
>Assignee: Neelesh Srinivas Salian
>Priority: Minor
>  Labels: supportability
> Attachments: YARN-3912.001.patch
>
>
> Fix a bunch of typos in comments, strings, variable names, and method names 
> in the hadoop-yarn-project module.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4201) AMBlacklist does not work for minicluster

2015-10-09 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14951393#comment-14951393
 ] 

zhihai xu commented on YARN-4201:
-

+1 for the latest patch. I will wait a day or two before committing so that 
others can look at the patch.

> AMBlacklist does not work for minicluster
> -
>
> Key: YARN-4201
> URL: https://issues.apache.org/jira/browse/YARN-4201
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Jun Gong
>Assignee: Jun Gong
> Attachments: YARN-4021.001.patch, YARN-4201.002.patch, 
> YARN-4201.003.patch
>
>
> For minicluster (scheduler.include-port-in-node-name is set to TRUE), 
> AMBlacklist does not work. It is because RM just puts host to AMBlacklist 
> whether scheduler.include-port-in-node-name is set or not. In fact RM should 
> put "host + port" to AMBlacklist when it is set.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4162) Scheduler info in REST, is currently not displaying partition specific queue information similar to UI

2015-10-09 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14951401#comment-14951401
 ] 

Wangda Tan commented on YARN-4162:
--

bq. well i can avoid the if block but csqinfo.label itself cannot be set to the 
default Partition as it's also used as a flag to determine whether to show the 
leaf queue in the normal way or the partition way.
I think it's fine if there's no regression in the existing UI.

bq. isExclusiveNodeLabel is the check we had earlier in 
CapacitySchedulerInfo.getQueues, basically to avoid displaying the queues which 
are not accessible to a given NodeLabelPartition.
Makes sense, I remember that now :)

bq. Well, shall I update all places displayed in the UI or only in REST?
It's not necessary to update all places, but if all web UI pages use the same 
final field, they will be updated automatically.
And I think if {{<...>}} is removed, we need a separate JIRA for the node 
labels manager and client to make sure "DEFAULT_PARTITION" is a reserved 
partition name that no one can add/remove or use (you cannot say you want to 
associate a node with the "DEFAULT_PARTITION").
What do you think?

> Scheduler info in REST, is currently not displaying partition specific queue 
> information similar to UI
> --
>
> Key: YARN-4162
> URL: https://issues.apache.org/jira/browse/YARN-4162
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Attachments: YARN-4162.v1.001.patch, YARN-4162.v2.001.patch, 
> YARN-4162.v2.002.patch, YARN-4162.v2.003.patch, restAndJsonOutput.zip
>
>
> When Node Labels are enabled then REST Scheduler Information should also 
> provide partition specific queue information similar to the existing Web UI



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4140) RM container allocation delayed incase of app submitted to Nodelabel partition

2015-10-09 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-4140:
-
Component/s: (was: api)
 (was: client)

> RM container allocation delayed incase of app submitted to Nodelabel partition
> --
>
> Key: YARN-4140
> URL: https://issues.apache.org/jira/browse/YARN-4140
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-4140.patch, 0002-YARN-4140.patch, 
> 0003-YARN-4140.patch, 0004-YARN-4140.patch, 0005-YARN-4140.patch, 
> 0006-YARN-4140.patch, 0007-YARN-4140.patch, 0008-YARN-4140.patch, 
> 0009-YARN-4140.patch, 0010-YARN-4140.patch, 0011-YARN-4140.patch, 
> 0012-YARN-4140.patch, 0013-YARN-4140.patch, 0014-YARN-4140.patch
>
>
> Trying to run application on Nodelabel partition I  found that the 
> application execution time is delayed by 5 – 10 min for 500 containers . 
> Total 3 machines 2 machines were in same partition and app submitted to same.
> After enabling debug was able to find the below
> # From AM the container ask is for OFF-SWITCH
> # RM allocating all containers to NODE_LOCAL as shown in logs below.
> # So since I was having about 500 containers time taken was about – 6 minutes 
> to allocate 1st map after AM allocation.
> # Tested with about 1K maps using PI job took 17 minutes to allocate  next 
> container after AM allocation
> Once 500 container allocation on NODE_LOCAL is done the next container 
> allocation is done on OFF_SWITCH
> {code}
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> /default-rack, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: *, Relax 
> Locality: true, Node Label Expression: 3}
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> host-10-19-92-143, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> host-10-19-92-117, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> {code}
>  
> {code}
> 2015-09-09 14:35:45,467 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> 2015-09-09 14:35:45,831 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> 2015-09-09 14:35:46,469 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> 2015-09-09 14:35:46,832 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> {code}
> {code}
> dsperf@host-127:/opt/bibin/dsperf/HAINSTALL/install/hadoop/resourcemanager/logs1>
>  cat hadoop-dsperf-resourcemanager-host-127.log | grep "NODE_LOCAL" | grep 
> "root.b.b1" | wc -l
> 500
> {code}
>  
> (Consumes about 6 minutes)
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4140) RM container allocation delayed incase of app submitted to Nodelabel partition

2015-10-09 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-4140:
-
Component/s: (was: resourcemanager)
 scheduler

> RM container allocation delayed incase of app submitted to Nodelabel partition
> --
>
> Key: YARN-4140
> URL: https://issues.apache.org/jira/browse/YARN-4140
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-4140.patch, 0002-YARN-4140.patch, 
> 0003-YARN-4140.patch, 0004-YARN-4140.patch, 0005-YARN-4140.patch, 
> 0006-YARN-4140.patch, 0007-YARN-4140.patch, 0008-YARN-4140.patch, 
> 0009-YARN-4140.patch, 0010-YARN-4140.patch, 0011-YARN-4140.patch, 
> 0012-YARN-4140.patch, 0013-YARN-4140.patch, 0014-YARN-4140.patch
>
>
> Trying to run application on Nodelabel partition I  found that the 
> application execution time is delayed by 5 – 10 min for 500 containers . 
> Total 3 machines 2 machines were in same partition and app submitted to same.
> After enabling debug was able to find the below
> # From AM the container ask is for OFF-SWITCH
> # RM allocating all containers to NODE_LOCAL as shown in logs below.
> # So since I was having about 500 containers time taken was about – 6 minutes 
> to allocate 1st map after AM allocation.
> # Tested with about 1K maps using PI job took 17 minutes to allocate  next 
> container after AM allocation
> Once 500 container allocation on NODE_LOCAL is done the next container 
> allocation is done on OFF_SWITCH
> {code}
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> /default-rack, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: *, Relax 
> Locality: true, Node Label Expression: 3}
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> host-10-19-92-143, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> host-10-19-92-117, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> {code}
>  
> {code}
> 2015-09-09 14:35:45,467 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> 2015-09-09 14:35:45,831 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> 2015-09-09 14:35:46,469 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> 2015-09-09 14:35:46,832 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> {code}
> {code}
> dsperf@host-127:/opt/bibin/dsperf/HAINSTALL/install/hadoop/resourcemanager/logs1>
>  cat hadoop-dsperf-resourcemanager-host-127.log | grep "NODE_LOCAL" | grep 
> "root.b.b1" | wc -l
> 500
> {code}
>  
> (Consumes about 6 minutes)
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3216) Max-AM-Resource-Percentage should respect node labels

2015-10-09 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14951423#comment-14951423
 ] 

Wangda Tan commented on YARN-3216:
--

Hi [~sunilg],

I think the changes to AbstractCSQueue may not be necessary; could you take a 
look at my previous comment (copied here) and let me know your thoughts?
bq. AbstractCSQueue: Instead of adding AM-used-resource to parentQueue, I think 
we may only need to calculate AM-used-resource on LeafQueue and user. 
Currently we don't have a limit on AM-used-resource on parentQueue, so the 
aggregated resource may not be very useful. We can add it along the hierarchy 
if we want to limit max-am-percent on parentQueue in the future.

If max-am-percent for queue-partitions isn't set, I think it should use 
queue.max-am-percent instead of 0, to avoid painful configuration (otherwise an 
admin has to set max-am-percent after adding a new partition).

I found that the logic in your patch is: if max-am-percent for partition-x is 
not set, partition-x's am-limit equals the default partition's am-limit, which 
does not look correct to me. The am-limit under each partition should be 
calculated independently, since the total resource of each partition varies.
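
To illustrate (a rough sketch only, with made-up names and numbers, not the 
actual CapacityScheduler code): the am-limit should be derived from each 
partition's own total resource, falling back to the queue-level 
max-am-percent when no partition-specific value is configured:
{code:title=Per-partition AM limit (illustration only)}
public class AmLimitExample {
  // Memory-only for simplicity; method and parameter names are made up.
  static long amLimitMb(long partitionTotalMb, Float partitionMaxAmPercent,
      float queueMaxAmPercent) {
    // Fall back to the queue-level setting instead of 0 when no
    // partition-specific percentage is configured.
    float percent = (partitionMaxAmPercent != null)
        ? partitionMaxAmPercent : queueMaxAmPercent;
    // Derived from this partition's own total resource, independently of
    // the default partition.
    return (long) (partitionTotalMb * percent);
  }

  public static void main(String[] args) {
    // e.g. default partition: 100 GB * 0.1 -> 10 GB of AM headroom
    System.out.println(amLimitMb(100 * 1024, null, 0.1f));
    // partition-x: 40 GB * 0.1 -> 4 GB, not the default partition's 10 GB
    System.out.println(amLimitMb(40 * 1024, null, 0.1f));
  }
}
{code}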

If you agree, could you merge the am-limit computation logic for the default 
partition and for specific partitions?

Thoughts?

Thanks,

> Max-AM-Resource-Percentage should respect node labels
> -
>
> Key: YARN-3216
> URL: https://issues.apache.org/jira/browse/YARN-3216
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Sunil G
>Priority: Critical
> Attachments: 0001-YARN-3216.patch, 0002-YARN-3216.patch, 
> 0003-YARN-3216.patch, 0004-YARN-3216.patch
>
>
> Currently, max-am-resource-percentage considers default_partition only. When 
> a queue can access multiple partitions, we should be able to compute 
> max-am-resource-percentage based on that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4248) REST API for submit/update/delete Reservations

2015-10-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14951429#comment-14951429
 ] 

Hadoop QA commented on YARN-4248:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  19m 57s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 4 new or modified test files. |
| {color:green}+1{color} | javac |   8m 13s | There were no new javac warning 
messages. |
| {color:red}-1{color} | javadoc |  10m 32s | The applied patch generated  3  
additional warning messages. |
| {color:red}-1{color} | release audit |   0m 20s | The applied patch generated 
1 release audit warnings. |
| {color:red}-1{color} | checkstyle |   0m 57s | The applied patch generated  
57 new checkstyle issues (total was 40, now 97). |
| {color:red}-1{color} | whitespace |   0m  7s | The patch has 5  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 31s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with 
eclipse:eclipse. |
| {color:red}-1{color} | findbugs |   1m 33s | The patch appears to introduce 1 
new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |  57m 24s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | | 101m 13s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-server-resourcemanager |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12765889/YARN-4248.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 4f6e842 |
| javadoc | 
https://builds.apache.org/job/PreCommit-YARN-Build/9394/artifact/patchprocess/diffJavadocWarnings.txt
 |
| Release Audit | 
https://builds.apache.org/job/PreCommit-YARN-Build/9394/artifact/patchprocess/patchReleaseAuditProblems.txt
 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/9394/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt
 |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/9394/artifact/patchprocess/whitespace.txt
 |
| Findbugs warnings | 
https://builds.apache.org/job/PreCommit-YARN-Build/9394/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9394/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9394/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9394/console |


This message was automatically generated.

> REST API for submit/update/delete Reservations
> --
>
> Key: YARN-4248
> URL: https://issues.apache.org/jira/browse/YARN-4248
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Attachments: YARN-4248.patch
>
>
> This JIRA tracks work to extend the RMWebService to support REST APIs to 
> submit/update/delete reservations. This will ease integration with external 
> tools that are not java-based.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3880) Writing more RM side app-level metrics

2015-10-09 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14951437#comment-14951437
 ] 

Naganarasimha G R commented on YARN-3880:
-

Hi [~sjlee0], [~gtCarrera] & [~djp],
Regarding this JIRA I had a few queries:
# *runningContainers* only makes sense to capture if we periodically collect 
these stats while the app is in the RUNNING state. Is that the intention? 
*memorySeconds & vcoreSeconds* are already captured in V1 as well as V2 when 
the application finishes. Are these planned to be captured periodically? I feel 
the current container metrics, which are already being aggregated, give 
something along the same lines; also, if the RM captures them periodically for 
all apps, the load on the RM will increase. But if required, similar to the 
ContainerMonitor, I can create an AppMonitor to collect these stats and publish 
them to ATSv2 (a rough sketch follows after this list).
# preemptedResourceMB and other preemption-related info are not captured; but, 
as with the previous query, do we need to capture these periodically or is it 
sufficient to capture them at the end of the app?
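
A very rough sketch of what such a hypothetical AppMonitor could look like 
(class names and the publisher/stats interfaces are made up for illustration; 
this is not existing RM code):
{code:title=Hypothetical AppMonitor (sketch only)}
import java.util.Collection;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class AppMonitor {
  /** Made-up callback; in practice this would publish to ATSv2. */
  public interface AppMetricsPublisher {
    void publishRunningAppMetrics(String appId, int runningContainers,
        long memorySeconds, long vcoreSeconds);
  }

  /** Made-up snapshot of the per-app stats the RM already tracks. */
  public interface AppStatsSource {
    Collection<String> runningAppIds();
    int runningContainers(String appId);
    long memorySeconds(String appId);
    long vcoreSeconds(String appId);
  }

  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();

  public void start(AppStatsSource stats, AppMetricsPublisher publisher,
      long intervalSecs) {
    // Periodically walk the running apps and publish their current metrics,
    // analogous to how the ContainerMonitor mentioned above samples containers.
    scheduler.scheduleAtFixedRate(() -> {
      for (String appId : stats.runningAppIds()) {
        publisher.publishRunningAppMetrics(appId,
            stats.runningContainers(appId),
            stats.memorySeconds(appId),
            stats.vcoreSeconds(appId));
      }
    }, intervalSecs, intervalSecs, TimeUnit.SECONDS);
  }

  public void stop() {
    scheduler.shutdownNow();
  }
}
{code}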


> Writing more RM side app-level metrics
> --
>
> Key: YARN-3880
> URL: https://issues.apache.org/jira/browse/YARN-3880
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Naganarasimha G R
>
> In YARN-3044, we implemented an analog of metrics publisher for ATS v1. While 
> it helps to write app/attempt/container life cycle events, it really doesn't 
> write  as many app-level system metrics that RM are now having.  Just list 
> the metrics that I found missing:
> * runningContainers
> * memorySeconds
> * vcoreSeconds
> * preemptedResourceMB
> * preemptedResourceVCores
> * numNonAMContainerPreempted
> * numAMContainerPreempted
> Please feel fee to add more into the list if you find it's not covered.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4162) Scheduler info in REST, is currently not displaying partition specific queue information similar to UI

2015-10-09 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14951445#comment-14951445
 ] 

Naganarasimha G R commented on YARN-4162:
-

bq. It's not necessary to update all places, but if all web UI pages use the 
same final field, they will be updated automatically.
Well, I was thinking to avoid it by having 2 constants: one for the 
web ("") and one for the partition name as "DEFAULT_PARTITION", with the web 
one making use of the partition-name constant (a small sketch follows below).
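
If I read the proposal right, something along these lines (constant names are 
hypothetical, for illustration only):
{code:title=Possible shape of the two constants (illustration only)}
// Internal partition name (also the value that should be reserved so that
// no one can add/remove or use it as a real partition).
public static final String DEFAULT_PARTITION = "DEFAULT_PARTITION";

// What the web UI displays, defined in terms of the partition-name constant
// so the two can never drift apart; whether it keeps the angle-bracket
// wrapping is the open question discussed above.
public static final String DEFAULT_PARTITION_WEB_DISPLAY = DEFAULT_PARTITION;
{code}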

bq. And I think if <...> is removed, we need a separate JIRA for the node 
labels manager and client to make sure "DEFAULT_PARTITION" is a reserved 
partition name that no one should add/remove or use
Well, this seems to be a valid point, but irrespective of the above 
modification, do I need to raise it?

> Scheduler info in REST, is currently not displaying partition specific queue 
> information similar to UI
> --
>
> Key: YARN-4162
> URL: https://issues.apache.org/jira/browse/YARN-4162
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Attachments: YARN-4162.v1.001.patch, YARN-4162.v2.001.patch, 
> YARN-4162.v2.002.patch, YARN-4162.v2.003.patch, restAndJsonOutput.zip
>
>
> When Node Labels are enabled then REST Scheduler Information should also 
> provide partition specific queue information similar to the existing Web UI



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4129) Refactor the SystemMetricPublisher in RM to better support newer events

2015-10-09 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14951458#comment-14951458
 ] 

Sangjin Lee commented on YARN-4129:
---

Sorry [~Naganarasimha], it took me a while to review this.

I'm generally +1 on the effort here to reduce unnecessary layers (getting rid 
of the event types) and on the additional flexibility you mentioned.

I also know there is a discussion on whether we should set createdTime, 
modifiedTime, etc. on the entities themselves (I forget the JIRA id), and it 
has some implications here. I will chime in there later, but IMO it'd be good 
to set things like createdTime directly on the entities so that we have 
consistent and uniform access to those important times. We can make those 
changes (if we agree) in that JIRA, though.

(ResourceManager.java)
- l.396: the service is being added twice (the other is at l.275); I would 
remove l.396
- l.514: I'm slightly confused that (apart from l.396) the publisher is 
registered once with the RM itself and again here with RMActiveServices. Is 
that needed? How would service stop work (since these are composite services)?

(SystemMetricsPublisher.java)
- l.27: nit: space before the brace

(TimelineServiceV2Publisher.java)
- l.80: normally we call {{super.serviceStart()}} at the end rather than at the 
beginning, right? (see the small sketch after these comments)
- l.155: Are you referring to the issue of having the app-level timeline 
collector hanging around to process late-coming writes? If so, we should add 
the JIRA id here in the comment so that we can keep track. If not, could you 
please explain the TODO comment here?
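
On the l.80 point, a tiny sketch of the convention being referred to (a 
hypothetical subclass for illustration, not the actual TimelineServiceV2Publisher 
code):
{code:title=serviceStart() ordering (illustration only)}
import org.apache.hadoop.service.CompositeService;

public class ExamplePublisher extends CompositeService {
  public ExamplePublisher() {
    super("ExamplePublisher");
  }

  @Override
  protected void serviceStart() throws Exception {
    // 1. Bring up this service's own resources first, e.g. dispatchers/clients.
    // startOwnResources();   // hypothetical

    // 2. Delegate to the parent last, so any child services added in
    //    serviceInit() are started only after this class's own setup is done.
    super.serviceStart();
  }
}
{code}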


> Refactor the SystemMetricPublisher in RM to better support newer events
> ---
>
> Key: YARN-4129
> URL: https://issues.apache.org/jira/browse/YARN-4129
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Attachments: YARN-4129.YARN-2928.001.patch
>
>
> Currently to add new timeline event/ entity in RM side, one has to add a 
> method in publisher and a method in handler and create a new event class 
> which looks cumbersome and redundant. also further all the events might not 
> be required to be published in V1 & V2. So adopting the approach similar to 
> what was adopted in YARN-3045(NM side)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4247) Deadlock in FSAppAttempt and RMAppAttemptImpl causes RM to stop processing events

2015-10-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14951491#comment-14951491
 ] 

Hadoop QA commented on YARN-4247:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  16m 28s | Findbugs (version ) appears to 
be broken on trunk. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   8m 13s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 34s | There were no new javadoc 
warning messages. |
| {color:red}-1{color} | release audit |   0m 19s | The applied patch generated 
1 release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 34s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 33s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 31s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |  57m  5s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | |  96m 55s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12765925/YARN-4247.001.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / def374e |
| Release Audit | 
https://builds.apache.org/job/PreCommit-YARN-Build/9396/artifact/patchprocess/patchReleaseAuditProblems.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9396/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9396/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf901.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9396/console |


This message was automatically generated.

> Deadlock in FSAppAttempt and RMAppAttemptImpl causes RM to stop processing 
> events
> -
>
> Key: YARN-4247
> URL: https://issues.apache.org/jira/browse/YARN-4247
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler, resourcemanager
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
>Priority: Blocker
> Attachments: YARN-4247.001.patch, YARN-4247.001.patch
>
>
> We see this deadlock in our testing where events do not get processed and we 
> see this in the logs before the RM dies of OOM {noformat} 2015-10-08 
> 04:48:01,918 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of 
> event-queue is 1488000 2015-10-08 04:48:01,918 INFO 
> org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 1488000 
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4140) RM container allocation delayed incase of app submitted to Nodelabel partition

2015-10-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14951494#comment-14951494
 ] 

Hudson commented on YARN-4140:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #8604 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8604/])
YARN-4140. RM container allocation delayed incase of app submitted to (wangda: 
rev def374e666ed0c1d665aeb1b7307e09769448138)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerTestBase.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestNodeLabelContainerAllocation.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/utils/BuilderUtils.java


> RM container allocation delayed incase of app submitted to Nodelabel partition
> --
>
> Key: YARN-4140
> URL: https://issues.apache.org/jira/browse/YARN-4140
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-4140.patch, 0002-YARN-4140.patch, 
> 0003-YARN-4140.patch, 0004-YARN-4140.patch, 0005-YARN-4140.patch, 
> 0006-YARN-4140.patch, 0007-YARN-4140.patch, 0008-YARN-4140.patch, 
> 0009-YARN-4140.patch, 0010-YARN-4140.patch, 0011-YARN-4140.patch, 
> 0012-YARN-4140.patch, 0013-YARN-4140.patch, 0014-YARN-4140.patch
>
>
> Trying to run application on Nodelabel partition I  found that the 
> application execution time is delayed by 5 – 10 min for 500 containers . 
> Total 3 machines 2 machines were in same partition and app submitted to same.
> After enabling debug was able to find the below
> # From AM the container ask is for OFF-SWITCH
> # RM allocating all containers to NODE_LOCAL as shown in logs below.
> # So since I was having about 500 containers time taken was about – 6 minutes 
> to allocate 1st map after AM allocation.
> # Tested with about 1K maps using PI job took 17 minutes to allocate  next 
> container after AM allocation
> Once 500 container allocation on NODE_LOCAL is done the next container 
> allocation is done on OFF_SWITCH
> {code}
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> /default-rack, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: *, Relax 
> Locality: true, Node Label Expression: 3}
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> host-10-19-92-143, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> host-10-19-92-117, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> {code}
>  
> {code}
> 2015-09-09 14:35:45,467 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> 2015-09-09 14:35:45,831 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.P

[jira] [Commented] (YARN-4140) RM container allocation delayed incase of app submitted to Nodelabel partition

2015-10-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14951499#comment-14951499
 ] 

Hudson commented on YARN-4140:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #1244 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/1244/])
YARN-4140. RM container allocation delayed incase of app submitted to (wangda: 
rev def374e666ed0c1d665aeb1b7307e09769448138)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerTestBase.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestNodeLabelContainerAllocation.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/utils/BuilderUtils.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java
* hadoop-yarn-project/CHANGES.txt


> RM container allocation delayed incase of app submitted to Nodelabel partition
> --
>
> Key: YARN-4140
> URL: https://issues.apache.org/jira/browse/YARN-4140
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-4140.patch, 0002-YARN-4140.patch, 
> 0003-YARN-4140.patch, 0004-YARN-4140.patch, 0005-YARN-4140.patch, 
> 0006-YARN-4140.patch, 0007-YARN-4140.patch, 0008-YARN-4140.patch, 
> 0009-YARN-4140.patch, 0010-YARN-4140.patch, 0011-YARN-4140.patch, 
> 0012-YARN-4140.patch, 0013-YARN-4140.patch, 0014-YARN-4140.patch
>
>
> Trying to run application on Nodelabel partition I  found that the 
> application execution time is delayed by 5 – 10 min for 500 containers . 
> Total 3 machines 2 machines were in same partition and app submitted to same.
> After enabling debug was able to find the below
> # From AM the container ask is for OFF-SWITCH
> # RM allocating all containers to NODE_LOCAL as shown in logs below.
> # So since I was having about 500 containers time taken was about – 6 minutes 
> to allocate 1st map after AM allocation.
> # Tested with about 1K maps using PI job took 17 minutes to allocate  next 
> container after AM allocation
> Once 500 container allocation on NODE_LOCAL is done the next container 
> allocation is done on OFF_SWITCH
> {code}
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> /default-rack, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: *, Relax 
> Locality: true, Node Label Expression: 3}
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> host-10-19-92-143, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> host-10-19-92-117, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> {code}
>  
> {code}
> 2015-09-09 14:35:45,467 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> 2015-09-09 14:35:45,831 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.Paren

[jira] [Commented] (YARN-4041) Slow delegation token renewal can severely prolong RM recovery

2015-10-09 Thread Bob (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14951510#comment-14951510
 ] 

Bob commented on YARN-4041:
---

Hi [~sunilg], any update or idea on this issue?

> Slow delegation token renewal can severely prolong RM recovery
> --
>
> Key: YARN-4041
> URL: https://issues.apache.org/jira/browse/YARN-4041
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Sunil G
> Attachments: 0001-YARN-4041.patch
>
>
> When the RM does a work-preserving restart it synchronously tries to renew 
> delegation tokens for every active application.  If a token server happens to 
> be down or is running slow and a lot of the active apps were using tokens 
> from that server then it can have a huge impact on the time it takes the RM 
> to process the restart.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4140) RM container allocation delayed incase of app submitted to Nodelabel partition

2015-10-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14951525#comment-14951525
 ] 

Hudson commented on YARN-4140:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #518 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/518/])
YARN-4140. RM container allocation delayed incase of app submitted to (wangda: 
rev def374e666ed0c1d665aeb1b7307e09769448138)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/utils/BuilderUtils.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerTestBase.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestNodeLabelContainerAllocation.java


> RM container allocation delayed incase of app submitted to Nodelabel partition
> --
>
> Key: YARN-4140
> URL: https://issues.apache.org/jira/browse/YARN-4140
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-4140.patch, 0002-YARN-4140.patch, 
> 0003-YARN-4140.patch, 0004-YARN-4140.patch, 0005-YARN-4140.patch, 
> 0006-YARN-4140.patch, 0007-YARN-4140.patch, 0008-YARN-4140.patch, 
> 0009-YARN-4140.patch, 0010-YARN-4140.patch, 0011-YARN-4140.patch, 
> 0012-YARN-4140.patch, 0013-YARN-4140.patch, 0014-YARN-4140.patch
>
>
> Trying to run application on Nodelabel partition I  found that the 
> application execution time is delayed by 5 – 10 min for 500 containers . 
> Total 3 machines 2 machines were in same partition and app submitted to same.
> After enabling debug was able to find the below
> # From AM the container ask is for OFF-SWITCH
> # RM allocating all containers to NODE_LOCAL as shown in logs below.
> # So since I was having about 500 containers time taken was about – 6 minutes 
> to allocate 1st map after AM allocation.
> # Tested with about 1K maps using PI job took 17 minutes to allocate  next 
> container after AM allocation
> Once 500 container allocation on NODE_LOCAL is done the next container 
> allocation is done on OFF_SWITCH
> {code}
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> /default-rack, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: *, Relax 
> Locality: true, Node Label Expression: 3}
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> host-10-19-92-143, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> host-10-19-92-117, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> {code}
>  
> {code}
> 2015-09-09 14:35:45,467 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> 2015-09-09 14:35:45,831 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capa

[jira] [Commented] (YARN-4140) RM container allocation delayed incase of app submitted to Nodelabel partition

2015-10-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14951544#comment-14951544
 ] 

Hudson commented on YARN-4140:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #507 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/507/])
YARN-4140. RM container allocation delayed incase of app submitted to (wangda: 
rev def374e666ed0c1d665aeb1b7307e09769448138)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/utils/BuilderUtils.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestNodeLabelContainerAllocation.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerTestBase.java
* hadoop-yarn-project/CHANGES.txt


> RM container allocation delayed incase of app submitted to Nodelabel partition
> --
>
> Key: YARN-4140
> URL: https://issues.apache.org/jira/browse/YARN-4140
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-4140.patch, 0002-YARN-4140.patch, 
> 0003-YARN-4140.patch, 0004-YARN-4140.patch, 0005-YARN-4140.patch, 
> 0006-YARN-4140.patch, 0007-YARN-4140.patch, 0008-YARN-4140.patch, 
> 0009-YARN-4140.patch, 0010-YARN-4140.patch, 0011-YARN-4140.patch, 
> 0012-YARN-4140.patch, 0013-YARN-4140.patch, 0014-YARN-4140.patch
>
>
> While trying to run an application on a Nodelabel partition, I found that the 
> application execution time is delayed by 5 to 10 minutes for 500 containers. 
> Of 3 machines in total, 2 were in the same partition and the app was submitted to it.
> After enabling debug logging I was able to find the following:
> # From the AM, the container ask is for OFF_SWITCH.
> # The RM allocates all containers as NODE_LOCAL, as shown in the logs below.
> # Since I had about 500 containers, it took about 6 minutes 
> to allocate the 1st map after the AM allocation.
> # Tested with about 1K maps using a PI job; it took 17 minutes to allocate the next 
> container after the AM allocation.
> Only once 500 containers have been allocated as NODE_LOCAL is the next container 
> allocation made as OFF_SWITCH.
> {code}
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> /default-rack, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: *, Relax 
> Locality: true, Node Label Expression: 3}
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> host-10-19-92-143, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> host-10-19-92-117, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> {code}
>  
> {code}
> 2015-09-09 14:35:45,467 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> 2015-09-09 14:35:45,831 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.sche

[jira] [Commented] (YARN-4140) RM container allocation delayed incase of app submitted to Nodelabel partition

2015-10-09 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14951554#comment-14951554
 ] 

Bibin A Chundatt commented on YARN-4140:


Thank you [~leftnoteasy] for reviewing and committing.

> RM container allocation delayed incase of app submitted to Nodelabel partition
> --
>
> Key: YARN-4140
> URL: https://issues.apache.org/jira/browse/YARN-4140
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-4140.patch, 0002-YARN-4140.patch, 
> 0003-YARN-4140.patch, 0004-YARN-4140.patch, 0005-YARN-4140.patch, 
> 0006-YARN-4140.patch, 0007-YARN-4140.patch, 0008-YARN-4140.patch, 
> 0009-YARN-4140.patch, 0010-YARN-4140.patch, 0011-YARN-4140.patch, 
> 0012-YARN-4140.patch, 0013-YARN-4140.patch, 0014-YARN-4140.patch
>
>
> While trying to run an application on a Nodelabel partition, I found that the 
> application execution time is delayed by 5 to 10 minutes for 500 containers. 
> Of 3 machines in total, 2 were in the same partition and the app was submitted to it.
> After enabling debug logging I was able to find the following:
> # From the AM, the container ask is for OFF_SWITCH.
> # The RM allocates all containers as NODE_LOCAL, as shown in the logs below.
> # Since I had about 500 containers, it took about 6 minutes 
> to allocate the 1st map after the AM allocation.
> # Tested with about 1K maps using a PI job; it took 17 minutes to allocate the next 
> container after the AM allocation.
> Only once 500 containers have been allocated as NODE_LOCAL is the next container 
> allocation made as OFF_SWITCH.
> {code}
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> /default-rack, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: *, Relax 
> Locality: true, Node Label Expression: 3}
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> host-10-19-92-143, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> host-10-19-92-117, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> {code}
>  
> {code}
> 2015-09-09 14:35:45,467 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> 2015-09-09 14:35:45,831 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> 2015-09-09 14:35:46,469 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> 2015-09-09 14:35:46,832 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> {code}
> {code}
> dsperf@host-127:/opt/bibin/dsperf/HAINSTALL/install/hadoop/resourcemanager/logs1>
>  cat hadoop-dsperf-resourcemanager-host-127.log | grep "NODE_LOCAL" | grep 
> "root.b.b1" | wc -l
> 500
> {code}
>  
> (Consumes about 6 minutes)
>  
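As an aside for readers following the description above: the ask it describes maps directly to the AMRMClient API. Below is a minimal, illustrative sketch (not taken from the attached patches) of how an AM issues exactly this kind of request: no node or rack constraint (an OFF_SWITCH ask), relaxLocality=true, and a node label expression. The capability, priority and label value "3" simply mirror the numbers in the quoted logs; AM registration and the allocate() heartbeat loop are omitted.

{code}
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class LabeledContainerAsk {
  public static void main(String[] args) throws Exception {
    AMRMClient<ContainerRequest> amRMClient = AMRMClient.createAMRMClient();
    amRMClient.init(new YarnConfiguration());
    amRMClient.start();
    // registerApplicationMaster(...) and the allocate() loop are omitted here.

    // Illustrative capability and priority; the quoted logs show 500 such asks.
    Resource capability = Resource.newInstance(1024, 1);
    Priority priority = Priority.newInstance(20);

    // No nodes/racks (an OFF_SWITCH ask), relaxLocality=true, and the node
    // label expression "3" from the quoted request dump.
    ContainerRequest ask =
        new ContainerRequest(capability, null, null, priority, true, "3");
    amRMClient.addContainerRequest(ask);
  }
}
{code}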



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-4022) queue not remove from webpage(/cluster/scheduler) when delete queue in xxx-scheduler.xml

2015-10-09 Thread forrestchen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14951560#comment-14951560
 ] 

forrestchen commented on YARN-4022:
---

Could anyone help review this issue please?

> queue not remove from webpage(/cluster/scheduler) when delete queue in 
> xxx-scheduler.xml
> 
>
> Key: YARN-4022
> URL: https://issues.apache.org/jira/browse/YARN-4022
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: forrestchen
>  Labels: scheduler
> Attachments: YARN-4022.001.patch, YARN-4022.002.patch, 
> YARN-4022.003.patch, YARN-4022.004.patch
>
>
> When I delete an existing queue by modifying the xxx-scheduler.xml, I can still 
> see the queue's information block on the web page (/cluster/scheduler), though the 
> 'Min Resources' items all become zero and there is no 'Max Running 
> Applications' item.
> I can still submit an application to the deleted queue and the application 
> will run in the 'root.default' queue instead, but submitting to a queue that never 
> existed causes an exception.
> My expectation is that the deleted queue will not be displayed on the web page, and that 
> submitting an application to the deleted queue will behave just as if the queue didn't exist.
> PS: There's no application running in the queue I deleted.
> Some related config in yarn-site.xml:
> {code}
> 
> yarn.scheduler.fair.user-as-default-queue
> false
> 
> 
> yarn.scheduler.fair.allow-undeclared-pools
> false
> 
> {code}
> a related question is here: 
> http://stackoverflow.com/questions/26488564/hadoop-yarn-why-the-queue-cannot-be-deleted-after-i-revise-my-fair-scheduler-xm
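Not part of the attached patches, but as a minimal sketch of how one can check from the client side which queues the RM still reports after editing fair-scheduler.xml (the class name and printed fields are illustrative):

{code}
import java.util.List;
import org.apache.hadoop.yarn.api.records.QueueInfo;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class ListReportedQueues {
  public static void main(String[] args) throws Exception {
    YarnClient yarnClient = YarnClient.createYarnClient();
    yarnClient.init(new YarnConfiguration());
    yarnClient.start();

    // Every queue the RM still reports; per this issue, a queue removed from
    // fair-scheduler.xml may still show up here and on /cluster/scheduler.
    List<QueueInfo> queues = yarnClient.getAllQueues();
    for (QueueInfo queue : queues) {
      System.out.println(queue.getQueueName()
          + " currentCapacity=" + queue.getCurrentCapacity());
    }

    yarnClient.stop();
  }
}
{code}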



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4140) RM container allocation delayed incase of app submitted to Nodelabel partition

2015-10-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14951561#comment-14951561
 ] 

Hudson commented on YARN-4140:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2452 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2452/])
YARN-4140. RM container allocation delayed incase of app submitted to (wangda: 
rev def374e666ed0c1d665aeb1b7307e09769448138)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerTestBase.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestNodeLabelContainerAllocation.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/utils/BuilderUtils.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java


> RM container allocation delayed incase of app submitted to Nodelabel partition
> --
>
> Key: YARN-4140
> URL: https://issues.apache.org/jira/browse/YARN-4140
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-4140.patch, 0002-YARN-4140.patch, 
> 0003-YARN-4140.patch, 0004-YARN-4140.patch, 0005-YARN-4140.patch, 
> 0006-YARN-4140.patch, 0007-YARN-4140.patch, 0008-YARN-4140.patch, 
> 0009-YARN-4140.patch, 0010-YARN-4140.patch, 0011-YARN-4140.patch, 
> 0012-YARN-4140.patch, 0013-YARN-4140.patch, 0014-YARN-4140.patch
>
>
> While trying to run an application on a Nodelabel partition, I found that the 
> application execution time is delayed by 5 to 10 minutes for 500 containers. 
> Of 3 machines in total, 2 were in the same partition and the app was submitted to it.
> After enabling debug logging I was able to find the following:
> # From the AM, the container ask is for OFF_SWITCH.
> # The RM allocates all containers as NODE_LOCAL, as shown in the logs below.
> # Since I had about 500 containers, it took about 6 minutes 
> to allocate the 1st map after the AM allocation.
> # Tested with about 1K maps using a PI job; it took 17 minutes to allocate the next 
> container after the AM allocation.
> Only once 500 containers have been allocated as NODE_LOCAL is the next container 
> allocation made as OFF_SWITCH.
> {code}
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> /default-rack, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: *, Relax 
> Locality: true, Node Label Expression: 3}
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> host-10-19-92-143, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> host-10-19-92-117, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> {code}
>  
> {code}
> 2015-09-09 14:35:45,467 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> 2015-09-09 14:35:45,831 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capa

[jira] [Commented] (YARN-4140) RM container allocation delayed incase of app submitted to Nodelabel partition

2015-10-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14951565#comment-14951565
 ] 

Hudson commented on YARN-4140:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #479 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/479/])
YARN-4140. RM container allocation delayed incase of app submitted to (wangda: 
rev def374e666ed0c1d665aeb1b7307e09769448138)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/utils/BuilderUtils.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestNodeLabelContainerAllocation.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerTestBase.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java


> RM container allocation delayed incase of app submitted to Nodelabel partition
> --
>
> Key: YARN-4140
> URL: https://issues.apache.org/jira/browse/YARN-4140
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-4140.patch, 0002-YARN-4140.patch, 
> 0003-YARN-4140.patch, 0004-YARN-4140.patch, 0005-YARN-4140.patch, 
> 0006-YARN-4140.patch, 0007-YARN-4140.patch, 0008-YARN-4140.patch, 
> 0009-YARN-4140.patch, 0010-YARN-4140.patch, 0011-YARN-4140.patch, 
> 0012-YARN-4140.patch, 0013-YARN-4140.patch, 0014-YARN-4140.patch
>
>
> While trying to run an application on a Nodelabel partition, I found that the 
> application execution time is delayed by 5 to 10 minutes for 500 containers. 
> Of 3 machines in total, 2 were in the same partition and the app was submitted to it.
> After enabling debug logging I was able to find the following:
> # From the AM, the container ask is for OFF_SWITCH.
> # The RM allocates all containers as NODE_LOCAL, as shown in the logs below.
> # Since I had about 500 containers, it took about 6 minutes 
> to allocate the 1st map after the AM allocation.
> # Tested with about 1K maps using a PI job; it took 17 minutes to allocate the next 
> container after the AM allocation.
> Only once 500 containers have been allocated as NODE_LOCAL is the next container 
> allocation made as OFF_SWITCH.
> {code}
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> /default-rack, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: *, Relax 
> Locality: true, Node Label Expression: 3}
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> host-10-19-92-143, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> host-10-19-92-117, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> {code}
>  
> {code}
> 2015-09-09 14:35:45,467 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> 2015-09-09 14:35:45,831 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capa

[jira] [Created] (YARN-4249) Many options in "yarn application" command is not documents

2015-10-09 Thread nijel (JIRA)
nijel created YARN-4249:
---

 Summary: Many options in "yarn application" command is not 
documents
 Key: YARN-4249
 URL: https://issues.apache.org/jira/browse/YARN-4249
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: nijel
Assignee: nijel


In the document only a few options are specified.
{code}
Usage: `yarn application [options] `

| COMMAND\_OPTIONS | Description |
|: |: |
| -appStates \<States\> | Works with -list to filter applications based on 
input comma-separated list of application states. The valid application state 
can be one of the following:  ALL, NEW, NEW\_SAVING, SUBMITTED, ACCEPTED, 
RUNNING, FINISHED, FAILED, KILLED |
| -appTypes \<Types\> | Works with -list to filter applications based on input 
comma-separated list of application types. |
| -list | Lists applications from the RM. Supports optional use of -appTypes to 
filter applications based on application type, and -appStates to filter 
applications based on application state. |
| -kill \<ApplicationId\> | Kills the application. |
| -status \<ApplicationId\> | Prints the status of the application. |
{code}


Some options are missing, for example:
-appId            Specify Application Id to be operated
-help             Displays help for all commands.
-movetoqueue      Moves the application to a different queue.
-queue            Works with the movetoqueue command to specify 
which queue to move an application to.
-updatePriority   update priority of an application. ApplicationId 
can be passed using 'appId' option.
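As a hedged illustration only (not part of this JIRA), the undocumented -movetoqueue and -queue options correspond to the YarnClient#moveApplicationAcrossQueues API; the application id and target queue below are placeholders:

{code}
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class MoveToQueueExample {
  public static void main(String[] args) throws Exception {
    YarnClient yarnClient = YarnClient.createYarnClient();
    yarnClient.init(new YarnConfiguration());
    yarnClient.start();

    // Programmatic equivalent of:
    //   yarn application -movetoqueue <Application ID> -queue <Queue Name>
    // Placeholder cluster timestamp, id and queue name.
    ApplicationId appId = ApplicationId.newInstance(1441791998224L, 1);
    yarnClient.moveApplicationAcrossQueues(appId, "root.default");

    yarnClient.stop();
  }
}
{code}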




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4249) Many options in "yarn application" command is not documented

2015-10-09 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel updated YARN-4249:

Summary: Many options in "yarn application" command is not documented  
(was: Many options in "yarn application" command is not documents)

> Many options in "yarn application" command is not documented
> 
>
> Key: YARN-4249
> URL: https://issues.apache.org/jira/browse/YARN-4249
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: nijel
>Assignee: nijel
>
> In the document only a few options are specified.
> {code}
> Usage: `yarn application [options] `
> | COMMAND\_OPTIONS | Description |
> |: |: |
> | -appStates \<States\> | Works with -list to filter applications based on 
> input comma-separated list of application states. The valid application state 
> can be one of the following:  ALL, NEW, NEW\_SAVING, SUBMITTED, ACCEPTED, 
> RUNNING, FINISHED, FAILED, KILLED |
> | -appTypes \<Types\> | Works with -list to filter applications based on 
> input comma-separated list of application types. |
> | -list | Lists applications from the RM. Supports optional use of -appTypes 
> to filter applications based on application type, and -appStates to filter 
> applications based on application state. |
> | -kill \<ApplicationId\> | Kills the application. |
> | -status \<ApplicationId\> | Prints the status of the application. |
> {code}
> Some options are missing, for example:
> -appId            Specify Application Id to be operated
> -help             Displays help for all commands.
> -movetoqueue      Moves the application to a different queue.
> -queue            Works with the movetoqueue command to specify 
> which queue to move an application to.
> -updatePriority   update priority of an 
> application. ApplicationId can be passed using 'appId' option.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4140) RM container allocation delayed incase of app submitted to Nodelabel partition

2015-10-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14951658#comment-14951658
 ] 

Hudson commented on YARN-4140:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2417 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2417/])
YARN-4140. RM container allocation delayed incase of app submitted to (wangda: 
rev def374e666ed0c1d665aeb1b7307e09769448138)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/utils/BuilderUtils.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestNodeLabelContainerAllocation.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerTestBase.java
* hadoop-yarn-project/CHANGES.txt


> RM container allocation delayed incase of app submitted to Nodelabel partition
> --
>
> Key: YARN-4140
> URL: https://issues.apache.org/jira/browse/YARN-4140
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-4140.patch, 0002-YARN-4140.patch, 
> 0003-YARN-4140.patch, 0004-YARN-4140.patch, 0005-YARN-4140.patch, 
> 0006-YARN-4140.patch, 0007-YARN-4140.patch, 0008-YARN-4140.patch, 
> 0009-YARN-4140.patch, 0010-YARN-4140.patch, 0011-YARN-4140.patch, 
> 0012-YARN-4140.patch, 0013-YARN-4140.patch, 0014-YARN-4140.patch
>
>
> While trying to run an application on a Nodelabel partition, I found that the 
> application execution time is delayed by 5 to 10 minutes for 500 containers. 
> Of 3 machines in total, 2 were in the same partition and the app was submitted to it.
> After enabling debug logging I was able to find the following:
> # From the AM, the container ask is for OFF_SWITCH.
> # The RM allocates all containers as NODE_LOCAL, as shown in the logs below.
> # Since I had about 500 containers, it took about 6 minutes 
> to allocate the 1st map after the AM allocation.
> # Tested with about 1K maps using a PI job; it took 17 minutes to allocate the next 
> container after the AM allocation.
> Only once 500 containers have been allocated as NODE_LOCAL is the next container 
> allocation made as OFF_SWITCH.
> {code}
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> /default-rack, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: *, Relax 
> Locality: true, Node Label Expression: 3}
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> host-10-19-92-143, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> host-10-19-92-117, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> {code}
>  
> {code}
> 2015-09-09 14:35:45,467 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> 2015-09-09 14:35:45,831 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.Paren