[jira] [Updated] (YARN-2962) ZKRMStateStore: Limit the number of znodes under a znode

2015-12-18 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2962:
---
Hadoop Flags: Incompatible change

> ZKRMStateStore: Limit the number of znodes under a znode
> 
>
> Key: YARN-2962
> URL: https://issues.apache.org/jira/browse/YARN-2962
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Karthik Kambatla
>Assignee: Varun Saxena
>Priority: Critical
> Attachments: YARN-2962.01.patch, YARN-2962.2.patch, YARN-2962.3.patch
>
>
> We ran into this issue when we hit the default ZK server message size 
> limits, primarily because the message contained too many znodes even though 
> individually they were all small.
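For illustration only, a minimal sketch (hypothetical class and bucketing scheme, not the attached patches) of one possible way to keep the number of children per znode bounded: spread application znodes across intermediate bucket parents so a single getChildren() response stays well under the ZooKeeper packet limit (jute.maxbuffer, roughly 1 MB by default).

{code:java}
// Hypothetical sketch, not ZKRMStateStore code: bucket app znodes under parents.
import java.util.List;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class BucketedAppStore {
  private static final int BUCKETS = 100;
  private final ZooKeeper zk;
  private final String root; // e.g. "/rmstore/ZKRMStateRoot/RMAppRoot" (assumed layout)

  public BucketedAppStore(ZooKeeper zk, String root) {
    this.zk = zk;
    this.root = root;
  }

  // Derive a bucket parent from the application id.
  private String bucketPath(String appId) {
    return root + "/bucket-" + (Math.abs(appId.hashCode()) % BUCKETS);
  }

  public void storeApp(String appId, byte[] state)
      throws KeeperException, InterruptedException {
    String parent = bucketPath(appId);
    if (zk.exists(parent, false) == null) {
      zk.create(parent, new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
    }
    // Each bucket holds only roughly 1/BUCKETS of the application znodes.
    zk.create(parent + "/" + appId, state, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
  }

  // Listing children per bucket keeps every single response small.
  public List<String> listBucket(int bucket)
      throws KeeperException, InterruptedException {
    return zk.getChildren(root + "/bucket-" + bucket, false);
  }
}
{code}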



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-4488) CapacityScheduler: Compute per-container allocation latency and roll up to get per-application and per-queue

2015-12-18 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan reassigned YARN-4488:


Assignee: Wangda Tan

> CapacityScheduler: Compute per-container allocation latency and roll up to 
> get per-application and per-queue
> 
>
> Key: YARN-4488
> URL: https://issues.apache.org/jira/browse/YARN-4488
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Karthik Kambatla
>Assignee: Wangda Tan
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-4485) Capture per-application and per-queue container allocation latency

2015-12-18 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla reassigned YARN-4485:
--

Assignee: (was: Karthik Kambatla)

Leaving the umbrella JIRA unassigned. 

> Capture per-application and per-queue container allocation latency
> --
>
> Key: YARN-4485
> URL: https://issues.apache.org/jira/browse/YARN-4485
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.7.1
>Reporter: Karthik Kambatla
>  Labels: supportability, tuning
>
> Per-application and per-queue container allocation latencies would go a long 
> way towards helping tune scheduler queue configs. 
> This umbrella JIRA tracks adding these metrics. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4485) [Umbrella] Capture per-application and per-queue container allocation latency

2015-12-18 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-4485:
---
Summary: [Umbrella] Capture per-application and per-queue container 
allocation latency  (was: Capture per-application and per-queue container 
allocation latency)

> [Umbrella] Capture per-application and per-queue container allocation latency
> -
>
> Key: YARN-4485
> URL: https://issues.apache.org/jira/browse/YARN-4485
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.7.1
>Reporter: Karthik Kambatla
>  Labels: supportability, tuning
>
> Per-application and per-queue container allocation latencies would go a long 
> way towards helping tune scheduler queue configs. 
> This umbrella JIRA tracks adding these metrics. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4488) CapacityScheduler: Compute per-container allocation latency and roll up to get per-application and per-queue

2015-12-18 Thread Karthik Kambatla (JIRA)
Karthik Kambatla created YARN-4488:
--

 Summary: CapacityScheduler: Compute per-container allocation 
latency and roll up to get per-application and per-queue
 Key: YARN-4488
 URL: https://issues.apache.org/jira/browse/YARN-4488
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Karthik Kambatla






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4487) FairScheduler: Compute per-container allocation latency and roll up to get per-application and per-queue

2015-12-18 Thread Karthik Kambatla (JIRA)
Karthik Kambatla created YARN-4487:
--

 Summary: FairScheduler: Compute per-container allocation latency 
and roll up to get per-application and per-queue
 Key: YARN-4487
 URL: https://issues.apache.org/jira/browse/YARN-4487
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4486) Add requestTime to ResourceRequest

2015-12-18 Thread Karthik Kambatla (JIRA)
Karthik Kambatla created YARN-4486:
--

 Summary: Add requestTime to ResourceRequest
 Key: YARN-4486
 URL: https://issues.apache.org/jira/browse/YARN-4486
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla


Add a field requestTime to ResourceRequest. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4485) Capture per-application and per-queue container allocation latency

2015-12-18 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065253#comment-15065253
 ] 

Karthik Kambatla commented on YARN-4485:


One potential approach is to add requestTime to ResourceRequest and use it to 
compute the difference at allocation time to get per-container allocation 
latency. This could be optionally rolled up at application and queue level. 
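For illustration only, a minimal sketch of that idea (hypothetical class, not YARN scheduler code): stamp each request, compute the per-container latency at allocation time, and keep running per-application and per-queue aggregates.

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class AllocationLatencyTracker {
  // value[0] = allocated container count, value[1] = cumulative latency in ms
  private final Map<String, long[]> perApp = new ConcurrentHashMap<>();
  private final Map<String, long[]> perQueue = new ConcurrentHashMap<>();

  // Called when a container is allocated against a request stamped at requestTimeMillis.
  public void onAllocation(String appId, String queue, long requestTimeMillis) {
    long latency = System.currentTimeMillis() - requestTimeMillis;
    record(perApp, appId, latency);
    record(perQueue, queue, latency);
  }

  private static void record(Map<String, long[]> stats, String key, long latency) {
    stats.compute(key, (k, v) -> {
      if (v == null) {
        v = new long[2];
      }
      v[0]++;          // containers allocated
      v[1] += latency; // cumulative latency
      return v;
    });
  }

  public double appAverageMillis(String appId) {
    return average(perApp.get(appId));
  }

  public double queueAverageMillis(String queue) {
    return average(perQueue.get(queue));
  }

  private static double average(long[] v) {
    return (v == null || v[0] == 0) ? 0.0 : (double) v[1] / v[0];
  }
}
{code}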

> Capture per-application and per-queue container allocation latency
> --
>
> Key: YARN-4485
> URL: https://issues.apache.org/jira/browse/YARN-4485
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.7.1
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>  Labels: supportability, tuning
>
> Per-application and per-queue container allocation latencies would go a long 
> way towards helping tune scheduler queue configs. 
> This umbrella JIRA tracks adding these metrics. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4485) Capture per-application and per-queue container allocation latency

2015-12-18 Thread Karthik Kambatla (JIRA)
Karthik Kambatla created YARN-4485:
--

 Summary: Capture per-application and per-queue container 
allocation latency
 Key: YARN-4485
 URL: https://issues.apache.org/jira/browse/YARN-4485
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.7.1
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla


Per-application and per-queue container allocation latencies would go a long 
way towards helping tune scheduler queue configs. 

This umbrella JIRA tracks adding these metrics. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4032) Corrupted state from a previous version can still cause RM to fail with NPE due to same reasons as YARN-2834

2015-12-18 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065249#comment-15065249
 ] 

Karthik Kambatla commented on YARN-4032:


[~jianhe] - are you working on this? If not, I would like to take this up. 

> Corrupted state from a previous version can still cause RM to fail with NPE 
> due to same reasons as YARN-2834
> 
>
> Key: YARN-4032
> URL: https://issues.apache.org/jira/browse/YARN-4032
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Anubhav Dhoot
>Assignee: Jian He
>Priority: Critical
> Attachments: YARN-4032.prelim.patch
>
>
> YARN-2834 ensures in 2.6.0 there will not be any inconsistent state. But if 
> someone is upgrading from a previous version, the state can still be 
> inconsistent and then RM will still fail with NPE after upgrade to 2.6.0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4290) "yarn nodes -list" should print all nodes reports information

2015-12-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065243#comment-15065243
 ] 

Hadoop QA commented on YARN-4290:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 
16s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 21s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 22s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
10s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 28s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
39s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
26s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
11s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 30s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
18s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
51s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 64m 39s {color} 
| {color:red} hadoop-yarn-client in the patch failed with JDK v1.8.0_66. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 64m 40s {color} 
| {color:red} hadoop-yarn-client in the patch failed with JDK v1.7.0_91. 
{color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 23s 
{color} | {color:red} Patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 146m 56s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | hadoop.yarn.client.TestGetGroups |
| JDK v1.8.0_66 Timed out junit tests | 
org.apache.hadoop.yarn.client.cli.TestYarnCLI |
|   | org.apache.hadoop.yarn.client.api.impl.TestYarnClient |
|   | org.apache.hadoop.yarn.client.api.impl.TestAMRMClient |
|   | org.apache.hadoop.yarn.client.api.impl.TestNMClient |
| JDK v1.7.0_91 Failed junit tests | hadoop.yarn.client.TestGetGroups |
| JDK v1.7.0_91 Timed out junit tests | 
org.apache.hadoop.yarn.client.cli.TestYarnCLI |
|   | org.apache.hadoop.yarn.client.api.impl.TestYarnClient |
|   | org.apache.hadoop.yarn.client.api.impl.TestAMRMClient |
|   | org.apache.hadoop.yarn.cli

[jira] [Commented] (YARN-4480) Clean up some inappropriate imports

2015-12-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065242#comment-15065242
 ] 

Hadoop QA commented on YARN-4480:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
4s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 59s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 34s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
3s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 27s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
29s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 3s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 31s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 17s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
20s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 14s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 9m 14s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 36s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 9m 36s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
3s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 28s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
29s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 
19s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 29s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 16s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 67m 22s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 58s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 67m 27s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 26s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.7.0_91. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 26s 
{color} | {color:red} Patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 223m 5s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | 
hadoop.hdfs.TestReadStripedFileWithMissingBlocks |
|   | hadoop.hdfs.web.TestWebHDFS |
|   | hadoop.hdfs.server.datanode.TestDataNodeMetrics |
|   | hadoop.hdfs.server.datanode.TestBlockScanner |
|   | hadoop.hdfs.server.namenode.ha.TestSeveralNameNodes |
| JDK v1.7.0_91 Faile

[jira] [Commented] (YARN-110) AM releases too many containers due to the protocol

2015-12-18 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065236#comment-15065236
 ] 

Karthik Kambatla commented on YARN-110:
---

Got it. I see the value in fixing it. Any proposals on how to fix it? 

> AM releases too many containers due to the protocol
> ---
>
> Key: YARN-110
> URL: https://issues.apache.org/jira/browse/YARN-110
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, scheduler
>Reporter: Arun C Murthy
>Assignee: Arun C Murthy
> Attachments: YARN-110.patch
>
>
> - AM sends request asking 4 containers on host H1.
> - Asynchronously, host H1 reaches RM and gets assigned 4 containers. RM at 
> this point, sets the value against H1 to
> zero in its aggregate request-table for all apps.
> - In the mean-while AM gets to need 3 more containers, so a total of 7 
> including the 4 from previous request.
> - Today, AM sends the absolute number of 7 against H1 to RM as part of its 
> request table.
> - RM seems to be overriding its earlier value of zero against H1 to 7 against 
> H1. And thus allocating 7 more
> containers.
> - AM already gets 4 in this scheduling iteration, but gets 7 more, a total of 
> 11 instead of the required 7.
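For illustration only, a toy model of the race described above (hypothetical code, not the AM/RM protocol classes): the AM reports absolute outstanding counts, the RM zeroes its table after allocating, and the next absolute update is treated as a fresh ask.

{code:java}
import java.util.HashMap;
import java.util.Map;

public class AskTableRace {
  public static void main(String[] args) {
    Map<String, Integer> rmAskTable = new HashMap<>();
    int allocated = 0;

    rmAskTable.put("H1", 4);              // AM asks for 4 containers on H1
    allocated += rmAskTable.put("H1", 0); // RM allocates 4 and zeroes the entry

    // Meanwhile the AM needs 3 more and, unaware of the allocation,
    // sends the absolute outstanding total of 7 for H1.
    rmAskTable.put("H1", 7);
    allocated += rmAskTable.put("H1", 0); // RM allocates 7 more

    System.out.println("AM wanted 7, got " + allocated); // prints 11
  }
}
{code}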



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4476) Matcher for complex node label expressions

2015-12-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065233#comment-15065233
 ] 

Hadoop QA commented on YARN-4476:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
49s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 36s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
12s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
33s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 30s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 12s 
{color} | {color:red} Patch generated 121 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 (total was 0, now 121). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 36s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
20s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 22s 
{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed 
with JDK v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 2m 47s 
{color} | {color:red} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.7.0_91
 with JDK v1.7.0_91 generated 2 new issues (was 2, now 4). {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 59m 2s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 60m 28s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_91. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 24s 
{color} | {color:red} Patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 137m 28s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestAMAuthorization |
|   | hadoop.yarn.server.resourcemanager.scheduler.fifo.TestFifoScheduler |
|   | hadoop.yarn.server.resour

[jira] [Commented] (YARN-4476) Matcher for complex node label expressions

2015-12-18 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065210#comment-15065210
 ] 

Chris Douglas commented on YARN-4476:
-

bq. do you think is it better to place this module to 
org.apache.hadoop.yarn.server.resourcemanager.nodelabels.matcher (or evaluator) 
for better organization?

I thought about it, but:
# It's a (potential) internal detail of the node label implementation, with 
other classes in the package
# The {{nodelabel}} package is sparse right now
# None of these classes are user-facing, so they're easy to move

So I put it in the {{nodelabels}} package, but don't have a strong opinion.

> Matcher for complex node label expressions
> -
>
> Key: YARN-4476
> URL: https://issues.apache.org/jira/browse/YARN-4476
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Chris Douglas
>Assignee: Chris Douglas
> Attachments: YARN-4476-0.patch, YARN-4476-1.patch
>
>
> Implementation of a matcher for complex node label expressions based on a 
> [paper|http://dl.acm.org/citation.cfm?id=1807171] from SIGMOD 2010.
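For illustration only, a much-simplified sketch of what such a matcher does (hypothetical toy code, unrelated to the algorithm in the attached patches, which follows the SIGMOD 2010 paper): evaluate a boolean combination of labels against a node's label set.

{code:java}
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.function.Predicate;

public final class LabelExpr {
  // A label expression is just a predicate over a node's label set.
  public static Predicate<Set<String>> label(String name) {
    return labels -> labels.contains(name);
  }

  public static void main(String[] args) {
    // (gpu && ssd) || fpga
    Predicate<Set<String>> expr =
        label("gpu").and(label("ssd")).or(label("fpga"));
    System.out.println(expr.test(new HashSet<>(Arrays.asList("gpu", "ssd")))); // true
    System.out.println(expr.test(new HashSet<>(Arrays.asList("fpga"))));       // true
    System.out.println(expr.test(new HashSet<>(Arrays.asList("gpu"))));        // false
  }
}
{code}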



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (YARN-4476) Matcher for complex node label expressions

2015-12-18 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065210#comment-15065210
 ] 

Chris Douglas edited comment on YARN-4476 at 12/19/15 3:40 AM:
---

bq. do you think is it better to place this module to 
org.apache.hadoop.yarn.server.resourcemanager.nodelabels.matcher (or evaluator) 
for better organization?

I thought about it, but:
# It's a (potential) internal detail of the node label implementation, with 
other classes in the package
# The {{nodelabels}} package is sparse right now
# None of these classes are user-facing, so they're easy to move

So I put it in the {{nodelabels}} package, but don't have a strong opinion.


was (Author: chris.douglas):
bq. do you think is it better to place this module to 
org.apache.hadoop.yarn.server.resourcemanager.nodelabels.matcher (or evaluator) 
for better organization?

I thought about it, but:
# It's a (potential) internal detail of the node label implementation, with 
other classes in the package
# The {{nodelabel}} package is sparse right now
# None of these classes are user-facing, so they're easy to move

So I put in in the {{nodelabels}} package, but don't have a strong opinion.

> Matcher for complex node label expressions
> -
>
> Key: YARN-4476
> URL: https://issues.apache.org/jira/browse/YARN-4476
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Chris Douglas
>Assignee: Chris Douglas
> Attachments: YARN-4476-0.patch, YARN-4476-1.patch
>
>
> Implementation of a matcher for complex node label expressions based on a 
> [paper|http://dl.acm.org/citation.cfm?id=1807171] from SIGMOD 2010.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4350) TestDistributedShell fails for V2 scenarios

2015-12-18 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065209#comment-15065209
 ] 

Naganarasimha G R commented on YARN-4350:
-

Yes, I was also thinking about adding this issue to it. Though that JIRA 
(YARN-4385?) initially mentioned the issue is not reproducible, I can track 
this issue there and try to fix it! 

> TestDistributedShell fails for V2 scenarios
> ---
>
> Key: YARN-4350
> URL: https://issues.apache.org/jira/browse/YARN-4350
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Naganarasimha G R
> Attachments: YARN-4350-feature-YARN-2928.001.patch, 
> YARN-4350-feature-YARN-2928.002.patch, YARN-4350-feature-YARN-2928.003.patch
>
>
> Currently TestDistributedShell does not pass on the feature-YARN-2928 branch. 
> There seem to be 2 distinct issues.
> (1) testDSShellWithoutDomainV2* tests fail sporadically
> These test fail more often than not if tested by themselves:
> {noformat}
> testDSShellWithoutDomainV2DefaultFlow(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell)
>   Time elapsed: 30.998 sec  <<< FAILURE!
> java.lang.AssertionError: Application created event should be published 
> atleast once expected:<1> but was:<0>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.checkTimelineV2(TestDistributedShell.java:451)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:326)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithoutDomainV2DefaultFlow(TestDistributedShell.java:207)
> {noformat}
> They start happening after YARN-4129. I suspect this might have to do with 
> some timing issue.
> (2) the whole test times out
> If you run the whole TestDistributedShell test, it times out without fail. 
> This may or may not have to do with the port change introduced by YARN-2859 
> (just a hunch).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4476) Matcher for complex node label expressions

2015-12-18 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065195#comment-15065195
 ] 

Wangda Tan commented on YARN-4476:
--

Hi [~chris.douglas],

It will be very useful to evaluate node label expressions, thanks! 

And do you think it would be better to place this module in the 
org.apache.hadoop.yarn.server.resourcemanager.nodelabels.matcher (or evaluator) 
package for better organization?

> Matcher for complex node label expressions
> -
>
> Key: YARN-4476
> URL: https://issues.apache.org/jira/browse/YARN-4476
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Chris Douglas
>Assignee: Chris Douglas
> Attachments: YARN-4476-0.patch, YARN-4476-1.patch
>
>
> Implementation of a matcher for complex node label expressions based on a 
> [paper|http://dl.acm.org/citation.cfm?id=1807171] from SIGMOD 2010.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4304) AM max resource configuration per partition to be displayed/updated correctly in UI and in various partition related metrics

2015-12-18 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065191#comment-15065191
 ] 

Wangda Tan commented on YARN-4304:
--

[~sunilg],

bq. I was slightly confused by earlier comment. Ideally we can make use of 
users, but we may need to get the first user and get his AM Limit. This is 
perfectly fine for now until we have per-user-am-limit.
Agree! And I think if there are no users in the queue, we can use the queue's 
am-limit directly (instead of "N/A" or 0, etc.).

> AM max resource configuration per partition to be displayed/updated correctly 
> in UI and in various partition related metrics
> 
>
> Key: YARN-4304
> URL: https://issues.apache.org/jira/browse/YARN-4304
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: webapp
>Affects Versions: 2.7.1
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: 0001-YARN-4304.patch, 0002-YARN-4304.patch, 
> 0003-YARN-4304.patch, 0004-YARN-4304.patch, 0005-YARN-4304.patch, 
> 0005-YARN-4304.patch, REST_and_UI.zip
>
>
> As we are supporting per-partition level max AM resource percentage 
> configuration, UI and various metrics also need to display correct 
> configurations related to same. 
> For eg: Current UI still shows am-resource percentage per queue level. This 
> is to be updated correctly when label config is used.
> - Display max-am-percentage per-partition in Scheduler UI (label also) and in 
> ClusterMetrics page
> - Update queue/partition related metrics w.r.t per-partition 
> am-resource-percentage



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4195) Support of node-labels in the ReservationSystem "Plan"

2015-12-18 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065189#comment-15065189
 ] 

Wangda Tan commented on YARN-4195:
--

Hi [~curino],

Thanks for the explanation, I understand the proposal now.

When more than one node property needs to be shared by queues with specified 
capacities, this proposal will be very useful. For example, if both PUBLICIP 
and GPU need to be shared by queues with specified capacities. 

And I'm also thinking about how a user would configure the cluster when this 
feature is enabled:
1. User adds partitions A/B
2. User configures capacities for partitions = \{A_B, A, B and \} on queues.
3. User assigns the A/B partitions to nodes
4. Submit jobs with partitions
(2/3 could be swapped) 

But if a user doesn't configure capacity for A_B and then:
- Submits a job with A_B, what should we do?  
- Assigns A/B to one node, what should we do?

And a cluster with N different "atomic" partitions could produce up to 2^N 
"actual" partitions; how can we avoid such a dimension explosion?

If any of the properties doesn't need guaranteed capacity for sharing, we can 
make it a simple constraint (YARN-3409), which will be FCFS, and arbitrary 
combinations of constraints could be supported. 

> Support of node-labels in the ReservationSystem "Plan"
> --
>
> Key: YARN-4195
> URL: https://issues.apache.org/jira/browse/YARN-4195
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Attachments: YARN-4195.patch
>
>
> As part of YARN-4193 we need to enhance the InMemoryPlan (and related 
> classes) to track the per-label available resources, as well as the per-label
> reservation-allocations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4481) negative pending resource of queues leads to applications in accepted status indefinitely

2015-12-18 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065186#comment-15065186
 ] 

Varun Saxena commented on YARN-4481:


[~sunilg], we do not have AM debug logs. And the RM debug logs are from after 
the event, so all we get from them is that pending resources are negative, 
which leads to the log gu-chi mentioned above. Let us see if we get something 
more from the code.

> negative pending resource of queues leads to applications in accepted status 
> indefinitely
> -
>
> Key: YARN-4481
> URL: https://issues.apache.org/jira/browse/YARN-4481
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 2.7.2
>Reporter: gu-chi
>Priority: Critical
> Attachments: jmx.txt
>
>
> Met a scenario of negative pending resource with capacity scheduler, in jmx, 
> it shows:
> {noformat}
> "PendingMB" : -4096,
> "PendingVCores" : -1,
> "PendingContainers" : -1,
> {noformat}
> Full jmx information is attached.
> This is not just a jmx UI issue; the actual pending resource of the queue is 
> also negative, as I see in the debug log:
> bq. DEBUG | ResourceManager Event Processor | Skip this queue=root, because 
> it doesn't need more resource, schedulingMode=RESPECT_PARTITION_EXCLUSIVITY 
> node-partition= | ParentQueue.java
> This leads to the {{NULL_ASSIGNMENT}}.
> The background: hundreds of applications were submitted, consuming all 
> cluster resources, and reservations happened. While running, network faults 
> were injected by a tool (injection types: delay, jitter, repeat, packet loss 
> and disorder), and then most of the submitted applications were killed.
> Is anyone else also facing negative pending resource, or does anyone have an 
> idea of how this can happen?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4481) negative pending resource of queues leads to applications in accepted status indefinitely

2015-12-18 Thread gu-chi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065181#comment-15065181
 ] 

gu-chi commented on YARN-4481:
--

I added some extra logging to trace this. Do you have any idea how it can be 
reproduced?

> negative pending resource of queues leads to applications in accepted status 
> indefinitely
> -
>
> Key: YARN-4481
> URL: https://issues.apache.org/jira/browse/YARN-4481
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 2.7.2
>Reporter: gu-chi
>Priority: Critical
> Attachments: jmx.txt
>
>
> Met a scenario of negative pending resource with capacity scheduler, in jmx, 
> it shows:
> {noformat}
> "PendingMB" : -4096,
> "PendingVCores" : -1,
> "PendingContainers" : -1,
> {noformat}
> Full jmx information is attached.
> This is not just a jmx UI issue; the actual pending resource of the queue is 
> also negative, as I see in the debug log:
> bq. DEBUG | ResourceManager Event Processor | Skip this queue=root, because 
> it doesn't need more resource, schedulingMode=RESPECT_PARTITION_EXCLUSIVITY 
> node-partition= | ParentQueue.java
> This leads to the {{NULL_ASSIGNMENT}}.
> The background: hundreds of applications were submitted, consuming all 
> cluster resources, and reservations happened. While running, network faults 
> were injected by a tool (injection types: delay, jitter, repeat, packet loss 
> and disorder), and then most of the submitted applications were killed.
> Is anyone else also facing negative pending resource, or does anyone have an 
> idea of how this can happen?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4477) FairScheduler: encounter infinite loop in attemptScheduling

2015-12-18 Thread Tao Jie (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065177#comment-15065177
 ] 

Tao Jie commented on YARN-4477:
---

The failed test cases are unrelated to this patch and pass in my local 
environment.

> FairScheduler: encounter infinite loop in attemptScheduling
> ---
>
> Key: YARN-4477
> URL: https://issues.apache.org/jira/browse/YARN-4477
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: Tao Jie
>Assignee: Tao Jie
> Attachments: YARN-4477.001.patch, YARN-4477.002.patch, 
> YARN-4477.003.patch
>
>
> This problem is introduced by YARN-4270, which adds a limitation on reservations.
> In FSAppAttempt.reserve():
> {code}
> if (!reservationExceedsThreshold(node, type)) {
>   LOG.info("Making reservation: node=" + node.getNodeName() +
>   " app_id=" + getApplicationId());
>   if (!alreadyReserved) {
> getMetrics().reserveResource(getUser(), container.getResource());
> RMContainer rmContainer =
> super.reserve(node, priority, null, container);
> node.reserveResource(this, priority, rmContainer);
> setReservation(node);
>   } else {
> RMContainer rmContainer = node.getReservedContainer();
> super.reserve(node, priority, rmContainer, container);
> node.reserveResource(this, priority, rmContainer);
> setReservation(node);
>   }
> }
> {code}
> If the reservation exceeds the threshold, the current node will not set a reservation.
> But in attemptScheduling in FairScheduler:
> {code}
>   while (node.getReservedContainer() == null) {
> boolean assignedContainer = false;
> if (!queueMgr.getRootQueue().assignContainer(node).equals(
> Resources.none())) {
>   assignedContainers++;
>   assignedContainer = true;
>   
> }
> 
> if (!assignedContainer) { break; }
> if (!assignMultiple) { break; }
> if ((assignedContainers >= maxAssign) && (maxAssign > 0)) { break; }
>   }
> {code}
> assignContainer(node) still returns FairScheduler.CONTAINER_RESERVED, which 
> is not equal to Resources.none().
> As a result, if assignMultiple is enabled and maxAssign is unlimited, this 
> while loop will never break.
> I suppose that assignContainer(node) should return Resources.none() rather 
> than CONTAINER_RESERVED when the attempt doesn't take the reservation because 
> of the limitation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4481) negative pending resource of queues leads to applications in accepted status indefinitely

2015-12-18 Thread gu-chi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065175#comment-15065175
 ] 

gu-chi commented on YARN-4481:
--

Same when using DRC.
:( Debug logging was only enabled after I saw the issue, so before that there 
is no debug information. 
I have the RM log, but it is several GB, covering hundreds of applications.

> negative pending resource of queues leads to applications in accepted status 
> indefinitely
> -
>
> Key: YARN-4481
> URL: https://issues.apache.org/jira/browse/YARN-4481
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 2.7.2
>Reporter: gu-chi
>Priority: Critical
> Attachments: jmx.txt
>
>
> Met a scenario of negative pending resource with capacity scheduler, in jmx, 
> it shows:
> {noformat}
> "PendingMB" : -4096,
> "PendingVCores" : -1,
> "PendingContainers" : -1,
> {noformat}
> Full jmx information is attached.
> This is not just a jmx UI issue; the actual pending resource of the queue is 
> also negative, as I see in the debug log:
> bq. DEBUG | ResourceManager Event Processor | Skip this queue=root, because 
> it doesn't need more resource, schedulingMode=RESPECT_PARTITION_EXCLUSIVITY 
> node-partition= | ParentQueue.java
> This leads to the {{NULL_ASSIGNMENT}}.
> The background: hundreds of applications were submitted, consuming all 
> cluster resources, and reservations happened. While running, network faults 
> were injected by a tool (injection types: delay, jitter, repeat, packet loss 
> and disorder), and then most of the submitted applications were killed.
> Is anyone else also facing negative pending resource, or does anyone have an 
> idea of how this can happen?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4238) createdTime and modifiedTime is not reported while publishing entities to ATSv2

2015-12-18 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065168#comment-15065168
 ] 

Varun Saxena commented on YARN-4238:


bq. If it is this way, then will it be useful for the client? Maybe it will 
even be difficult for users to understand?
Yeah, that is what I wanted to highlight: specifically for metrics, even the 
cell timestamps are not really the timestamps of the Put.

> createdTime and modifiedTime is not reported while publishing entities to 
> ATSv2
> ---
>
> Key: YARN-4238
> URL: https://issues.apache.org/jira/browse/YARN-4238
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4238-YARN-2928.01.patch, 
> YARN-4238-feature-YARN-2928.002.patch, YARN-4238-feature-YARN-2928.003.patch, 
> YARN-4238-feature-YARN-2928.02.patch
>
>
> While publishing entities from RM and elsewhere we are not sending created 
> time. For instance, created time in TimelineServiceV2Publisher class and for 
> other entities in other such similar classes is not updated. We can easily 
> update created time when sending application created event. Likewise for 
> modification time on every write.
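For illustration only, a minimal sketch of the kind of change the description asks for (setter names assumed from the ATSv2 entity API on the feature branch, not the actual publisher code): set createdTime on the entity when publishing the application-created event.

{code:java}
import org.apache.hadoop.yarn.api.records.timelineservice.ApplicationEntity;

public class PublishSketch {
  ApplicationEntity onAppCreated(String appId, long createdTimeMillis) {
    ApplicationEntity entity = new ApplicationEntity();
    entity.setId(appId);
    // Report the creation time with the entity instead of leaving it unset.
    entity.setCreatedTime(createdTimeMillis);
    return entity;
  }
}
{code}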



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4290) "yarn nodes -list" should print all nodes reports information

2015-12-18 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-4290:
--
Attachment: 0003-YARN-4290.patch

Thanks [~leftnoteasy] and [~Naganarasimha Garla]
Yes, one test case is related. Uploading a new patch. The other test case 
failures are known and tracked via separate tickets.

> "yarn nodes -list" should print all nodes reports information
> -
>
> Key: YARN-4290
> URL: https://issues.apache.org/jira/browse/YARN-4290
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: client
>Reporter: Wangda Tan
>Assignee: Sunil G
> Attachments: 0002-YARN-4290.patch, 0003-YARN-4290.patch
>
>
> Currently, "yarn nodes -list" command only shows 
> - "Node-Id", 
> - "Node-State", 
> - "Node-Http-Address",
> - "Number-of-Running-Containers"
> I think we need to show more information such as used resource, just like 
> "yarn nodes -status" command.
> Maybe we can add a parameter to -list, such as "-show-details" to enable 
> printing all detailed information.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4238) createdTime and modifiedTime is not reported while publishing entities to ATSv2

2015-12-18 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065167#comment-15065167
 ] 

Naganarasimha G R commented on YARN-4238:
-

Thanks [~varun_saxena] for summarizing it.
Varun and I synced offline on this, and I feel the summary is fine. And yes 
[~sjlee0], I also feel your opinion is right; we can just keep it as part of 
the entity object and avoid having it as part of a filter and complicating it.

bq. The cell timestamps for metrics are filled based on what is reported from 
the client
If it is this way, then will it be useful for the client? Maybe it will even be 
difficult for users to understand?

> createdTime and modifiedTime is not reported while publishing entities to 
> ATSv2
> ---
>
> Key: YARN-4238
> URL: https://issues.apache.org/jira/browse/YARN-4238
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4238-YARN-2928.01.patch, 
> YARN-4238-feature-YARN-2928.002.patch, YARN-4238-feature-YARN-2928.003.patch, 
> YARN-4238-feature-YARN-2928.02.patch
>
>
> While publishing entities from RM and elsewhere we are not sending created 
> time. For instance, created time in TimelineServiceV2Publisher class and for 
> other entities in other such similar classes is not updated. We can easily 
> update created time when sending application created event. Likewise for 
> modification time on every write.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4164) Retrospect update ApplicationPriority API return type

2015-12-18 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065162#comment-15065162
 ] 

Rohith Sharma K S commented on YARN-4164:
-

Thanks [~jianhe] for reviewing and committing the patch.

> Retrospect update ApplicationPriority API return type
> -
>
> Key: YARN-4164
> URL: https://issues.apache.org/jira/browse/YARN-4164
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-4164.patch, 0002-YARN-4164.patch, 
> 0003-YARN-4164.patch, 0004-YARN-4164.patch
>
>
> Currently the {{ApplicationClientProtocol#updateApplicationPriority()}} API 
> returns an empty UpdateApplicationPriorityResponse.
> But the RM updates the priority to cluster.max-priority if the given priority 
> is greater than cluster.max-priority. In this scenario, we need to inform the 
> client of the updated priority rather than keeping quiet, where the client 
> assumes that the given priority itself was taken.
> The same scenario can happen during application submission, but I feel that 
> when explicitly invoked via 
> ApplicationClientProtocol#updateApplicationPriority(), the response should 
> carry the updated priority. 
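For illustration only, a sketch of how a client could use such a response (the getter on UpdateApplicationPriorityResponse is assumed from this proposal; RPC wiring omitted):

{code:java}
import org.apache.hadoop.yarn.api.ApplicationClientProtocol;
import org.apache.hadoop.yarn.api.protocolrecords.UpdateApplicationPriorityRequest;
import org.apache.hadoop.yarn.api.protocolrecords.UpdateApplicationPriorityResponse;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.Priority;

public class PrioritySketch {
  Priority updateAndReport(ApplicationClientProtocol rm, ApplicationId appId,
      int requested) throws Exception {
    UpdateApplicationPriorityRequest req = UpdateApplicationPriorityRequest
        .newInstance(appId, Priority.newInstance(requested));
    UpdateApplicationPriorityResponse resp = rm.updateApplicationPriority(req);
    // With this change the response carries the priority the RM actually
    // applied, e.g. clipped to cluster.max-priority.
    return resp.getApplicationPriority();
  }
}
{code}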



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4238) createdTime and modifiedTime is not reported while publishing entities to ATSv2

2015-12-18 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065160#comment-15065160
 ] 

Varun Saxena commented on YARN-4238:


[~sjlee0], if modified time is only for debugging, is there any need to filter 
rows based on it? Maybe we can remove it.

We can simply fill it on the basis of cell timestamps then and return it in the 
response.
One thing though: the cell timestamps for metrics are filled based on what is 
reported from the client.

> createdTime and modifiedTime is not reported while publishing entities to 
> ATSv2
> ---
>
> Key: YARN-4238
> URL: https://issues.apache.org/jira/browse/YARN-4238
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4238-YARN-2928.01.patch, 
> YARN-4238-feature-YARN-2928.002.patch, YARN-4238-feature-YARN-2928.003.patch, 
> YARN-4238-feature-YARN-2928.02.patch
>
>
> While publishing entities from RM and elsewhere we are not sending created 
> time. For instance, created time in TimelineServiceV2Publisher class and for 
> other entities in other such similar classes is not updated. We can easily 
> update created time when sending application created event. Likewise for 
> modification time on every write.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4476) Matcher for complex node label expressions

2015-12-18 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated YARN-4476:

Attachment: YARN-4476-1.patch

Add ASF license headers, fix findbugs warnings, address some of the checkstyle 
issues.

> Matcher for complex node label expressions
> -
>
> Key: YARN-4476
> URL: https://issues.apache.org/jira/browse/YARN-4476
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Chris Douglas
>Assignee: Chris Douglas
> Attachments: YARN-4476-0.patch, YARN-4476-1.patch
>
>
> Implementation of a matcher for complex node label expressions based on a 
> [paper|http://dl.acm.org/citation.cfm?id=1807171] from SIGMOD 2010.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4484) Available Resource calculation for a queue is not correct when used with labels

2015-12-18 Thread Sunil G (JIRA)
Sunil G created YARN-4484:
-

 Summary: Available Resource calculation for a queue is not correct 
when used with labels
 Key: YARN-4484
 URL: https://issues.apache.org/jira/browse/YARN-4484
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacity scheduler
Affects Versions: 2.7.1
Reporter: Sunil G
Assignee: Sunil G


To calculate the available resource for a queue, we have to get the total 
resource allocated for all labels in the queue and compare it with the queue's 
usage. 
Also address the comments given in 
[YARN-4304-comments|https://issues.apache.org/jira/browse/YARN-4304?focusedCommentId=15064874&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15064874
 ] by [~leftnoteasy] on the same.
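For illustration only, a purely hypothetical sketch of the calculation (toy code, not the CapacityScheduler classes): per-queue available resource summed over labels as guaranteed minus used.

{code:java}
import java.util.HashMap;
import java.util.Map;

public class AvailableByLabel {
  // memory-MB available, given per-label guarantees and per-label usage
  static long availableMb(Map<String, Long> guaranteedMbByLabel,
                          Map<String, Long> usedMbByLabel) {
    long available = 0;
    for (Map.Entry<String, Long> e : guaranteedMbByLabel.entrySet()) {
      long used = usedMbByLabel.containsKey(e.getKey())
          ? usedMbByLabel.get(e.getKey()) : 0L;
      available += Math.max(0, e.getValue() - used);
    }
    return available;
  }

  public static void main(String[] args) {
    Map<String, Long> guaranteed = new HashMap<>();
    guaranteed.put("", 8192L);    // default partition
    guaranteed.put("gpu", 4096L);
    Map<String, Long> used = new HashMap<>();
    used.put("", 6144L);
    used.put("gpu", 1024L);
    System.out.println(availableMb(guaranteed, used)); // 2048 + 3072 = 5120
  }
}
{code}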



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4304) AM max resource configuration per partition to be displayed/updated correctly in UI and in various partition related metrics

2015-12-18 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065147#comment-15065147
 ] 

Sunil G commented on YARN-4304:
---

bq.I think the longer term fix should be, add a by-partition info to queue 
metrics, including max/guaranteed/available/used, etc. I can help to review 
proposal/patches.
Yes, this looks fine to me. I will track this with a different JIRA. The new 
ticket will track the cluster metrics total memory when used with labels.

bq.How about call them calculateAndGetAMResourceLimitPerPartition and 
getAMResourceLimitPerPartition?
+1. Since we have this calculated information, we can make use of the same. I 
will make the changes.

I will upload REST/UI screenshots along with the updated patch.

bq.Is there any concern of you for this approach?
I was slightly confused by the earlier comment. Ideally we can make use of 
{{users}}, but we may need to get the first user and get their AM limit. This 
is perfectly fine for now until we have per-user-am-limit.  

> AM max resource configuration per partition to be displayed/updated correctly 
> in UI and in various partition related metrics
> 
>
> Key: YARN-4304
> URL: https://issues.apache.org/jira/browse/YARN-4304
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: webapp
>Affects Versions: 2.7.1
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: 0001-YARN-4304.patch, 0002-YARN-4304.patch, 
> 0003-YARN-4304.patch, 0004-YARN-4304.patch, 0005-YARN-4304.patch, 
> 0005-YARN-4304.patch, REST_and_UI.zip
>
>
> As we are supporting per-partition level max AM resource percentage 
> configuration, UI and various metrics also need to display correct 
> configurations related to same. 
> For eg: Current UI still shows am-resource percentage per queue level. This 
> is to be updated correctly when label config is used.
> - Display max-am-percentage per-partition in Scheduler UI (label also) and in 
> ClusterMetrics page
> - Update queue/partition related metrics w.r.t per-partition 
> am-resource-percentage



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-914) (Umbrella) Support graceful decommission of nodemanager

2015-12-18 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065119#comment-15065119
 ] 

Karthik Kambatla commented on YARN-914:
---

bq. On the other hand, there are additional details and component level designs 
that the JIRA design document not necessarily discuss or touch. 
Are you able to share these details in an "augmented" design doc? Agreeing on 
the design would greatly help with review/commits later.

As far as implementation goes, it is recommended to create subtasks as you see 
fit. Note that it is easier to review smaller chunks of code. Also, since you 
guys have implemented it already, can you comment on how much of the code 
changes are in frequently updated parts? If not much, it might make sense to 
develop on a branch and merge it to trunk. 

> (Umbrella) Support graceful decommission of nodemanager
> ---
>
> Key: YARN-914
> URL: https://issues.apache.org/jira/browse/YARN-914
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: graceful
>Affects Versions: 2.0.4-alpha
>Reporter: Luke Lu
>Assignee: Junping Du
> Attachments: Gracefully Decommission of NodeManager (v1).pdf, 
> Gracefully Decommission of NodeManager (v2).pdf, 
> GracefullyDecommissionofNodeManagerv3.pdf
>
>
> When NMs are decommissioned for non-fault reasons (capacity change, etc.), 
> it's desirable to minimize the impact on running applications.
> Currently, if an NM is decommissioned, all running containers on the NM need to 
> be rescheduled on other NMs. Furthermore, for finished map tasks, if their 
> map outputs have not been fetched by the job's reducers, these map tasks will 
> need to be rerun as well.
> We propose to introduce a mechanism to optionally gracefully decommission a 
> node manager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4480) Clean up some inappropriate imports

2015-12-18 Thread Kai Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kai Zheng updated YARN-4480:

Attachment: YARN-4480-v2.patch

Thanks Daniel. I manually edited it and uploaded a new version.

> Clean up some inappropriate imports
> ---
>
> Key: YARN-4480
> URL: https://issues.apache.org/jira/browse/YARN-4480
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Kai Zheng
> Attachments: YARN-4480-v1.patch, YARN-4480-v2.patch
>
>
> It was noticed that there are some unnecessary dependencies on Directory classes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4292) ResourceUtilization should be a part of NodeInfo REST API

2015-12-18 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065077#comment-15065077
 ] 

Wangda Tan commented on YARN-4292:
--

Committed to branch-2.8.

> ResourceUtilization should be a part of NodeInfo REST API
> -
>
> Key: YARN-4292
> URL: https://issues.apache.org/jira/browse/YARN-4292
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Sunil G
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-4292.patch, 0002-YARN-4292.patch, 
> 0003-YARN-4292.patch, 0004-YARN-4292.patch, 0005-YARN-4292.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4398) Yarn recover functionality causes the cluster running slowly and the cluster usage rate is far below 100

2015-12-18 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065078#comment-15065078
 ] 

Wangda Tan commented on YARN-4398:
--

Committed to branch-2.8.

> Yarn recover functionality causes the cluster running slowly and the cluster 
> usage rate is far below 100
> 
>
> Key: YARN-4398
> URL: https://issues.apache.org/jira/browse/YARN-4398
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: NING DING
>Assignee: NING DING
> Fix For: 2.7.3
>
> Attachments: YARN-4398.2.patch, YARN-4398.3.patch, YARN-4398.4.patch
>
>
> In my Hadoop cluster, the ResourceManager recovery functionality is enabled 
> with FileSystemRMStateStore.
> I found this causes the YARN cluster to run slowly, and the cluster usage rate is 
> just 50% even though there are many pending apps.
> The scenario is below.
> In thread A, RMAppImpl$RMAppNewlySavingTransition calls the 
> storeNewApplication method defined in RMStateStore. This storeNewApplication 
> method is synchronized.
> {code:title=RMAppImpl.java|borderStyle=solid}
>   private static final class RMAppNewlySavingTransition extends 
> RMAppTransition {
> @Override
> public void transition(RMAppImpl app, RMAppEvent event) {
>   // If recovery is enabled then store the application information in a
>   // non-blocking call so make sure that RM has stored the information
>   // needed to restart the AM after RM restart without further client
>   // communication
>   LOG.info("Storing application with id " + app.applicationId);
>   app.rmContext.getStateStore().storeNewApplication(app);
> }
>   }
> {code}
> {code:title=RMStateStore.java|borderStyle=solid}
> public synchronized void storeNewApplication(RMApp app) {
> ApplicationSubmissionContext context = app
> 
> .getApplicationSubmissionContext();
> assert context instanceof ApplicationSubmissionContextPBImpl;
> ApplicationStateData appState =
> ApplicationStateData.newInstance(
> app.getSubmitTime(), app.getStartTime(), context, app.getUser());
> dispatcher.getEventHandler().handle(new RMStateStoreAppEvent(appState));
>   }
> {code}
> In thread B, the FileSystemRMStateStore is calling the 
> storeApplicationStateInternal method, which is also synchronized.
> This storeApplicationStateInternal method saves an ApplicationStateData into 
> HDFS and normally takes 90~300 milliseconds in my Hadoop cluster.
> {code:title=FileSystemRMStateStore.java|borderStyle=solid}
> public synchronized void storeApplicationStateInternal(ApplicationId appId,
>   ApplicationStateData appStateDataPB) throws Exception {
> Path appDirPath = getAppDir(rmAppRoot, appId);
> mkdirsWithRetries(appDirPath);
> Path nodeCreatePath = getNodePath(appDirPath, appId.toString());
> LOG.info("Storing info for app: " + appId + " at: " + nodeCreatePath);
> byte[] appStateData = appStateDataPB.getProto().toByteArray();
> try {
>   // currently throw all exceptions. May need to respond differently for 
> HA
>   // based on whether we have lost the right to write to FS
>   writeFileWithRetries(nodeCreatePath, appStateData, true);
> } catch (Exception e) {
>   LOG.info("Error storing info for app: " + appId, e);
>   throw e;
> }
>   }
> {code}
> Suppose thread B first enters the 
> FileSystemRMStateStore.storeApplicationStateInternal method; then thread A 
> will be blocked for a while because of the synchronization. In the ResourceManager 
> there is only one RMStateStore instance; in my cluster it is of 
> FileSystemRMStateStore type.
> Debugging the RMAppNewlySavingTransition.transition method, the thread stack 
> shows it is called from the AsyncDispatcher.dispatch method. This method's code is 
> below.
> {code:title=AsyncDispatcher.java|borderStyle=solid}
>   protected void dispatch(Event event) {
> //all events go thru this loop
> if (LOG.isDebugEnabled()) {
>   LOG.debug("Dispatching the event " + event.getClass().getName() + "."
>   + event.toString());
> }
> Class type = event.getType().getDeclaringClass();
> try{
>   EventHandler handler = eventDispatchers.get(type);
>   if(handler != null) {
> handler.handle(event);
>   } else {
> throw new Exception("No handler for registered for " + type);
>   }
> } catch (Throwable t) {
>   //TODO Maybe log the state of the queue
>   LOG.fatal("Error in dispatcher thread", t);
>   // If serviceStop is called, we should exit this thread gracefully.
>   if (exitOnDispatchException
>   && (ShutdownHookManager.get().isShutdownIn

[jira] [Commented] (YARN-4405) Support node label store in non-appendable file system

2015-12-18 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065072#comment-15065072
 ] 

Wangda Tan commented on YARN-4405:
--

Committed to branch-2.8.

> Support node label store in non-appendable file system
> --
>
> Key: YARN-4405
> URL: https://issues.apache.org/jira/browse/YARN-4405
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Fix For: 2.8.0
>
> Attachments: YARN-4405.1.patch, YARN-4405.2.patch, YARN-4405.3.patch, 
> YARN-4405.4.patch
>
>
> The existing node label file system store implementation uses append to write 
> edit logs. However, some file systems don't support append, so we need to add an 
> implementation that supports such non-appendable file systems as well. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4422) Generic AHS sometimes doesn't show started, node, or logs on App page

2015-12-18 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065065#comment-15065065
 ] 

Wangda Tan commented on YARN-4422:
--

Committed to branch-2.8.

> Generic AHS sometimes doesn't show started, node, or logs on App page
> -
>
> Key: YARN-4422
> URL: https://issues.apache.org/jira/browse/YARN-4422
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Eric Payne
>Assignee: Eric Payne
> Fix For: 3.0.0, 2.8.0, 2.7.3
>
> Attachments: AppAttemptPage no container or node.jpg, AppPage no logs 
> or node.jpg, YARN-4422.001.patch
>
>
> Sometimes the AM container for an app isn't able to start the JVM. This can 
> happen if bogus JVM options are given to the AM container ( 
> {{-Dyarn.app.mapreduce.am.command-opts=-InvalidJvmOption}}) or when 
> misconfiguring the AM container's environment variables 
> ({{-Dyarn.app.mapreduce.am.env="JAVA_HOME=/foo/bar/baz}})
> When the AM container for an app isn't able to start the JVM, the Application 
> page for that application shows {{N/A}} for the {{Started}}, {{Node}}, and 
> {{Logs}} columns. It _does_ have links for each app attempt, and if you click 
> on one of them, you go to the Application Attempt page, where you can see all 
> containers with links to their logs and nodes, including the AM container. 
> But none of that shows up for the app attempts on the Application page.
> Also, on the Application Attempt page, in the {{Application Attempt 
> Overview}} section, the {{AM Container}} value is {{null}} and the {{Node}} 
> value is {{N/A}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4424) Fix deadlock in RMAppImpl

2015-12-18 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065063#comment-15065063
 ] 

Wangda Tan commented on YARN-4424:
--

Committed to branch-2.8.

> Fix deadlock in RMAppImpl
> -
>
> Key: YARN-4424
> URL: https://issues.apache.org/jira/browse/YARN-4424
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yesha Vora
>Assignee: Jian He
>Priority: Blocker
> Fix For: 2.7.2, 2.6.3
>
> Attachments: YARN-4424.1.patch
>
>
> {code}
> yarn@XXX:/mnt/hadoopqe$ /usr/hdp/current/hadoop-yarn-client/bin/yarn 
> application -list -appStates NEW,NEW_SAVING,SUBMITTED,ACCEPTED,RUNNING
> 15/12/04 21:59:54 INFO impl.TimelineClientImpl: Timeline service address: 
> http://XXX:8188/ws/v1/timeline/
> 15/12/04 21:59:54 INFO client.RMProxy: Connecting to ResourceManager at 
> XXX/0.0.0.0:8050
> 15/12/04 21:59:55 INFO client.AHSProxy: Connecting to Application History 
> server at XXX/0.0.0.0:10200
> {code}
> {code:title=RM log}
> 2015-12-04 21:59:19,744 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(243)) - Size of event-queue is 237000
> 2015-12-04 22:00:50,945 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(243)) - Size of event-queue is 238000
> 2015-12-04 22:02:22,416 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(243)) - Size of event-queue is 239000
> 2015-12-04 22:03:53,593 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(243)) - Size of event-queue is 240000
> 2015-12-04 22:05:24,856 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(243)) - Size of event-queue is 241000
> 2015-12-04 22:06:56,235 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(243)) - Size of event-queue is 242000
> 2015-12-04 22:08:27,510 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(243)) - Size of event-queue is 243000
> 2015-12-04 22:09:58,786 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(243)) - Size of event-queue is 244000
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4440) FSAppAttempt#getAllowedLocalityLevelByTime should init the lastScheduler time

2015-12-18 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065066#comment-15065066
 ] 

zhihai xu commented on YARN-4440:
-

yes, thanks [~leftnoteasy] for committing it to branch-2.8!

> FSAppAttempt#getAllowedLocalityLevelByTime should init the lastScheduler time
> -
>
> Key: YARN-4440
> URL: https://issues.apache.org/jira/browse/YARN-4440
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Fix For: 2.8.0
>
> Attachments: YARN-4440.001.patch, YARN-4440.002.patch, 
> YARN-4440.003.patch
>
>
> It seems there is a bug in the {{FSAppAttempt#getAllowedLocalityLevelByTime}} 
> method:
> {code}
> // default level is NODE_LOCAL
> if (! allowedLocalityLevel.containsKey(priority)) {
>   allowedLocalityLevel.put(priority, NodeType.NODE_LOCAL);
>   return NodeType.NODE_LOCAL;
> }
> {code}
> The first time this method is invoked, it does not initialize the time in 
> lastScheduledContainer, which leads to executing this code on the next 
> invocation:
> {code}
> // check waiting time
> long waitTime = currentTimeMs;
> if (lastScheduledContainer.containsKey(priority)) {
>   waitTime -= lastScheduledContainer.get(priority);
> } else {
>   waitTime -= getStartTime();
> }
> {code}
> Here waitTime ends up being computed against the FSAppAttempt start time, and it 
> easily exceeds the delay threshold, so the allowed locality degrades, because the 
> app start time is earlier than currentTimeMs. We should therefore record an initial 
> time for the priority to avoid comparing against the app start time and degrading 
> the allowedLocalityLevel. This problem has a bigger negative impact on small jobs. 
> YARN-4399 also discusses some locality-related problems.
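
For illustration, a minimal standalone sketch of the fix direction described in the last paragraph, i.e. recording a baseline time the first time a priority is seen. Field names loosely mirror FSAppAttempt, but this is a simplified sketch, not the actual YARN-4440 patch:

{code:title=LocalityWaitSketch.java|borderStyle=solid}
import java.util.HashMap;
import java.util.Map;

/** Illustrative only: record a baseline time the first time a priority is seen. */
class LocalityWaitSketch {
  enum NodeType { NODE_LOCAL, RACK_LOCAL, OFF_SWITCH }

  private final Map<Integer, NodeType> allowedLocalityLevel = new HashMap<>();
  private final Map<Integer, Long> lastScheduledContainer = new HashMap<>();

  NodeType getAllowedLocalityLevelByTime(int priority, long currentTimeMs) {
    if (!allowedLocalityLevel.containsKey(priority)) {
      allowedLocalityLevel.put(priority, NodeType.NODE_LOCAL);
      // Initialize the baseline so the next call measures the wait from now,
      // not from the application start time.
      lastScheduledContainer.put(priority, currentTimeMs);
      return NodeType.NODE_LOCAL;
    }
    long waitTime = currentTimeMs - lastScheduledContainer.get(priority);
    // The real method compares waitTime against the configured node/rack delay
    // thresholds and may degrade the allowed level; that part is omitted here.
    return allowedLocalityLevel.get(priority);
  }
}
{code}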



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4392) ApplicationCreatedEvent event time resets after RM restart/failover

2015-12-18 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065067#comment-15065067
 ] 

Wangda Tan commented on YARN-4392:
--

Committed to branch-2.8.

> ApplicationCreatedEvent event time resets after RM restart/failover
> ---
>
> Key: YARN-4392
> URL: https://issues.apache.org/jira/browse/YARN-4392
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Xuan Gong
>Assignee: Naganarasimha G R
>Priority: Critical
> Fix For: 2.8.0
>
> Attachments: YARN-4392-2015-11-24.patch, YARN-4392.1.patch, 
> YARN-4392.2.patch, YARN-4392.3.patch
>
>
> {code}2015-09-01 12:39:09,852 WARN util.Times (Times.java:elapsed(53)) - 
> Finished time 1437453994768 is ahead of started time 1440308399674 
> 2015-09-01 12:39:09,852 WARN util.Times (Times.java:elapsed(53)) - Finished 
> time 1437454008244 is ahead of started time 1440308399676 
> 2015-09-01 12:39:09,852 WARN util.Times (Times.java:elapsed(53)) - Finished 
> time 1437444305171 is ahead of started time 1440308399653 
> 2015-09-01 12:39:09,852 WARN util.Times (Times.java:elapsed(53)) - Finished 
> time 1437444293115 is ahead of started time 1440308399647 
> 2015-09-01 12:39:09,852 WARN util.Times (Times.java:elapsed(53)) - Finished 
> time 1437444379645 is ahead of started time 1440308399656 
> 2015-09-01 12:39:09,852 WARN util.Times (Times.java:elapsed(53)) - Finished 
> time 1437444361234 is ahead of started time 1440308399655 
> 2015-09-01 12:39:09,852 WARN util.Times (Times.java:elapsed(53)) - Finished 
> time 1437444342029 is ahead of started time 1440308399654 
> 2015-09-01 12:39:09,852 WARN util.Times (Times.java:elapsed(53)) - Finished 
> time 1437444323447 is ahead of started time 1440308399654 
> 2015-09-01 12:39:09,853 WARN util.Times (Times.java:elapsed(53)) - Finished 
> time 143730006 is ahead of started time 1440308399660 
> 2015-09-01 12:39:09,853 WARN util.Times (Times.java:elapsed(53)) - Finished 
> time 143715698 is ahead of started time 1440308399659 
> 2015-09-01 12:39:09,853 WARN util.Times (Times.java:elapsed(53)) - Finished 
> time 143719060 is ahead of started time 1440308399658 
> 2015-09-01 12:39:09,853 WARN util.Times (Times.java:elapsed(53)) - Finished 
> time 1437444393931 is ahead of started time 1440308399657
> {code}
> From the ATS logs, we see a large number of these 'stale alert' messages 
> periodically.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4418) AM Resource Limit per partition can be updated to ResourceUsage as well

2015-12-18 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065058#comment-15065058
 ] 

Wangda Tan commented on YARN-4418:
--

Committed to branch-2.8.

> AM Resource Limit per partition can be updated to ResourceUsage as well
> ---
>
> Key: YARN-4418
> URL: https://issues.apache.org/jira/browse/YARN-4418
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Sunil G
>Assignee: Sunil G
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-4418.patch, 0002-YARN-4418.patch, 
> 0003-YARN-4418.patch, 0004-YARN-4418.patch, 0005-YARN-4418.patch
>
>
> AMResourceLimit is now extended to all partitions after YARN-3216. It's also 
> better to track this ResourceLimit in the existing {{ResourceUsage}} so that the 
> REST framework can easily expose this information. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3946) Update exact reason as to why a submitted app is in ACCEPTED state to app's diagnostic message

2015-12-18 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065060#comment-15065060
 ] 

Wangda Tan commented on YARN-3946:
--

Committed to branch-2.8.

> Update exact reason as to why a submitted app is in ACCEPTED state to app's 
> diagnostic message
> --
>
> Key: YARN-3946
> URL: https://issues.apache.org/jira/browse/YARN-3946
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler, resourcemanager
>Affects Versions: 2.6.0
>Reporter: Sumit Nigam
>Assignee: Naganarasimha G R
> Fix For: 2.8.0
>
> Attachments: 3946WebImages.zip, YARN-3946.v1.001.patch, 
> YARN-3946.v1.002.patch, YARN-3946.v1.003.Images.zip, YARN-3946.v1.003.patch, 
> YARN-3946.v1.004.patch, YARN-3946.v1.005.patch, YARN-3946.v1.006.patch, 
> YARN-3946.v1.007.patch, YARN-3946.v1.008.patch
>
>
> Currently there is no direct way to get the exact reason as to why a 
> submitted app is still in ACCEPTED state. It should be possible to know 
> through RM REST API as to what aspect is not being met - say, queue limits 
> being reached, or core/ memory requirement not being met, or AM limit being 
> reached, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4309) Add container launch related debug information to container logs when a container fails

2015-12-18 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065059#comment-15065059
 ] 

Wangda Tan commented on YARN-4309:
--

Committed to branch-2.8.

> Add container launch related debug information to container logs when a 
> container fails
> ---
>
> Key: YARN-4309
> URL: https://issues.apache.org/jira/browse/YARN-4309
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Fix For: 2.8.0
>
> Attachments: YARN-4309.001.patch, YARN-4309.002.patch, 
> YARN-4309.003.patch, YARN-4309.004.patch, YARN-4309.005.patch, 
> YARN-4309.006.patch, YARN-4309.007.patch, YARN-4309.008.patch, 
> YARN-4309.009.patch, YARN-4309.010.patch
>
>
> Sometimes when a container fails, it can be pretty hard to figure out why it 
> failed.
> My proposal is that if a container fails, we collect information about the 
> container local dir and dump it into the container log dir. Ideally, I'd like 
> to tar up the directory entirely, but I'm not sure of the security and space 
> implications of such an approach. At the very least, we can list all the files 
> in the container local dir, and dump the contents of launch_container.sh (into 
> the container log dir).
> When log aggregation occurs, all this information will automatically get 
> collected and make debugging such failures much easier.
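
As a rough illustration of the idea (not the actual patch; the output file names and directory layout here are assumptions), listing the container local dir and copying the launch script into the log dir could look like:

{code:title=ContainerDebugInfoSketch.java|borderStyle=solid}
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.stream.Stream;

/** Illustrative only: dump container-local debug info into the container log dir. */
class ContainerDebugInfoSketch {
  static void dumpDebugInfo(Path containerLocalDir, Path containerLogDir) throws IOException {
    // write a listing of everything under the container local dir
    Path listingFile = containerLogDir.resolve("directory.info");
    try (Stream<Path> files = Files.walk(containerLocalDir)) {
      StringBuilder listing = new StringBuilder();
      files.forEach(p -> listing.append(p).append(System.lineSeparator()));
      Files.write(listingFile, listing.toString().getBytes(StandardCharsets.UTF_8));
    }
    // copy the launch script next to the container logs, if it exists
    Path launchScript = containerLocalDir.resolve("launch_container.sh");
    if (Files.exists(launchScript)) {
      Files.copy(launchScript, containerLogDir.resolve("launch_container.sh"),
          StandardCopyOption.REPLACE_EXISTING);
    }
  }
}
{code}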



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4416) Deadlock due to synchronised get Methods in AbstractCSQueue

2015-12-18 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065053#comment-15065053
 ] 

Wangda Tan commented on YARN-4416:
--

Committed to branch-2.8.

> Deadlock due to synchronised get Methods in AbstractCSQueue
> ---
>
> Key: YARN-4416
> URL: https://issues.apache.org/jira/browse/YARN-4416
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler, resourcemanager
>Affects Versions: 2.7.1
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>Priority: Minor
> Fix For: 2.8.0
>
> Attachments: YARN-4416.v1.001.patch, YARN-4416.v1.002.patch, 
> YARN-4416.v2.001.patch, YARN-4416.v2.002.patch, YARN-4416.v2.003.patch, 
> deadlock.log
>
>
> While debugging in Eclipse, I came across a scenario where I had to find out the 
> name of a queue, but every time I tried to inspect the queue it hung. On seeing the 
> stack I realized there was a deadlock, and on analysis found that it was only due to 
> *queue.toString()* during debugging, since 
> {{AbstractCSQueue.getAbsoluteUsedCapacity}} was synchronized.
> Hence we need to ensure the following:
> # queueCapacity and resource-usage have their own read/write locks, hence 
> synchronization is not required.
> # numContainers is volatile, hence synchronization is not required.
> # A read/write lock could be added to the OrderingPolicy; read operations don't 
> need to be synchronized, so {{getNumApplications}} doesn't need to be synchronized. 
> (The first two will be handled in this JIRA and the third will be handled in 
> YARN-4443.)
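
For illustration, a minimal sketch of the locking pattern described above (a read/write lock for mutable state and a volatile counter), so that read-only getters need no {{synchronized}}. This is simplified and not the actual patch:

{code:title=QueueStateSketch.java|borderStyle=solid}
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Illustrative only: avoid synchronized getters by using finer-grained guards. */
class QueueStateSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private float absoluteUsedCapacity;   // guarded by lock
  private volatile int numContainers;   // volatile: reads need no lock

  float getAbsoluteUsedCapacity() {     // read-only, no "synchronized" on the queue
    lock.readLock().lock();
    try {
      return absoluteUsedCapacity;
    } finally {
      lock.readLock().unlock();
    }
  }

  void setAbsoluteUsedCapacity(float value) {
    lock.writeLock().lock();
    try {
      absoluteUsedCapacity = value;
    } finally {
      lock.writeLock().unlock();
    }
  }

  int getNumContainers() {
    return numContainers;
  }
}
{code}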



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4225) Add preemption status to yarn queue -status for capacity scheduler

2015-12-18 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065054#comment-15065054
 ] 

Wangda Tan commented on YARN-4225:
--

Committed to branch-2.8.

> Add preemption status to yarn queue -status for capacity scheduler
> --
>
> Key: YARN-4225
> URL: https://issues.apache.org/jira/browse/YARN-4225
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, yarn
>Affects Versions: 2.7.1
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Minor
> Fix For: 2.8.0
>
> Attachments: YARN-4225.001.patch, YARN-4225.002.patch, 
> YARN-4225.003.patch, YARN-4225.004.patch, YARN-4225.005.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4440) FSAppAttempt#getAllowedLocalityLevelByTime should init the lastScheduler time

2015-12-18 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065057#comment-15065057
 ] 

Wangda Tan commented on YARN-4440:
--

Committed to branch-2.8.

> FSAppAttempt#getAllowedLocalityLevelByTime should init the lastScheduler time
> -
>
> Key: YARN-4440
> URL: https://issues.apache.org/jira/browse/YARN-4440
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Fix For: 2.8.0
>
> Attachments: YARN-4440.001.patch, YARN-4440.002.patch, 
> YARN-4440.003.patch
>
>
> It seems there is a bug in the {{FSAppAttempt#getAllowedLocalityLevelByTime}} 
> method:
> {code}
> // default level is NODE_LOCAL
> if (! allowedLocalityLevel.containsKey(priority)) {
>   allowedLocalityLevel.put(priority, NodeType.NODE_LOCAL);
>   return NodeType.NODE_LOCAL;
> }
> {code}
> The first time this method is invoked, it does not initialize the time in 
> lastScheduledContainer, which leads to executing this code on the next 
> invocation:
> {code}
> // check waiting time
> long waitTime = currentTimeMs;
> if (lastScheduledContainer.containsKey(priority)) {
>   waitTime -= lastScheduledContainer.get(priority);
> } else {
>   waitTime -= getStartTime();
> }
> {code}
> Here waitTime ends up being computed against the FSAppAttempt start time, and it 
> easily exceeds the delay threshold, so the allowed locality degrades, because the 
> app start time is earlier than currentTimeMs. We should therefore record an initial 
> time for the priority to avoid comparing against the app start time and degrading 
> the allowedLocalityLevel. This problem has a bigger negative impact on small jobs. 
> YARN-4399 also discusses some locality-related problems.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4293) ResourceUtilization should be a part of yarn node CLI

2015-12-18 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065055#comment-15065055
 ] 

Wangda Tan commented on YARN-4293:
--

Committed to branch-2.8.

> ResourceUtilization should be a part of yarn node CLI
> -
>
> Key: YARN-4293
> URL: https://issues.apache.org/jira/browse/YARN-4293
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Sunil G
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-4293.patch, 0002-YARN-4293.patch, 
> 0003-YARN-4293.patch
>
>
> In order to get resource utilization information more easily, the "yarn node" CLI 
> should include the resource utilization of the node.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3480) Recovery may get very slow with lots of services with lots of app-attempts

2015-12-18 Thread Jun Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065043#comment-15065043
 ] 

Jun Gong commented on YARN-3480:


Thanks for review and suggestion!

{quote}
regarding this logic, it is possible that a particular attempt is not persisted 
in the store because of some connection failures. so the app.nextAttemptId - 
app.firstAttemptIdInStateStore does not necessarily indicate the number of 
attempts.
{quote}
If the RMStateStore fails to persist any attempt, it will transition to the 
'RMStateStoreState.FENCED' state. No operations are performed while the 
RMStateStore is in this state, so it should not be a problem, right?

{quote}
LevelDBRMStateStore#removeApplicationAttemptInternal does not need to use batch 
operation, as it only has one operation

Could you also add a test case in RMStateStoreTestBase#testRMAppStateStore that 
the loading part also works correctly? i.e. loading an app with partial 
attempts works correctly.
{quote}
Thanks, I will fix them.
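
For reference, a minimal sketch of the single-key delete (leveldbjni-style API; the key string here is made up and not the store's actual key layout):

{code:title=AttemptRemovalSketch.java|borderStyle=solid}
import org.iq80.leveldb.DB;

import static org.fusesource.leveldbjni.JniDBFactory.bytes;

/** Illustrative only: a single-key removal needs no WriteBatch. */
class AttemptRemovalSketch {
  static void removeAttempt(DB db, String attemptKey) {
    // one key, one operation -- no batch required
    db.delete(bytes(attemptKey));
  }
}
{code}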

> Recovery may get very slow with lots of services with lots of app-attempts
> --
>
> Key: YARN-3480
> URL: https://issues.apache.org/jira/browse/YARN-3480
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Jun Gong
>Assignee: Jun Gong
> Attachments: YARN-3480.01.patch, YARN-3480.02.patch, 
> YARN-3480.03.patch, YARN-3480.04.patch, YARN-3480.05.patch, 
> YARN-3480.06.patch, YARN-3480.07.patch, YARN-3480.08.patch, 
> YARN-3480.09.patch, YARN-3480.10.patch
>
>
> When RM HA is enabled and running containers are kept across attempts, apps 
> are more likely to finish successfully with more retries (attempts), so it is 
> better to set 'yarn.resourcemanager.am.max-attempts' larger. However, that makes 
> the RMStateStore (FileSystem/HDFS/ZK) store more attempts and makes the RM 
> recovery process much slower. It might be better to cap the number of attempts 
> stored in the RMStateStore.
> BTW: when 'attemptFailuresValidityInterval' (introduced in YARN-611) is set to 
> a small value, the number of retried attempts might be very large, so we need to 
> delete some of the attempts stored in the RMStateStore.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4405) Support node label store in non-appendable file system

2015-12-18 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-4405:
--
Fix Version/s: 2.8.0

> Support node label store in non-appendable file system
> --
>
> Key: YARN-4405
> URL: https://issues.apache.org/jira/browse/YARN-4405
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Fix For: 2.8.0
>
> Attachments: YARN-4405.1.patch, YARN-4405.2.patch, YARN-4405.3.patch, 
> YARN-4405.4.patch
>
>
> The existing node label file system store implementation uses append to write 
> edit logs. However, some file systems don't support append, so we need to add an 
> implementation that supports such non-appendable file systems as well. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4195) Support of node-labels in the ReservationSystem "Plan"

2015-12-18 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15064956#comment-15064956
 ] 

Carlo Curino commented on YARN-4195:


[~leftnoteasy], I think you are guessing right (with a wrong example)... The 
user during job/reservation submission can say {{GPU}} and internally the 
system will translate this into {{GPU_PUBLICIP OR GPU_NOT-PUBLICIP}} and thus 
match any container from either of the two underlying partitions. 

Even nicer would be to allow each node to carry an arbitrary set of labels 
({{GPU}}, {{PUBLICIP}}) and have the system automatically infer partitions (from 
the node label specification). A configuration helper tool could show the admin the 
list of partitions and their capacities, and help configure queues by specifying 
capacity allocations per partition (or per label, with some validation happening 
behind the scenes). As the number of "active" partitions (vs. the number of all 
possible partitions) is typically much smaller (and bounded by the number of 
nodes), this should generally be feasible. 

Speaking with [~kasha], this would also go very well with some of the ideas for 
a scheduler refactoring / support for node labels in the {{FairScheduler}} that he 
is considering.
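
A small standalone sketch of that inference step, for illustration only: group nodes by their exact label set, and each distinct set becomes one "active" partition, so the count is bounded by the number of nodes:

{code:title=PartitionInferenceSketch.java|borderStyle=solid}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Illustrative only: infer active partitions from per-node label sets. */
class PartitionInferenceSketch {
  static Map<Set<String>, List<String>> activePartitions(Map<String, Set<String>> nodeLabels) {
    Map<Set<String>, List<String>> partitions = new HashMap<>();
    for (Map.Entry<String, Set<String>> e : nodeLabels.entrySet()) {
      // nodes sharing the same label set land in the same partition
      partitions.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());
    }
    return partitions; // each key (label set) is one active partition
  }

  public static void main(String[] args) {
    Map<String, Set<String>> nodes = new HashMap<>();
    nodes.put("node1", new HashSet<>(Arrays.asList("GPU", "PUBLICIP")));
    nodes.put("node2", new HashSet<>(Arrays.asList("GPU")));
    nodes.put("node3", new HashSet<>(Arrays.asList("GPU")));
    // two active partitions: {GPU, PUBLICIP} -> [node1], {GPU} -> [node2, node3]
    System.out.println(activePartitions(nodes));
  }
}
{code}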

> Support of node-labels in the ReservationSystem "Plan"
> --
>
> Key: YARN-4195
> URL: https://issues.apache.org/jira/browse/YARN-4195
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Attachments: YARN-4195.patch
>
>
> As part of YARN-4193 we need to enhance the InMemoryPlan (and related 
> classes) to track the per-label available resources, as well as the per-label
> reservation-allocations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4224) Change the ATSv2 reader side REST interface to conform to current REST APIs' in YARN

2015-12-18 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15064931#comment-15064931
 ] 

Sangjin Lee commented on YARN-4224:
---

Thanks [~gtCarrera9] for the update!

Then I'd like to put forward a proposal more formally (it's not a new proposal).
- adopt Li's original proposal (2nd approach mentioned in [this 
comment|https://issues.apache.org/jira/browse/YARN-4224?focusedCommentId=15052865&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15052865])
- permit omitting part of the path that can be omitted (need documentation on 
the permitted cases)
- also support a UID-based URL as a shorthand for the long path-based URL, but 
*clearly document what type of queries support UIDs*

I am still not 100% convinced that we should not make composing UIDs public (so 
that clients themselves can compose them, instead of a server-based end point).

This is my proposal/opinion, so obviously yours may be different. Thoughts? 
Comments?

> Change the ATSv2 reader side REST interface to conform to current REST APIs' 
> in YARN
> 
>
> Key: YARN-4224
> URL: https://issues.apache.org/jira/browse/YARN-4224
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4224-YARN-2928.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4164) Retrospect update ApplicationPriority API return type

2015-12-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15064915#comment-15064915
 ] 

Hudson commented on YARN-4164:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #8999 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8999/])
YARN-4164. Changed updateApplicationPriority API to return the updated (jianhe: 
rev 85c24660481f33684a42a7f6d665d3117577c780)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/YarnClient.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/ApplicationCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/YarnClientImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_service_protos.proto
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/ResourceMgrDelegate.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/UpdateApplicationPriorityResponse.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/UpdateApplicationPriorityResponsePBImpl.java


> Retrospect update ApplicationPriority API return type
> -
>
> Key: YARN-4164
> URL: https://issues.apache.org/jira/browse/YARN-4164
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-4164.patch, 0002-YARN-4164.patch, 
> 0003-YARN-4164.patch, 0004-YARN-4164.patch
>
>
> Currently the {{ApplicationClientProtocol#updateApplicationPriority()}} API 
> returns an empty UpdateApplicationPriorityResponse.
> But the RM updates the priority to cluster.max-priority if the given priority is 
> greater than cluster.max-priority. In this scenario, we need to report the updated 
> priority back to the client rather than staying quiet, in which case the client 
> assumes that the given priority itself was taken.
> The same scenario can also happen during application submission, but I feel that 
> when ApplicationClientProtocol#updateApplicationPriority() is invoked explicitly, 
> the response should contain the updated priority. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4234) New put APIs in TimelineClient for ats v1.5

2015-12-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15064917#comment-15064917
 ] 

Hadoop QA commented on YARN-4234:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
7s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 10s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 19s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
29s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 31s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
40s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 
31s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 28s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 56s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
23s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 9s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 9s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 21s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 21s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 31s 
{color} | {color:red} Patch generated 12 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn (total was 237, now 246). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 29s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
41s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 
55s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 29s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 55s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 25s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_66. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 6s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 54s 
{color} | {color:green} hadoop-yarn-server-applicationhistoryservice in the 
patch passed with JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 25s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.7.0_91. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 20s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 4m 7s 
{color} | {color:green} hadoop-yarn-server-applicationhistoryservice in the 
patch passed with JDK v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
23s {color} | {color:green} Patch does not gen

[jira] [Commented] (YARN-4238) createdTime and modifiedTime is not reported while publishing entities to ATSv2

2015-12-18 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15064910#comment-15064910
 ] 

Sangjin Lee commented on YARN-4238:
---

Thanks [~varun_saxena] for the summary, and [~Naganarasimha] for bringing up 
the matter.

To elaborate on a couple of points, although the modified time was introduced 
early in the data model, I don't think we had things like queries based on the 
modified time in mind. It was suggested more for the usefulness in terms of 
troubleshooting ("when was the entity last written to?"), but not much more. 
One analogy may be the often-present modified timestamp columns in SQL tables.

Other than that, I generally agree with Varun's summary. At a minimum, the 
modified timestamp should not be set by clients. And we should probably drop 
the modified time from the schema in general. We can leave it in the data 
model, but I'm still not 100% sure if we even need to bother with that. I'd 
like to hear your thoughts.

> createdTime and modifiedTime is not reported while publishing entities to 
> ATSv2
> ---
>
> Key: YARN-4238
> URL: https://issues.apache.org/jira/browse/YARN-4238
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4238-YARN-2928.01.patch, 
> YARN-4238-feature-YARN-2928.002.patch, YARN-4238-feature-YARN-2928.003.patch, 
> YARN-4238-feature-YARN-2928.02.patch
>
>
> While publishing entities from the RM and elsewhere we are not sending the created 
> time. For instance, the created time in the TimelineServiceV2Publisher class, and 
> for other entities in other similar classes, is not populated. We can easily set 
> the created time when sending the application-created event, and likewise the 
> modified time on every write.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-4483) HDP 2.2.9 expands the scope of AMBARI-11358.

2015-12-18 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan resolved YARN-4483.
--
Resolution: Invalid

[~dmtucker],

AMBARI-11358 should handle this already; I think there is nothing that needs to be 
done on the YARN side. Closing as invalid.

> HDP 2.2.9 expands the scope of AMBARI-11358.
> 
>
> Key: YARN-4483
> URL: https://issues.apache.org/jira/browse/YARN-4483
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
> Environment: HDP 2.2.9.0-3393, Ambari 1.7.0
>Reporter: David Tucker
>
> YARN will not start until 
> yarn.scheduler.capacity.root.accessible-node-labels.default.capacity and 
> yarn.scheduler.capacity.root.accessible-node-labels.default.maximum-capacity 
> are changed from their default value (-1).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4257) Move scheduler validateConf method to AbstractYarnScheduler and make it protected

2015-12-18 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15064886#comment-15064886
 ] 

Wangda Tan commented on YARN-4257:
--

Hi [~rhaase],

Thanks for working on this patch,

Patch looks good to me. Could you make the "minimum allocation should be > 0" test 
a parameterized test, so we don't have to duplicate it for the 3 different 
schedulers? Maybe we could add it to TestAbstractYarnScheduler?
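
Something like the following shape, perhaps (JUnit 4; the validation method here is a stand-in, since the real validateConf lives in the individual schedulers and is what the parameter would select):

{code:title=TestMinimumAllocationValidation.java|borderStyle=solid}
import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;

import java.util.Arrays;
import java.util.Collection;

import org.junit.Test;
import org.junit.runner.RunWith;
import org.junit.runners.Parameterized;
import org.junit.runners.Parameterized.Parameters;

/** Illustrative only: one test body run against all scheduler types. */
@RunWith(Parameterized.class)
public class TestMinimumAllocationValidation {

  @Parameters(name = "{0}")
  public static Collection<Object[]> schedulers() {
    return Arrays.asList(new Object[][] { { "fifo" }, { "capacity" }, { "fair" } });
  }

  private final String schedulerType;

  public TestMinimumAllocationValidation(String schedulerType) {
    this.schedulerType = schedulerType;
  }

  // stand-in for the shared validateConf logic under discussion
  private static boolean isAllocationRangeValid(int minimumMb, int maximumMb) {
    return minimumMb >= 0 && maximumMb >= minimumMb;
  }

  @Test
  public void testMinimumAllocationValidation() {
    // in the real test, schedulerType would pick which scheduler's validateConf to call
    assertTrue(schedulerType + ": positive minimum must be accepted",
        isAllocationRangeValid(1024, 8192));
    assertFalse(schedulerType + ": negative minimum must be rejected",
        isAllocationRangeValid(-1, 8192));
    assertFalse(schedulerType + ": max below min must be rejected",
        isAllocationRangeValid(4096, 1024));
  }
}
{code}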

> Move scheduler validateConf method to AbstractYarnScheduler and make it 
> protected
> -
>
> Key: YARN-4257
> URL: https://issues.apache.org/jira/browse/YARN-4257
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Reporter: Swapnil Daingade
>Assignee: Rich Haase
>  Labels: easyfix
> Attachments: YARN-4257.patch
>
>
> Currently FairScheduler, CapacityScheduler and FifoScheduler each have a 
> method private void validateConf(Configuration conf).
> All three methods validate the minimum and maximum scheduler allocations for 
> CPU and memory (with minor differences). FairScheduler supports 0 as the minimum 
> allocation for CPU and memory, while CapacityScheduler and FifoScheduler do 
> not. We can move this code to AbstractYarnScheduler (avoiding code duplication) 
> and make it protected for individual schedulers to override.
> Why do we care about a minimum allocation of 0 for CPU and memory?
> We contribute to a project called Apache Myriad that runs YARN on Mesos. 
> Myriad supports a feature called fine-grained scaling (FGS). In FGS, an NM is 
> launched with zero capacity (0 CPU and 0 memory). When a YARN container is to be 
> run on the NM, a Mesos offer for that node is accepted and the NM capacity is 
> dynamically scaled up to match the accepted Mesos offer. On completion of the 
> YARN container, resources are returned to Mesos and the NM capacity is 
> scaled back down to zero (CPU and memory). 
> In ResourceTrackerService.registerNodeManager, YARN checks whether the NM capacity 
> is at least as much as yarn.scheduler.minimum-allocation-mb and 
> yarn.scheduler.minimum-allocation-vcores. These values can be set to 0 in 
> yarn-site.xml (so a zero-capacity NM is possible). However, the validateConf 
> methods in CapacityScheduler and FifoScheduler do not allow 0 values for 
> these properties (the FairScheduler one does allow 0). This behaviour 
> should be consistent, or at least be overridable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4138) Roll back container resource allocation after resource increase token expires

2015-12-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15064880#comment-15064880
 ] 

Hadoop QA commented on YARN-4138:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 6 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 10m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 48s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
17s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 46s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
18s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
35s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 35s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 34s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
43s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 48s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 48s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 41s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 16s 
{color} | {color:red} Patch generated 2 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 (total was 249, now 249). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 46s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
19s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
43s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 34s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 36s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 66m 47s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 66m 42s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
27s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 157m 32s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
| JDK v1.7.0_91 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12778547/YARN-4138.3

[jira] [Commented] (YARN-4304) AM max resource configuration per partition to be displayed/updated correctly in UI and in various partition related metrics

2015-12-18 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15064874#comment-15064874
 ] 

Wangda Tan commented on YARN-4304:
--

Hi [~sunilg],

1) Changes for the max-available-to-a-queue may be split into a separate 
patch. The major concerns are:
- Performance: for every allocated container, we need to iterate over all labels to 
get the total resources.
- I think the longer term fix should be, add a by-partition info to queue 
metrics, including max/guaranteed/available/used, etc. I can help to review 
proposal/patches.

2) There are several methods like:
{code}
  public synchronized Resource getAMResourceLimitPerPartition(
  String nodePartition)
{code} 

I think after we have YARN-4418, we don't need to calculate 
AMResourceLimitPerPartition every time. So I suggest splitting the method into 
calculate-and-get and read-only methods. How about call them 
calculateAndGetAMResourceLimitPerPartition and getAMResourceLimitPerPartition? 
{{getPendingAppDiagnosticMessage}}/REST-API would use the read-only interface.


Agree? 

3) Could you upload screenshots/REST API responses? 

To your question:
bq. I am giving AM Resource limit for user AM limit, This is not really 
correct. But to get this, we need to have some round about way. I gave this 
implementation based on below comment. Is this what you also expect?
Is it possible to use 
{{org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.CapacitySchedulerLeafQueueInfo#users}}
 to get the AM Resource Limit instead? Is there any concern of you for this 
approach? I think using the same queue-level limit for both the queue and the user 
may not be correct.
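
For illustration, a minimal sketch of the calculate-and-get vs. read-only split (the caching and resource type are simplified here; this is not the actual implementation):

{code:title=AMResourceLimitSketch.java|borderStyle=solid}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Illustrative only: compute-and-cache vs. read-only accessors per partition. */
class AMResourceLimitSketch {
  // partition -> cached AM resource limit in MB (a real queue would cache a Resource)
  private final Map<String, Long> amLimitByPartition = new ConcurrentHashMap<>();

  /** Called on the scheduling path: recompute and cache the limit. */
  long calculateAndGetAMResourceLimitPerPartition(String partition,
      long partitionResourceMB, float maxAMResourcePercent) {
    long limit = (long) (partitionResourceMB * maxAMResourcePercent);
    amLimitByPartition.put(partition, limit);
    return limit;
  }

  /** Called by diagnostics / REST / UI: read the cached value only. */
  long getAMResourceLimitPerPartition(String partition) {
    return amLimitByPartition.getOrDefault(partition, 0L);
  }
}
{code}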

> AM max resource configuration per partition to be displayed/updated correctly 
> in UI and in various partition related metrics
> 
>
> Key: YARN-4304
> URL: https://issues.apache.org/jira/browse/YARN-4304
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: webapp
>Affects Versions: 2.7.1
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: 0001-YARN-4304.patch, 0002-YARN-4304.patch, 
> 0003-YARN-4304.patch, 0004-YARN-4304.patch, 0005-YARN-4304.patch, 
> 0005-YARN-4304.patch, REST_and_UI.zip
>
>
> As we are supporting per-partition level max AM resource percentage 
> configuration, UI and various metrics also need to display correct 
> configurations related to same. 
> For eg: Current UI still shows am-resource percentage per queue level. This 
> is to be updated correctly when label config is used.
> - Display max-am-percentage per-partition in Scheduler UI (label also) and in 
> ClusterMetrics page
> - Update queue/partition related metrics w.r.t per-partition 
> am-resource-percentage



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4483) HDP 2.2.9 expands the scope of AMBARI-11358.

2015-12-18 Thread David Tucker (JIRA)
David Tucker created YARN-4483:
--

 Summary: HDP 2.2.9 expands the scope of AMBARI-11358.
 Key: YARN-4483
 URL: https://issues.apache.org/jira/browse/YARN-4483
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
 Environment: HDP 2.2.9.0-3393, Ambari 1.7.0
Reporter: David Tucker


YARN will not start until 
yarn.scheduler.capacity.root.accessible-node-labels.default.capacity and 
yarn.scheduler.capacity.root.accessible-node-labels.default.maximum-capacity 
are changed from their default value (-1).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4164) Retrospect update ApplicationPriority API return type

2015-12-18 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15064854#comment-15064854
 ] 

Jian He commented on YARN-4164:
---

lgtm, +1

> Retrospect update ApplicationPriority API return type
> -
>
> Key: YARN-4164
> URL: https://issues.apache.org/jira/browse/YARN-4164
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
> Attachments: 0001-YARN-4164.patch, 0002-YARN-4164.patch, 
> 0003-YARN-4164.patch, 0004-YARN-4164.patch
>
>
> Currently the {{ApplicationClientProtocol#updateApplicationPriority()}} API 
> returns an empty UpdateApplicationPriorityResponse.
> But the RM updates the priority to cluster.max-priority if the given priority is 
> greater than cluster.max-priority. In this scenario, we need to report the updated 
> priority back to the client rather than staying quiet, in which case the client 
> assumes that the given priority itself was taken.
> The same scenario can also happen during application submission, but I feel that 
> when ApplicationClientProtocol#updateApplicationPriority() is invoked explicitly, 
> the response should contain the updated priority. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4234) New put APIs in TimelineClient for ats v1.5

2015-12-18 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15064816#comment-15064816
 ] 

Xuan Gong commented on YARN-4234:
-

Uploaded a patch to fix the checkstyle issue.

> New put APIs in TimelineClient for ats v1.5
> ---
>
> Key: YARN-4234
> URL: https://issues.apache.org/jira/browse/YARN-4234
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4234-2015-11-13.1.patch, 
> YARN-4234-2015-11-16.1.patch, YARN-4234-2015-11-16.2.patch, 
> YARN-4234-2015.2.patch, YARN-4234.1.patch, YARN-4234.2.patch, 
> YARN-4234.2015-11-12.1.patch, YARN-4234.2015-11-12.1.patch, 
> YARN-4234.2015-11-18.1.patch, YARN-4234.2015-11-18.2.patch, 
> YARN-4234.2015-11-18.patch, YARN-4234.2015-12-09.patch, 
> YARN-4234.2015-12-09.patch, YARN-4234.2015-12-17.1.patch, 
> YARN-4234.2015-12-18.1.patch, YARN-4234.2015-12-18.patch, 
> YARN-4234.20151109.patch, YARN-4234.20151110.1.patch, 
> YARN-4234.2015.1.patch, YARN-4234.3.patch
>
>
> In this ticket, we will add new put APIs in timelineClient to let 
> clients/applications have the option to use ATS v1.5



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-4473) Add version information for the application and the application attempts

2015-12-18 Thread Giovanni Matteo Fumarola (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giovanni Matteo Fumarola reassigned YARN-4473:
--

Assignee: Giovanni Matteo Fumarola  (was: Marco Rabozzi)

> Add version information for the application and the application attempts
> 
>
> Key: YARN-4473
> URL: https://issues.apache.org/jira/browse/YARN-4473
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Giovanni Matteo Fumarola
>Assignee: Giovanni Matteo Fumarola
>
> In order to allow upgrading an application master across different attempts, 
> we need to keep track of the different attempts' versions and provide a means to 
> temporarily store the upgrade information until the upgrade completes.
> Concretely we would add:
> - A version identifier for each attempt
> - A temporary upgrade context for each application



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4234) New put APIs in TimelineClient for ats v1.5

2015-12-18 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-4234:

Attachment: YARN-4234.2015-12-18.1.patch

> New put APIs in TimelineClient for ats v1.5
> ---
>
> Key: YARN-4234
> URL: https://issues.apache.org/jira/browse/YARN-4234
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4234-2015-11-13.1.patch, 
> YARN-4234-2015-11-16.1.patch, YARN-4234-2015-11-16.2.patch, 
> YARN-4234-2015.2.patch, YARN-4234.1.patch, YARN-4234.2.patch, 
> YARN-4234.2015-11-12.1.patch, YARN-4234.2015-11-12.1.patch, 
> YARN-4234.2015-11-18.1.patch, YARN-4234.2015-11-18.2.patch, 
> YARN-4234.2015-11-18.patch, YARN-4234.2015-12-09.patch, 
> YARN-4234.2015-12-09.patch, YARN-4234.2015-12-17.1.patch, 
> YARN-4234.2015-12-18.1.patch, YARN-4234.2015-12-18.patch, 
> YARN-4234.20151109.patch, YARN-4234.20151110.1.patch, 
> YARN-4234.2015.1.patch, YARN-4234.3.patch
>
>
> In this ticket, we will add new put APIs in timelineClient to let 
> clients/applications have the option to use ATS v1.5



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4234) New put APIs in TimelineClient for ats v1.5

2015-12-18 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-4234:

Attachment: YARN-4234.2015-12-18.patch

> New put APIs in TimelineClient for ats v1.5
> ---
>
> Key: YARN-4234
> URL: https://issues.apache.org/jira/browse/YARN-4234
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4234-2015-11-13.1.patch, 
> YARN-4234-2015-11-16.1.patch, YARN-4234-2015-11-16.2.patch, 
> YARN-4234-2015.2.patch, YARN-4234.1.patch, YARN-4234.2.patch, 
> YARN-4234.2015-11-12.1.patch, YARN-4234.2015-11-12.1.patch, 
> YARN-4234.2015-11-18.1.patch, YARN-4234.2015-11-18.2.patch, 
> YARN-4234.2015-11-18.patch, YARN-4234.2015-12-09.patch, 
> YARN-4234.2015-12-09.patch, YARN-4234.2015-12-17.1.patch, 
> YARN-4234.2015-12-18.patch, YARN-4234.20151109.patch, 
> YARN-4234.20151110.1.patch, YARN-4234.2015.1.patch, YARN-4234.3.patch
>
>
> In this ticket, we will add new put APIs in timelineClient to let 
> clients/applications have the option to use ATS v1.5



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4234) New put APIs in TimelineClient for ats v1.5

2015-12-18 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-4234:

Attachment: YARN-4234.2015-11-18.patch

> New put APIs in TimelineClient for ats v1.5
> ---
>
> Key: YARN-4234
> URL: https://issues.apache.org/jira/browse/YARN-4234
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4234-2015-11-13.1.patch, 
> YARN-4234-2015-11-16.1.patch, YARN-4234-2015-11-16.2.patch, 
> YARN-4234-2015.2.patch, YARN-4234.1.patch, YARN-4234.2.patch, 
> YARN-4234.2015-11-12.1.patch, YARN-4234.2015-11-12.1.patch, 
> YARN-4234.2015-11-18.1.patch, YARN-4234.2015-11-18.2.patch, 
> YARN-4234.2015-11-18.patch, YARN-4234.2015-12-09.patch, 
> YARN-4234.2015-12-09.patch, YARN-4234.2015-12-17.1.patch, 
> YARN-4234.20151109.patch, YARN-4234.20151110.1.patch, 
> YARN-4234.2015.1.patch, YARN-4234.3.patch
>
>
> In this ticket, we will add new put APIs in timelineClient to let 
> clients/applications have the option to use ATS v1.5



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4195) Support of node-labels in the ReservationSystem "Plan"

2015-12-18 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15064812#comment-15064812
 ] 

Wangda Tan commented on YARN-4195:
--

Hi [~curino],

Thanks for working on this, I've taken a look at your description and patch.

Could you provide an example of the GPU/PUBLICIP queue configuration and 
resource request *when the proposal is completed*? I may not understand it 
completely, I guess it may look like:
- Since the system still uses partitions to track resources, there are still 4 
partitions: GPU_PUBLICIP, GPU_NOT-PUBLICIP, NOT-GPU_PUBLICIP, 
NOT-GPU_NOT-PUBLICIP. The admin still needs to configure queue capacities for the 
4 partitions (a rough configuration sketch follows after this list).
- The node label expression of a Resource Request will be mapped to partitions; say an 
expression = GPU, it becomes two resource requests internally: 
GPU_PUBLICIP and PUBLICIP.
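
To make the first bullet concrete, a rough configuration sketch (the queue name and 
capacity values are invented for illustration; this is not from the patch):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

// Illustration only: with four tracked partitions, the admin still sets a
// per-partition capacity on each queue; here an even 25% split on a
// hypothetical root.engineering queue.
public class FourPartitionQueueConfig {
  public static Configuration configure() {
    Configuration conf = new YarnConfiguration();
    String[] partitions = {"GPU_PUBLICIP", "GPU_NOT-PUBLICIP",
        "NOT-GPU_PUBLICIP", "NOT-GPU_NOT-PUBLICIP"};
    for (String p : partitions) {
      conf.setFloat("yarn.scheduler.capacity.root.engineering"
          + ".accessible-node-labels." + p + ".capacity", 25f);
    }
    return conf;
  }
}
{code}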

> Support of node-labels in the ReservationSystem "Plan"
> --
>
> Key: YARN-4195
> URL: https://issues.apache.org/jira/browse/YARN-4195
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Attachments: YARN-4195.patch
>
>
> As part of YARN-4193 we need to enhance the InMemoryPlan (and related 
> classes) to track the per-label available resources, as well as the per-label
> reservation-allocations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3480) Recovery may get very slow with lots of services with lots of app-attempts

2015-12-18 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15064794#comment-15064794
 ] 

Jian He commented on YARN-3480:
---

Thanks for the update.
- Regarding this logic, it is possible that a particular attempt is not 
persisted in the store because of some connection failure, so 
{{app.nextAttemptId - app.firstAttemptIdInStateStore}} does not necessarily 
indicate the number of attempts (see the sketch after this list).
{code}
while (app.nextAttemptId - app.firstAttemptIdInStateStore > app.maxAppAttempts) {
{code}
- LevelDBRMStateStore#removeApplicationAttemptInternal does not need to use a 
batch operation, as it only has one operation.
- Could you also add a test case in RMStateStoreTestBase#testRMAppStateStore 
to verify that the loading part also works correctly, i.e. loading an app with partial 
attempts works correctly?
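
To illustrate the first point, a rough sketch (all names here are invented, this is 
not the YARN-3480 patch): trim based on the attempt ids actually present in the 
store rather than on the id arithmetic.
{code}
import java.util.TreeSet;

// Illustration only: if some attempts were never persisted (e.g. a transient
// store connection failure), nextAttemptId - firstAttemptIdInStateStore can
// over-count, so trimming should be driven by what is really in the store.
public class StoredAttemptTrimSketch {
  public static void trimToMax(TreeSet<Integer> storedAttemptIds, int maxAppAttempts) {
    while (storedAttemptIds.size() > maxAppAttempts) {
      Integer oldest = storedAttemptIds.first();
      // a real store would issue a single remove for this attempt here
      storedAttemptIds.remove(oldest);
    }
  }
}
{code}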

> Recovery may get very slow with lots of services with lots of app-attempts
> --
>
> Key: YARN-3480
> URL: https://issues.apache.org/jira/browse/YARN-3480
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Jun Gong
>Assignee: Jun Gong
> Attachments: YARN-3480.01.patch, YARN-3480.02.patch, 
> YARN-3480.03.patch, YARN-3480.04.patch, YARN-3480.05.patch, 
> YARN-3480.06.patch, YARN-3480.07.patch, YARN-3480.08.patch, 
> YARN-3480.09.patch, YARN-3480.10.patch
>
>
> When RM HA is enabled and running containers are kept across attempts, apps 
> are more likely to finish successfully with more retries (attempts), so it 
> is better to set 'yarn.resourcemanager.am.max-attempts' larger. However, 
> this makes the RMStateStore (FileSystem/HDFS/ZK) store more attempts and makes 
> the RM recovery process much slower. It might be better to cap the number of 
> attempts stored in the RMStateStore.
> BTW: when 'attemptFailuresValidityInterval' (introduced in YARN-611) is set to 
> a small value, the number of retried attempts might become very large, so we need 
> to delete some of the attempts stored in the RMStateStore.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4290) "yarn nodes -list" should print all nodes reports information

2015-12-18 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15064783#comment-15064783
 ] 

Wangda Tan commented on YARN-4290:
--

Thanks [~Naganarasimha]. 

[~sunilg], could you check the failed tests? I will commit in a few days.

> "yarn nodes -list" should print all nodes reports information
> -
>
> Key: YARN-4290
> URL: https://issues.apache.org/jira/browse/YARN-4290
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: client
>Reporter: Wangda Tan
>Assignee: Sunil G
> Attachments: 0002-YARN-4290.patch
>
>
> Currently, "yarn nodes -list" command only shows 
> - "Node-Id", 
> - "Node-State", 
> - "Node-Http-Address",
> - "Number-of-Running-Containers"
> I think we need to show more information such as used resource, just like 
> "yarn nodes -status" command.
> Maybe we can add a parameter to -list, such as "-show-details" to enable 
> printing all detailed information.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4290) "yarn nodes -list" should print all nodes reports information

2015-12-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15064774#comment-15064774
 ] 

Hadoop QA commented on YARN-4290:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
48s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 16s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 18s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
8s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 23s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
38s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 14s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
21s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 15s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 15s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 18s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 18s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
8s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
40s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 14s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 16s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 64m 19s {color} 
| {color:red} hadoop-yarn-client in the patch failed with JDK v1.8.0_66. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 64m 31s {color} 
| {color:red} hadoop-yarn-client in the patch failed with JDK v1.7.0_91. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
30s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 143m 23s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | hadoop.yarn.client.TestGetGroups |
| JDK v1.8.0_66 Timed out junit tests | 
org.apache.hadoop.yarn.client.cli.TestYarnCLI |
|   | org.apache.hadoop.yarn.client.api.impl.TestAMRMClient |
|   | org.apache.hadoop.yarn.client.api.impl.TestYarnClient |
|   | org.apache.hadoop.yarn.client.api.impl.TestNMClient |
| JDK v1.7.0_91 Failed junit tests | hadoop.yarn.client.TestGetGroups |
| JDK v1.7.0_91 Timed out junit tests | 
org.apache.hadoop.yarn.client.cli.TestYarnCLI |
|   | org.apache.hadoop.yarn.client.api.impl.TestAMRMClient |
|   | org.apache.hadoop.yarn.client.api.impl.TestYarnClient |
|   | org.apache.ha

[jira] [Commented] (YARN-914) (Umbrella) Support graceful decommission of nodemanager

2015-12-18 Thread Daniel Zhi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15064737#comment-15064737
 ] 

Daniel Zhi commented on YARN-914:
-

Thanks. Always committing to trunk first makes a lot of sense to me. We would need 
to port the code to trunk and likely build an AMI image with it so as to leverage our 
internal verification test system.

Our implementation is largely in sync with the architecture and ideas in the JIRA 
design document. On the other hand, there are additional details and component-level 
designs that the JIRA design document does not necessarily discuss or touch. 
These details naturally surfaced during the development iterations, and the 
corresponding designs became mature and stabilized. One example is the 
DecommissioningNodeWatcher which, embedded in ResourceTrackingService, tracks 
DECOMMISSIONING node status automatically and asynchronously after the 
client/admin makes the graceful decommission request. Another example is per-node 
decommission timeout support, which is useful for decommissioning a node that 
will be terminated soon.

> (Umbrella) Support graceful decommission of nodemanager
> ---
>
> Key: YARN-914
> URL: https://issues.apache.org/jira/browse/YARN-914
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: graceful
>Affects Versions: 2.0.4-alpha
>Reporter: Luke Lu
>Assignee: Junping Du
> Attachments: Gracefully Decommission of NodeManager (v1).pdf, 
> Gracefully Decommission of NodeManager (v2).pdf, 
> GracefullyDecommissionofNodeManagerv3.pdf
>
>
> When NMs are decommissioned for non-fault reasons (capacity change etc.), 
> it's desirable to minimize the impact to running applications.
> Currently if a NM is decommissioned, all running containers on the NM need to 
> be rescheduled on other NMs. Further more, for finished map tasks, if their 
> map output are not fetched by the reducers of the job, these map tasks will 
> need to be rerun as well.
> We propose to introduce a mechanism to optionally gracefully decommission a 
> node manager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2934) Improve handling of container's stderr

2015-12-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15064680#comment-15064680
 ] 

Hadoop QA commented on YARN-2934:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
30s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 15s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 21s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
29s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 38s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
39s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 
53s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 34s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 0s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
28s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 6s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 6s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 21s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 21s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 31s 
{color} | {color:red} Patch generated 2 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn (total was 279, now 279). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 35s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
40s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s 
{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 
21s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 33s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 51s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 25s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_66. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 5s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 1s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 27s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.7.0_91. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 15s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 30s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.7.0_91. {color} |
| {colo

[jira] [Updated] (YARN-4482) Default values of several config parameters are missing

2015-12-18 Thread Tianyin Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tianyin Xu updated YARN-4482:
-
Description: 
In {{yarn-default.xml}}, the default values of the following parameters are 
commented out, 
{{yarn.client.failover-max-attempts}}
{{yarn.client.failover-sleep-base-ms}}
{{yarn.client.failover-sleep-max-ms}}

Are these default values changed (I suppose so)? If so, we should update the 
new ones in {{yarn-default.xml}}. Right now, I don't know the real "default" 
values...

(yarn-default.xml)
https://hadoop.apache.org/docs/r2.6.3/hadoop-yarn/hadoop-yarn-common/yarn-default.xml
https://hadoop.apache.org/docs/r2.6.3/hadoop-yarn/hadoop-yarn-common/yarn-default.xml

Thanks!


  was:
In {{yarn-default.xml}}, the default values of the following parameters are 
commented out, 
{{yarn.client.failover-max-attempts}}
{{yarn.client.failover-sleep-base-ms}}
{{yarn.client.failover-sleep-max-ms}}

Are these default values changed (I suppose so)? If so, we should update the 
new ones in {{yarn-default.xml}}. Right now, I don't know the real "default" 
values...

Thanks!



> Default values of several config parameters are missing 
> 
>
> Key: YARN-4482
> URL: https://issues.apache.org/jira/browse/YARN-4482
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: client
>Affects Versions: 2.6.2, 2.6.3
>Reporter: Tianyin Xu
>Priority: Minor
>
> In {{yarn-default.xml}}, the default values of the following parameters are 
> commented out, 
> {{yarn.client.failover-max-attempts}}
> {{yarn.client.failover-sleep-base-ms}}
> {{yarn.client.failover-sleep-max-ms}}
> Are these default values changed (I suppose so)? If so, we should update the 
> new ones in {{yarn-default.xml}}. Right now, I don't know the real "default" 
> values...
> (yarn-default.xml)
> https://hadoop.apache.org/docs/r2.6.3/hadoop-yarn/hadoop-yarn-common/yarn-default.xml
> https://hadoop.apache.org/docs/r2.6.3/hadoop-yarn/hadoop-yarn-common/yarn-default.xml
> Thanks!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4482) Default values of several config parameters are missing

2015-12-18 Thread Tianyin Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tianyin Xu updated YARN-4482:
-
Description: 
In {{yarn-default.xml}}, the default values of the following parameters are 
commented out, 
{{yarn.client.failover-max-attempts}}
{{yarn.client.failover-sleep-base-ms}}
{{yarn.client.failover-sleep-max-ms}}

Are these default values changed (I suppose so)? If so, we should update the 
new ones in {{yarn-default.xml}}. Right now, I don't know the real "default" 
values...

(yarn-default.xml)
https://hadoop.apache.org/docs/r2.6.2/hadoop-yarn/hadoop-yarn-common/yarn-default.xml
https://hadoop.apache.org/docs/r2.6.3/hadoop-yarn/hadoop-yarn-common/yarn-default.xml

Thanks!


  was:
In {{yarn-default.xml}}, the default values of the following parameters are 
commented out, 
{{yarn.client.failover-max-attempts}}
{{yarn.client.failover-sleep-base-ms}}
{{yarn.client.failover-sleep-max-ms}}

Are these default values changed (I suppose so)? If so, we should update the 
new ones in {{yarn-default.xml}}. Right now, I don't know the real "default" 
values...

(yarn-default.xml)
https://hadoop.apache.org/docs/r2.6.3/hadoop-yarn/hadoop-yarn-common/yarn-default.xml
https://hadoop.apache.org/docs/r2.6.3/hadoop-yarn/hadoop-yarn-common/yarn-default.xml

Thanks!



> Default values of several config parameters are missing 
> 
>
> Key: YARN-4482
> URL: https://issues.apache.org/jira/browse/YARN-4482
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: client
>Affects Versions: 2.6.2, 2.6.3
>Reporter: Tianyin Xu
>Priority: Minor
>
> In {{yarn-default.xml}}, the default values of the following parameters are 
> commented out, 
> {{yarn.client.failover-max-attempts}}
> {{yarn.client.failover-sleep-base-ms}}
> {{yarn.client.failover-sleep-max-ms}}
> Are these default values changed (I suppose so)? If so, we should update the 
> new ones in {{yarn-default.xml}}. Right now, I don't know the real "default" 
> values...
> (yarn-default.xml)
> https://hadoop.apache.org/docs/r2.6.2/hadoop-yarn/hadoop-yarn-common/yarn-default.xml
> https://hadoop.apache.org/docs/r2.6.3/hadoop-yarn/hadoop-yarn-common/yarn-default.xml
> Thanks!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4482) Default values of several config parameters are missing

2015-12-18 Thread Tianyin Xu (JIRA)
Tianyin Xu created YARN-4482:


 Summary: Default values of several config parameters are missing 
 Key: YARN-4482
 URL: https://issues.apache.org/jira/browse/YARN-4482
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: client
Affects Versions: 2.6.2, 2.6.3
Reporter: Tianyin Xu
Priority: Minor


In {{yarn-default.xml}}, the default values of the following parameters are 
commented out, 
{{yarn.client.failover-max-attempts}}
{{yarn.client.failover-sleep-base-ms}}
{{yarn.client.failover-sleep-max-ms}}

Are these default values changed (I suppose so)? If so, we should update the 
new ones in {{yarn-default.xml}}. Right now, I don't know the real "default" 
values...

Thanks!




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4428) Redirect RM page to AHS page when AHS turned on and RM page is not avaialable

2015-12-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15064569#comment-15064569
 ] 

Hadoop QA commented on YARN-4428:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
57s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 11s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 20s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
21s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 3s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
28s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 4s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 42s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 52s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
58s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 11s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 11s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 22s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 6m 21s {color} 
| {color:red} hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server-jdk1.7.0_91 
with JDK v1.7.0_91 generated 1 new issues (was 7, now 7). {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 22s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 21s 
{color} | {color:red} Patch generated 2 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server (total was 163, now 163). 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 4s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
28s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
21s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 41s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 52s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 25s 
{color} | {color:green} hadoop-yarn-server-common in the patch passed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 60m 28s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 29s 
{color} | {color:green} hadoop-yarn-server-common in the patch passed with JDK 
v1.7.0_91. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 61m 18s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
25s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 149m 36s

[jira] [Commented] (YARN-4454) NM to nodelabel mapping going wrong after RM restart

2015-12-18 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15064513#comment-15064513
 ] 

Wangda Tan commented on YARN-4454:
--

[~bibinchundatt], thanks for reporting and looking at the issue. 

The root cause of this issue is that when the RM restarts the first time, it 
generates a mirror file which has the complete node->label mappings:
{code}
node1:port=x 
node1=y
{code}

And when we restart the RM again, we load the mapping, but node1:port is 
loaded first, so node1=y overwrites the previous one.

In: 
{{org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager#checkReplaceLabelsOnNode}}

Instead of directly iterating over the map:
{code}
for (Entry<NodeId, Set<String>> entry : replaceLabelsToNode.entrySet()) {
  NodeId nodeId = entry.getKey();
{code}
we should sort the map so that nodes without a port are handled before nodes with a 
port specified, to avoid the overwrite (a rough sketch follows below).

Does that make sense to you?
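
A minimal sketch of this ordering (illustration only, not the actual patch; it 
assumes port 0 means the port was not specified):
{code}
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
import java.util.Set;
import org.apache.hadoop.yarn.api.records.NodeId;

// Illustration only: sorting by port ascending applies node1=y before
// node1:port=x, so the host:port mapping is no longer overwritten by the
// host-only one.
public class ReplaceLabelsOrdering {
  public static List<Entry<NodeId, Set<String>>> hostOnlyFirst(
      Map<NodeId, Set<String>> replaceLabelsToNode) {
    List<Entry<NodeId, Set<String>>> entries =
        new ArrayList<Entry<NodeId, Set<String>>>(replaceLabelsToNode.entrySet());
    Collections.sort(entries, new Comparator<Entry<NodeId, Set<String>>>() {
      @Override
      public int compare(Entry<NodeId, Set<String>> a, Entry<NodeId, Set<String>> b) {
        // host-only NodeIds (port 0) come first
        return Integer.compare(a.getKey().getPort(), b.getKey().getPort());
      }
    });
    return entries;
  }
}
{code}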

> NM to nodelabel mapping going wrong after RM restart
> 
>
> Key: YARN-4454
> URL: https://issues.apache.org/jira/browse/YARN-4454
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: test.patch
>
>
> *Nodelabel mapping with NodeManager  is going wrong if combination of 
> hostname and then NodeId is used to update nodelabel mapping*
> *Steps to reproduce*
> 1.Create cluster with 2 NM
> 2.Add label X,Y to cluster
> 3.replace  Label of node  1 using ,x
> 4.replace label for node 1 by ,y
> 5.Again replace label of node 1 by ,x
> Check cluster label mapping HOSTNAME1 will be mapped with X 
> Now restart RM 2 times NODE LABEL mapping of HOSTNAME1:PORT changes to Y
> {noformat}
> 2015-12-14 17:17:54,901 INFO 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager: Add labels: 
> [,]
> 2015-12-14 17:17:54,905 INFO 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager: REPLACE labels on 
> nodes:
> 2015-12-14 17:17:54,906 INFO 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager:   
> NM=host-10-19-92-188:64318, labels=[ResourcePool_1]
> 2015-12-14 17:17:54,906 INFO 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager:   
> NM=host-10-19-92-188:0, labels=[ResourcePool_null]
> 2015-12-14 17:17:54,906 INFO 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager:   
> NM=host-10-19-92-187:64318, labels=[ResourcePool_null]
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4138) Roll back container resource allocation after resource increase token expires

2015-12-18 Thread MENG DING (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

MENG DING updated YARN-4138:

Attachment: YARN-4138.3.patch

Attaching the latest patch that addresses [~jianhe]'s and [~sandflee]'s comments.

I think the issue brought up by [~jianhe] is about race conditions between a 
normal resource decrease and a resource rollback. The proposed fix is to guard 
the resource rollback with the same sequence of locks as the normal resource 
decrease, i.e., lock on the application first, then on the scheduler.
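
A tiny self-contained sketch of that lock ordering (the lock names are placeholders, 
not the actual objects used in the patch):
{code}
// Placeholder sketch: both the normal decrease path and the rollback path take
// the locks in the same order, so the two operations serialize rather than
// interleave.
public class LockOrderSketch {
  private final Object appLock = new Object();        // stands in for the application lock
  private final Object schedulerLock = new Object();  // stands in for the scheduler lock

  public void decreaseOrRollback(Runnable update) {
    synchronized (appLock) {          // lock on application first
      synchronized (schedulerLock) {  // then on scheduler
        update.run();                 // apply the decrease / rollback here
      }
    }
  }
}
{code}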

So with the proposed fix, we can walk through the original example:
1. AM asks to increase 2G -> 8G, and this is approved by the RM
2. AM does not increase the container; AM asks to decrease to 1G, and at the 
same time the increase expiration logic is triggered:
* If the normal decrease is processed first: RM decreases 8G -> 1G (allocated 
and lastConfirmed are now set to 1G), and then the rollback is processed: RM 
rolls back 1G -> 1G (skip)
* If the rollback is processed first: RM rolls back 8G -> 2G (allocated and 
lastConfirmed are now set to 2G), and then the normal decrease is processed: RM 
decreases 2G -> 1G


> Roll back container resource allocation after resource increase token expires
> -
>
> Key: YARN-4138
> URL: https://issues.apache.org/jira/browse/YARN-4138
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, nodemanager, resourcemanager
>Reporter: MENG DING
>Assignee: MENG DING
> Attachments: YARN-4138-YARN-1197.1.patch, 
> YARN-4138-YARN-1197.2.patch, YARN-4138.3.patch
>
>
> In YARN-1651, after container resource increase token expires, the running 
> container is killed.
> This ticket will change the behavior such that when a container resource 
> increase token expires, the resource allocation of the container will be 
> reverted back to the value before the increase.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4350) TestDistributedShell fails for V2 scenarios

2015-12-18 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15064457#comment-15064457
 ] 

Sangjin Lee commented on YARN-4350:
---

Thanks for digging deep into this, [~Naganarasimha]. There might be a somewhat 
bigger issue with the non-test code then, and it might not be limited to our 
branch.

I'm OK with fixing the issues that are specific to our branch here, and handling 
the bigger issue in a trunk JIRA (YARN-4385?). Thoughts?

> TestDistributedShell fails for V2 scenarios
> ---
>
> Key: YARN-4350
> URL: https://issues.apache.org/jira/browse/YARN-4350
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Naganarasimha G R
> Attachments: YARN-4350-feature-YARN-2928.001.patch, 
> YARN-4350-feature-YARN-2928.002.patch, YARN-4350-feature-YARN-2928.003.patch
>
>
> Currently TestDistributedShell does not pass on the feature-YARN-2928 branch. 
> There seem to be 2 distinct issues.
> (1) testDSShellWithoutDomainV2* tests fail sporadically
> These test fail more often than not if tested by themselves:
> {noformat}
> testDSShellWithoutDomainV2DefaultFlow(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell)
>   Time elapsed: 30.998 sec  <<< FAILURE!
> java.lang.AssertionError: Application created event should be published 
> atleast once expected:<1> but was:<0>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.checkTimelineV2(TestDistributedShell.java:451)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:326)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithoutDomainV2DefaultFlow(TestDistributedShell.java:207)
> {noformat}
> They start happening after YARN-4129. I suspect this might have to do with 
> some timing issue.
> (2) the whole test times out
> If you run the whole TestDistributedShell test, it times out without fail. 
> This may or may not have to do with the port change introduced by YARN-2859 
> (just a hunch).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4198) CapacityScheduler locking / synchronization improvements

2015-12-18 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15064443#comment-15064443
 ] 

Wangda Tan commented on YARN-4198:
--

[~atumanov],

Could you rebase your patch against latest trunk?

Thanks,

> CapacityScheduler locking / synchronization improvements
> 
>
> Key: YARN-4198
> URL: https://issues.apache.org/jira/browse/YARN-4198
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Carlo Curino
>Assignee: Alexey Tumanov
> Attachments: YARN-4198-v1.patch
>
>
> In the context of YARN-4193 (which stresses the RM/CS performance) we found 
> several performance problems with  in the locking/synchronization of the 
> CapacityScheduler, as well as inconsistencies that do not normally surface 
> (incorrect locking-order of queues protected by CS locks etc). This JIRA 
> proposes several refactoring that improve this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4290) "yarn nodes -list" should print all nodes reports information

2015-12-18 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15064438#comment-15064438
 ] 

Naganarasimha G R commented on YARN-4290:
-

+1 patch LGTM !

> "yarn nodes -list" should print all nodes reports information
> -
>
> Key: YARN-4290
> URL: https://issues.apache.org/jira/browse/YARN-4290
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: client
>Reporter: Wangda Tan
>Assignee: Sunil G
> Attachments: 0002-YARN-4290.patch
>
>
> Currently, "yarn nodes -list" command only shows 
> - "Node-Id", 
> - "Node-State", 
> - "Node-Http-Address",
> - "Number-of-Running-Containers"
> I think we need to show more information such as used resource, just like 
> "yarn nodes -status" command.
> Maybe we can add a parameter to -list, such as "-show-details" to enable 
> printing all detailed information.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4290) "yarn nodes -list" should print all nodes reports information

2015-12-18 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15064413#comment-15064413
 ] 

Wangda Tan commented on YARN-4290:
--

[~sunilg], 

Patch looks good to me! 

> "yarn nodes -list" should print all nodes reports information
> -
>
> Key: YARN-4290
> URL: https://issues.apache.org/jira/browse/YARN-4290
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: client
>Reporter: Wangda Tan
>Assignee: Sunil G
> Attachments: 0002-YARN-4290.patch
>
>
> Currently, "yarn nodes -list" command only shows 
> - "Node-Id", 
> - "Node-State", 
> - "Node-Http-Address",
> - "Number-of-Running-Containers"
> I think we need to show more information such as used resource, just like 
> "yarn nodes -status" command.
> Maybe we can add a parameter to -list, such as "-show-details" to enable 
> printing all detailed information.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2934) Improve handling of container's stderr

2015-12-18 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-2934:

Attachment: YARN-2934.v2.003.patch

[~jira.shegalov],
Incorporating changes for ??Right now we are blindly grabbing file 0. It would 
however make much more sense to grab the most recent (highest mtime) non-empty 
file??.
Please review. The earlier test-case failures are not related to the modifications 
in this patch, and the tests pass locally. 
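
A rough sketch of the quoted suggestion (helper names are invented; this is not the 
patch itself): pick the most recently modified non-empty file instead of always 
reading file 0.
{code}
import java.io.File;

// Illustration only: among the container's error files, return the non-empty
// one with the highest mtime, or null if nothing non-empty is found.
public final class LatestStderrPicker {
  public static File pickLatestNonEmpty(File containerLogDir, String errFilePrefix) {
    File[] files = containerLogDir.listFiles();
    if (files == null) {
      return null;
    }
    File best = null;
    for (File f : files) {
      if (f.isFile() && f.getName().startsWith(errFilePrefix) && f.length() > 0) {
        if (best == null || f.lastModified() > best.lastModified()) {
          best = f;  // highest mtime wins
        }
      }
    }
    return best;
  }
}
{code}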

> Improve handling of container's stderr 
> ---
>
> Key: YARN-2934
> URL: https://issues.apache.org/jira/browse/YARN-2934
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Gera Shegalov
>Assignee: Naganarasimha G R
>Priority: Critical
> Attachments: YARN-2934.v1.001.patch, YARN-2934.v1.002.patch, 
> YARN-2934.v1.003.patch, YARN-2934.v1.004.patch, YARN-2934.v1.005.patch, 
> YARN-2934.v1.006.patch, YARN-2934.v1.007.patch, YARN-2934.v1.008.patch, 
> YARN-2934.v2.001.patch, YARN-2934.v2.002.patch, YARN-2934.v2.003.patch
>
>
> Most YARN applications redirect stderr to some file. That's why when 
> container launch fails with {{ExitCodeException}} the message is empty.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4234) New put APIs in TimelineClient for ats v1.5

2015-12-18 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15064364#comment-15064364
 ] 

Junping Du commented on YARN-4234:
--

Thanks [~xgong] for updating the patch. The latest patch LGTM overall. 
However, it seems that many of the checkstyle issues reported by Jenkins are related 
to this patch. Maybe we should fix them before we get the patch in?

> New put APIs in TimelineClient for ats v1.5
> ---
>
> Key: YARN-4234
> URL: https://issues.apache.org/jira/browse/YARN-4234
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4234-2015-11-13.1.patch, 
> YARN-4234-2015-11-16.1.patch, YARN-4234-2015-11-16.2.patch, 
> YARN-4234-2015.2.patch, YARN-4234.1.patch, YARN-4234.2.patch, 
> YARN-4234.2015-11-12.1.patch, YARN-4234.2015-11-12.1.patch, 
> YARN-4234.2015-11-18.1.patch, YARN-4234.2015-11-18.2.patch, 
> YARN-4234.2015-12-09.patch, YARN-4234.2015-12-09.patch, 
> YARN-4234.2015-12-17.1.patch, YARN-4234.20151109.patch, 
> YARN-4234.20151110.1.patch, YARN-4234.2015.1.patch, YARN-4234.3.patch
>
>
> In this ticket, we will add new put APIs in timelineClient to let 
> clients/applications have the option to use ATS v1.5



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4389) "yarn.am.blacklisting.enabled" and "yarn.am.blacklisting.disable-failure-threshold" should be app specific rather than a setting for whole YARN cluster

2015-12-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15064322#comment-15064322
 ] 

Hadoop QA commented on YARN-4389:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
9s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 0s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 17s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
29s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 42s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
40s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 
59s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 32s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 52s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
35s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 1s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 2m 1s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 1s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 16s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 2m 16s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 16s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 29s 
{color} | {color:red} Patch generated 5 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn (total was 161, now 166). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 41s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
40s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 
21s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 33s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 0s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 24s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_66. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 0s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 66m 21s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 26s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.7.0_91. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 16s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.7.0_91. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 66m 42s {color} 
| {color:

[jira] [Commented] (YARN-4003) ReservationQueue inherit getAMResourceLimit() from LeafQueue, but behavior is not consistent

2015-12-18 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15064234#comment-15064234
 ] 

Sunil G commented on YARN-4003:
---

Hi [~curino],
Thank you for the clarification. Yes, there is no cleaner solution. But I 
think we could calculate the AMUsed capacity of all ReservationQueues under one 
parent (PlanQueue).

So could we have an API like the one below in {{PlanQueue}} and use it along with 
{{getAMResourceLimit}} to ensure that we do not cross the limit of the parent 
queue? I might be wrong, please correct me if so.
{code}
  public synchronized Resource sumOfChildAMUsedCapacities() {
Resource ret = Resources.createResource(0);
for (CSQueue l : childQueues) {
  Resources.addTo(ret, l.getQueueResourceUsage().getAMUsed());
}
return ret;
  }
{code}

> ReservationQueue inherit getAMResourceLimit() from LeafQueue, but behavior is 
> not consistent
> 
>
> Key: YARN-4003
> URL: https://issues.apache.org/jira/browse/YARN-4003
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Attachments: YARN-4003.patch
>
>
> The inherited behavior from LeafQueue (limit AM % based on capacity) is not a 
> good fit for ReservationQueue (that have highly dynamic capacity). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4290) "yarn nodes -list" should print all nodes reports information

2015-12-18 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-4290:
--
Attachment: 0002-YARN-4290.patch

Uploading a new patch as related tickets are committed

Sample o/p
{noformat}
root@sunil-Inspiron-3543:/opt/hadoop/trunk/hadoop-3.0.0-SNAPSHOT/bin# ./yarn 
node -list -showDetails
15/12/18 21:24:21 INFO client.RMProxy: Connecting to ResourceManager at 
/127.0.0.1:25001
Total Nodes:1
 Node-Id Node-State Node-Http-Address   
Number-of-Running-Containers
 localhost:25006RUNNING   localhost:25008   
   0
 Detailed Node Information : 
Configured Resources : 
Allocated Resources : 
Resource Utilization by Node : PMem:4884 MB, VMem:4884 MB, 
VCores:2.5824726
Resource Utilization by Containers : PMem:0 MB, VMem:0 MB, VCores:0.0
Node-Labels :
{noformat}
[~leftnoteasy] and [~Naganarasimha Garla], please help to share your thoughts.

> "yarn nodes -list" should print all nodes reports information
> -
>
> Key: YARN-4290
> URL: https://issues.apache.org/jira/browse/YARN-4290
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: client
>Reporter: Wangda Tan
>Assignee: Sunil G
> Attachments: 0002-YARN-4290.patch
>
>
> Currently, "yarn nodes -list" command only shows 
> - "Node-Id", 
> - "Node-State", 
> - "Node-Http-Address",
> - "Number-of-Running-Containers"
> I think we need to show more information such as used resource, just like 
> "yarn nodes -status" command.
> Maybe we can add a parameter to -list, such as "-show-details" to enable 
> printing all detailed information.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4428) Redirect RM page to AHS page when AHS turned on and RM page is not avaialable

2015-12-18 Thread Chang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chang Li updated YARN-4428:
---
Attachment: YARN-4428.2.2.patch

.2.2 addressed the whitespace issue

> Redirect RM page to AHS page when AHS turned on and RM page is not avaialable
> -
>
> Key: YARN-4428
> URL: https://issues.apache.org/jira/browse/YARN-4428
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Chang Li
>Assignee: Chang Li
> Attachments: YARN-4428.1.2.patch, YARN-4428.1.patch, 
> YARN-4428.2.2.patch, YARN-4428.2.patch
>
>
> When AHS is turned on, if we can't view an application on the RM page, the RM 
> page should redirect us to the AHS page. For example, when you go to 
> cluster/app/application_1 and the RM no longer remembers the application, we 
> simply get "Failed to read the application application_1"; it would be better 
> for the RM UI to try to redirect to the AHS UI at 
> /applicationhistory/app/application_1 to see if it's there. This kind of 
> redirect already exists for logs in the nodemanager UI.
> Also, when AHS is enabled, WebAppProxyServlet should fall back to redirecting 
> to the AHS page when the RM does not remember the app. YARN-3975 tried to do 
> this only when the original tracking url is not set. But in many cases, such 
> as when the app failed at launch, the original tracking url will be set to 
> point to the RM page, so the redirect to the AHS page won't work.
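
A minimal sketch of the fallback described above (not the attached patch), written 
as a servlet-style fragment; rmContext is the RM's context, while renderFromRM, 
ahsEnabled, and ahsWebAppBaseUrl are assumed helpers/fields standing in for the 
real wiring:
{code}
// Sketch only: if the RM no longer remembers the app and the application
// history server is enabled, redirect to the AHS app page instead of
// rendering "Failed to read the application ...".
void renderApp(HttpServletResponse resp, ApplicationId appId)
    throws IOException {
  RMApp app = rmContext.getRMApps().get(appId);
  if (app != null) {
    renderFromRM(app, resp);                 // assumed helper
    return;
  }
  if (ahsEnabled) {                          // assumed config flag
    resp.sendRedirect(ahsWebAppBaseUrl + "/applicationhistory/app/" + appId);
    return;
  }
  resp.sendError(HttpServletResponse.SC_NOT_FOUND,
      "Failed to read the application " + appId);
}
{code}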



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4350) TestDistributedShell fails for V2 scenarios

2015-12-18 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15064170#comment-15064170
 ] 

Naganarasimha G R commented on YARN-4350:
-

Hi [~varun_saxena], as discussed offline, this seems to be a problem with the 
Distributed shell AM. {{TestDistributedShell.checkTimelineV1}} checks that only 
the 2 requested containers are launched, but in reality more than 2 are getting 
launched.
Possible reasons for this are:
* The RM assigned additional containers and the Distributed shell AM launched 
them. I have observed similar over-assignment in MR as well, but the MR AM takes 
care of returning the extra containers assigned by the RM. A similar approach 
should exist in the Distributed shell AM too (see the sketch below).
* The RM killed a container for some reason and an extra container was allocated 
in its place.

I am not sure which of these cases is causing the additional containers. To 
analyze this we need more RM and AM logs, which the test case logs do not 
provide, and it is not related to the fixes in this issue. IMO it can happen on 
trunk as well, so I think we can raise another jira to track this!
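
A minimal sketch, in the spirit of what the MR AM does, of how a distributed-shell-style 
AM could hand back surplus allocations; numRequestedContainers, numLaunchedContainers, 
amRMClient, and launchContainer are assumptions standing in for the AM's own bookkeeping:
{code}
// Sketch: release containers allocated beyond what was requested instead of
// launching them, so the launched count never exceeds the requested count.
@Override
public void onContainersAllocated(List<Container> allocated) {
  for (Container container : allocated) {
    if (numLaunchedContainers.get() >= numRequestedContainers) {
      // Over-allocation by the RM: return the container unused.
      amRMClient.releaseAssignedContainer(container.getId());
      continue;
    }
    numLaunchedContainers.incrementAndGet();
    launchContainer(container);              // assumed helper in the AM
  }
}
{code}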

> TestDistributedShell fails for V2 scenarios
> ---
>
> Key: YARN-4350
> URL: https://issues.apache.org/jira/browse/YARN-4350
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Naganarasimha G R
> Attachments: YARN-4350-feature-YARN-2928.001.patch, 
> YARN-4350-feature-YARN-2928.002.patch, YARN-4350-feature-YARN-2928.003.patch
>
>
> Currently TestDistributedShell does not pass on the feature-YARN-2928 branch. 
> There seem to be 2 distinct issues.
> (1) testDSShellWithoutDomainV2* tests fail sporadically
> These tests fail more often than not when run by themselves:
> {noformat}
> testDSShellWithoutDomainV2DefaultFlow(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell)
>   Time elapsed: 30.998 sec  <<< FAILURE!
> java.lang.AssertionError: Application created event should be published 
> atleast once expected:<1> but was:<0>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.checkTimelineV2(TestDistributedShell.java:451)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:326)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithoutDomainV2DefaultFlow(TestDistributedShell.java:207)
> {noformat}
> They start happening after YARN-4129. I suspect this might have to do with 
> some timing issue.
> (2) the whole test times out
> If you run the whole TestDistributedShell test, it times out without fail. 
> This may or may not have to do with the port change introduced by YARN-2859 
> (just a hunch).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4480) Clean up some inappropriate imports

2015-12-18 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15064145#comment-15064145
 ] 

Daniel Templeton commented on YARN-4480:


Thanks, [~drankye].  Looks good in general.  Your indentation is off in the 
{{Strings}} line:

{code}
-  !Strings.isEmpty(rr.getResourceName()) ? rr
+  !Strings.isNullOrEmpty(rr.getResourceName()) ? rr
{code}

The original line was correctly indented.

> Clean up some inappropriate imports
> ---
>
> Key: YARN-4480
> URL: https://issues.apache.org/jira/browse/YARN-4480
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Kai Zheng
> Attachments: YARN-4480-v1.patch
>
>
> It was noticed there are some unnecessary dependency into Directory classes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4138) Roll back container resource allocation after resource increase token expires

2015-12-18 Thread MENG DING (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15064134#comment-15064134
 ] 

MENG DING commented on YARN-4138:
-

Hi, [~sandflee]

1. Yes, this is the expected behavior. If you look at the discussion from the 
beginning of this thread, we decided that if multiple increase tokens are 
granted by the RM in a row for a container before the AM uses any of them, the 
last token takes effect and any previous tokens are effectively cancelled. If 
the RM sees a difference between its own number and the number reported by the 
NM, it considers it an *unconfirmed* state and won't set the lastConfirmed 
value. Also, if the AM issues multiple increase requests but doesn't use the 
last token, that is considered a user error.

2. If I understand your question correctly, then you are right that you should 
pass container B. In fact, the container B you are talking about is technically 
still container A, as uniquely identified by the container ID. When the resource 
increase request for container A is granted by the RM, the RM still sends back 
container A, but with an updated resource and token. As an Application Master 
developer, you are expected to track all live containers in the AM, and in the 
onContainersResourceChanged(List changedContainers) callback function you need 
to replace the original container A with the updated container A (see the 
sketch below).
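
A minimal sketch of the bookkeeping described in point 2, assuming the AM keeps its 
live containers in a map keyed by ContainerId; the callback name follows the one 
mentioned above, and the exact signature on the YARN-1197 branch may differ:
{code}
// Sketch: the "updated" container carries the same ContainerId with a new
// resource and token, so simply overwrite the AM's record of it.
private final Map<ContainerId, Container> liveContainers =
    new ConcurrentHashMap<ContainerId, Container>();

public void onContainersResourceChanged(List<Container> changedContainers) {
  for (Container updated : changedContainers) {
    liveContainers.put(updated.getId(), updated);
  }
}
{code}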

> Roll back container resource allocation after resource increase token expires
> -
>
> Key: YARN-4138
> URL: https://issues.apache.org/jira/browse/YARN-4138
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, nodemanager, resourcemanager
>Reporter: MENG DING
>Assignee: MENG DING
> Attachments: YARN-4138-YARN-1197.1.patch, YARN-4138-YARN-1197.2.patch
>
>
> In YARN-1651, after container resource increase token expires, the running 
> container is killed.
> This ticket will change the behavior such that when a container resource 
> increase token expires, the resource allocation of the container will be 
> reverted back to the value before the increase.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4467) Shell.checkIsBashSupported swallowed an interrupted exception

2015-12-18 Thread Wei-Chiu Chuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated YARN-4467:
--
Description: 
Edit: move this JIRA from HADOOP to YARN, as Shell.checkIsBashSupported() is 
used, and only used in YARN.

Shell.checkIsBashSupported() creates a bash shell command to verify if the 
system supports bash. However, its error message is misleading, and the logic 
should be updated.

If the shell command throws an IOException, it does not imply the bash did not 
run successfully. If the shell command process was interrupted, its internal 
logic throws an InterruptedIOException, which is a subclass of IOException.
{code:title=Shell.checkIsBashSupported|borderStyle=solid}
ShellCommandExecutor shexec;
boolean supported = true;
try {
  String[] args = {"bash", "-c", "echo 1000"};
  shexec = new ShellCommandExecutor(args);
  shexec.execute();
} catch (IOException ioe) {
  LOG.warn("Bash is not supported by the OS", ioe);
  supported = false;
}
{code}
An example of it appeared in a recent jenkins job
https://builds.apache.org/job/PreCommit-HADOOP-Build/8257/testReport/org.apache.hadoop.ipc/TestRPCWaitForProxy/testInterruptedWaitForProxy/

The test logic in TestRPCWaitForProxy.testInterruptedWaitForProxy starts a 
thread, waits for 1 second, and then interrupts the thread, expecting it to 
terminate. However, the method Shell.checkIsBashSupported swallowed the 
interrupt, and the test therefore failed.
{noformat}
2015-12-16 21:31:53,797 WARN  util.Shell (Shell.java:checkIsBashSupported(718)) 
- Bash is not supported by the OS
java.io.InterruptedIOException: java.lang.InterruptedException
at org.apache.hadoop.util.Shell.runCommand(Shell.java:930)
at org.apache.hadoop.util.Shell.run(Shell.java:838)
at 
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1117)
at org.apache.hadoop.util.Shell.checkIsBashSupported(Shell.java:716)
at org.apache.hadoop.util.Shell.(Shell.java:705)
at org.apache.hadoop.util.StringUtils.(StringUtils.java:79)
at 
org.apache.hadoop.security.SecurityUtil.getAuthenticationMethod(SecurityUtil.java:639)
at 
org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:273)
at 
org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:261)
at 
org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:803)
at 
org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:773)
at 
org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:646)
at org.apache.hadoop.ipc.RPC.waitForProtocolProxy(RPC.java:397)
at org.apache.hadoop.ipc.RPC.waitForProtocolProxy(RPC.java:350)
at org.apache.hadoop.ipc.RPC.waitForProxy(RPC.java:330)
at 
org.apache.hadoop.ipc.TestRPCWaitForProxy$RpcThread.run(TestRPCWaitForProxy.java:115)
Caused by: java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:503)
at java.lang.UNIXProcess.waitFor(UNIXProcess.java:264)
at org.apache.hadoop.util.Shell.runCommand(Shell.java:920)
... 15 more
{noformat}

The original design is not desirable, as it swallowed a potential interrupt, 
causing TestRPCWaitForProxy.testInterruptedWaitForProxy to fail. Unfortunately, 
Java does not allow this static method to throw an exception. We should remove 
the static member variable, so that the method can throw the interrupted 
exception. The node manager should call the static method instead of using the 
static member variable.

This fix has an associated benefit: the tests could run faster, because they 
will no longer need to spawn a bash process whenever they use a Shell static 
member variable (which happens quite often, e.g. when checking which operating 
system Hadoop is running on).

  was:
Shell.checkIsBashSupported() creates a bash shell command to verify if the 
system supports bash. However, its error message is misleading, and the logic 
should be updated.

If the shell command throws an IOException, it does not imply the bash did not 
run successfully. If the shell command process was interrupted, its internal 
logic throws an InterruptedIOException, which is a subclass of IOException.
{code:title=Shell.checkIsBashSupported|borderStyle=solid}
ShellCommandExecutor shexec;
boolean supported = true;
try {
  String[] args = {"bash", "-c", "echo 1000"};
  shexec = new ShellCommandExecutor(args);
  shexec.execute();
} catch (IOException ioe) {
  LOG.warn("Bash is not supported by the OS", ioe);
  supported = false;
}
{code}
An example of it appeared in a recent jenkins job
https://builds.apache.org/job/PreCommit-HADOOP-Build/8257/testReport/org.apach
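
A minimal sketch of the change the description above argues for (not the attached 
patch), assuming it lives inside Shell.java where ShellCommandExecutor and LOG are 
already available: surface the interrupt instead of folding it into the "bash is 
not supported" case, and restore the thread's interrupt status for callers.
{code}
// Sketch: InterruptedIOException is rethrown (with the interrupt status
// restored) rather than being treated as "bash is not supported".
public static boolean checkIsBashSupported() throws InterruptedIOException {
  try {
    ShellCommandExecutor shexec =
        new ShellCommandExecutor(new String[] {"bash", "-c", "echo 1000"});
    shexec.execute();
    return true;
  } catch (InterruptedIOException iioe) {
    Thread.currentThread().interrupt();  // preserve the interrupt for callers
    throw iioe;
  } catch (IOException ioe) {
    LOG.warn("Bash is not supported by the OS", ioe);
    return false;
  }
}
{code}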

[jira] [Updated] (YARN-4467) Shell.checkIsBashSupported swallowed an interrupted exception

2015-12-18 Thread Wei-Chiu Chuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated YARN-4467:
--
Description: 
Shell.checkIsBashSupported() creates a bash shell command to verify if the 
system supports bash. However, its error message is misleading, and the logic 
should be updated.

If the shell command throws an IOException, it does not imply the bash did not 
run successfully. If the shell command process was interrupted, its internal 
logic throws an InterruptedIOException, which is a subclass of IOException.
{code:title=Shell.checkIsBashSupported|borderStyle=solid}
ShellCommandExecutor shexec;
boolean supported = true;
try {
  String[] args = {"bash", "-c", "echo 1000"};
  shexec = new ShellCommandExecutor(args);
  shexec.execute();
} catch (IOException ioe) {
  LOG.warn("Bash is not supported by the OS", ioe);
  supported = false;
}
{code}
An example of it appeared in a recent jenkins job
https://builds.apache.org/job/PreCommit-HADOOP-Build/8257/testReport/org.apache.hadoop.ipc/TestRPCWaitForProxy/testInterruptedWaitForProxy/

The test logic in TestRPCWaitForProxy.testInterruptedWaitForProxy starts a 
thread, waits for 1 second, and then interrupts the thread, expecting it to 
terminate. However, the method Shell.checkIsBashSupported swallowed the 
interrupt, and the test therefore failed.
{noformat}
2015-12-16 21:31:53,797 WARN  util.Shell (Shell.java:checkIsBashSupported(718)) 
- Bash is not supported by the OS
java.io.InterruptedIOException: java.lang.InterruptedException
at org.apache.hadoop.util.Shell.runCommand(Shell.java:930)
at org.apache.hadoop.util.Shell.run(Shell.java:838)
at 
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1117)
at org.apache.hadoop.util.Shell.checkIsBashSupported(Shell.java:716)
at org.apache.hadoop.util.Shell.(Shell.java:705)
at org.apache.hadoop.util.StringUtils.(StringUtils.java:79)
at 
org.apache.hadoop.security.SecurityUtil.getAuthenticationMethod(SecurityUtil.java:639)
at 
org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:273)
at 
org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:261)
at 
org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:803)
at 
org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:773)
at 
org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:646)
at org.apache.hadoop.ipc.RPC.waitForProtocolProxy(RPC.java:397)
at org.apache.hadoop.ipc.RPC.waitForProtocolProxy(RPC.java:350)
at org.apache.hadoop.ipc.RPC.waitForProxy(RPC.java:330)
at 
org.apache.hadoop.ipc.TestRPCWaitForProxy$RpcThread.run(TestRPCWaitForProxy.java:115)
Caused by: java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:503)
at java.lang.UNIXProcess.waitFor(UNIXProcess.java:264)
at org.apache.hadoop.util.Shell.runCommand(Shell.java:920)
... 15 more
{noformat}

The original design is not desirable, as it swallowed a potential interrupt, 
causing TestRPCWaitForProxy.testInterruptedWaitForProxy to fail. Unfortunately, 
Java does not allow this static method to throw an exception. We should remove 
the static member variable, so that the method can throw the interrupted 
exception. The node manager should call the static method instead of using the 
static member variable.

This fix has an associated benefit: the tests could run faster, because they 
will no longer need to spawn a bash process whenever they use a Shell static 
member variable (which happens quite often, e.g. when checking which operating 
system Hadoop is running on).

  was:
Shell.checkIsBashSupported() creates a bash shell command to verify if the 
system supports bash. However, its error message is misleading, and the logic 
should be updated.

If the shell command throws an IOException, it does not imply the bash did not 
run successfully. If the shell command process was interrupted, its internal 
logic throws an InterruptedIOException, which is a subclass of IOException.
{code:title=Shell.checkIsBashSupported|borderStyle=solid}
ShellCommandExecutor shexec;
boolean supported = true;
try {
  String[] args = {"bash", "-c", "echo 1000"};
  shexec = new ShellCommandExecutor(args);
  shexec.execute();
} catch (IOException ioe) {
  LOG.warn("Bash is not supported by the OS", ioe);
  supported = false;
}
{code}
An example of it appeared in a recent jenkins job
https://builds.apache.org/job/PreCommit-HADOOP-Build/8257/testReport/org.apache.hadoop.ipc/TestRPCWaitForProxy/testInterruptedWaitForProxy/

The test logic in TestRPCWaitForProxy.testInt

[jira] [Updated] (YARN-4467) Shell.checkIsBashSupported swallowed an interrupted exception

2015-12-18 Thread Wei-Chiu Chuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated YARN-4467:
--
Description: 
Shell.checkIsBashSupported() creates a bash shell command to verify if the 
system supports bash. However, its error message is misleading, and the logic 
should be updated.

If the shell command throws an IOException, it does not imply the bash did not 
run successfully. If the shell command process was interrupted, its internal 
logic throws an InterruptedIOException, which is a subclass of IOException.
{code:title=Shell.checkIsBashSupported|borderStyle=solid}
ShellCommandExecutor shexec;
boolean supported = true;
try {
  String[] args = {"bash", "-c", "echo 1000"};
  shexec = new ShellCommandExecutor(args);
  shexec.execute();
} catch (IOException ioe) {
  LOG.warn("Bash is not supported by the OS", ioe);
  supported = false;
}
{code}
An example of it appeared in a recent jenkins job
https://builds.apache.org/job/PreCommit-HADOOP-Build/8257/testReport/org.apache.hadoop.ipc/TestRPCWaitForProxy/testInterruptedWaitForProxy/

The test logic in TestRPCWaitForProxy.testInterruptedWaitForProxy starts a 
thread, waits for 1 second, and then interrupts the thread, expecting it to 
terminate. However, the method Shell.checkIsBashSupported swallowed the 
interrupt, and the test therefore failed.
{noformat}
2015-12-16 21:31:53,797 WARN  util.Shell (Shell.java:checkIsBashSupported(718)) 
- Bash is not supported by the OS
java.io.InterruptedIOException: java.lang.InterruptedException
at org.apache.hadoop.util.Shell.runCommand(Shell.java:930)
at org.apache.hadoop.util.Shell.run(Shell.java:838)
at 
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1117)
at org.apache.hadoop.util.Shell.checkIsBashSupported(Shell.java:716)
at org.apache.hadoop.util.Shell.(Shell.java:705)
at org.apache.hadoop.util.StringUtils.(StringUtils.java:79)
at 
org.apache.hadoop.security.SecurityUtil.getAuthenticationMethod(SecurityUtil.java:639)
at 
org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:273)
at 
org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:261)
at 
org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:803)
at 
org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:773)
at 
org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:646)
at org.apache.hadoop.ipc.RPC.waitForProtocolProxy(RPC.java:397)
at org.apache.hadoop.ipc.RPC.waitForProtocolProxy(RPC.java:350)
at org.apache.hadoop.ipc.RPC.waitForProxy(RPC.java:330)
at 
org.apache.hadoop.ipc.TestRPCWaitForProxy$RpcThread.run(TestRPCWaitForProxy.java:115)
Caused by: java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:503)
at java.lang.UNIXProcess.waitFor(UNIXProcess.java:264)
at org.apache.hadoop.util.Shell.runCommand(Shell.java:920)
... 15 more
{noformat}

The original design is not desirable, as it swallowed a potential interrupt, 
causing TestRPCWaitForProxy.testInterruptedWaitForProxy to fail. Unfortunately, 
Java does not allow this static method to throw an exception. We should remove 
the static member variable, so that the method can throw the interrupted 
exception.

  was:
Shell.checkIsBashSupported() creates a bash shell command to verify if the 
system supports bash. However, its error message is misleading, and the logic 
should be updated.

If the shell command throws an IOException, it does not imply the bash did not 
run successfully. If the shell command process was interrupted, its internal 
logic throws an InterruptedIOException, which is a subclass of IOException.
{code:title=Shell.checkIsBashSupported|borderStyle=solid}
ShellCommandExecutor shexec;
boolean supported = true;
try {
  String[] args = {"bash", "-c", "echo 1000"};
  shexec = new ShellCommandExecutor(args);
  shexec.execute();
} catch (IOException ioe) {
  LOG.warn("Bash is not supported by the OS", ioe);
  supported = false;
}
{code}
An example of it appeared in a recent jenkins job
https://builds.apache.org/job/PreCommit-HADOOP-Build/8257/testReport/org.apache.hadoop.ipc/TestRPCWaitForProxy/testInterruptedWaitForProxy/

The test logic in TestRPCWaitForProxy.testInterruptedWaitForProxy starts a 
thread, wait it for 1 second, and interrupt the thread, expecting the thread to 
terminate. However, the method Shell.checkIsBashSupported swallowed the 
interrupt, and therefore failed.
{noformat}
2015-12-16 21:31:53,797 WARN  util.Shell (Shell.java:checkIsBashSupported(718)) 
- Bash is not supported by the OS
j

[jira] [Updated] (YARN-4467) Shell.checkIsBashSupported swallowed an interrupted exception

2015-12-18 Thread Wei-Chiu Chuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated YARN-4467:
--
Description: 
Shell.checkIsBashSupported() creates a bash shell command to verify if the 
system supports bash. However, its error message is misleading, and the logic 
should be updated.

If the shell command throws an IOException, it does not imply the bash did not 
run successfully. If the shell command process was interrupted, its internal 
logic throws an InterruptedIOException, which is a subclass of IOException.
{code:title=Shell.checkIsBashSupported|borderStyle=solid}
ShellCommandExecutor shexec;
boolean supported = true;
try {
  String[] args = {"bash", "-c", "echo 1000"};
  shexec = new ShellCommandExecutor(args);
  shexec.execute();
} catch (IOException ioe) {
  LOG.warn("Bash is not supported by the OS", ioe);
  supported = false;
}
{code}
An example of it appeared in a recent jenkins job
https://builds.apache.org/job/PreCommit-HADOOP-Build/8257/testReport/org.apache.hadoop.ipc/TestRPCWaitForProxy/testInterruptedWaitForProxy/

The test logic in TestRPCWaitForProxy.testInterruptedWaitForProxy starts a 
thread, waits for 1 second, and then interrupts the thread, expecting it to 
terminate. However, the method Shell.checkIsBashSupported swallowed the 
interrupt, and the test therefore failed.
{noformat}
2015-12-16 21:31:53,797 WARN  util.Shell (Shell.java:checkIsBashSupported(718)) 
- Bash is not supported by the OS
java.io.InterruptedIOException: java.lang.InterruptedException
at org.apache.hadoop.util.Shell.runCommand(Shell.java:930)
at org.apache.hadoop.util.Shell.run(Shell.java:838)
at 
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1117)
at org.apache.hadoop.util.Shell.checkIsBashSupported(Shell.java:716)
at org.apache.hadoop.util.Shell.(Shell.java:705)
at org.apache.hadoop.util.StringUtils.(StringUtils.java:79)
at 
org.apache.hadoop.security.SecurityUtil.getAuthenticationMethod(SecurityUtil.java:639)
at 
org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:273)
at 
org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:261)
at 
org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:803)
at 
org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:773)
at 
org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:646)
at org.apache.hadoop.ipc.RPC.waitForProtocolProxy(RPC.java:397)
at org.apache.hadoop.ipc.RPC.waitForProtocolProxy(RPC.java:350)
at org.apache.hadoop.ipc.RPC.waitForProxy(RPC.java:330)
at 
org.apache.hadoop.ipc.TestRPCWaitForProxy$RpcThread.run(TestRPCWaitForProxy.java:115)
Caused by: java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:503)
at java.lang.UNIXProcess.waitFor(UNIXProcess.java:264)
at org.apache.hadoop.util.Shell.runCommand(Shell.java:920)
... 15 more
{noformat}

The original design is not desirable, as it swallowed a potential interrupt, 
causing TestRPCWaitForProxy.testInterruptedWaitForProxy to fail. Unfortunately, 
Java does not allow this static method to throw an exception. We should remove 
the static member variable, so that the method can throw the interrupted 
exception. The node manager should call the static method instead of using the 
static member variable.

  was:
Shell.checkIsBashSupported() creates a bash shell command to verify if the 
system supports bash. However, its error message is misleading, and the logic 
should be updated.

If the shell command throws an IOException, it does not imply the bash did not 
run successfully. If the shell command process was interrupted, its internal 
logic throws an InterruptedIOException, which is a subclass of IOException.
{code:title=Shell.checkIsBashSupported|borderStyle=solid}
ShellCommandExecutor shexec;
boolean supported = true;
try {
  String[] args = {"bash", "-c", "echo 1000"};
  shexec = new ShellCommandExecutor(args);
  shexec.execute();
} catch (IOException ioe) {
  LOG.warn("Bash is not supported by the OS", ioe);
  supported = false;
}
{code}
An example of it appeared in a recent jenkins job
https://builds.apache.org/job/PreCommit-HADOOP-Build/8257/testReport/org.apache.hadoop.ipc/TestRPCWaitForProxy/testInterruptedWaitForProxy/

The test logic in TestRPCWaitForProxy.testInterruptedWaitForProxy starts a 
thread, wait it for 1 second, and interrupt the thread, expecting the thread to 
terminate. However, the method Shell.checkIsBashSupported swallowed the 
interrupt, and therefore failed.
{noformat}
2015-12-16 21:31:53,79

[jira] [Commented] (YARN-914) (Umbrella) Support graceful decommission of nodemanager

2015-12-18 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15064054#comment-15064054
 ] 

Jason Lowe commented on YARN-914:
-

[~danzhi] the patch should be against trunk.  We always commit first against 
trunk and then backport to prior releases in reverse release order (e.g.: 
trunk->branch-2->branch-2.8->branch-2.7) so we avoid a situation where a 
feature or fix is in a release but disappears in a subsequently released 
version.  See the [How to 
Contribute|http://wiki.apache.org/hadoop/HowToContribute] page for more 
information including details on preparing and naming the patch, etc.

Is this implementation in line with the design document on this JIRA, or is it 
using a different approach?

> (Umbrella) Support graceful decommission of nodemanager
> ---
>
> Key: YARN-914
> URL: https://issues.apache.org/jira/browse/YARN-914
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: graceful
>Affects Versions: 2.0.4-alpha
>Reporter: Luke Lu
>Assignee: Junping Du
> Attachments: Gracefully Decommission of NodeManager (v1).pdf, 
> Gracefully Decommission of NodeManager (v2).pdf, 
> GracefullyDecommissionofNodeManagerv3.pdf
>
>
> When NMs are decommissioned for non-fault reasons (capacity change etc.), 
> it's desirable to minimize the impact to running applications.
> Currently if a NM is decommissioned, all running containers on the NM need to 
> be rescheduled on other NMs. Furthermore, for finished map tasks, if their 
> map output is not fetched by the reducers of the job, these map tasks will 
> need to be rerun as well.
> We propose to introduce a mechanism to optionally gracefully decommission a 
> node manager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4454) NM to nodelabel mapping going wrong after RM restart

2015-12-18 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-4454:
---
Attachment: test.patch

Attaching test code to reproduce the issue.

> NM to nodelabel mapping going wrong after RM restart
> 
>
> Key: YARN-4454
> URL: https://issues.apache.org/jira/browse/YARN-4454
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: test.patch
>
>
> *Node label mapping with the NodeManager goes wrong if a combination of 
> hostname and then NodeId is used to update the node label mapping*
> *Steps to reproduce*
> 1. Create a cluster with 2 NMs
> 2. Add labels X,Y to the cluster
> 3. Replace the label of node 1 using ,x
> 4. Replace the label of node 1 by ,y
> 5. Again replace the label of node 1 by ,x
> Check the cluster label mapping: HOSTNAME1 will be mapped to X.
> Now restart the RM 2 times; the node label mapping of HOSTNAME1:PORT changes to Y.
> {noformat}
> 2015-12-14 17:17:54,901 INFO 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager: Add labels: 
> [,]
> 2015-12-14 17:17:54,905 INFO 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager: REPLACE labels on 
> nodes:
> 2015-12-14 17:17:54,906 INFO 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager:   
> NM=host-10-19-92-188:64318, labels=[ResourcePool_1]
> 2015-12-14 17:17:54,906 INFO 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager:   
> NM=host-10-19-92-188:0, labels=[ResourcePool_null]
> 2015-12-14 17:17:54,906 INFO 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager:   
> NM=host-10-19-92-187:64318, labels=[ResourcePool_null]
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4389) "yarn.am.blacklisting.enabled" and "yarn.am.blacklisting.disable-failure-threshold" should be app specific rather than a setting for whole YARN cluster

2015-12-18 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-4389:
--
Attachment: 0004-YARN-4389.patch

The patch has gone stale; rebasing it. [~djp], could you please help review it?

> "yarn.am.blacklisting.enabled" and 
> "yarn.am.blacklisting.disable-failure-threshold" should be app specific 
> rather than a setting for whole YARN cluster
> ---
>
> Key: YARN-4389
> URL: https://issues.apache.org/jira/browse/YARN-4389
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: applications
>Reporter: Junping Du
>Assignee: Sunil G
>Priority: Critical
> Attachments: 0001-YARN-4389.patch, 0002-YARN-4389.patch, 
> 0003-YARN-4389.patch, 0004-YARN-4389.patch
>
>
> "yarn.am.blacklisting.enabled" and 
> "yarn.am.blacklisting.disable-failure-threshold" should be application 
> specific rather than a cluster-level setting, or we shouldn't maintain 
> amBlacklistingEnabled and blacklistDisableThreshold at the per-rmApp level. We 
> should allow each AM to override this config, i.e. via the submissionContext.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4304) AM max resource configuration per partition to be displayed/updated correctly in UI and in various partition related metrics

2015-12-18 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15064004#comment-15064004
 ] 

Sunil G commented on YARN-4304:
---

The test case failures are not related; apart from the known test failures, the 
others passed locally. [~leftnoteasy], could you please take a look at the 
latest patch?
As mentioned above:
{noformat}
("Max Application Master Resources Per User:",
  resourceUsages.getAMResourceLimit().toString());
{noformat}
I am showing the queue's AM resource limit as the per-user AM limit, which is 
not really correct, but getting the real per-user value would need a roundabout 
approach. I implemented it this way based on the comment below. Is this what 
you also expect?
bq. I suggest to remove it and we can use amResourceLimit of first user of 
queues to show on UI.

> AM max resource configuration per partition to be displayed/updated correctly 
> in UI and in various partition related metrics
> 
>
> Key: YARN-4304
> URL: https://issues.apache.org/jira/browse/YARN-4304
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: webapp
>Affects Versions: 2.7.1
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: 0001-YARN-4304.patch, 0002-YARN-4304.patch, 
> 0003-YARN-4304.patch, 0004-YARN-4304.patch, 0005-YARN-4304.patch, 
> 0005-YARN-4304.patch, REST_and_UI.zip
>
>
> As we are supporting per-partition level max AM resource percentage 
> configuration, UI and various metrics also need to display correct 
> configurations related to same. 
> For eg: Current UI still shows am-resource percentage per queue level. This 
> is to be updated correctly when label config is used.
> - Display max-am-percentage per-partition in Scheduler UI (label also) and in 
> ClusterMetrics page
> - Update queue/partition related metrics w.r.t per-partition 
> am-resource-percentage



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4477) FairScheduler: encounter infinite loop in attemptScheduling

2015-12-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15063959#comment-15063959
 ] 

Hadoop QA commented on YARN-4477:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 10m 
51s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 49s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 44s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
17s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 53s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
21s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
43s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 39s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 41s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
48s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 48s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 48s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 43s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 43s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
16s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 52s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
22s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
50s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 39s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 41s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 66m 56s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 66m 38s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_91. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 36s 
{color} | {color:red} Patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 159m 46s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerNodeLabelUpdate
 |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
| JDK v1.7.0_91 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/a
